R Language Basics
Structures and Types
There are three basic homogeneous structures in R, meaning all elements share a single type:
- vector
- matrix
- array
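Here is a small sketch of the three structures side by side. The names v, m, arr and mixed are illustrative only. matrix() fills its cells column by column. c() coerces mixed inputs to one common type, and that coercion is what keeps these structures homogeneous.

```r
# A vector: one dimension, no dim attribute
v <- c(1, 2, 3, 4, 5, 6)
# A matrix: the same six values in 2 rows and 3 columns, filled column by column
m <- matrix(v, nrow = 2, ncol = 3)
# An array: here with three dimensions (1 x 2 x 3)
arr <- array(v, dim = c(1, 2, 3))

dim(v)     # NULL: a plain vector has no dim attribute
dim(m)     # 2 3
dim(arr)   # 1 2 3
m[2, 3]    # 6, the last value, because filling goes column by column

# Mixing types in c() coerces every element to one type (here character)
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
```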
Matrix basics
Scalar multiplication is the simplest operation. In R:
> a <- c(1,2,3)
> 3 * a
[1] 3 6 9
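The same * operator works element by element on matrices, so a scalar multiplies every entry. Note that * between two matrices is also element-wise; the matrix product uses %*% instead. The following is a short sketch in which the matrix m is illustrative:

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix holding 1..6 column by column
3 * m                        # every entry multiplied by 3
m * m                        # element-wise product, not a matrix product
m %*% t(m)                   # matrix product: (2 x 3) times (3 x 2) gives 2 x 2
```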
Inner and outer products
To take the inner product of two vectors, first define them in R:
> a <- c(1,2,3)
> b <- c(4,5,6)
Then use the %*% operator:
> a %*% b
[,1]
[1,] 32
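The inner product is the sum of the element-wise products, so it can be computed in several equivalent ways. %*% returns a 1 x 1 matrix, whereas sum(a * b) and drop() give a plain number. crossprod() is a base R function that computes t(x) %*% y. A sketch:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
sum(a * b)          # 32: 1*4 + 2*5 + 3*6, as a plain number
drop(a %*% b)       # 32: drop() removes the 1 x 1 matrix dimensions
crossprod(a, b)     # t(a) %*% b, again a 1 x 1 matrix holding 32
```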
Here is the vector outer product in R:
> a %o% b
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 8 10 12
[3,] 12 15 18
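The first two lines define the vectors. %*% computes the inner product (the sum of the element-wise products), while %o% computes the outer product, whose entry in row i, column j is a[i] * b[j].
Arrays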
Arrays can have more than two dimensions. Here is a 2 x 2 x 3 array in R. Only six values are supplied for its twelve cells, so R recycles them:
> c <- array(c(1,2,3,4,5,6), dim = c(2,2,3))
> c
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 1
[2,] 6 2
, , 3
[,1] [,2]
[1,] 3 5
[2,] 4 6
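Elements of an array are addressed with one index per dimension, and leaving an index empty selects that whole dimension. The following sketch uses the same values but stores them as arr, so that the name does not mask the c() function:

```r
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 2, 3))
arr[1, 2, 3]   # row 1, column 2 of slice 3: 5
arr[, , 2]     # the whole second slice as a 2 x 2 matrix
length(arr)    # 12 cells, filled by recycling the six values
```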
List
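A list is a one-dimensional collection of R objects, and each object can be of any type. A list is a recursive data structure: it can contain other lists. Quoting the word gives a character string, not a list:
> class("list")
[1] "character"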
> lst <- list(1,"R", TRUE)
> lst
[[1]]
[1] 1
[[2]]
[1] "R"
[[3]]
[1] TRUE
> class(lst)
[1] "list"
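List elements can also be named, and the two kinds of brackets behave differently. [[ ]] (or $) extracts a single element, while [ ] returns a sub-list. Because lists are recursive, an element can itself be a list. The names person and nested below are hypothetical:

```r
person <- list(name = "Ann", scores = c(90, 85), passed = TRUE)
person$name              # "Ann"
person[["scores"]]       # the numeric vector 90 85
class(person["scores"])  # "list": single brackets keep the list wrapper

# A list inside a list
nested <- list(id = 1, info = list(city = "Oslo", zip = "0150"))
nested$info$city         # "Oslo"
```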
As you can see, the class of lst is "list".
Factors
A vector can hold any number of distinct values, whereas a factor restricts its values to a fixed set of levels. This makes factors suited to categorical variables.
> f <- factor(c(1,2,3,1,2,3))
> f
[1] 1 2 3 1 2 3
Levels: 1 2 3
> levels(f) #levels are stored as strings
[1] "1" "2" "3"
> labels(f) #element positions as strings, not the levels
[1] "1" "2" "3" "4" "5" "6"
> mean(f) #calculate mean value
[1] NA
Warning message:
In mean.default(f) : argument is not numeric or logical: returning NA
Here the vector holds 1, 2 and 3 twice each (six elements), but the factor has only three levels: the levels form a set without duplicates. Factors cannot be used in arithmetic operations, which is why mean() returns NA with a warning.
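A factor stores integer codes that index its levels, so converting it back to numbers needs care. as.numeric() returns the codes, not the original values, and the usual idiom goes through as.character() first. The values 10, 20, 10 below are made up so that codes and values differ:

```r
g <- factor(c(10, 20, 10))
levels(g)                            # "10" "20"
table(g)                             # how often each level occurs: 2 and 1
as.numeric(g)                        # 1 2 1: the internal codes
vals <- as.numeric(as.character(g))  # 10 20 10: the original values
mean(vals)                           # arithmetic works on the converted vector
```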
Data Frames
A data frame is a two-dimensional data structure made of R objects. Each column can be of a different type, but all columns have the same length.
> df <- data.frame(c(1,2,3), c("A","B","C"), c(T,F,T))
> class(df)
[1] "data.frame"
> df
c.1..2..3. c..A....B....C.. c.T..F..T.
1 1 A TRUE
2 2 B FALSE
3 3 C TRUE
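The column names above are generated from the expressions passed to data.frame(). Naming the arguments produces readable columns. The following is a sketch with the illustrative names id, grade and passed:

```r
df <- data.frame(id = c(1, 2, 3),
                 grade = c("A", "B", "C"),
                 passed = c(TRUE, FALSE, TRUE))
names(df)          # "id" "grade" "passed"
df$passed          # a column is extracted by name with $
sapply(df, class)  # each column keeps its own type
```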
A data frame is essentially a table. Here is an example that creates a data frame from vectors and a factor:
> v1 <- c("s1","s2","s3","s4","s5")
> v2 <- c(25, 50, 60, 40, 20)
> v3 <- c("Male","Female","Male","Female","Female")
> f1 <- factor(c("Fail","Pass","Pass","Fail","Fail"))
> student.dat <- data.frame(v1,v2,f1,v3)
> head(student.dat)
v1 v2 f1 v3
1 s1 25 Fail Male
2 s2 50 Pass Female
3 s3 60 Pass Male
4 s4 40 Fail Female
5 s5 20 Fail Female
> class(v3)
[1] "character"
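# but in the data frame (R versions before 4.0.0) the same vector is a factor:
> class(student.dat[[4]])
[1] "factor"
Before R 4.0.0, data.frame() converted character vectors to factors unless stringsAsFactors = FALSE
was specified when the data frame was created. Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay character.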
As shown in the following example:
> student.dat <- data.frame(v1,v2,f1,v3,stringsAsFactors = FALSE)
> head(student.dat)
v1 v2 f1 v3
1 s1 25 Fail Male
2 s2 50 Pass Female
3 s3 60 Pass Male
4 s4 40 Fail Female
5 s5 20 Fail Female
> class(student.dat[[4]])
[1] "character"
The v3 column has not changed: it is still a character vector.
There are functions to find the number of rows and columns: nrow() and ncol().
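Here is a quick sketch of these functions applied to a small data frame and to a plain vector. The objects df and v are illustrative:

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
nrow(df)   # 5
ncol(df)   # 2
dim(df)    # 5 2: rows and columns together

v <- 1:5
nrow(v)    # NULL: a plain vector has no dim attribute
NROW(v)    # 5: the vector is treated as a one-column matrix
NCOL(v)    # 1
```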
Tip: nrow() applied to a plain vector returns NULL rather than a count, whereas NROW() works with both vectors and data frames. The dim() function returns the dimensions of a data frame. To get the column names of a data frame, use names(<data.frame>). To get only the third name, use names(<data.frame>)[3]. To get the row names, use rownames(<data.frame>).
> x <- 1:5
> y <- c('a','b','c','d','e')
> df <- data.frame(x,y)
> df
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
> rownames(df) <- c("one", "two", "three", "four", "five")
> df
x y
one 1 a
two 2 b
three 3 c
four 4 d
five 5 e
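Once row names are set, they can be used as indices, just like column names. The following sketch repeats the example above and then indexes by row name:

```r
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))
rownames(df) <- c("one", "two", "three", "four", "five")
df["three", ]      # the whole row named "three"
df["three", "x"]   # 3
rownames(df)[2]    # "two"
```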
As the example shows, rownames() can also be used to assign new row names.
Data Types
Commonly used data types in R include:
- Number
- Text
- Logical
- Factor
- Array
- List
> class("Hello")
[1] "character"
> class(c) #c is an array
[1] "array"
> class(a) #a is a vector
[1] "numeric"
> class(TRUE)
[1] "logical"
> fact <- factor(c("A","B", "C" ))
> class(fact)
[1] "factor"
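class() reports the high-level class of an object, while typeof() shows how R stores it internally. For a factor, class() says "factor" while typeof() reveals the underlying integer codes. A sketch:

```r
class(2.5)         # "numeric"
typeof(2.5)        # "double"
class(1L)          # "integer": the L suffix makes an integer literal
fact <- factor(c("A", "B", "C"))
typeof(fact)       # "integer": factors are integer codes plus a levels attribute
class(list(1, "R"))          # "list"
class(matrix(1:4, nrow = 2)) # "matrix" "array" in R 4.0.0 and later
```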
Tidy Data
I created the CSV file using Numbers on a Mac. Here is the example student.csv file:

Subject | A | B | C | F |
---|---|---|---|---|
Science | 5 | 12 | 23 | 4 |
Maths | 7 | 23 | 26 | 5 |
English | 8 | 15 | 45 | 2 |
Arts | 4 | 16 | 38 | 3 |
Sports | 9 | 35 | 35 | 6 |
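If you do not use Numbers, the following sketch writes the same table to student.csv from R itself, in the current working directory. Setting row.names = FALSE keeps R's row numbers out of the file, so the file matches the table above:

```r
# The student table above, rebuilt as a data frame
student <- data.frame(
  Subject = c("Science", "Maths", "English", "Arts", "Sports"),
  A = c(5, 7, 8, 4, 9),
  B = c(12, 23, 15, 16, 35),
  C = c(23, 26, 45, 38, 35),
  F = c(4, 5, 2, 3, 6)
)
# Write it as a CSV file without a row-number column
write.csv(student, "student.csv", row.names = FALSE)
```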
As a first step, load the reshape2 library:
> library(reshape2)
Then read student.csv into R:
> student <- read.csv("student.csv")
> head(student)
Subject A B C F
1 Science 5 12 23 4
2 Maths 7 23 26 5
3 English 8 15 45 2
4 Arts 4 16 38 3
5 Sports 9 35 35 6
In the CSV file above, A to F are grades. The dataset shows, for each subject, how many students received each grade.
Melt operation
This is the melt operation on the student data frame. It turns the wide table into long format, with one row per subject and grade:
> melt(student, id="Subject")
Subject variable value
1 Science A 5
2 Maths A 7
3 English A 8
4 Arts A 4
5 Sports A 9
6 Science B 12
7 Maths B 23
8 English B 15
9 Arts B 16
10 Sports B 35
11 Science C 23
12 Maths C 26
13 English C 45
14 Arts C 38
15 Sports C 35
16 Science F 4
17 Maths F 5
18 English F 2
19 Arts F 3
20 Sports F 6
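melt() also accepts variable.name and value.name arguments, which give the two generated columns more meaningful names than "variable" and "value". The following self-contained sketch uses Grade and Count as column names; these names are our choice:

```r
library(reshape2)

student <- data.frame(
  Subject = c("Science", "Maths", "English", "Arts", "Sports"),
  A = c(5, 7, 8, 4, 9), B = c(12, 23, 15, 16, 35),
  C = c(23, 26, 45, 38, 35), F = c(4, 5, 2, 3, 6)
)
# Every column except Subject becomes a (Grade, Count) pair
long <- melt(student, id.vars = "Subject",
             variable.name = "Grade", value.name = "Count")
names(long)   # "Subject" "Grade" "Count"
nrow(long)    # 20: 5 subjects x 4 grade columns
```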
dcast operation
dcast() is the reverse of melt():
> d <- melt(student, id="Subject")
> dcast(d, Subject ~ variable, value.var = "value")
Subject A B C F
1 Arts 4 16 38 3
2 English 8 15 45 2
3 Maths 7 23 26 5
4 Science 5 12 23 4
5 Sports 9 35 35 6
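dcast() sorts the rows by the Subject values, which is why Arts now comes first. The following sketch does a full round trip, reorders the result to match the original table, and checks that no counts were lost. The names long and wide are illustrative:

```r
library(reshape2)

student <- data.frame(
  Subject = c("Science", "Maths", "English", "Arts", "Sports"),
  A = c(5, 7, 8, 4, 9), B = c(12, 23, 15, 16, 35),
  C = c(23, 26, 45, 38, 35), F = c(4, 5, 2, 3, 6)
)
long <- melt(student, id.vars = "Subject")
wide <- dcast(long, Subject ~ variable, value.var = "value")

# Put the rows back in the original subject order
wide <- wide[match(student$Subject, wide$Subject), ]
identical(as.character(wide$Subject), student$Subject)
all(wide[, -1] == student[, -1])   # compares every grade count
```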