R Language Basics

Structures and Types

There are three types of basic structures in R:
  • vector
  • matrix
  • array
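
As a minimal sketch of how the three structures differ (the names v, m and arr are only illustrative), the same six numbers can be stored as a vector, a matrix or an array, and dim() reports the shape of each:

```r
# The same values stored in three different structures.
v   <- c(1, 2, 3, 4, 5, 6)              # vector: no dim attribute
m   <- matrix(v, nrow = 2, ncol = 3)    # matrix: two dimensions, filled column by column
arr <- array(v, dim = c(1, 2, 3))       # array: any number of dimensions

length(v)   # 6
dim(v)      # NULL, a plain vector has no dimensions
dim(m)      # 2 3
dim(arr)    # 1 2 3
m[2, 3]     # 6, the element in row 2, column 3
```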

Matrix basics

Scalar multiplication is the simplest operation: every element is multiplied by the same number.

In R:
    > a <- c(1,2,3)
    > 3 * a
    [1] 3 6 9
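
The same rule applies to a matrix. The sketch below assumes a small 2 x 2 matrix m, which is not part of the original example, and also shows that * between two vectors works element by element:

```r
a <- c(1, 2, 3)
m <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns are (1, 2) and (3, 4)

3 * a            # 3 6 9
3 * m            # every cell of the matrix is multiplied by 3
(3 * m)[2, 2]    # 12
a * a            # 1 4 9: element-wise product, not an inner product
```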

Inner and outer products

The inner product of two vectors is the sum of the products of their corresponding elements.

Define the two vectors in R:
    > a <- c(1,2,3)
    > b <- c(4,5,6)
Then compute the inner product with %*%:
    > a %*% b
         [,1]
    [1,]   32
Here is the vector outer product in R:
    > a %o% b
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    8   10   12
    [3,]   12   15   18
In both examples, the first two lines define the vectors. The %*% operator returns the inner product as a 1 x 1 matrix, and %o% returns the outer product as a 3 x 3 matrix.
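
To make the arithmetic explicit, here is a small sketch that rebuilds both products from base functions. The names inner and outer_ab are only illustrative:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Inner product: multiply element by element, then add the results up.
inner <- sum(a * b)          # 1*4 + 2*5 + 3*6 = 32
inner == drop(a %*% b)       # TRUE; drop() turns the 1 x 1 matrix into a plain number

# Outer product: every element of a times every element of b.
outer_ab <- outer(a, b)      # same result as a %o% b
all(outer_ab == a %o% b)     # TRUE
dim(outer_ab)                # 3 3
```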

Arrays

Arrays are multidimensional.
Here is an array in R. Only six values are supplied for the twelve cells, so R recycles them:
    > c <- array(c(1,2,3,4,5,6), dim = c(2,2,3))
    > c
    , , 1

         [,1] [,2]
    [1,]    1    3
    [2,]    2    4

    , , 2

         [,1] [,2]
    [1,]    5    1
    [2,]    6    2

    , , 3

         [,1] [,2]
    [1,]    3    5
    [2,]    4    6
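
Individual cells and whole slices can be picked out with one index per dimension. This is a sketch using the same values under the illustrative name arr, which avoids reusing the name of the built-in c() function:

```r
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 2, 3))

dim(arr)        # 2 2 3: rows, columns, slices
length(arr)     # 12 cells, so the six values are used twice
arr[2, 1, 3]    # 4: row 2, column 1 of the third slice
arr[1, 2, 2]    # 1: row 1, column 2 of the second slice
arr[, , 2]      # the whole second slice as a 2 x 2 matrix
```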

List

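A list is a one-dimensional collection of R objects. Each element can be of any type, including another list, which makes the list a recursive data structure.
    > class("list") # the string "list" itself is just a character value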
    [1] "character"
    > lst <- list(1,"R", TRUE)
    > lst
    [[1]]
    [1] 1

    [[2]]
    [1] "R"

    [[3]]
    [1] TRUE

    > class(lst)
    [1] "list"
As you can see, the class of lst is list.
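
List elements can also be named and nested. The following sketch uses made-up element names (name, version, tags) to show the difference between [[ ]], which returns the element itself, and [ ], which returns a smaller list:

```r
# A named list; the element names are invented for this example.
person <- list(name = "R", version = 4, tags = list("stats", "graphics"))

person$name                 # "R"
person[["version"]]         # 4; [[ ]] returns the element itself
class(person["version"])    # "list"; [ ] returns a smaller list
person$tags[[2]]            # "graphics"; a list can contain another list
length(person)              # 3
```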

Factors

A vector can hold any number of distinct values, but a factor is restricted to a fixed set of values called levels. Factors are therefore used to hold categorical variables.
    > f <- factor(c(1,2,3,1,2,3))
    > f
    [1] 1 2 3 1 2 3
    Levels: 1 2 3
    > levels(f) # levels are stored as strings
    [1] "1" "2" "3"
    > labels(f) # labels() returns the element positions as strings, not the levels
    [1] "1" "2" "3" "4" "5" "6"
    > mean(f) # the mean of a factor is not defined
    [1] NA
    Warning message:
    In mean.default(f) : argument is not numeric or logical: returning NA
Here the vector holds 1, 2 and 3 twice each (six elements), but the factor has only three levels. The levels form a set with no duplicates.
Factors cannot be used directly in arithmetic operations.
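
If the levels really are numbers, one common workaround is to convert the factor back through character. This is a sketch; the second factor g is an extra example added here to show why as.numeric() alone is not enough:

```r
f <- factor(c(1, 2, 3, 1, 2, 3))

nlevels(f)      # 3 distinct levels
table(f)        # how often each level occurs (twice each)

# Convert the level labels, not the internal codes, back to numbers.
x <- as.numeric(as.character(f))
mean(x)         # 2

# With other values the internal codes and the labels differ.
g <- factor(c(10, 20, 10))
as.numeric(g)                 # 1 2 1, the internal level codes
as.numeric(as.character(g))   # 10 20 10, the original values
```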

Data Frames

A data frame is a two-dimensional data structure made up of R objects. Each column in the data frame can have a different type.
    > class(lst)
    [1] "list"
    > df <- data.frame(c(1,2,3), c("A","B","C"), c(T,F,T))
    > class(df)
    [1] "data.frame"
    > df
      c.1..2..3. c..A....B....C.. c.T..F..T.
    1          1                A       TRUE
    2          2                B      FALSE
    3          3                C       TRUE
In effect, a data frame is a table. The column names above were generated automatically because none were given.
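
Readable column names can be supplied when the data frame is created. A minimal sketch, assuming the illustrative names id, grade and passed:

```r
df <- data.frame(id     = c(1, 2, 3),
                 grade  = c("A", "B", "C"),
                 passed = c(TRUE, FALSE, TRUE))

names(df)     # "id" "grade" "passed"
df$grade      # access a column by name
df$passed[2]  # FALSE
```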
Here is an example that creates a data frame from vectors and a factor:
    > v1 <- c("s1","s2","s3","s4","s5")
    > v2 <- c(25, 50, 60, 40, 20)
    > v3 <- c("Male","Female","Male","Female","Female")
    > f1 <- factor(c("Fail","Pass","Pass","Fail","Fail"))
    > student.dat <- data.frame(v1,v2,f1,v3)
    > head(student.dat)
      v1 v2   f1     v3
    1 s1 25 Fail   Male
    2 s2 50 Pass Female
    3 s3 60 Pass   Male
    4 s4 40 Fail Female
    5 s5 20 Fail Female
    > class(v3)
    [1] "character"
    # but in the data frame the same vector:
    > class(student.dat[[4]])
    [1] "factor"
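In R versions before 4.0.0, a character vector is converted to a factor when the data frame is created, unless stringsAsFactors = FALSE is specified. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the column stays a character vector without the extra argument.
The following example sets the option explicitly: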
    > student.dat <- data.frame(v1,v2,f1,v3,stringsAsFactors = FALSE)
    > head(student.dat)
      v1 v2   f1     v3
    1 s1 25 Fail   Male
    2 s2 50 Pass Female
    3 s3 60 Pass   Male
    4 s4 40 Fail Female
    5 s5 20 Fail Female
    > class(student.dat[[4]])
    [1] "character"
This time v3 has not changed; it is still a character vector.
The nrow() and ncol() functions return the number of rows and columns.
Tip: If you call nrow() on a plain vector, it returns NULL because a vector has no dim attribute. NROW() works with both vectors and data frames.
The dim() function returns the dimensions of the data frame. names(<data.frame>) returns the column names; to get only the third name, use names(<data.frame>)[3]. To get the row names, use rownames(<data.frame>).
    > x <- 1:5
    > y <- c('a','b','c','d','e')
    > df <- data.frame(x,y)
    > df
      x y
    1 1 a
    2 2 b
    3 3 c
    4 4 d
    5 5 e

    > rownames(df) <- c("one", "two", "three", "four", "five")
    > df
          x y
    one   1 a
    two   2 b
    three 3 c
    four  4 d
    five  5 e
As the example above shows, rownames() can also be used to assign row names.
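
The size and name functions mentioned above can be tried on the same small data frame. A minimal sketch:

```r
x <- 1:5
y <- c("a", "b", "c", "d", "e")
df <- data.frame(x, y)

nrow(df)        # 5
ncol(df)        # 2
dim(df)         # 5 2
names(df)       # "x" "y"
names(df)[2]    # "y"
rownames(df)    # "1" "2" "3" "4" "5" by default

nrow(x)         # NULL: a plain vector has no dim attribute
NROW(x)         # 5: NROW() treats the vector as a one-column matrix
NCOL(x)         # 1
```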

Data Types

This post works with six basic data types and structures:
  • Number
  • Text
  • Logical
  • Factor
  • Array
  • List
Here are some data type examples; c and a are the array and the vector defined earlier.
    > class("Hello")
    [1] "character"
    > class(c) # c is an array
    [1] "array"
    > class(a) # a is a vector
    [1] "numeric"
    > class(TRUE)
    [1] "logical"
    > fact <- factor(c("A","B","C"))
    > class(fact)
    [1] "factor"
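
Types can also be tested and converted. The sketch below uses only base R functions, and the values are arbitrary examples:

```r
class(1.5)               # "numeric"
class(2L)                # "integer"; the L suffix makes an integer
is.numeric(2L)           # TRUE: integers count as numeric
is.character("Hello")    # TRUE

as.numeric("3.14")       # 3.14, text converted to a number
as.character(42)         # "42"
as.logical("TRUE")       # TRUE
```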

Tidy Data

I created the CSV file using Numbers on a Mac. Here is the example student.csv file:
    Subject  A   B   C  F
    Science  5  12  23  4
    Maths    7  23  26  5
    English  8  15  45  2
    Arts     4  16  38  3
    Sports   9  35  35  6
As a first step, load the reshape2 library:
    > library(reshape2)
Then read student.csv into R:
    > student <- read.csv("student.csv")
    > head(student)
      Subject A  B  C F
    1 Science 5 12 23 4
    2   Maths 7 23 26 5
    3 English 8 15 45 2
    4    Arts 4 16 38 3
    5  Sports 9 35 35 6
In the CSV file above, the columns A to F are grades. The dataset shows, for each subject, the number of students who received each grade.
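
To try this without creating a file, the same data can be read from a string with the text argument of read.csv(). This is a sketch that assumes the file is comma-separated; csv_text is an illustrative name:

```r
# The contents of student.csv, typed in as a string.
csv_text <- "Subject,A,B,C,F
Science,5,12,23,4
Maths,7,23,26,5
English,8,15,45,2
Arts,4,16,38,3
Sports,9,35,35,6"

student <- read.csv(text = csv_text)

dim(student)      # 5 5
names(student)    # "Subject" "A" "B" "C" "F"
sum(student$A)    # 33 students received grade A in total
```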

Melt operation

The melt operation converts the wide student data frame into long format: each grade column becomes a set of rows, with the grade in the variable column and the count in the value column.
    > melt(student, id="Subject")
       Subject variable value
    1  Science        A     5
    2    Maths        A     7
    3  English        A     8
    4     Arts        A     4
    5   Sports        A     9
    6  Science        B    12
    7    Maths        B    23
    8  English        B    15
    9     Arts        B    16
    10  Sports        B    35
    11 Science        C    23
    12   Maths        C    26
    13 English        C    45
    14    Arts        C    38
    15  Sports        C    35
    16 Science        F     4
    17   Maths        F     5
    18 English        F     2
    19    Arts        F     3
    20  Sports        F     6
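
To show what melt is doing, here is a rough base R equivalent that builds the same long table by hand. It is a sketch rather than how reshape2 is implemented; the names grades and long are illustrative:

```r
student <- data.frame(
  Subject = c("Science", "Maths", "English", "Arts", "Sports"),
  A = c(5, 7, 8, 4, 9),
  B = c(12, 23, 15, 16, 35),
  C = c(23, 26, 45, 38, 35),
  F = c(4, 5, 2, 3, 6)
)

grades <- c("A", "B", "C", "F")

# Repeat the subjects once per grade column and stack the grade columns.
long <- data.frame(
  Subject  = rep(student$Subject, times = length(grades)),
  variable = rep(grades, each = nrow(student)),
  value    = unlist(student[grades], use.names = FALSE)
)

nrow(long)    # 20 = 5 subjects x 4 grades
long[6, ]     # Science, B, 12
```

With reshape2 itself, melt() also accepts variable.name and value.name arguments to rename the two new columns.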

dcast operation

This is the reverse of melt: dcast spreads the values of variable back into columns. Note that the rows come back sorted by Subject.
    > d <- melt(student, id="Subject")
    > dcast(d, Subject ~ variable, value.var = "value")
      Subject A  B  C F
    1    Arts 4 16 38 3
    2 English 8 15 45 2
    3   Maths 7 23 26 5
    4 Science 5 12 23 4
    5  Sports 9 35 35 6
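
The long-to-wide step can also be approximated in base R with tapply(), which returns a matrix rather than a data frame. This is a sketch, not a replacement for dcast; long and wide are illustrative names:

```r
# The melted data in long format, typed in directly.
long <- data.frame(
  Subject  = rep(c("Science", "Maths", "English", "Arts", "Sports"), times = 4),
  variable = rep(c("A", "B", "C", "F"), each = 5),
  value    = c(5, 7, 8, 4, 9, 12, 23, 15, 16, 35,
               23, 26, 45, 38, 35, 4, 5, 2, 3, 6)
)

# One row per Subject and one column per grade.
# Each cell has exactly one matching value, so sum() just returns it.
wide <- tapply(long$value, list(long$Subject, long$variable), sum)

wide["Maths", "B"]    # 23
rownames(wide)        # "Arts" "English" "Maths" "Science" "Sports"
dim(wide)             # 5 4
```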
