Posts

Showing posts from October, 2015

ABC Photo stories term frequency Analysis

Download the ABC dataset "ABC Local Online Photo Stories 2009-2014" from the data.gov.au site; it is available as localphotostories20092014csv.csv. The file's encoding is reported as unknown-8bit, so open it in Numbers and save it with UTF-8 encoding (ps.csv in my case).

In the Mac terminal, the following command shows the charset of the CSV file:

> file -I localphotostories20092014csv.csv
localphotostories20092014csv.csv: text/plain; charset=unknown-8bit

In RStudio:

> library(tm)
Loading required package: NLP
> ps <- read.csv("data/ps.csv", stringsAsFactors = FALSE)
> vs <- VectorSource(ps$Keywords)
> corpus <- Corpus(vs)

The tm package is a common choice for text mining in R. Install the package if it is not already installed, then load the library. VectorSource takes a vector and treats each element as a document, so pass it a character vector. Now create the corpus from the vector source (vs) created f
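As a rough cross-check on the term counts, the same idea can be sketched in base R without tm. This is only an illustration: the keyword strings below are made up, and it assumes each Keywords entry is a comma-separated list, which the real file may not follow.

```r
# Hypothetical keyword strings standing in for ps$Keywords
keywords <- c("flood, rain, river", "Rain, storm", "river, fishing, rain")

# Lower-case, split on commas, and trim surrounding whitespace
terms <- trimws(unlist(strsplit(tolower(keywords), ",", fixed = TRUE)))

# Count each term and sort from most to least frequent
freq <- sort(table(terms), decreasing = TRUE)
print(freq)
```

With the sample data, "rain" comes out on top with a count of 3. This is a stand-in for a quick sanity check, not the tm workflow itself.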

R Data manipulation with dplyr

First install and load the package:

> install.packages("dplyr")
> library(dplyr)

dplyr provides five main functions: filter, select, mutate, summarise and arrange.

The dataset of ABC local stations is in the CSV file abc-local-radio.csv, downloaded from the data.gov.au web site. Another ABC dataset from the same site is ABC Local Online Photo Stories 2009-2014, which is available as localphotostories20092014csv.csv.

Here are the column headings of the first file, radio:

> radio <- read.csv("abc-local-radio.csv")
> names(radio)
 [1] "State"            "Website.URL"      "Station"
 [4] "Town"             "Latitude"         "Longitude"
 [7] "Talkback.number"  "Enquiries.number" "Fax.number"
[10] "Sms.number"       "Street.number"    "Street.suburb"
[13] "Street.postcode"  "PO.box
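To give a feel for the five functions, here is a minimal sketch on a tiny made-up data frame. The column names State, Station and Latitude come from the headings of the radio file, but the rows and values are hypothetical, and group_by is used in addition to the five functions to get per-state counts.

```r
library(dplyr)

# Hypothetical rows; only the column names mirror abc-local-radio.csv
radio_demo <- data.frame(
  State    = c("NSW", "NSW", "VIC"),
  Station  = c("Station A", "Station B", "Station C"),
  Latitude = c(-33.9, -32.9, -37.8),
  stringsAsFactors = FALSE
)

nsw      <- filter(radio_demo, State == "NSW")        # keep matching rows
slim     <- select(radio_demo, State, Station)        # keep chosen columns
flagged  <- mutate(radio_demo, South = Latitude < -35) # add a derived column
by_state <- summarise(group_by(radio_demo, State), Stations = n()) # one row per state
ordered  <- arrange(radio_demo, Latitude)             # sort rows by latitude

print(by_state)
```

The same calls should work on the real radio data frame once the values of interest are substituted.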

Markdown blog writer for blogger

CONTENTS
Why new blog writer
Workflow
Typora
StackEdit
Math Support
Complex Math
Icons are well supported
Flowcharts
Image support
Syntax Highlighter
Conclusion

Why new blog writer

I was curious to find a new blog writer after some experiments with MarsEdit 2 and Blogo 2. Neither of those tools fulfilled my requirements: table of contents support, LaTeX-style math support (LaTeX is special for me because it is the most familiar), standard icons, flowcharts, and tips and notes, all of which have been unsupported in my blog for years. I wanted to have these features in my Blogger blog. Fortunately, I found two very good Markdown writers, StackEdit and Typora.

Workflow

Here is my blog writing workflow:

[Flowchart, created with Raphaël 2.1.2. Nodes: Start; document available?; publish ready?; StackEdit; Save and Publish; End; Typora. Branches are labelled yes/no.]

Typora

This is so far the best desktop WYSIWYG editor I have found to create the document including Tabl

R Language Basics

CONTENTS
Structures and Types
Matrix basics
Inner and outer products
Arrays
List
Factors
Data Frames
Data Types
Tidy Data
Melt operation
dcast operation

Structures and Types

There are three types of basic structures in R: vector, matrix and array.

Matrix basics

Scalar multiplication is the simplest operation. In R:

> a <- c(1, 2, 3)
> 3 * a
[1] 3 6 9

Inner and outer products

Define the following two vectors in R:

> a <- c(1, 2, 3)
> b <- c(4, 5, 6)

Their inner product is:

> a %*% b
     [,1]
[1,]   32

The first two lines define the vectors and the last line shows the inner product operation.

Here is the vector outer product in R:

> a %o% b
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    8   10   12
[3,]   12   15   18

Arrays

Arrays are multi-dimensional. Here is an array in R:

> c <- array(c( 1 , 2 , 3 , 4 , 5 , 6
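The inner and outer products from the section above can be double-checked by hand. The sketch below reuses the vectors a and b from the text; the other variable names are only illustrative.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Inner product written out element by element: 1*4 + 2*5 + 3*6
inner_sum <- sum(a * b)

# The same value via matrix multiplication (a 1 x 1 matrix)
inner_mat <- a %*% b

# Outer product via the operator and via the outer() function
outer_op <- a %o% b
outer_fn <- outer(a, b)

print(inner_sum)   # 32
print(dim(outer_op))   # 3 3
```

Element [i, j] of the outer product is a[i] * b[j], so the second row is 2 * b, that is 8 10 12.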