ABC Photo Stories Term Frequency Analysis
Download the ABC Local Online Photo Stories 2009-2014 dataset from the data.gov.au site; it is available as localphotostories20092014csv.csv. The file's encoding is reported as unknown-8bit, so open it in Numbers and save it again with UTF-8 encoding (ps.csv in my case).

In the Mac terminal, the following command shows the charset of the CSV file:

> file -I localphotostories20092014csv.csv
localphotostories20092014csv.csv: text/plain; charset=unknown-8bit

In RStudio:

> library(tm)
Loading required package: NLP
> ps <- read.csv("data/ps.csv", stringsAsFactors = FALSE)
> vs <- VectorSource(ps$Keywords)
> corpus <- Corpus(vs)

The tm package is a widely used R package for text mining. Install it first if it is not already installed, then load it with library(tm). VectorSource only accepts character vectors, which is why the CSV is read with stringsAsFactors = FALSE. Now create the corpus from the vector source (vs) created f...