R: Reading Data

When importing data from a file into R, the common methods are:
read.table()
read.csv()
read.csv2()

read.table() is the most general of these, but you usually have to pass more arguments, such as file, header, sep, row.names, and nrows. It also reads the data directly into your computer's RAM, which can be a problem for big data sets.

If the data set is big and you are using read.table(), it may be more prudent to read it a chunk at a time rather than all at once.
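
One way to read in chunks is to combine the skip and nrows arguments of read.table() in a loop. The following is only a sketch: the temporary file, the chunk size of 4, and the running total are made up for illustration.

```r
# Build a small temporary CSV so the sketch is self-contained (hypothetical data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), tmp, row.names = FALSE)

chunk_size <- 4  # hypothetical; pick a size that fits comfortably in memory

# Read the header plus one row, only to recover the column names
col_names <- names(read.table(tmp, sep = ",", header = TRUE, nrows = 1))

rows_seen <- 0
total_value <- 0
repeat {
  chunk <- tryCatch(
    read.table(tmp, sep = ",", header = FALSE, col.names = col_names,
               skip = 1 + rows_seen, nrows = chunk_size),
    error = function(e) NULL  # read.table() errors once no lines are left
  )
  if (is.null(chunk) || nrow(chunk) == 0) break

  # Process the chunk here; this sketch just keeps a running sum
  total_value <- total_value + sum(chunk$value)
  rows_seen <- rows_seen + nrow(chunk)

  if (nrow(chunk) < chunk_size) break  # a short chunk means end of file
}
print(rows_seen)
print(total_value)
unlink(tmp)
```

Note that skip makes R scan the file from the top on every pass, so for very large files reading from an open connection is probably a better fit.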

Examples:

cameraData <- read.table("myfile.csv",sep=",",header=TRUE)
head(cameraData)

Or, in the case of read.csv(), you can just pass the file name: the separator defaults to a comma and header defaults to TRUE.

cameraData <- read.csv("myfile.csv")
head(cameraData)
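
read.csv2() works the same way, but its defaults are a semicolon separator and a comma as the decimal mark. Here is a small sketch using a made-up temporary file (the street and speed columns are hypothetical):

```r
# Hypothetical semicolon-separated file with decimal commas
tmp <- tempfile(fileext = ".csv")
writeLines(c("street;speed", "Main St;35,5", "Oak Ave;40,0"), tmp)

# read.csv2() defaults to sep = ";" and dec = ","
cameraData2 <- read.csv2(tmp)
str(cameraData2)

# The same read spelled out with read.table()
cameraData3 <- read.table(tmp, sep = ";", dec = ",", header = TRUE)
unlink(tmp)
```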

Some parameters you can use to help with reading datasets are:
quote - tells R which characters are used to quote values in the content. Setting quote="" means there are no quoted data elements.
na.strings - sets the string (or strings) that represent missing values.
nrows - sets how many rows to read in.
skip - sets how many lines to skip at the top of the file before reading starts.
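
To see these four parameters working together, here is a minimal sketch. The file contents, the "missing" marker, and the column names are all invented for the example.

```r
# Hypothetical file: one junk line on top, an apostrophe in the data,
# and the word "missing" standing in for an absent value
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Export generated by a hypothetical tool",
  "name,price",
  "Bob's Camera,10",
  "Lens,missing",
  "Tripod,25",
  "Bag,15"
), tmp)

products <- read.table(tmp, sep = ",", header = TRUE,
                       skip = 1,               # skip the junk first line
                       quote = "",             # treat the apostrophe as plain text
                       na.strings = "missing", # read "missing" as NA
                       nrows = 3,              # read only the first three data rows
                       stringsAsFactors = FALSE)
print(products)
unlink(tmp)
```

Without quote="", read.table() would treat the apostrophe in "Bob's Camera" as an opening quote, since its default quote characters include both " and '.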

Excel data can be imported by loading the xlsx package for R (it depends on Java, so Java needs to be installed):

library(xlsx)
cameraData <- read.xlsx("camera.xlsx",sheetIndex=1,header=TRUE)
head(cameraData)

XML
Using the XML package, we can point to an XML file and load it in as well:

library(XML)
fileUrl <- "http://mysite.com/some.xml"
doc <- xmlTreeParse(fileUrl,useInternalNodes=TRUE)
rootNode <- xmlRoot(doc)
xmlName(rootNode)

For example, names(rootNode) lists the names of the root node's child elements.
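
To try this without a live URL, here is a rough sketch that parses an XML string held in memory instead. The cameras/camera/street structure is made up, and asText=TRUE is what tells xmlTreeParse() that the input is the XML itself rather than a file name.

```r
library(XML)

# Hypothetical XML document, held in a string so no download is needed
xmlText <- paste0(
  "<cameras>",
  "<camera id='1'><street>Main St</street><direction>N/B</direction></camera>",
  "<camera id='2'><street>Oak Ave</street><direction>E/B</direction></camera>",
  "</cameras>"
)

doc <- xmlTreeParse(xmlText, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

xmlName(rootNode)   # name of the root element
names(rootNode)     # names of its child elements

# Text of the <street> element inside the first <camera>
firstStreet <- xmlValue(rootNode[[1]][["street"]])

# Pull every <street> value in the document with an XPath query
streets <- xpathSApply(rootNode, "//street", xmlValue)
print(streets)
```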

JSON
Similarly, JSON can be read straight from a URL using the jsonlite package.

library(jsonlite)
jsonData <- fromJSON("http://mysite.com/mydata/")
names(jsonData)

To get to nested objects in the JSON we use a syntax like:
names(jsonData$owner) or jsonData$owner$login
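
Here is a small sketch of that nested access using a JSON string instead of a URL. The records and the owner/login fields are invented to mirror the syntax above.

```r
library(jsonlite)

# Hypothetical JSON: an array of records, each with a nested "owner" object
jsonText <- '[
  {"name": "repoA", "owner": {"login": "alice", "id": 1}},
  {"name": "repoB", "owner": {"login": "bob",   "id": 2}}
]'

jsonData <- fromJSON(jsonText)

names(jsonData)         # top-level fields
names(jsonData$owner)   # fields of the nested object
logins <- jsonData$owner$login
print(logins)
```

fromJSON() turns the array of records into a data frame, and the nested owner objects become a data frame stored in the owner column, which is why $ can be chained.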

Converting data into JSON. One of the nice things here is that going the other way is just as easy: the toJSON function turns a dataset into JSON.

For example, if I had a dataset called cameras, and I did camerajson <- toJSON(cameras,pretty=TRUE) it would convert the cameras dataset into JSON for me.
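
A runnable version of that idea might look like the sketch below. The cameras data frame here is a made-up stand-in, and the fromJSON() call at the end is only there to check that the round trip gives the data back.

```r
library(jsonlite)

# Hypothetical stand-in for the cameras dataset
cameras <- data.frame(
  street = c("Main St", "Oak Ave"),
  direction = c("N/B", "E/B"),
  stringsAsFactors = FALSE
)

# Convert the data frame to pretty-printed JSON
camerajson <- toJSON(cameras, pretty = TRUE)
cat(camerajson)

# Round trip: parse the JSON back into a data frame
camerasAgain <- fromJSON(camerajson)
```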
