## R Basics

**R History**

R is derived from S, a language developed at Bell Labs. S was originally implemented on top of Fortran libraries and was later rewritten in C.

**R Types**

The base types in R are called “atomic” types.

The atomic types in R are:

- character
- numeric (real numbers)
- integer
- complex
- logical (boolean)

**Vector**

The vector is the most basic object in R.

An atomic vector contains elements of the same class.

The one exception is the list, a vector whose elements can be of different types. A single list could hold a character, an integer, a logical, and so on.

To create an empty vector, use vector().
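As a minimal sketch of the difference between an atomic vector and a list (the variable names `empty`, `chars`, and `mixed` are made up for illustration):

```r
# vector() with no arguments gives an empty logical vector
empty <- vector()
length(empty)          # 0

# An atomic vector holds elements of a single type
chars <- c("a", "b")
class(chars)           # "character"

# A list can hold elements of different types
mixed <- list("a", 1L, TRUE)
sapply(mixed, class)   # "character" "integer" "logical"
```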

**Numbers**

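Adding the suffix L to a number makes it an integer (for example, 1L); without it, the number is stored as numeric.

Inf is a special numeric value representing infinity. For example, 1/0 evaluates to Inf.

NaN ("not a number") is the result of an undefined numeric operation. For example, 0/0 evaluates to NaN.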
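A short sketch of these number types at the console (the names `a` and `b` are placeholders):

```r
a <- 1        # numeric (double) by default
b <- 1L       # the L suffix makes it an integer
class(a)      # "numeric"
class(b)      # "integer"

1 / 0         # Inf
0 / 0         # NaN
is.nan(0 / 0) # TRUE
```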

**Attributes**

- names, dimnames (dimension names)
- dimensions
- class (the type of the object, such as integer or character, rather than a class in the strict OO sense)
- length
- metadata (user defined)

Attributes are accessed with the attributes() function.
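To see a few of these attributes in practice, here is a small sketch. The objects `m` and `v` and the attribute name "note" are invented for the example:

```r
# A matrix carries a dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)      # $dim: 2 3

# A named vector carries a names attribute
v <- c(a = 1, b = 2)
attributes(v)      # $names: "a" "b"

# User-defined metadata can be attached with attr()
attr(v, "note") <- "user-defined metadata"
attributes(v)      # now lists both names and note
```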

**Expressions**

It looks a little weird at first, but for assignment R conventionally uses a “reverse rocket”, <-, rather than =.

For example:

x <- 1

assigns 1 to the symbol x.

You can check the class of a symbol with the class() function:

class(x)

This returns "numeric" when x holds a numeric value.

You can print an object with print(object).

**Comments**

Comments start with #, as in Ruby or Python.

**Ranges**

In R, ranges are called sequences. They are created like so:

x <- 1:20

That creates the sequence of integers from 1 to 20.
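A quick sketch of what the colon operator produces. The seq() call is an addition here, shown only as one way to get a step other than 1:

```r
x <- 1:20
length(x)            # 20
class(x)             # "integer"

# seq() builds a sequence with a custom step
seq(1, 20, by = 5)   # 1 6 11 16
```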

**Concatenate**

To create vectors of objects you can use the c() function.

If we set:

x <- c(0.5, 0.6)

x would be

0.5 0.6

This behaves like an array in other languages, printed without the delimiters. For example:

x[1] returns 0.5

x[2] returns 0.6

Notice that in R the indexing doesn’t start at 0, but at 1.
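The indexing behaviour can be checked directly. The last two lines go slightly beyond the text above and show what happens at index 0 and past the end of the vector:

```r
x <- c(0.5, 0.6)
x[1]   # 0.5
x[2]   # 0.6

x[0]   # numeric(0): index 0 selects nothing
x[3]   # NA: there is no third element
```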

**Boolean**

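TRUE and FALSE can be written as

x <- TRUE

or abbreviated as x <- T (and F for FALSE). The full names are safer, because T and F are ordinary variables that can be reassigned.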

**Length**

x <- vector("numeric", length = 10)

assigns a numeric vector of ten zeros, 0 0 0 0 0 0 0 0 0 0, to x.

**Mixing Objects**

If you mix objects of different types like so:

y <- c(TRUE, 2)

TRUE is coerced to the number 1 (FALSE would become 0), so y is a numeric vector.

However, if you use

y <- c("a", TRUE)

the vector defaults to character, and TRUE is converted to the string "TRUE" rather than kept as a boolean value.
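A minimal sketch of this coercion (the names `y1`, `y2`, and `y3` are placeholders, and the third case is an extra example not covered above):

```r
y1 <- c(TRUE, 2)     # logical is coerced to numeric
y2 <- c("a", TRUE)   # logical is coerced to character
y3 <- c(1.7, "a")    # numeric is coerced to character

class(y1)   # "numeric"
y1          # 1 2
class(y2)   # "character"
y2          # "a" "TRUE"
class(y3)   # "character"
```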

**Reading Data**

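read.table and read.csv read tabular data.

readLines reads lines from a text file.

source reads R code files.

dget reads R objects that were deparsed into text files.

load reads saved workspaces.

unserialize reads single R objects in binary form.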

**Write Data**

write.table

writeLines

dump

save

serialize

Examples:

data <- read.csv("my_data.csv")

This reads the tabular data in the CSV file and assigns the resulting data frame to the symbol data.
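To try this without an existing file, one option is to write a small CSV to a temporary path first. The data frame `df`, its columns, and the temporary file are all invented for this sketch:

```r
# Write a tiny, made-up data frame to a temporary CSV file
path <- tempfile(fileext = ".csv")
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
write.csv(df, path, row.names = FALSE)

# Read it back as tabular data
data <- read.csv(path)
str(data)

# Read the same file as raw lines of text (header plus three rows)
lines <- readLines(path)
length(lines)   # 4
```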

**Performance for Loading Data**

You can set the colClasses argument on the read function so that R doesn’t have to figure out the data type of each column. For example, if every column in the table is numeric, you could do

data <- read.csv("my_data.csv", colClasses = "numeric")

Setting nrows also helps speed up the import, because R doesn’t have to work out the row count itself. For a file with 100 rows, you can do this with

data <- read.csv("my_data.csv", nrows = 100)

nrows can also be used to read only the first part of a file. If the file had 40,000 rows, the same call would grab only the first 100 rows.

Then, to find out what data types the columns contain, you could do:

classes <- sapply(data, class)

This assigns the class of each column to the symbol classes.
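These pieces can be combined: read a small sample, detect the column classes, then pass them to the full read. This is a sketch under the assumption that the first 100 rows are representative of the whole file. The temporary file and its columns `a` and `b` are made up:

```r
# Build a made-up CSV with 1000 rows to read from
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:1000, b = rnorm(1000)), path, row.names = FALSE)

# Read only the first 100 rows and detect the column classes
initial <- read.csv(path, nrows = 100)
classes <- sapply(initial, class)
classes   # a: "integer", b: "numeric"

# Reuse the detected classes so R can skip type detection on the full read
full <- read.csv(path, colClasses = classes)
nrow(full)   # 1000
```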

Selecting a row by number:

x[47,]

Selecting a row by row name:

x["my string value in table",]
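A small sketch of both kinds of row selection, using a made-up data frame `df` whose row names are set explicitly:

```r
# Hypothetical data frame with named rows
df <- data.frame(
  score = c(10, 20, 30),
  age   = c(31, 42, 53),
  row.names = c("alice", "bob", "carol")
)

df[2, ]       # second row, selected by position
df["bob", ]   # same row, selected by row name
```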

Counting values: if I had a data frame x with a column called "Ozone", and I wanted a count of the NA values in that column, I would do:

sum(is.na(x$Ozone))

To compute the mean of a column while omitting missing values (NA, which includes NaN), you can do:

mean(x$Ozone, na.rm = TRUE)
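Here is a minimal sketch with a tiny made-up data frame, so the results are easy to check by hand:

```r
# Hypothetical data: five Ozone readings, two of them missing
x <- data.frame(Ozone = c(41, NA, 12, NA, 28))

sum(is.na(x$Ozone))           # 2
mean(x$Ozone)                 # NA, because missing values propagate
mean(x$Ozone, na.rm = TRUE)   # 27
```

is.na() returns a logical vector, and sum() counts each TRUE as 1, which is why the sum gives the number of missing values.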

Let’s say someone has given us a data table with columns like Ozone, Temp, and solar radiation (Solar.R).

They ask us for the mean of solar radiation where Ozone is > 31 and Temp is > 90.

We can assign the matching rows to a new symbol/variable called x.sub:

x.sub <- subset(x, Ozone > 31 & Temp > 90)

This assigns to x.sub the subset of rows from the data frame x where Ozone is greater than 31 and Temp is greater than 90.

Then we can take the mean over x.sub:

mean(x.sub$Solar.R, na.rm = TRUE)

The na.rm = TRUE argument tells mean() to leave missing values out of the calculation.
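The whole workflow can be sketched on a small made-up data frame. The values below are invented so the result can be verified by hand:

```r
# Hypothetical readings; NA marks a missing measurement
x <- data.frame(
  Ozone   = c(40, 20, 35, 50, NA),
  Temp    = c(92, 95, 88, 91, 93),
  Solar.R = c(200, 150, 180, NA, 210)
)

# Keep rows where both conditions hold; rows where the
# condition evaluates to NA are dropped by subset()
x.sub <- subset(x, Ozone > 31 & Temp > 90)
nrow(x.sub)   # 2 (rows 1 and 4)

# Row 4 has a missing Solar.R, so only the value 200 is averaged
mean(x.sub$Solar.R, na.rm = TRUE)   # 200
```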

Other examples…

Temperature by month: say you want the mean temperature for June:

x.sub2 <- subset(x, Month == 6)

then mean(x.sub2$Temp)

In boolean logic, an OR expression (|) is true if at least one of its parts is true, i.e.

6 == 1 | 5 == 5

The first part is false and the second is true, therefore the whole expression is TRUE.
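For comparison, a short sketch of OR next to AND. The last line shows that | also works element by element on vectors:

```r
6 == 1 | 5 == 5   # TRUE: one side is TRUE
6 == 1 & 5 == 5   # FALSE: both sides must be TRUE

c(TRUE, FALSE) | c(FALSE, FALSE)   # TRUE FALSE
```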

Combining character vectors…

Say you had something like:

my_string <- c("My", "Name", "is")

This creates a character vector with three elements:

"My" "Name" "is"

If we do paste(my_string, collapse = " ") it becomes the single string

"My Name is"
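A brief sketch of paste() in both modes. The `sep` example is an addition, included only to contrast it with `collapse`:

```r
my_string <- c("My", "Name", "is")

# collapse joins the elements of one vector into a single string
paste(my_string, collapse = " ")   # "My Name is"

# sep joins corresponding elements of several vectors
paste(my_string, "x", sep = "-")   # "My-x" "Name-x" "is-x"
```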

**Sample Data**

If y <- rnorm(1000)

and z <- rep(NA, 1000)

we can sample 100 random items from the combined values with:

my_data <- sample(c(y, z), 100)

Another way to remove NAs from a vector x:

y <- x[!is.na(x)]
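Putting the two steps together, as a rough sketch: set.seed() is added here only to make the random draw repeatable, and the name `clean` is a placeholder.

```r
set.seed(42)              # assumption: fixed seed for reproducibility
y <- rnorm(1000)          # 1000 random normal values
z <- rep(NA, 1000)        # 1000 missing values

# Draw 100 items from the combined pool of numbers and NAs
my_data <- sample(c(y, z), 100)
length(my_data)           # 100

# Keep only the non-missing items
clean <- my_data[!is.na(my_data)]
anyNA(clean)              # FALSE
```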

Using the identical function:

x = "hi"

y = "hi"

identical(x, y)

will produce TRUE
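A short sketch of how identical() differs from ==. The numeric cases go a little beyond the example above:

```r
x <- "hi"
y <- "hi"
identical(x, y)              # TRUE

# identical() also compares types, so an integer and a double differ
identical(1L, 1)             # FALSE

# == compares element by element; identical() gives one answer for the whole object
c(1, 2) == c(1, 3)           # TRUE FALSE
identical(c(1, 2), c(1, 3))  # FALSE
```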