R is a dialect of S, a language that came from Bell Labs. S was first implemented as a set of Fortran libraries and was later rewritten in C.
Base types in R are called “atomic” types.
The atomic types in R are:
- character
- numeric (real numbers)
- integer
- complex
- logical (boolean)
The vector is the most basic object in R.
A vector contains objects of the same class.
The only kind of vector that can hold elements of different classes is the list. A list could contain a character, an integer, a logical, etc.
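A small sketch of the difference between an atomic vector and a list. The variable names (v, l, empty) are only for illustration.

```r
# An atomic vector holds elements of a single class.
v <- c(1.5, 2, 3)
class(v)            # "numeric"

# A list can hold elements of different classes.
l <- list("a", 1L, TRUE)
sapply(l, class)    # "character" "integer" "logical"

# vector() with no arguments gives an empty (length 0) vector.
empty <- vector()
length(empty)       # 0
```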
To create an empty vector, use vector().
Adding L after a number (for example 1L) makes it an integer rather than a numeric.
Inf is a numeric value representing infinity; for example, 1/0 returns Inf.
NaN (“not a number”) is an undefined numeric value; for example, 0/0 returns NaN.
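To make the L suffix, Inf, and NaN concrete, here is a short sketch you can paste into the console:

```r
class(1)         # "numeric"
class(1L)        # "integer"

1 / 0            # Inf
is.finite(1 / 0) # FALSE

0 / 0            # NaN
is.nan(0 / 0)    # TRUE

# NaN counts as missing, but a plain NA is not NaN.
is.na(NaN)       # TRUE
is.nan(NA)       # FALSE
```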
R objects can have attributes, such as:
- names, dimnames (dimension names)
- class (not quite like a class in OO terms, but the type of the symbol/variable: integer, character, etc.)
- metadata (user defined)
Attributes are accessed with the attributes() function.
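A rough sketch of inspecting attributes. The matrix m and the "source" attribute are made-up names for illustration; attr() is the base function for setting a single user-defined attribute.

```r
# A matrix carries a dim attribute; dimnames are added here by hand.
m <- matrix(1:6, nrow = 2, ncol = 3)
dimnames(m) <- list(c("a", "b"), c("x", "y", "z"))

# User-defined metadata (the name "source" is hypothetical).
attr(m, "source") <- "example"

attributes(m)          # shows dim, dimnames and source
names(attributes(m))

# A named vector stores its names as an attribute too.
v <- c(first = 1, second = 2)
names(v)               # "first" "second"
```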
A little weird at first… but instead of =, R code conventionally uses a reverse rocket for assignment: <- (a plain = also works at the top level).
x <- 1
assigns 1 to the symbol x.
You can check the class or type of a symbol with the class() function like so:
class(x)
This will return numeric if x holds a number.
You can print output with print(object).
Comments start with #, as in Ruby or Python.
In R, ranges are called sequences. They are created like so:
x <- 1:20
That creates the sequence of integers from 1 to 20.
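A quick sketch of sequences. Besides the colon operator, base R also has seq() and seq_len(), shown here as alternatives.

```r
x <- 1:20
length(x)            # 20
class(x)             # "integer"

# seq() lets you choose the step size.
seq(1, 10, by = 2)   # 1 3 5 7 9

# seq_len(n) gives 1..n.
seq_len(5)           # 1 2 3 4 5
```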
To create vectors of objects you can use the c() function.
If we set:
x <- c(0.5, 0.6)
x would be
[1] 0.5 0.6
This is like an array, without the delimiters. For example, if I did
x[1] I would get 0.5
x[2] would return 0.6
Notice in this case the indexing doesn’t start at 0, but at 1.
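Here is the same idea as something you can run. The out-of-range lookup at the end is an extra illustration, not part of the notes above.

```r
x <- c(0.5, 0.6)

x[1]        # 0.5 (indexing starts at 1)
x[2]        # 0.6
length(x)   # 2

# Asking for an element past the end gives NA rather than an error.
x[3]        # NA
```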
TRUE and FALSE can be written as
x <- TRUE
or simply x <- T (T and F are ordinary variables that can be reassigned, so the full names are safer).
x <- vector("numeric", length = 10)
would assign 0 0 0 0 0 0 0 0 0 0 to x
If you mix objects like so:
y <- c(TRUE, 2)
the logical is coerced to a number: 1 for TRUE, 0 for FALSE.
However, if you use
y <- c("a", TRUE)
everything is coerced to character, so TRUE becomes the string "TRUE", not the boolean value.
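A short sketch of coercion in action. The explicit as.*() calls at the end are an addition to show the same conversions done on purpose.

```r
y <- c(TRUE, 2)
y            # 1 2
class(y)     # "numeric"

y <- c("a", TRUE)
y            # "a" "TRUE"
class(y)     # "character"

# Explicit coercion uses the as.*() family.
as.numeric(c(TRUE, FALSE))   # 1 0
as.character(1:3)            # "1" "2" "3"
```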
read.table and read.csv are for reading tabular data.
readLines is for reading lines from a text file.
data <- read.csv("my_data.csv")
That will read the tabular data in the CSV file and assign it to the symbol data.
Performance for Loading Data
You can set the colClasses argument on the read function so that R doesn’t have to try and figure out what data type is in each column. For example, if all data in the table is numeric, then you could do
data <- read.csv("my_data.csv", colClasses = "numeric")
Setting nrows also helps the import, mainly with memory use, since R doesn’t have to work out the row count itself. You can do this with
data <- read.csv("my_data.csv", nrows = 100)
nrows can also be used to pull just the first rows of a file. If the document had 40,000 rows, you could do
data <- read.csv("my_data.csv", nrows = 100) to grab only the first 100 rows.
Then, if you wanted to find out what data types are in there, you could do:
classes <- sapply(data, class)
This assigns the class of each column to the classes symbol.
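One way to combine these pieces, as a sketch: read a few rows first, record the column classes, then pass them back through colClasses for the full read. The temporary file and the columns a and b are made up so the example runs without my_data.csv.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = c(1, 2, 3), b = c(4.5, 5.5, 6.5)),
          path, row.names = FALSE)

# Read only the first two rows and record each column's class.
initial <- read.csv(path, nrows = 2)
classes <- sapply(initial, class)
classes

# Read the whole file, telling R the column classes up front.
data <- read.csv(path, colClasses = classes)
nrow(data)   # 3

unlink(path) # clean up the temporary file
```

This assumes the first few rows are representative of the whole file; if a later row holds a different kind of value, the full read will fail or need different classes.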
Selecting a row by number puts the row index before the comma, for example x[1, ] for the first row.
Selecting a row by its row name:
x["my string value in table",]
Counting a value: if I had a data frame x with a column called “Ozone” and wanted a count of the NA values in that column, one way is sum(is.na(x$Ozone)).
To compute the mean of a column, omitting NA (missing values, which includes NaN), you can do:
mean(x$Ozone, na.rm = TRUE)
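A runnable sketch of row selection, NA counting, and the mean. The data frame here is made up for illustration (the row names r1..r5 and the values are assumptions, not the data from these notes).

```r
x <- data.frame(
  Ozone = c(41, NA, 12, NA, 30),
  Temp  = c(67, 72, 74, 62, 66),
  row.names = c("r1", "r2", "r3", "r4", "r5")
)

x[2, ]                    # second row, by position
x["r3", ]                 # row whose row name is "r3"
x[which(x$Temp == 74), ]  # rows where a column has a given value

# is.na() gives TRUE/FALSE per element; sum() counts the TRUEs.
sum(is.na(x$Ozone))            # 2

# Mean of the non-missing values: (41 + 12 + 30) / 3
mean(x$Ozone, na.rm = TRUE)
```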
Let’s say someone has given us a data table with columns like Ozone, Temp, and Solar.R (solar radiation).
They ask us for the mean of solar radiation where Ozone is > 31 and Temp is > 90.
We can assign a new symbol/variable called x.sub:
x.sub <- subset(x, Ozone > 31 & Temp > 90)
This assigns to x.sub the subset of rows from the data frame x where Ozone is greater than 31 and Temp is greater than 90.
Then we can apply the mean function to x.sub:
mean(x.sub$Solar.R, na.rm = TRUE)
The na.rm = TRUE just tells it to drop any missing values before calculating.
Temperature by month: if you want the mean for June, start by taking the subset where Month is 6:
x.sub2 <- subset(x, Month == 6)
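Both subset examples can be tried on a small made-up data frame. The values below are invented for illustration; only the column names follow the notes.

```r
x <- data.frame(
  Ozone   = c(20, 45, 60, NA, 35),
  Solar.R = c(100, 200, NA, 150, 300),
  Temp    = c(80, 92, 95, 91, 93),
  Month   = c(5, 6, 6, 7, 6)
)

# Rows 2, 3 and 5 match; row 4 is dropped because its Ozone is NA.
x.sub <- subset(x, Ozone > 31 & Temp > 90)
nrow(x.sub)                          # 3

# Row 3 has a missing Solar.R, so na.rm = TRUE leaves (200 + 300) / 2.
mean(x.sub$Solar.R, na.rm = TRUE)    # 250

# Mean temperature for June (Month == 6): (92 + 95 + 93) / 3
x.sub2 <- subset(x, Month == 6)
mean(x.sub2$Temp)
```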
In boolean logic, an “or” (|) expression is true if at least one side is true, i.e.
6 == 1 | 5 == 5
The first is false, the second is true; therefore the result is TRUE.
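For comparison, a sketch of “or” next to “and”, plus the element-wise behaviour on vectors (the vector lines are an extra illustration):

```r
6 == 1 | 5 == 5    # TRUE  (one side is true)
6 == 1 & 5 == 5    # FALSE (both sides must be true)

# | and & work element by element on vectors.
c(TRUE, FALSE) | c(FALSE, FALSE)   # TRUE FALSE
c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE
```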
Combining character vectors:
Say you had something like:
my_string <- c("My", "Name", "is")
We are creating a vector that will be
"My" "Name" "is"
If we do paste(my_string, collapse = " ") it will become
"My Name is"
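A sketch of paste() with collapse, alongside sep and paste0() for contrast (the last two calls are extra illustrations):

```r
my_string <- c("My", "Name", "is")

# collapse joins the elements of one vector into a single string.
paste(my_string, collapse = " ")      # "My Name is"

# sep is the separator between separate arguments.
paste("Hello", "world", sep = ", ")   # "Hello, world"

# paste0() is paste() with no separator.
paste0("a", 1:3)                      # "a1" "a2" "a3"
```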
if y <- rnorm(1000)
and z <- rep(NA, 1000)
we can draw 100 random items from the two combined (a mix of numbers and NAs):
my_data <- sample(c(y, z), 100)
More ways to remove NAs:
y <- x[!is.na(x)]
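Putting the sampling and the NA removal together, as a sketch. set.seed() is added only so that repeated runs give the same sample; the name clean is made up.

```r
set.seed(42)                 # make the random draw repeatable

y <- rnorm(1000)             # 1000 random normal values
z <- rep(NA, 1000)           # 1000 missing values
my_data <- sample(c(y, z), 100)
length(my_data)              # 100

# is.na() marks the missing entries; ! flips it to keep the rest.
clean <- my_data[!is.na(my_data)]
anyNA(clean)                 # FALSE
```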
Using the identical function:
x <- "hi"
y <- "hi"
identical(x, y)
will produce TRUE
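A brief sketch of identical(), including a case where two values look the same but are not identical because their types differ:

```r
x <- "hi"
y <- "hi"
identical(x, y)              # TRUE

# 1L is an integer and 1 is a numeric, so they are not identical
# even though 1L == 1 is TRUE.
identical(1L, 1)             # FALSE
1L == 1                      # TRUE

identical(c(1, 2), c(1, 2))  # TRUE
```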