While a one-liner to convert categorical values to numbers is covered in a previous post, this post covers a better way to get similar results: using levels.

Another way to convert data to numeric values is the as.numeric() function.
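Calling as.numeric() directly on text does not work, which is why we build a factor first later in this post. A minimal sketch with a made-up character vector (the name `genders` is hypothetical):

```r
# Hypothetical sample data: a plain character vector
genders <- c("Male", "Female")

# as.numeric() cannot parse words, so every element becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning R would otherwise print.
codes <- suppressWarnings(as.numeric(genders))
print(codes)
```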

In R, you can access a column of a data frame using the $ (dollar sign) syntax. RStudio can even autocomplete the column name for you.
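A minimal sketch of $ access, assuming a small hypothetical data frame named `gender` whose values are made up for illustration. It also shows `[[ ]]`, an equivalent form that accepts a column name stored in a variable:

```r
# Hypothetical data frame with a single Gender column
gender <- data.frame(Gender = c("Male", "Male", "Female"))

# $ pulls the column out as a vector
col_dollar <- gender$Gender

# [[ ]] does the same, but takes the column name as a string
col_name <- "Gender"
col_brackets <- gender[[col_name]]

print(col_dollar)
```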

# Convert the Gender element of a list to a factor (R's categorical type):
> gender.factor <- factor(genderlist$Gender)

# Or convert the Gender column of a data frame to a factor:
> gender.factor <- factor(gender$Gender)

In the example above, I first use a list (genderlist); as a second example, I use a data frame (gender). In both cases I supply the column name after the $.
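To run the two conversions above on their own, here is a sketch that defines hypothetical `genderlist` and `gender` objects. The eight values are invented for illustration (chosen to match the output shown later in this post) and are not the author's actual data:

```r
# Hypothetical inputs: a list and a data frame, each holding a Gender vector
values <- c("Male", "Male", "Female", "Female", "Female", "Female", "Female", "Female")
genderlist <- list(Gender = values)
gender <- data.frame(Gender = values)

# Both forms produce the same factor
factor_from_list <- factor(genderlist$Gender)
factor_from_df <- factor(gender$Gender)
identical(factor_from_list, factor_from_df)
```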

Next, I created a factor from that column. In R, a factor is the categorical type. Once “Male” and “Female” are stored as a factor, R assigns each category an integer code, which we can then extract as a number.
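Under the hood, a factor stores each value as an integer code that points into a set of labels (its levels). A sketch inspecting that structure on a made-up vector named `g`:

```r
# Hypothetical sample data
g <- factor(c("Male", "Female", "Male"))

class(g)    # "factor"
typeof(g)   # "integer": the underlying storage is integer codes
levels(g)   # "Female" "Male": the distinct labels, sorted alphabetically by default
nlevels(g)  # 2
```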

Notice the $Gender syntax: factor(df$Gender)

Here, Gender is the name of the list element or data frame column. What’s nice about this, compared with Pandas, is that RStudio autocompletes column names, raising a contextual menu as you type after the $ sign.

Once we have a factor, we can call as.numeric() on it like so:

> gender.numeric <- as.numeric(gender.factor)
> gender.numeric
  [1] 2 2 1 1 1 1 1 1 

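There we get 1 for Female and 2 for Male: the numeric assignment happens automatically. Strictly speaking, the levels are the distinct categories themselves (“Female” and “Male”), and as.numeric() returns each value's position among those levels. In this case there are 2 levels. By default, levels are sorted alphabetically, which is why Female gets 1.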
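Because the default level order is alphabetical, the numbers follow the spelling of the labels rather than any meaning you might attach to them. If you want Male to be 1 instead, one option is to pass the `levels` argument of factor() explicitly. A sketch with made-up data (variable names are hypothetical):

```r
# Hypothetical sample data
values <- c("Male", "Male", "Female")

# Default: levels sorted alphabetically, so Female = 1, Male = 2
default_codes <- as.numeric(factor(values))

# Explicit order: Male = 1, Female = 2
custom_codes <- as.numeric(factor(values, levels = c("Male", "Female")))

print(default_codes)  # 2 2 1
print(custom_codes)   # 1 1 2
```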

If we had 6 values, like “Up,” “Down,” “Left,” “Right,” “Forward,” and “Back,” we would get 6 levels, numbered 1 through 6 in alphabetical order: Back = 1, Down = 2, Forward = 3, Left = 4, Right = 5, Up = 6.
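To check that mapping, here is a sketch using the six direction labels from the paragraph above (the variable names are hypothetical):

```r
# The six direction labels, in the order listed in the text
directions <- c("Up", "Down", "Left", "Right", "Forward", "Back")
dir_factor <- factor(directions)

levels(dir_factor)      # "Back" "Down" "Forward" "Left" "Right" "Up"
as.numeric(dir_factor)  # 6 2 4 5 3 1
```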
