The Python library NumPy has some great features, and one of them is filtering an array with a Boolean condition.

Take an example spreadsheet of data in CSV form. We’ll call this example car_races.csv. The table has several columns, including the driver’s last name, winning outcome, place, etc.

What if you wanted only the results of a specific driver? Let’s say a driver whose last name is Bosely.

Creating a Filter

In NumPy, you first create the filter like so:

driver_is_bosely = (car_races[:, 3] == "Bosely")

The filter is assigned to the variable driver_is_bosely. It pulls the 4th column (index 3, because NumPy counts from 0) from the NumPy array car_races, which stores the data from the car_races.csv file. In our example, we will assume this column is the driver’s last name.
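The post doesn’t show how car_races gets loaded, so here is a minimal sketch of one way to do it. The sample rows, the column order, and the choice of np.genfromtxt are all assumptions made for illustration; the real car_races.csv may look different.

```python
import io
import numpy as np

# Made-up stand-in for car_races.csv (hypothetical rows and column order).
csv_text = """race,place,outcome,last_name
Monaco,1,win,Bosely
Monaco,2,loss,Hart
Silverstone,3,loss,Bosely
Silverstone,1,win,Nakamura
"""

# dtype=str keeps every cell as text; skip_header=1 drops the header row.
car_races = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", dtype=str, skip_header=1
)

print(car_races.shape)   # (number of rows, number of columns)
print(car_races[:, 3])   # the last-name column in this made-up layout
```

With a real file on disk, you would pass the file name in place of the io.StringIO object. Loading everything as text matters here, because the filter compares the column against the string "Bosely".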

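The rows and columns are identified inside the square brackets. The part before the comma selects rows and the part after it selects columns. A colon on its own in the row position means we’re not filtering out any rows; we’re keeping all of them and selecting only by column.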

The square bracket syntax is like this (the end index itself is not included):

[start_row : end_row, start_column : end_column]

When you’re looking at all rows of a certain column, you pass it through like so:

[:, specific column]  

In our example it’s the 4th column, which is index 3 because counting starts at 0:

[:,3]
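To see the bracket syntax in action, here is a small sketch on a toy array of numbers. The array grid and its values are made up purely for this illustration.

```python
import numpy as np

# A toy 3x4 array so the row and column positions are easy to follow:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
grid = np.arange(12).reshape(3, 4)

# All rows, only the column at index 3 (the 4th column).
fourth_column = grid[:, 3]

# Rows 0 and 1, columns 1 and 2 (the end index is excluded).
block = grid[0:2, 1:3]

print(fourth_column.tolist())  # [3, 7, 11]
print(block.tolist())          # [[1, 2], [5, 6]]
```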

The comparison == checks each value in this column against “Bosely”. It doesn’t give a single yes-or-no answer for the whole column. Instead it goes row by row and returns an array of Boolean results, one per row: False, False, False, True, False, False, etc.
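Here is a minimal sketch of what that row-by-row comparison produces. The four rows are made-up sample data, with the last name assumed to sit in the 4th column.

```python
import numpy as np

# Hypothetical sample rows: race, place, outcome, last name.
car_races = np.array([
    ["Monaco", "1", "win", "Bosely"],
    ["Monaco", "2", "loss", "Hart"],
    ["Silverstone", "3", "loss", "Bosely"],
    ["Silverstone", "1", "win", "Nakamura"],
])

# Compare every value in the 4th column against "Bosely".
driver_is_bosely = (car_races[:, 3] == "Bosely")

print(driver_is_bosely.tolist())  # [True, False, True, False]
print(driver_is_bosely.dtype)     # bool
print(driver_is_bosely.sum())     # 2 (True counts as 1)
```

The result has one entry per row of car_races, which is what lets it act as a filter later on.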

This tells the filter which rows we’re looking for, but on its own it doesn’t filter the data. We must apply the filter to the data set.

Applying the Filter

Now that we have a filter, we use it to index the NumPy array like so:

driver_bosely = car_races[driver_is_bosely, :]

This is very similar to the syntax for using an index to get a result. In this case, we’re using the filter driver_is_bosely in the row position to keep only the rows of the NumPy array where the driver’s last name is “Bosely,” and the colon keeps every column of those rows.
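Putting the two steps together, a minimal sketch might look like this. The sample rows are made up, and the last name is assumed to be in the 4th column.

```python
import numpy as np

# Hypothetical sample rows: race, place, outcome, last name.
car_races = np.array([
    ["Monaco", "1", "win", "Bosely"],
    ["Monaco", "2", "loss", "Hart"],
    ["Silverstone", "3", "loss", "Bosely"],
    ["Silverstone", "1", "win", "Nakamura"],
])

# Step 1: create the filter (one True/False per row).
driver_is_bosely = (car_races[:, 3] == "Bosely")

# Step 2: apply it in the row position; ":" keeps all columns.
driver_bosely = car_races[driver_is_bosely, :]

# Leaving out ", :" selects the same rows.
same_rows = car_races[driver_is_bosely]

print(driver_bosely.shape)                      # (2, 4)
print(driver_bosely[:, 3].tolist())             # ['Bosely', 'Bosely']
print(np.array_equal(driver_bosely, same_rows)) # True
```

The original car_races array is left unchanged; driver_bosely is a new array holding only the matching rows.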
