The hitchhiker’s guide to indexing in Pandas.

Pandas is one of the most powerful data manipulation tools available, and a data scientist who knows how to use its indexing features can get even more out of it. This guide explores how to use indexing effectively to manipulate data.

For the purpose of this tutorial, we are going to use the popular human resources (HR) analytics dataset found on Kaggle.

The first step is to load the data into a Jupyter notebook for analysis. We can do that using the code shown below:
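The original loading code was not preserved. Below is a minimal sketch: so that it runs on its own, it reads a few made-up rows from an in-memory string instead of the real file. The column names mirror part of the Kaggle dataset, but the values are illustrative only, and the file name HR_comma_sep.csv in the comment is an assumption about how the download is named.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the Kaggle HR analytics CSV.
# Column names mirror a subset of that dataset; the values are illustrative only.
SAMPLE_CSV = """satisfaction_level,last_evaluation,sales,salary
0.38,0.53,sales,low
0.80,0.86,technical,medium
0.11,0.88,support,medium
0.72,0.87,sales,high
0.37,0.52,technical,low
0.41,0.50,support,high
"""

# With the real download you would call pd.read_csv("HR_comma_sep.csv") instead
# (hypothetical file name; use the path where you saved the file).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.head())
```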

The next step is to convert the columns that hold categories but are stored as plain text (such as salary) into the pandas categorical data type, so that we can index them more effectively:
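A minimal sketch of the conversion, again on made-up sample rows. It assumes the text columns to convert are salary and a department column named sales; apply astype("category") to whichever columns hold category labels in your copy of the data.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Text columns load as plain string columns (object dtype in most pandas versions).
print(df.dtypes)

# Convert the columns that hold category labels into the categorical dtype.
for col in ["sales", "salary"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```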

Now that we’ve got the boring things out of the way, let’s start with the indexing!

What is an index in pandas?

Simply put, an index is the set of labels used to identify the rows of your data frame. Some things you should know about indexes:

  • Indexes are immutable.
  • All values in an index share a single data type (for example only floats, only Int64 or only categorical values).
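A short sketch illustrating both points on a small hypothetical integer index: the index reports a single dtype, direct assignment to a label is rejected, and methods such as Index.insert() return a new index rather than changing the existing one.

```python
import pandas as pd

idx = pd.Index([10, 20, 30])

# All labels in this index share a single data type (an integer dtype here).
print(idx.dtype)

# Indexes are immutable: assigning to one label raises a TypeError.
raised = False
try:
    idx[0] = 99
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Operations that "modify" an index return a new Index and leave the original untouched.
new_idx = idx.insert(0, 5)
print(idx)
print(new_idx)
```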

Let’s build a simple index series in pandas. We can do this using the code shown below:
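The original snippet was not preserved; this is a minimal sketch that follows the description given in this section. The labels follow the ‘label1’, ‘label2’ pattern the guide uses later, while the numbers in value and the variable name series_example are illustrative assumptions.

```python
import pandas as pd

# The index labels we want for the series.
index_example = ["label1", "label2", "label3", "label4"]

# The data values the labels point to (illustrative numbers).
value = [10, 20, 30, 40]

# pd.Series pairs each value with the label at the same position.
series_example = pd.Series(value, index=index_example)
print(series_example)
```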

This results in an output as shown below:

The code above first creates a list called ‘index_example’ containing the index labels we want. Next, it creates a list called ‘value’ holding the data values of the series. Finally, it passes both to the pandas Series() constructor, setting the ‘index’ argument to ‘index_example’ to specify the index labels.

We can set a name for this index with the code shown below:
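A minimal sketch, assuming a series like the one built above (recreated here so the snippet runs on its own). The name ‘index_name’ is a placeholder. Although the labels themselves are immutable, the index’s name attribute can be reassigned; Series.rename_axis("index_name") would instead return a renamed copy.

```python
import pandas as pd

series_example = pd.Series([10, 20, 30], index=["label1", "label2", "label3"])

# Give the index a name; "index_name" is a placeholder, use any label you like.
series_example.index.name = "index_name"
print(series_example)
```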

This results in an output as shown below:

If you want to see the contents of the index and its data type, you can use the code shown below:
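A minimal sketch, recreating a small hypothetical series so it runs on its own: series.index returns the Index object with its labels, type() shows which Index class it is, and .dtype reports the labels’ data type.

```python
import pandas as pd

series_example = pd.Series([10, 20, 30], index=["label1", "label2", "label3"])

# The Index object holding the labels.
print(series_example.index)
# The pandas class of that object.
print(type(series_example.index))
# The data type shared by all labels.
print(series_example.index.dtype)
```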

This results in an output as shown below:

We can view a particular index label using the code shown below:
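A minimal sketch on the same kind of hypothetical series: an Index supports positional access like a list, and positions start at 0.

```python
import pandas as pd

series_example = pd.Series([10, 20, 30], index=["label1", "label2", "label3"])

# Position 1 holds the second label because positions are zero-based.
second_label = series_example.index[1]
print(second_label)  # label2
```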

This code prints out the second index label, which is ‘label2’.

We can also choose which column serves as the index while loading the data, by passing the index_col argument to the pandas read_csv() function, as in the code shown below:
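A minimal sketch of the index_col argument, reading made-up sample rows from an in-memory string so it runs on its own (with the real data you would pass the CSV path instead). Once salary is the index, rows can be looked up by salary label with .loc.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
"""

# index_col="salary" turns the salary column into the row labels.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="salary")
print(df)

# Rows can now be looked up by their salary label.
print(df.loc["high"])
```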

This results in an output as shown below:


In the code above, index_col = ‘salary’ sets the salary column as the index of the data frame.

Let’s now try multi-indexing. In this example, the code below sets two columns as our index:
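The original code was not preserved, and the guide does not say which two columns it used. This sketch assumes salary plus a department column named sales, applies DataFrame.set_index() to made-up sample rows, and would work the same way with read_csv(..., index_col=["salary", "sales"]).

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
0.37,technical,low
0.41,support,high
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Two columns become a hierarchical index: salary first, then department.
df_multi = df.set_index(["salary", "sales"])
print(df_multi)
```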

This results in a data frame as shown below: 

However, the index above is not sorted. We can sort it using the code shown below:
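A minimal sketch using DataFrame.sort_index() on the same made-up rows and assumed columns. Besides making the output easier to read, a sorted MultiIndex is what pandas expects before it will slice the index by label ranges.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
0.37,technical,low
0.41,support,high
"""
df_multi = pd.read_csv(io.StringIO(SAMPLE_CSV)).set_index(["salary", "sales"])

# sort_index() orders rows by salary first, then by department within each salary.
df_sorted = df_multi.sort_index()
print(df_sorted)
```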

This results in a data frame as shown below:


What if we wanted information regarding employees with ‘high’ salaries only? We can do this with the code shown below:
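A minimal sketch on the same assumed, sorted multi-index: passing a single first-level label to .loc returns every row with that label and drops the salary level from the result’s index.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
0.37,technical,low
0.41,support,high
"""
df_sorted = (
    pd.read_csv(io.StringIO(SAMPLE_CSV))
    .set_index(["salary", "sales"])
    .sort_index()
)

# All rows whose salary label is "high", indexed by the remaining department level.
high_salary = df_sorted.loc["high"]
print(high_salary)
```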

This results in a data frame containing only the employees with high salaries:


We can extract only the satisfaction levels of the high-salaried employees using the code shown below:
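A minimal sketch, again on made-up sample rows: select the high-salary rows first, then pick the satisfaction_level column, which yields a Series indexed by the remaining (department) level.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
0.37,technical,low
0.41,support,high
"""
df_sorted = (
    pd.read_csv(io.StringIO(SAMPLE_CSV))
    .set_index(["salary", "sales"])
    .sort_index()
)

# Step 1: keep only the "high" salary rows; step 2: keep only one column.
high_salary = df_sorted.loc["high"]
high_satisfaction = high_salary["satisfaction_level"]
print(high_satisfaction)
```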

This results in the output shown below, which contains only the satisfaction levels:


We can select a range of index values to slice on. We can extract information regarding the employees with both high and medium salaries using the code shown below:
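A minimal sketch on the same assumed data. A list of labels selects exactly the high and medium groups. A label slice such as .loc["high":"medium"] is also possible, but it is inclusive and follows the sorted order of the labels; since "low" sorts between "high" and "medium" alphabetically, that slice returns the low-salary rows as well.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, assumed column names).
SAMPLE_CSV = """satisfaction_level,sales,salary
0.38,sales,low
0.80,technical,medium
0.72,sales,high
0.37,technical,low
0.41,support,high
"""
df_sorted = (
    pd.read_csv(io.StringIO(SAMPLE_CSV))
    .set_index(["salary", "sales"])
    .sort_index()
)

# A list of first-level labels selects exactly those salary groups.
high_and_medium = df_sorted.loc[["high", "medium"]]
print(high_and_medium)

# A label slice is inclusive and follows the sorted order (high < low < medium),
# so this range also picks up the "low" rows.
high_to_medium = df_sorted.loc["high":"medium"]
print(high_to_medium)
```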

This returns the records of employees in both the ‘high’ and ‘medium’ salary groups.

Indexing in pandas is perhaps at its most powerful when you work with time-series data. Let’s illustrate this with the example below:
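The original example was not preserved. Here is a minimal sketch, assuming a daily series built with pd.date_range(); the dates and values are illustrative. With a DatetimeIndex, date strings work directly as labels, and date ranges can be sliced inclusively.

```python
import pandas as pd

# A DatetimeIndex with one label per day for the first week of 2021 (illustrative).
dates = pd.date_range(start="2021-01-01", periods=7, freq="D")

# Illustrative daily values attached to those dates.
daily_values = pd.Series([3, 5, 2, 8, 6, 9, 4], index=dates)

# A date string picks out the matching day.
print(daily_values.loc["2021-01-03"])

# A date range slices inclusively, like other labels.
print(daily_values.loc["2021-01-02":"2021-01-04"])
```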

This time-series index can then be used to create time-series visualizations simply by calling the plot() method on the series, using the code shown below:
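A minimal sketch, recreating the illustrative daily series so it runs on its own and assuming matplotlib is installed (pandas uses it to draw plot()). The title and axis labels are placeholders.

```python
import matplotlib

# Non-interactive backend so the sketch runs as a script;
# omit this line in Jupyter, where plots display inline.
matplotlib.use("Agg")

import pandas as pd

dates = pd.date_range(start="2021-01-01", periods=7, freq="D")
daily_values = pd.Series([3, 5, 2, 8, 6, 9, 4], index=dates)

# Series.plot() draws a line chart with the DatetimeIndex on the x-axis.
ax = daily_values.plot(title="Daily values")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
```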

This generates a plot as shown below: 


Thus we see that indexing is a powerful tool in any data scientist’s toolkit: used in the right place at the right time, it makes it much easier to extract information from your data frames and manipulate them.

Happy Indexing!
