BrainsToBytes

Hands-on Pandas(3): Reindexing and Deletion

Today we will deal with two techniques we need to cover before moving to more advanced Pandas topics: reindexing and element deletion.

It will be a bit shorter than the first two articles in the series, but that doesn't mean it's not important. Both techniques are very useful, and you will probably use them in your day-to-day work if you become a Pandas practitioner.

Good, let's get started!

Reindexing

Reindexing is a fancy word for creating a new dataframe/series whose data is rearranged to match a new index.

import pandas as pd

ser = pd.Series([2,1,3,4,7,6,5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])
print(ser)
b    2
a    1
c    3
d    4
g    7
f    6
e    5
dtype: int64

The reindex method receives a list of index labels and creates a new dataframe (or series) in which the rows/elements follow the order specified in that list.

For example, we can create a new series with the numbers in ascending order by passing the following list to reindex:

ordered_ser = ser.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(ordered_ser)
a    1
b    2
c    3
d    4
e    5
f    6
g    7
dtype: int64
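
As a small aside, the label list does not have to be typed by hand. The sketch below builds it with Python's built-in sorted and also checks that the original series is left untouched. The variable names are only illustrative.

```python
import pandas as pd

ser = pd.Series([2, 1, 3, 4, 7, 6, 5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])

# Build the target label order programmatically instead of typing it out
ordered_ser = ser.reindex(sorted(ser.index))

# reindex returns a new series; the original keeps its order
print(list(ordered_ser.index))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(ser.index))          # ['b', 'a', 'c', 'd', 'g', 'f', 'e']
```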

You don't need to pass every element in the original index. You can provide a list with only the elements you need:

# This will create a new series with the last four elements, in descending order
ordered_ser = ser.reindex(['g', 'f', 'e', 'd'])
print(ordered_ser)
g    7
f    6
e    5
d    4
dtype: int64

Sometimes you want to reindex the series/dataframe to expand the range of elements. In this case, every label that is missing from the original index is set to NaN:

ser = pd.Series(['azul', 'rojo', 'verde'], index=[0,4,8])
ser.reindex(range(12))
0      azul
1       NaN
2       NaN
3       NaN
4      rojo
5       NaN
6       NaN
7       NaN
8     verde
9       NaN
10      NaN
11      NaN
dtype: object
# In this case, you can specify a fill method to decide what happens to the empty entries
# ffill, for example, performs a forward fill: each gap takes the last value before it

ser.reindex(range(12), method='ffill')
0      azul
1      azul
2      azul
3      azul
4      rojo
5      rojo
6      rojo
7      rojo
8     verde
9     verde
10    verde
11    verde
dtype: object
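
Forward fill is not the only option. As a rough sketch, reindex also accepts method='bfill' (backward fill) and a fill_value argument that replaces missing entries with a constant. The 'n/a' placeholder below is just an example value, not something from the original series.

```python
import pandas as pd

ser = pd.Series(['azul', 'rojo', 'verde'], index=[0, 4, 8])

# Backward fill: each gap takes the next value that follows it.
# Positions after the last label (9, 10, 11) have nothing to copy and stay NaN.
back_filled = ser.reindex(range(12), method='bfill')

# fill_value: every missing entry gets the same constant ('n/a' is an example)
constant_filled = ser.reindex(range(12), fill_value='n/a')

print(back_filled.loc[1])      # rojo
print(constant_filled.loc[1])  # n/a
```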

Dataframes behave pretty much the same way, but they also let you reindex by column. Let's take a look at a final reindexing example using a dataframe:

import numpy as np

frame = pd.DataFrame(np.arange(16).reshape(4,4),
                     index = ['First', 'Second', 'Third', 'Fourth'],
                     columns = ['Alpha', 'Beta', 'Gamma', 'Delta'])

frame
        Alpha  Beta  Gamma  Delta
First       0     1      2      3
Second      4     5      6      7
Third       8     9     10     11
Fourth     12    13     14     15
# We can reindex using the row index
frame.reindex(['Fourth', 'Second'])
        Alpha  Beta  Gamma  Delta
Fourth     12    13     14     15
Second      4     5      6      7
# Or, reindex using the columns
frame.reindex(columns=['Alpha', 'Gamma'])
        Alpha  Gamma
First       0      2
Second      4      6
Third       8     10
Fourth     12     14
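
Rows and columns can also be reindexed in a single call. The sketch below passes both index= and columns=, and includes 'Epsilon', a hypothetical column label that is not in the original frame, to show that missing labels come back filled with NaN just as they do for a series.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# Reindex rows and columns at once; 'Epsilon' is a made-up label
# that does not exist in the original frame
subset = frame.reindex(index=['Fourth', 'Second'], columns=['Alpha', 'Epsilon'])

print(subset)
print(subset['Epsilon'].isna().all())  # True
```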

Deleting elements

Now we will learn how to remove elements from both series and dataframes. This is usually achieved using the drop method.

Note that calls to drop don't alter the original series/dataframe. Instead, they return a new one without the specified elements. If for some reason you need to alter the original series/dataframe, you can pass inplace=True as an argument.
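
A minimal sketch of that difference, using a small throwaway series: the default call hands back a new object, while inplace=True changes the original and returns None.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Default behaviour: a new series comes back, the original is untouched
without_b = ser.drop('b')
print(list(ser.index))        # ['a', 'b', 'c', 'd']
print(list(without_b.index))  # ['a', 'c', 'd']

# With inplace=True the original is modified and the call returns None
result = ser.drop('b', inplace=True)
print(result)                 # None
print(list(ser.index))        # ['a', 'c', 'd']
```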

ser = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
print(ser)
a    1
b    2
c    3
d    4
dtype: int64
# You can pass drop the index label of the element you want to delete
ser.drop('b')
a    1
c    3
d    4
dtype: int64
# You can also pass a list of index values
ser.drop(['a', 'c'])
b    2
d    4
dtype: int64
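
One thing worth knowing when practicing: asking drop to remove a label that is not in the index raises a KeyError. As a sketch, the errors='ignore' argument tells drop to skip missing labels instead. The label 'z' below is a deliberately nonexistent example.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# 'z' is not in the index, so a plain drop raises KeyError
try:
    ser.drop('z')
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' skips labels that are missing and drops the rest
remaining = ser.drop(['a', 'z'], errors='ignore')
print(list(remaining.index))  # ['b', 'c', 'd']
```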

Dataframes let you drop elements using both the row index and the column index.

frame
        Alpha  Beta  Gamma  Delta
First       0     1      2      3
Second      4     5      6      7
Third       8     9     10     11
Fourth     12    13     14     15
# Let's drop the second and fourth rows
frame.drop(['Second', 'Fourth'])
       Alpha  Beta  Gamma  Delta
First      0     1      2      3
Third      8     9     10     11
# If you pass axis='columns' (or axis=1), drop will use the column index instead of the row index
# Let's get rid of the Alpha and Beta columns
frame.drop(['Alpha', 'Beta'], axis='columns')
        Gamma  Delta
First       2      3
Second      6      7
Third      10     11
Fourth     14     15
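
If the axis argument feels easy to mix up, drop also accepts index= and columns= keywords, much like reindex does. The sketch below rebuilds the same frame and checks that the two spellings give the same result.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# Two equivalent ways to drop the Alpha and Beta columns
by_axis = frame.drop(['Alpha', 'Beta'], axis='columns')
by_keyword = frame.drop(columns=['Alpha', 'Beta'])
print(by_axis.equals(by_keyword))  # True

# index= and columns= can be combined to drop rows and columns together
trimmed = frame.drop(index=['Second', 'Fourth'], columns=['Alpha', 'Beta'])
print(trimmed.shape)  # (2, 2)
```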

Data-wrangling basics

When exploring data, you will need to alter indexes and delete rows with elements you don't need. As with all previous articles, I'd like to encourage you to practice these techniques on your own until you feel comfortable with them.

In the next article, we will learn how to perform arithmetic operations with dataframes and series.

Thank you for reading!

Budapest, Hungary
Hey there, I'm Juan. A programmer currently living in Budapest. I believe in well-engineered solutions, clean code and sharing knowledge. Thanks for reading, I hope you find my articles useful!