Hands-on Pandas(3): Reindexing and Deletion

Today we will deal with two techniques we need to cover before moving on to more advanced Pandas topics: reindexing and element deletion.

This article is a bit shorter than the first two in the series, but that doesn't make it less important. Both techniques are very useful, and if you become a Pandas practitioner you will probably use them in your day-to-day work.

Good, let's get started!

Reindexing

Reindexing is a fancy word for creating a new dataframe/series with an altered index.

import pandas as pd

ser = pd.Series([2,1,3,4,7,6,5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])
print(ser)

b    2
a    1
c    3
d    4
g    7
f    6
e    5
dtype: int64

The reindex method receives a list of index labels and creates a new series (or dataframe) whose rows/elements follow the order specified in that list.

For example, passing the labels in alphabetical order gives us a new series whose values also end up in ascending order:

ordered_ser = ser.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(ordered_ser)

a    1
b    2
c    3
d    4
e    5
f    6
g    7
dtype: int64
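Keep in mind that reindex returns a new object; `ser` itself is not reordered. And when all you want is to sort by label, `sort_index` (a standard Series method, shown here as an alternative that this article doesn't otherwise cover) gives the same result. A quick check, assuming the same `ser` as above:

```python
import pandas as pd

ser = pd.Series([2, 1, 3, 4, 7, 6, 5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])
ordered_ser = ser.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# reindex returned a new Series; the original keeps its order
print(list(ser.index))  # ['b', 'a', 'c', 'd', 'g', 'f', 'e']

# When the goal is only to sort by label, sort_index produces the same Series
print(ordered_ser.equals(ser.sort_index()))  # True
```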

You don't need to pass every element of the original index; you can provide a list with only the elements you need:

# This creates a new series with only the labels 'g', 'f', 'e' and 'd', in that order
ordered_ser = ser.reindex(['g', 'f', 'e', 'd'])
print(ordered_ser)

g    7
f    6
e    5
d    4
dtype: int64
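A handy mental model is that reindex performs a label lookup: for each label in the list, it takes the value stored under that label in the original index, or NaN if there is none. The `manual_reindex` helper below is hypothetical, a simplified sketch of that model rather than how pandas actually implements reindex, and it assumes the original index has no duplicate labels. It is compared against the real method, including for a label 'z' that doesn't exist in `ser`:

```python
import numpy as np
import pandas as pd

def manual_reindex(series, new_labels):
    # Hypothetical helper: a simplified model of Series.reindex.
    # Assumes the original index has unique labels.
    lookup = dict(zip(series.index, series.values))
    # Labels missing from the original index get NaN, like reindex's default
    values = [lookup.get(label, np.nan) for label in new_labels]
    return pd.Series(values, index=new_labels)

ser = pd.Series([2, 1, 3, 4, 7, 6, 5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])

for labels in (['g', 'f', 'e', 'd'], ['a', 'z']):
    manual = manual_reindex(ser, labels)
    builtin = ser.reindex(labels)
    print(labels, manual.equals(builtin))  # True in both cases

# The missing label 'z' forces a float dtype, because int64 cannot hold NaN
print(ser.reindex(['a', 'z']).dtype)  # float64
```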

Sometimes you want to reindex a series/dataframe to expand its range of labels. In this case, every new label that wasn't in the original index gets NaN as its value:

ser = pd.Series(['azul', 'rojo', 'verde'], index=[0,4,8])
ser.reindex(range(12))

0      azul
1       NaN
2       NaN
3       NaN
4      rojo
5       NaN
6       NaN
7       NaN
8     verde
9       NaN
10      NaN
11      NaN
dtype: object

# In this case, you can specify a fill method to dictate what goes into the empty entries
# 'ffill', for example, performs a forward fill: each gap takes the last valid value before it

ser.reindex(range(12), method='ffill')

0      azul
1      azul
2      azul
3      azul
4      rojo
5      rojo
6      rojo
7      rojo
8     verde
9     verde
10    verde
11    verde
dtype: object
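Forward fill isn't the only option. As a sketch, `fill_value` puts a constant of your choice in every new slot, and `method='bfill'` fills each gap with the next valid value instead of the previous one. The 'unknown' placeholder is just an illustrative value. Note that `method` only works when the original index is sorted (monotonically increasing or decreasing), which is the case for [0, 4, 8]:

```python
import pandas as pd

ser = pd.Series(['azul', 'rojo', 'verde'], index=[0, 4, 8])

# fill_value puts the same constant into every new slot instead of NaN
filled = ser.reindex(range(12), fill_value='unknown')
print(filled.tolist())

# method='bfill' copies the next valid value backwards into each gap.
# Labels 9-11 come after the last original label (8), so they stay NaN.
backfilled = ser.reindex(range(12), method='bfill')
print(backfilled)
```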

DataFrames behave pretty much the same way, but they also let you reindex the columns. Let's take a look at a final reindexing example using a dataframe:

import numpy as np

frame = pd.DataFrame(np.arange(16).reshape(4,4),
                     index = ['First', 'Second', 'Third', 'Fourth'],
                     columns = ['Alpha', 'Beta', 'Gamma', 'Delta'])

frame

	Alpha	Beta	Gamma	Delta
First	0	1	2	3
Second	4	5	6	7
Third	8	9	10	11
Fourth	12	13	14	15

# We can reindex using the row index
frame.reindex(['Fourth', 'Second'])

	Alpha	Beta	Gamma	Delta
Fourth	12	13	14	15
Second	4	5	6	7

# Or, reindex using the columns
frame.reindex(columns=['Alpha', 'Gamma'])

	Alpha	Gamma
First	0	2
Second	4	6
Third	8	10
Fourth	12	14
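You can also pass both `index` and `columns` to reindex in the same call, and, as with series, any label that doesn't exist yet is created and filled with NaN. A minimal sketch using the same frame; the 'Epsilon' column is a made-up label used only to show this:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# Reindex rows and columns in a single call
subset = frame.reindex(index=['Fourth', 'Second'], columns=['Alpha', 'Gamma'])
print(subset)

# 'Epsilon' is not an existing column, so it is created and filled with NaN
extended = frame.reindex(columns=['Alpha', 'Epsilon'])
print(extended)
```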

Deleting elements

Now we will learn how to remove elements from both series and dataframes. This is usually achieved using the drop method.

Note that calls to drop don't alter the original series/dataframe. Instead, they return a new one without the specified elements. If for some reason you need to alter the original series/dataframe, you can pass inplace=True as an argument.
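To see the difference in practice, here is a small sketch with the same series used below: the default call leaves the original untouched, while `inplace=True` changes it directly:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# By default, drop returns a new Series and leaves ser untouched
smaller = ser.drop('b')
print(len(ser), len(smaller))  # 4 3

# inplace=True removes the label from ser itself
ser.drop('b', inplace=True)
print(list(ser.index))  # ['a', 'c', 'd']
```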

ser = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
print(ser)

a    1
b    2
c    3
d    4
dtype: int64

# To delete an element, pass its index label to drop
ser.drop('b')

a    1
c    3
d    4
dtype: int64

# You can also pass a list of index values
ser.drop(['a', 'c'])

b    2
d    4
dtype: int64
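drop expects every label you pass to exist. Here is a sketch of what happens when one doesn't, using a made-up label 'z', and of the `errors='ignore'` argument, which tells drop to skip missing labels instead of failing:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Asking drop for a label that isn't in the index raises KeyError by default
try:
    ser.drop(['a', 'z'])
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' skips the missing label and drops the ones that exist
kept = ser.drop(['a', 'z'], errors='ignore')
print(kept)
```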

Dataframes let you drop elements using both the row index and the column index.

frame

	Alpha	Beta	Gamma	Delta
First	0	1	2	3
Second	4	5	6	7
Third	8	9	10	11
Fourth	12	13	14	15

# Let's drop the second and fourth rows
frame.drop(['Second', 'Fourth'])

	Alpha	Beta	Gamma	Delta
First	0	1	2	3
Third	8	9	10	11

# Adding axis='columns' (or axis=1) makes drop use the column index instead
# Let's get rid of the Alpha and Beta columns
frame.drop(['Alpha', 'Beta'], axis='columns')

	Gamma	Delta
First	2	3
Second	6	7
Third	10	11
Fourth	14	15
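Instead of `axis`, drop also accepts `index=` and `columns=` keyword arguments, which read more explicitly and let you remove rows and columns in a single call. A sketch, assuming the same frame as above:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# The columns= keyword is equivalent to passing axis='columns'
by_keyword = frame.drop(columns=['Alpha', 'Beta'])
print(by_keyword.equals(frame.drop(['Alpha', 'Beta'], axis='columns')))  # True

# index= and columns= can be combined to drop rows and columns in one call
trimmed = frame.drop(index=['Second', 'Fourth'], columns=['Alpha', 'Beta'])
print(trimmed)
```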

Data-wrangling basics

When exploring data, you will often need to alter indexes and delete rows or columns you don't need. As with the previous articles, I encourage you to practice these techniques on your own until you feel comfortable with them.

In the next article, we will learn how to perform arithmetic operations with dataframes and series.

Thank you for reading!

What to do next

Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
You can find the source code for this series in this repo.
This article is based on Python for Data Analysis. This and other very helpful books can be found in the recommended reading list.
Send me an email with questions, comments or suggestions (the address is on the About Me page).
