Today we will deal with two techniques we need to cover before moving to more advanced Pandas topics: reindexing and element deletion.
This article is a bit shorter than the first two in the series, but that doesn't make it less important. Both techniques are very useful, and you will probably use them in your day-to-day work if you become a Pandas practitioner.
Good, let's get started!
Reindexing
Reindexing is a fancy word for creating a new dataframe/series whose data is rearranged to match a new index.
import pandas as pd
ser = pd.Series([2,1,3,4,7,6,5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])
print(ser)
b 2
a 1
c 3
d 4
g 7
f 6
e 5
dtype: int64
The reindex method receives a list of index labels and creates a new dataframe (or series) in which the rows/elements follow the order specified in that list.
For example, we can create a new series with the values in ascending order by passing the following list to reindex:
ordered_ser = ser.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print(ordered_ser)
a 1
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int64
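As a quick check on the example above, here is a minimal sketch showing that reindex returns a new object and leaves the original series untouched. It reuses the same data; nothing here goes beyond the standard reindex call.

```python
import pandas as pd

ser = pd.Series([2, 1, 3, 4, 7, 6, 5], index=['b', 'a', 'c', 'd', 'g', 'f', 'e'])

# reindex builds a new series; each value travels with its label
ordered_ser = ser.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

print(ordered_ser.tolist())  # values in the new label order
print(ser.index.tolist())    # the original index keeps its order
```

Because values are matched by label rather than by position, the order of the list you pass is the only thing that decides the order of the result.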
You don't need to pass every element in the original index; you can provide a list with only the elements you need:
# This will create a new series with the last four elements, in descending order
ordered_ser = ser.reindex(['g', 'f', 'e', 'd'])
print(ordered_ser)
g 7
f 6
e 5
d 4
dtype: int64
Sometimes you want to reindex the series/dataframe to expand the range of elements. In this case, every label that did not exist in the original index is set to NaN:
ser = pd.Series(['azul', 'rojo', 'verde'], index=[0,4,8])
ser.reindex(range(12))
0 azul
1 NaN
2 NaN
3 NaN
4 rojo
5 NaN
6 NaN
7 NaN
8 verde
9 NaN
10 NaN
11 NaN
dtype: object
# You can specify a fill method to decide how the empty entries are filled
# ffill, for example, performs a forward fill: each gap takes the last valid value before it
ser.reindex(range(12), method='ffill')
0 azul
1 azul
2 azul
3 azul
4 rojo
5 rojo
6 rojo
7 rojo
8 verde
9 verde
10 verde
11 verde
dtype: object
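Forward filling is only one option. The sketch below, which assumes the same three-colour series, shows two alternatives: a constant passed through fill_value, and a backward fill with method='bfill'. The placeholder string 'desconocido' is just an illustrative choice, not something from the original example.

```python
import pandas as pd

colors = pd.Series(['azul', 'rojo', 'verde'], index=[0, 4, 8])

# Fill every new label with a constant (hypothetical placeholder value)
const_filled = colors.reindex(range(12), fill_value='desconocido')

# Backward fill: each gap takes the next valid value after it
back_filled = colors.reindex(range(12), method='bfill')

print(const_filled)
print(back_filled)
```

Note that with a backward fill the labels after the last original entry (9 to 11 here) have no later value to borrow from, so they stay NaN.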
Dataframes behave pretty much the same way, but they also let you reindex by column. Let's take a look at a final reindexing example using a dataframe:
import numpy as np
frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])
frame
| | Alpha | Beta | Gamma | Delta |
|---|---|---|---|---|
| First | 0 | 1 | 2 | 3 |
| Second | 4 | 5 | 6 | 7 |
| Third | 8 | 9 | 10 | 11 |
| Fourth | 12 | 13 | 14 | 15 |
# We can reindex using the row index
frame.reindex(['Fourth', 'Second'])
| | Alpha | Beta | Gamma | Delta |
|---|---|---|---|---|
| Fourth | 12 | 13 | 14 | 15 |
| Second | 4 | 5 | 6 | 7 |
# Or, reindex using the columns
frame.reindex(columns=['Alpha', 'Gamma'])
| | Alpha | Gamma |
|---|---|---|
| First | 0 | 2 |
| Second | 4 | 6 |
| Third | 8 | 10 |
| Fourth | 12 | 14 |
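The two calls above can also be combined. As a small sketch, the index and columns keywords of reindex let you rearrange rows and columns in one step. The column name 'Epsilon' is a made-up label that does not exist in the frame, included only to show that unknown labels produce NaN instead of an error.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# Reorder rows and columns at once; 'Epsilon' is a hypothetical new column
subset = frame.reindex(index=['Third', 'First'],
                       columns=['Delta', 'Alpha', 'Epsilon'])
print(subset)
```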
Deleting elements
Now we will learn how to remove elements from both series and dataframes. This is usually achieved using the drop method.
Note that calls to drop don't alter the original series/dataframe. Instead, they return a new one without the specified elements. If for some reason you need to alter the original series/dataframe, you can pass inplace=True as an argument.
ser = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
print(ser)
a 1
b 2
c 3
d 4
dtype: int64
# Pass drop the index label of the element you want to delete
ser.drop('b')
a 1
c 3
d 4
dtype: int64
# You can also pass a list of index values
ser.drop(['a', 'c'])
b 2
d 4
dtype: int64
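To make the difference between the default behavior and inplace=True concrete, here is a minimal sketch using the same four-element series. The variable names trimmed and result are only illustrative.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Default: drop returns a new series and leaves ser alone
trimmed = ser.drop('b')
print(trimmed.index.tolist())
print(ser.index.tolist())   # 'b' is still here

# inplace=True: ser itself is modified and the call returns None
result = ser.drop('b', inplace=True)
print(result)
print(ser.index.tolist())   # 'b' is gone now
```

Since the in-place call returns None, assigning its result back to the same variable would throw the data away, which is a common slip.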
Dataframes let you drop elements using both the row index and the column index.
frame
| | Alpha | Beta | Gamma | Delta |
|---|---|---|---|---|
| First | 0 | 1 | 2 | 3 |
| Second | 4 | 5 | 6 | 7 |
| Third | 8 | 9 | 10 | 11 |
| Fourth | 12 | 13 | 14 | 15 |
# Let's drop the second and fourth rows
frame.drop(['Second', 'Fourth'])
| | Alpha | Beta | Gamma | Delta |
|---|---|---|---|---|
| First | 0 | 1 | 2 | 3 |
| Third | 8 | 9 | 10 | 11 |
# If you pass axis='columns' (or axis=1), drop will look for the labels in the column index instead
# Let's get rid of the Alpha and Beta columns
frame.drop(['Alpha', 'Beta'], axis='columns')
| | Gamma | Delta |
|---|---|---|
| First | 2 | 3 |
| Second | 6 | 7 |
| Third | 10 | 11 |
| Fourth | 14 | 15 |
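As an alternative to the axis argument, drop also accepts index and columns keywords, which some people find easier to read. The sketch below assumes the same frame as before. The label 'Omega' is a hypothetical column that does not exist, used only to show errors='ignore'.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['First', 'Second', 'Third', 'Fourth'],
                     columns=['Alpha', 'Beta', 'Gamma', 'Delta'])

# Drop a row and a column in a single call
smaller = frame.drop(index=['Second'], columns=['Alpha'])
print(smaller)

# By default a missing label raises a KeyError;
# errors='ignore' skips labels that are not found ('Omega' is hypothetical)
tolerant = frame.drop(columns=['Alpha', 'Omega'], errors='ignore')
print(tolerant.columns.tolist())
```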
Data-wrangling basics
When exploring data, you will need to alter indexes and delete rows with elements you don't need. As with all previous articles, I'd like to encourage you to practice these techniques on your own until you feel comfortable with them.
In the next article, we will learn how to perform arithmetic operations with dataframes and series.
Thank you for reading!
What to do next
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- You can find the source code for this series in this repo.
- This article is based on Python for Data Analysis. These and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments or suggestions (the address is on the About Me page).