In this article, we will learn about mapping and the apply and applymap functions.
This technique will help you manipulate your data in very convenient ways, and is another important addition to your toolbox.
As always, we will explore the topic with examples that will help you understand what's going on.
Great, let's get started!
Mapping
Mapping means applying a function that transforms the elements of one domain into the elements of another. Here, those elements are the entries, rows, and columns of a series or dataframe. Pandas lets you apply functions at the element, row, and column level to create new series and dataframes.
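As a minimal sketch of element-level mapping, the snippet below uses Series.map, which calls a function once per entry. The series temps_c and the helper to_fahrenheit are hypothetical names invented for this illustration; they are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical data: temperatures in Celsius, labelled by city.
temps_c = pd.Series([20.0, 25.0, 30.0], index=['lima', 'quito', 'bogota'])

def to_fahrenheit(celsius):
    """Transform one element of the domain (Celsius) into Fahrenheit."""
    return celsius * 9 / 5 + 32

# Series.map calls the function once per element and keeps the index labels.
temps_f = temps_c.map(to_fahrenheit)
print(temps_f)
```

The result is a new series with the same labels as the original, which is what makes mapped results easy to line up with the source data.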
Pandas is also compatible with many of the operations defined in NumPy. This lets you apply functions in a very convenient and performant fashion. Let's see some examples:
import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 5),
                     columns=list('abcde'),
                     index=['one', 'two', 'three', 'four'])
frame
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 3.007277 | 0.388730 | 0.113406 | 2.119481 | -0.975847 |
| two | 0.636278 | 0.206911 | 1.778134 | -1.663180 | -1.211043 |
| three | 0.946199 | -0.397836 | -0.127306 | -0.588036 | 1.026060 |
| four | -0.315198 | -0.496803 | -0.918301 | 0.389656 | -1.515556 |
# You can apply NumPy functions directly on dataframes.
# You can, for example, calculate the absolute value of every entry
np.abs(frame)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 3.007277 | 0.388730 | 0.113406 | 2.119481 | 0.975847 |
| two | 0.636278 | 0.206911 | 1.778134 | 1.663180 | 1.211043 |
| three | 0.946199 | 0.397836 | 0.127306 | 0.588036 | 1.026060 |
| four | 0.315198 | 0.496803 | 0.918301 | 0.389656 | 1.515556 |
# You can also calculate the 3rd power of every entry
np.power(frame, 3)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 27.196948 | 0.058741 | 0.001459 | 9.521129 | -0.929277 |
| two | 0.257597 | 0.008858 | 5.622036 | -4.600633 | -1.776145 |
| three | 0.847125 | -0.062967 | -0.002063 | -0.203335 | 1.080236 |
| four | -0.031315 | -0.122617 | -0.774382 | 0.059162 | -3.481093 |
You can apply many of NumPy's ufuncs to Pandas data structures. In most situations, the result has the same dimensions as the original structure.
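To make the "same dimensions" point concrete, here is a small sketch on a fixed frame. The frame small and its values are made up for this illustration so the output is reproducible.

```python
import numpy as np
import pandas as pd

# Small fixed frame (made-up values) so the result is reproducible.
small = pd.DataFrame([[1.0, -4.0], [-9.0, 16.0]],
                     index=['one', 'two'],
                     columns=['a', 'b'])

# np.abs and np.sqrt are ufuncs: they work element by element.
roots = np.sqrt(np.abs(small))

print(roots)
print(roots.shape == small.shape)        # same dimensions
print(roots.index.equals(small.index))   # same row labels
```

Because ufuncs work entry by entry, the row and column labels carry over to the result unchanged.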
Another important (and quite common) operation creates a new structure after applying an operation to every row or column in the original dataframe. Let's see how to create a new structure whose entries are the result of summing every column/row of our frame:
# Pandas' apply runs a function along an axis.
# The default behavior is to run it using the rows axis (apply the operation on every column)
# Let's produce a Series where each entry is the sum of the values in every column:
ser = frame.apply(np.sum)
ser
a 4.274556
b -0.298998
c 0.845934
d 0.257921
e -2.676385
dtype: float64
# If you want to perform the operation using columns as the axis (the operation is applied on a per-row basis),
# you can pass the optional axis argument
ser = frame.apply(np.sum, axis='columns')
ser
one 4.653047
two -0.252900
three 0.859082
four -2.856201
dtype: float64
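The two calls above can be checked against each other on a tiny frame. This is only a sketch: the frame small is made up, and the comparison with the built-in sum method is an addition to the original examples, included to show that apply with np.sum and the dedicated method agree here.

```python
import numpy as np
import pandas as pd

# Made-up frame: two rows, two columns, easy to sum by hand.
small = pd.DataFrame({'a': [1.0, 2.0], 'b': [10.0, 20.0]},
                     index=['one', 'two'])

col_sums = small.apply(np.sum)                  # one value per column
row_sums = small.apply(np.sum, axis='columns')  # one value per row

# axis='columns' and axis=1 are two spellings of the same axis.
same_axis = row_sums.equals(small.apply(np.sum, axis=1))

# For a common reduction like this, the built-in method gives the same values.
same_as_builtin = col_sums.tolist() == small.sum().tolist()

print(col_sums)
print(row_sums)
print(same_axis, same_as_builtin)
```

For simple reductions the built-in methods are usually the shorter option; apply becomes more useful when the function is one that pandas does not already provide.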
Again, you can pass many NumPy functions to apply, but it doesn't end there: you can define your own functions and run them on every entry with applymap. (In pandas 2.1 and later, this method is named DataFrame.map and applymap is deprecated.) The following example applies a function that adds 2 to every entry:
def sum_two(entry):
    return entry + 2
frame.applymap(sum_two)
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 5.007277 | 2.388730 | 2.113406 | 4.119481 | 1.024153 |
| two | 2.636278 | 2.206911 | 3.778134 | 0.336820 | 0.788957 |
| three | 2.946199 | 1.602164 | 1.872694 | 1.411964 | 3.026060 |
| four | 1.684802 | 1.503197 | 1.081699 | 2.389656 | 0.484444 |
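# You can do this with a lambda, which is often more concise.
# Note that apply passes each column (a Series) to the function;
# x + 3 works because addition on a Series is element-wise.
sum_three = lambda x: x + 3
frame.apply(sum_three)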
| | a | b | c | d | e |
|---|---|---|---|---|---|
| one | 6.007277 | 3.388730 | 3.113406 | 5.119481 | 2.024153 |
| two | 3.636278 | 3.206911 | 4.778134 | 1.336820 | 1.788957 |
| three | 3.946199 | 2.602164 | 2.872694 | 2.411964 | 4.026060 |
| four | 2.684802 | 2.503197 | 2.081699 | 3.389656 | 1.484444 |
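The last two examples give similar-looking results, but the function receives different things in each case. The sketch below makes that visible on a made-up frame called small. It assumes that pandas 2.1 and later expose the element-wise method as DataFrame.map, and it falls back to applymap on older versions.

```python
import pandas as pd

# Made-up frame used only for this illustration.
small = pd.DataFrame({'a': [1.5, 2.5], 'b': [3.5, 4.5]},
                     index=['one', 'two'])

# apply hands the function a whole column (a Series) at a time.
got_series_in_apply = small.apply(lambda col: isinstance(col, pd.Series))

# The element-wise method hands it one scalar at a time.
# Assumption: pandas 2.1+ names it DataFrame.map; older versions use applymap.
elementwise = small.map if hasattr(small, 'map') else small.applymap
got_series_elementwise = elementwise(lambda x: isinstance(x, pd.Series))

# A function that only makes sense for scalars needs the element-wise call.
formatted = elementwise(lambda x: '%.2f' % x)

print(got_series_in_apply)     # True for every column
print(got_series_elementwise)  # False for every entry
print(formatted)
```

A rule of thumb that follows from this: use apply when the function works on a whole row or column, and the element-wise method when it only knows how to handle a single value.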
Simple concept, endless applications
Mappings let you do almost anything you need with your data: everything from statistical aggregations to advanced machine learning tools is built on this foundation.
As you may have noticed, the concept is very simple, but knowing how to apply NumPy functions to Pandas data structures will help you on a daily basis. This is even more obvious when you start to explore the potential of applying your own functions!
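As one small illustration of applying your own functions, the sketch below computes the spread between the largest and smallest value of each column and each row. The helper value_range and the frame small are hypothetical names chosen for this example.

```python
import pandas as pd

# Made-up frame used only for this illustration.
small = pd.DataFrame({'a': [1.0, 4.0, 2.0], 'b': [10.0, 5.0, 7.0]})

def value_range(values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return values.max() - values.min()

per_column = small.apply(value_range)                # one value per column
per_row = small.apply(value_range, axis='columns')   # one value per row

print(per_column)
print(per_row)
```

The same function works along either axis because apply always passes it a series, whether that series holds a column or a row.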
In the next article, we will learn about data summarization and descriptive statistics.
Thank you for reading!
What to do next
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- You can find the source code for this series in this repo.
- This article is based on Python for Data Analysis. These and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments, or suggestions (the address is on the About Me page).