BrainsToBytes

Hands-on Pandas (5): Mapping, apply and applymap

In this article, we will learn about mapping and the apply and applymap functions.

These techniques will help you manipulate your data in very convenient ways, and they are another important addition to your toolbox.

As always, we will explore the topic with examples that will help you understand what's going on.

Great, let's get started!

Mapping

Mapping means applying a function that transforms the elements of one domain into the elements of another. Here, those elements are the entries, rows, and columns of a series or dataframe. Pandas lets you apply functions at the element, row, and column level to create new series and dataframes.
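As a minimal sketch of element-level mapping on a series, the snippet below uses Series.map with a dictionary and with a function. The sample data and the variable names (sizes, size_cm, upper) are made up for illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
sizes = pd.Series(["s", "m", "l", "m"], index=["one", "two", "three", "four"])

# Mapping with a dict: every entry is looked up and replaced by its value.
size_cm = sizes.map({"s": 10, "m": 20, "l": 30})

# Mapping with a function: the function is called once per entry.
upper = sizes.map(str.upper)

print(size_cm)
print(upper)
```

In both cases the result is a new series with the same index as the original; the original series is left untouched.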

Pandas is also compatible with many of the operations defined in NumPy. This lets you apply functions in a very convenient and performant fashion. Let's see some examples:

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(4,5),
                     columns=list('abcde'),
                     index=['one', 'two', 'three', 'four'])
frame
              a         b         c         d         e
one    3.007277  0.388730  0.113406  2.119481 -0.975847
two    0.636278  0.206911  1.778134 -1.663180 -1.211043
three  0.946199 -0.397836 -0.127306 -0.588036  1.026060
four  -0.315198 -0.496803 -0.918301  0.389656 -1.515556
# You can apply NumPy functions directly on dataframes.
# You can, for example, calculate the absolute value of every entry
np.abs(frame)
              a         b         c         d         e
one    3.007277  0.388730  0.113406  2.119481  0.975847
two    0.636278  0.206911  1.778134  1.663180  1.211043
three  0.946199  0.397836  0.127306  0.588036  1.026060
four   0.315198  0.496803  0.918301  0.389656  1.515556
# You can also calculate the 3rd power of every entry
np.power(frame, 3)
               a         b         c         d         e
one    27.196948  0.058741  0.001459  9.521129 -0.929277
two     0.257597  0.008858  5.622036 -4.600633 -1.776145
three   0.847125 -0.062967 -0.002063 -0.203335  1.080236
four   -0.031315 -0.122617 -0.774382  0.059162 -3.481093

You can apply many of NumPy's ufuncs to Pandas data structures. In most situations, the result has the same dimensions as the original structure.
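A quick way to convince yourself of this is to compare shapes and labels before and after the call. The sketch below uses a small hand-written dataframe (a hypothetical stand-in for the random frame above) so that the results are reproducible.

```python
import numpy as np
import pandas as pd

# Small fixed dataframe, used instead of random data so results are reproducible.
small = pd.DataFrame([[1.0, 4.0], [9.0, 16.0]],
                     columns=["a", "b"],
                     index=["one", "two"])

# np.sqrt is a ufunc: it is applied to every entry.
roots = np.sqrt(small)

# The result is still a dataframe with the same shape and the same labels.
same_shape = roots.shape == small.shape
same_labels = (roots.index.equals(small.index)
               and roots.columns.equals(small.columns))

print(roots)
print(same_shape, same_labels)  # True True
```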

Another important (and quite common) operation creates a new structure after applying an operation to every row or column in the original dataframe. Let's see how to create a new structure whose entries are the result of summing every column/row of our frame:

# Pandas' apply runs a function along an axis.
# The default is the rows axis (axis=0), which means the function is called once per column

# Let's produce a Series where each entry is the sum of the values in every column:

ser = frame.apply(np.sum)
ser
a    4.274556
b   -0.298998
c    0.845934
d    0.257921
e   -2.676385
dtype: float64
# To run the function along the columns axis instead (once per row),
# pass the optional axis argument

ser = frame.apply(np.sum, axis='columns')
ser
one      4.653047
two     -0.252900
three    0.859082
four    -2.856201
dtype: float64
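The function passed to apply does not have to come from NumPy. As a rough sketch, the example below passes two hand-written functions: one that reduces each column or row to a single number, and one that returns a series, in which case apply builds a dataframe from the results. The dataframe scores and the helper names value_range and min_max are hypothetical and only serve the illustration.

```python
import pandas as pd

# Small fixed dataframe so the results are easy to check by hand.
scores = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [10.0, 2.0, 6.0]},
                      index=["one", "two", "three"])

def value_range(values):
    """Difference between the largest and smallest value of a Series."""
    return values.max() - values.min()

# Default axis: the function receives one column at a time.
per_column = scores.apply(value_range)

# axis='columns': the function receives one row at a time.
per_row = scores.apply(value_range, axis="columns")

def min_max(values):
    """Return several values per column, labelled 'min' and 'max'."""
    return pd.Series([values.min(), values.max()], index=["min", "max"])

# When the function returns a Series, the result is a dataframe.
summary = scores.apply(min_max)

print(per_column)
print(per_row)
print(summary)
```

Here per_column is indexed by the column labels (a, b), per_row by the row labels (one, two, three), and summary has one row per label returned by min_max.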

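Again, you can pass most NumPy functions as an argument to apply, but it doesn't end there: you can define your own functions and run them on every entry with applymap. Note that pandas 2.1 renamed applymap to DataFrame.map and deprecated the old name, so on recent versions frame.map does the same job. The following example applies a function that adds 2 to every entry: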

def sum_two(entry):
    return entry + 2

frame.applymap(sum_two)
              a         b         c         d         e
one    5.007277  2.388730  2.113406  4.119481  1.024153
two    2.636278  2.206911  3.778134  0.336820  0.788957
three  2.946199  1.602164  1.872694  1.411964  3.026060
four   1.684802  1.503197  1.081699  2.389656  0.484444
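# You can do the same with a lambda, which is usually easier to read.
# Note that apply passes each whole column to the lambda; x+3 then adds 3 to every entry of that column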

sum_three = lambda x: x+3

frame.apply(sum_three)
              a         b         c         d         e
one    6.007277  3.388730  3.113406  5.119481  2.024153
two    3.636278  3.206911  4.778134  1.336820  1.788957
three  3.946199  2.602164  2.872694  2.411964  4.026060
four   2.684802  2.503197  2.081699  3.389656  1.484444
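The two calls above give the same kind of result only because addition works on a whole column at once. The difference shows up with a function that only makes sense for a single value, such as string formatting. The sketch below is a hedged illustration: the dataframe small and the helper two_decimals are hypothetical, and the hasattr check is there because DataFrame.map only exists in pandas 2.1 and later, while older versions offer applymap.

```python
import pandas as pd

# Small fixed dataframe so the output is predictable.
small = pd.DataFrame({"a": [1.5, -2.0], "b": [0.25, 4.0]},
                     index=["one", "two"])

# Pick the element-wise method available in the installed pandas version.
elementwise = small.map if hasattr(small, "map") else small.applymap

def two_decimals(value):
    """Format a single number with two decimal places."""
    return f"{value:.2f}"

# Element-wise: the function receives one entry at a time, so formatting works.
formatted = elementwise(two_decimals)

# apply passes a whole column (a Series), which does not support this
# format spec, so the same function raises a TypeError.
try:
    small.apply(two_decimals)
    apply_failed = False
except TypeError:
    apply_failed = True

print(formatted)
print(apply_failed)  # True
```

As a rule of thumb, reach for apply when the function works on a whole row or column, and for the element-wise method when it works on one value.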

Simple concept, endless applications

Performing mappings lets you do almost anything you need with your data. Everything from statistical aggregations to advanced machine learning tools is built upon this foundation.

As you may have noticed, the concept is very simple, but knowing how to apply NumPy functions to Pandas data structures will help you on a daily basis. This is even more obvious when you start to explore the potential of applying your own functions!

In the next article, we will learn about data summarization and descriptive statistics.

Thank you for reading!

Budapest, Hungary
Hey there, I'm Juan. A programmer currently living in Budapest. I believe in well-engineered solutions, clean code and sharing knowledge. Thanks for reading, I hope you find my articles useful!