Hands-on Pandas(11): The apply function

We have already covered most of the fundamentals of working with data using the Pandas library. There is one more topic I'd like to discuss before concluding the series: the apply function.

In the previous article, we learned how to create subgroups of data using the groupby function. This is quite useful when you want to gain a better understanding of certain subsets of data or perform group aggregations. Today we will add another resource to your toolbox that will let you use those groups for much more.

Apply lets you perform more complex computations on the groups you create. It works like this: the function you provide to apply is called on each of the groups, and the results are concatenated into a single final data structure.
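To make that description concrete, here is a minimal sketch that performs the split, apply, and combine steps by hand and then compares the outcome with a single call to apply. The sample rows are copied from the Pokemon table used later in this article, and strongest is a hypothetical helper I made up for the illustration. It mirrors the behavior described above rather than how pandas implements apply internally.

```python
import pandas as pd

# A few rows copied from the article's Pokemon table (kept small on purpose).
sample = pd.DataFrame({
    'Name': ['Caterpie', 'Scyther', 'Bulbasaur', 'Dratini', 'Squirtle', 'Poliwhirl'],
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Blue'],
    'Attack': [30, 110, 49, 64, 48, 65],
})

def strongest(group):
    # Hypothetical helper: the single row with the highest Attack in a group.
    return group.sort_values(by='Attack').tail(1)

# Split: iterate over the groups. Apply: call the function on each piece.
pieces = {color: strongest(group) for color, group in sample.groupby('Color')}

# Combine: concatenate the per-group results, keyed by the group label.
by_hand = pd.concat(pieces)

# The same three steps in one call.
with_apply = sample.groupby('Color').apply(strongest)

print(by_hand['Name'].tolist())
print(with_apply['Name'].tolist())
```

Both lists contain the same names, one per color group. Depending on your pandas version, the Color column may or may not be part of the frame handed to the function, which is why the comparison looks only at Name.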


Again, this is much easier to understand with practical examples, so let's get started!

Basic applications of apply

We will use the same table with Pokemon data we used in the last article.

First, let's import pandas and examine the contents of our DataFrame.

import pandas as pd

pdata = pd.read_csv('./sample_data/poke_colors.csv')
pdata

	Name	Color	Evolves	HP	Attack	Defense	SpAtk	SpDef	Speed
0	Caterpie	Green	True	45	30	35	20	20	45
1	Metapod	Green	True	50	20	55	25	25	30
2	Scyther	Green	False	70	110	80	55	80	105
3	Bulbasaur	Green	True	45	49	49	65	65	45
4	Dratini	Blue	True	41	64	45	50	50	50
5	Squirtle	Blue	True	44	48	65	50	64	43
6	Poliwag	Blue	True	40	50	40	40	40	90
7	Poliwhirl	Blue	True	65	65	65	50	50	90
8	Charmander	Red	True	39	52	43	60	50	65
9	Magmar	Red	False	65	95	57	100	85	93
10	Paras	Red	True	35	70	55	45	55	25
11	Parasect	Red	False	60	95	80	60	80	30
12	Pikachu	Yellow	True	35	55	40	50	50	90
13	Abra	Yellow	True	25	20	15	105	55	90
14	Psyduck	Yellow	True	50	52	48	65	50	55
15	Kadabra	Yellow	True	40	35	30	120	70	10

Apply's most important argument is a function. This function will be run on every group of data and the results will be concatenated in a final data structure. We will create a simple function that returns the two Pokemon with the highest attack value, something like this:

# Two pokes with the highest attack

def highest_attack(data_frame):
    # After an ascending sort, the slice [-2:] keeps the last two rows, which hold the highest Attack values
    return data_frame.sort_values(by='Attack')[-2:]

# Let's test it on the complete dataframe
highest_attack(pdata)

	Name	Color	Evolves	HP	Attack	Defense	SpAtk	SpDef	Speed
11	Parasect	Red	False	60	95	80	60	80	30
2	Scyther	Green	False	70	110	80	55	80	105

Now let's see how to use apply to do something a bit more interesting. We want to find the two pokemon with the highest attack value in each color group. To do this, we group the rows by Color and then pass highest_attack to apply:

# Now, let's find which are the two pokemon with the highest attack on each color group:
pdata.groupby('Color').apply(highest_attack)

		Name	Color	Evolves	HP	Attack	Defense	SpAtk	SpDef	Speed
Color
Blue	4	Dratini	Blue	True	41	64	45	50	50	50
Blue	7	Poliwhirl	Blue	True	65	65	65	50	50	90
Green	3	Bulbasaur	Green	True	45	49	49	65	65	45
Green	2	Scyther	Green	False	70	110	80	55	80	105
Red	9	Magmar	Red	False	65	95	57	100	85	93
Red	11	Parasect	Red	False	60	95	80	60	80	30
Yellow	14	Psyduck	Yellow	True	50	52	48	65	50	55
Yellow	12	Pikachu	Yellow	True	35	55	40	50	50	90

Notice how the final table is the result of concatenating together the results of running highest_attack on every group!
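Because the pieces are concatenated under their group label, the result carries a two-level index: the color first, then the original row label. The sketch below rebuilds the Green and Blue rows of the table (trimmed to three columns, which is my own simplification) and shows how to pull one group's piece back out of the combined result.

```python
import pandas as pd

# Green and Blue rows from the article's table, trimmed to three columns.
pdata_small = pd.DataFrame({
    'Name': ['Caterpie', 'Metapod', 'Scyther', 'Bulbasaur',
             'Dratini', 'Squirtle', 'Poliwag', 'Poliwhirl'],
    'Color': ['Green'] * 4 + ['Blue'] * 4,
    'Attack': [30, 20, 110, 49, 64, 48, 50, 65],
})

def highest_attack(data_frame):
    # Same function as in the article: last two rows after sorting by Attack.
    return data_frame.sort_values(by='Attack')[-2:]

top_two = pdata_small.groupby('Color').apply(highest_attack)

# The index has two levels: the group label and the original row label.
print(top_two.index.tolist())

# Selecting on the outer level recovers one group's piece of the result.
green_top = top_two.loc['Green']
print(green_top['Name'].tolist())
```

The second print shows Bulbasaur and Scyther, the same two Green rows that appear in the table above.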

Functions with extra arguments

The functions you pass to the apply method can receive additional arguments. Let's create another version of our function, this time called highest_attribute, that lets you specify the attribute to take into consideration and the n highest pokemon you want to select from each group:

# We set the default attribute as HP and the default n to 2
def highest_attribute(data_frame, attribute='HP', n=2):
    return data_frame.sort_values(by=attribute)[-n:]

pdata.groupby('Color').apply(highest_attribute, 'Defense', 3)

		Name	Color	Evolves	HP	Attack	Defense	SpAtk	SpDef	Speed
Color
Blue	4	Dratini	Blue	True	41	64	45	50	50	50
	5	Squirtle	Blue	True	44	48	65	50	64	43
	7	Poliwhirl	Blue	True	65	65	65	50	50	90
Green	3	Bulbasaur	Green	True	45	49	49	65	65	45
	1	Metapod	Green	True	50	20	55	25	25	30
	2	Scyther	Green	False	70	110	80	55	80	105
Red	10	Paras	Red	True	35	70	55	45	55	25
	9	Magmar	Red	False	65	95	57	100	85	93
	11	Parasect	Red	False	60	95	80	60	80	30
Yellow	15	Kadabra	Yellow	True	40	35	30	120	70	10
	12	Pikachu	Yellow	True	35	55	40	50	50	90
	14	Psyduck	Yellow	True	50	52	48	65	50	55

Notice how the additional arguments are passed to apply rather than to highest_attribute directly. Internally, apply forwards them to the function every time it calls it on a group.
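The extra arguments can also be given by keyword, which tends to read better when a function takes several of them. Here is a small sketch, assuming a cut-down version of the table with only the Red and Yellow rows and the Defense column:

```python
import pandas as pd

# Some Red and Yellow rows from the article's table, trimmed to three columns.
pdata_small = pd.DataFrame({
    'Name': ['Charmander', 'Magmar', 'Paras', 'Pikachu', 'Abra', 'Psyduck'],
    'Color': ['Red', 'Red', 'Red', 'Yellow', 'Yellow', 'Yellow'],
    'Defense': [43, 57, 55, 40, 15, 48],
})

def highest_attribute(data_frame, attribute='HP', n=2):
    # Same function as in the article.
    return data_frame.sort_values(by=attribute)[-n:]

grouped = pdata_small.groupby('Color')

# Positional extras, as in the article's example.
positional = grouped.apply(highest_attribute, 'Defense', 1)

# The same call with keyword extras; apply forwards them unchanged.
keyword = grouped.apply(highest_attribute, attribute='Defense', n=1)

print(keyword['Name'].tolist())
```

Both calls select the same rows: Magmar for Red and Psyduck for Yellow.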

Using lambdas as an argument for apply

As a final note, sometimes you won't want to write a complete function definition if what you want to accomplish is very simple. In this case, you can pass a lambda function. In our next example we will use this approach to select, from each group, the pokemon whose name comes first in alphabetical order:

pdata.groupby('Color').apply(lambda df: df.sort_values('Name').head(1))

		Name	Color	Evolves	HP	Attack	Defense	SpAtk	SpDef	Speed
Color
Blue	4	Dratini	Blue	True	41	64	45	50	50	50
Green	3	Bulbasaur	Green	True	45	49	49	65	65	45
Red	8	Charmander	Red	True	39	52	43	60	50	65
Yellow	13	Abra	Yellow	True	25	20	15	105	55	90
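A lambda is just an unnamed function, so the call above behaves exactly like passing a regular def. The following sketch checks that on a few Blue and Yellow rows from the table. The name first_by_name is hypothetical and only used for the comparison.

```python
import pandas as pd

# Some Blue and Yellow rows from the article's table, trimmed to three columns.
pdata_small = pd.DataFrame({
    'Name': ['Dratini', 'Squirtle', 'Poliwag', 'Pikachu', 'Abra', 'Psyduck'],
    'Color': ['Blue', 'Blue', 'Blue', 'Yellow', 'Yellow', 'Yellow'],
    'Speed': [50, 43, 90, 90, 90, 55],
})

def first_by_name(data_frame):
    # Hypothetical named equivalent of the lambda used in the article.
    return data_frame.sort_values('Name').head(1)

grouped = pdata_small.groupby('Color')
with_def = grouped.apply(first_by_name)
with_lambda = grouped.apply(lambda df: df.sort_values('Name').head(1))

print(with_lambda['Name'].tolist())
print(with_def.equals(with_lambda))
```

If the logic grows beyond a single expression, switching to the named version usually makes the code easier to read and test.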

Practice makes perfect

Apply is an incredibly flexible function that, if used in creative ways, lets you solve a huge variety of problems in data manipulation and transformation. This article exposed you to the basic concepts of the function, but make sure to study it further and experiment with real datasets.

As a closing remark, I'd like to share a quotation from the book Python for Data Analysis (2nd edition), on which this series is largely based:

Beyond these basic usage mechanics, getting the most out of apply
may require some creativity. What occurs inside the function
passed is up to you; it only needs to return a pandas object or a
scalar value. The rest of this chapter will mainly consist of examples
showing you how to solve various problems using groupby
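All the examples in this article return a DataFrame from the applied function, but as the quotation says, a scalar works too. Here is a minimal sketch, with attack_range as a hypothetical helper, where each group is reduced to a single number and apply collects those numbers into a Series indexed by color:

```python
import pandas as pd

# Green and Blue rows from the article's table, trimmed to three columns.
pdata_small = pd.DataFrame({
    'Name': ['Caterpie', 'Metapod', 'Scyther', 'Bulbasaur',
             'Dratini', 'Squirtle', 'Poliwag', 'Poliwhirl'],
    'Color': ['Green'] * 4 + ['Blue'] * 4,
    'Attack': [30, 20, 110, 49, 64, 48, 50, 65],
})

def attack_range(data_frame):
    # Hypothetical helper: returns one number (a scalar) per group.
    return data_frame['Attack'].max() - data_frame['Attack'].min()

# One scalar per group, so the combined result is a Series indexed by Color.
spread = pdata_small.groupby('Color').apply(attack_range)
print(spread)
```

For Blue the spread is 65 - 48 = 17, and for Green it is 110 - 20 = 90.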

With this article, we conclude our Hands-on Pandas series. It's been a lot of fun to write, and I really hope you learned one or two interesting things along the way.

Pandas, like every other software tool or skill, requires a good amount of practice before it becomes truly useful. Don't worry if you don't immediately know how to tackle a dataset or which function to call. With experience and continued exposure, it will become second nature.

If you need help, remember that Pandas has some of the best docs around and a huge, helpful community that will help you find a solution. I wish you a happy and productive learning process!

Thank you for reading!

What to do next

Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
You can find the source code for this series in this repo.
This article is based on Python for Data Analysis. These and other very helpful books can be found in the recommended reading list.
Send me an email with questions, comments, or suggestions (my address is on the About Me page).
