Hands-on Pandas(4): Arithmetics with DataFrames and Series

Arithmetic operations are some of the most fundamental (and important) things you can do with series and dataframes. In this article, we will learn how to perform basic operations using both series and dataframes.

We are interested in the following scenarios:

Operations between series with the same index.
Operations between dataframes with the same index.
Operations between dataframe/series with the same index.
Operations between series with different indexes.
Operations between dataframes with different indexes.
Operations between dataframe/series with different indexes.

Good, let's get started!

Same index, obvious behavior

If two (or more) series/dataframes share the same index (both row and column index in the case of dataframes), operations follow the obvious element-wise behavior you would expect if you've used NumPy in the past:

import pandas as pd
ser_1 = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10,20,30,40], index=['a', 'b', 'c', 'd'])

print(ser_1)
print(ser_2)

a    1
b    2
c    3
d    4
dtype: int64
a    10
b    20
c    30
d    40
dtype: int64

# Addition of two series with the same index
ser_1 + ser_2

a    11
b    22
c    33
d    44
dtype: int64

# Subtraction of two series with the same index
ser_2 - ser_1

a     9
b    18
c    27
d    36
dtype: int64

# Multiplication of two series with the same index
ser_1 * ser_2

a     10
b     40
c     90
d    160
dtype: int64

# Division of two series with the same index
ser_2 / ser_1

a    10.0
b    10.0
c    10.0
d    10.0
dtype: float64
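Note that the division above returns float64 even though every quotient is a whole number, because / is true division. A minimal sketch of the difference, using floor division (//), which is not part of the original example:

```python
import pandas as pd

ser_1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# True division always produces floats
true_div = ser_2 / ser_1

# Floor division on integer series keeps an integer dtype
floor_div = ser_2 // ser_1

print(true_div.dtype)
print(floor_div.dtype)
```

Both results hold the value 10 for every label; only the dtype differs.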

The same behavior is shown when you apply operations on two dataframes that share both the row and column index:

import numpy as np
df_1 = pd.DataFrame(np.arange(1,17).reshape(4,4),
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_2 = pd.DataFrame(np.arange(1,17).reshape(4,4) * 10,
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_1

	a	b	c	d
Fi	1	2	3	4
Se	5	6	7	8
Th	9	10	11	12
Fo	13	14	15	16

df_2

	a	b	c	d
Fi	10	20	30	40
Se	50	60	70	80
Th	90	100	110	120
Fo	130	140	150	160

# Addition of two dataframes with the same index
df_1 + df_2

	a	b	c	d
Fi	11	22	33	44
Se	55	66	77	88
Th	99	110	121	132
Fo	143	154	165	176

# Multiplication of two dataframes with the same index
df_1 * df_2

	a	b	c	d
Fi	10	40	90	160
Se	250	360	490	640
Th	810	1000	1210	1440
Fo	1690	1960	2250	2560

It's also possible to perform operations between dataframes and series that share an index. The default behavior is to align the index of the series with the column index of the dataframe and perform the operations between each row and the series.

# Sum a series and a dataframe
ser_1 + df_1

	a	b	c	d
Fi	2	4	6	8
Se	6	8	10	12
Th	10	12	14	16
Fo	14	16	18	20
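If the series should be matched against the row index instead, the method form of the operator accepts an axis argument. A minimal sketch, assuming a hypothetical series row_ser indexed by the row labels (it is not part of the original example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                  index=['Fi', 'Se', 'Th', 'Fo'],
                  columns=['a', 'b', 'c', 'd'])

# Hypothetical series whose index matches the ROW labels of df
row_ser = pd.Series([100, 200, 300, 400], index=['Fi', 'Se', 'Th', 'Fo'])

# axis='index' aligns the series with the row index,
# so the series is added to each column
by_rows = df.add(row_ser, axis='index')
print(by_rows)
```

With the plain + operator, the same series would be aligned against the columns a to d, find no matching labels, and produce only NaN.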

Different index, outer joins

If you perform operations between series/dataframes with different indexes, the result will be a new data structure whose index is the union of the original indexes. If you have worked with databases before, this is similar to an outer join on the indexes of the original series/dataframes. This is much easier to see with an example:

ser_1 = pd.Series([1,1,1,1,1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5,5,5,5,5], index=['c', 'd', 'e', 'f', 'g'])

print(ser_1)
print(ser_2)

a    1
b    1
c    1
d    1
e    1
dtype: int64
c    5
d    5
e    5
f    5
g    5
dtype: int64

If the operation is performed on series with different indexes, the result holds an actual value only for labels present in both series, that is, in the intersection of the original indexes. Labels that appear in only one of the series are filled with NaN.

In this case, the intersection is ['c', 'd', 'e'].

ser_1 + ser_2

a    NaN
b    NaN
c    6.0
d    6.0
e    6.0
f    NaN
g    NaN
dtype: float64

ser_1 * ser_2

a    NaN
b    NaN
c    5.0
d    5.0
e    5.0
f    NaN
g    NaN
dtype: float64
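To make the alignment step visible, the sketch below computes the union and intersection of the two indexes and then uses Series.align to reindex both series before adding them. It is only an illustration of the behavior described above, not a description of how pandas implements the operators internally:

```python
import pandas as pd

ser_1 = pd.Series([1, 1, 1, 1, 1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5, 5, 5, 5, 5], index=['c', 'd', 'e', 'f', 'g'])

# Labels of the result: the union of both indexes
union = ser_1.index.union(ser_2.index)
# Labels that end up with a real value: the intersection
overlap = ser_1.index.intersection(ser_2.index)

print(list(union))    # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(overlap))  # ['c', 'd', 'e']

# align() reindexes both series to the union, filling gaps with NaN
left, right = ser_1.align(ser_2, join='outer')
result = left + right
print(result)
```

Because NaN is a floating point value, the aligned series (and therefore the result) are float64 rather than int64.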

Dataframes have the same behavior, but the unions are performed on both the row and column index.

import numpy as np

# In this case, the overlap is the elements [a,b,c] in the columns and [Fi,Fo,Th] in the rows

df_1 = pd.DataFrame(np.arange(1,17).reshape(4,4),
                    index= ['Fi', 'Ma', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_2 = pd.DataFrame(np.arange(1,17).reshape(4,4) * 10,
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'e'])

df_1 + df_2

	a	b	c	d	e
Fi	11.0	22.0	33.0	NaN	NaN
Fo	143.0	154.0	165.0	NaN	NaN
Ma	NaN	NaN	NaN	NaN	NaN
Se	NaN	NaN	NaN	NaN	NaN
Th	99.0	110.0	121.0	NaN	NaN
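If only the overlapping block is of interest, one option is to drop the rows and columns that are entirely NaN. This is a sketch under the assumption that the inputs contain no missing values of their own; otherwise dropna could also remove rows or columns that are NaN for other reasons:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

total = df_1 + df_2

# Drop rows that are all NaN, then columns that are all NaN
overlap_only = total.dropna(axis='index', how='all').dropna(axis='columns', how='all')
print(overlap_only)
```

The remaining frame has the rows Fi, Fo, Th and the columns a, b, c.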

In the case of operations between dataframes and series with different indexes, a union will be performed between the column index of the dataframe and the index of the series:

df_1 + ser_2

	a	b	c	d	e	f	g
Fi	NaN	NaN	8.0	9.0	NaN	NaN	NaN
Ma	NaN	NaN	12.0	13.0	NaN	NaN	NaN
Th	NaN	NaN	16.0	17.0	NaN	NaN	NaN
Fo	NaN	NaN	20.0	21.0	NaN	NaN	NaN

Filling in missing values

Instead of using the normal arithmetic operators, you can use a set of built-in Pandas methods that accept an argument to fill in missing values:

add/radd
sub/rsub
div/rdiv
mul/rmul
pow/rpow
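The r-prefixed variants swap the operands, so the caller ends up on the right-hand side of the operator. A small sketch with a hypothetical series ser (not one of the series used elsewhere in this article):

```python
import pandas as pd

ser = pd.Series([1, 2, 4], index=['a', 'b', 'c'])

# sub computes ser - 10, while rsub computes 10 - ser
plain = ser.sub(10)
reflected = ser.rsub(10)

# rdiv computes 8 / ser, the same as the operator form
inverse = ser.rdiv(8)

print(plain.tolist())      # [-9, -8, -6]
print(reflected.tolist())  # [9, 8, 6]
print(inverse.tolist())    # [8.0, 4.0, 2.0]
```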

Let's revisit series addition and use 1 as placeholder value:

ser_1.add(ser_2, fill_value=1)

a    2.0
b    2.0
c    6.0
d    6.0
e    6.0
f    6.0
g    6.0
dtype: float64

If an entry is not in the overlap of the two series, the sum is performed against the placeholder value of 1. For example, for indexes a/b the result is 1+1 = 2, and for f/g it is 5+1 = 6. The same behavior applies to dataframes.
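A sketch of the dataframe case, reusing the two dataframes with different indexes from the previous section. The placeholder only stands in when one of the two operands is missing; a cell that is missing from both dataframes stays NaN:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

# Missing operands are replaced by 0 before the addition
filled = df_1.add(df_2, fill_value=0)
print(filled)

# ('Fi', 'd') exists only in df_1: 4 + 0
print(filled.loc['Fi', 'd'])
# ('Ma', 'e') exists in neither dataframe, so it is still NaN
print(filled.loc['Ma', 'e'])
```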

Now you know maths

The toughest thing about arithmetic operations on pandas data structures is understanding how they work when the indexes are not the same. As long as you remember that they behave like an outer join, everything will be clear and easy.

In the next article, we will talk about mapping and function application, our first advance-y Pandas topics!

Thanks for reading!

What to do next

Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
You can find the source code for this series in this repo.
This article is based on Python for Data Analysis. These and other very helpful books can be found in the recommended reading list.
Send me an email with questions, comments or suggestions (it's in the About Me page).
