BrainsToBytes

Hands-on Pandas(4): Arithmetics with DataFrames and Series

Arithmetic operations are some of the most fundamental (and important) things you can do with series and dataframes. In this article, we will learn how to perform basic operations using both series and dataframes.

We are interested in the following scenarios:

  • Operations between series with the same index.
  • Operations between dataframes with the same index.
  • Operations between dataframe/series with the same index.
  • Operations between series with different indexes.
  • Operations between dataframes with different indexes.
  • Operations between dataframe/series with different indexes.

Good, let's get started!

Same index, obvious behavior

If two (or more) series/dataframes share the same index (both row and column index in the case of dataframes), operations follow the obvious element-wise behavior you would expect if you've used NumPy in the past:

import pandas as pd
ser_1 = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10,20,30,40], index=['a', 'b', 'c', 'd'])

print(ser_1)
print(ser_2)
a    1
b    2
c    3
d    4
dtype: int64
a    10
b    20
c    30
d    40
dtype: int64
# Addition of two series with the same index
ser_1 + ser_2
a    11
b    22
c    33
d    44
dtype: int64
# Subtraction of two series with the same index
ser_2 - ser_1
a     9
b    18
c    27
d    36
dtype: int64
# Multiplication of two series with the same index
ser_1 * ser_2
a     10
b     40
c     90
d    160
dtype: int64
# Division of two series with the same index
ser_2 / ser_1
a    10.0
b    10.0
c    10.0
d    10.0
dtype: float64
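Because both series carry the same labels in the same order, these operators give exactly what NumPy would give on the underlying arrays. Here is a small sketch to check that claim. The names `values_1`, `values_2`, `same_add` and `same_div` are just for illustration:

```python
import numpy as np
import pandas as pd

ser_1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# The raw NumPy arrays behind each series
values_1 = ser_1.to_numpy()
values_2 = ser_2.to_numpy()

# With identical indexes, the pandas operators match plain NumPy arithmetic
same_add = np.array_equal((ser_1 + ser_2).to_numpy(), values_1 + values_2)
same_div = np.array_equal((ser_2 / ser_1).to_numpy(), values_2 / values_1)
print(same_add, same_div)  # True True
```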

The same behavior applies when you operate on two dataframes that share both the row and the column index:

import numpy as np
df_1 = pd.DataFrame(np.arange(1,17).reshape(4,4),
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_2 = pd.DataFrame(np.arange(1,17).reshape(4,4) * 10,
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])
df_1
     a   b   c   d
Fi   1   2   3   4
Se   5   6   7   8
Th   9  10  11  12
Fo  13  14  15  16
df_2
      a    b    c    d
Fi   10   20   30   40
Se   50   60   70   80
Th   90  100  110  120
Fo  130  140  150  160
# Addition of two dataframes with the same index
df_1 + df_2
      a    b    c    d
Fi   11   22   33   44
Se   55   66   77   88
Th   99  110  121  132
Fo  143  154  165  176
# Multiplication of two dataframes with the same index
df_1 * df_2
       a     b     c     d
Fi    10    40    90   160
Se   250   360   490   640
Th   810  1000  1210  1440
Fo  1690  1960  2250  2560
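Keep in mind that pandas matches elements by label, not by position. As a sketch, `df_2_shuffled` (a name introduced only for this example) holds the rows of `df_2` in reverse order, yet each row is still added to the row with the same label in `df_1`:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])

# Same labels as df_2, but the rows are in reverse order
df_2_shuffled = df_2.loc[['Fo', 'Th', 'Se', 'Fi']]

# Rows are matched by label: 'Fo' in df_1 meets 'Fo' in df_2_shuffled
result = df_1 + df_2_shuffled
fo_row = result.loc['Fo'].tolist()  # [143, 154, 165, 176]

# Putting the rows back in df_1's order gives exactly df_1 + df_2
print(result.reindex(df_1.index).equals(df_1 + df_2))  # True
```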

It's also possible to perform operations between a dataframe and a series whose index matches the dataframe's column index. By default, pandas aligns the index of the series with the column index of the dataframe and applies the operation between the series and each row of the dataframe:

# Sum a series and a dataframe
ser_1 + df_1
     a   b   c   d
Fi   2   4   6   8
Se   6   8  10  12
Th  10  12  14  16
Fo  14  16  18  20
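This default can be changed by using the method form of the operator. As a sketch, `DataFrame.add` accepts an `axis` argument; with `axis='index'` the labels of the series are matched against the row index instead, so every row gets its own value added. `row_offsets` is a hypothetical series made up for this example:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
ser_1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Default (axis='columns'): series labels are matched with the columns,
# which is the same as ser_1 + df_1
by_columns = df_1.add(ser_1)

# Hypothetical per-row offsets, labelled like the rows of df_1
row_offsets = pd.Series([100, 200, 300, 400], index=['Fi', 'Se', 'Th', 'Fo'])

# axis='index': series labels are matched with the row index instead
by_rows = df_1.add(row_offsets, axis='index')
print(by_rows.loc['Fi'].tolist())  # [101, 102, 103, 104]
```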

Different index, outer joins

If you perform operations between series/dataframes with different indexes, the result will be a new data structure whose index is the union of the original indexes. If you have worked with databases before, this is similar to an outer join on the indexes of the original series/dataframes. This is much easier to see with an example:

ser_1 = pd.Series([1,1,1,1,1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5,5,5,5,5], index=['c', 'd', 'e', 'f', 'g'])

print(ser_1)
print(ser_2)
a    1
b    1
c    1
d    1
e    1
dtype: int64
c    5
d    5
e    5
f    5
g    5
dtype: int64

When the series have different indexes, the result contains every label in the union of the original indexes, but the operation is only computed for labels in the intersection (the labels present in both series). All other entries are filled with NaN, which is also why the dtype changes from int64 to float64.

In this case, the intersection is ['c', 'd', 'e'].

ser_1 + ser_2
a    NaN
b    NaN
c    6.0
d    6.0
e    6.0
f    NaN
g    NaN
dtype: float64
ser_1 * ser_2
a    NaN
b    NaN
c    5.0
d    5.0
e    5.0
f    NaN
g    NaN
dtype: float64
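You can check which labels end up where with the index set operations. A small sketch: the result index equals `Index.union` of the two indexes, and the only non-NaN entries are the labels returned by `Index.intersection` (`union` and `overlap` are illustrative names):

```python
import pandas as pd

ser_1 = pd.Series([1, 1, 1, 1, 1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5, 5, 5, 5, 5], index=['c', 'd', 'e', 'f', 'g'])

result = ser_1 + ser_2

# Labels present in either series: these form the result index
union = ser_1.index.union(ser_2.index)
# Labels present in both series: only these get a computed value
overlap = ser_1.index.intersection(ser_2.index)

print(list(union))                          # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(list(overlap))                        # ['c', 'd', 'e']
print(list(result[result.notna()].index))   # ['c', 'd', 'e']
```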

Dataframes behave the same way, but the union is taken on both the row and the column index.

import numpy as np

# In this case, the intersection is [a, b, c] in the columns and [Fi, Th, Fo] in the rows

df_1 = pd.DataFrame(np.arange(1,17).reshape(4,4),
                    index= ['Fi', 'Ma', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'd'])

df_2 = pd.DataFrame(np.arange(1,17).reshape(4,4) * 10,
                    index= ['Fi', 'Se', 'Th', 'Fo'],
                    columns = ['a', 'b', 'c', 'e'])

df_1 + df_2
        a      b      c    d    e
Fi   11.0   22.0   33.0  NaN  NaN
Fo  143.0  154.0  165.0  NaN  NaN
Ma    NaN    NaN    NaN  NaN  NaN
Se    NaN    NaN    NaN  NaN  NaN
Th   99.0  110.0  121.0  NaN  NaN
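The outer-join analogy can be made explicit with `DataFrame.align`, which reindexes both frames to the union of their row and column labels, inserting NaN where a label is missing. Adding the aligned frames reproduces `df_1 + df_2`. This is a sketch for illustration only; it shows equivalent results, not a claim about how pandas implements the operator internally (`left`, `right` and `manual` are illustrative names):

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

# Outer join on both axes: each frame is reindexed to the union of labels,
# with NaN wherever a frame had no value for that (row, column) pair
left, right = df_1.align(df_2, join='outer')
print(sorted(left.index))    # ['Fi', 'Fo', 'Ma', 'Se', 'Th']
print(sorted(left.columns))  # ['a', 'b', 'c', 'd', 'e']

# Adding the aligned frames gives the same values as df_1 + df_2
manual = left + right
# Raises AssertionError if the two frames differ (label order is ignored)
pd.testing.assert_frame_equal(manual, df_1 + df_2, check_like=True)
```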

In the case of operations between dataframes and series with different indexes, a union will be performed between the column index of the dataframe and the index of the series:

df_1 + ser_2
      a    b     c     d    e    f    g
Fi  NaN  NaN   8.0   9.0  NaN  NaN  NaN
Ma  NaN  NaN  12.0  13.0  NaN  NaN  NaN
Th  NaN  NaN  16.0  17.0  NaN  NaN  NaN
Fo  NaN  NaN  20.0  21.0  NaN  NaN  NaN
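A quick way to confirm this: the columns of the result are the union of `df_1.columns` and `ser_2.index`, while the row index of the dataframe is left untouched. Only the columns present in both (`c` and `d` here) get numbers in every row. A minimal sketch:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([5, 5, 5, 5, 5], index=['c', 'd', 'e', 'f', 'g'])

result = df_1 + ser_2

# Columns: union of the dataframe's columns and the series' labels
print(list(result.columns) == list(df_1.columns.union(ser_2.index)))  # True
# Rows: unchanged, no join happens on this axis
print(list(result.index))  # ['Fi', 'Ma', 'Th', 'Fo']
# Dropping every column that contains a NaN leaves only the overlap
print(list(result.dropna(axis=1).columns))  # ['c', 'd']
```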

Filling in missing values

Instead of using the normal arithmetic operators, you can use a set of built-in Pandas methods that accept an argument (fill_value) to fill in missing values:

  • add/radd
  • sub/rsub
  • div/rdiv
  • mul/rmul
  • pow/rpow
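Each method mirrors an operator (`add` for `+`, `sub` for `-`, and so on), and the `r`-prefixed versions simply swap the operands, which matters for the non-commutative operations. A short sketch with made-up series `x` and `y`:

```python
import pandas as pd

x = pd.Series([1, 2, 4], index=['a', 'b', 'c'])
y = pd.Series([10, 20, 40], index=['a', 'b', 'c'])

# x.sub(y) computes x - y, while x.rsub(y) computes y - x
print(x.sub(y).tolist())   # [-9, -18, -36]
print(x.rsub(y).tolist())  # [9, 18, 36]

# The same works with scalars: x.rdiv(1) computes 1 / x
print(x.rdiv(1).tolist())  # [1.0, 0.5, 0.25]
```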

Let's revisit series addition and use 1 as the placeholder value:

ser_1.add(ser_2, fill_value=1)
a    2.0
b    2.0
c    6.0
d    6.0
e    6.0
f    6.0
g    6.0
dtype: float64

If an entry is not in the overlap of the two series, the sum is computed against the placeholder value of 1. For example, for indexes a/b both results are 1+1, and for f/g they are 5+1. The same behavior applies to dataframes.
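For dataframes the same `fill_value` argument works, with one detail worth knowing: the placeholder is only used when a value is missing from one of the two operands. If a (row, column) position is missing from both, the result stays NaN. A sketch with the two differently-indexed dataframes from the previous section and `fill_value=0` (the name `filled` is just for illustration):

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

filled = df_1.add(df_2, fill_value=0)

print(filled.loc['Fi', 'a'])  # 11.0, present in both: 1 + 10
print(filled.loc['Fi', 'd'])  # 4.0, only in df_1: 4 + 0
print(filled.loc['Se', 'a'])  # 50.0, only in df_2: 0 + 50
print(filled.loc['Ma', 'e'])  # nan, missing from both so no fill happens
```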

Now you know maths

The toughest thing about working with arithmetic operations using pandas data structures is understanding how it works when indexes are not the same. As long as you remember that it behaves like an outer join, everything will be clear and easy.

In the next article, we will talk about mapping and function application, our first advance-y Pandas topics!

Thanks for reading!

Budapest, Hungary
Hey there, I'm Juan. A programmer currently living in Budapest. I believe in well-engineered solutions, clean code and sharing knowledge. Thanks for reading, I hope you find my articles useful!