Arithmetic operations are some of the most fundamental (and important) things you can do with series and dataframes. In this article, we will learn how to perform basic operations using both series and dataframes.
We are interested in the following scenarios:
- Operations between series with the same index.
- Operations between dataframes with the same index.
- Operations between a dataframe and a series with the same index.
- Operations between series with different indexes.
- Operations between dataframes with different indexes.
- Operations between a dataframe and a series with different indexes.
Good, let's get started!
Same index, obvious behavior
If two (or more) series/dataframes share the same index (both row and column index in the case of dataframes), operations follow the obvious element-wise behavior you would expect if you've used NumPy in the past:
import pandas as pd
ser_1 = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10,20,30,40], index=['a', 'b', 'c', 'd'])
print(ser_1)
print(ser_2)
a 1
b 2
c 3
d 4
dtype: int64
a 10
b 20
c 30
d 40
dtype: int64
# Addition of two series with the same index
ser_1 + ser_2
a 11
b 22
c 33
d 44
dtype: int64
# Subtraction of two series with the same index
ser_2 - ser_1
a 9
b 18
c 27
d 36
dtype: int64
# Multiplication of two series with the same index
ser_1 * ser_2
a 10
b 40
c 90
d 160
dtype: int64
# Division of two series with the same index
ser_2 / ser_1
a 10.0
b 10.0
c 10.0
d 10.0
dtype: float64
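One detail worth keeping in mind is that pandas pairs elements by label, not by position. The following is a minimal sketch (the names `left`, `right` and `total` are made up for illustration) in which both series carry the same labels, stored in a different order:

```python
import pandas as pd

# Two series with the same labels, stored in a different order
left = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
right = pd.Series([40, 30, 20, 10], index=['d', 'c', 'b', 'a'])

# Values are paired by label: 'a' with 'a', 'b' with 'b', and so on
total = left + right
print(total.loc['a'])  # 1 + 10
print(total.loc['d'])  # 4 + 40
```

If the values were paired by position instead, `'a'` would have been added to 40 rather than 10.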
The same behavior is shown when you apply operations on two dataframes that share both the row and column index:
import numpy as np
df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_1
| | a | b | c | d |
---|---|---|---|---|
Fi | 1 | 2 | 3 | 4 |
Se | 5 | 6 | 7 | 8 |
Th | 9 | 10 | 11 | 12 |
Fo | 13 | 14 | 15 | 16 |
df_2
| | a | b | c | d |
---|---|---|---|---|
Fi | 10 | 20 | 30 | 40 |
Se | 50 | 60 | 70 | 80 |
Th | 90 | 100 | 110 | 120 |
Fo | 130 | 140 | 150 | 160 |
# Addition of two dataframes with the same index
df_1 + df_2
| | a | b | c | d |
---|---|---|---|---|
Fi | 11 | 22 | 33 | 44 |
Se | 55 | 66 | 77 | 88 |
Th | 99 | 110 | 121 | 132 |
Fo | 143 | 154 | 165 | 176 |
# Multiplication of two dataframes with the same index
df_1 * df_2
| | a | b | c | d |
---|---|---|---|---|
Fi | 10 | 40 | 90 | 160 |
Se | 250 | 360 | 490 | 640 |
Th | 810 | 1000 | 1210 | 1440 |
Fo | 1690 | 1960 | 2250 | 2560 |
It's also possible to perform operations between dataframes and series that share an index. The default behavior is to align the index of the series with the column index of the dataframe and perform the operations between each row and the series.
# Sum a series and a dataframe
ser_1 + df_1
| | a | b | c | d |
---|---|---|---|---|
Fi | 2 | 4 | 6 | 8 |
Se | 6 | 8 | 10 | 12 |
Th | 10 | 12 | 14 | 16 |
Fo | 14 | 16 | 18 | 20 |
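The `+` operator always matches the series against the columns. If you want to match it against the rows instead, the `add` method accepts an `axis` argument. Here is a minimal sketch, where `row_ser` is a hypothetical series indexed like the rows of the dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                  index=['Fi', 'Se', 'Th', 'Fo'],
                  columns=['a', 'b', 'c', 'd'])
ser = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Default behavior: the series index is matched against the columns
by_operator = df + ser
by_method = df.add(ser, axis='columns')

# Hypothetical series whose index matches the row labels instead
row_ser = pd.Series([100, 200, 300, 400], index=['Fi', 'Se', 'Th', 'Fo'])
by_rows = df.add(row_ser, axis='index')

print(by_operator.equals(by_method))
print(by_rows)
```

With `axis='index'`, each column of the dataframe is combined with `row_ser`, so every value in row `Se` is increased by 200.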
Different index, outer joins
If you perform operations between series/dataframes with different indexes, the result will be a new data structure whose index is the union of the original indexes. If you have worked with databases before, this is similar to an outer join on the indexes of the original series/dataframes. This is much easier to see with an example:
ser_1 = pd.Series([1,1,1,1,1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5,5,5,5,5], index=['c', 'd', 'e', 'f', 'g'])
print(ser_1)
print(ser_2)
a 1
b 1
c 1
d 1
e 1
dtype: int64
c 5
d 5
e 5
f 5
g 5
dtype: int64
If the operation is performed on series with different indexes, the result is indexed by the union of the original indexes. The operation is only computed for labels that appear in both series (the intersection of the indexes); every other label is filled with NaN. Because NaN is a floating-point value, the result is upcast to float64.
In this case, the intersection is ['c', 'd', 'e'].
ser_1 + ser_2
a NaN
b NaN
c 6.0
d 6.0
e 6.0
f NaN
g NaN
dtype: float64
ser_1 * ser_2
a NaN
b NaN
c 5.0
d 5.0
e 5.0
f NaN
g NaN
dtype: float64
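If you want to check which labels will end up with a real value before running the operation, you can compare the two indexes directly. This is only an illustrative sketch; the variable names `all_labels` and `shared_labels` are not part of the original example:

```python
import pandas as pd

ser_1 = pd.Series([1, 1, 1, 1, 1], index=['a', 'b', 'c', 'd', 'e'])
ser_2 = pd.Series([5, 5, 5, 5, 5], index=['c', 'd', 'e', 'f', 'g'])

# Labels that will appear in the result
all_labels = ser_1.index.union(ser_2.index)
# Labels present in both series, the only ones that get a real value
shared_labels = ser_1.index.intersection(ser_2.index)

result = ser_1 + ser_2
print(list(all_labels))
print(list(shared_labels))
print(result.dropna())  # only the shared labels survive
```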
Dataframes behave the same way, but the alignment is performed on both the row and the column index.
import numpy as np
# In this case, the labels shared by both dataframes are [a, b, c] in the columns and [Fi, Fo, Th] in the rows
df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])
df_1 + df_2
| | a | b | c | d | e |
---|---|---|---|---|---|
Fi | 11.0 | 22.0 | 33.0 | NaN | NaN |
Fo | 143.0 | 154.0 | 165.0 | NaN | NaN |
Ma | NaN | NaN | NaN | NaN | NaN |
Se | NaN | NaN | NaN | NaN | NaN |
Th | 99.0 | 110.0 | 121.0 | NaN | NaN |
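One way to picture what happens here is to do the alignment step by hand with `DataFrame.align`, which reindexes both dataframes to a common set of labels. This is a sketch meant to illustrate the idea, not a description of how pandas implements the operator internally:

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

# Reindex both frames to the union of their row and column labels;
# cells that did not exist in a frame become NaN
left, right = df_1.align(df_2, join='outer')
print(left)
print(right)

# Adding the aligned frames gives the same result as df_1 + df_2
manual = left + right
automatic = df_1 + df_2
```

Any cell that is NaN in either aligned dataframe stays NaN in the sum, which is why only the shared rows and columns hold real values.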
In the case of operations between dataframes and series with different indexes, a union will be performed between the column index of the dataframe and the index of the series:
df_1 + ser_2
| | a | b | c | d | e | f | g |
---|---|---|---|---|---|---|---|
Fi | NaN | NaN | 8.0 | 9.0 | NaN | NaN | NaN |
Ma | NaN | NaN | 12.0 | 13.0 | NaN | NaN | NaN |
Th | NaN | NaN | 16.0 | 17.0 | NaN | NaN | NaN |
Fo | NaN | NaN | 20.0 | 21.0 | NaN | NaN | NaN |
Filling in missing values
Instead of using the normal arithmetic operators, you can use a set of built-in pandas methods that accept a `fill_value` argument to fill in missing values:
- add/radd
- sub/rsub
- div/rdiv
- mul/rmul
- pow/rpow
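The `r` variants swap the order of the operands, which matters for operations such as subtraction and division. A small sketch using two series with the same index (the names `forward` and `reverse` are just for illustration):

```python
import pandas as pd

ser_1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
ser_2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

forward = ser_1.sub(ser_2)   # same as ser_1 - ser_2
reverse = ser_1.rsub(ser_2)  # same as ser_2 - ser_1

print(forward)
print(reverse)
```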
Let's revisit series addition and use 1 as the placeholder value:
ser_1.add(ser_2, fill_value=1)
a 2.0
b 2.0
c 6.0
d 6.0
e 6.0
f 6.0
g 6.0
dtype: float64
If an entry is not in the overlap of the two series, the sum is performed against the placeholder value of 1. For example, indexes a and b only exist in ser_1, so both are 1+1 = 2, and indexes f and g only exist in ser_2, so both are 1+5 = 6. The same behavior applies to dataframes.
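As a sketch of how this carries over to dataframes, the example below uses `fill_value=0` on two dataframes with partially overlapping labels. One thing to watch for: when a cell is missing from both dataframes, there is nothing to fill against, so the result is still NaN.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                    index=['Fi', 'Ma', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'd'])
df_2 = pd.DataFrame(np.arange(1, 17).reshape(4, 4) * 10,
                    index=['Fi', 'Se', 'Th', 'Fo'],
                    columns=['a', 'b', 'c', 'e'])

# Cells that exist in only one dataframe are added to 0
filled = df_1.add(df_2, fill_value=0)
print(filled)

# Row 'Ma' only exists in df_1 and column 'e' only exists in df_2,
# so that cell is missing on both sides and remains NaN
print(filled.loc['Ma', 'e'])
```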
Now you know maths
The toughest thing about arithmetic operations on pandas data structures is understanding how they behave when the indexes are not the same. As long as you remember that the result behaves like an outer join, with NaN wherever a label is missing from one side, everything will be clear and easy.
In the next article, we will talk about mapping and function application, our first advanced pandas topic!
Thanks for reading!
What to do next
- Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
- You can find the source code for this series in this repo.
- This article is based on Python for Data Analysis. These and other very helpful books can be found in the recommended reading list.
- Send me an email with questions, comments or suggestions (it's in the About Me page)