Examples of how to slice (split) a dataframe by column value with pandas in python:
Create a dataframe with pandas
Let's first create a dataframe
import pandas as pd
import random
l1 = [random.randint(1,100) for i in range(15)]
l2 = [random.randint(1,100) for i in range(15)]
l3 = [random.randint(2018,2020) for i in range(15)]
data = {'Column A':l1,'Column B':l2,'Year':l3}
df = pd.DataFrame(data)
print(df)
returns
Column A Column B Year
0 63 9 2018
1 97 29 2018
2 1 92 2019
3 75 38 2020
4 19 50 2019
5 20 71 2019
6 59 60 2020
7 93 46 2019
8 6 17 2020
9 87 82 2018
10 36 12 2020
11 89 71 2018
12 87 69 2019
13 98 21 2018
14 82 67 2020
We want to slice this dataframe according to the column year.
Find unique values in a given column
To find the unique value in a given column:
df['Year'].unique()
returns here:
array([2018, 2019, 2020])
Select dataframe rows for a given column value
To extract dataframe rows for a given column value (for example 2018), a solution is to do:
df[ df['Year'] == 2018 ]
returns
Column A Column B Year
0 63 9 2018
1 97 29 2018
9 87 82 2018
11 89 71 2018
13 98 21 2018
Slice dataframe by column value
Now we can slice the original dataframe using a dictionary for example to store the results:
df_sliced_dict = {}
for year in df['Year'].unique():
df_sliced_dict[year] = df[ df['Year'] == year ]
then
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(df_sliced_dict)
returns
{ 2018: Column A Column B Year
0 63 9 2018
1 97 29 2018
9 87 82 2018
11 89 71 2018
13 98 21 2018,
2019: Column A Column B Year
2 1 92 2019
4 19 50 2019
5 20 71 2019
7 93 46 2019
12 87 69 2019,
2020: Column A Column B Year
3 75 38 2020
6 59 60 2020
8 6 17 2020
10 36 12 2020
14 82 67 2020}
To get the sliced dataframe for a given year from the dictionary just do:
df_sliced_dict[2018]
returns
Column A Column B Year
0 63 9 2018
1 97 29 2018
9 87 82 2018
11 89 71 2018
13 98 21 2018