Examples of how to slice (split) a dataframe by column value with pandas in python:
Create a dataframe with pandas
Let's first create a dataframe
import pandas as pdimport randoml1 = [random.randint(1,100) for i in range(15)]l2 = [random.randint(1,100) for i in range(15)]l3 = [random.randint(2018,2020) for i in range(15)]data = {'Column A':l1,'Column B':l2,'Year':l3}df = pd.DataFrame(data)print(df)
returns
Column A Column B Year0 63 9 20181 97 29 20182 1 92 20193 75 38 20204 19 50 20195 20 71 20196 59 60 20207 93 46 20198 6 17 20209 87 82 201810 36 12 202011 89 71 201812 87 69 201913 98 21 201814 82 67 2020
We want to slice this dataframe according to the column year.
Find unique values in a given column
To find the unique value in a given column:
df['Year'].unique()
returns here:
array([2018, 2019, 2020])
Select dataframe rows for a given column value
To extract dataframe rows for a given column value (for example 2018), a solution is to do:
df[ df['Year'] == 2018 ]
returns
Column A Column B Year0 63 9 20181 97 29 20189 87 82 201811 89 71 201813 98 21 2018
Slice dataframe by column value
Now we can slice the original dataframe using a dictionary for example to store the results:
df_sliced_dict = {}for year in df['Year'].unique():df_sliced_dict[year] = df[ df['Year'] == year ]
then
import pprintpp = pprint.PrettyPrinter(indent=4)pp.pprint(df_sliced_dict)
returns
{ 2018: Column A Column B Year0 63 9 20181 97 29 20189 87 82 201811 89 71 201813 98 21 2018,2019: Column A Column B Year2 1 92 20194 19 50 20195 20 71 20197 93 46 201912 87 69 2019,2020: Column A Column B Year3 75 38 20206 59 60 20208 6 17 202010 36 12 202014 82 67 2020}
To get the sliced dataframe for a given year from the dictionary just do:
df_sliced_dict[2018]
returns
Column A Column B Year0 63 9 20181 97 29 20189 87 82 201811 89 71 201813 98 21 2018
