How to slice (split) a dataframe by column value with pandas in python ?

Published: April 07, 2021

Tags: Python; Pandas; DataFrame;

DMCA.com Protection Status

Examples of how to slice (split) a dataframe by column value with pandas in python:

Create a dataframe with pandas

Let's first create a dataframe

import pandas as pd
import random

l1 = [random.randint(1,100) for i in range(15)]
l2 = [random.randint(1,100) for i in range(15)]
l3 = [random.randint(2018,2020) for i in range(15)]

data = {'Column A':l1,'Column B':l2,'Year':l3}

df = pd.DataFrame(data)

print(df)

returns

    Column A  Column B  Year
0         63         9  2018
1         97        29  2018
2          1        92  2019
3         75        38  2020
4         19        50  2019
5         20        71  2019
6         59        60  2020
7         93        46  2019
8          6        17  2020
9         87        82  2018
10        36        12  2020
11        89        71  2018
12        87        69  2019
13        98        21  2018
14        82        67  2020

We want to slice this dataframe according to the column year.

Find unique values in a given column

To find the unique value in a given column:

df['Year'].unique()

returns here:

array([2018, 2019, 2020])

Select dataframe rows for a given column value

To extract dataframe rows for a given column value (for example 2018), a solution is to do:

df[  df['Year'] == 2018 ]

returns

    Column A  Column B  Year
0         63         9  2018
1         97        29  2018
9         87        82  2018
11        89        71  2018
13        98        21  2018

Slice dataframe by column value

Now we can slice the original dataframe using a dictionary for example to store the results:

df_sliced_dict = {}

for year in df['Year'].unique():

    df_sliced_dict[year] = df[  df['Year'] == year ]

then

import pprint

pp = pprint.PrettyPrinter(indent=4)

pp.pprint(df_sliced_dict)

returns

{   2018:     Column A  Column B  Year
0         63         9  2018
1         97        29  2018
9         87        82  2018
11        89        71  2018
13        98        21  2018,
    2019:     Column A  Column B  Year
2          1        92  2019
4         19        50  2019
5         20        71  2019
7         93        46  2019
12        87        69  2019,
    2020:     Column A  Column B  Year
3         75        38  2020
6         59        60  2020
8          6        17  2020
10        36        12  2020
14        82        67  2020}

To get the sliced dataframe for a given year from the dictionary just do:

df_sliced_dict[2018]

returns

    Column A  Column B  Year
0         63         9  2018
1         97        29  2018
9         87        82  2018
11        89        71  2018
13        98        21  2018

References