One of the most common tasks when working with data in Python is to iterate over the first n rows of a Pandas DataFrame. There are several ways to accomplish this goal, depending on what you need to do.
Synthetic data
To start, let's generate a DataFrame using synthetic data:
import pandas as pd
import numpy as np
data = np.arange(1,31)
data = data.reshape(10,3)
df = pd.DataFrame(data, columns=['A','B','C'])
print(df)
The code displayed above will generate:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
4 13 14 15
5 16 17 18
6 19 20 21
7 22 23 24
8 25 26 27
9 28 29 30
Select first n rows
Using head()
A first solution is to use the pandas head() method, which returns the first n rows of a DataFrame:
n = 4
df.head(n)
This displays the first 4 rows:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
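Using iloc
Another solution is to use the iloc indexer, which selects rows by integer position. Note that iloc takes square brackets, not parentheses: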
df.iloc[:n,:]
The code displayed above will generate:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
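As a quick sanity check, the two selections above can be compared directly. The following is a minimal sketch that rebuilds the same synthetic DataFrame; the names first_head and first_iloc are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic DataFrame used throughout this article
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=['A', 'B', 'C'])

n = 4
first_head = df.head(n)    # first n rows via head()
first_iloc = df.iloc[:n]   # first n rows via positional slicing

# Both selections hold the same rows, index and values
print(first_head.equals(first_iloc))  # True

# When n exceeds the number of rows, both simply return every row
print(len(df.head(100)), len(df.iloc[:100]))  # 10 10
```

In practice the choice is mostly a matter of readability: head(n) states the intent directly, while iloc is handy when you also want to restrict the columns by position.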
Loop through the initial n rows using iterrows()
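iterrows() yields each row as an (index, Series) pair, so combining it with head() lets us loop over only the first n rows. It is convenient for small selections, although it is comparatively slow on large DataFrames: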
for index, row in df.head(4).iterrows():
    print(index)
    print(row)
    print()
The code displayed above will then generate:
0
A 1
B 2
C 3
Name: 0, dtype: int64
1
A 4
B 5
C 6
Name: 1, dtype: int64
2
A 7
B 8
C 9
Name: 2, dtype: int64
3
A 10
B 11
C 12
Name: 3, dtype: int64
The same output is produced when the rows are selected with iloc:
for index, row in df.iloc[:4, :].iterrows():
    print(index)
    print(row)
    print()
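Because iterrows() returns a generator, another option is to stop it early with itertools.islice from the standard library rather than slicing the DataFrame first. This is only a sketch under that assumption; row_sums is an illustrative name, not part of pandas.

```python
from itertools import islice

import numpy as np
import pandas as pd

# Same synthetic DataFrame as above
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=['A', 'B', 'C'])

n = 4
row_sums = {}
# islice stops consuming the iterrows() generator after n rows
for index, row in islice(df.iterrows(), n):
    # row is a Series, so values are accessed by column label
    row_sums[index] = int(row['A'] + row['B'] + row['C'])

print(row_sums)  # {0: 6, 1: 15, 2: 24, 3: 33}
```

Both approaches visit the same rows; islice simply avoids creating the intermediate sliced DataFrame.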
Loop through the initial n rows using itertuples()
Another way to iterate over the first n rows is to use the itertuples() method. It returns an iterator that yields one tuple per row and is generally faster than iterrows(). With index=False the index is left out of each tuple, and with name=None plain tuples are returned instead of named tuples. For example, to iterate over the first three rows of a DataFrame named df:
for row in df.iloc[:3, :].itertuples(index=False, name=None):
    print(row)
This prints the first three rows:
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
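itertuples() can also be called with its default arguments (index=True, name='Pandas'), in which case each row is a named tuple whose fields can be read by attribute. The sketch below assumes the same synthetic DataFrame; totals is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same synthetic DataFrame as above
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=['A', 'B', 'C'])

totals = []
# Default arguments yield named tuples with fields Index, A, B and C
for row in df.head(3).itertuples():
    # Cast to plain int so the printed values do not depend on the NumPy version
    totals.append((int(row.Index), int(row.A + row.B + row.C)))

print(totals)  # [(0, 6), (1, 15), (2, 24)]
```

Attribute access only works when the column names are valid Python identifiers; for other names, plain tuples with positional access are the safer choice.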