One of the most common tasks when working with data in Python is to iterate over the first n rows of a Pandas DataFrame. There are several ways to accomplish this goal, depending on what you need to do.
Synthetic data
To start, let's generate a DataFrame using synthetic data:
import pandas as pd
import numpy as np

data = np.arange(1, 31)
data = data.reshape(10, 3)
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)
The code displayed above will generate:
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
4  13  14  15
5  16  17  18
6  19  20  21
7  22  23  24
8  25  26  27
9  28  29  30
Select first n rows
Using head()
A first solution is to use the pandas head() method:
n = 4
df.head(n)
which returns the first 4 rows:
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
Using iloc()
Another solution is to use the iloc indexer, slicing the first n rows by position and keeping all columns:
df.iloc[:n, :]
The code displayed above will generate:
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
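As a quick check of the two approaches above, the sketch below rebuilds the same synthetic DataFrame and compares the results of head() and iloc. The variable names (first_head, first_iloc, same, too_many) are illustrative only. It also shows what happens when n exceeds the number of rows.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the article: 10 rows, 3 columns
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=["A", "B", "C"])
n = 4

first_head = df.head(n)
first_iloc = df.iloc[:n]  # the column slice ", :" is optional

# Both selections contain the same rows and values
same = first_head.equals(first_iloc)
print(same)

# Asking for more rows than exist simply returns every row
too_many = df.head(100)
print(len(too_many), len(df.iloc[:100]))
```

For a plain "first n rows" selection the two are interchangeable; iloc becomes the more flexible option once you also want to restrict columns by position.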
Loop through the initial n rows using iterrows()
Using iterrows(), we can loop over the first n rows. Each iteration yields the row's index label and the row itself as a Series:
for index, row in df.head(4).iterrows():
    print(index)
    print(row)
    print()
The code displayed above will then generate:
0
A    1
B    2
C    3
Name: 0, dtype: int64

1
A    4
B    5
C    6
Name: 1, dtype: int64

2
A    7
B    8
C    9
Name: 2, dtype: int64

3
A    10
B    11
C    12
Name: 3, dtype: int64

The same output is produced with:
for index, row in df.iloc[:4, :].iterrows():
    print(index)
    print(row)
    print()
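In practice you usually want individual values rather than the whole printed Series. The following is a minimal sketch, assuming the same synthetic DataFrame, that reads each column by label inside the loop. The row_sums list is a hypothetical accumulator introduced only for this example.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the article
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=["A", "B", "C"])
n = 4

row_sums = []  # hypothetical accumulator for this illustration
for index, row in df.head(n).iterrows():
    # row is a Series, so values are accessed by column label
    total = int(row["A"] + row["B"] + row["C"])
    row_sums.append((int(index), total))

print(row_sums)  # [(0, 6), (1, 15), (2, 24), (3, 33)]
```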
Loop through the initial n rows using itertuples()
Another way to iterate over the first n rows is to use the itertuples() method, which returns an iterator yielding one tuple per row. With index=False and name=None, each row comes back as a plain tuple of its values. For example, to iterate over the first three rows of a DataFrame named 'df', you would use this code:
for row in df.iloc[:3, :].itertuples(index=False, name=None):
    print(row)
which prints:
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
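When itertuples() is called with its default arguments, it yields named tuples instead, so fields can be read by attribute: the index is exposed as Index and each column under its own name. A minimal sketch, assuming the same synthetic DataFrame (the records list is illustrative only):

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the article
df = pd.DataFrame(np.arange(1, 31).reshape(10, 3), columns=["A", "B", "C"])

records = []  # illustrative container for the values read from each row
for row in df.iloc[:3].itertuples():
    # Default arguments: index included, fields accessible by name
    records.append((int(row.Index), int(row.A), int(row.B), int(row.C)))

print(records)  # [(0, 1, 2, 3), (1, 4, 5, 6), (2, 7, 8, 9)]
```

Attribute access only works for column names that are valid Python identifiers, which holds for A, B, and C here.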
