Examples of how to select one or more rows of a pandas DataFrame in Python:
Create a DataFrame
Let's consider the dataset train.csv (which can be downloaded from Kaggle). One way to read the file is to use read_csv():
>>> import pandas as pd
>>> df = pd.read_csv('train.csv')
>>> df.shape
(1460, 81)
Get a dataset preview:
>>> df.head(10)
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
0 1 60 RL 65.0 8450 Pave NaN Reg
1 2 20 RL 80.0 9600 Pave NaN Reg
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1
5 6 50 RL 85.0 14115 Pave NaN IR1
6 7 20 RL 75.0 10084 Pave NaN Reg
7 8 60 RL NaN 10382 Pave NaN IR1
8 9 50 RM 51.0 6120 Pave NaN Reg
9 10 190 RL 50.0 7420 Pave NaN Reg
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
0 Lvl AllPub ... 0 NaN NaN NaN 0
1 Lvl AllPub ... 0 NaN NaN NaN 0
2 Lvl AllPub ... 0 NaN NaN NaN 0
3 Lvl AllPub ... 0 NaN NaN NaN 0
4 Lvl AllPub ... 0 NaN NaN NaN 0
5 Lvl AllPub ... 0 NaN MnPrv Shed 700
6 Lvl AllPub ... 0 NaN NaN NaN 0
7 Lvl AllPub ... 0 NaN NaN Shed 350
8 Lvl AllPub ... 0 NaN NaN NaN 0
9 Lvl AllPub ... 0 NaN NaN NaN 0
MoSold YrSold SaleType SaleCondition SalePrice
0 2 2008 WD Normal 208500
1 5 2007 WD Normal 181500
2 9 2008 WD Normal 223500
3 2 2006 WD Abnorml 140000
4 12 2008 WD Normal 250000
5 10 2009 WD Normal 143000
6 8 2007 WD Normal 307000
7 11 2009 WD Normal 200000
8 4 2008 WD Abnorml 129900
9 1 2008 WD Normal 118000
[10 rows x 81 columns]
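Select a given row
To select a single row by its integer position (counting from 0), use iloc; the result is a Series indexed by the column names: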
>>> df.iloc[4,:]
Id 5
MSSubClass 60
MSZoning RL
LotFrontage 84
LotArea 14260
Street Pave
Alley NaN
LotShape IR1
LandContour Lvl
Utilities AllPub
LotConfig FR2
LandSlope Gtl
Neighborhood NoRidge
Condition1 Norm
Condition2 Norm
BldgType 1Fam
HouseStyle 2Story
OverallQual 8
OverallCond 5
YearBuilt 2000
YearRemodAdd 2000
RoofStyle Gable
RoofMatl CompShg
Exterior1st VinylSd
Exterior2nd VinylSd
MasVnrType BrkFace
MasVnrArea 350
ExterQual Gd
ExterCond TA
Foundation PConc
...
BedroomAbvGr 4
KitchenAbvGr 1
KitchenQual Gd
TotRmsAbvGrd 9
Functional Typ
Fireplaces 1
FireplaceQu TA
GarageType Attchd
GarageYrBlt 2000
GarageFinish RFn
GarageCars 3
GarageArea 836
GarageQual TA
GarageCond TA
PavedDrive Y
WoodDeckSF 192
OpenPorchSF 84
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
PoolQC NaN
Fence NaN
MiscFeature NaN
MiscVal 0
MoSold 12
YrSold 2008
SaleType WD
SaleCondition Normal
SalePrice 250000
Name: 4, dtype: object
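Note that iloc selects by position, not by index label. In train.csv the two coincide because the default index is 0, 1, 2, and so on. The following minimal sketch uses a small made-up DataFrame (df_small and its index labels are assumptions for illustration, not part of train.csv) to show the difference between iloc and loc:

```python
import pandas as pd

# Small illustrative frame; the index labels are deliberately
# different from the row positions.
df_small = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4, 5],
        "SalePrice": [208500, 181500, 223500, 140000, 250000],
    },
    index=[10, 11, 12, 13, 14],
)

row_by_position = df_small.iloc[4]  # fifth row, selected by integer position
row_by_label = df_small.loc[14]     # same row, selected by its index label

print(row_by_position["SalePrice"])           # 250000
print(row_by_position.equals(row_by_label))   # True
```

Writing df.iloc[4] is equivalent to df.iloc[4,:]; the trailing colon only makes explicit that all columns are kept.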
Select a list of rows
Pass a list of integer positions to iloc to get those rows back as a DataFrame:
>>> df.iloc[[3,5,7],:]
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
3 4 70 RL 60.0 9550 Pave NaN IR1
5 6 50 RL 85.0 14115 Pave NaN IR1
7 8 60 RL NaN 10382 Pave NaN IR1
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
3 Lvl AllPub ... 0 NaN NaN NaN 0
5 Lvl AllPub ... 0 NaN MnPrv Shed 700
7 Lvl AllPub ... 0 NaN NaN Shed 350
MoSold YrSold SaleType SaleCondition SalePrice
3 2 2006 WD Abnorml 140000
5 10 2009 WD Normal 143000
7 11 2009 WD Normal 200000
[3 rows x 81 columns]
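Two details of list-based selection are worth knowing: the rows come back in the order given in the list, and a list with a single position still returns a DataFrame rather than a Series. Here is a minimal sketch on a small made-up DataFrame (df_small is an assumption for illustration, not train.csv):

```python
import pandas as pd

# Small illustrative frame with the default 0..7 index.
df_small = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4, 5, 6, 7, 8],
        "SalePrice": [208500, 181500, 223500, 140000,
                      250000, 143000, 307000, 200000],
    }
)

subset = df_small.iloc[[7, 3, 5]]   # rows are returned in the order listed
one_row_frame = df_small.iloc[[4]]  # list with one position -> DataFrame
one_row_series = df_small.iloc[4]   # bare integer -> Series

print(subset["Id"].tolist())            # [8, 4, 6]
print(type(one_row_frame).__name__)     # DataFrame
print(type(one_row_series).__name__)    # Series
```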
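Select multiple consecutive rows
Pass a slice to iloc; as with ordinary Python slicing, the end position is excluded, so 2:5 returns the rows at positions 2, 3 and 4: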
>>> df.iloc[2:5,:]
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
2 Lvl AllPub ... 0 NaN NaN NaN 0
3 Lvl AllPub ... 0 NaN NaN NaN 0
4 Lvl AllPub ... 0 NaN NaN NaN 0
MoSold YrSold SaleType SaleCondition SalePrice
2 9 2008 WD Normal 223500
3 2 2006 WD Abnorml 140000
4 12 2008 WD Normal 250000
[3 rows x 81 columns]
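A common source of off-by-one errors is that slices behave differently with iloc and loc: iloc excludes the end position, while loc includes the end label. The sketch below illustrates this on a small made-up DataFrame (df_small is an assumption for illustration, not train.csv):

```python
import pandas as pd

# Small illustrative frame with the default 0..5 index.
df_small = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6]})

by_position = df_small.iloc[2:5]  # positions 2, 3, 4 (end excluded)
by_label = df_small.loc[2:5]      # labels 2, 3, 4, 5 (end included)

print(by_position.index.tolist())  # [2, 3, 4]
print(by_label.index.tolist())     # [2, 3, 4, 5]
```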