Examples of how to select one or more rows of a pandas DataFrame in Python:
Create a DataFrame
Let's consider the dataset train.csv (which can be downloaded from Kaggle). One way to read the file is to use read_csv():
>>> import pandas as pd
>>> df = pd.read_csv('train.csv')
>>> df.shape
(1460, 81)
Get a dataset preview:
>>> df.head(10)
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg
1   2          20       RL         80.0     9600   Pave   NaN      Reg
2   3          60       RL         68.0    11250   Pave   NaN      IR1
3   4          70       RL         60.0     9550   Pave   NaN      IR1
4   5          60       RL         84.0    14260   Pave   NaN      IR1
5   6          50       RL         85.0    14115   Pave   NaN      IR1
6   7          20       RL         75.0    10084   Pave   NaN      Reg
7   8          60       RL          NaN    10382   Pave   NaN      IR1
8   9          50       RM         51.0     6120   Pave   NaN      Reg
9  10         190       RL         50.0     7420   Pave   NaN      Reg

  LandContour Utilities  ...  PoolArea PoolQC  Fence MiscFeature MiscVal  \
0         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
1         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
2         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
3         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
4         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
5         Lvl    AllPub  ...         0    NaN  MnPrv        Shed     700
6         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
7         Lvl    AllPub  ...         0    NaN    NaN        Shed     350
8         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
9         Lvl    AllPub  ...         0    NaN    NaN         NaN       0

   MoSold  YrSold SaleType SaleCondition  SalePrice
0       2    2008       WD        Normal     208500
1       5    2007       WD        Normal     181500
2       9    2008       WD        Normal     223500
3       2    2006       WD       Abnorml     140000
4      12    2008       WD        Normal     250000
5      10    2009       WD        Normal     143000
6       8    2007       WD        Normal     307000
7      11    2009       WD        Normal     200000
8       4    2008       WD       Abnorml     129900
9       1    2008       WD        Normal     118000

[10 rows x 81 columns]
Select a given row
>>> df.iloc[4,:]
Id                     5
MSSubClass            60
MSZoning              RL
LotFrontage           84
LotArea            14260
Street              Pave
Alley                NaN
LotShape             IR1
LandContour          Lvl
Utilities         AllPub
LotConfig            FR2
LandSlope            Gtl
Neighborhood     NoRidge
Condition1          Norm
Condition2          Norm
BldgType            1Fam
HouseStyle        2Story
OverallQual            8
OverallCond            5
YearBuilt           2000
YearRemodAdd        2000
RoofStyle          Gable
RoofMatl         CompShg
Exterior1st      VinylSd
Exterior2nd      VinylSd
MasVnrType       BrkFace
MasVnrArea           350
ExterQual             Gd
ExterCond             TA
Foundation         PConc
                  ...
BedroomAbvGr           4
KitchenAbvGr           1
KitchenQual           Gd
TotRmsAbvGrd           9
Functional           Typ
Fireplaces             1
FireplaceQu           TA
GarageType        Attchd
GarageYrBlt         2000
GarageFinish         RFn
GarageCars             3
GarageArea           836
GarageQual            TA
GarageCond            TA
PavedDrive             Y
WoodDeckSF           192
OpenPorchSF           84
EnclosedPorch          0
3SsnPorch              0
ScreenPorch            0
PoolArea               0
PoolQC               NaN
Fence                NaN
MiscFeature          NaN
MiscVal                0
MoSold                12
YrSold              2008
SaleType              WD
SaleCondition     Normal
SalePrice         250000
Name: 4, dtype: object
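Selecting a single position with iloc returns a Series, which is why the output is printed as one column of labels and values. Passing a one-element list instead keeps the result as a one-row DataFrame. A minimal sketch, assuming a small stand-in frame named `toy` (a hypothetical name) that holds three columns of the first five rows from the preview rather than the full train.csv:

```python
import pandas as pd

# Stand-in for train.csv: three columns of the first five preview rows.
# The name `toy` is an assumption made for this illustration.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "MSZoning": ["RL", "RL", "RL", "RL", "RL"],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
})

row_as_series = toy.iloc[4]    # equivalent to toy.iloc[4, :]; returns a Series
row_as_frame = toy.iloc[[4]]   # list of positions; returns a one-row DataFrame

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__, row_as_frame.shape)
```

The Series form is convenient for reading individual values (for example `row_as_series["LotArea"]`), while the DataFrame form preserves each column's dtype.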
Select a list of rows
>>> df.iloc[[3,5,7],:]
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
3   4          70       RL         60.0     9550   Pave   NaN      IR1
5   6          50       RL         85.0    14115   Pave   NaN      IR1
7   8          60       RL          NaN    10382   Pave   NaN      IR1

  LandContour Utilities  ...  PoolArea PoolQC  Fence MiscFeature MiscVal  \
3         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
5         Lvl    AllPub  ...         0    NaN  MnPrv        Shed     700
7         Lvl    AllPub  ...         0    NaN    NaN        Shed     350

   MoSold  YrSold SaleType SaleCondition  SalePrice
3       2    2006       WD       Abnorml     140000
5      10    2009       WD        Normal     143000
7      11    2009       WD        Normal     200000

[3 rows x 81 columns]
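Note that iloc selects by integer position, not by index label. With the default index the two coincide, but they differ as soon as another column is used as the index. A minimal sketch, assuming a hypothetical stand-in frame `toy` built from the Id and LotArea values of the first eight preview rows:

```python
import pandas as pd

# Stand-in for train.csv: Id and LotArea of the first eight preview rows.
# The names `toy` and `indexed` are assumptions made for this illustration.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382],
})

# By position: the rows at positions 3, 5 and 7 (Id 4, 6 and 8).
by_position = toy.iloc[[3, 5, 7]]

# By label: after making Id the index, loc looks up the labels 3, 5 and 7.
indexed = toy.set_index("Id")
by_label = indexed.loc[[3, 5, 7]]

print(by_position)
print(by_label)
```

Here `by_position` and `by_label` contain different rows, because position 3 holds the row with Id 4 while label 3 refers to the row with Id 3.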
Select multiple consecutive rows
>>> df.iloc[2:5,:]
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
2   3          60       RL         68.0    11250   Pave   NaN      IR1
3   4          70       RL         60.0     9550   Pave   NaN      IR1
4   5          60       RL         84.0    14260   Pave   NaN      IR1

  LandContour Utilities  ...  PoolArea PoolQC  Fence MiscFeature MiscVal  \
2         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
3         Lvl    AllPub  ...         0    NaN    NaN         NaN       0
4         Lvl    AllPub  ...         0    NaN    NaN         NaN       0

   MoSold  YrSold SaleType SaleCondition  SalePrice
2       9    2008       WD        Normal     223500
3       2    2006       WD       Abnorml     140000
4      12    2008       WD        Normal     250000

[3 rows x 81 columns]
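The slice 2:5 returns three rows (positions 2, 3 and 4) because iloc slices exclude the end position, as Python lists do. The label-based loc includes both endpoints, so the same slice gives one more row. A minimal sketch, assuming a hypothetical stand-in frame `toy` with the Id and LotArea values of the first eight preview rows and the default integer index:

```python
import pandas as pd

# Stand-in for train.csv with the default 0..7 integer index.
# The name `toy` is an assumption made for this illustration.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382],
})

positional = toy.iloc[2:5]   # positions 2, 3, 4: end is excluded
labelled = toy.loc[2:5]      # labels 2, 3, 4, 5: end is included

print(len(positional), len(labelled))
```

When the goal is "the first n rows starting at position k", `iloc[k:k + n]` is the form that behaves like ordinary Python slicing.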
