To get the column names of a pandas dataframe, use the columns attribute (ref):
>>> DataFrame.columns
Examples of applications:
Read a csv data file and create a dataframe with pandas
Let's consider the csv data file train.csv (which can be downloaded from kaggle)
>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
>>> data.head()
   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg
1   2          20       RL         80.0     9600   Pave   NaN      Reg
2   3          60       RL         68.0    11250   Pave   NaN      IR1
3   4          70       RL         60.0     9550   Pave   NaN      IR1
4   5          60       RL         84.0    14260   Pave   NaN      IR1

  LandContour Utilities  ...  PoolArea PoolQC Fence MiscFeature MiscVal  \
0         Lvl    AllPub  ...         0    NaN   NaN         NaN       0
1         Lvl    AllPub  ...         0    NaN   NaN         NaN       0
2         Lvl    AllPub  ...         0    NaN   NaN         NaN       0
3         Lvl    AllPub  ...         0    NaN   NaN         NaN       0
4         Lvl    AllPub  ...         0    NaN   NaN         NaN       0

  MoSold YrSold SaleType SaleCondition SalePrice
0      2   2008       WD        Normal    208500
1      5   2007       WD        Normal    181500
2      9   2008       WD        Normal    223500
3      2   2006       WD       Abnorml    140000
4     12   2008       WD        Normal    250000

[5 rows x 81 columns]
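If train.csv is not at hand, the same steps can be tried on a small in-memory CSV. The sketch below is only an illustration: `csv_text` and `sample` are hypothetical names, and the three rows are a hand-copied excerpt of four columns, not the real 81-column file.

```python
import io

import pandas as pd

# Hypothetical excerpt shaped like train.csv (4 of the 81 columns, 3 rows).
csv_text = """Id,MSSubClass,MSZoning,SalePrice
1,60,RL,208500
2,20,RL,181500
3,60,RL,223500
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.head(2))
```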
Get dataframe column names
To get the dataframe column names, use the columns attribute:
>>> data.columns
Example:
>>> columns = data.columns
>>> columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')
>>> type(columns)
<class 'pandas.indexes.base.Index'>
>>> columns[5]
'Street'
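Since columns is an Index object and not a plain list, a few common follow-up operations may be useful. The sketch below uses a small hypothetical dataframe named `sample` as a stand-in for `data`; the calls themselves (`tolist`, `len`, `in`, `get_loc`) are standard pandas Index operations.

```python
import pandas as pd

# Hypothetical stand-in for the `data` dataframe loaded from train.csv.
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "Street": ["Pave", "Pave", "Pave"],
    "SalePrice": [208500, 181500, 223500],
})

names = sample.columns.tolist()               # plain Python list of names
n_columns = len(sample.columns)               # number of columns
has_price = "SalePrice" in sample.columns     # membership test by name
position = sample.columns.get_loc("Street")   # integer position of a name

print(names)
print(n_columns, has_price, position)
```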
Select one or several columns
Example of how to select one column
>>> data['SalePrice']
0       208500
1       181500
2       223500
3       140000
4       250000
5       143000
6       307000
7       200000
8       129900
9       118000
10      129500
11      345000
12      144000
13      279500
14      157000
15      132000
16      149000
17       90000
18      159000
19      139000
20      325300
21      139400
22      230000
23      129900
24      154000
25      256300
26      134800
27      306000
28      207500
29       68500
         ...
1430    192140
1431    143750
1432     64500
1433    186500
1434    160000
1435    174000
1436    120500
1437    394617
1438    149700
1439    197000
1440    191000
1441    149300
1442    310000
1443    121000
1444    179600
1445    129000
1446    157900
1447    240000
1448    112000
1449     92000
1450    136000
1451    287090
1452    145000
1453     84500
1454    185000
1455    175000
1456    210000
1457    266500
1458    142125
1459    147500
Name: SalePrice, dtype: int64
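Selecting with a single name returns a Series, as the `Name: SalePrice, dtype: int64` footer suggests. Passing a one-element list instead returns a one-column dataframe. A minimal sketch of the difference, using a hypothetical three-row dataframe `sample` in place of `data`:

```python
import pandas as pd

# Hypothetical stand-in for the `data` dataframe.
sample = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500],
    "BldgType": ["1Fam", "1Fam", "1Fam"],
})

as_series = sample["SalePrice"]    # single name -> pandas Series
as_frame = sample[["SalePrice"]]   # list with one name -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```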
Example of how to select two columns
>>> data[['SalePrice','BldgType']]
      SalePrice BldgType
0        208500     1Fam
1        181500     1Fam
2        223500     1Fam
3        140000     1Fam
4        250000     1Fam
5        143000     1Fam
6        307000     1Fam
7        200000     1Fam
8        129900     1Fam
9        118000   2fmCon
10       129500     1Fam
11       345000     1Fam
12       144000     1Fam
13       279500     1Fam
14       157000     1Fam
15       132000     1Fam
16       149000     1Fam
17        90000   Duplex
18       159000     1Fam
19       139000     1Fam
20       325300     1Fam
21       139400     1Fam
22       230000     1Fam
23       129900   TwnhsE
24       154000     1Fam
25       256300     1Fam
26       134800     1Fam
27       306000     1Fam
28       207500     1Fam
29        68500     1Fam
...         ...      ...
1430     192140     1Fam
1431     143750   TwnhsE
1432      64500     1Fam
1433     186500     1Fam
1434     160000     1Fam
1435     174000     1Fam
1436     120500     1Fam
1437     394617     1Fam
1438     149700     1Fam
1439     197000     1Fam
1440     191000     1Fam
1441     149300   TwnhsE
1442     310000     1Fam
1443     121000     1Fam
1444     179600     1Fam
1445     129000     1Fam
1446     157900     1Fam
1447     240000     1Fam
1448     112000     1Fam
1449      92000    Twnhs
1450     136000   Duplex
1451     287090     1Fam
1452     145000   TwnhsE
1453      84500     1Fam
1454     185000     1Fam
1455     175000     1Fam
1456     210000     1Fam
1457     266500     1Fam
1458     142125     1Fam
1459     147500     1Fam

[1460 rows x 2 columns]
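The list passed between the brackets does not have to be typed by hand: it can be built from the columns attribute itself. The sketch below is one possible way to do that, again on a hypothetical dataframe `sample`; the names `wanted`, `subset` and `subset_loc` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the `data` dataframe.
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "BldgType": ["1Fam", "1Fam", "1Fam"],
    "SalePrice": [208500, 181500, 223500],
})

# Build the list of names from the columns attribute, then select with it.
wanted = [name for name in sample.columns if name != "Id"]
subset = sample[wanted]

# Label-based selection with .loc: all rows, columns in the order listed.
subset_loc = sample.loc[:, ["SalePrice", "BldgType"]]

print(subset.columns.tolist())
print(subset_loc.columns.tolist())
```

In both cases the result keeps the column order of the list used for the selection, not the order of the original dataframe.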
References
| Links | Site |
|---|---|
| columns | pandas |
| How to get column names in Pandas dataframe | geeksforgeeks |
| House Prices: Advanced Regression Techniques | kaggle |
