To get the column names of a pandas DataFrame, use the columns attribute (ref):
>>> DataFrame.columns
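The sketch below is self-contained. The small DataFrame in it is made up for illustration and is not taken from train.csv. It shows that columns returns a pandas Index, which can be turned into a plain Python list:

```python
import pandas as pd

# Hypothetical toy DataFrame used only for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0], "c": ["x", "y"]})

# The columns attribute holds the column labels as a pandas Index
cols = df.columns
print(cols)

# Convert the Index to a plain Python list of names
names = list(cols)          # equivalently: cols.tolist()
print(names)                # ['a', 'b', 'c']
```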
Examples of applications:
Read a csv data file and create a DataFrame with pandas
Let's consider the CSV data file train.csv (which can be downloaded from Kaggle):
>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
>>> data.head()
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \
0 1 60 RL 65.0 8450 Pave NaN Reg
1 2 20 RL 80.0 9600 Pave NaN Reg
2 3 60 RL 68.0 11250 Pave NaN IR1
3 4 70 RL 60.0 9550 Pave NaN IR1
4 5 60 RL 84.0 14260 Pave NaN IR1
LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \
0 Lvl AllPub ... 0 NaN NaN NaN 0
1 Lvl AllPub ... 0 NaN NaN NaN 0
2 Lvl AllPub ... 0 NaN NaN NaN 0
3 Lvl AllPub ... 0 NaN NaN NaN 0
4 Lvl AllPub ... 0 NaN NaN NaN 0
MoSold YrSold SaleType SaleCondition SalePrice
0 2 2008 WD Normal 208500
1 5 2007 WD Normal 181500
2 9 2008 WD Normal 223500
3 2 2006 WD Abnorml 140000
4 12 2008 WD Normal 250000
[5 rows x 81 columns]
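If train.csv is not at hand, you can try the same steps on a small inline CSV. The sketch below uses a five-row excerpt that keeps only 5 of the 81 columns. The values are copied from the head() output above. It reads the text through io.StringIO, because pd.read_csv accepts a file-like object as well as a file path:

```python
import io
import pandas as pd

# Five-row excerpt with a subset of the train.csv columns (values from data.head())
csv_text = """Id,MSSubClass,MSZoning,LotArea,SalePrice
1,60,RL,8450,208500
2,20,RL,9600,181500
3,60,RL,11250,223500
4,70,RL,9550,140000
5,60,RL,14260,250000
"""

# read_csv accepts any file-like object, not only a path
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)    # (5, 5)
print(data.head())
```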
Get the DataFrame column names
The column names are stored in the columns attribute:
>>> data.columns
Example:
>>> columns = data.columns
>>> columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
'SaleCondition', 'SalePrice'],
dtype='object')
>>> type(columns)
<class 'pandas.indexes.base.Index'>
>>> columns[5]
'Street'
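Because columns is an Index, it supports more than positional access. The short sketch below works on a hypothetical empty DataFrame that carries only a handful of the train.csv column names, not the full file:

```python
import pandas as pd

# Hypothetical DataFrame with a few of the train.csv column names (no rows needed)
data = pd.DataFrame(columns=["Id", "MSSubClass", "MSZoning", "LotFrontage",
                             "LotArea", "Street", "SalePrice"])
columns = data.columns

print(columns[5])                  # Street (positional access, as above)
print(columns.get_loc("Street"))   # 5 (position of a given name)
print("SalePrice" in columns)      # True (membership test on the labels)
print(len(columns))                # 7 (number of columns)
print(columns[-3:].tolist())       # ['LotArea', 'Street', 'SalePrice']
```

Slicing an Index returns another Index, so tolist() is used here to get plain names.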
Select one or several columns
Example of how to select one column
>>> data['SalePrice']
0 208500
1 181500
2 223500
3 140000
4 250000
5 143000
6 307000
7 200000
8 129900
9 118000
10 129500
11 345000
12 144000
13 279500
14 157000
15 132000
16 149000
17 90000
18 159000
19 139000
20 325300
21 139400
22 230000
23 129900
24 154000
25 256300
26 134800
27 306000
28 207500
29 68500
...
1430 192140
1431 143750
1432 64500
1433 186500
1434 160000
1435 174000
1436 120500
1437 394617
1438 149700
1439 197000
1440 191000
1441 149300
1442 310000
1443 121000
1444 179600
1445 129000
1446 157900
1447 240000
1448 112000
1449 92000
1450 136000
1451 287090
1452 145000
1453 84500
1454 185000
1455 175000
1456 210000
1457 266500
1458 142125
1459 147500
Name: SalePrice, dtype: int64
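With single brackets and one label, the result is a Series, not a DataFrame. The sketch below uses only the first five SalePrice and BldgType values shown in this page's outputs, as a small excerpt rather than the full file:

```python
import pandas as pd

# Excerpt: first five rows of two train.csv columns
data = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
    "BldgType": ["1Fam"] * 5,
})

prices = data["SalePrice"]        # one label -> Series
print(type(prices).__name__)      # Series
print(prices.name)                # SalePrice
print(prices.max())               # 250000

# Attribute access also works when the name is a valid Python identifier
same = data.SalePrice
print(same.equals(prices))        # True
```

Attribute access is only a shortcut. It fails for names such as '1stFlrSF', which is not a valid identifier, and for names that clash with DataFrame methods. Bracket access works in every case.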
Example of how to select two columns
>>> data[['SalePrice','BldgType']]
SalePrice BldgType
0 208500 1Fam
1 181500 1Fam
2 223500 1Fam
3 140000 1Fam
4 250000 1Fam
5 143000 1Fam
6 307000 1Fam
7 200000 1Fam
8 129900 1Fam
9 118000 2fmCon
10 129500 1Fam
11 345000 1Fam
12 144000 1Fam
13 279500 1Fam
14 157000 1Fam
15 132000 1Fam
16 149000 1Fam
17 90000 Duplex
18 159000 1Fam
19 139000 1Fam
20 325300 1Fam
21 139400 1Fam
22 230000 1Fam
23 129900 TwnhsE
24 154000 1Fam
25 256300 1Fam
26 134800 1Fam
27 306000 1Fam
28 207500 1Fam
29 68500 1Fam
... ... ...
1430 192140 1Fam
1431 143750 TwnhsE
1432 64500 1Fam
1433 186500 1Fam
1434 160000 1Fam
1435 174000 1Fam
1436 120500 1Fam
1437 394617 1Fam
1438 149700 1Fam
1439 197000 1Fam
1440 191000 1Fam
1441 149300 TwnhsE
1442 310000 1Fam
1443 121000 1Fam
1444 179600 1Fam
1445 129000 1Fam
1446 157900 1Fam
1447 240000 1Fam
1448 112000 1Fam
1449 92000 Twnhs
1450 136000 Duplex
1451 287090 1Fam
1452 145000 TwnhsE
1453 84500 1Fam
1454 185000 1Fam
1455 175000 1Fam
1456 210000 1Fam
1457 266500 1Fam
1458 142125 1Fam
1459 147500 1Fam
[1460 rows x 2 columns]
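When you pass a list of labels (double brackets), the result is a DataFrame whose columns come in the order you gave. Many train.csv columns share a prefix, such as the Garage* columns. For those, DataFrame.filter(like=...) or a boolean mask built from columns.str can select every matching column at once. The sketch below uses a hypothetical empty frame that carries a few of those names:

```python
import pandas as pd

# Hypothetical empty frame carrying a few of the train.csv column names
data = pd.DataFrame(columns=["SalePrice", "BldgType", "GarageType",
                             "GarageCars", "GarageArea", "YrSold"])

# A list of labels (double brackets) returns a DataFrame, in the order given
subset = data[["SalePrice", "BldgType"]]
print(type(subset).__name__)      # DataFrame
print(subset.columns.tolist())    # ['SalePrice', 'BldgType']

# Select every column whose name contains "Garage"
garage = data.filter(like="Garage")
print(garage.columns.tolist())    # ['GarageType', 'GarageCars', 'GarageArea']

# Same selection with a boolean mask built from the column Index
mask = data.columns.str.startswith("Garage")
print(data.loc[:, mask].columns.tolist())
```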
References
| Links | Site |
|---|---|
| DataFrame.columns | pandas |
| How to get column names in Pandas dataframe | GeeksforGeeks |
| House Prices: Advanced Regression Techniques | Kaggle |