Example of how to read a csv file using pandas in python:
Read a csv file
Consider the csv file train.csv, which can be downloaded from Kaggle. One way to read it is to use the pandas function read_csv(), which returns a DataFrame:
>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
Get the DataFrame dimensions (number of rows, number of columns):
>>> data.shape
(1460, 81)
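Because train.csv has to be downloaded first, the following sketch reproduces the same steps on a small in-memory CSV. The column names and values are hypothetical sample data, not the real Kaggle file. read_csv() accepts any file-like object, so io.StringIO stands in for a path here.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for train.csv (not the real Kaggle file).
csv_text = (
    "Id,MSSubClass,MSZoning,SalePrice\n"
    "1,60,RL,200000\n"
    "2,20,RL,180000\n"
    "3,60,RL,220000\n"
)

# read_csv accepts any file-like object, so io.StringIO can replace a file path.
data = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple; the header line is not counted as a row.
print(data.shape)  # (3, 4)
```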
Get the column names:
>>> data.columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
'SaleCondition', 'SalePrice'],
dtype='object')
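data.columns returns a pandas Index. A minimal sketch, again on hypothetical in-memory data, of converting it to a plain Python list and of previewing the first rows with DataFrame.head():

```python
import io

import pandas as pd

# Small hypothetical CSV used in place of train.csv.
csv_text = (
    "Id,LotArea,Street,SalePrice\n"
    "1,8450,Pave,200000\n"
    "2,9600,Pave,180000\n"
    "3,11250,Grvl,220000\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# data.columns is a pandas Index; tolist() converts it to a plain list of names.
column_names = data.columns.tolist()
print(column_names)  # ['Id', 'LotArea', 'Street', 'SalePrice']

# head(n) returns the first n rows (5 by default) as a new DataFrame.
first_rows = data.head(2)
print(first_rows)
```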
Select csv file columns
Option 1:
A first solution is to pass the usecols option to read_csv() so that only some columns are read (here selected by position, the first column having index 0):
>>> data = pd.read_csv('train.csv',usecols=[1,2,3])
>>> data.shape
(1460, 3)
>>> data.columns
Index(['MSSubClass', 'MSZoning', 'LotFrontage'], dtype='object')
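usecols is not limited to positions: it also accepts column labels or a callable that is evaluated against each column name. The sketch below uses a hypothetical in-memory CSV. Note that with a list, the resulting columns follow the order in the file, not the order of the list.

```python
import io

import pandas as pd

# Hypothetical sample data with a few train.csv-like column names.
csv_text = (
    "Id,BldgType,GarageCars,GarageArea,SalePrice\n"
    "1,1Fam,2,548,200000\n"
    "2,1Fam,2,460,180000\n"
)

# usecols with column labels: the file's column order is kept,
# so 'BldgType' comes before 'SalePrice' even though it is listed second.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["SalePrice", "BldgType"])
print(by_name.columns.tolist())  # ['BldgType', 'SalePrice']

# usecols with a callable: a column is kept when the function returns True for its name.
garage = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name.startswith("Garage"))
print(garage.columns.tolist())  # ['GarageCars', 'GarageArea']
```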
Option 2:
Another solution is to select the columns after reading the csv file:
>>> data = pd.read_csv('train.csv')
>>> data_sample = data[['SalePrice','BldgType']]
>>> data_sample.shape
(1460, 2)
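When selecting after reading, the bracket form matters: a list of labels returns a DataFrame with the columns in the requested order, while a single label returns a one-dimensional Series. A short sketch on hypothetical data:

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = "Id,BldgType,SalePrice\n1,1Fam,200000\n2,Duplex,180000\n"
data = pd.read_csv(io.StringIO(csv_text))

# A list of labels (double brackets) returns a DataFrame, in the order requested.
data_sample = data[["SalePrice", "BldgType"]]
print(type(data_sample).__name__, data_sample.shape)  # DataFrame (2, 2)

# A single label returns a one-dimensional Series.
prices = data["SalePrice"]
print(type(prices).__name__, prices.shape)  # Series (2,)
```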
Skip csv file rows
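It is possible to skip rows using the skiprows option. When skiprows is an integer, it counts lines from the top of the file, header line included, so the first line that is not skipped is used as the header. Examples:
Skip the first row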
>>> data = pd.read_csv('train.csv',skiprows=1)
>>> data.shape
(1459, 81)
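The sketch below, on a hypothetical three-row CSV, shows this side effect of an integer skiprows and how a list of line numbers (0 being the header line) skips a data row while keeping the header:

```python
import io

import pandas as pd

# Hypothetical sample data: one header line and three data rows.
csv_text = "Id,SalePrice\n1,200000\n2,180000\n3,220000\n"

# skiprows=1 drops the first line of the file, i.e. the header,
# so the first data row ends up being used as column names.
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(no_header.columns.tolist())  # ['1', '200000']
print(no_header.shape)  # (2, 2)

# A list of line numbers skips only those lines:
# skiprows=[1] drops the first data row and keeps the header (line 0).
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(kept_header.columns.tolist())  # ['Id', 'SalePrice']
print(kept_header.shape)  # (2, 2)
```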
Skip the first five rows
>>> data = pd.read_csv('train.csv',skiprows=5)
>>> data.shape
(1455, 81)
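To skip the first five data rows while keeping the header, a range of line numbers starting at 1 can be passed. skiprows also accepts a callable that receives each line index and returns True for lines to skip. A sketch on a hypothetical ten-row CSV:

```python
import io

import pandas as pd

# Hypothetical file: a header line followed by ten data rows (Id 1 to 10).
lines = ["Id,Value"] + [f"{i},{i * 10}" for i in range(1, 11)]
csv_text = "\n".join(lines) + "\n"

# range(1, 6) skips file lines 1 to 5 (the first five data rows) but not line 0, the header.
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 6))
print(skipped["Id"].tolist())  # [6, 7, 8, 9, 10]

# The callable returns True for the lines to skip:
# here every second data row is dropped while the header (index 0) is kept.
every_other = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i > 0 and i % 2 == 0)
print(every_other["Id"].tolist())  # [1, 3, 5, 7, 9]
```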
Remove footer
It is possible to drop rows at the end of the file using the skipfooter option. The default C engine does not support skipfooter, which is why engine='python' is passed:
>>> data = pd.read_csv('train.csv',skipfooter=10, engine='python')
>>> data.shape
(1450, 81)
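An alternative that keeps the default C engine is to read the whole file and slice off the last rows with iloc. A sketch comparing both approaches on a hypothetical ten-row CSV:

```python
import io

import pandas as pd

# Hypothetical file: a header line followed by ten data rows (Id 1 to 10).
lines = ["Id,Value"] + [f"{i},{i * 10}" for i in range(1, 11)]
csv_text = "\n".join(lines)

# skipfooter drops the last 3 lines; the Python engine is requested explicitly.
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=3, engine="python")
print(trimmed["Id"].tolist())  # [1, 2, 3, 4, 5, 6, 7]

# Reading everything, then dropping the last 3 rows, gives the same values.
full = pd.read_csv(io.StringIO(csv_text))
sliced = full.iloc[:-3]
print(sliced.values.tolist() == trimmed.values.tolist())  # True
```

Slicing after reading avoids the slower Python engine, at the cost of parsing the footer rows that are then discarded.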