How to read a CSV file using pandas in Python?

Published: October 20, 2019

Example of how to read a csv file using pandas in python:

Read a csv file

Consider the CSV file train.csv (which can be downloaded from Kaggle). To read it, one solution is to use the pandas function read_csv(), which returns a DataFrame:

>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
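
To try the calls in this article without downloading train.csv, a small CSV can be read from memory instead. The following is a minimal sketch, assuming a made-up three-row sample (the values are hypothetical; only the column names are borrowed from the dataset). It relies on read_csv() accepting a file-like object such as io.StringIO in place of a file name:

```python
import io

import pandas as pd

# Hypothetical sample data, not the real Kaggle file: 3 rows, 4 columns.
csv_text = """Id,MSSubClass,MSZoning,SalePrice
1,10,AA,100000
2,20,BB,150000
3,30,AA,200000
"""

# read_csv() takes a path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # names taken from the first line of the file
```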

Get the DataFrame dimensions (number of rows, number of columns):

>>> data.shape
(1460, 81)

Get the DataFrame column names:

>>> data.columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')

Select csv file columns

Option 1:

A first solution is to use the option usecols to read only some columns, selected here by their 0-based positions:

>>> data = pd.read_csv('train.csv',usecols=[1,2,3])
>>> data.shape
(1460, 3)
>>> data.columns
Index(['MSSubClass', 'MSZoning', 'LotFrontage'], dtype='object')
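
The option usecols also accepts column names, which can be easier to read than positions. A small sketch, assuming a hypothetical in-memory sample rather than the real train.csv. Note that the order of the names passed to usecols is ignored: the columns come back in the order they appear in the file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for train.csv.
csv_text = """Id,MSSubClass,MSZoning,SalePrice
1,10,AA,100000
2,20,BB,150000
3,30,AA,200000
"""

# Select columns by name; only these two columns are parsed.
data = pd.read_csv(io.StringIO(csv_text), usecols=["SalePrice", "MSZoning"])

# The result follows the file order, not the order given to usecols.
print(list(data.columns))
```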

Option 2:

Another solution is to select the columns after reading the csv file:

>>> data = pd.read_csv('train.csv')
>>> data_sample = data[['SalePrice','BldgType']]
>>> data_sample.shape
(1460, 2)
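
The two options can be compared on a small example. This is only a sketch on a hypothetical in-memory sample: selecting after reading returns the columns in the requested order and keeps the full DataFrame available, while usecols avoids parsing the unused columns in the first place.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for train.csv.
csv_text = """Id,MSSubClass,MSZoning,SalePrice
1,10,AA,100000
2,20,BB,150000
3,30,AA,200000
"""

# Option 2: read everything, then select (order as requested).
full = pd.read_csv(io.StringIO(csv_text))
sample = full[["SalePrice", "MSZoning"]]

# Option 1: read only the needed columns (order as in the file).
only = pd.read_csv(io.StringIO(csv_text), usecols=["MSZoning", "SalePrice"])

print(list(sample.columns))
# Same data once the columns are put in the same order.
print(sample[["MSZoning", "SalePrice"]].equals(only))
```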

Skip csv file rows

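It is possible to skip some rows using the option skiprows. An integer n skips the first n lines of the file, counting the header line, so the first remaining line is then used for the column names. Examples: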

Skip the first line

>>> data = pd.read_csv('train.csv',skiprows=1)
>>> data.shape
(1459, 81)

Skip the first five lines

>>> data = pd.read_csv('train.csv',skiprows=5)
>>> data.shape
(1455, 81)
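
Since an integer skiprows counts lines from the top of the file, it also removes the header line. To keep the header and skip only data rows, skiprows can be given a list-like of 0-based line numbers. A minimal sketch on a hypothetical in-memory sample:

```python
import io

import pandas as pd

# Hypothetical sample data: one header line and four data rows.
csv_text = """Id,MSZoning,SalePrice
1,AA,100000
2,BB,150000
3,AA,200000
4,BB,250000
"""

# Integer: drops the header line, so the first data row becomes the header.
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(list(no_header.columns))

# List of line numbers: line 0 (the header) is kept, line 1 is skipped.
keep_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(list(keep_header.columns), keep_header.shape)

# A range works the same way: skip the first two data rows (lines 1 and 2).
skip_two = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))
print(skip_two["Id"].tolist())
```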

It is also possible to skip rows at the end of the file using the option skipfooter. The default C parser engine does not support skipfooter, so engine='python' is specified:

>>> data = pd.read_csv('train.csv',skipfooter=10, engine='python')
>>> data.shape
(1450, 81)
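
A typical use of skipfooter is a file that ends with a summary line that is not a data row. The sketch below assumes a hypothetical sample with a trailing TOTAL line (train.csv itself has no such line):

```python
import io

import pandas as pd

# Hypothetical sample data whose last line is a summary, not a record.
csv_text = """Id,SalePrice
1,100000
2,150000
3,200000
TOTAL,450000
"""

# skipfooter=1 drops the last line; the python engine is requested explicitly.
data = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(data.shape)
print(data["Id"].tolist())
```

Without skipfooter the Id column would contain the string 'TOTAL' and could no longer be parsed as integers.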