Example of how to read a csv file using pandas in Python:
Read a csv file
Let's consider the csv file train.csv (which can be downloaded from Kaggle). One way to read it is with the pandas function read_csv():
>>> import pandas as pd
>>> data = pd.read_csv('train.csv')
Get the DataFrame dimensions (rows, columns):
>>> data.shape
(1460, 81)
Get the column names:
>>> data.columns
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')
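To try these calls without downloading train.csv, the sketch below reads a small in-memory csv instead. The three sample rows and the name `csv_text` are made up for illustration; only the column names are borrowed from the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for train.csv.
csv_text = """Id,MSSubClass,MSZoning,LotArea,SalePrice
1,60,RL,8450,208500
2,20,RL,9600,181500
3,60,RL,11250,223500
"""

# read_csv accepts a path or any file-like object, so StringIO works here.
data = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = data.shape            # (number of rows, number of columns)
column_names = data.columns.tolist()   # plain Python list of column labels

print(n_rows, n_cols)
print(column_names)
```

Replacing `io.StringIO(csv_text)` with `'train.csv'` gives the calls shown above.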
Select csv file columns
Option 1:
A first solution is to use the option usecols to select some columns:
>>> data = pd.read_csv('train.csv', usecols=[1, 2, 3])
>>> data.shape
(1460, 3)
>>> data.columns
Index(['MSSubClass', 'MSZoning', 'LotFrontage'], dtype='object')
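The integers passed to usecols are 0-based positions, which is why the Id column (position 0) is left out. usecols also accepts column names, which can be easier to read. A minimal sketch on a made-up two-row sample (the data and the name `csv_text` are illustrative assumptions):

```python
import io

import pandas as pd

# Hypothetical sample standing in for train.csv.
csv_text = """Id,MSSubClass,MSZoning,LotFrontage,LotArea
1,60,RL,65.0,8450
2,20,RL,80.0,9600
"""

# Positions are 0-based, so [1, 2, 3] skips the first column (Id).
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3])

# The same selection written with column names.
by_name = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["MSSubClass", "MSZoning", "LotFrontage"],
)

print(by_position.columns.tolist())
print(by_name.equals(by_position))
```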
Option 2:
Another solution is to select the columns after reading the csv file:
>>> data = pd.read_csv('train.csv')
>>> data_sample = data[['SalePrice', 'BldgType']]
>>> data_sample.shape
(1460, 2)
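One practical difference between the two options is column order: selecting after the read returns the columns in the order requested, whereas usecols ignores the order of its elements and keeps the order found in the file. The sketch below illustrates this on a made-up sample (the data and variable names are assumptions for illustration):

```python
import io

import pandas as pd

# Hypothetical sample standing in for train.csv.
csv_text = """Id,BldgType,SalePrice
1,1Fam,208500
2,1Fam,181500
"""

full = pd.read_csv(io.StringIO(csv_text))

# Selecting after the read returns the columns in the requested order.
selected = full[["SalePrice", "BldgType"]]

# usecols ignores the order of its elements and keeps the file order.
filtered = pd.read_csv(io.StringIO(csv_text), usecols=["SalePrice", "BldgType"])

print(selected.columns.tolist())
print(filtered.columns.tolist())
```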
Skip csv file rows
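It is possible to skip some rows using the option skiprows. Note that an integer value counts lines from the top of the file, header line included, so the first remaining line is then used for the column names. Examples: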
Skip the first row
>>> data = pd.read_csv('train.csv', skiprows=1)
>>> data.shape
(1459, 81)
Skip five rows
>>> data = pd.read_csv('train.csv', skiprows=5)
>>> data.shape
(1455, 81)
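To drop data rows while keeping the original header, skiprows can instead be given a list (or range) of 0-based line numbers that leaves line 0 alone. A minimal sketch on a made-up sample (the data and variable names are illustrative assumptions):

```python
import io

import pandas as pd

# Hypothetical sample standing in for train.csv.
csv_text = """Id,SalePrice
1,208500
2,181500
3,223500
"""

# Integer: drops the first line of the file, which is the header here,
# so the row "1,208500" is promoted to column names.
skipped_header = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# List of 0-based line numbers: line 0 (the header) is kept, line 1 is dropped.
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# A range drops several data rows at once, here lines 1 and 2.
kept_range = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))

print(skipped_header.columns.tolist())
print(kept_header.columns.tolist())
print(kept_range["Id"].tolist())
```

All three calls reduce the row count, but only the list and range forms keep Id and SalePrice as column names.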
Remove footer
It is possible to drop rows from the end of the file using the option skipfooter. The default C parser does not support this option, hence engine='python':
>>> data = pd.read_csv('train.csv', skipfooter=10, engine='python')
>>> data.shape
(1450, 81)
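A typical use for skipfooter is a file that ends with a summary line that is not a real record. The sketch below assumes such a file; the sample rows and the trailing "total" line are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample whose last line is a summary, not a data row.
csv_text = """Id,SalePrice
1,208500
2,181500
3,223500
total,613500
"""

# skipfooter is not supported by the default C engine, so the Python
# engine is requested explicitly.
data = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(data.shape)
print(data["Id"].tolist())
```

With the summary line removed, the Id column contains only numeric values.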
