Introduction
Pandas is a popular library in Python used for data analysis and manipulation. One of its many useful features is the ability to select rows based on specific criteria, such as containing certain text. In this guide, we will go over how to use Pandas to filter and select rows that contain specific text.
Create a pandas dataframe
We will begin by creating a dataframe.
import pandas as pd

# Sample of French communes, keyed by their original row index
data = {
    'nom_commune_postal': {
        30658: 'GEORFANS', 11190: 'BU', 15782: 'LA MURE',
        17560: 'ST JEAN ST MAURICE SUR LOIRE', 11902: 'GROSSA',
        25819: 'VILLERS ST SEPULCRE', 26911: 'MERCK ST LIEVIN',
        6271: 'TOUVRE', 36343: 'CUGAND', 32581: 'THUSY',
    },
    'code_postal': {
        30658: 70110, 11190: 28410, 15782: 38350, 17560: 42155,
        11902: 20100, 25819: 60134, 26911: 62560, 6271: 16600,
        36343: 85610, 32581: 74150,
    },
    'latitude': {
        30658: 47.5338346388, 11190: 48.8022272101, 15782: 44.9099172773,
        17560: 45.9576591068, 11902: 41.6027786723, 25819: 49.3689508807,
        26911: 50.6363344754, 6271: 45.658185613, 36343: 47.0602388146,
        32581: 45.949872762,
    },
    'longitude': {
        30658: 6.51269421338, 11190: 1.49327920018, 15782: 5.78817442937,
        17560: 3.98572458074, 11902: 8.8649278809, 25819: 2.20925215873,
        26911: 2.10761618871, 6271: 0.267862440841, 36343: -1.25289811103,
        32581: 5.96377064209,
    },
}

df = pd.DataFrame(data)
Displaying df returns:
                 nom_commune_postal  code_postal   latitude  longitude
30658                      GEORFANS        70110  47.533835   6.512694
11190                            BU        28410  48.802227   1.493279
15782                       LA MURE        38350  44.909917   5.788174
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
11902                        GROSSA        20100  41.602779   8.864928
25819           VILLERS ST SEPULCRE        60134  49.368951   2.209252
26911               MERCK ST LIEVIN        62560  50.636334   2.107616
6271                         TOUVRE        16600  45.658186   0.267862
36343                        CUGAND        85610  47.060239  -1.252898
32581                         THUSY        74150  45.949873   5.963771
Our objective is to identify rows that include the text " ST " in the column named "nom_commune_postal".
Selecting rows that contain a specific word or text
Using the str.contains() method
To select rows that contain a specific word or text with Pandas, a solution is to use the str.contains() method. It returns a boolean Series that is True wherever the value in a column contains the text; indexing the dataframe with that Series keeps only the matching rows.
For example, to search for the text " ST " in the column "nom_commune_postal" of our dataframe "df":
df[df['nom_commune_postal'].str.contains(" ST ")]
This will return all the rows that contain the text " ST " in the column "nom_commune_postal":
                 nom_commune_postal  code_postal   latitude  longitude
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
25819           VILLERS ST SEPULCRE        60134  49.368951   2.209252
26911               MERCK ST LIEVIN        62560  50.636334   2.107616
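To make the mechanism more explicit, the selection can be split into two steps: build the boolean mask, then index the dataframe with it. The sketch below uses a small subset of the rows above as a stand-in dataframe. The variable names mask and selected are only illustrative, and regex=False is an optional choice made here so that the pattern is treated as a literal string rather than a regular expression (the default for str.contains() is regex=True).

```python
import pandas as pd

# Stand-in dataframe: a subset of the communes used in this guide.
df = pd.DataFrame(
    {
        "nom_commune_postal": [
            "GEORFANS",
            "ST JEAN ST MAURICE SUR LOIRE",
            "VILLERS ST SEPULCRE",
            "TOUVRE",
        ]
    },
    index=[30658, 17560, 25819, 6271],
)

# Step 1: boolean Series, True where the name contains " ST ".
# regex=False means " ST " is matched literally, not as a regular expression.
mask = df["nom_commune_postal"].str.contains(" ST ", regex=False)

# Step 2: keep only the rows where the mask is True.
selected = df[mask]

print(mask.tolist())
print(selected)
```

Keeping the mask in its own variable also makes it easy to inspect or to combine with other conditions before filtering.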
Case sensitivity
Please be aware that the method is case sensitive by default. A lowercase pattern only matches if the text in the dataframe is also lowercase. For example:
df[df['nom_commune_postal'].str.contains(" st ")]
This returns an empty dataframe here, because all the names are stored in uppercase:
Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []
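If a case-insensitive match is what you need, one option is the case parameter of str.contains(); another is to lowercase the column before searching. The sketch below shows both on a small stand-in dataframe built from three of the names above. The variable names insensitive and lowered are only illustrative.

```python
import pandas as pd

# Stand-in dataframe: three of the commune names used in this guide.
df = pd.DataFrame(
    {"nom_commune_postal": ["LA MURE", "VILLERS ST SEPULCRE", "MERCK ST LIEVIN"]}
)

# Option 1: ask str.contains() to ignore case.
insensitive = df[df["nom_commune_postal"].str.contains(" st ", case=False)]

# Option 2: lowercase the column first, then search for the lowercase text.
lowered = df[df["nom_commune_postal"].str.lower().str.contains(" st ")]

print(insensitive)
```

Both options select the same two rows here; the original column is left unchanged in either case.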
Using the str.startswith() method
To select only the rows whose text starts with a particular word or text, use the str.startswith() method. Unlike str.contains(), it only matches at the beginning of the string, and the pattern is always treated as a literal string, not as a regular expression:
df[df['nom_commune_postal'].str.startswith("ST ")]
This returns:
                 nom_commune_postal  code_postal   latitude  longitude
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
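Real data often has missing names, and depending on the pandas version and the column dtype, the mask may then contain missing values that boolean indexing can reject. A hedged way to avoid this is the na parameter, which sets the result to use for missing values. In the sketch below, the None entry and the "STRASBOURG" row are assumptions added for illustration and are not part of the dataframe above; the latter shows why the trailing space in "ST " matters.

```python
import pandas as pd

# Illustrative dataframe: two names from this guide plus two made-up rows.
df = pd.DataFrame(
    {
        "nom_commune_postal": [
            "ST JEAN ST MAURICE SUR LOIRE",
            None,             # missing name, added for illustration
            "MERCK ST LIEVIN",
            "STRASBOURG",     # hypothetical row, not in the original data
        ]
    }
)

# na=False: treat missing names as "does not start with the text".
mask = df["nom_commune_postal"].str.startswith("ST ", na=False)

result = df[mask]
print(result)
```

Only the first row is kept: "MERCK ST LIEVIN" contains the text but does not start with it, and "STRASBOURG" starts with "ST" but not with "ST " followed by a space.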
Using the str.endswith() method
To select rows whose text ends with a particular word or text, use the str.endswith() method:
df[df['nom_commune_postal'].str.endswith("ST ")]
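This returns an empty dataframe here, because no name ends with "ST " (note the trailing space in the pattern):
Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []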
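For comparison, here is a minimal sketch of str.endswith() with a suffix that does occur in the data. It uses a stand-in dataframe with four of the names above; the suffix "RE" is chosen only for illustration. It also shows that a trailing space in the pattern prevents a match unless the value itself ends with a space.

```python
import pandas as pd

# Stand-in dataframe: four of the commune names used in this guide.
df = pd.DataFrame(
    {"nom_commune_postal": ["GEORFANS", "LA MURE", "TOUVRE", "CUGAND"]},
    index=[30658, 15782, 6271, 36343],
)

# Literal suffix comparison: keeps "LA MURE" and "TOUVRE".
ends_with_re = df[df["nom_commune_postal"].str.endswith("RE")]

# Same suffix followed by a space: nothing matches, since no name ends with a space.
ends_with_space = df[df["nom_commune_postal"].str.endswith("RE ")]

print(ends_with_re)
print(ends_with_space.empty)
```

If the values in your own data may carry stray trailing whitespace, applying str.strip() to the column before str.endswith() is one way to make the comparison more robust.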
