Introduction
Pandas is a popular library in Python used for data analysis and manipulation. One of its many useful features is the ability to select rows based on specific criteria, such as containing certain text. In this guide, we will go over how to use Pandas to filter and select rows that contain specific text.
Create a pandas dataframe
We will begin by creating a dataframe.
import pandas as pd
data = {'nom_commune_postal': {30658: 'GEORFANS',
11190: 'BU',
15782: 'LA MURE',
17560: 'ST JEAN ST MAURICE SUR LOIRE',
11902: 'GROSSA',
25819: 'VILLERS ST SEPULCRE',
26911: 'MERCK ST LIEVIN',
6271: 'TOUVRE',
36343: 'CUGAND',
32581: 'THUSY'},
'code_postal': {30658: 70110,
11190: 28410,
15782: 38350,
17560: 42155,
11902: 20100,
25819: 60134,
26911: 62560,
6271: 16600,
36343: 85610,
32581: 74150},
'latitude': {30658: 47.5338346388,
11190: 48.8022272101,
15782: 44.9099172773,
17560: 45.9576591068,
11902: 41.6027786723,
25819: 49.3689508807,
26911: 50.6363344754,
6271: 45.658185613,
36343: 47.0602388146,
32581: 45.949872762},
'longitude': {30658: 6.51269421338,
11190: 1.49327920018,
15782: 5.78817442937,
17560: 3.98572458074,
11902: 8.8649278809,
25819: 2.20925215873,
26911: 2.10761618871,
6271: 0.267862440841,
36343: -1.25289811103,
32581: 5.96377064209}}
df = pd.DataFrame(data)
print(df)
This displays:
nom_commune_postal code_postal latitude longitude
30658 GEORFANS 70110 47.533835 6.512694
11190 BU 28410 48.802227 1.493279
15782 LA MURE 38350 44.909917 5.788174
17560 ST JEAN ST MAURICE SUR LOIRE 42155 45.957659 3.985725
11902 GROSSA 20100 41.602779 8.864928
25819 VILLERS ST SEPULCRE 60134 49.368951 2.209252
26911 MERCK ST LIEVIN 62560 50.636334 2.107616
6271 TOUVRE 16600 45.658186 0.267862
36343 CUGAND 85610 47.060239 -1.252898
32581 THUSY 74150 45.949873 5.963771
Our objective is to identify rows that include the text " ST " in the column named "nom_commune_postal".
Selecting rows that contain a specific word or text
Using the str.contains() method
To select rows that contain a specific word or text with Pandas, one solution is the str.contains() method. It tests each value of a column against the given text, and the result can be used to keep only the matching rows.
For example, if we have a dataframe called "df" and we want to search for the text " ST " in the column "nom_commune_postal":
df[df['nom_commune_postal'].str.contains(" ST ")]
This returns all the rows that contain the text " ST " in the column "nom_commune_postal":
nom_commune_postal code_postal latitude longitude
17560 ST JEAN ST MAURICE SUR LOIRE 42155 45.957659 3.985725
25819 VILLERS ST SEPULCRE 60134 49.368951 2.209252
26911 MERCK ST LIEVIN 62560 50.636334 2.107616
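The selection above works in two steps, which the following minimal sketch makes explicit. It uses a small hypothetical dataframe named towns (not the one created earlier) and assumes a literal match is wanted: by default str.contains() treats the pattern as a regular expression, and regex=False asks for a plain substring search instead.

```python
import pandas as pd

# Hypothetical three-row sample, for illustration only
towns = pd.DataFrame({"name": ["MERCK ST LIEVIN", "TOUVRE", "ST JEAN ST MAURICE"]})

# Step 1: str.contains() produces one boolean per row
# regex=False treats " ST " as a literal substring, not a regular expression
mask = towns["name"].str.contains(" ST ", regex=False)
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows where it is True
selected = towns[mask]
print(selected)
```

Keeping the mask in its own variable can make it easier to inspect or reuse before filtering.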
Case sensitivity
Please be aware that str.contains() is case sensitive by default. A lowercase pattern only matches text that is also lowercase in the dataframe. For example:
df[df['nom_commune_postal'].str.contains(" st ")]
Here this returns an empty dataframe, because all the names are in uppercase:
Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []
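If a case-insensitive match is what you need, two possible approaches are sketched below. The dataframe towns and its mixed-case values are hypothetical and only serve the illustration. The first approach relies on the case parameter of str.contains(), the second lowercases the column before searching.

```python
import pandas as pd

# Hypothetical sample mixing uppercase and lowercase names
towns = pd.DataFrame({"name": ["MERCK ST LIEVIN", "TOUVRE", "Villers st Sepulcre"]})

# Option 1: case=False makes the match ignore letter case
insensitive = towns[towns["name"].str.contains(" st ", case=False)]
print(insensitive)

# Option 2: normalize the column to lowercase, then search as usual
lowered = towns[towns["name"].str.lower().str.contains(" st ")]
print(lowered)
```

Both options select the same rows here. The second one leaves the original column untouched, since str.lower() returns a new Series.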
Using the str.startswith() method
If you only want the rows that start with a particular word or text, use the str.startswith() method instead. It matches the given text at the beginning of each value only.
df[df['nom_commune_postal'].str.startswith("ST ")]
This returns:
nom_commune_postal code_postal latitude longitude
17560 ST JEAN ST MAURICE SUR LOIRE 42155 45.957659 3.985725
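Since " ST " with a space on each side cannot match a name that begins with "ST", it can be useful to combine both tests. The sketch below is one way to do it, assuming a hypothetical dataframe named towns: the two boolean masks are combined with the | operator (element-wise OR).

```python
import pandas as pd

# Hypothetical sample, for illustration only
towns = pd.DataFrame({"name": ["ST MARTIN", "VILLERS ST SEPULCRE", "CUGAND"]})
names = towns["name"]

# Keep a row if the name starts with "ST " OR contains " ST " further on
either = towns[names.str.startswith("ST ") | names.str.contains(" ST ")]
print(either)
```

The & operator can be used the same way to require both conditions at once.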
Using the str.endswith() method
To select the rows that end with a particular word or text, use the str.endswith() method:
df[df['nom_commune_postal'].str.endswith("ST ")]
Here this returns an empty dataframe, because no name in the column ends with "ST " (note the trailing space):
Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []
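The position of the space in the pattern matters. The following sketch, based on a hypothetical dataframe named towns (the name "BOURG ST" is invented for the example), compares a pattern with a trailing space to one with a leading space.

```python
import pandas as pd

# Hypothetical sample; "BOURG ST" is made up for the illustration
towns = pd.DataFrame({"name": ["MERCK ST LIEVIN", "LA MURE", "BOURG ST"]})

# No name ends with a space, so the trailing-space pattern matches nothing
with_trailing_space = towns[towns["name"].str.endswith("ST ")]
print(with_trailing_space.empty)

# With a leading space, names whose last word is "ST" are matched
last_word_st = towns[towns["name"].str.endswith(" ST")]
print(last_word_st)
```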