How to select rows that contain a specific word or text with Pandas?


Introduction

Pandas is a popular library in Python used for data analysis and manipulation. One of its many useful features is the ability to select rows based on specific criteria, such as containing certain text. In this guide, we will go over how to use Pandas to filter and select rows that contain specific text.

Create a pandas dataframe

We will begin by creating a dataframe.

import pandas as pd

data = {'nom_commune_postal': {30658: 'GEORFANS',
  11190: 'BU',
  15782: 'LA MURE',
  17560: 'ST JEAN ST MAURICE SUR LOIRE',
  11902: 'GROSSA',
  25819: 'VILLERS ST SEPULCRE',
  26911: 'MERCK ST LIEVIN',
  6271: 'TOUVRE',
  36343: 'CUGAND',
  32581: 'THUSY'},
 'code_postal': {30658: 70110,
  11190: 28410,
  15782: 38350,
  17560: 42155,
  11902: 20100,
  25819: 60134,
  26911: 62560,
  6271: 16600,
  36343: 85610,
  32581: 74150},
 'latitude': {30658: 47.5338346388,
  11190: 48.8022272101,
  15782: 44.9099172773,
  17560: 45.9576591068,
  11902: 41.6027786723,
  25819: 49.3689508807,
  26911: 50.6363344754,
  6271: 45.658185613,
  36343: 47.0602388146,
  32581: 45.949872762},
 'longitude': {30658: 6.51269421338,
  11190: 1.49327920018,
  15782: 5.78817442937,
  17560: 3.98572458074,
  11902: 8.8649278809,
  25819: 2.20925215873,
  26911: 2.10761618871,
  6271: 0.267862440841,
  36343: -1.25289811103,
  32581: 5.96377064209}}

df = pd.DataFrame(data)

Printing df gives:

                 nom_commune_postal  code_postal   latitude  longitude
30658                      GEORFANS        70110  47.533835   6.512694
11190                            BU        28410  48.802227   1.493279
15782                       LA MURE        38350  44.909917   5.788174
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
11902                        GROSSA        20100  41.602779   8.864928
25819           VILLERS ST SEPULCRE        60134  49.368951   2.209252
26911               MERCK ST LIEVIN        62560  50.636334   2.107616
6271                         TOUVRE        16600  45.658186   0.267862
36343                        CUGAND        85610  47.060239  -1.252898
32581                         THUSY        74150  45.949873   5.963771

Our objective is to identify rows that include the text " ST " in the column named "nom_commune_postal".

Selecting rows that contain a specific word or text

Using the str.contains() method

To select rows that contain a specific word or text with Pandas, one solution is to use the str.contains() method. It returns a boolean Series indicating which values in a column contain the text, and that Series can then be used to filter the dataframe.

For example, if we have a dataframe called "df" and we want to search for the text " ST " in the column "nom_commune_postal":

df[df['nom_commune_postal'].str.contains(" ST ")]

This returns all the rows that contain the text " ST " in the column "nom_commune_postal":

                 nom_commune_postal  code_postal   latitude  longitude
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
25819           VILLERS ST SEPULCRE        60134  49.368951   2.209252
26911               MERCK ST LIEVIN        62560  50.636334   2.107616
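
By default, str.contains() interprets the pattern as a regular expression. The sketch below uses a small hypothetical dataframe (names_df, not the full dataset above; the name "ST JEAN" is made up for illustration) to show two variations: regex=False for a plain substring match, and a word-boundary pattern that also finds "ST" at the very start of a name, which " ST " with surrounding spaces would miss.

```python
import pandas as pd

# Small illustrative dataframe; "ST JEAN" is a hypothetical entry.
names_df = pd.DataFrame(
    {"nom_commune_postal": ["ST JEAN", "VILLERS ST SEPULCRE", "THUSY", "CUGAND"]}
)

# Literal substring match: regex=False disables regular-expression parsing.
literal = names_df[names_df["nom_commune_postal"].str.contains(" ST ", regex=False)]

# Whole-word match: \b marks a word boundary, so "ST" is also matched
# when it is the first (or last) word of the name.
whole_word = names_df[names_df["nom_commune_postal"].str.contains(r"\bST\b")]

print(literal)
print(whole_word)
```

Here the literal search keeps only "VILLERS ST SEPULCRE", while the word-boundary search keeps both "ST JEAN" and "VILLERS ST SEPULCRE".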

Case sensitivity

Note that str.contains() is case sensitive by default. To match a lowercase string, the text in the dataframe must also be lowercase. For example:

df[df['nom_commune_postal'].str.contains(" st ")]

This returns an empty dataframe:

Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []
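
If the match should ignore case, two options are sketched below on a small hypothetical dataframe (names_df): passing case=False to str.contains(), or lowercasing the column with str.lower() before searching. Which one reads better is largely a matter of taste.

```python
import pandas as pd

# Small illustrative dataframe (a subset of the names used above).
names_df = pd.DataFrame(
    {"nom_commune_postal": ["VILLERS ST SEPULCRE", "MERCK ST LIEVIN", "THUSY"]}
)

# Option 1: case=False makes the match ignore upper/lower case.
ignore_case = names_df[
    names_df["nom_commune_postal"].str.contains(" st ", case=False)
]

# Option 2: lowercase the column first, then search for lowercase text.
lowered = names_df[
    names_df["nom_commune_postal"].str.lower().str.contains(" st ")
]

print(ignore_case)
```

Both options select the first two rows and leave the original column values unchanged.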

Using the str.startswith() method

To find rows that start with a particular word or text, use the str.startswith() method instead. It matches only at the beginning of each string:

df[df['nom_commune_postal'].str.startswith("ST ")]

This returns:

                 nom_commune_postal  code_postal   latitude  longitude
17560  ST JEAN ST MAURICE SUR LOIRE        42155  45.957659   3.985725
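
The dataframe above has no missing names, but real data often does. Depending on the pandas version and column dtype, a missing value can produce a missing entry in the mask, which cannot be used for filtering. A minimal sketch, assuming a hypothetical dataframe (names_df) with one missing name, is to pass na=False so that missing values count as "no match":

```python
import pandas as pd

# Illustrative dataframe with one missing name (None).
names_df = pd.DataFrame(
    {"nom_commune_postal": ["ST JEAN ST MAURICE SUR LOIRE", None, "LA MURE"]}
)

# na=False treats missing names as "no match", so the mask is purely boolean.
mask = names_df["nom_commune_postal"].str.startswith("ST ", na=False)

starts_with_st = names_df[mask]
print(starts_with_st)
```

The same na parameter is accepted by str.contains() and str.endswith().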

Using the str.endswith() method

To find rows that end with a particular word or text, use the str.endswith() method:

df[df['nom_commune_postal'].str.endswith("ST ")]

No commune name in this dataframe ends with "ST ", so this returns an empty dataframe:

Empty DataFrame
Columns: [nom_commune_postal, code_postal, latitude, longitude]
Index: []
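
For a non-empty result, the sketch below searches for a suffix that does occur. It uses a small hypothetical dataframe (names_df) in which one name has a trailing space added on purpose, to illustrate that str.endswith() compares the exact final characters and that str.strip() can be applied first when stray whitespace is a concern.

```python
import pandas as pd

# Illustrative dataframe; the trailing space in the last name is deliberate.
names_df = pd.DataFrame(
    {
        "nom_commune_postal": [
            "ST JEAN ST MAURICE SUR LOIRE",
            "LA MURE",
            "MERCK ST LIEVIN ",
        ]
    }
)

# Keeps names whose last characters are exactly "LOIRE".
ends_with_loire = names_df[names_df["nom_commune_postal"].str.endswith("LOIRE")]

# str.strip() removes leading/trailing whitespace before the suffix test,
# so the name with a trailing space is still matched.
ends_with_lievin = names_df[
    names_df["nom_commune_postal"].str.strip().str.endswith("LIEVIN")
]

print(ends_with_loire)
print(ends_with_lievin)
```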
