How to select rows that contain a substring in a Pandas DataFrame?


One way to select rows that contain a particular substring in a Pandas DataFrame is to use the str.contains() method.

Case study

Let's consider the following DataFrame:

import pandas as pd
import numpy as np

data = np.array([[47.95, -118.464, '2019-08-07T20:51:25Z'],
       [47.977, -118.606, '2019-08-04T21:36:25Z'],
       [47.916, -118.478, '2019-08-07T22:21:25Z'],
       [48.02, -118.404, '2019-08-07T21:51:25Z'],
       [47.985, -118.449, '2019-08-08T05:56:24Z'],
       [47.949, -118.495, '2019-08-08T22:51:24Z'],
       [47.983, -118.481, '2019-08-05T07:21:25Z'],
       [47.979, -118.575, '2019-08-04T05:11:24Z'],
       [47.986, -118.418, '2019-08-07T10:11:25Z'],
       [48.02, -118.404, '2019-08-09T05:31:25Z']], dtype=object)

df = pd.DataFrame(data, columns=['latitude', 'longitude', 'observation date/time'])

print(df)

output

      latitude longitude observation date/time
    0    47.95  -118.464  2019-08-07T20:51:25Z
    1   47.977  -118.606  2019-08-04T21:36:25Z
    2   47.916  -118.478  2019-08-07T22:21:25Z
    3    48.02  -118.404  2019-08-07T21:51:25Z
    4   47.985  -118.449  2019-08-08T05:56:24Z
    5   47.949  -118.495  2019-08-08T22:51:24Z
    6   47.983  -118.481  2019-08-05T07:21:25Z
    7   47.979  -118.575  2019-08-04T05:11:24Z
    8   47.986  -118.418  2019-08-07T10:11:25Z
    9    48.02  -118.404  2019-08-09T05:31:25Z

We want to select the rows that contain '2019-08-08' in the 'observation date/time' column.

Using str.contains()

To select rows that contain a particular substring in a Pandas DataFrame, use the str.contains() method. It returns a boolean Series that is True for every row whose string contains the substring. The syntax for this command is:

dataframe['column'].str.contains("substring")

Example

df[df['observation date/time'].str.contains('2019-08-08')]

output

      latitude longitude observation date/time
    4   47.985  -118.449  2019-08-08T05:56:24Z
    5   47.949  -118.495  2019-08-08T22:51:24Z
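By default, str.contains() interprets its argument as a regular expression and leaves missing values as missing in the result. The sketch below shows the regex and na parameters, which can make the selection more robust. The df_demo frame is a hypothetical reduced copy of the data above with one missing value added for illustration.

```python
import pandas as pd

# Hypothetical sample: a few rows from the DataFrame above plus one missing value
df_demo = pd.DataFrame({
    'latitude': [47.985, 47.949, 47.983, 48.02],
    'observation date/time': ['2019-08-08T05:56:24Z',
                              '2019-08-08T22:51:24Z',
                              '2019-08-05T07:21:25Z',
                              None],
})

# regex=False treats '2019-08-08' as a literal string rather than a regex;
# na=False makes missing values count as "no match"
mask = df_demo['observation date/time'].str.contains('2019-08-08', regex=False, na=False)

selected = df_demo[mask]
print(selected)
```

Passing regex=False is a reasonable choice when the substring may contain characters such as '.' or '+' that have a special meaning in regular expressions. Passing na=False keeps the mask strictly boolean, which matters because, depending on the pandas version, a mask that contains missing values may not be accepted for row selection.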

Using str.match()

Another method is str.match(), which tests whether each string starts with the given pattern. It works here because the substring '2019-08-08' appears at the beginning of each string. The syntax for this command is:

dataframe['column'].str.match("pattern")

Example

df[df['observation date/time'].str.match('2019-08-08')]

output

      latitude longitude observation date/time
    4   47.985  -118.449  2019-08-08T05:56:24Z
    5   47.949  -118.495  2019-08-08T22:51:24Z

Difference between match() and contains()

The primary distinction between str.contains() and str.match() is that the former is based on regular expression search (re.search) and the latter on regular expression match (re.match). According to their respective documentation, str.contains() tests whether a pattern or regex is contained within each string of a Series or Index, while str.match() determines whether each string starts with a match of a regular expression. In the examples above, both methods return the same rows only because the substring '2019-08-08' sits at the beginning of each string. For a substring located elsewhere in the string, str.match() would return no rows.
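To make the difference concrete, here is a small sketch that searches for a substring located in the middle of the strings. The dates Series and the '08-08' pattern are illustrative choices taken from the data above, not part of the original examples.

```python
import pandas as pd

# Three timestamps taken from the 'observation date/time' column
dates = pd.Series(['2019-08-07T20:51:25Z',
                   '2019-08-08T05:56:24Z',
                   '2019-08-08T22:51:24Z'])

# A substring that never occurs at the start of a string
pattern = '08-08'

contains_mask = dates.str.contains(pattern)  # like re.search: matches anywhere
match_mask = dates.str.match(pattern)        # like re.match: matches at the start only

print(contains_mask.tolist())  # [False, True, True]
print(match_mask.tolist())     # [False, False, False]

# Prefixing the pattern with '.*' lets str.match() skip the leading characters
prefixed_mask = dates.str.match('.*' + pattern)
print(prefixed_mask.tolist())  # [False, True, True]
```

For a plain "does the string contain this substring" test, str.contains() is usually the more direct choice. str.match() is better suited to checking how a string begins.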

References

Links Site
pandas.Series.str.contains pandas.pydata.org
match() docs.python.org