One way to select rows that contain a particular substring in a Pandas DataFrame is to use the str.contains() method.
Case study
Let's consider the following DataFrame:
import pandas as pd
import numpy as np

data = np.array([[47.95, -118.464, '2019-08-07T20:51:25Z'],
                 [47.977, -118.606, '2019-08-04T21:36:25Z'],
                 [47.916, -118.478, '2019-08-07T22:21:25Z'],
                 [48.02, -118.404, '2019-08-07T21:51:25Z'],
                 [47.985, -118.449, '2019-08-08T05:56:24Z'],
                 [47.949, -118.495, '2019-08-08T22:51:24Z'],
                 [47.983, -118.481, '2019-08-05T07:21:25Z'],
                 [47.979, -118.575, '2019-08-04T05:11:24Z'],
                 [47.986, -118.418, '2019-08-07T10:11:25Z'],
                 [48.02, -118.404, '2019-08-09T05:31:25Z']], dtype=object)

df = pd.DataFrame(data, columns=['latitude', 'longitude', 'observation date/time'])
print(df)
output
  latitude longitude observation date/time
0    47.95  -118.464  2019-08-07T20:51:25Z
1   47.977  -118.606  2019-08-04T21:36:25Z
2   47.916  -118.478  2019-08-07T22:21:25Z
3    48.02  -118.404  2019-08-07T21:51:25Z
4   47.985  -118.449  2019-08-08T05:56:24Z
5   47.949  -118.495  2019-08-08T22:51:24Z
6   47.983  -118.481  2019-08-05T07:21:25Z
7   47.979  -118.575  2019-08-04T05:11:24Z
8   47.986  -118.418  2019-08-07T10:11:25Z
9    48.02  -118.404  2019-08-09T05:31:25Z
We want to select the rows that contain '2019-08-08' in the 'observation date/time' column.
Using str.contains()
The str.contains() method returns a boolean mask that is True for each row whose string contains the given substring (or regular expression). Indexing the DataFrame with that mask keeps only the matching rows. The syntax for this command is
dataframe['column'].str.contains("substring")
Example
df[ df['observation date/time'].str.contains('2019-08-08') ]
output
  latitude longitude observation date/time
4   47.985  -118.449  2019-08-08T05:56:24Z
5   47.949  -118.495  2019-08-08T22:51:24Z
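Two optional parameters of str.contains() are often useful in practice: na, which sets the result for missing values, and regex, which controls whether the pattern is treated as a regular expression. The following is a minimal sketch using a small hypothetical Series (the values are illustrative and not part of the DataFrame above):

```python
import pandas as pd

# Hypothetical sample data (not from the article): one missing value
# and one string containing a regex metacharacter ('.').
s = pd.Series(['2019-08-08T05:56:24Z', None, 'v1x5', 'v1.5'])

# na=False treats missing values as non-matches, so the mask is purely
# boolean and can be used directly for row selection.
mask_date = s.str.contains('2019-08-08', na=False)

# regex=False matches the pattern literally. With the default regex=True,
# the '.' in '1.5' would match any character, so 'v1x5' would also match.
mask_literal = s.str.contains('1.5', regex=False, na=False)
mask_regex = s.str.contains('1.5', na=False)

print(mask_date.tolist())     # [True, False, False, False]
print(mask_literal.tolist())  # [False, False, False, True]
print(mask_regex.tolist())    # [False, False, True, True]
```

Passing regex=False is a reasonable default when the substring comes from user input or may contain characters such as '.', '+', or '('.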
Using match()
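Another way to select these rows is to use the .str.match() method. Unlike str.contains(), it only tests whether the pattern matches at the beginning of each string, which works here because every value starts with the date. The syntax for this command is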
dataframe['column'].str.match("pattern")
Example
df[ df['observation date/time'].str.match('2019-08-08') ]
output
  latitude longitude observation date/time
4   47.985  -118.449  2019-08-08T05:56:24Z
5   47.949  -118.495  2019-08-08T22:51:24Z
Difference between match() and contains()
The primary distinction between str.contains() and str.match() is where the pattern is allowed to occur. str.contains() is based on regular expression search (re.search) and returns True if the pattern is found anywhere in the string. str.match() is based on re.match and returns True only if the pattern matches starting at the beginning of the string. In the examples above, both methods select the same rows because every value in 'observation date/time' begins with the date.
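The difference becomes visible as soon as the substring is not at the start of the string. Here is a minimal sketch with a hypothetical two-element Series (the 'observed ' prefix is made up for illustration):

```python
import pandas as pd

# Hypothetical data: the second string has text before the date.
s = pd.Series(['2019-08-08T05:56:24Z', 'observed 2019-08-08T22:51:24Z'])

# contains() searches the whole string (re.search semantics)
contains_mask = s.str.contains('2019-08-08')

# match() only checks from the beginning of the string (re.match semantics)
match_mask = s.str.match('2019-08-08')

print(contains_mask.tolist())  # [True, True]
print(match_mask.tolist())     # [True, False]
```

For substring selection in general, str.contains() is therefore the safer choice; str.match() is appropriate when the pattern is expected at the start of each value.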
References
| Links | Site |
|---|---|
| pandas.Series.str.contains | pandas.pydata.org |
| match() | docs.python.org |
