One way to select rows that contain a particular substring in a Pandas DataFrame is to use the str.contains() method. Another is str.match(), which behaves slightly differently.
Case study
Let's consider the following DataFrame:
import pandas as pd
import numpy as np
data = np.array([[47.95, -118.464, '2019-08-07T20:51:25Z'],
[47.977, -118.606, '2019-08-04T21:36:25Z'],
[47.916, -118.478, '2019-08-07T22:21:25Z'],
[48.02, -118.404, '2019-08-07T21:51:25Z'],
[47.985, -118.449, '2019-08-08T05:56:24Z'],
[47.949, -118.495, '2019-08-08T22:51:24Z'],
[47.983, -118.481, '2019-08-05T07:21:25Z'],
[47.979, -118.575, '2019-08-04T05:11:24Z'],
[47.986, -118.418, '2019-08-07T10:11:25Z'],
[48.02, -118.404, '2019-08-09T05:31:25Z']], dtype=object)
df = pd.DataFrame(data,columns=['latitude','longitude', 'observation date/time'])
print(df)
output
latitude longitude observation date/time
0 47.95 -118.464 2019-08-07T20:51:25Z
1 47.977 -118.606 2019-08-04T21:36:25Z
2 47.916 -118.478 2019-08-07T22:21:25Z
3 48.02 -118.404 2019-08-07T21:51:25Z
4 47.985 -118.449 2019-08-08T05:56:24Z
5 47.949 -118.495 2019-08-08T22:51:24Z
6 47.983 -118.481 2019-08-05T07:21:25Z
7 47.979 -118.575 2019-08-04T05:11:24Z
8 47.986 -118.418 2019-08-07T10:11:25Z
9 48.02 -118.404 2019-08-09T05:31:25Z
We want to select the rows that contain '2019-08-08' in the 'observation date/time' column.
Using str.contains()
To select rows that contain a particular substring, call str.contains() on the column. It returns a boolean Series that can be used to index the DataFrame. The syntax is
dataframe['column'].str.contains("substring")
Example
df[ df['observation date/time'].str.contains('2019-08-08') ]
output
latitude longitude observation date/time
4 47.985 -118.449 2019-08-08T05:56:24Z
5 47.949 -118.495 2019-08-08T22:51:24Z
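Note that str.contains() interprets its pattern as a regular expression by default and returns a missing value for missing entries. The sketch below illustrates the `regex` and `na` parameters on a small hypothetical Series (`s` and its values are made up for illustration and are not part of the case study):

```python
import pandas as pd

# Hypothetical sample data, not taken from the case study
s = pd.Series(
    ['2019-08-08T05:56:24Z', '2019.08.08', None, '2019x08x08'],
    dtype=object,
)

# Default (regex=True): '.' matches any character, so '-', '.' and 'x' all match.
# na=False turns missing entries into False so the result is a usable boolean mask.
regex_mask = s.str.contains('2019.08.08', na=False)

# regex=False: the pattern is treated as a literal substring
literal_mask = s.str.contains('2019.08.08', regex=False, na=False)

print(regex_mask.tolist())    # [True, True, False, True]
print(literal_mask.tolist())  # [False, True, False, False]
```

For a pattern such as '2019-08-08', which has no regex metacharacters, both settings select the same rows.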
Using str.match()
Another way to select these rows is the str.match() method, which checks whether each string starts with a match of the given pattern. The syntax is
dataframe['column'].str.match("pattern")
Example
df[ df['observation date/time'].str.match('2019-08-08') ]
output
latitude longitude observation date/time
4 47.985 -118.449 2019-08-08T05:56:24Z
5 47.949 -118.495 2019-08-08T22:51:24Z
Difference between match() and contains()
The primary distinction between str.contains() and str.match() is that the former is based on regular expression search (re.search) and the latter on re.match. According to the pandas documentation, str.contains() tests whether a pattern or regex is contained anywhere within each string of a Series or Index, while str.match() determines whether each string starts with a match of the regular expression. In the case study both methods return the same rows only because '2019-08-08' sits at the very beginning of the matching values.
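A minimal sketch of this difference, using two timestamp values from the case study in a standalone Series (the variable names are illustrative):

```python
import pandas as pd

# Two values taken from the 'observation date/time' column
s = pd.Series(['2019-08-08T05:56:24Z', '2019-08-07T20:51:25Z'])

# contains: the pattern may occur anywhere in the string
contains_mask = s.str.contains('08-08')

# match: the pattern must match starting at the first character
match_mask = s.str.match('08-08')
match_prefix = s.str.match('2019-08-08')

print(contains_mask.tolist())  # [True, False]
print(match_mask.tolist())     # [False, False]
print(match_prefix.tolist())   # [True, False]
```

So str.match() is the better fit when the substring is expected at the start of the value, and str.contains() when it can appear anywhere.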
References
Links | Site |
---|---|
pandas.Series.str.contains | pandas.pydata.org |
match() | docs.python.org |