Examples of how to split a full-name column into first-name and last-name columns with pandas in Python:
Create a dataframe with pandas
Let's first create a dataframe with pandas:
import pandas as pd
data = {'Full_Name':['April Reiter','Emory Miller','David Ballin','Alice Trotter','Virginia Rios']}
df = pd.DataFrame(data=data)
print(df)
gives
Full_Name
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Rios
Split full name into 2 columns
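To split the column called Full_Name, one solution is to use pandas.Series.str.split, which splits on whitespace by default; expand=True returns the pieces as separate columns:
df['Full_Name'].str.split(expand=True)
gives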
0 1
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Rios
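To keep the result next to the original column, the two pieces can be assigned back to the dataframe. A minimal sketch, assuming every full name has exactly two words; the column names First_Name and Last_Name are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Emory Miller', 'David Ballin',
                                 'Alice Trotter', 'Virginia Rios']})

# Every name has two words, so the split yields exactly two columns,
# which can be assigned to two new columns in one step.
df[['First_Name', 'Last_Name']] = df['Full_Name'].str.split(expand=True)
print(df)
```

If some names had more than two words, the split would return more than two columns and this assignment would fail.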
Another solution is to use pandas.Series.str.extract with a regular expression:
df['Full_Name'].str.extract(r'(\w+) (\w+)', expand=True)
gives
0 1
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Rios
To name the new columns directly, use named groups in the regular expression:
df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)', expand=True)
gives
First_Name Last_Name
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Rios
Another example
Let's look at a more complicated example: assume that one of the full names has a middle name, here "Virginia Alicia Rios" instead of "Virginia Rios":
import pandas as pd
data = {'Full_Name':['April Reiter','Emory Miller','David Ballin','Alice Trotter','Virginia Alicia Rios']}
df = pd.DataFrame(data=data)
print(df)
gives
Full_Name
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Alicia Rios
then
df['Full_Name'].str.split(expand=True)
returns three columns:
0 1 2
0 April Reiter None
1 Emory Miller None
2 David Ballin None
3 Alice Trotter None
4 Virginia Alicia Rios
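The number of columns can also be capped with the n parameter of str.split, or of str.rsplit to count from the right. A short sketch on a reduced version of the data; the variable names left and right are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Virginia Alicia Rios']})

# n=1 allows a single cut, counted from the left:
# the middle name stays with the last name.
left = df['Full_Name'].str.split(n=1, expand=True)

# rsplit counts from the right:
# the middle name stays with the first name.
right = df['Full_Name'].str.rsplit(n=1, expand=True)

print(left)
print(right)
```

Which of the two is preferable depends on where the middle name should end up.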
To get only two columns, one solution is to use extract:
df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)', expand=True)
gives
First_Name Last_Name
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Virginia Alicia
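However, the last name extracted for Virginia is Alicia here, not Rios. To fix that, one solution is to add a $ at the end of the pattern, which anchors the match to the end of the string. The last name is then correct, although the first group now captures the middle name:
print(df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)$', expand=True))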
gives
First_Name Last_Name
0 April Reiter
1 Emory Miller
2 David Ballin
3 Alice Trotter
4 Alicia Rios
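To keep the first word as the first name and the last word as the last name, the pattern can be anchored at both ends and allowed to skip anything in between. This is a sketch of one possible pattern, not the only option; it assumes each name has at least two words separated by single spaces:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Virginia Alicia Rios']})

# ^ anchors the first group to the start of the string, $ anchors the last
# group to the end, and .* absorbs any middle names in between.
pattern = r'^(?P<First_Name>\w+).* (?P<Last_Name>\w+)$'
names = df['Full_Name'].str.extract(pattern, expand=True)
print(names)
```

Rows that do not match the pattern, such as a single-word name, come back as NaN in both columns.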