When selecting values from a pandas Series, both the .loc and .iloc indexers are useful, depending on the context: .loc selects data by label, while .iloc selects by integer position. Examples:
Create a series
Let's create a simple series with pandas:
import pandas as pd
data = {'a': 1, 'b': 2, 'c': 3, 'e': 4, 'f': 5}
ds = pd.Series(data=data)
print(ds)
output
a 1
b 2
c 3
e 4
f 5
dtype: int64
Select values using .loc
Select one value
If you want to select a specific row, you can use .loc with an index label, such as
ds.loc['c']
which returns
3
Note that
type(ds.loc['c'])
returns
numpy.int64
Select multiple values
To select a range of values by label with .loc, slice on the index labels (both endpoints are included):
ds.loc['c':'f']
which will return:
c 3
e 4
f 5
dtype: int64
Note that
type(ds.loc['c':'f'])
returns
pandas.core.series.Series
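Label slices and positional slices treat their endpoints differently: a .loc slice includes the stop label, while an .iloc slice excludes the stop position. The following is a minimal sketch of that difference, reusing the series defined above (the names by_label and by_position are only illustrative):

```python
import pandas as pd

ds = pd.Series({'a': 1, 'b': 2, 'c': 3, 'e': 4, 'f': 5})

# Label slice: both 'c' and 'f' are included.
by_label = ds.loc['c':'f']

# Positional slice: position 2 is included, position 4 is excluded.
by_position = ds.iloc[2:4]

print(list(by_label.index))     # ['c', 'e', 'f']
print(list(by_position.index))  # ['c', 'e']
```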
Using a list
name_list = ['a','c','f']
ds.loc[name_list]
returns
a 1
c 3
f 5
dtype: int64
Using a condition
It is also possible to use .loc with a condition. For example, to select all rows with a value greater than 2, first build a boolean mask:
ds.loc[:] > 2.
gives
a False
b False
c True
e True
f True
dtype: bool
Then
ds.loc[ ds.loc[:] > 2. ]
gives
c 3
e 4
f 5
dtype: int64
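The same selection can be written more compactly by comparing the series directly, and several conditions can be combined. This is a small sketch rather than the only way to do it; the variable names mask, selected and middle are illustrative:

```python
import pandas as pd

ds = pd.Series({'a': 1, 'b': 2, 'c': 3, 'e': 4, 'f': 5})

# Comparing the series directly gives the same boolean mask as ds.loc[:] > 2.
mask = ds > 2
selected = ds.loc[mask]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
middle = ds.loc[(ds > 1) & (ds < 5)]

print(selected.tolist())  # [3, 4, 5]
print(middle.tolist())    # [2, 3, 4]
```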
Select values using .iloc
Select one value
Alternatively, to select a specific row by position, use the .iloc indexer with an integer position, such as
ds.iloc[2]
which returns
3
Note that .iloc always selects by position, not by label, so any change in the order of the rows (for example after sorting) can change what a given position returns. With .loc, be aware of duplicate labels: a label that appears more than once returns a Series rather than a single value.
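Both caveats can be illustrated with a short sketch. The reordered series reversed_ds and the series dup with a repeated label 'x' are made up for this example and are not part of the data used elsewhere on this page:

```python
import pandas as pd

ds = pd.Series({'a': 1, 'b': 2, 'c': 3, 'e': 4, 'f': 5})

# Reordering changes what a position refers to, but not what a label refers to.
reversed_ds = ds.sort_values(ascending=False)
print(ds.iloc[0], reversed_ds.iloc[0])        # 1 5
print(ds.loc['a'], reversed_ds.loc['a'])      # 1 1

# Hypothetical series with a duplicated label 'x'.
dup = pd.Series([10, 20, 30], index=['x', 'y', 'x'])
unique_hit = dup.loc['y']     # single value
repeated_hit = dup.loc['x']   # Series containing both 'x' entries
print(type(repeated_hit))
```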
Note that
type(ds.iloc[2])
returns
numpy.int64
Select multiple values
To select a set of values based on their integer position using .iloc:
ds.iloc[2:5]
which will return:
c 3
e 4
f 5
dtype: int64
Note that
type(ds.iloc[2:5])
returns
pandas.core.series.Series
Using a list
idx_list = [0,2,4]
ds.iloc[idx_list]
returns
a 1
c 3
f 5
dtype: int64
Extra notes
Convert series to a DataFrame
Use the to_frame() method:
ds.to_frame()
returns
0
a 1
b 2
c 3
e 4
f 5
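By default the resulting column is labelled 0 because the series has no name. As a minimal sketch, the name argument of to_frame() can be used to set the column label; the label 'value' is an arbitrary choice for this example:

```python
import pandas as pd

ds = pd.Series({'a': 1, 'b': 2, 'c': 3, 'e': 4, 'f': 5})

# 'value' is an illustrative column name, not something required by pandas.
df = ds.to_frame(name='value')

print(df.columns.tolist())  # ['value']
print(df.shape)             # (5, 1)

# Selecting the single column gives back a Series with the same values.
back = df['value']
```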
Iterate through rows of a DataFrame
Note that iterrows() yields each row as an (index, Series) pair. Start with a small DataFrame:
import pandas as pd
import numpy as np
np.random.seed(42)
data = np.random.uniform(10,80, size=(4,2))
df1 = pd.DataFrame(data,columns=['A','B'])
print(df1)
output
A B
0 36.217808 76.550001
1 61.239576 51.906094
2 20.921305 20.919616
3 14.065853 70.632330
Iterate over the rows of df1:
for index, row in df1.iterrows():
    print(row)
    print(type(row))
output
A 36.217808
B 76.550001
Name: 0, dtype: float64
<class 'pandas.core.series.Series'>
A 61.239576
B 51.906094
Name: 1, dtype: float64
<class 'pandas.core.series.Series'>
A 20.921305
B 20.919616
Name: 2, dtype: float64
<class 'pandas.core.series.Series'>
A 14.065853
B 70.632330
Name: 3, dtype: float64
<class 'pandas.core.series.Series'>
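If a Series per row is not needed, itertuples() is another option: it yields each row as a namedtuple, with the index available as row.Index. The sketch below rebuilds the same df1 so that it runs on its own; the rows list is only there to collect the results for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df1 = pd.DataFrame(np.random.uniform(10, 80, size=(4, 2)), columns=['A', 'B'])

rows = []
for row in df1.itertuples(index=True):
    # row is a namedtuple: row.Index is the row label, row.A and row.B the values.
    rows.append((row.Index, row.A, row.B))
    print(row.Index, row.A, row.B)
```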