Examples of how to convert a pandas dataframe column to a NumPy array
Create a dataframe
Let's create a dataframe with pandas:
import pandas as pd
import numpy as np
data = np.random.randint(10, size=(5,2))
df = pd.DataFrame(data=data,columns=['A','B'])
print(df)
prints, for example (the values are random, so they change on every run):
   A  B
0  2  3
1  9  8
2  4  8
3  4  7
4  3  8
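Because the values above are drawn at random, the printed dataframe differs between runs. A minimal sketch, assuming reproducible output is wanted, is to draw the integers from a seeded NumPy generator. The seed 42 and the names rng and df_seeded are arbitrary choices for illustration and are not part of the original example:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same integers on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary, illustrative seed

# 5 rows x 2 columns of integers drawn from 0 to 9 (10 is excluded).
data = rng.integers(10, size=(5, 2))

df_seeded = pd.DataFrame(data=data, columns=['A', 'B'])
print(df_seeded)
```

The rest of the examples work the same way on a dataframe built like this; only the printed numbers differ.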
Convert a column of numbers
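To convert a dataframe column to an array, one solution is to use to_numpy(). Selecting a single column with df['B'] gives a pandas Series, so the method called here is pandas.Series.to_numpy. Example with the column called 'B':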
M = df['B'].to_numpy()
returns
array([3, 8, 8, 7, 8])
To check the type:
type(M)
returns
numpy.ndarray
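The following self-contained sketch repeats the conversion on fixed values, so the result can be checked by hand. It also passes copy=True, on the assumption that an array independent of the dataframe is wanted; without it, to_numpy() may return a view on the dataframe's data. The names df_fixed and M_copy are illustrative:

```python
import numpy as np
import pandas as pd

# Fixed values (same as the sample output above) instead of random ones.
df_fixed = pd.DataFrame({'A': [2, 9, 4, 4, 3], 'B': [3, 8, 8, 7, 8]})

# copy=True guarantees the array does not share memory with the dataframe.
M_copy = df_fixed['B'].to_numpy(copy=True)

print(type(M_copy))   # a numpy.ndarray
print(M_copy.shape)   # (5,): one-dimensional, one entry per row

# Changing the copy leaves the dataframe untouched.
M_copy[0] = 100
print(df_fixed['B'].tolist())  # [3, 8, 8, 7, 8]
```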
Column with missing value(s)
If a missing value np.nan is inserted in the column:
df.iloc[2,1] = np.nan
print(df)
returns
A B
0 2 3.0
1 9 8.0
2 4 NaN
3 4 7.0
4 3 8.0
Column B is now stored as floats, because NaN is a floating-point value. to_numpy() still works:
M = df['B'].to_numpy()
print(M)
returns
[ 3. 8. nan 7. 8.]
and
M.dtype
returns
dtype('float64')
To replace missing values with a given number, one solution is to use the parameter na_value:
M = df['B'].to_numpy(na_value=-999)
returns
[ 3. 8. -999. 7. 8.]
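A self-contained sketch of the missing-value case on fixed values. The variable names and the sentinel -999 are illustrative. It also shows two alternatives that may suit some use cases: dropping the missing entries before converting, and casting the filled array back to integers once no NaN is left:

```python
import numpy as np
import pandas as pd

# A column with one missing value; pandas stores it as float64.
s = pd.Series([3, 8, np.nan, 7, 8], name='B')

# Option 1: replace missing values with a sentinel during the conversion.
filled = s.to_numpy(na_value=-999)
print(filled.dtype)  # float64

# Option 2: drop missing values first; the array is then shorter.
dropped = s.dropna().to_numpy()
print(len(dropped))  # 4

# Once no NaN remains, the array can be cast back to integers.
as_int = filled.astype(np.int64)
print(as_int.tolist())  # [3, 8, -999, 7, 8]
```

A sentinel such as -999 only makes sense if it cannot be confused with a real value in the column.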
Convert a column of strings
Another example with a column of strings:
new_col_df = pd.DataFrame(data=['a','b','c','d','e'],columns=['C'])
df = pd.concat([df,new_col_df], axis=1)
print(df)
returns
   A    B  C
0  2  3.0  a
1  9  8.0  b
2  4  NaN  c
3  4  7.0  d
4  3  8.0  e
then
M = df['C'].to_numpy()
returns
['a' 'b' 'c' 'd' 'e']
and
M.dtype
returns
dtype('O')
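dtype('O') is NumPy's generic object dtype: each element of the array is a reference to a Python object, here a str. If a fixed-width NumPy string array is preferred, one option is to cast the result with astype(str). This is only a sketch; whether the cast is useful depends on how the array is used afterwards. The names df_str, M_obj and M_str are illustrative:

```python
import pandas as pd

df_str = pd.DataFrame({'C': ['a', 'b', 'c', 'd', 'e']})

# Array of Python str objects, as in the example above.
M_obj = df_str['C'].to_numpy()

# Cast to a fixed-width NumPy unicode array.
M_str = M_obj.astype(str)
print(M_str.dtype.kind)  # U (NumPy unicode string)
print(M_str.tolist())    # ['a', 'b', 'c', 'd', 'e']
```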