Examples of how to convert a dataframe column to an array
Create a dataframe
Let's create a dataframe with pandas
import pandas as pd
import numpy as np

data = np.random.randint(10, size=(5,2))
df = pd.DataFrame(data=data, columns=['A','B'])
print(df)
prints, for example (the values are random, so they change from run to run):
   A  B
0  2  3
1  9  8
2  4  8
3  4  7
4  3  8
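Because the data are random, the printed values differ on every run. A minimal sketch of a reproducible variant, assuming NumPy's Generator API is available (the seed value 42 is arbitrary and not part of the original example):

```python
import numpy as np
import pandas as pd

# Seeded generator so that every run produces the same values.
# The seed (42) is an arbitrary choice for illustration.
rng = np.random.default_rng(42)

# Integers drawn from 0 to 9 (10 is excluded), 5 rows and 2 columns
data = rng.integers(0, 10, size=(5, 2))
df = pd.DataFrame(data=data, columns=['A', 'B'])
print(df)
```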
Convert a column of numbers
To convert a dataframe column to an array, a solution is to use pandas.Series.to_numpy (selecting a single column with df['B'] gives a Series). Example with the column called 'B':
M = df['B'].to_numpy()
returns
array([3, 8, 8, 7, 8])
To check the type:
type(M)
returns
numpy.ndarray
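The steps above can be put together in one self-contained sketch. The values below are fixed, and chosen only for illustration, so that the result does not depend on the random generator:

```python
import numpy as np
import pandas as pd

# Fixed illustrative values instead of random ones
df = pd.DataFrame({'A': [2, 9, 4, 4, 3], 'B': [3, 8, 8, 7, 8]})

# Selecting one column gives a Series; to_numpy() converts it to an array
M = df['B'].to_numpy()

print(type(M))   # <class 'numpy.ndarray'>
print(M.shape)   # (5,)
print(M.dtype)   # an integer dtype (int64 on most platforms)
```

The result is one-dimensional, with one element per row of the dataframe.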
Column with missing value(s)
If a missing value np.nan is inserted in the column:
df.iloc[2,1] = np.nan
print(df)
returns
   A    B
0  2  3.0
1  9  8.0
2  4  NaN
3  4  7.0
4  3  8.0
to_numpy() still works:
M = df['B'].to_numpy()
print(M)
returns
[ 3.  8. nan  7.  8.]
and
M.dtype
returns
dtype('float64')
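The dtype is float64 because np.nan is itself a floating-point value, so an integer array cannot hold it. As a minimal sketch, the missing entries can be located in the resulting array with np.isnan. The column is built directly with a NaN here, which is an assumption made to keep the example self-contained:

```python
import numpy as np
import pandas as pd

# Column 'B' built directly with a missing value (illustrative data)
df = pd.DataFrame({'A': [2, 9, 4, 4, 3],
                   'B': [3, 8, np.nan, 7, 8]})

M = df['B'].to_numpy()

# Boolean mask that is True where the array holds NaN
mask = np.isnan(M)

print(M.dtype)             # float64
print(np.flatnonzero(mask))  # positions of the missing values
print(M[~mask])            # values that are not missing
```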
To replace missing values with a given number, a solution is to use the parameter na_value:
M = df['B'].to_numpy(na_value=-999)
returns
[ 3. 8. -999. 7. 8.]
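A short sketch of how na_value behaves, using illustrative data: the replacement is made in the returned array only, so the dataframe keeps its NaN. Once the NaN is replaced, the array can be cast back to integers with astype if an integer array is wanted. The sentinel -999 is taken from the example above.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in column 'B'
df = pd.DataFrame({'A': [2, 9, 4, 4, 3],
                   'B': [3, 8, np.nan, 7, 8]})

# Replace NaN with -999 in the returned array
M = df['B'].to_numpy(na_value=-999)
print(M.dtype)               # float64: the column was float to begin with

# The dataframe itself is not modified
print(df['B'].isna().sum())  # 1

# With no NaN left, casting to an integer dtype is safe
M_int = M.astype(int)
print(M_int)
```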
Convert a column of strings
Another example with a column of strings:
new_col_df = pd.DataFrame(data=['a','b','c','d','e'], columns=['C'])
df = pd.concat([df, new_col_df], axis=1)
print(df)
returns
   A    B  C
0  2  3.0  a
1  9  8.0  b
2  4  NaN  c
3  4  7.0  d
4  3  8.0  e
then
M = df['C'].to_numpy()
returns
['a' 'b' 'c' 'd' 'e']
and
M.dtype
returns
dtype('O')
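dtype('O') means a NumPy object array, in which each element is a reference to a Python string. If a fixed-width NumPy string array is preferred, one possible approach, shown here as a sketch rather than the only option, is to cast the result with astype(str):

```python
import numpy as np
import pandas as pd

# Illustrative column of strings
df = pd.DataFrame({'C': ['a', 'b', 'c', 'd', 'e']})

M = df['C'].to_numpy()
print(M.dtype)   # typically object

# Cast to a fixed-width unicode dtype; the width is taken
# from the longest string in the array
U = M.astype(str)
print(U.dtype)   # a unicode dtype such as <U1
print(U)
```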
