How to convert a dataframe column to an array with pandas?

Published: March 30, 2021

Tags: Python; Pandas; DataFrame;


Examples of how to convert a dataframe column to a NumPy array:

Create a dataframe

Let's create a dataframe with pandas:

import pandas as pd
import numpy as np

data = np.random.randint(10, size=(5,2))

df = pd.DataFrame(data=data,columns=['A','B'])

print(df)

returns, for example (the values are random, so yours will differ):

   A  B
0  2  3
1  9  8
2  4  8
3  4  7
4  3  8

Convert a column of numbers

To convert a dataframe column to an array, a solution is to use pandas.Series.to_numpy (selecting a single column such as df['B'] gives a Series). Example with the column called 'B':

M = df['B'].to_numpy()

returns

array([3, 8, 8, 7, 8])

To check the type:

type(M)

returns

numpy.ndarray
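
As a minimal, self-contained sketch of the conversion above, the snippet below uses fixed values (the ones printed earlier) instead of random ones, so the result is reproducible. The variable names M_2d and the shape checks are only illustrative additions. It also contrasts a single column, which gives a 1-D array, with a selection of several columns, which gives a 2-D array through pandas.DataFrame.to_numpy.

```python
import pandas as pd

# Fixed data (same values as the example printed above) for reproducibility
df = pd.DataFrame({'A': [2, 9, 4, 4, 3], 'B': [3, 8, 8, 7, 8]})

# A single column is a Series, so this calls Series.to_numpy
M = df['B'].to_numpy()
print(type(M))   # numpy.ndarray
print(M.shape)   # one-dimensional: (5,)

# Selecting a list of columns gives a DataFrame, so this calls DataFrame.to_numpy
M_2d = df[['A', 'B']].to_numpy()
print(M_2d.shape)  # two-dimensional: (5, 2)
```

Note that df['B'] (single brackets) and df[['B']] (double brackets) behave differently here: the first produces a 1-D array, the second a 2-D array with one column.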

Column with missing value(s)

If a missing value np.nan is inserted in the column:

df.iloc[2,1] = np.nan

print(df)

returns

   A    B
0  2  3.0
1  9  8.0
2  4  NaN
3  4  7.0
4  3  8.0

to_numpy() still works:

M = df['B'].to_numpy()

print(M)

returns

[ 3.  8. nan  7.  8.]

and

M.dtype

returns

dtype('float64')
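
The dtype changes to float64 because np.nan is itself a floating-point value, so an integer column cannot hold it without being upcast. The sketch below illustrates this with two hypothetical Series built directly from lists (rather than by assigning np.nan into an existing integer column, which recent pandas versions may warn about), and shows one way to locate the missing entries in the resulting array.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: one without and one with a missing value
ints = pd.Series([3, 8, 8, 7, 8], name='B')
with_nan = pd.Series([3, 8, np.nan, 7, 8], name='B')

M_int = ints.to_numpy()
M_nan = with_nan.to_numpy()

print(M_int.dtype)  # an integer dtype (typically int64)
print(M_nan.dtype)  # float64, because NaN is a float

# Boolean mask marking the missing entries in the array
mask = np.isnan(M_nan)
print(mask)
```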

To replace missing values with a given number, a solution is to use the parameter na_value:

M = df['B'].to_numpy(na_value=-999)

returns

[   3.    8. -999.    7.    8.]
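
Note that the array above is still of type float64, even though the NaN has been replaced. If an integer array is wanted, one possible approach (a sketch, not the only way) is to fill the missing values first with fillna and then cast before converting. The Series s and the sentinel value -999 below are just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series([3, 8, np.nan, 7, 8], name='B')

# Replace NaN during the conversion; the result stays float64
filled = s.to_numpy(na_value=-999)
print(filled.dtype)

# Alternative: fill, cast to integers, then convert
as_int = s.fillna(-999).astype('int64').to_numpy()
print(as_int.dtype)
print(as_int)
```

The sentinel should be a value that cannot occur in the real data, otherwise missing and genuine entries become indistinguishable.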

Convert a column of strings

Another example with a column of strings:

new_col_df = pd.DataFrame(data=['a','b','c','d','e'],columns=['C'])

df = pd.concat([df,new_col_df], axis=1)

print(df)

returns

   A    B  C
0  2  3.0  a
1  9  8.0  b
2  4  NaN  c
3  4  7.0  d
4  3  8.0  e

then

M = df['C'].to_numpy()

returns

['a' 'b' 'c' 'd' 'e']

and

M.dtype

returns

dtype('O')
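
dtype('O') means the array holds generic Python objects rather than a native NumPy string type. If a fixed-width NumPy string array is preferred, one option (sketched below with a hypothetical Series s) is to cast the array with astype(str) after the conversion.

```python
import pandas as pd

# Hypothetical column of strings
s = pd.Series(['a', 'b', 'c', 'd', 'e'], name='C')

M = s.to_numpy()
print(M.dtype)

# Cast to a NumPy unicode string dtype
M_str = M.astype(str)
print(M_str.dtype)
print(M_str)
```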
