How to read categorical columns with pandas’ read_csv?

Since pandas 0.19.0, you can pass dtype="category" to read_csv to parse every column as categorical:

import io
import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
df = pd.read_csv(io.StringIO(data), dtype="category")  # io.StringIO: pd.compat.StringIO was removed from newer pandas
print(df)
  col1 col2 col3
0    a    b    1
1    a    b    2
2    c    d    3

print(df.dtypes)
col1    category
col2    category
col3    category
dtype: object
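
Note that with dtype="category" the category labels are parsed as strings, even for a numeric-looking column such as col3. The following is a minimal sketch, reusing the sample data above, of how the labels could be converted to numbers while keeping the column categorical; the variable name numeric_labels is only illustrative.

```python
import io
import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
df = pd.read_csv(io.StringIO(data), dtype="category")

# The categories of col3 are the strings '1', '2', '3', not integers.
print(df["col3"].cat.categories)

# Convert the category labels to numbers; the column stays categorical.
numeric_labels = pd.to_numeric(df["col3"].cat.categories)
df["col3"] = df["col3"].cat.rename_categories(numeric_labels)
print(df["col3"].cat.categories)
```

If the column is really numeric data rather than a set of labels, leaving it out of the category conversion (as in the dictionary form of dtype) is probably simpler.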

To convert only specific columns to category, pass a dictionary to dtype; the remaining columns keep their inferred types:

import io  # standard library; pd.compat.StringIO no longer exists in newer pandas
df = pd.read_csv(io.StringIO(data), dtype={'col1': 'category'})
print(df)
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

print(df.dtypes)
col1    category
col2      object
col3       int64
dtype: object
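
If you also need to control which categories exist, or their order, the dictionary can hold a CategoricalDtype instead of the string 'category'. This is a sketch that assumes a reasonably recent pandas (as far as I know, 0.21 or later); the category list ["a", "c", "z"] and the names col1_type and df2 are made up for illustration.

```python
import io
import pandas as pd
from pandas.api.types import CategoricalDtype

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Hypothetical category set: 'z' never occurs in the data but is still a valid level.
col1_type = CategoricalDtype(categories=["a", "c", "z"], ordered=True)

df2 = pd.read_csv(io.StringIO(data), dtype={"col1": col1_type})
print(df2["col1"].cat.categories)
print(df2["col1"].cat.ordered)
print(df2.dtypes)
```

Values in the file that are not listed in the categories would be read as missing (NaN), so the list should cover every value you expect.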
