How to read categorical columns with pandas’ read_csv?

Since pandas 0.19.0 you can use the parameter dtype="category" in read_csv:

    from io import StringIO  # pd.compat.StringIO was removed in later pandas releases
    import pandas as pd

    data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
    df = pd.read_csv(StringIO(data), dtype="category")
    print(df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3
    print(df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object

If you want to read only specific columns as category, use dtype with a dictionary:

    df = pd.read_csv(StringIO(data), …
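A minimal sketch of the dictionary form of dtype, assuming a recent pandas and using io.StringIO as the in-memory file. The choice of col1 as the only categorical column is illustrative and does not come from the truncated answer.

```python
from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Only col1 is read as category (illustrative choice);
# the remaining columns keep the dtypes pandas infers.
df = pd.read_csv(StringIO(data), dtype={"col1": "category"})
print(df.dtypes)
```

Columns left out of the dictionary are parsed as usual, so col3 stays numeric here.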

Correlation among multiple categorical variables

You can use pd.factorize:

    df.apply(lambda x: pd.factorize(x)[0]).corr(method='pearson', min_periods=1)
    Out[32]:
         a    c    d
    a  1.0  1.0  1.0
    c  1.0  1.0  1.0
    d  1.0  1.0  1.0

Data input:

    df = pd.DataFrame({'a': ['a', 'b', 'c'], 'c': ['a', 'b', 'c'], 'd': ['a', 'b', 'c']})

Update:

    from scipy.stats import chisquare
    df = df.apply(lambda x: pd.factorize(x)[0]) + 1
    pd.DataFrame([chisquare(df[x].values, f_exp=df.values.T, axis=1)[0] for x in df])
    Out[123]:
         0    1    2    3
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  …
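The factorize step can be sketched on its own, using the same three-column input as the answer. Note that the integer codes follow the order of first appearance, so a Pearson correlation on them reflects that arbitrary ordering and should be read with caution.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "c": ["a", "b", "c"], "d": ["a", "b", "c"]})

# pd.factorize returns (codes, uniques); keep only the integer codes,
# which are assigned in order of first appearance within each column.
codes = df.apply(lambda col: pd.factorize(col)[0])

# Pearson correlation between the encoded columns.
corr = codes.corr(method="pearson", min_periods=1)
print(codes)
print(corr)
```

Because the three columns are identical, every pairwise correlation comes out as 1.0, which matches the output shown in the answer.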

How to know the labels assigned by astype('category').cat.codes?

You can generate a dictionary:

    c = language.lang.astype('category')
    d = dict(enumerate(c.cat.categories))
    print(d)
    {0: 'english', 1: 'spanish'}

Then, if necessary, you can map the codes back to their labels:

    language['code'] = language.lang.astype('category').cat.codes
    language['level_back'] = language['code'].map(d)
    print(language)
          lang         level  code level_back
    0  english  intermediate     0    english
    1  spanish  intermediate     1    spanish
    2  spanish         basic     1    spanish
    3  english         basic     0    english

…
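For a self-contained version, the sketch below rebuilds the language frame from the printed output (the excerpt does not show how it was created, so the constructor is an assumption) and round-trips the codes back to labels. The name code_to_label is illustrative.

```python
import pandas as pd

# Reconstructed from the printed output in the answer (assumed construction).
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english"],
    "level": ["intermediate", "intermediate", "basic", "basic"],
})

c = language["lang"].astype("category")

# Position in cat.categories is exactly the integer used by cat.codes.
code_to_label = dict(enumerate(c.cat.categories))

language["code"] = c.cat.codes
language["level_back"] = language["code"].map(code_to_label)
print(code_to_label)
print(language)
```

Categories are sorted when inferred from string data, which is why english gets code 0 and spanish gets code 1.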

Get a list of categories of categorical variable

I believe you need Series.cat.categories or unique:

    np.random.seed(1245)
    a = ['No', 'Yes', 'Maybe']
    df = pd.DataFrame(np.random.choice(a, size=(10, 3)), columns=['Col1', 'Col2', 'Col3'])
    df['Col1'] = pd.Categorical(df['Col1'])
    print(df.dtypes)
    Col1    category
    Col2      object
    Col3      object
    dtype: object

    print(df['Col1'].cat.categories)
    Index(['Maybe', 'No', 'Yes'], dtype='object')

    print(df['Col2'].unique())
    ['Yes' 'Maybe' 'No']

    print(df['Col1'].unique())
    [Maybe, No, Yes]
    Categories (3, object): [Maybe, No, Yes]
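The two approaches differ when a declared category never occurs in the data. The following small sketch, with made-up values, shows that cat.categories lists every declared category while unique only returns the values that are present.

```python
import pandas as pd

# "Maybe" is declared as a category but never appears in the data (made-up example).
s = pd.Series(pd.Categorical(["No", "Yes", "No"], categories=["Maybe", "No", "Yes"]))

all_categories = list(s.cat.categories)  # every declared category
observed = list(s.unique())              # only values actually present, in order of appearance

print(all_categories)
print(observed)
```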

Issue with OneHotEncoder for categorical features

If you read the docs for OneHotEncoder, you'll see that the input for fit is "Input array of type int". So you need two steps to one-hot encode your data:

    from sklearn import preprocessing

    cat_features = ['color', 'director_name', 'actor_2_name']
    enc = preprocessing.LabelEncoder()
    enc.fit(cat_features)
    new_cat_features = enc.transform(cat_features)
    print(new_cat_features)  # [1 2 0]
    new_cat_features …
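The integer-only restriction quoted above applies to older scikit-learn releases. In scikit-learn 0.20 and later, OneHotEncoder accepts string columns directly, so the LabelEncoder step can be skipped. A minimal sketch under that assumption follows; the sample rows are hypothetical, and only the column names come from the question.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows; only the column names come from the question.
movies = pd.DataFrame({
    "color": ["Color", "Black and White", "Color"],
    "director_name": ["A", "B", "A"],
})

# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
enc = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; convert it for display.
encoded = enc.fit_transform(movies).toarray()

print(enc.categories_)
print(encoded)
```

Each row has exactly one 1 per input column, so with two columns of two categories each the result has four output columns.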

