How to read categorical columns with pandas’ read_csv?

In version 0.19.0 and later you can pass the parameter dtype="category" to read_csv:

    import io
    import pandas as pd

    data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"
    df = pd.read_csv(io.StringIO(data), dtype="category")
    print(df)
      col1 col2 col3
    0    a    b    1
    1    a    b    2
    2    c    d    3
    print(df.dtypes)
    col1    category
    col2    category
    col3    category
    dtype: object

If you want to specify which columns are categorical, pass dtype a dictionary:

    df = pd.read_csv(io.StringIO(data), …
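A minimal sketch of the dictionary form, assuming only col1 and col2 should become categorical (the column choice is illustrative). Columns not listed in the dictionary keep pandas' normal type inference, so col3 stays an integer column:

```python
import io

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Only the columns named in the dict are read as categorical;
# col3 is left to the default type inference (integers here).
df = pd.read_csv(io.StringIO(data), dtype={"col1": "category", "col2": "category"})

print(df.dtypes)
# The distinct values found in col1 become its categories.
print(df["col1"].cat.categories.tolist())
```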

How to drop a row whose particular column is empty/NaN?

Use dropna with the subset parameter to specify which column to check for NaNs:

    data = data.dropna(subset=['sms'])
    print(data)
       id city department   sms  category
    1   2  lhr    revenue  good         1

Another solution uses boolean indexing with notnull:

    data = data[data['sms'].notnull()]
    print(data)
       id city department   sms  category
    1   2  lhr    revenue  good         1

An alternative with query:

    print(data.query("sms …
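To check that dropna(subset=...) and notnull-based boolean indexing select the same rows, here is a small self-contained sketch. The sample frame is hypothetical, built so that only one row has a non-missing sms value, matching the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the row at index 1 has a non-missing 'sms'.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["khi", "lhr", "isb"],
    "department": ["revenue", "revenue", "sales"],
    "sms": [np.nan, "good", np.nan],
    "category": [1, 1, 2],
})

# Option 1: drop rows where the 'sms' column is NaN.
by_dropna = data.dropna(subset=["sms"])

# Option 2: keep rows where 'sms' is not null (boolean indexing).
by_mask = data[data["sms"].notnull()]

print(by_dropna)
# Both approaches keep the same rows with the same original index labels.
print(by_dropna.equals(by_mask))
```

Note that both approaches keep the original index labels (here 1), so call reset_index(drop=True) afterwards if a fresh 0..n-1 index is needed.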

ImportError: IProgress not found. Please update jupyter and ipywidgets although it is installed

I tried everything you mentioned in a new conda environment and ran into another issue related to the ipywidgets version (a bug reported on GitHub, with comments saying it was solved by using the latest version). I solved the problem by installing the latest version of ipywidgets. Here is my process: Create a new …

JOIN two dataframes on common column in pandas

Use merge:

    print(pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0

Another solution is to simply rename the column:

    print(pd.merge(df1, df2.rename(columns={'id1': 'id'}), on='id', how='left')) …
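The two merge variants can be compared side by side in a sketch like the following. The inputs df1 and df2 are hypothetical, shaped so that the left join reproduces the table above (id 4 has no match in df2, so its price and rating become NaN):

```python
import pandas as pd

# Hypothetical inputs shaped to match the output shown above.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Approach 1: join on differently named key columns, then drop the extra key.
merged_a = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)

# Approach 2: rename the key in df2 first so both frames share the column 'id'.
merged_b = pd.merge(df1, df2.rename(columns={"id1": "id"}), on="id", how="left")

print(merged_a)
print(merged_a.equals(merged_b))
```

The rename variant avoids producing a redundant key column in the first place, which is why no drop is needed afterwards.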

How to convert single-row pandas data frame to series?

You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the result into a series (the inverse of to_frame). Passing axis=0 to squeeze collapses the single row directly, without transposing first:

    >>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
    >>> df.squeeze(axis=0)
    a0    0
    a1    1
    a2    2
    a3    3
    a4    4
    Name: 0, dtype: int64

Note: To accommodate the point raised by …
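Here is a short sketch comparing the transpose-then-squeeze route with squeeze(axis=0), plus the round trip back through to_frame. Both produce a Series indexed by the original column names and named after the original row label (0):

```python
import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

# Transpose, then squeeze: df.T is a 5x1 frame, and squeeze()
# collapses its single column into a Series.
via_transpose = df.T.squeeze()

# Squeezing the row axis directly gives the same Series.
via_axis = df.squeeze(axis=0)

print(via_axis.equals(via_transpose))

# Going back: to_frame() yields a one-column frame; .T restores the one-row layout.
back = via_axis.to_frame().T
print(back)
```

A caveat worth knowing: if the frame unexpectedly has more than one row, squeeze(axis=0) returns the DataFrame unchanged rather than raising, so check len(df) == 1 first if that matters.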

Looking for pandas “ungroup by” operation opposite to .groupby in the following string aggregation?

The rough equivalent is .reset_index(), but it may not be helpful to think of it as the “opposite” of groupby(). You are splitting a string into pieces and maintaining each piece’s association with ‘family’. This old answer of mine does the job. Just set ‘family’ as the index column first, refer to the link …
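The idea of setting ‘family’ as the index and then splitting can be sketched with Series.str.split followed by Series.explode (available since pandas 0.25). This is one possible technique, not necessarily what the linked answer does, and the frame and its name column are hypothetical:

```python
import pandas as pd

# Hypothetical aggregated frame: one row per family, names joined by commas.
agg = pd.DataFrame({"family": ["a", "b"], "name": ["x,y", "z"]})

# Index by 'family' so every piece keeps its association with it,
# split each string into a list, then explode to one row per piece.
ungrouped = (agg.set_index("family")["name"]
             .str.split(",")
             .explode()
             .reset_index())

print(ungrouped)
```

The final reset_index() turns ‘family’ back into a regular column, which is the “rough equivalent” step mentioned above.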

How to use rolling functions for GroupBy objects

For the Googlers who come upon this old question: regarding @kekert's comment on @Garrett's answer to use the new

    df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

    df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, the new .rolling().mean() approach seems to return a multi-indexed series, indexed first by the groupby column and then by the original index, whereas the old approach would simply …
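A minimal sketch of the multi-index behaviour and one common way to realign the result with the original frame by dropping the group level. The data is hypothetical, and min_periods=1 is passed to mirror the old pd.rolling_mean call:

```python
import pandas as pd

# Hypothetical data with two interleaved groups.
df = pd.DataFrame({"id": [1, 2, 1, 2, 1], "x": [1.0, 10.0, 3.0, 20.0, 5.0]})

# Group-wise rolling mean; the result is indexed by (id, original row index).
rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()
print(rolled.index.nlevels)

# Drop the 'id' level so the result aligns with df's original index again;
# assignment then matches values to rows by index label.
df["x_roll"] = rolled.reset_index(level=0, drop=True)
print(df)
```

Because assignment aligns on the index, the rolled values land on the correct rows even though the grouped result is ordered by id rather than by original row position.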