Pandas groupby to to_csv

Try this:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That writes the entire summed DataFrame to the file. If you only want those two columns:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')

Here's a line-by-line explanation of the original code:

    # This creates a "groupby" object (not a dataframe object)
    # and you store it in the …
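As a minimal sketch of the same idea, the snippet below sums a small made-up frame by week and writes the result as CSV. The sample values, the in-memory buffer used instead of a file, and `index=False` are assumptions added for illustration; only the `week` and `count` column names come from the excerpt.

```python
import io

import pandas as pd

# Hypothetical data; only the column names follow the excerpt.
df = pd.DataFrame({"week": [1, 1, 2, 2, 2], "count": [3, 4, 1, 5, 2]})

week_grouped = df.groupby("week")          # a GroupBy object, not a DataFrame
totals = week_grouped.sum().reset_index()  # aggregating gives a DataFrame back

# Write to an in-memory buffer so the sketch leaves no file behind;
# passing a path such as 'week_grouped.csv' works the same way.
buffer = io.StringIO()
totals[["week", "count"]].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The point to notice is that `to_csv` is called on the aggregated DataFrame, not on the GroupBy object itself.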

Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

This code:

    data['2013-08-17'].boxplot(by='SPECIES')

will not work, because boxplot is a DataFrame method, not a Series method. In pandas > 0.18.1, the boxplot function has a column argument that specifies which column the data is taken from. So

    data.boxplot(column='2013-08-17', by='SPECIES')

should return the desired result. An example with the Iris dataset:

    import pandas …
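A small sketch of the difference, using randomly generated numbers as a stand-in for the question's data (the values, the species labels, and the non-interactive `Agg` backend are assumptions for illustration, and matplotlib must be installed):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one measurement column named after a date
# and one SPECIES label column.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "2013-08-17": rng.normal(size=30),
    "SPECIES": np.repeat(["setosa", "versicolor", "virginica"], 10),
})

# Selecting a single column yields a Series, which has no boxplot method.
series_has_boxplot = hasattr(data["2013-08-17"], "boxplot")

# On the DataFrame, `column` picks the values and `by` picks the groups.
data.boxplot(column="2013-08-17", by="SPECIES")

# Render the current figure to an in-memory PNG instead of a file.
png = io.BytesIO()
plt.gcf().savefig(png, format="png")
```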

Including the group name in the apply function pandas python

I think you should be able to use the name attribute:

    temp_dataframe.groupby(level=0, axis=0).apply(lambda x: foo(x.name, x))

should work. Example:

    In [132]: df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})
              df
    Out[132]:
       a  b
    0  a  0
    1  a  1
    2  b  2
    3  c  3
    4  c  4
    5  c  5

    In [134]: df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:', x))
    name: a
    subdf: …
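To make the pattern concrete, here is a sketch in which a hypothetical helper `foo` receives both the group key and the sub-DataFrame. The body of `foo` and the explicit `[["b"]]` column selection are assumptions for illustration; the selection is there so the call behaves the same across pandas versions that treat grouping columns differently inside `apply`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": list("aabccc"), "b": np.arange(6)})


def foo(name, group):
    # Hypothetical helper: label each group's total with its group key.
    return f"{name}: {group['b'].sum()}"


# Inside apply, each sub-DataFrame carries its group key as `.name`.
labels = df.groupby("a")[["b"]].apply(lambda g: foo(g.name, g))
print(labels)
```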

pandas groupby dropping columns

I think this is the automatic exclusion of 'nuisance' columns, as described in the pandas groupby documentation. Sample:

    df = pd.DataFrame({'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992, 4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106}, 'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'}, 'B': {0: 'one', 1: 'one', 2: …
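A version-independent way to see the effect: older pandas releases dropped non-numeric columns from aggregations like `mean()` silently, while pandas 2.0 and later raise an error unless you opt in. The sketch below uses a small made-up frame (not the one from the excerpt) and states the numeric-only choice explicitly, so it should behave the same either way.

```python
import pandas as pd

# Hypothetical data: B is a non-numeric "nuisance" column.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Ask for numeric columns only, so B is excluded from the mean on purpose.
means = df.groupby("A").mean(numeric_only=True)

# Equivalent alternative: select the columns to aggregate up front.
explicit = df.groupby("A")[["C"]].mean()

print(means)
```

Selecting the columns up front tends to be the clearer choice, because the result no longer depends on which columns happen to be numeric.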

Python Pandas Conditional Sum with Groupby

First group by the key1 column:

    In [11]: g = df.groupby('key1')

and then, for each group, take the sub-DataFrame where key2 equals 'one' and sum the data1 column:

    In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
    Out[12]:
    key1
    a    0.093391
    b    1.468194
    dtype: float64

To explain what's going on, let's look at the 'a' group:

    In [21]: …
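The same conditional sum on a small made-up frame, as a sketch. The data values are hypothetical, and the filter-first variant is an alternative added here, not part of the excerpt. The explicit column selection before `apply` is an assumption made to keep the call stable across pandas versions.

```python
import pandas as pd

# Hypothetical data with the column names used in the excerpt.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Per group: keep the rows where key2 == 'one', then sum data1.
g = df.groupby("key1")[["key2", "data1"]]
by_apply = g.apply(lambda x: x[x["key2"] == "one"]["data1"].sum())

# Alternative: filter first, then group. A group with no matching rows
# disappears from this result instead of showing a sum of 0.
by_filter = df[df["key2"] == "one"].groupby("key1")["data1"].sum()

print(by_apply)
```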

Groupby class and count missing values in features

Compute a mask with isna, then group and take the sum:

    df.drop(columns='CLASS').isna().groupby(df.CLASS, sort=False).sum().reset_index()

      CLASS  FEATURE1  FEATURE2  FEATURE3
    0     X       1.0       1.0       2.0
    1     B       0.0       0.0       0.0

Another option is to subtract the count from the size, using rsub along the 0th axis for index-aligned subtraction:

    df.groupby('CLASS').count().rsub(df.groupby('CLASS').size(), axis=0)

Or,

    g = df.groupby('CLASS')
    g.count().rsub(g.size(), …
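Both approaches can be checked side by side on a small frame. The data below is made up so that it reproduces the counts shown in the excerpt's output (X: 1, 1, 2 and B: 0, 0, 0); treat it as a sketch, not the original poster's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen to give the per-class missing counts above.
df = pd.DataFrame({
    "CLASS": ["X", "X", "B", "B"],
    "FEATURE1": [np.nan, 1.0, 2.0, 3.0],
    "FEATURE2": [np.nan, 1.0, 2.0, 3.0],
    "FEATURE3": [np.nan, np.nan, 3.0, 4.0],
})

# Option 1: boolean mask of missing values, summed within each class.
mask_counts = (
    df.drop(columns="CLASS")
    .isna()
    .groupby(df["CLASS"], sort=False)
    .sum()
    .reset_index()
)

# Option 2: rows per class minus non-missing values per class.
# rsub computes size - count, aligned on the CLASS index.
g = df.groupby("CLASS")
size_minus_count = g.count().rsub(g.size(), axis=0)

print(mask_counts)
print(size_minus_count)
```

Note that the first option keeps the classes in order of appearance because of `sort=False`, while the second returns them sorted by class label.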