pandas-groupby – Tarik Billa

Pandas groupby to to_csv

December 25, 2023 by Tarik

Try doing this:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That'll write the entire dataframe to the file. If you only want those two columns, then:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')

Here's a line-by-line explanation of the original code:

    # This creates a "groupby" object (not a dataframe object)
    # and you store it in the … Read more
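As a minimal, self-contained sketch of the same idea: the sample data, the extra `other` column, and the in-memory `io.StringIO` buffer (used here instead of a file on disk) are assumptions made for illustration. Passing `index=False` is also an addition; it keeps the row index out of the CSV.

```python
import io

import pandas as pd

# Hypothetical sample data; only the 'week' and 'count' names come from the post.
df = pd.DataFrame({
    "week": [1, 1, 2, 2, 2],
    "count": [3, 4, 1, 2, 5],
    "other": [10, 20, 30, 40, 50],
})

# groupby() returns a GroupBy object; sum() turns it back into a DataFrame,
# and reset_index() moves 'week' from the index into a regular column.
summed = df.groupby("week").sum().reset_index()

# Write only the two columns of interest to an in-memory buffer.
buffer = io.StringIO()
summed[["week", "count"]].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing `buffer` with a path such as `'week_grouped.csv'` writes the same content to disk.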

Sorting the grouped data as per group size in Pandas

December 9, 2023 by Tarik

For Pandas 0.17+, use sort_values:

    df.groupby('col1').size().sort_values(ascending=False)

For pre-0.17, you can use size().order():

    df.groupby('col1').size().order(ascending=False)
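A small runnable sketch of the `sort_values` variant, using made-up data (only the `col1` name comes from the post):

```python
import pandas as pd

# Hypothetical data: group 'a' has 2 rows, 'b' has 1, 'c' has 3.
df = pd.DataFrame({"col1": ["a", "a", "b", "c", "c", "c"]})

# size() counts the rows per group; sort_values() orders the groups by that count.
sizes = df.groupby("col1").size().sort_values(ascending=False)
print(sizes)
```

The largest group comes first, so the index of `sizes` can be used directly to iterate over groups from biggest to smallest.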

Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

November 30, 2023 by Tarik

This code:

    data['2013-08-17'].boxplot(by='SPECIES')

will not work, because boxplot is a DataFrame method, not a Series method. In Pandas > 0.18.1, however, the boxplot function has the argument column, which defines which column the data is taken from. So

    data.boxplot(column='2013-08-17', by='SPECIES')

should return the desired result. An example with the Iris dataset: import pandas … Read more

Including the group name in the apply function pandas python

August 22, 2023 by Tarik

I think you should be able to use the name attribute:

    temp_dataframe.groupby(level=0, axis=0).apply(lambda x: foo(x.name, x))

should work. Example:

    In [132]: df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})
              df
    Out[132]:
       a  b
    0  a  0
    1  a  1
    2  b  2
    3  c  3
    4  c  4
    5  c  5

    In [134]: df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:', x))
    name: a
    subdf: … Read more
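A minimal sketch of passing the group name into a function through `x.name`. The helper `describe` is hypothetical (it stands in for `foo`), and the data mirrors the example above. Depending on the pandas version, `apply` may emit a deprecation warning about grouping columns; the result is the same either way because the helper only reads column `b`.

```python
import pandas as pd


def describe(name, sub):
    """Hypothetical helper: summarise one group using its name and rows."""
    return f"{name}: {len(sub)} rows, b sum = {sub['b'].sum()}"


df = pd.DataFrame({"a": list("aabccc"), "b": range(6)})

# Inside apply, each sub-DataFrame carries its group key in the .name attribute.
result = df.groupby("a").apply(lambda x: describe(x.name, x))
print(result)
```

An explicit loop, `for name, sub in df.groupby("a"): ...`, yields the same (name, sub-DataFrame) pairs without relying on the `name` attribute.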

pandas: GroupBy .pipe() vs .apply()

August 22, 2023 by Tarik

pipe lets you pass a callable with the expectation that the object that called pipe is itself the object passed to the callable. With apply, we assume that the object that calls apply has subcomponents, each of which is passed in turn to the callable given to apply. In … Read more
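The difference can be sketched with a small example. The data and column names are invented for illustration; both calls compute the per-group range (max minus min), but the callable sees different objects.

```python
import pandas as pd

# Hypothetical data with two groups.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "value": [1, 5, 2, 4]})
grouped = df.groupby("key")["value"]

# pipe: the callable is called once and receives the whole GroupBy object,
# so g.max() and g.min() are themselves per-group aggregations.
piped = grouped.pipe(lambda g: g.max() - g.min())

# apply: the callable receives each group's Series in turn,
# so s.max() and s.min() are plain scalars for that one group.
applied = grouped.apply(lambda s: s.max() - s.min())

print(piped)
print(applied)
```

Here both approaches give the same numbers; `pipe` is mainly useful for chaining functions that expect a GroupBy object, while `apply` is for logic written against a single group.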

How can I group by month from a date field using Python and Pandas?

August 11, 2023 by Tarik

Try this:

    In [6]: df['date'] = pd.to_datetime(df['date'])

    In [7]: df
    Out[7]:
            date  Revenue
    0 2017-06-02      100
    1 2017-05-23      200
    2 2017-05-20      300
    3 2017-06-22      400
    4 2017-06-21      500

    In [59]: df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum().sort_values()
    Out[59]:
    date
    May      500
    June    1000
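Grouping on the month name merges the same month from different years into one group. As an alternative sketch (not the approach in the post), grouping on `dt.to_period('M')` keeps the year and sorts chronologically. The data below is the sample shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2017-06-02", "2017-05-23", "2017-05-20", "2017-06-22", "2017-06-21"],
    "Revenue": [100, 200, 300, 400, 500],
})
df["date"] = pd.to_datetime(df["date"])

# Each date is mapped to its year-month period, e.g. 2017-05, before grouping.
monthly = df.groupby(df["date"].dt.to_period("M"))["Revenue"].sum()
print(monthly)
```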

pandas groupby dropping columns

August 8, 2023 by Tarik

I think this is the automatic exclusion of 'nuisance' columns, as described here. Sample:

    df = pd.DataFrame({'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992, 4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106}, 'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'}, 'B': {0: 'one', 1: 'one', 2: … Read more
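A short sketch of the effect, with invented values. Older pandas versions dropped non-numeric columns silently when aggregating; since pandas 2.0 the `numeric_only` default changed, so the exclusion is requested explicitly here, which should behave the same way across versions.

```python
import pandas as pd

# Hypothetical data: 'B' holds strings, so it cannot be averaged.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "three"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# numeric_only=True excludes the string column 'B' from the result.
means = df.groupby("A").mean(numeric_only=True)
print(means)
```

Only column `C` survives in `means`; `A` becomes the index and `B` is left out.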

Python Pandas Conditional Sum with Groupby

August 7, 2023 by Tarik

First groupby the key1 column:

    In [11]: g = df.groupby('key1')

and then for each group take the subDataFrame where key2 equals 'one' and sum the data1 column:

    In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
    Out[12]:
    key1
    a    0.093391
    b    1.468194
    dtype: float64

To explain what's going on, let's look at the 'a' group: In [21]: … Read more
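A runnable sketch of the conditional sum with small invented numbers, so the result is easy to check by hand. It shows the `apply` approach from the post (selecting the needed columns first, which is an addition here) next to a filter-then-group alternative.

```python
import pandas as pd

# Hypothetical data using the column names from the post.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a"],
    "key2": ["one", "two", "one", "two", "one"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# apply: within each key1 group, keep rows where key2 == 'one' and sum data1.
by_apply = df.groupby("key1")[["key2", "data1"]].apply(
    lambda x: x.loc[x["key2"] == "one", "data1"].sum()
)

# Alternative: filter the rows first, then group and sum.
filtered = df[df["key2"] == "one"].groupby("key1")["data1"].sum()

print(by_apply)
print(filtered)
```

The two agree here, but they differ when a group has no matching rows: the `apply` version reports 0 for that group, while the filter-first version omits the group entirely.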