pandas-groupby – Page 2

Python Pandas: Calculate moving average within group

July 28, 2023 by Tarik

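You can use rolling with transform: df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean()). The 1 in rolling(10, 1) is min_periods, the minimum number of observations needed to produce a value, so the first rows of each group get a partial-window average instead of NaN.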
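A minimal sketch of the same idea on made-up data. The window is shrunk from 10 to 2 rows so the effect is visible, and the groups are interleaved to show that transform() aligns each group's result back to the original row order.

```python
import pandas as pd

# Toy data (hypothetical values); column names mirror the post.
df = pd.DataFrame({
    "object": ["a", "b", "a", "b", "a"],
    "value": [1, 10, 2, 20, 3],
})

# Rolling mean over 2 rows within each group; min_periods=1 lets the first
# row of each group produce a value instead of NaN.
df["moving"] = df.groupby("object")["value"].transform(
    lambda x: x.rolling(2, min_periods=1).mean()
)
print(df)
```

Calling df.groupby('object')['value'].rolling(2, min_periods=1).mean() directly also works, but it returns a Series indexed by (object, original index), so it needs reset_index(level=0, drop=True) before it can be assigned back as a column.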

Transform vs. aggregate in Pandas

June 13, 2023 by Tarik

Consider the dataframe df: df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9])). groupby is the standard way to aggregate: df.groupby('A').mean(). Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with. Use transform: df.groupby('A').transform('mean') or df.set_index('A').groupby(level='A').transform('mean'). agg is used when you have specific …
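A short sketch contrasting the two on the post's own dataframe. The last line is an illustrative addition showing why keeping the original index matters: the broadcast group mean can be subtracted row by row.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aabb"), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation: one row per group, indexed by the group key.
# Group a -> B=1.5, C=4.5; group b -> B=3.5, C=4.5
agg = df.groupby("A").mean()

# Transform: the group mean is repeated on every row and keeps df's index 0..3.
broadcast = df.groupby("A").transform("mean")

# Because the index matches, the result lines up row by row with df.
df["B_minus_mean"] = df["B"] - df.groupby("A")["B"].transform("mean")
print(agg, broadcast, df, sep="\n\n")
```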

Pandas, groupby and count

June 11, 2023 by Tarik

You seem to want to group by several columns at once: df.groupby(['revenue', 'session', 'user_id'])['user_id'].count() should give you what you want.
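A minimal sketch on hypothetical rows (only the column names come from the question). It uses .size(), which counts rows per group, instead of ['user_id'].count(), which counts non-null values; because rows with a missing key are dropped from the grouping by default, the two agree here.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "revenue": [10, 10, 10, 20],
    "session": [1, 1, 2, 1],
    "user_id": [100, 100, 100, 200],
})

# Number of rows for each (revenue, session, user_id) combination.
counts = df.groupby(["revenue", "session", "user_id"]).size()

# reset_index(name=...) turns the MultiIndex result back into flat columns.
counts_df = counts.reset_index(name="count")
print(counts_df)
```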

How to do group by on a multiindex in pandas?

May 17, 2023 by Tarik

You can create the index on the existing dataframe. With the subset of data provided, this works for me:
import pandas
df = pandas.DataFrame.from_dict(
    {
        'category': {0: 'Love', 1: 'Love', 2: 'Fashion', 3: 'Fashion', 4: 'Hair', 5: 'Movies', 6: 'Movies', 7: 'Health', 8: 'Health', 9: 'Celebs', 10: 'Celebs', 11: 'Travel', 12: 'Weightloss', 13: 'Diet', 14: …
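The original dataset is truncated, so here is a sketch on made-up data: the source and clicks columns and all values are hypothetical. It builds a MultiIndex with set_index and then groups by one or more index levels with groupby(level=...).

```python
import pandas as pd

# Hypothetical data; only the "category" values echo the post.
df = pd.DataFrame({
    "category": ["Love", "Love", "Fashion", "Fashion", "Hair"],
    "source": ["web", "app", "web", "app", "web"],
    "clicks": [10, 5, 7, 3, 4],
})

# Turn two columns into a MultiIndex on the existing dataframe.
indexed = df.set_index(["category", "source"])

# Group by a single index level (by name) ...
by_category = indexed.groupby(level="category")["clicks"].sum()

# ... or by several levels at once.
by_both = indexed.groupby(level=["category", "source"])["clicks"].sum()
print(by_category, by_both, sep="\n\n")
```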

What’s the equivalent of pandas’ value_counts() in PySpark?

May 13, 2023 by Tarik

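It’s more or less the same: spark_df.groupBy('column_name').count().orderBy('count'). Note that this sorts ascending, while value_counts() sorts in descending order by default; use .orderBy('count', ascending=False) to match it. groupBy also accepts several columns separated by commas, for example groupBy('column_1', 'column_2').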
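The Spark chain mirrors a group-count-sort in pandas. The sketch below (on made-up data) checks that equivalence on the pandas side: value_counts() gives the same counts, in the same order, as groupby().size() sorted in descending order.

```python
import pandas as pd

# Hypothetical data; "column_name" mirrors the post.
df = pd.DataFrame({"column_name": ["x", "y", "x", "z", "x", "y"]})

# Built-in: counts per distinct value, largest first.
vc = df["column_name"].value_counts()

# The same thing spelled out, which is what the PySpark chain does:
# group, count rows per group, sort by count descending.
manual = df.groupby("column_name").size().sort_values(ascending=False)

# Rough PySpark counterpart (needs a SparkSession, not run here):
# spark_df.groupBy("column_name").count().orderBy("count", ascending=False)
print(vc, manual, sep="\n\n")
```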

Use Pandas groupby() + apply() with arguments

April 21, 2023 by Tarik

pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. So try this:
df.groupby('columnName').apply(lambda x: myFunction(x, arg1))
or, as suggested by @Zero, pass the extra argument directly, since GroupBy.apply forwards any additional positional arguments to the function:
df.groupby('columnName').apply(myFunction, arg1)
Demo:
In [82]: df = pd.DataFrame(np.random.randint(5, size=(5, 3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2 …
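A minimal sketch of the three ways to get an extra argument into a grouped function. The helper top_n_sum and the key/val columns are hypothetical. Selecting the val column first means the function receives each group as a Series.

```python
import pandas as pd

# Hypothetical helper: sum of the n largest values in a group.
def top_n_sum(values, n):
    return values.nlargest(n).sum()

df = pd.DataFrame({
    "key": ["x", "x", "x", "y", "y"],
    "val": [1, 5, 3, 4, 2],
})
grouped = df.groupby("key")["val"]

# 1) Capture the extra argument in a lambda.
via_lambda = grouped.apply(lambda s: top_n_sum(s, 2))

# 2) Pass it positionally: apply forwards extra *args to the function.
via_args = grouped.apply(top_n_sum, 2)

# 3) Pass it by keyword: extra **kwargs are forwarded as well.
via_kwargs = grouped.apply(top_n_sum, n=2)
print(via_lambda)
```

All three produce the same Series, indexed by key: x gets 5 + 3 and y gets 4 + 2.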