Python Pandas: Calculate moving average within group
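You can use rolling with transform: df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean()). The 10 is the window size and the 1 is min_periods, the minimum number of observations required to produce a value, so the first rows of each group get a partial-window mean instead of NaN.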
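A minimal sketch of the same idea on toy data. The column names follow the answer; the values and the smaller window of 2 are assumptions chosen so the rolling effect is visible on five rows. Because transform returns a result aligned to the original index, the interleaved groups come back in their original positions.

```python
import pandas as pd

# Toy data: two interleaved groups, 'a' and 'b' (hypothetical values).
df = pd.DataFrame({"object": ["a", "b", "a", "b", "a"],
                   "value": [1, 10, 2, 20, 3]})

# window=2 here (the answer uses 10) so the rolling effect shows on 5 rows;
# min_periods=1 lets the first row of each group produce a value instead of NaN.
df["moving"] = df.groupby("object")["value"].transform(
    lambda x: x.rolling(window=2, min_periods=1).mean()
)
print(df)
```

Group 'a' (rows 0, 2, 4) gets 1.0, 1.5, 2.5 and group 'b' (rows 1, 3) gets 10.0, 15.0, each written back to its own row.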
This is by design. The apply function needs to know the shape of the returned data to work out how the per-group results should be combined, so it calls the function (checkit in your case) twice on the first group. (pandas 0.25.0 and later evaluate the first group only once for DataFrameGroupBy.apply.) Depending on your actual use case, you can replace the …
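If checkit has side effects (printing, appending to a list, writing files), one hedged alternative is to iterate over the GroupBy yourself: iteration yields each (key, sub-DataFrame) pair exactly once, so the function runs once per group regardless of pandas version. The checkit below is a hypothetical stand-in, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

calls = []  # records every group the function sees


def checkit(group: pd.DataFrame) -> int:
    """Hypothetical stand-in for the question's checkit; it has a side effect."""
    calls.append(group["key"].iloc[0])
    return int(group["val"].sum())


# Iterating a GroupBy yields (key, sub-DataFrame) once per group,
# so checkit is called exactly once for each group.
results = {key: checkit(group) for key, group in df.groupby("key")}
summary = pd.Series(results, name="val_sum")
print(summary)
```

The trade-off is that you assemble the combined result yourself (here a Series built from a dict) instead of letting apply infer its shape.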
Consider the DataFrame df: df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9])). groupby followed by an aggregation is the standard use: df.groupby('A').mean(). Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with. For that, use transform: df.groupby('A').transform('mean') or df.set_index('A').groupby(level='A').transform('mean'). agg is used when you have specific …
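A short sketch contrasting the three calls on the answer's own df. The agg mapping ('sum' for B, 'max' for C) is an illustrative assumption, since the answer's agg description is cut off.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aabb"), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# One row per group, indexed by the group key 'A'.
per_group = df.groupby("A").mean()

# Same index as df: each row receives its group's mean.
broadcast = df.groupby("A").transform("mean")

# Different aggregation per column (illustrative choice of functions).
custom = df.groupby("A").agg({"B": "sum", "C": "max"})

print(per_group, broadcast, custom, sep="\n\n")
```

per_group has two rows (a, b); broadcast has four rows lined up with df; custom shows that agg lets each column use its own function.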
You seem to want to group by several columns at once: df.groupby(['revenue', 'session', 'user_id'])['user_id'].count() should give you what you want.
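A minimal sketch with made-up data showing what the multi-column groupby returns: a Series with a MultiIndex of (revenue, session, user_id), holding the number of rows in each combination.

```python
import pandas as pd

# Hypothetical data: two rows share (10, 's1', 1), one row is (20, 's2', 2).
df = pd.DataFrame({
    "revenue": [10, 10, 20],
    "session": ["s1", "s1", "s2"],
    "user_id": [1, 1, 2],
})

# count() counts non-null user_id values within each combination of the three keys.
counts = df.groupby(["revenue", "session", "user_id"])["user_id"].count()
print(counts)
```

Because user_id is never null here, this matches df.groupby([...]).size(), which counts rows including nulls.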
Since pandas 0.23.0, the groupby method accepts an observed parameter (it only affects categorical groupers), which fixes this issue when set to True (it is False by default). Below is the exact same code as in the question with just observed=True added: import pandas as pd group_cols = ['Group1', 'Group2', 'Group3'] df = pd.DataFrame([['A', 'B', 'C', 54.34], …
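The question's full code is truncated, so this is a separate minimal sketch with two hypothetical categorical columns. With observed=False every combination of categories is produced, even ones with no rows; with observed=True only combinations present in the data are kept. Both calls pass observed explicitly so the behavior does not depend on the default.

```python
import pandas as pd

group_cols = ["Group1", "Group2"]
df = pd.DataFrame({
    # Category 'C' is declared but never used.
    "Group1": pd.Categorical(["A", "A", "B"], categories=["A", "B", "C"]),
    "Group2": pd.Categorical(["x", "y", "x"], categories=["x", "y"]),
    "value": [1.0, 2.0, 3.0],
})

# observed=False: all 3 x 2 = 6 category combinations, including empty ones.
all_combos = df.groupby(group_cols, observed=False)["value"].sum()

# observed=True: only the 3 combinations that actually occur.
seen_only = df.groupby(group_cols, observed=True)["value"].sum()

print(all_combos, seen_only, sep="\n\n")
```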
Option 1: def first_last(df): return df.iloc[[0, -1]] and then df.groupby(level=0, group_keys=False).apply(first_last)
Option 2 (only works if the index is unique): idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack() then df.loc[idx]
Option 3 (per the notes below, this only makes sense when there are no NAs): I also abused the agg function. The code below works, but is far uglier. df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \ …
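A runnable sketch of Option 1 on a small, hypothetical two-level index. It uses .iloc for positional selection; the older .ix indexer no longer exists in current pandas.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2)], names=["grp", "n"]
)
df = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=idx)


def first_last(frame: pd.DataFrame) -> pd.DataFrame:
    # Positional selection: the first and the last row of this group.
    return frame.iloc[[0, -1]]


# group_keys=False keeps the original index instead of prepending the group key.
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```

A group with a single row comes back twice, since position 0 and position -1 are the same row.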
You can create the index on the existing DataFrame. With the subset of data provided, this works for me: import pandas df = pandas.DataFrame.from_dict( { 'category': {0: 'Love', 1: 'Love', 2: 'Fashion', 3: 'Fashion', 4: 'Hair', 5: 'Movies', 6: 'Movies', 7: 'Health', 8: 'Health', 9: 'Celebs', 10: 'Celebs', 11: 'Travel', 12: 'Weightloss', 13: 'Diet', 14: …
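The answer's code is truncated, so this is only a sketch under an assumption: that "create the index" means promoting the existing category column to the index with set_index, then grouping on that index level. The views column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical subset: 'category' as in the answer, 'views' made up.
df = pd.DataFrame({
    "category": ["Love", "Love", "Fashion"],
    "views": [5, 7, 3],
})

# set_index returns a new DataFrame whose index is the category column.
indexed = df.set_index("category")

# Grouping by the named index level instead of a column.
totals = indexed.groupby(level="category")["views"].sum()
print(totals)
```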
It's more or less the same: spark_df.groupBy('column_name').count().orderBy('count'). In groupBy you can pass multiple columns separated by commas, for example groupBy('column_1', 'column_2').
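For comparison, a hedged sketch of a pandas counterpart to that Spark chain, on made-up data: size() counts rows per group, reset_index(name="count") names the result column "count" as Spark does, and sort_values sorts ascending like orderBy. Passing a list of columns to groupby plays the role of groupBy('column_1', 'column_2').

```python
import pandas as pd

# Hypothetical data with distinct group sizes (no ties in the sort).
df = pd.DataFrame({"column_name": ["x", "x", "y", "y", "y", "z"]})

counts = (
    df.groupby("column_name")
    .size()                      # rows per group
    .reset_index(name="count")   # Series -> DataFrame with a 'count' column
    .sort_values("count")        # ascending, like Spark's orderBy('count')
)
print(counts)
```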
pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. GroupBy.apply instead passes any extra positional and keyword arguments straight through to your function. So try this: df.groupby('columnName').apply(lambda x: myFunction(x, arg1)) or, as suggested by @Zero: df.groupby('columnName').apply(myFunction, arg1) Demo:
In [82]: df = pd.DataFrame(np.random.randint(5, size=(5, 3)), columns=list('abc'))
In [83]: df
Out[83]:
   a  b  c
0  0  3  1
1  0  3  4
2  3  0  4
3  4  2 …
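A deterministic sketch of both forms, plus the keyword form. my_function and offset are hypothetical stand-ins for myFunction and arg1, and the data replaces the random demo. Selecting [["b"]] before apply keeps the grouping column out of each group.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1], "b": [1, 2, 5]})


def my_function(group: pd.DataFrame, offset: int) -> int:
    """Hypothetical stand-in for myFunction: sum of column b plus an offset."""
    return int(group["b"].sum()) + offset


grouped = df.groupby("a")[["b"]]

via_lambda = grouped.apply(lambda g: my_function(g, 10))  # closure captures the argument
via_args = grouped.apply(my_function, 10)                 # extra positional arg forwarded
via_kwargs = grouped.apply(my_function, offset=10)        # extra keyword arg forwarded
print(via_args)
```

All three return a Series indexed by a: 13 for group 0 and 15 for group 1.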
The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby(). You are splitting a string into pieces and maintaining each piece's association with 'family'. This old answer of mine does the job. Just set 'family' as the index column first, refer to the link …
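The linked answer is not reproduced here, so this sketch swaps in Series.explode (available since pandas 0.25) to show the same idea on hypothetical data: with 'family' as the index, each piece of the split string keeps its family label, and reset_index turns that label back into a column.

```python
import pandas as pd

# Hypothetical data: comma-separated members per family.
df = pd.DataFrame({"family": ["f1", "f2"], "members": ["ann,bob", "cy"]})

long_df = (
    df.set_index("family")["members"]
    .str.split(",")     # each cell becomes a list of pieces
    .explode()          # one row per piece; the 'family' index repeats
    .rename("member")
    .reset_index()      # 'family' goes back to being a column
)
print(long_df)
```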