Transform vs. aggregate in Pandas

Consider the dataframe df:

    df = pd.DataFrame(dict(A=list('aabb'), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

groupby is the standard way to aggregate:

    df.groupby('A').mean()

Maybe you want these values broadcast across the whole group, returning something with the same index as what you started with. Use transform:

    df.groupby('A').transform('mean')
    df.set_index('A').groupby(level='A').transform('mean')

agg is used when you have specific …
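
To make the difference concrete, here is a minimal sketch that reuses the df from the excerpt above. The variable names aggregated and broadcast are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aabb"), B=[1, 2, 3, 4], C=[0, 9, 0, 9]))

# Aggregation: one row per group, indexed by the group key A.
aggregated = df.groupby("A").mean()

# Transformation: the group means are broadcast back onto the original index.
broadcast = df.groupby("A").transform("mean")

print(aggregated.shape)  # (2, 2): two groups, two value columns
print(broadcast.shape)   # (4, 2): same number of rows as df
```

Because broadcast shares its index with df, it can be assigned straight back as new columns, which is the usual reason to reach for transform.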

Pandas groupby with categories with redundant nan

Since pandas 0.23.0, the groupby method takes a parameter observed, which fixes this issue when set to True (it was False by default in that release). Below is the exact same code as in the question with just observed=True added:

    import pandas as pd
    group_cols = ['Group1', 'Group2', 'Group3']
    df = pd.DataFrame([['A', 'B', 'C', 54.34], …
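
The effect of observed is easiest to see on a tiny frame. The sketch below does not reproduce the data from the question (which is truncated above); the column values are made up for illustration, and observed is passed explicitly in both calls so the result does not depend on the installed version's default.

```python
import pandas as pd

# Hypothetical data: two categorical grouping columns, only two combinations present.
df = pd.DataFrame({
    "Group1": pd.Categorical(["A", "A", "B"]),
    "Group2": pd.Categorical(["X", "X", "Y"]),
    "Value": [1.0, 2.0, 3.0],
})

# observed=False keeps every category combination, including unseen ones (NaN means).
all_combos = df.groupby(["Group1", "Group2"], observed=False)["Value"].mean()

# observed=True keeps only the combinations that actually occur in the data.
seen_only = df.groupby(["Group1", "Group2"], observed=True)["Value"].mean()

print(len(all_combos), len(seen_only))
```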

Get first and last values in a groupby

Option 1

    def first_last(df):
        return df.iloc[[0, -1]]

    df.groupby(level=0, group_keys=False).apply(first_last)

Option 2: only works if the index is unique

    idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
    df.loc[idx]

Option 3: per notes below, this only makes sense when there are no NAs. I also abused the agg function. The code below works, but is far uglier.

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \ …
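
As a runnable illustration of Option 1, here is a small sketch using positional indexing with iloc. The frame, its index name key and the column value are assumptions made up for the example, not data from the original question.

```python
import pandas as pd

# Hypothetical frame: the index holds the group key, with repeated labels.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=pd.Index(["x", "x", "x", "y", "y"], name="key"),
)

def first_last(group):
    # Positional selection of the first and last row of each group.
    # A single-row group would return that row twice.
    return group.iloc[[0, -1]]

result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```

Selecting by position returns the rows exactly as they are, missing values included.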

How to do group by on a multiindex in pandas?

You can create the index on the existing dataframe. With the subset of data provided, this works for me:

    import pandas
    df = pandas.DataFrame.from_dict(
        {
            'category': {0: 'Love', 1: 'Love', 2: 'Fashion', 3: 'Fashion', 4: 'Hair', 5: 'Movies', 6: 'Movies', 7: 'Health', 8: 'Health', 9: 'Celebs', 10: 'Celebs', 11: 'Travel', 12: 'Weightloss', 13: 'Diet', 14: …
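
The general pattern (build a MultiIndex from existing columns, then group by one or more of its levels) can be sketched as follows. Only the category column name comes from the excerpt; source and visits are hypothetical columns added to keep the example self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Love", "Love", "Fashion", "Fashion"],
    "source": ["web", "app", "web", "web"],  # hypothetical column
    "visits": [3, 5, 2, 4],                  # hypothetical column
})

# Create the MultiIndex on the existing dataframe.
indexed = df.set_index(["category", "source"])

# Group by a single index level, referred to by name.
per_category = indexed.groupby(level="category")["visits"].sum()

# Group by several index levels at once.
per_pair = indexed.groupby(level=["category", "source"])["visits"].sum()

print(per_category)
print(per_pair)
```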

Use Pandas groupby() + apply() with arguments

pandas.core.groupby.GroupBy.apply does NOT have a named parameter args, but pandas.DataFrame.apply does. So try this:

    df.groupby('columnName').apply(lambda x: myFunction(x, arg1))

or, as suggested by @Zero:

    df.groupby('columnName').apply(myFunction, ('arg1'))

Demo:

    In [82]: df = pd.DataFrame(np.random.randint(5,size=(5,3)), columns=list('abc'))

    In [83]: df
    Out[83]:
       a  b  c
    0  0  3  1
    1  0  3  4
    2  3  0  4
    3  4  2  …
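
Both ways of passing extra arguments can be tried on a small frame. This is only a sketch: scaled_sum, its parameters factor and offset, and the columns key and val are hypothetical names chosen for the example. The value column is selected explicitly so the function receives the same sub-frame regardless of how the installed pandas version treats grouping columns inside apply.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 4]})

def scaled_sum(group, factor, offset=0):
    # group is the sub-DataFrame for one key; extra arguments come from the caller.
    return group["val"].sum() * factor + offset

grouped = df.groupby("key")[["val"]]

# Variant 1: close over the extra argument with a lambda.
via_lambda = grouped.apply(lambda g: scaled_sum(g, 10))

# Variant 2: GroupBy.apply forwards extra positional and keyword arguments to the function.
via_args = grouped.apply(scaled_sum, 10, offset=1)

print(via_lambda)
print(via_args)
```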

Is there an “ungroup by” operation opposite to .groupby in pandas?

The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby(). You are splitting a string into pieces and maintaining each piece's association with 'family'. This old answer of mine does the job. Just set 'family' as the index column first, refer to the link …
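
As a small illustration of reset_index() acting as a rough "ungroup", the sketch below aggregates and then flattens the result. Only the family name comes from the excerpt; the kind and n columns and their values are made up, and this does not reproduce the string-splitting answer the excerpt links to.

```python
import pandas as pd

# Hypothetical data: 'family' is taken from the excerpt, the rest is illustrative.
df = pd.DataFrame({
    "family": ["Smith", "Smith", "Jones"],
    "kind": ["cat", "dog", "cat"],
    "n": [1, 2, 3],
})

# After aggregating, the group keys live in the (Multi)Index of the result.
grouped = df.groupby(["family", "kind"])["n"].sum()

# reset_index() moves the keys back into ordinary columns with a default integer index.
flat = grouped.reset_index()

print(flat)
```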
