Summing over a multiindex level in a pandas series
If you know you always want to aggregate over the first two levels, then this is pretty easy:

    In [27]: data.groupby(level=[0, 1]).sum()
    Out[27]:
    A  B    277
       b     37
    a  B    159
       b     16
    dtype: int64
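As a self-contained sketch of the same idea: the series below is hypothetical (the level names and values are made up, since the original data is not shown), but the `groupby(level=[0, 1]).sum()` call is the one from the answer.

```python
import pandas as pd

# Hypothetical three-level index; names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [
        ("A", "B", "x"),
        ("A", "B", "y"),
        ("A", "b", "x"),
        ("a", "B", "x"),
        ("a", "b", "y"),
    ],
    names=["first", "second", "third"],
)
data = pd.Series([1, 2, 3, 4, 5], index=index)

# Group on the first two levels, which sums away the third level.
totals = data.groupby(level=[0, 1]).sum()
print(totals)
```

The result keeps a two-level index, one row per distinct pair of values in the first two levels.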
You can create the index on the existing dataframe. With the subset of data provided, this works for me:

    import pandas
    df = pandas.DataFrame.from_dict( { 'category': {0: 'Love', 1: 'Love', 2: 'Fashion', 3: 'Fashion', 4: 'Hair', 5: 'Movies', 6: 'Movies', 7: 'Health', 8: 'Health', 9: 'Celebs', 10: 'Celebs', 11: 'Travel', 12: 'Weightloss', 13: 'Diet', 14: …
The solution is to leave out the labels. This works fine for me:

    >>> import pandas as pd
    >>> my_index = pd.MultiIndex(levels=[[],[],[]],
    ...                          codes=[[],[],[]],
    ...                          names=[u'one', u'two', u'three'])
    >>> my_index
    MultiIndex([], names=['one', 'two', 'three'])
    >>> my_columns = [u'alpha', u'beta']
    >>> df = pd.DataFrame(index=my_index, columns=my_columns)
    >>> df
    Empty DataFrame
    Columns: [alpha, beta]
    Index: []
    >>> df.loc[('apple','banana','cherry'),:] …
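A script version of the construction step above, assuming a pandas release that accepts the `codes` argument (it replaced the older `labels` argument). It covers only building the empty frame; the truncated `.loc` assignment is not reproduced here.

```python
import pandas as pd

# Empty three-level index: every level and every code list is empty.
my_index = pd.MultiIndex(
    levels=[[], [], []],
    codes=[[], [], []],
    names=["one", "two", "three"],
)

# An empty frame that already carries the index names and the column labels.
df = pd.DataFrame(index=my_index, columns=["alpha", "beta"])
print(df.index.names)
print(df.columns.tolist())
```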
When you pass inplace=True, the method makes the changes on the original variable and returns None instead of the modified dataframe.

    is_none = df.set_index(['Company', 'date'], inplace=True)
    df       # the dataframe you want
    is_none  # has the value None

So when you have a line like: df = df.set_index(['Company', 'date'], …
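The two calling styles can be compared side by side. This is a minimal sketch; the `Company`, `date`, and `price` values are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names 'Company' and 'date' come from the answer.
raw = pd.DataFrame(
    {
        "Company": ["X", "X", "Y"],
        "date": ["2020-01-01", "2020-01-02", "2020-01-01"],
        "price": [1.0, 2.0, 3.0],
    }
)

# Style 1: modify in place. The return value is None, so do not assign it back.
df_a = raw.copy()
result = df_a.set_index(["Company", "date"], inplace=True)

# Style 2: leave the original untouched and keep the returned frame.
df_b = raw.set_index(["Company", "date"])

print(result)               # None
print(df_a.index.names)
print(df_b.equals(df_a))
```

Mixing the two styles, assigning the result of an in-place call back to the same name, replaces the dataframe with None.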
The most straightforward way is with .loc:

    >>> data.loc[:, (['one', 'two'], ['a', 'b'])]
       one       two
         a    b    a    b
    0  0.4 -0.6 -0.7  0.9
    1  0.1  0.4  0.5 -0.3
    2  0.7 -1.6  0.7 -0.8
    3 -0.9  2.6  1.9  0.6

Remember that [] and () have special meaning when dealing with a MultiIndex object: (…) a …
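A runnable sketch of that selection, using a made-up frame with two-level columns (the numbers are placeholders, not the values printed above):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: first column level 'one'/'two', second level 'a'/'b'/'c'.
columns = pd.MultiIndex.from_product([["one", "two"], ["a", "b", "c"]])
data = pd.DataFrame(np.arange(18).reshape(3, 6), columns=columns)

# A tuple addresses the levels in order; each list picks labels within one level.
subset = data.loc[:, (["one", "two"], ["a", "b"])]
print(subset)
```

Here the outer tuple means "one entry per level" and each inner list means "any of these labels", so the `c` columns are dropped under both `one` and `two`.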
Method #1: reset_index()

    >>> g
                  uses  books
                   sum    sum
    token   year
    xanthos 1830     3      3
            1840     3      3
            1868     2      2
            1875     1      1

    [4 rows x 2 columns]
    >>> g = g.reset_index()
    >>> g
         token  year  uses  books
                       sum    sum
    0  xanthos  1830     3      3
    1  xanthos  1840     3      3
    2  xanthos  1868     2 …
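The same move in script form. This sketch simplifies the frame to single-level columns (the original has a second `sum` level on the columns), so treat it as an approximation of the setup rather than a reproduction.

```python
import pandas as pd

# Index levels and values follow the printed output; columns are simplified.
index = pd.MultiIndex.from_tuples(
    [("xanthos", 1830), ("xanthos", 1840), ("xanthos", 1868), ("xanthos", 1875)],
    names=["token", "year"],
)
g = pd.DataFrame({"uses": [3, 3, 2, 1], "books": [3, 3, 2, 1]}, index=index)

# reset_index() turns every index level into an ordinary column.
flat = g.reset_index()
print(flat)
```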
You can group and then unstack.

    >>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
    item        item 1  item 2
    year month
    2004 1          33     250
         2          44     224
         3          41     268
         4          29     232
         5          57     252
         6          61     255
         7          28     254
         8          15     229
         9          29     258
         10         49     207
         11         36     254
         12         23     209

Or …
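A small self-contained version of the group-then-unstack step. The input rows are an assumed long-format layout (one row per year, month, and item), filled with the first two months of the output shown above.

```python
import pandas as pd

# Assumed long-format input: one row per (year, month, item) observation.
df = pd.DataFrame(
    {
        "year": [2004, 2004, 2004, 2004],
        "month": [1, 1, 2, 2],
        "item": ["item 1", "item 2", "item 1", "item 2"],
        "value": [33, 250, 44, 224],
    }
)

# Sum within each group, then pivot the 'item' level into columns.
wide = df.groupby(["year", "month", "item"])["value"].sum().unstack("item")
print(wide)
```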
If you are on pandas 0.14 or later, you can simply pass a tuple to .loc as below:

    df.loc[('at', [1,3,4]), 'Dwell']
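To try this out, here is a sketch with an invented two-level index. The level names `country` and `id`, the second country code `be`, and the `Dwell` values are assumptions made only for the example.

```python
import pandas as pd

# Hypothetical frame indexed by (country, id); only 'at' and 'Dwell' come from the answer.
index = pd.MultiIndex.from_product(
    [["at", "be"], [1, 2, 3, 4]], names=["country", "id"]
)
df = pd.DataFrame({"Dwell": range(8)}, index=index)

# The tuple fixes the first level to 'at' and picks ids 1, 3 and 4 on the second.
picked = df.loc[("at", [1, 3, 4]), "Dwell"]
print(picked)
```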
    df.reset_index(level=2, drop=True)
    Out[29]:
         A
    1 1  8
      3  9
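A runnable sketch of dropping one index level this way. The three-level index is hypothetical, chosen so the result resembles the output above.

```python
import pandas as pd

# Hypothetical three-level index; the third level is the one to discard.
index = pd.MultiIndex.from_tuples([(1, 1, "x"), (1, 3, "y")])
df = pd.DataFrame({"A": [8, 9]}, index=index)

# level=2 targets the third level; drop=True discards it instead of making it a column.
trimmed = df.reset_index(level=2, drop=True)
print(trimmed)
```

Without `drop=True` the removed level would come back as a regular column.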
You can do it with concat (the keys argument will create the hierarchical columns index):

    d = {'ABC' : df1, 'XYZ' : df2}
    print(pd.concat(d.values(), axis=1, keys=d.keys()))

                    XYZ                                          ABC           \
                   Open     High      Low    Close   Volume     Open     High
    Date
    2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  0.18077  0.18800
    2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  0.18439  0.21331
    2002-01-21 …
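A self-contained sketch of the `keys` approach. The two frames are stand-ins: `df1` reuses two of the printed `Open`/`Close` values and `df2` is simply derived from it, so neither is the original data. The dictionary views are converted to lists here as a cautious choice.

```python
import pandas as pd

# Stand-in frames sharing a date index; df2 is derived only for illustration.
dates = pd.to_datetime(["2002-01-17", "2002-01-18"])
df1 = pd.DataFrame(
    {"Open": [0.18077, 0.18439], "Close": [0.18439, 0.19523]}, index=dates
)
df2 = df1 * 2

frames = {"ABC": df1, "XYZ": df2}

# keys become the outer column level; each frame's own columns form the inner level.
combined = pd.concat(list(frames.values()), axis=1, keys=list(frames.keys()))
print(combined)
```

Selecting `combined["ABC"]` then returns the columns that came from `df1`.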