[pandas >= 0.23] Simple, Fast, and Pandaic: ngroups
Newer versions of the groupby API provide this attribute (originally undocumented), which stores the number of groups in a GroupBy object.
# setup
import pandas as pd
df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')
# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4
Note that this is different from GroupBy.groups which returns the actual groups themselves.
Why should I prefer this over len?
As noted in BrenBarn’s answer, you could use len(dfg) to get the number of groups. But you shouldn’t. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:
dfg.groups
{'a': Int64Index([0, 1], dtype="int64"),
'b': Int64Index([2, 3], dtype="int64"),
'c': Int64Index([4, 5, 6], dtype="int64"),
'd': Int64Index([7], dtype="int64")}
Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups, on the other hand, is a stored property that can be accessed in constant time.
ngroups is now documented under GroupBy object attributes. The issue with len, however, is that for a GroupBy object with a lot of groups, building that dictionary can take a lot longer than simply reading ngroups.
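If you want to check this on your own machine, here is a rough sketch. The data is made up (the names `big` and `g` and the row and group counts are just placeholders I picked), and the timings will vary with your hardware and pandas version, so treat the printed numbers as indicative only. Newer pandas releases may also have changed how `__len__` is implemented.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 rows spread over at most 20,000 groups.
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 20_000, size=200_000)})
g = big.groupby('A')

# Both approaches give the same answer...
n_attr = g.ngroups
n_len = len(g)

# ...but they may not cost the same. Numbers depend on machine and version.
t_attr = timeit.timeit(lambda: g.ngroups, number=10)
t_len = timeit.timeit(lambda: len(g), number=10)
print(f"ngroups: {t_attr:.4f}s   len: {t_len:.4f}s   (10 calls each)")
```

Either way, the two values agree, so switching from len(dfg) to dfg.ngroups should not change your results.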
But what if I actually want the size of each group?
You’re in luck. We have a function for that: GroupBy.size. But please note that size counts NaNs as well. If you don’t want NaNs counted, use GroupBy.count instead.
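To make the size vs. count difference concrete, here is a small sketch with a made-up frame (the column `B` and its values are my own, chosen only so that some entries are NaN):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column B has missing values in both groups.
df = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b', 'b'],
    'B': [1.0, np.nan, 2.0, np.nan, np.nan],
})
g = df.groupby('A')

sizes = g.size()          # rows per group, NaNs in B included
counts = g['B'].count()   # non-NaN values of B per group

print(sizes)
print(counts)
```

Here size reports 2 rows for 'a' and 3 for 'b', while count reports 1 for each, since only one value of B per group is not NaN.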