When to use Category rather than Object?

Use a category when there is lots of repetition that you expect to exploit.

For example, suppose I want the aggregate size per exchange for a large table of trades. Using the default object is totally reasonable:

In [6]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 1.25 ms per loop

But since the list of possible exchanges is pretty small, and because there is lots of repetition, I could make this faster by using a category:

In [7]: trades['exch'] = trades['exch'].astype('category')

In [8]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 702 µs per loop

Note that categories are really a form of dynamic enumeration. They are most useful if the range of possible values is fixed and finite.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)