Determining duplicate values in an array

As of numpy version 1.9.0, np.unique has an argument return_counts which greatly simplifies your task:

u, c = np.unique(a, return_counts=True)
dup = u[c > 1]

This is similar to using Counter, except you get a pair of arrays instead of a mapping. I’d be curious to see how they perform relative to each other.

It’s probably worth mentioning that even though np.unique is quite fast in practice due to its numpyness, it has worse algorithmic complexity than the Counter solution. np.unique is sort-based, so runs asymptotically in O(n log n) time. Counter is hash-based, so has O(n) complexity. This will not matter much for anything but the largest datasets.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)