I would use a set to remember what was seen and return from the generator when you have seen enough:
a = [1, 2, 2, 3, 3, 4, 5, 6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))
Output:
[1, 2, 3, 4]
According to PEP-479 you should return from generators, not raise StopIteration – thanks to @khelwood & @iBug for that piece of comment – one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups – with k being the length of the slice – using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff – what is better is application/data dependend.
Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001) - mine would have
5001*O(1)lookup + memory forset( {1, 2, 3, 4, 5, 6})