Getting first n unique elements from Python list

I would use a set to remember what was seen and return from the generator when you have seen enough:

a = [1, 2, 2, 3, 3, 4, 5, 6]
    
def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable. 
    Might yield less if data too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return
            
k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))
    

Output:

[1, 2, 3, 4]

According to PEP-479 you should return from generators, not raise StopIteration – thanks to @khelwood & @iBug for that piece of comment – one never learns out.

With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration


Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups – with k being the length of the slice – using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff – what is better is application/data dependend.

Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:

For 6 uniques (in longer list):

  • you would have lookups of O(1)+O(2)+...+O(5001)
  • mine would have 5001*O(1) lookup + memory for set( {1, 2, 3, 4, 5, 6})

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)