How do I efficiently find which elements of a list are in another list?

I thought it would be useful to actually time some of the solutions presented here on a larger sample input. For this input and on my machine, I find Cardstdani’s approach to be the fastest, followed by the numpy isin() approach.

Setup 1

import random

list_1 = [random.randint(1, 10_000) for i in range(100_000)]
list_2 = [random.randint(1, 10_000) for i in range(100_000)]

Setup 2

list_1 = [random.randint(1, 10_000) for i in range(100_000)]
list_2 = [random.randint(10_001, 20_000) for i in range(100_000)]

Timings – ordered from fastest to slowest (setup 1).

Cardstdani – approach 1


I recommend converting Cardstdani’s approach into a list comprehension (see this question for why list comprehensions are faster)

s = set(list_2)
booleans = [i in s for i in list_1]

# setup 1
6.01 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
4.19 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

No list comprehension

s = set(list_2)
booleans = []
for i in list_1:
   booleans.append(i in s)

# setup 1
7.28 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
5.87 ms ± 8.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Cardstdani – approach 2 (with an assist from Timus)


common = set(list_1) & set(list_2)
booleans = [item in common for item in list_1]

# setup 1
8.3 ms ± 34.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
6.01 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Using the set intersection method

common = set(list_1).intersection(list_2)
booleans = [item in common for item in list_1]

# setup 1
10.1 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
4.82 ms ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

numpy approach (crissal)


a1 = np.array(list_1)
a2 = np.array(list_2)

a = np.isin(a1, a2)

# setup 1
18.6 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2
18.2 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# setup 2 (assuming list_1, list_2 already numpy arrays)
10.3 ms ± 73.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

list comprehension


l = [i in list_2 for i in list_1]

# setup 1
4.85 s ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48.6 s ± 823 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Sharim – approach 1


booleans = list(map(lambda e: e in list_2, list_1))

# setup 1
4.88 s ± 24.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48 s ± 389 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using the __contains__ method

booleans = list(map(list_2.__contains__, list_1))

# setup 1
4.87 s ± 5.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
48.2 s ± 486 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Sharim – approach 2


set_2 = set(list_2)
booleans = list(map(lambda e: set_2 != set_2 - {e}, list_1))

# setup 1
5.46 s ± 56.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setup 2
11.1 s ± 75.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Varying the length of the input


Employing the following setup

import random 

list_1 = [random.randint(1, n) for i in range(n)]
list_2 = [random.randint(1, n) for i in range(n)]

and varying n in [2 ** k for k in range(18)]:

enter image description here

Employing the following setup

import random 

list_1 = [random.randint(1, n ** 2) for i in range(n)]
list_2 = [random.randint(1, n ** 2) for i in range(n)]

and varying n in [2 ** k for k in range(18)], we obtain similar results:

enter image description here

Employing the following setup

list_1 = list(range(n))
list_2 = list(range(n, 2 * n))

and varying n in [2 ** k for k in range(18)]:

enter image description here

Employing the following setup

import random 

list_1 = [random.randint(1, n) for i in range(10 * n)]
list_2 = [random.randint(1, n) for i in range(10 * n)]

and varying n in [2 ** k for k in range(18)]:

enter image description here

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)