GIL protects the Python interals. That means:
- you don’t have to worry about something in the interpreter going wrong because of multithreading
- most things do not really run in parallel, because python code is executed sequentially due to GIL
But GIL does not protect your own code. For example, if you have this code:
self.some_number += 1
That is going to read value of self.some_number
, calculate some_number+1
and then write it back to self.some_number
.
If you do that in two threads, the operations (read, add, write) of one thread and the other may be mixed, so that the result is wrong.
This could be the order of execution:
- thread1 reads
self.some_number
(0) - thread2 reads
self.some_number
(0) - thread1 calculates
some_number+1
(1) - thread2 calculates
some_number+1
(1) - thread1 writes 1 to
self.some_number
- thread2 writes 1 to
self.some_number
You use locks to enforce this order of execution:
- thread1 reads
self.some_number
(0) - thread1 calculates
some_number+1
(1) - thread1 writes 1 to
self.some_number
- thread2 reads
self.some_number
(1) - thread2 calculates
some_number+1
(2) - thread2 writes 2 to
self.some_number
EDIT: Let’s complete this answer with some code which shows the explained behaviour:
import threading
import time
total = 0
lock = threading.Lock()
def increment_n_times(n):
global total
for i in range(n):
total += 1
def safe_increment_n_times(n):
global total
for i in range(n):
lock.acquire()
total += 1
lock.release()
def increment_in_x_threads(x, func, n):
threads = [threading.Thread(target=func, args=(n,)) for i in range(x)]
global total
total = 0
begin = time.time()
for thread in threads:
thread.start()
for thread in threads:
thread.join()
print('finished in {}s.\ntotal: {}\nexpected: {}\ndifference: {} ({} %)'
.format(time.time()-begin, total, n*x, n*x-total, 100-total/n/x*100))
There are two functions which implement increment. One uses locks and the other does not.
Function increment_in_x_threads
implements parallel execution of the incrementing function in many threads.
Now running this with a big enough number of threads makes it almost certain that an error will occur:
print('unsafe:')
increment_in_x_threads(70, increment_n_times, 100000)
print('\nwith locks:')
increment_in_x_threads(70, safe_increment_n_times, 100000)
In my case, it printed:
unsafe:
finished in 0.9840562343597412s.
total: 4654584
expected: 7000000
difference: 2345416 (33.505942857142855 %)
with locks:
finished in 20.564176082611084s.
total: 7000000
expected: 7000000
difference: 0 (0.0 %)
So without locks, there were many errors (33% of increments failed). On the other hand, with locks it was 20 times slower.
Of course, both numbers are blown up because I used 70 threads, but this shows the general idea.