These problems usually boil down to the following:
The function you are trying to parallelize doesn’t require enough CPU resources (i.e. CPU time) to rationalize parallelization!
Sure, when you parallelize with multiprocessing.Pool(8), you theoretically (but not practically) could get an 8x speed-up.
However, keep in mind that this isn’t free – you gain this parallelization at the expense of the following overhead:
1. Creating a `task` for every `chunk` (of size `chunksize`) in the `iter` passed to `Pool.map(f, iter)`
2. For each `task`:
   - Serializing the `task`, and the `task`'s return value (think `pickle.dumps()`)
   - Deserializing the `task`, and the `task`'s return value (think `pickle.loads()`)
   - Wasting significant time waiting for `Lock`s on shared-memory `Queue`s, while worker processes and the parent process `get()` and `put()` from/to these `Queue`s
3. A one-time cost of calling `os.fork()` for each worker process, which is expensive.
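To see this overhead in practice, here is a sketch (the function and parameter names are mine, not from the post) that times a trivially cheap function both serially and through a `Pool`. Because the per-call work is tiny, the pickling and queueing described above dominate, and the parallel version is typically slower:

```python
import time
from multiprocessing import Pool

def cheap(x):
    # Trivial work: serialization and IPC overhead dominates the compute.
    return x + 1

def compare(n=100_000, workers=8):
    """Return (serial_result, parallel_result, serial_time, parallel_time)."""
    data = range(n)

    t0 = time.perf_counter()
    serial = list(map(cheap, data))
    t_serial = time.perf_counter() - t0

    with Pool(workers) as pool:
        t0 = time.perf_counter()
        # Each chunk of `data` becomes a task: pickled to a worker,
        # unpickled, executed, and its result pickled back to the parent.
        parallel = pool.map(cheap, data)
    t_parallel = time.perf_counter() - t0

    return serial, parallel, t_serial, t_parallel
```

The two result lists are identical; only the timings differ, and on a workload this cheap the serial `map` usually wins.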
In essence, when using Pool() you want:
- High CPU resource requirements
- Low data footprint passed to each function call
- A reasonably long `iter`, to justify the one-time cost of (3) above.
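A workload that checks all three boxes looks something like this sketch (the function and the specific numbers are mine, chosen for illustration): each call receives one small int, burns plenty of CPU time, and returns one small int, so serialization is negligible next to the compute:

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Small argument in (one int), small result out (one int),
    # and plenty of CPU time in between -- the sweet spot for Pool.
    total = 0
    for i in range(n):
        total += i * i
    return total % 1_000_003

if __name__ == "__main__":
    # A reasonably long list of jobs amortizes the one-time fork cost.
    jobs = [500_000] * 16
    with Pool(8) as pool:
        results = pool.map(cpu_heavy, jobs)
```

Here the 8x ceiling is actually approachable, because almost all of each task's wall-clock time is spent computing rather than pickling or waiting on queues.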
For a more in-depth exploration, this post and the linked talk walk through how passing large data to Pool.map() (and friends) gets you into trouble.
Raymond Hettinger also talks about proper use of Python’s concurrency here.