Department of Engineering

IT Services

Python and parallelising

Most computers have more than one CPU/core so it's tempting to try to exploit them all to speed up your code. Some number-crunching libraries do this for you when you use simple mathematical routines and big datasets. If you want to explicitly parallise, you have several choices, depending on how the subtasks should communicate or share memory. The multiprocessing package supports many options. One of the easiest is also potentially one of the most useful, and is illustrated below. It creates pools of worker-processes. When you send tasks to the pool, they'll be assigned to the worker processes for you. The program decides for you which CPU/cores to use.

The following program does no useful work. The print statements should help you see the effect of creating pools of various sizes, and using various chunksizes.

from multiprocessing import Pool,current_process, cpu_count
import time
def f(x):
    """pause for a second then return the square of the given number"""
    time.sleep(1)
    print(f"Running f({x}), worker: {current_process().name}")
    return x*x

if __name__ == '__main__':
    print(cpu_count(), "processors available")
    # use "with" so that an explict "close" isn't needed
    # The chunksize determines how many of the elements in the input list will be sent at a time.
    # Default is 1
    print("\nUsing a pool of 5 workers, chunksize=1")
    start_time = time.perf_counter()
    with Pool(5) as p: # this creates p, a pool of 5 worker processes ready to be used
    # the next line of code calls f(1), f(2) and f(3),
    # waiting for all the calls to finish,
    # and returning the answers as a list
        print("Answer=",p.map(f, [1, 2, 3]))
    end_time = time.perf_counter()
    print(end_time - start_time, "secs")
    
    print("\nUsing a pool of 2 workers, chunksize=1")
    start_time = time.perf_counter()
    with Pool(2) as p:
        print("Answer=",p.map(f, [1, 2, 3]))   
    end_time = time.perf_counter()
    print(end_time - start_time, "secs")
    
    print("\nUsing a pool of 2 workers, chunksize=3")
    start_time = time.perf_counter()
    with Pool(2) as p:
        print("Answer=",p.map(f, [1, 2, 3],3))     
    end_time = time.perf_counter()
    print(end_time - start_time, "secs")