How-to: Handle Concurrency & Parallelism in Python
Python offers several approaches to concurrency and parallelism.
Two of the most popular ways to achieve concurrent execution in Python are multithreading and multiprocessing. While both techniques aim to improve the performance of a Python program, they differ in their approach and use cases.
In this segment, I will walk through when to use multithreading and when to use multiprocessing in Python so that we end up with correct, performant code.
Multithreading is a technique in which a single process or program is divided into multiple threads of execution.
Each thread runs concurrently, sharing the same memory space and system resources. In Python, multithreading is implemented using the threading module.
The main advantage of multithreading is that it enables concurrency without the overhead of spawning new processes. This makes it suitable for I/O-bound tasks, such as network requests, file I/O, and database operations, where the bottleneck is typically the time spent waiting for external resources. By using threads, we can overlap multiple I/O-bound tasks, which can significantly improve the overall performance of the program.
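To make this concrete, here is a minimal sketch of overlapping several I/O waits with a thread pool. The fetch() function is illustrative, using time.sleep() as a stand-in for a real network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(task_id):
    # Simulate an I/O-bound operation (e.g. a network request) with a sleep
    time.sleep(0.2)
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch, range(5)))
elapsed = time.perf_counter() - start

# The five 0.2-second waits overlap, so the total is close to 0.2 s rather
# than the 1 s that sequential execution would take
print(results, round(elapsed, 1))
```

Because each thread releases the GIL while it is waiting, the waits run concurrently even though only one thread executes Python bytecode at a time.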
However, multithreading may not be suitable for CPU-bound tasks, such as mathematical calculations, image processing, and video rendering, where the bottleneck is CPU time. This is because CPython, the reference Python implementation, uses a Global Interpreter Lock (GIL), which prevents multiple threads from executing Python bytecode simultaneously.
Although multiple threads can still run concurrently, they cannot execute CPU-bound tasks in parallel. Therefore, in such cases, multiprocessing is a better choice.
Here is an example of multithreading in Python:
import threading

def print_numbers():
    for i in range(10):
        print(i)

def print_letters():
    for letter in 'abcdefghij':
        print(letter)

thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print("Done")
In this example, we have two functions: print_numbers() and print_letters(), each running in a separate thread.
The Thread class from the threading module is used to create two threads, each targeting one of the two functions. The start() method is called on each thread to begin execution, and the join() method is called on each thread to wait for them to finish. Finally, the program prints “Done” to indicate that both threads have finished executing.
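Because threads share the same memory space, concurrent updates to shared data need synchronization. Here is a minimal sketch using threading.Lock; the counter variable and increment() function are illustrative, not part of the example above:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock makes each read-modify-write of counter atomic,
        # so no update is lost when threads interleave
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, interleaved updates could lose increments and leave the counter below 40000.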
Multiprocessing is a technique in which a program or process is divided into multiple independent processes, each running in a separate memory space.
Each process has its own Python interpreter and memory space, and can execute Python code in parallel on different CPU cores. In Python, multiprocessing is implemented using the multiprocessing module.
The main advantage of multiprocessing is that it enables true parallelism for CPU-bound tasks. By using multiple processes, we can distribute the workload across multiple CPU cores, which can significantly reduce the execution time for CPU-bound tasks. However, multiprocessing comes with some overhead, as each process has to be created and initialized, and communication between processes can be slower than communication between threads.
Multiprocessing is suitable for CPU-bound tasks, where the bottleneck is the CPU time. It is also useful when we need to isolate a process from the main program, for example, when running an untrusted third-party library that could crash or corrupt the memory of the main program.
Here is an example of multiprocessing in Python:
import multiprocessing

def square(number):
    return number * number

if __name__ == '__main__':
    numbers = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool(processes=3)
    results = pool.map(square, numbers)
    pool.close()
    pool.join()
    print(results)  # [1, 4, 9, 16, 25]

In this example, Pool.map() splits the numbers list across three worker processes, each of which runs square() in parallel, and collects the results back in order. The if __name__ == '__main__' guard is required so that child processes do not re-execute the pool setup when the module is imported.
To summarize, the choice between using multiprocessing and multithreading in Python depends on the nature of the task at hand.
Multithreading is suitable for I/O-bound tasks where the bottleneck is typically the time spent waiting for external resources, such as network requests, file I/O, and database operations. This is because multithreading enables concurrency without the overhead of spawning new processes.
Multiprocessing, on the other hand, is suitable for CPU-bound tasks where the bottleneck is the CPU time, such as mathematical calculations, image processing, and video rendering. This is because multiprocessing enables true parallelism by distributing the workload across multiple CPU cores.
Additionally, it’s important to note that Python’s Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously. As a result, multithreading may not be suitable for CPU-bound tasks that require parallelism, and multiprocessing is the preferred approach in such cases.
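As a final practical note, the standard library's concurrent.futures module wraps both techniques behind one interface, so switching between them is a one-line change. A minimal sketch, where cube() is an illustrative stand-in for the real workload:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cube(n):
    return n ** 3

def run(executor_cls):
    # The two executors share the same API, so only the class differs
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(cube, range(6)))

if __name__ == '__main__':
    print(run(ThreadPoolExecutor))   # [0, 1, 8, 27, 64, 125]
    print(run(ProcessPoolExecutor))  # [0, 1, 8, 27, 64, 125]
```

This makes it easy to benchmark both approaches on a real workload and let the measurements, rather than assumptions, decide.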