Understanding Threads, Processes, and Concurrency in Python: A Beginner’s Guide - P1

Hey there, junior devs! If you’ve ever wondered how Python can handle multiple tasks at once—like downloading files, processing data, or running a web server—then you’re in the right place. Today, we’ll dive into threads, processes, and concurrency in Python. We’ll explore what they are, why Python behaves the way it does, and when to use each one. Plus, I’ll throw in some simple code examples to make it crystal clear. Let’s get started!


What Are Threads and Processes?

Before we jump into Python specifics, let’s define these terms:

Thread: Think of a thread as a lightweight “worker” inside a single program. Multiple threads share the same memory space and resources, like a team of chefs working together in one kitchen. They’re great for tasks that need to happen “at the same time” but don’t require heavy computation.
Process: A process is like a completely separate program running on your computer. Each process has its own memory space and resources—like opening multiple instances of a game. Processes don’t share memory by default, so they’re more independent (and heavier) than threads.
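To make the shared-versus-separate memory point concrete, here’s a minimal sketch. The names `shared_items`, `add_item`, `run_in_thread`, and `run_in_process` are made up for this illustration. The idea: a thread’s change to a module-level list is visible to the main program, while a child process changes only its own copy.

```python
import multiprocessing
import threading

shared_items = []  # module-level list used to illustrate memory sharing


def add_item(label):
    """Append a label to the module-level list."""
    shared_items.append(label)


def run_in_thread(label):
    """Run add_item in a thread and return what the caller sees afterwards."""
    t = threading.Thread(target=add_item, args=(label,))
    t.start()
    t.join()
    return list(shared_items)


def run_in_process(label):
    """Run add_item in a child process and return what the caller sees afterwards."""
    p = multiprocessing.Process(target=add_item, args=(label,))
    p.start()
    p.join()
    return list(shared_items)


if __name__ == "__main__":
    print("After thread: ", run_in_thread("from thread"))
    print("After process:", run_in_process("from process"))
```

Run as a script, the first line should include "from thread", while "from process" should never show up in the parent’s list: the child process appended to its own separate copy.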

In Python, we use the threading module for threads and the multiprocessing module for processes. But here’s the twist: the two behave very differently in Python because of how the standard interpreter is built. Let’s explore why.

Threads in Python: Why Don’t They Run Truly in Parallel?

CPython, the standard Python interpreter, has a built-in limitation called the Global Interpreter Lock (GIL). The GIL is like a bouncer at a club: it only lets one thread execute Python bytecode at a time, even on a multi-core CPU. This means that while Python supports threads, they don’t run in parallel for CPU-bound tasks (like heavy math calculations). Instead, they take turns. (Since Python 3.13 there is an optional free-threaded build without the GIL, but the default build still has it.)

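Why does the GIL exist? CPython tracks object lifetimes with reference counting, and the GIL keeps those counts from being corrupted when several threads touch the same object. That simplifies memory management and makes Python’s core code (written in C) safer and easier to maintain.
When are threads useful then? Threads shine in I/O-bound tasks (things like waiting for file downloads, network requests, or database queries) because a thread releases the GIL while it waits for I/O, so other threads can run in the meantime.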
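If you want to see the GIL’s effect on CPU-bound work for yourself, here’s a rough timing sketch. The function names (`count_down`, `time_sequential`, `time_threaded`) and the loop size are my own choices for illustration. It runs the same pure-Python loop twice in a row, then in two threads at once, and reports how long each approach took.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop: CPU-bound, so it needs the GIL the whole time."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run count_down twice, one call after the other; return elapsed seconds."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run count_down in two threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary size, chosen only so the loop takes a noticeable time
    print(f"Sequential:  {time_sequential(n):.2f}s")
    print(f"Two threads: {time_threaded(n):.2f}s")
```

On a standard CPython build I’d expect the two timings to come out close to each other, since the threads take turns holding the GIL. Exact numbers depend on your machine and Python version.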

Here’s a simple example of threading for an I/O-bound task:

Python

import threading
import time

def download_file(file_name):
    print(f"Starting download: {file_name}")
    time.sleep(2)  # Simulate waiting for a file to download
    print(f"Finished downloading: {file_name}")

# Create two threads
t1 = threading.Thread(target=download_file, args=("file1.txt",))
t2 = threading.Thread(target=download_file, args=("file2.txt",))

# Start the threads
t1.start()
t2.start()

# Wait for both threads to finish
t1.join()
t2.join()

print("All downloads complete!")

Output (the two “Finished” lines can appear in either order):

Starting download: file1.txt
Starting download: file2.txt
Finished downloading: file1.txt
Finished downloading: file2.txt
All downloads complete!

In this example, both “downloads” start almost simultaneously, and the whole script takes about 2 seconds instead of 4. time.sleep stands in for real I/O here: a sleeping thread releases the GIL, so the two waits overlap. Threads are perfect for this kind of work.
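You can measure that overlap directly. The sketch below is a variation on the example above, with hypothetical helper names (`fake_download`, `download_all`): it starts one thread per file, waits for all of them, and returns the total elapsed time.

```python
import threading
import time


def fake_download(file_name, seconds):
    """Stand-in for a real download: just sleeps, which releases the GIL."""
    time.sleep(seconds)


def download_all(file_names, seconds):
    """Start one thread per file, wait for all of them, return elapsed seconds."""
    threads = [
        threading.Thread(target=fake_download, args=(name, seconds))
        for name in file_names
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = download_all(["file1.txt", "file2.txt", "file3.txt"], seconds=1)
    print(f"3 simulated downloads of 1s each took {elapsed:.2f}s")
```

The total should land near the length of one download rather than the sum of all three, because the threads spend their time waiting side by side.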

Processes in Python: True Parallelism

Processes sidestep the GIL. With the multiprocessing module, each process runs in its own Python interpreter with its own memory space and its own GIL, so one process never waits on another’s lock. This means processes can run truly in parallel on multiple CPU cores, which is great for CPU-bound tasks like crunching numbers or image processing.

Why aren’t processes limited by the GIL? The GIL belongs to a single interpreter. Separate processes are scheduled independently by the operating system, so they can use all of your CPU cores.
Trade-off: Processes are heavier than threads because they don’t share memory. Starting them takes more time and resources, and any data passed between them has to be serialized (pickled) and copied.

Here’s an example of multiprocessing for a CPU-bound task:

Python

import multiprocessing

def calculate_squares(n):
    print(f"Calculating squares up to {n}")
    squares = [i * i for i in range(n)]  # CPU-bound work; the list stays inside this process
    print(f"Done with {n}")

if __name__ == "__main__":
    # Create two processes
    p1 = multiprocessing.Process(target=calculate_squares, args=(1000000,))
    p2 = multiprocessing.Process(target=calculate_squares, args=(2000000,))

    # Start the processes
    p1.start()
    p2.start()

    # Wait for both to finish
    p1.join()
    p2.join()

    print("All calculations complete!")

Output (the order of the lines can vary between runs):

Calculating squares up to 1000000
Calculating squares up to 2000000
Done with 1000000
Done with 2000000
All calculations complete!

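Here, the two processes can run in parallel on different CPU cores. For heavy computations, that makes the total time shorter than running them one after another; for very small jobs, the cost of starting the processes can outweigh the gain.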
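One thing the example above doesn’t do is hand its results back to the main program. Since processes don’t share memory, that takes an explicit channel. Here’s a minimal sketch using multiprocessing.Queue; the `sum_of_squares` and `worker` functions are hypothetical names I picked for the illustration.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound work: add up i*i for every i below n."""
    return sum(i * i for i in range(n))


def worker(n, results):
    """Runs in the child process and sends (n, answer) back to the parent."""
    results.put((n, sum_of_squares(n)))


if __name__ == "__main__":
    results = multiprocessing.Queue()
    sizes = [1_000_000, 2_000_000]
    processes = [
        multiprocessing.Process(target=worker, args=(n, results)) for n in sizes
    ]
    for p in processes:
        p.start()

    # Read one result per process before joining, so no child is left
    # blocked while trying to flush its data into the queue.
    answers = dict(results.get() for _ in processes)

    for p in processes:
        p.join()

    for n in sizes:
        print(f"Sum of squares below {n}: {answers[n]}")
```

Each result is pickled in the child and unpickled in the parent, which is part of the extra cost processes carry compared to threads.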

What Is Concurrency?

Concurrency is the ability to handle multiple tasks at once, making progress on them without necessarily finishing one before starting another. It’s like juggling—you keep multiple balls in the air, even if you’re only catching one at a time.

Threads and Concurrency: In Python, threads provide concurrency for I/O-bound tasks. They don’t run in parallel (due to the GIL), but they switch between tasks efficiently.
Processes and Concurrency: Processes provide concurrency and parallelism for CPU-bound tasks, fully utilizing multiple cores.

There’s also asyncio, Python’s async framework, which offers single-threaded concurrency for I/O-bound tasks (we’ll save that for another blog!).
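Just as a small taste, here’s what single-threaded concurrency can look like with asyncio. This is only a sketch: `fake_request` is a made-up coroutine that sleeps instead of doing real network I/O.

```python
import asyncio


async def fake_request(name, delay):
    """Stand-in for a network call: awaiting sleep hands control to other tasks."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs the coroutines concurrently on one thread and returns
    # their results in the order they were passed in.
    return await asyncio.gather(
        fake_request("request-1", 0.2),
        fake_request("request-2", 0.1),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both waits overlap even though only one thread is involved, which is the same juggling idea described above.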

How to Distinguish Threads and Processes

Here’s a quick breakdown:

Feature	Threads	Processes
Memory	Shared	Separate
GIL Impact	Limited by GIL	Not affected by GIL
Overhead	Lightweight, fast to start	Heavier, slower to start
Best For	I/O-bound tasks	CPU-bound tasks
Concurrency	Yes (task switching)	Yes (plus parallelism)

Use Cases and Examples

Threads Use Case: Web Scraping

Why? You’re waiting for network responses (I/O-bound).
Example: Scraping multiple websites at once.

Python

import threading
import time

def scrape_site(site):
    print(f"Scraping {site}")
    time.sleep(1)  # Simulate network delay
    print(f"Done with {site}")

threads = [threading.Thread(target=scrape_site, args=(site,)) for site in ["site1.com", "site2.com"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
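When you have more than a couple of sites, creating threads by hand gets clunky. A thread pool from concurrent.futures is one common alternative, sketched below. The `scrape_all` helper and the fake page content are assumptions for illustration; a real scraper would make an HTTP request inside `scrape_site`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_site(site):
    """Pretend to fetch a page; a real version would make an HTTP request here."""
    time.sleep(0.2)  # simulated network delay
    return f"<html>fake page for {site}</html>"


def scrape_all(sites, max_workers=5):
    """Scrape every site using a small pool of threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_site, sites))


if __name__ == "__main__":
    sites = ["site1.com", "site2.com", "site3.com"]
    for site, page in zip(sites, scrape_all(sites)):
        print(site, "->", page)
```

The pool caps how many threads run at once (`max_workers`) and hands back each function’s return value, so you don’t need to manage start() and join() yourself.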

Processes Use Case: Image Processing

Why? Heavy computation (CPU-bound).
Example: Resizing a batch of images.

Python

import multiprocessing

def process_image(image_id):
    print(f"Processing image {image_id}")
    # Simulate CPU-intensive work (the result is discarded; only the effort matters)
    sum(x * x for x in range(5_000_000))
    print(f"Finished image {image_id}")

if __name__ == "__main__":
    processes = [multiprocessing.Process(target=process_image, args=(i,)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
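For a bigger batch, starting one process per image can create more processes than you have CPU cores. A process pool is one way to cap that. The sketch below assumes a hypothetical `pick_worker_count` helper and a simulated `process_image` that returns a checksum instead of touching real image files.

```python
import multiprocessing
import os


def process_image(image_id):
    """Simulated CPU-heavy work; returns a value so the parent can collect it."""
    checksum = sum(x * x for x in range(200_000))
    return image_id, checksum


def pick_worker_count(n_tasks):
    """Use at most one worker per CPU core, and never more workers than tasks."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(cores, n_tasks))


if __name__ == "__main__":
    image_ids = list(range(8))
    with multiprocessing.Pool(processes=pick_worker_count(len(image_ids))) as pool:
        # map() splits the ids across the workers and returns results in input order
        for image_id, checksum in pool.map(process_image, image_ids):
            print(f"Image {image_id} processed (checksum {checksum})")
```

The pool reuses its worker processes across tasks, so you pay the startup cost once per worker instead of once per image.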

Final Thoughts

Use threads when your program spends a lot of time waiting (e.g., I/O operations).
Use processes when you need raw CPU power and parallelism.
Understand your task: Is it I/O-bound or CPU-bound? That’s the key to choosing the right tool.

Python’s GIL might limit threads, but with threading and multiprocessing, you’ve got everything you need to build fast, efficient programs. Experiment with these examples, tweak them, and see what happens on your machine. Happy coding, juniors!

HaMind
