
Threading

The threading module in Python provides a way to create and manage threads, allowing you to write concurrent programs. Threads are lighter-weight than processes and share the same memory space, which makes them well suited to I/O-bound tasks that can run concurrently. Here's a brief introduction to the threading module:

Thread

  • The Thread class is the core component of the threading module. You create threads by instantiating objects of this class.
  • Example:
    import threading
    def my_function():
        # Code to be executed in the thread
        print("Hello from a worker thread")
    
    my_thread = threading.Thread(target=my_function)
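
A thread's target usually takes arguments. As a small sketch (the greet function and the names here are illustrative, not part of the module), positional and keyword arguments are forwarded via args and kwargs, and the thread can be given a name; start() and join() are covered in the lifecycle section below:

    import threading

    def greet(name, repeat=1):
        for _ in range(repeat):
            print(f"Hello, {name}!")

    # args/kwargs are forwarded to the target; name labels the thread
    worker = threading.Thread(target=greet, args=("worker-1",),
                              kwargs={"repeat": 2}, name="greeter")
    worker.start()
    worker.join()
    print(worker.name)   # "greeter"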
    

Thread Lifecycle

The thread lifecycle in Python involves several states and transitions. Here's a detailed explanation of the different states a thread can go through, with a short example tying them together at the end of the list:

  • New State:

    • A thread is in the "New" state when it is created but not yet started. At this point, the thread has been instantiated but hasn't begun its execution.

      import threading
      
      def my_function():
          # Code to be executed in the thread
          print("Hello from a worker thread")
      
      my_thread = threading.Thread(target=my_function)  # Thread is in the "New" state
      
  • Runnable/Ready State:

    • After a thread is created, it enters the "Runnable" or "Ready" state when the start() method is called. In this state, the thread is ready to run but may not have been scheduled by the operating system yet.

       my_thread.start()  # Thread is in the "Runnable" state
      
  • Running State:

    • When the operating system scheduler assigns CPU time to the thread, it enters the "Running" state. In this state, the thread's run() method is being executed.
      # Thread is in the "Running" state (while executing the run() method)
      
  • Blocked/Waiting State:

    • A thread can move from the "Running" state to the "Blocked" or "Waiting" state when it is waiting for a resource, input, or some condition to be satisfied. It voluntarily releases the CPU.

      import threading
      import time

      some_lock = threading.Lock()

      def my_function():
          with some_lock:        # Thread is "Blocked" here if another thread holds some_lock
              time.sleep(5)      # Sleeping also takes the thread out of the "Running" state

  • Terminated State:

    • A thread enters the "Terminated" state when its run() method completes or when an unhandled exception is raised within the thread. Once terminated, a thread cannot be restarted.

      my_thread.join()  # Wait for the thread to finish; it is then in the "Terminated" state
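
Putting these states together, here is a minimal, self-contained sketch of the full lifecycle (the one-second sleep simply stands in for real work):

    import threading
    import time

    def my_function():
        time.sleep(1)                                   # Doing (simulated) work

    my_thread = threading.Thread(target=my_function)   # "New" state
    my_thread.start()                                   # "Runnable", then "Running"
    print(my_thread.is_alive())                         # True while the thread is still working
    my_thread.join()                                    # Main thread waits for it to finish
    print(my_thread.is_alive())                         # False: the thread is "Terminated"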
      

It's important to note that the GIL (Global Interpreter Lock) in CPython impacts the behavior of threads, especially in CPU-bound tasks. The GIL allows only one thread to execute Python bytecode at a time, limiting the effectiveness of threads for parallelizing certain types of operations. For CPU-bound tasks, multiprocessing may be a more suitable alternative.
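
As a rough, illustrative sketch (exact timings depend on your machine and Python version), running the same CPU-bound function in two threads takes roughly as long as running it twice in a row, because the GIL serializes the Python bytecode:

    import threading
    import time

    def cpu_bound():
        sum(range(10_000_000))   # Pure-Python arithmetic; the GIL is held while it runs

    start = time.perf_counter()
    threads = [threading.Thread(target=cpu_bound) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"two threads: {time.perf_counter() - start:.2f}s")
    # Expect roughly the same wall-clock time as calling cpu_bound() twice sequentially.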

Thread Synchronization

Thread synchronization in Python is essential for managing access to shared resources and preventing data corruption in a multithreaded environment. The threading module provides several synchronization primitives to facilitate this. Here are some key mechanisms:

  • Locks:

    • A Lock is the simplest synchronization primitive. It is used to enforce exclusive access to a shared resource.
    • Threads can acquire a lock using acquire() and release it using release().
    • Example:
      import threading
      shared_resource = 0
      lock = threading.Lock()
      
      def increment():
          global shared_resource
          with lock:
              shared_resource += 1
      
  • Semaphores:

    • Semaphores are counters that control access to a resource. They are often used to limit the number of threads that can access a resource simultaneously.
    • The Semaphore class in the threading module provides this functionality.
    • Example:
      import threading
      shared_resource = 0
      semaphore = threading.Semaphore(value=3)
      def increment():
          global shared_resource
          with semaphore:
              shared_resource += 1
      
  • Events:

    • An Event is a simple signaling mechanism that allows one thread to notify others about a certain condition.
    • Threads can wait for an event using wait() and set the event using set().
    • Example:
       import threading
      shared_condition = threading.Event()
      
      def wait_for_event():
          shared_condition.wait()
          # Continue with the task after the event is set
      
      def set_event():
          shared_condition.set()
      
  • Conditions:

    • A Condition is more advanced than an event and is often used to coordinate threads based on shared state.
    • It combines a lock and a signaling mechanism.
    • Example:
      import threading
      
      shared_resource = 0
      condition = threading.Condition()
      
      def modify_resource():
          global shared_resource
          with condition:
              # Modify shared resource
              shared_resource += 1
              # Notify waiting threads
              condition.notify_all()
      
      def wait_for_change():
          with condition:
              while shared_resource == 0:
                  condition.wait()
              # Continue the task after shared resource changes
      

These synchronization primitives help ensure that critical sections of code are executed atomically and that threads cooperate properly when accessing shared data. The choice of synchronization mechanism depends on the specific requirements of your multithreaded application.
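
As a concrete illustration of these primitives working together, here is a small, self-contained sketch (the worker function and the counts are arbitrary) that uses an Event as a start signal and a Lock to protect a shared counter:

    import threading

    counter = 0
    counter_lock = threading.Lock()
    start_signal = threading.Event()

    def worker():
        global counter
        start_signal.wait()              # Block until the main thread sets the event
        for _ in range(1000):
            with counter_lock:           # Serialize access to the shared counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    start_signal.set()                   # Release all workers at once
    for t in threads:
        t.join()
    print(counter)                       # Always 4000 thanks to the lock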

Thread Pools

Thread pools in Python provide a way to efficiently manage and reuse a fixed number of threads for executing tasks concurrently. The concurrent.futures module, introduced in Python 3.2, offers a high-level interface for working with thread pools through the ThreadPoolExecutor class. Here's a detailed explanation:

  • ThreadPoolExecutor:

    • The ThreadPoolExecutor class is part of the concurrent.futures module.
    • It provides a simple and consistent interface for working with thread pools.
    • To create a thread pool, you instantiate ThreadPoolExecutor with the desired number of worker threads:

      from concurrent.futures import ThreadPoolExecutor
      
      with ThreadPoolExecutor(max_workers=3) as executor:
          # Execute tasks using the executor
          executor.submit(print, "running in the pool")
      
  • submit method:

    • The primary method of ThreadPoolExecutor is submit(func, *args, **kwargs).
    • It schedules the given callable (function or method) to be executed asynchronously by a thread in the pool.
    • The method returns a concurrent.futures.Future object representing the result of the computation.
      future = executor.submit(my_function, arg1, arg2)
      
  • map method:

    • The map(func, *iterables, timeout=None) method can be used to parallelize the execution of a function over multiple input values.
    • It's similar to the built-in map function but executes the function concurrently using the thread pool.
      results = executor.map(my_function, iterable_of_args)
      
  • shutdown method:

    • To gracefully shut down the thread pool, use the shutdown(wait=True) method. The wait parameter specifies whether to wait for all submitted tasks to complete before shutting down. When the executor is used as a context manager (in a with statement), shutdown(wait=True) is called automatically on exit.

      executor.shutdown(wait=True)
      
  • Handling Results:

    • The concurrent.futures.Future objects returned by submit can be used to retrieve the result of a computation.

    • You can check if a future is done using the done() method and retrieve the result using the result() method.

      if future.done():
          result = future.result()
      
  • Exception Handling:

    • Exceptions raised in a thread are captured and re-raised when calling result() on the corresponding Future object.
    • This allows you to handle exceptions that occurred during the execution of a task:
      try:
          result = future.result()
      except Exception as e:
          # Handle the exception raised inside the task
          print(f"Task raised an exception: {e}")
      

Thread pools are particularly useful for parallelizing I/O-bound tasks where threads can be efficiently reused. However, for CPU-bound tasks, consider using the ProcessPoolExecutor or other multiprocessing approaches due to the Global Interpreter Lock (GIL) in CPython.
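
Putting these pieces together, here is a minimal sketch (the fetch function is a hypothetical stand-in for a real I/O-bound call) that submits tasks, collects results as they complete, handles a failing task, and uses map():

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch(item):
        # Placeholder for an I/O-bound operation such as a network request
        if item < 0:
            raise ValueError(f"bad item: {item}")
        return item * 2

    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(fetch, i) for i in (1, 2, -1, 4)]
        for future in as_completed(futures):
            try:
                print(future.result())       # Re-raises any exception from the task
            except ValueError as exc:
                print(f"Task failed: {exc}")

        # map() returns results in input order; exceptions surface while iterating
        print(list(executor.map(fetch, [5, 6, 7])))
    # Leaving the with-block calls shutdown(wait=True) automatically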

Daemon Threads

A daemon thread is a thread that runs in the background and is subordinate to the main program. Daemon threads are typically used for tasks that don't need to be explicitly waited for or completed before the program exits. When the main program exits, any remaining daemon threads are abruptly terminated, regardless of whether they have finished their tasks or not.

  • Creating Daemon Threads:

    • You can mark a thread as a daemon by setting its daemon attribute to True before starting it, or by passing daemon=True to the Thread constructor. (The older setDaemon(True) method still works but has been deprecated since Python 3.10.) A short sketch showing how a daemon thread is cut off when the main program exits appears at the end of this section.
      import threading
      import time
      
      def daemon_function():
          while True:
              print("Daemon Thread is running...")
              time.sleep(1)
      
      daemon_thread = threading.Thread(target=daemon_function)
      daemon_thread.daemon = True  # Or pass daemon=True to the Thread constructor
      daemon_thread.start()
      
  • Daemon vs. Non-Daemon Threads:

    • Non-daemon threads are considered foreground threads. The main program will wait for them to complete before it exits.
    • Daemon threads, on the other hand, are background threads. They do not prevent the main program from exiting, and if any daemon threads are still running when the program exits, they are abruptly terminated.
  • Use Cases for Daemon Threads:
    • Daemon threads are suitable for background tasks such as monitoring, periodic cleanup, or tasks that should run indefinitely while the main program is running.
    • They are not suitable for tasks that need to be completed before the program exits since they might be terminated abruptly.
  • Termination of Daemon Threads:
    • Daemon threads are terminated when the main program exits. This termination can be abrupt, potentially leading to incomplete tasks or resource leaks.
    • It's essential to ensure that daemon threads perform tasks that can be safely terminated at any point without causing issues.
  • Joining Daemon Threads:
    • While daemon threads do not prevent the main program from exiting, you can still use the join() method to wait for a daemon thread, ideally with a timeout, since a daemon that loops forever would otherwise block join() indefinitely.
      daemon_thread.join(timeout=5)  # Wait at most five seconds for the daemon thread
      
  • Default Daemon Status:

    • If you don't explicitly set the daemon attribute, a new thread inherits the daemon status of the thread that created it, so threads created from the main thread are non-daemon by default.
      default_thread = threading.Thread(target=some_function)  # This thread is non-daemon
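
To illustrate the difference at program exit, here is a small sketch (the sleep durations are arbitrary): the daemon thread is killed as soon as the main thread finishes, without completing its loop:

    import threading
    import time

    def background_task():
        while True:
            print("daemon still running...")
            time.sleep(0.5)

    worker = threading.Thread(target=background_task, daemon=True)  # Daemon set via the constructor
    worker.start()

    time.sleep(2)                # The main thread does its own work for a while
    print("main thread done")    # When the program exits, the daemon thread is terminated abruptly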
      

Thread Local

Thread-local data is a mechanism that allows each thread to have its own instance of a variable, making it unique to that thread. This is particularly useful when working with threads that share the same global variables, as it prevents interference and data corruption between threads. The threading module provides the local() function to create thread-local data.

  • The local() function creates an instance of a thread-local storage object. This object can hold variables that are unique to each thread.
    import threading
    local_data = threading.local()
    
  • You can then assign attributes on the thread-local object; each thread that assigns an attribute gets its own independent copy of it:
    local_data.variable = 42
    
  • To access thread-local data from within a thread, you use the same thread-local object. Note that a value assigned in the main thread is not visible in other threads; a thread only sees the attributes it has set itself.
    def my_function():
        local_data.variable = 0   # Must be assigned in this thread before it can be read here
        print(local_data.variable)

    my_thread = threading.Thread(target=my_function)
    my_thread.start()
    my_thread.join()
    
  • Thread-local data for a thread is cleaned up automatically when that thread exits. You can also remove an individual attribute explicitly with the del statement.
    del local_data.variable
    
  • Uses
    • Thread-local data is often used when multiple threads need to work with shared resources, but each thread needs its own independent view of those resources.
    • It's beneficial in scenarios where global variables might be accessed and modified by multiple threads and you want to avoid data corruption and race conditions; a sketch showing per-thread isolation across several threads follows the example below:
      import threading

      global_variable = 0
      lock = threading.Lock()
      local_data = threading.local()

      def increment_global():
          global global_variable
          with lock:
              global_variable += 1
              print(f"Global variable value: {global_variable}")

      def thread_function():
          local_data.local_variable = 100   # Visible only inside this thread
          increment_global()
          print(f"Thread-local value: {local_data.local_variable}")

      my_thread = threading.Thread(target=thread_function)
      my_thread.start()
      my_thread.join()
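
To make the per-thread isolation explicit, here is a short sketch (the worker function and the values are illustrative) in which several threads store and read their own copies of the same attribute without interfering:

    import threading

    local_data = threading.local()

    def worker(value):
        local_data.value = value             # Each thread keeps its own copy of "value"
        print(f"{threading.current_thread().name}: {local_data.value}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Each thread printed the value it assigned itself; the copies never clash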
      

Conclusion

In summary, the threading module provides a convenient way to work with threads in Python. It allows you to create, start, and manage threads, and includes features for synchronization and coordination between threads. Understanding these basics is crucial for developing concurrent and parallel programs in Python.
