An overview of different I/O patterns in Python.

 · 8 min read
 · Oriol Fabregas
Last updated: April 16, 2023

Context

For an SRE, performance is one of the most relevant metrics to watch, especially nowadays, when distributed systems are abstracted behind numerous APIs.

When we write code to achieve a goal, we sometimes only think about solving the immediate problem at hand. The problem can be complex and our solution might be just right, but how much toil are we leaving behind? How well does it scale?

Designing software is hard, and it's difficult to predict edge cases or future features. This is why refactoring code is as important as adding features, although managers tend to think otherwise. Continuous refactoring can bring great benefits to your customers.

How much value does a new feature have if it has a poor user experience?

In this article, we'll go through different patterns for solving a problem (for business people: adding a feature), and we'll see how the design we choose determines our user experience and SLAs.

The goal

For this example, I wanted an I/O-bound problem, one where the CPU sits idle most of the time waiting for the network. A great way of showing this is the dining philosophers problem and how the operating system solves it. Andrew Tanenbaum explains it well in his book.

Illustration of the philosophers problem
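
For anyone who hasn't met it, here is a minimal threaded sketch of the problem (my own illustration, not Tanenbaum's code; taking the forks in a fixed global order is one classic way to avoid deadlock):

import threading

PHILOSOPHERS = 5
forks = [threading.Lock() for _ in range(PHILOSOPHERS)]

def philosopher(i):
    # Grab the lower-numbered fork first: a fixed global order prevents deadlock.
    first, second = sorted((i, (i + 1) % PHILOSOPHERS))
    for _ in range(3):
        # "Thinking" is idle time; in our case, a thread blocked on network I/O.
        with forks[first]:
            with forks[second]:
                print(f"philosopher {i} is eating")

threads = [threading.Thread(target=philosopher, args=(n,)) for n in range(PHILOSOPHERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()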

The idea is to make as many HTTP GET requests as we are given. Simple, right?

Synchronously

Let's say that we want to aggregate data from multiple sources and then fiddle with them. We start with a few websites, so we just approach it with a traditional pattern.

import requests
import os
from time import time

def get_url(domain_name):
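    # Return a one-entry dict mapping the domain to its HTTP status code,
    # or to a short description of the failure.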
    results = {}
    try:
        response = requests.get(f"https://www.{domain_name}", timeout=3, allow_redirects=False)
        result = response.status_code
    except requests.exceptions.ReadTimeout:
        result = "timeout"
    except requests.exceptions.ConnectionError:
        result = "connection error"
    except Exception as e:
        result = str(e)
    results[domain_name] = result
    return results
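
def load_csv(path):
    # The article doesn't show this helper, so here is a hedged sketch: it
    # assumes the usual top-1m.csv layout of one "rank,domain" pair per line.
    with open(path) as csv_file:
        return [line.strip().split(",")[1] for line in csv_file]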

if __name__ == '__main__':
    site_chunk = [10, 100, 500, 1000]
    top_sites = f'{os.path.dirname(os.path.realpath(__file__))}/top-1m.csv'
    endpoints = load_csv(top_sites) # Adds the domain names from CSV to a list.

    for n in site_chunk:
        start = time()
        results = [get_url(endpoint) for endpoint in endpoints[0:n]]
        end = time()
        print(f"{n} endpoints took {end-start:.2f} seconds")

How well did this perform?

$ python socket_sync.py
10 endpoints took 3.12 seconds
100 endpoints took 41.84 seconds
500 endpoints took 317.75 seconds
1000 endpoints took 642.83 seconds

Multi Processing

The above execution ran in a single process with a single thread. In the philosophers analogy, only one of them was able to eat while the others were just thinking. So, let's add more processes.

from multiprocessing import Pool
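# get_url, site_chunk, endpoints and time() are reused from the synchronous version.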

if __name__ == '__main__':
    for n in site_chunk:
        start = time()
        with Pool() as pool:
            results = pool.map(get_url, endpoints[0:n])
        end = time()
        print(f"{n} endpoints took {end-start:.2f} seconds")
# Using 8 processes (Pool() defaults to os.cpu_count() workers)
$ python socket_multiprocessing.py
10 endpoints took 0.95 seconds
100 endpoints took 8.68 seconds
500 endpoints took 52.82 seconds
1000 endpoints took 86.07 seconds

All right, this is better, but it's very expensive for our operating system: each worker is a full OS process with its own interpreter and memory, and every argument and result has to be pickled back and forth.
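
As a side note, Pool.map waits for every call to finish before returning. If we'd rather consume results as they complete, Pool.imap_unordered streams them back; a minimal sketch reusing get_url from before:

from multiprocessing import Pool

if __name__ == '__main__':
    with Pool() as pool:
        # Results arrive in completion order, not submission order.
        for result in pool.imap_unordered(get_url, endpoints[0:100], chunksize=10):
            print(result)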

Concurrency

Since threads live inside a single process and are more lightweight, let's explore the difference. Python's concurrent.futures module gives us a ThreadPoolExecutor, which runs our calls in a pool of threads and hands back Future objects that resolve at some point. Because the GIL is released while a thread waits on the network, threads suit I/O-bound work well.

import concurrent.futures

def main(endpoints):
    results = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {executor.submit(get_url, endpoint): endpoint for endpoint in endpoints}
        for future in concurrent.futures.as_completed(futures):
            endpoint = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                print(f"{endpoint!r} generated an exception: {exc}")
    return results


if __name__ == '__main__':
    for n in site_chunk:
        start = time()
        results = main(endpoints[0:n])
        end = time()
        print(f"{n} endpoints took {end-start:.2f} seconds")

The results are slightly better, and it's more economical for the operating system.

$ python socket_futures.py
10 endpoints took 0.83 seconds
100 endpoints took 6.06 seconds
500 endpoints took 34.15 seconds
1000 endpoints took 75.72 seconds
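
If we don't need to handle each future as it completes, executor.map is a more compact alternative; a hedged sketch of the same main():

def main(endpoints):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # map() yields results in input order and re-raises any exception a
        # call raised; get_url handles its own errors, so it's safe here.
        return list(executor.map(get_url, endpoints))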

AsyncIO

Finally, we can rely on AsyncIO: a single thread runs an event loop that switches between tasks cooperatively whenever they await on I/O, so it's the software itself that drives the asynchronous operations instead of the operating system scheduler.

Since the requests library is not async, we can use aiohttp instead.

import asyncio
import aiohttp

async def get_url(session, domain_name):
    results = {}
    try:
        async with session.get(f"https://www.{domain_name}", allow_redirects=False) as response:
            await response.read()
            results[domain_name] = response.status
    except asyncio.exceptions.TimeoutError:
        results[domain_name] = "timeout"
    except aiohttp.ClientError:
        results[domain_name] = "clienterror"
    return results


async def main(endpoints):
    timeout_seconds = 3
    timeout = aiohttp.ClientTimeout(total=None, sock_connect=timeout_seconds, sock_read=timeout_seconds)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(*[get_url(session, endpoint) for endpoint in endpoints])
    return results


if __name__ == '__main__':
    for n in site_chunk:
        start = time()
        # asyncio.run() starts and closes the event loop for us; calling
        # get_event_loop() here is deprecated since Python 3.10.
        results = asyncio.run(main(endpoints[0:n]))
        end = time()
        print(f"{n} endpoints took {end-start:.2f} seconds")

$ python socket_asyncio.py
10 endpoints took 1.46 seconds
100 endpoints took 3.15 seconds
500 endpoints took 16.91 seconds
1000 endpoints took 57.64 seconds
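
One caveat: asyncio.gather launches every coroutine at once. aiohttp's default connector already caps concurrent connections at 100, but we can also bound concurrency explicitly with a semaphore; a minimal sketch (the limit of 50 is an arbitrary choice):

import asyncio
import aiohttp

async def bounded_get(semaphore, session, domain_name):
    # Only 50 coroutines can hold the semaphore, so at most 50 requests are in flight.
    async with semaphore:
        return await get_url(session, domain_name)

async def main(endpoints):
    semaphore = asyncio.Semaphore(50)  # arbitrary cap for this sketch
    timeout = aiohttp.ClientTimeout(total=None, sock_connect=3, sock_read=3)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await asyncio.gather(
            *[bounded_get(semaphore, session, endpoint) for endpoint in endpoints])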

Final thoughts

If we compare all of the results side by side, we see a big difference from our original design:

Endpoints   Synchronous   Multiprocessing   Threads   AsyncIO
10          3.12 s        0.95 s            0.83 s    1.46 s
100         41.84 s       8.68 s            6.06 s    3.15 s
500         317.75 s      52.82 s           34.15 s   16.91 s
1000        642.83 s      86.07 s           75.72 s   57.64 s

Skipping the synchronous approach and looking more closely at the rest, the gap keeps widening in favour of asyncio as the number of endpoints grows.

We saw that, for this particular case, asyncio is the better solution. Although it took us a few patterns to land on the fastest one, our original code changed quite a bit along the way by pulling in new libraries, and in turn those changes ripple into our tests too.

Coming back to the original questions: requests would work just fine, but we would create toil, and at the same time our project will keep growing. At some point, we should allocate time to refactor it rather than only adding new features.

Is this a bad thing? Not necessarily. Changing project requirements and staff are part of a business's life cycle.

While we could have already anticipated this with a good initial design, we might be refactoring it again down the line.

All the code can be found in this repository.