Sigma Python #2: Generators

Python generators are a crucial feature with multiple uses, from lazy iteration to continuous data streaming. They are heavily adopted in popular packages such as FastAPI, SQLAlchemy, pytest, and others, highlighting their power and the importance of understanding how they work.

This article provides answers to the following questions about generators:

  • What are they exactly?

  • How do they work?

  • What are generator expressions and subgenerators?

  • What are some common use cases for generators?

Each section covers both theoretical and practical aspects of Python generators. By the end, you will have a strong comprehension of generators and their importance. The first section begins by defining what a generator is and how it operates.

Anatomy of a Generator

At first glance, a generator function appears to be no different from a regular function. Consider the following example:

from typing import Generator

def calculate(a: int, b: int) -> Generator[int | float, None, None]:
    print(f"Starting calculations for {a} and {b}")

    print("Sum")
    yield a + b

    print("Subtraction")
    yield a - b

    print("Multiplication")
    yield a * b

    print("Division")
    yield a / b

    print("Done calculating")

The defining characteristic of a generator function is the use of the yield keyword. When called, a generator function does not execute its body; instead, it returns a generator object, which is why it can be seen as a generator factory.
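
You can verify this by calling the function and inspecting the result before iterating (a quick check; the memory address will differ on your machine):

gen = calculate(10, 5)
print(gen)        # e.g. <generator object calculate at 0x7f...>
print(type(gen))  # <class 'generator'>
# note that no "Starting calculations" line has been printed yet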

There are two main ways to consume a generator. Consider the first example of how to use the calculate function:

calculations = calculate(10, 5)
for result in calculations:
    print("Result:", result)

To start, the calculate generator function is used to create a generator object. On its own, this does nothing else: without the for loop, no print output would appear. The code then iterates over the generator, which causes it to pause at each yield and hand the yielded value to the loop as result. Overall, running this code produces the following output:

Starting calculations for 10 and 5
Sum
Result: 15
Subtraction
Result: 5
Multiplication
Result: 50
Division
Result: 2.0
Done calculating

If you want to work with a generator in a more direct way, you can use the next function. Here's an example of how to use the calculate generator in this manner:

calculations = calculate(10, 5)
print(next(calculations))
print(next(calculations))
print(next(calculations))
print(next(calculations))
print(next(calculations))

Although the output is identical, the code now raises a StopIteration exception: the generator is exhausted after the final yield statement, so the fifth next call has nothing left to produce. An exhausted generator cannot be reused, and a new generator object is required to start over. A for loop, by contrast, stops automatically when the generator is exhausted, which is why there was no exception in the first example.
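
For reference, a for loop is roughly equivalent to calling next repeatedly and catching StopIteration yourself (a sketch reusing the calculate generator from above):

calculations = calculate(10, 5)
while True:
    try:
        result = next(calculations)
    except StopIteration:
        break  # this is where a for loop stops silently
    print("Result:", result)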

When using a generator, it's crucial to pay attention to where it pauses and resumes. If you only make one next call, you'll only see the print statements up to the sum, indicating that the generator paused at the first yield. Any remaining computations and print statements will not be executed. This behavior can be very useful for adding some "laziness" to your code.

It is also worth noting that the final print statement with the "Done calculating" message is only executed on the fifth next call, the same call that exhausts the generator; reaching the end of the function body is the only way to run the code after the last yield statement. To sum up, whether driven by a for loop or by next, the generator resumes, runs the code up to the next yield, and pauses there, until no yields remain. At that point, the final code is executed and the generator is exhausted.

Although generators are not an easy topic, this introduction should provide you with a good understanding of how they operate. With this knowledge covered, you can now move on to further interactions with the generator object.

Sending and Returning

Generators use the yield keyword to temporarily pause execution and output a value, which is their core feature. However, generators can also receive values via send and produce a final value via return.

In the first example, the type annotation Generator[int | float, None, None] indicates that the generator yields values of type int or float; the remaining two type parameters describe the send and return types, which are both None in that case.
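
For reference, the three parameters of typing.Generator are the yield, send, and return types. The toy counter below is not from the article's examples; it just labels each slot:

from typing import Generator

# Generator[YieldType, SendType, ReturnType]
def counter(limit: int) -> Generator[int, None, str]:
    for i in range(limit):
        yield i        # yields ints -> the yield type is int
    return "done"      # carried by StopIteration -> the return type is str
# nothing is sent into this generator, so the send type is None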

The send method of a generator object allows you to pass a value into the generator while it is paused; the sent value becomes the result of the yield expression at which the generator stopped. Here is an example to illustrate this functionality:

import random
from typing import Generator

def send_message(message: str) -> str:
    print(f"Sending: {message}")
    return str(random.randint(1, 1000))

def chat(message: str) -> Generator[str, str, None]:
    print("Starting a new chat")
    history = []

    while True:
        history.append(message)
        response = send_message(message)
        history.append(response)
        message = yield response

quick_chat = chat("hello")
print(next(quick_chat))
print(quick_chat.send("how are you doing?"))
print(quick_chat.send("oh, that is nice!"))

The chat generator function acts like a conversation. It starts with an initial message and uses the send_message function to send and receive messages while keeping a record of the conversation history. The generator object won't get exhausted because of the while True loop. When executed, the code produces the following output:

Starting a new chat
Sending: hello
786
Sending: how are you doing?
392
Sending: oh, that is nice!
416

To send values into a generator, you need to assign the result of yield to a variable (e.g. message = yield response). However, before sending anything, you must first advance the generator object to the first yield statement; otherwise, there is nothing to receive the sent value. In the example above, this is achieved by calling the next function, which also triggers the sending of the initial "hello" message. Once the generator is paused at the yield statement, you can use the send method. In the example, send is used twice to reassign the message variable, which is then passed to the send_message function.
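
If you skip that first next call and send a value into a brand-new generator, Python raises a TypeError (a quick sketch):

fresh_chat = chat("hello")
try:
    fresh_chat.send("how are you doing?")
except TypeError as error:
    print(error)  # can't send non-None value to a just-started generator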

Adding a send feature can enhance the interaction of a generator. By including a return statement within the generator, you can further expand its functionalities. Check out the tweaked chat generator for an example:

... # truncated code

def chat(message: str) -> Generator[str, str, list[str]]:
    print("Starting a new chat")
    history = []

    while True:
        history.append(message)
        response = send_message(message)
        history.append(response)
        message = yield response
        if not message:
            return history

quick_chat = chat("hello")
print(next(quick_chat))
print(quick_chat.send("how are you doing?"))
print(quick_chat.send("oh, that is nice!"))

try:
    quick_chat.send("")
except StopIteration as e:
    print(e.value)

A new type annotation, list[str], has been added as the return type of the generator. If an empty value is passed via the send method, the generator will now return the conversation history. The updated output is as follows:

Starting a new chat
Sending: hello
459
Sending: how are you doing?
817
Sending: oh, that is nice!
84
['hello', '459', 'how are you doing?', '817', 'oh, that is nice!', '84']

The generator got exhausted due to the return statement. However, you can retrieve the returned value from the StopIteration exception by using the try/except syntax. This approach may seem unusual, as it is not the typical way of using a generator. Generally, some form of iteration is used to go over the generator, which handles the exhaustion behind the scenes. The examples above use the next and send methods directly just to showcase the underlying mechanics.

This knowledge should help you interact with generators more effectively. The next sections cover two more techniques for building and consuming generators.

Generator Expressions

Generator expressions provide a concise way of creating generators: they let you define a generator in a single line of code. Their syntax is similar to that of a list comprehension. For example, suppose you need to take a list of numbers (which might be a large list), filter the even ones, and yield their squares. You could use a generator function as follows:

from typing import Generator

large_numbers_dataset = list(range(1, 11))
print(large_numbers_dataset)

def square_even(numbers: list[int]) -> Generator[int, None, None]:
    for n in numbers:
        if n % 2 == 0:
            yield n**2

squared = square_even(large_numbers_dataset)
print(squared)
print(list(squared))

And the output is the following:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
<generator object square_even at 0x7fc5789f4110>
[4, 16, 36, 64, 100]

The results are as expected. In this scenario, using a generator lets you compute only a portion of the dataset: for instance, you can iterate over the squared generator and stop after the second result, receiving two values without processing the rest. A one-line list comprehension would also work here, but it would compute the entire dataset upfront, losing that laziness. A one-line generator expression keeps it. To use one, replace the square_even function with the implementation below:

large_numbers_dataset = list(range(1, 11))
squared_exp = (n**2 for n in large_numbers_dataset if n % 2 == 0)
print(squared_exp)
print(list(squared_exp))

The outcome is now nearly the same:

<generator object <genexpr> at 0x7fc5789f5cb0>
[4, 16, 36, 64, 100]

The generator expression ultimately creates a generator object, just like the generator function, but in a more concise and expressive form. It can be written like a list comprehension while keeping the laziness of a generator. Moreover, generator expressions can be passed directly as function arguments. For instance, consider the following example:

large_numbers_dataset = list(range(1, 11))
print(sum(n**2 for n in large_numbers_dataset if n % 2 == 0))

The result is 220, which means that the generator yielded all the numbers as previously and they were all passed to the sum function. This is a clean and memory-efficient way of processing data without creating intermediate lists.
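
To make the laziness mentioned earlier concrete, here is a small sketch that consumes only the first two results of a generator expression; itertools.islice is used purely for illustration:

from itertools import islice

large_numbers_dataset = list(range(1, 11))
squared_exp = (n**2 for n in large_numbers_dataset if n % 2 == 0)

print(list(islice(squared_exp, 2)))  # [4, 16] - only two squares were computed

An equivalent list comprehension would have squared all five even numbers before you could take the first two.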

Subgenerators

In Python, you can create hierarchical generators by using subgenerators. These are generators that are nested within other generators, making it possible to structure and organize your code in a modular and reusable manner. Essentially, subgenerators serve as an alternative method for "unpacking" a generator within another generator. Take a look at the code example below for reference:

numbers_board = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

for row in numbers_board:
    for cell in row:
        print(cell)

Naturally, the code displays numbers 1 to 9. Nonetheless, a nested double for loop is not the most elegant solution. A cleaner approach is to use a generator:

def cells(board: list[list[int]]) -> Generator[int, None, None]:
    for row in board:
        for cell in row:
            yield cell

for cell in cells(numbers_board):
    print(cell)

Although the outcome remains unchanged, the code outside of the function is now more concise and benefits from using a generator. Additionally, there is another refinement you can make in this scenario: separate the two loops and delegate the inner for cell in row loop to a distinct generator. This can be accomplished with a subgenerator and the yield from syntax:

def columns(row: list[int]) -> Generator[int, None, None]:
    for column in row:
        yield column

def cells(board: list[list[int]]) -> Generator[int, None, None]:
    for row in board:
        yield from columns(row)

for cell in cells(numbers_board):
    print(cell)

The program prints the numbers 1 through 9 again, but this time the cells generator uses the columns subgenerator to produce them. The yield from statement lets the outer generator delegate to the inner generator and pass along its values without iterating over them explicitly. Although this may seem like an unnecessary addition in such a simple example, it demonstrates how a subgenerator is used.
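
It is also worth knowing that yield from does more than flatten a loop: it forwards values sent into the outer generator and evaluates to the subgenerator's return value. The sketch below extends the board example with a purely illustrative row-sum return value:

from typing import Generator

def columns(row: list[int]) -> Generator[int, None, int]:
    for column in row:
        yield column
    return sum(row)  # travels back to the outer generator

def cells(board: list[list[int]]) -> Generator[int, None, None]:
    for row in board:
        row_sum = yield from columns(row)  # yield from evaluates to the return value
        print(f"Row sum: {row_sum}")

for cell in cells(numbers_board):
    print(cell)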

With the information covered in the previous sections, you should now have a good understanding of generators. The next section explains why a correct comprehension of how they work matters in practice.

Use Cases

Python's generators are definitely a unique aspect of the language, yet they are not used all that frequently in everyday code. This is likely because default iteration and list comprehension syntax are sufficient in most cases. However, incorporating lazy iteration and other generator-based techniques can sometimes be a game-changer in a codebase.

Despite their limited use in day-to-day code, generators are essential when working with external libraries such as the FastAPI web framework or the pytest testing library. To go through some examples, first take a look at a class that mimics a database and tracks its connection status:

# note: `exceptions` is assumed to be a project-specific module providing DatabaseError
class Database:
    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._connected = False

    def connect(self) -> None:
        self._connected = True

    def disconnect(self) -> None:
        self._connected = False

    def add(self, entry: dict) -> None:
        if self._connected is False:
            raise exceptions.DatabaseError("Database is disconnected")
        self._entries.append(entry)

    @property
    def entries(self) -> list[dict]:
        if self._connected is False:
            raise exceptions.DatabaseError("Database is disconnected")
        return self._entries

If you used this class directly in a FastAPI endpoint, it would look as follows:

from fastapi import FastAPI, APIRouter

from sigma_py.generators.utils import Database

app = FastAPI()
router = APIRouter()

@router.get("/entries")
async def get_entries() -> list[dict]:
    db = Database()
    db.connect()
    entries = db.entries
    db.disconnect()
    return entries

The issue in this case is that you need to manually:

  • initialize the database instance

  • connect the database

  • disconnect the database

FastAPI solves this problem with its dependency system, which uses generators. Take a look at the enhanced version of the get_entries endpoint:

from typing import Generator
from fastapi import FastAPI, APIRouter, Depends

from sigma_py.generators.utils import Database

app = FastAPI()
router = APIRouter()

def get_db() -> Generator[Database, None, None]:
    db = Database()
    db.connect()
    yield db
    db.disconnect()

@router.get("/entries")
async def get_entries(db: Database = Depends(get_db)) -> list[dict]:
    return db.entries

The get_db dependency is responsible for all database-related tasks, allowing the endpoint to focus on its intended logic. This dependency is a generator function that creates and connects to a database object, and disconnects it once the generator is exhausted. The framework handles the generator's mechanics behind the scenes.
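
As a rough illustration of what happens behind the scenes (this is not FastAPI's actual implementation), the framework essentially advances the dependency generator once to obtain the database, runs the endpoint logic, and then exhausts the generator so the cleanup after the yield executes:

def run_with_dependency() -> list[dict]:
    dependency = get_db()       # create the generator
    db = next(dependency)       # run up to the yield: Database is created and connected
    try:
        entries = db.entries    # the endpoint's own logic
    finally:
        next(dependency, None)  # resume past the yield: db.disconnect() runs
    return entries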

When using pytest to write tests, a similar process occurs. The library also uses generators to create fixtures that resemble FastAPI dependencies:

import pytest
from typing import Generator

from sigma_py.generators.utils import Database

@pytest.fixture
def db() -> Generator[Database, None, None]:
    db = Database()
    db.connect()
    yield db
    db._entries = []
    db.disconnect()


def test_add_entry(db: Database) -> None:
    assert db.entries == []
    db.add({"name": "Jon"})
    assert db.entries == [{"name": "Jon"}]

As you can see, pytest also leverages a decorated generator to create and initialize the fixture and to add teardown code, which is executed after the test as the generator is resumed and exhausted.

Summary

Generators are a powerful feature in Python that enable efficient and flexible handling of data generation, iteration, and processing. They offer lazy evaluation, saving memory and enabling infinite sequences. Generators provide advantages such as improved memory utilization, enhanced code readability, and the ability to handle large or infinite data streams. Popular libraries like FastAPI, pytest, Django, TensorFlow, and asyncio utilize generators for dependency management, fixtures, pagination, batch processing, and asynchronous iteration respectively. By understanding and leveraging generators, you can optimize code, improve performance, and design modular and reusable solutions. Embrace generators to write efficient, readable, and scalable code in various domains.

Sources

Code examples used in the article can be found here: link.

The biggest inspiration for these articles and source of my knowledge is the Fluent Python book by Luciano Ramalho. I highly encourage you to check it out; you will not be disappointed.

Documentation page of each library: