Generator Functions

Explore generator functions to understand how they create iterator objects in Python. Learn to use yield statements for efficient data handling, build reusable generators, and apply parsing techniques with regular expressions. This lesson shows how generators combine simplicity and power to streamline processing with the iterator pattern.

Overview

Generator functions embody the essential features of a generator expression, which is itself a generalization of a comprehension. The generator function syntax looks even less object-oriented than anything we’ve seen, but we’ll discover that, once again, it is a syntax shortcut for creating a kind of iterator object. It helps us build processing that follows the standard iterator-filter-mapping pattern.
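To make the "syntax shortcut" idea concrete, here is a minimal sketch that compares a hand-written iterator class with a generator function producing the same values. The names `CountUp` and `count_up` are hypothetical and chosen only for illustration; they are not part of the log-processing example that follows.

```python
from collections.abc import Iterator


class CountUp:
    """Hand-written iterator (hypothetical example): yields 1, 2, ..., limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.current = 0

    def __iter__(self) -> "CountUp":
        return self

    def __next__(self) -> int:
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


def count_up(limit: int) -> Iterator[int]:
    """Generator function (hypothetical example): same values, less ceremony."""
    for n in range(1, limit + 1):
        yield n


# Calling the function does not run its body; it returns a generator object
# that supports the iterator protocol (__iter__ and __next__).
gen = count_up(3)
print(isinstance(gen, Iterator))
print(list(gen), list(CountUp(3)))
```

Both objects can be consumed by a `for` statement or by `list()`; the generator function simply lets Python build the `__iter__` and `__next__` machinery for us.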

Example 1: Sprawling code

Let’s take the log file example a little further. If we want to decompose the log into columns, we’ll have to do a more significant transformation as part of the mapping step. This will involve a regular expression to find the timestamp, the severity word, and the message as a whole. We’ll look at a number of solutions to this problem to show how generators and generator functions can be applied to create the objects we want.

Here’s a version that avoids generator expressions entirely:

Python 3.10.4
import csv
import re
from pathlib import Path
from typing import Match, cast


def extract_and_parse_1(full_log_path: Path, warning_log_path: Path) -> None:
    with warning_log_path.open("w") as target:
        writer = csv.writer(target, delimiter="\t")
        pattern = re.compile(r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")
        with full_log_path.open() as source:
            for line in source:
                if "WARN" in line:
                    line_groups = cast(
                        Match[str], pattern.match(line)).groups()
                    writer.writerow(line_groups)

We’ve defined a regular expression to match three groups:

  • The complex date string, (\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d), which is a generalization of strings like Apr 05, 2021 20:04:41.

  • The severity level, (\w+), which matches a run of letters, digits, or underscores. This will match words like INFO and DEBUG.

  • An optional message, (.*), which collects all remaining characters up to, but not including, the newline at the end of the line.
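The three groups described above can be checked against a single line. This is a small sketch, assuming a made-up log line (`sample`) modeled on the date format quoted in the first bullet; the real log file's contents are not shown in this lesson.

```python
import re

pattern = re.compile(r"(\w\w\w \d\d, \d\d\d\d \d\d:\d\d:\d\d) (\w+) (.*)")

# Hypothetical log line, invented for illustration only.
sample = "Apr 05, 2021 20:04:41 WARN This is a warning.\n"

match = pattern.match(sample)
if match is not None:
    timestamp, severity, message = match.groups()
    print(timestamp)  # Apr 05, 2021 20:04:41
    print(severity)   # WARN
    print(message)    # This is a warning.  (no trailing newline)

# A line that doesn't fit the pattern yields None rather than a match object.
print(pattern.match("not a log line"))  # None
```

Because `pattern.match()` can return `None`, calling `.groups()` directly on its result would fail for a line that contains "WARN" but doesn't fit the pattern. The `cast()` in the function above only reassures the type checker; it performs no check at run time.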

This pattern is assigned to the pattern variable. As an alternative, we could also use ...