Unlikenesses A Backend Developer

PHP generators in the wild

14 March 2020

Following on from the previous post about PHP iterators in the wild, I want to look at some uses of generators in open source software.

Generators were introduced in PHP 5.5. To quote the manual, "Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface". In other words, to the code consuming it, a generator behaves like any other iterator, as you can see if you look at the Generator class, which implements the Iterator interface. (The main differences are that a generator is forward-only, so it cannot be rewound once iteration has started, and that it has a few extra methods such as send.) A Generator object is obtained by calling a function whose body contains the yield keyword. See this blog post by Alan Storm and this one by Anthony Ferrara for an introduction.
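As a minimal sketch of the point above (the function name countTo is made up for illustration), calling a function that contains yield hands back a Generator object, which can be consumed like any other iterator:

```php
<?php

// A function whose body contains yield is a generator function.
// Calling it runs none of the body; it returns a Generator object.
function countTo(int $limit): Generator
{
    for ($i = 1; $i <= $limit; $i++) {
        yield $i;
    }
}

$generator = countTo(3);

// Generator implements Iterator, which in turn extends Traversable.
$isIterator = $generator instanceof Iterator;
$isTraversable = $generator instanceof Traversable;

// foreach drives the generator one yield at a time.
$seen = [];
foreach ($generator as $value) {
    $seen[] = $value;
}
```

After the loop, $seen holds the three yielded values and the generator is exhausted; iterating it a second time would throw, because generators are forward-only.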

Example 1: PHPUnit data providers

PHPUnit uses data providers to supply arrays which contain arguments for a test method. These argument arrays are iterated over, with the test method being called once for each array in the provider. Often the arrays are held in an outer array, but they can also be supplied by an iterator, such as a generator. Since every yield is processed before the tests are run, you do not profit from the reduced memory usage normally associated with iterators. However, it can make the code easier to read.

For example, take a look at one of the test providers in the test suite for the symfony/security component:

public function getUserTests()
{
    yield [null, null];

    yield ['string_username', null];

    $user = new User('nice_user', 'foo');
    yield [$user, $user];
}

This is arguably easier to read than

public function getUserTests()
{
    $user = new User('nice_user', 'foo');

    return [
        [null, null],
        ['string_username', null],
        [$user, $user]
    ];
}

if only because it allows the declaration of the $user variable to sit next to its usage.
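To check that the two styles really are interchangeable, here is a rough sketch that runs outside PHPUnit. The User class is a hypothetical stand-in for the Symfony one, and collectDatasets is an invented helper that only imitates the "process every yield up front" behaviour described above; it is not PHPUnit's own code.

```php
<?php

// Hypothetical stand-in for the User class used in the Symfony test.
final class User
{
    public string $username;
    public string $password;

    public function __construct(string $username, string $password)
    {
        $this->username = $username;
        $this->password = $password;
    }
}

// Generator-style provider, as in the first version above.
function generatorProvider(): Generator
{
    yield [null, null];

    yield ['string_username', null];

    $user = new User('nice_user', 'foo');
    yield [$user, $user];
}

// Array-style provider, as in the second version above.
function arrayProvider(): array
{
    $user = new User('nice_user', 'foo');

    return [
        [null, null],
        ['string_username', null],
        [$user, $user],
    ];
}

// Invented helper: eagerly collect every dataset from any iterable provider.
function collectDatasets(iterable $provider): array
{
    $datasets = [];
    foreach ($provider as $name => $arguments) {
        $datasets[$name] = $arguments;
    }

    return $datasets;
}

$fromGenerator = collectDatasets(generatorProvider());
$fromArray = collectDatasets(arrayProvider());
```

Both calls produce three datasets with the same keys and equal contents, so the choice between the two really does come down to readability.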

Example 2: Symfony Console

Symfony's Console package has a Table class containing some helpers to display a table. Its render method gets the table rows from a method buildTableRows, before iterating over them with a foreach loop. This is a good use case for a generator given that we do not know in advance how many rows will be in the table.

So how does buildTableRows work? First, a considerable amount of code is devoted to catering for the possibility of multiple row-spans and col-spans within the table. Once this is done, it returns a new iterator, TableRows. This class implements the IteratorAggregate interface, which means you get a quick way of creating an iterator without having to implement all the boilerplate methods (rewind, current, key, next and valid). Instead, you just implement the getIterator method.

Sometimes you might just return an ArrayIterator or some other common Traversable implementation here, but TableRows is instantiated with a generator function, and getIterator returns the generator produced by calling it (remember that generators implement the Iterator interface, which in turn extends Traversable, so they are a valid return value for getIterator). In the case of buildTableRows, the function it passes in yields each row (formatted appropriately if it spans multiple columns), followed by any unmerged rows belonging to that row. This requirement to format cells and handle edge cases means that a custom iterator is preferable to, say, an ArrayIterator.
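The pattern is easier to see in isolation. The following is a minimal sketch of an IteratorAggregate that wraps a generator function; the class name LazyRows and the sample rows are invented for illustration, and this is not Symfony's actual code. It assumes PHP 7.4 or later for the typed property.

```php
<?php

// Hypothetical class: an IteratorAggregate backed by a generator function.
final class LazyRows implements IteratorAggregate
{
    private Closure $generatorFunction;

    public function __construct(Closure $generatorFunction)
    {
        $this->generatorFunction = $generatorFunction;
    }

    // The only method IteratorAggregate requires. Calling the closure
    // creates a fresh Generator, so the object can be iterated repeatedly.
    public function getIterator(): Traversable
    {
        return ($this->generatorFunction)();
    }
}

$rows = new LazyRows(function () {
    foreach ([['id', 'name'], [1, 'Ada'], [2, 'Grace']] as $row) {
        // Format each row only when the consumer asks for it.
        yield implode(' | ', $row);
    }
});

$lines = [];
foreach ($rows as $line) {
    $lines[] = $line;
}

// A second pass works because getIterator builds a new generator each time.
$secondPass = iterator_to_array($rows);
```

Storing the function rather than a generator it has already produced is what makes the second pass possible: a bare Generator could only be consumed once.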

Example 3: Laravel Lazy Collections

Laravel Lazy Collections use generators to provide collection-like classes that, in the words of the manual, "allow you to work with very large datasets while keeping memory usage low." Where Laravel collections are wrappers around arrays, lazy collections are wrappers around iterators. Here's one way to create a lazy collection, taken from the original pull request:

LazyCollection::make(function () {
    $handle = fopen('log.txt', 'r');

    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
})
->chunk(4);

Essentially, the LazyCollection class takes a generator in its constructor, then provides some nice syntax to access items in a collection-like way. The make method (which is contained in the EnumeratesValues trait) simply news up an instance of the class, passing it its parameter. In this case, the parameter is a simple closure which opens a text file and yields a line at a time.

While collections and lazy collections share some of the same implementations for their syntactic sugar, the fact that we are now dealing with generators rather than arrays does mean that many implementations have to change. Let's look at the chunk example above. The Collection class simply uses PHP's array_chunk function. On the other hand, the LazyCollection class needs its own implementation. It returns a new LazyCollection instance, whose source is a generator which loops over the original source (this is the while ($iterator->valid()) loop), splitting it into arrays (chunks) of the requested size, and yielding each chunk wrapped in yet another LazyCollection instance.
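To give a feel for what lazy chunking involves, here is a simplified sketch. It is not Laravel's implementation: the function names lazyChunk and numbers are invented, it uses a plain foreach instead of driving the iterator by hand, it yields plain arrays rather than nested LazyCollection instances, and it does not preserve keys.

```php
<?php

// Hypothetical helper: lazily split any iterable into arrays of at most $size items.
function lazyChunk(iterable $source, int $size): Generator
{
    // Because this is a generator function, this check runs when
    // iteration starts, not when lazyChunk() is called.
    if ($size <= 0) {
        throw new InvalidArgumentException('Chunk size must be positive.');
    }

    $chunk = [];
    foreach ($source as $item) {
        $chunk[] = $item;

        // Hand over a full chunk as soon as it is ready, then start a new one.
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    // Emit whatever is left over as a final, shorter chunk.
    if ($chunk !== []) {
        yield $chunk;
    }
}

// Stand-in source: a generator producing the numbers 1 to 5.
function numbers(): Generator
{
    for ($i = 1; $i <= 5; $i++) {
        yield $i;
    }
}

// Passing false discards the generator's keys and reindexes the result.
$chunks = iterator_to_array(lazyChunk(numbers(), 2), false);
```

Only one chunk is ever held in memory at a time, which is the property array_chunk cannot offer, since it needs the whole array up front.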

The missing example: co-routines

PHP generators have also played a role in bringing co-routines to the language. This would require another blog post in itself, and to be frank I'm not ready to tackle it yet. The locus classicus on this subject is an eight-year-old post by PHP superman Nikita Popov. More recently, co-routines have become an important feature in asynchronous PHP projects like Swoole, ReactPHP and Amp.