PHP generators in the wild
14 March 2020
Following on from the previous post about PHP iterators in the wild, I want to look at some uses of generators in open source software.
Generators were introduced in PHP 5.5. To quote the manual, "Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface". In other words, a generator can be used anywhere an iterator can, as you can see if you look at the Generator class, which implements the Iterator interface. (It does add a few methods of its own, such as send, and it cannot be rewound once it has started running.) A Generator object is obtained by calling a function whose body contains the yield keyword. See this blog post by Alan Storm and this one by Anthony Ferrara for an introduction.
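To make that concrete, here is a minimal sketch. The countTo function is a name I have made up for illustration; the Iterator methods called on it are the standard ones.

```php
<?php

// Hypothetical generator function: calling it does not run the body yet,
// it only returns a Generator object.
function countTo(int $limit): Generator
{
    for ($i = 1; $i <= $limit; $i++) {
        yield $i;
    }
}

$numbers = countTo(3);

// A Generator passes the same type checks as any other iterator.
$isIterator = $numbers instanceof Iterator;       // true
$isTraversable = $numbers instanceof Traversable; // true

// The Iterator methods can be driven by hand...
$seen = [];
while ($numbers->valid()) {
    $seen[] = $numbers->current();
    $numbers->next();
}
// $seen is now [1, 2, 3]

// ...or implicitly with foreach, here on a fresh generator.
$total = 0;
foreach (countTo(3) as $number) {
    $total += $number;
}
// $total is now 6
```

Note that the foreach loop uses a fresh call to countTo: a generator that has already moved past its first yield cannot be rewound.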
Example 1: PHPUnit data providers
PHPUnit uses data providers to supply arrays which contain arguments for a test method. PHPUnit iterates over these argument arrays and calls the test method once for each of them. Often the argument arrays are simply held in an outer array, but they can also be yielded by an iterator. Since every yield is processed before the tests are run, you do not profit from the lower memory usage associated with iterators. However, it can make the code easier to read.
For example, take a look at one of the test providers in the test suite for the symfony/security component:
public function getUserTests()
{
    yield [null, null];
    yield ['string_username', null];
    $user = new User('nice_user', 'foo');
    yield [$user, $user];
}
This is arguably easier to read than
public function getUserTests()
{
    $user = new User('nice_user', 'foo');
    return [
        [null, null],
        ['string_username', null],
        [$user, $user]
    ];
}
if only because it allows the declaration of the $user variable to sit next to its usage.
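To illustrate why a yielding provider does not save memory here, the following is a rough sketch of a runner that drains the provider before calling the test. It is not PHPUnit's actual code; runWithProvider and additionProvider are names invented for this example.

```php
<?php

// Hypothetical provider, written in the yielding style shown above.
function additionProvider(): Generator
{
    yield [0, 0, 0];
    yield [1, 2, 3];
}

// Hypothetical stand-in for a test runner (an assumption for illustration,
// not how PHPUnit is really implemented).
function runWithProvider(callable $test, iterable $provider): array
{
    // Step 1: drain the provider completely, so every data set is in
    // memory before the first test call.
    $dataSets = [];
    foreach ($provider as $arguments) {
        $dataSets[] = $arguments;
    }

    // Step 2: call the test once per data set, spreading the arguments.
    $results = [];
    foreach ($dataSets as $arguments) {
        $results[] = $test(...$arguments);
    }

    return $results;
}

$results = runWithProvider(
    static function (int $a, int $b, int $expected): bool {
        return $a + $b === $expected;
    },
    additionProvider()
);
// $results is [true, true]
```

Because step 1 collects everything into an array, the generator only changes how the provider reads, not how much memory it uses.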
Example 2: Symfony Console
Symfony's Console package has a Table class containing some helpers to display a table. Its render method gets the table rows from a method buildTableRows, before iterating over them with a foreach loop. This is a good use case for a generator given that we do not know in advance how many rows will be in the table.
So how does buildTableRows work? First, a considerable amount of code is devoted to catering for the possibility of multiple row-spans and col-spans within the table. Once this is done, it returns a new iterator, TableRows. This iterator implements the IteratorAggregate interface, which means you get a quick way of creating an iterator without having to implement all the boilerplate methods (rewind, current, valid etc). Instead, you just implement the getIterator method. Sometimes you might just return an ArrayIterator or some other common Traversable implementation here, but TableRows can be instantiated with any generator function, and the generator it produces is then used as the return value for getIterator (remember that generators implement the Iterator interface, which in turn extends Traversable, so they are a valid return value for getIterator). In the case of buildTableRows, it passes in a function which yields each row (formatted appropriately if it spans multiple columns), followed by any unmerged rows for that row. This requirement to format cells and handle edge cases means that a custom iterator is preferable to, say, an ArrayIterator.
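The IteratorAggregate pattern described above can be sketched in a few lines. LazyRows is a hypothetical class written for this post, loosely modelled on the description of TableRows rather than copied from Symfony.

```php
<?php

// Hypothetical class: wraps a generator function and exposes its
// generator through IteratorAggregate.
final class LazyRows implements IteratorAggregate
{
    /** @var callable */
    private $generatorFunction;

    public function __construct(callable $generatorFunction)
    {
        $this->generatorFunction = $generatorFunction;
    }

    public function getIterator(): Traversable
    {
        // Each call creates a fresh generator, so the object can be
        // looped over more than once.
        return ($this->generatorFunction)();
    }
}

$rows = new LazyRows(function () {
    yield ['id', 'name'];
    yield [1, 'Alice'];
});

$firstPass = iterator_to_array($rows);
$secondPass = iterator_to_array($rows);
// Both passes contain the same two rows.
```

Wrapping the function rather than the generator itself is a design choice in this sketch: a bare generator can only be traversed once, whereas getIterator can call the function again on every loop.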
Example 3: Laravel Lazy Collections
Laravel Lazy Collections use generators to provide collection-like classes that, in the words of the manual, "allow you to work with very large datasets while keeping memory usage low." Where Laravel collections are wrappers around arrays, lazy collections are wrappers around iterators. Here's one way to create a lazy collection, taken from the original pull request:
LazyCollection::make(function () {
    $handle = fopen('log.txt', 'r');
    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
})
->chunk(4);
Essentially, the LazyCollection class takes a generator function in its constructor, then provides some nice syntax to access items in a collection-like way. The make method (which is contained in the EnumeratesValues trait) simply news up an instance of the class, passing it its parameter. In this case, the parameter is a simple closure which opens a text file and yields a line at a time.
While collections and lazy collections share some of the same implementations for their syntactic sugar, the fact that we are now dealing with generators rather than arrays does mean that many implementations have to change. Let's look at the chunk example above. The Collection class simply uses PHP's array_chunk function. On the other hand, the LazyCollection class needs its own implementation. It returns a new LazyCollection instance, whose source is a generator which loops over the original source (this is the while ($iterator->valid()) loop), splitting it into arrays (chunks) of the requested size, and using those chunks as sources for yet another LazyCollection instance.
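Here is a simplified sketch of that chunking idea, not Laravel's implementation. The chunkLazily and letters functions are hypothetical, and the sketch yields plain arrays instead of nested lazy collections and does not preserve the original keys.

```php
<?php

// Hypothetical helper: walks the source with the Iterator methods and
// yields one chunk at a time, so only a single chunk is held in memory.
function chunkLazily(Iterator $iterator, int $size): Generator
{
    if ($size <= 0) {
        return; // nothing sensible to yield for a non-positive size
    }

    while ($iterator->valid()) {
        $chunk = [];

        // Fill the current chunk until it is full or the source runs out.
        while (count($chunk) < $size && $iterator->valid()) {
            $chunk[] = $iterator->current();
            $iterator->next();
        }

        yield $chunk;
    }
}

// Hypothetical source generator standing in for the lines of a file.
function letters(): Generator
{
    foreach (['a', 'b', 'c', 'd', 'e'] as $letter) {
        yield $letter;
    }
}

$chunks = iterator_to_array(chunkLazily(letters(), 2), false);
// $chunks is [['a', 'b'], ['c', 'd'], ['e']]
```

Nothing is read from the source until the outer generator is iterated, which is the property that lets the lazy version work on inputs too large for array_chunk.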
The missing example: co-routines
PHP generators have also played a role in bringing co-routines to the language. This would require another blog post in itself, and to be frank I'm not ready to tackle it yet. The locus classicus on this subject is an eight-year-old post by PHP superman Nikita Popov. More recently, co-routines have been an important feature in asynchronous PHP projects like Swoole, ReactPHP and Amp.