PHP iterators in the wild

Although they were introduced way back in PHP 5, iterators are one of the language's less commonly used features. Almost all articles about PHP iterators seem to resort to one of two fairly contrived examples: reading a file line-by-line, or creating a bespoke range function. In this post I want to take a look at some examples of their use in open source applications, with the hope that this approach will demonstrate how they can solve real-world problems.

In lieu of an introduction, I will point you to the most helpful explanation of iterators I've found, Anthony Ferrara's video "Iterators". He shows how a basic for loop in PHP is analogous to the methods implementable in PHP's Iterator interface, such that

for ($i = 0; $i < count($array); $i++) {
}

corresponds to

for ($it->rewind(); $it->valid(); $it->next()) {
    $key = $it->key();
}

(where $it->key() in the second example maps to $i in the first). These operations (rewind, valid, next, key and, not shown above, current) correspond to the operations of the Iterator pattern as described by the Gang of Four in Design Patterns.
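To make that correspondence concrete, here is a minimal sketch of a class implementing the Iterator interface. The Playlist class and its data are hypothetical, invented purely for illustration, and the code assumes PHP 8 (for the mixed return type).

```php
<?php

// Hypothetical collection class, used only to show the five Iterator methods.
final class Playlist implements Iterator
{
    /** @var list<string> */
    private array $tracks;

    private int $position = 0;

    public function __construct(array $tracks)
    {
        // Re-index so that positions 0..n-1 are guaranteed to exist.
        $this->tracks = array_values($tracks);
    }

    public function rewind(): void
    {
        $this->position = 0;                            // like $i = 0
    }

    public function valid(): bool
    {
        return $this->position < count($this->tracks);  // like $i < count($array)
    }

    public function current(): mixed
    {
        return $this->tracks[$this->position];          // like $array[$i]
    }

    public function key(): mixed
    {
        return $this->position;                         // like $i
    }

    public function next(): void
    {
        ++$this->position;                              // like $i++
    }
}

$playlist = new Playlist(['intro', 'verse', 'chorus']);

// foreach drives rewind(), valid(), current(), key() and next() for us.
$seen = [];
foreach ($playlist as $index => $track) {
    $seen[$index] = $track;
}
```

Because the class implements Iterator, foreach can traverse it directly; the explicit for loop shown earlier would visit the same three items.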

Example 1: Flysystem

As I said, file operations are a favourite example when introducing iterators. One concrete implementation is the PHP League's Flysystem package. Its listContents method takes a path string and a boolean specifying whether the contents should be listed recursively. If we take a look at the code for this method on GitHub, we'll find that it instantiates an iterator (either a DirectoryIterator or a RecursiveDirectoryIterator, depending on the boolean), which makes it very easy to traverse the contents of the directory with a simple foreach loop. This is far simpler than messing about with opendir, readdir, etc. (Note that the package also makes use of FilesystemIterator, whose SKIP_DOTS flag allows you to skip . and .. when traversing a directory tree.)
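The following is a rough sketch of that idea using only PHP's built-in SPL classes. It is not Flysystem's code: the listPaths function is a hypothetical helper, and the temporary directory is created just so the example can run on its own.

```php
<?php

/**
 * Hypothetical helper (not Flysystem's implementation): list every entry under $path.
 *
 * @return list<string> sorted path names
 */
function listPaths(string $path, bool $recursive = false): array
{
    // SKIP_DOTS leaves out the "." and ".." entries.
    $flags = FilesystemIterator::SKIP_DOTS;

    $iterator = $recursive
        ? new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($path, $flags),
            RecursiveIteratorIterator::SELF_FIRST // yield directories as well as files
        )
        : new FilesystemIterator($path, $flags);

    $paths = [];
    foreach ($iterator as $file) {
        // Each $file is an SplFileInfo instance.
        $paths[] = $file->getPathname();
    }
    sort($paths);

    return $paths;
}

// Build a small throwaway directory tree to traverse.
$root = sys_get_temp_dir() . '/iterator-demo-' . uniqid();
mkdir($root . '/sub', 0777, true);
touch($root . '/a.txt');
touch($root . '/sub/b.txt');

$flat = listPaths($root);        // a.txt and sub
$deep = listPaths($root, true);  // a.txt, sub and sub/b.txt
```

Note that a RecursiveDirectoryIterator on its own only knows how to descend into children; wrapping it in a RecursiveIteratorIterator is what flattens the tree so a single foreach can walk it.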

Example 2: PHPUnit

Another useful iterator is the FilterIterator. By creating a class which extends it, you can filter the results of any other iterator simply by wrapping it in your new filter iterator. An example of this in the wild is in PHPUnit. You've probably run phpunit --filter, phpunit --group or phpunit --exclude-group many times. If we look in PHPUnit's GitHub repo we find a method processSuiteFilters. This method makes use of a few classes in the PHPUnit\Runner\Filter namespace: a factory, then a number of iterators which extend FilterIterator (or rather RecursiveFilterIterator, which itself extends FilterIterator). Every filter iterator must implement an accept method, which returns true if the current item should be kept. For example, the accept method of the IncludeGroupFilterIterator checks whether the current test (in the traversal) belongs to one of the groups permitted by the user's filter.
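As a loose illustration of the same technique, here is a small filter iterator. It is not PHPUnit's code: the GroupFilterIterator class and the array shape used for each "test" are assumptions made up for this sketch, which targets PHP 8.

```php
<?php

// Hypothetical filter, loosely inspired by phpunit --group; not PHPUnit's implementation.
final class GroupFilterIterator extends FilterIterator
{
    /** @var list<string> groups the user asked for */
    private array $groups;

    public function __construct(Iterator $tests, array $groups)
    {
        parent::__construct($tests);
        $this->groups = $groups;
    }

    // Called once per item; returning false skips the item.
    public function accept(): bool
    {
        $test = $this->getInnerIterator()->current();

        // Keep the test if it shares at least one group with the filter.
        return array_intersect($test['groups'], $this->groups) !== [];
    }
}

$tests = new ArrayIterator([
    ['name' => 'testLogin', 'groups' => ['auth']],
    ['name' => 'testCheckout', 'groups' => ['shop', 'slow']],
    ['name' => 'testLogout', 'groups' => ['auth', 'slow']],
]);

$selected = [];
foreach (new GroupFilterIterator($tests, ['slow']) as $test) {
    $selected[] = $test['name'];
}
```

The calling code never sees the rejected items, so the filtering logic lives in one place and any iterator of tests can be wrapped the same way.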

Example 3: Symfony Finder

In a blog post from 2010, Symfony creator Fabien Potencier said that iterators were "largely underused", and described using them when rewriting the Finder component for Symfony 2.

The Finder component combines the approaches outlined in the previous two examples: the DirectoryIterator and the FilterIterator. Here Potencier implements the IteratorAggregate interface, which requires just one method, getIterator, which returns an external iterator. In the case of Finder, it returns PHP's AppendIterator. So, to adapt the example from the manual:

use Symfony\Component\Finder\Finder;

$finder = new Finder();
$finder->files()->in(__DIR__)->in('/home');
foreach ($finder as $file) {
    // do stuff
}

Here we want the finder instance to focus only on files (files()) and to look in the current and /home directories. Each repeated in call just adds its argument to the class's dirs property. When the iterator is triggered by the foreach statement, it does two things for every directory in the dirs array. First, it calls the searchInDirectory method, which basically configures the iterator: like Flysystem, it uses the RecursiveDirectoryIterator, and it adds some bespoke Symfony filters depending on Finder's configuration (these filters extend PHP's FilterIterator). Second, having obtained this iterator, it adds it to the AppendIterator it instantiated earlier. Once every directory has been handled, it appends any further iterators that have been explicitly provided by the user, and returns the AppendIterator. So here we have an example of PHP's various iterator classes providing a clean way to compose an extensible, flexible API for traversing a file system.
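The overall shape (a fluent method that records sources, plus a getIterator method that stitches them together on demand) can be sketched in a few lines. The Catalogue class and its from method are hypothetical names chosen for this example; this is not Symfony's code, and plain ArrayIterators stand in for the directory iterators.

```php
<?php

// Hypothetical class illustrating the IteratorAggregate + AppendIterator shape;
// not Symfony's Finder.
final class Catalogue implements IteratorAggregate
{
    /** @var list<Iterator> */
    private array $sources = [];

    // Fluent method: record a source now, iterate later (compare Finder's in()).
    public function from(Iterator $source): self
    {
        $this->sources[] = $source;

        return $this;
    }

    // Called implicitly by foreach; chains all recorded sources end to end.
    public function getIterator(): Iterator
    {
        $all = new AppendIterator();
        foreach ($this->sources as $source) {
            $all->append($source);
        }

        return $all;
    }
}

$catalogue = (new Catalogue())
    ->from(new ArrayIterator(['a.txt', 'b.txt']))
    ->from(new ArrayIterator(['c.txt']));

$names = [];
foreach ($catalogue as $name) {
    $names[] = $name;
}
```

Nothing is traversed while the object is being configured; the work only starts when foreach asks for the iterator, which is what lets the fluent calls be made in any order.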

Example 4: CSV

Another use of iterators is as a memory-saving technique. When you are dealing with data sets that are very large, or of unknown size, processing the data iteratively means you do not pay the cost of holding the entire data set in memory. Basically, it can reduce memory use from O(n) in the number of records to O(1).

As an example, let's look at another PHP League package, CSV, which makes heavy use of iterators. What happens when you run this code?

use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/file.csv', 'r');
$records = $csv->getRecords();

First, the static createFromPath method creates an instance of the package's Stream class, which in turn implements PHP's SeekableIterator. This is a simple extension to the common-or-garden iterator, adding a seek method that lets clients move the cursor to a given position. The Stream instance is assigned to the document property of the AbstractCsv class, which Reader extends.

Next, getRecords gives the client this iterator after applying some cleaning to it. It first uses PHP's CallbackFilterIterator to normalize the data (removing any corrupt or empty rows), then the package's own MapIterator to remove any BOMs. It then uses another CallbackFilterIterator to skip the header row if required. Finally, it returns the return value of combineHeader, a method which takes the iterator produced so far in getRecords and, if necessary, adds a header to the records using another MapIterator. All of this means that at the end you have an iterator you can foreach over to read the records in a CSV file.
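A stripped-down version of such a pipeline might look like the sketch below. It is not the CSV package's code: MappingIterator is a hypothetical stand-in for a map-style iterator, the rows are a hard-coded ArrayIterator rather than a real file, and BOM handling is left out. It assumes PHP 8.

```php
<?php

// Hypothetical map-style iterator: applies $callable to each item as it is read.
final class MappingIterator extends IteratorIterator
{
    /** @var callable */
    private $callable;

    public function __construct(Traversable $iterator, callable $callable)
    {
        parent::__construct($iterator);
        $this->callable = $callable;
    }

    public function current(): mixed
    {
        return ($this->callable)(parent::current());
    }
}

// Stand-in for parsed CSV rows; [null] is how an empty line typically parses.
$rows = new ArrayIterator([
    ['name', 'language'],
    ['Flysystem', 'PHP'],
    [null],
    ['Finder', 'PHP'],
]);

$header = ['name', 'language'];

// 1. Drop empty rows.
$nonEmpty = new CallbackFilterIterator($rows, fn (array $row): bool => $row !== [null]);

// 2. Skip the header row, which sits at offset 0 of the original data.
$body = new CallbackFilterIterator($nonEmpty, fn (array $row, int $offset): bool => $offset !== 0);

// 3. Key each remaining record by header name.
$records = new MappingIterator($body, fn (array $row): array => array_combine($header, $row));

// Only now, as foreach pulls items through, does any filtering or mapping happen.
$result = [];
foreach ($records as $record) {
    $result[] = $record;
}
```

Each stage wraps the previous one and handles a single row at a time, so with a file-backed source in place of the ArrayIterator the pipeline would never need the whole data set in memory.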