
Marc Ermshaus


PHP: Adding grouping and extended sorting to SPL's ArrayObject

Published on 10 Mar 2010. Tagged with programming, php, spl, arrayobject.

Judging by how often it comes up in questions, one technique many programmers seem to struggle with is the so-called "control break". It is a way of displaying data grouped into visually separated, hierarchical sections. Examples are a list of employees grouped by the first letter of their family names, or a list of blog posts grouped first by year and then by month. Every time the key of one of the sections changes, a visual marker such as a new heading is printed. These changes are called control breaks.
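For comparison, the hand-written approach usually looks like the following minimal sketch. It assumes the input is already sorted by the grouping key (here, the first letter of a name), which the classic control-break loop depends on. The function name renderByInitial is made up for this illustration.

```php
<?php
// Classic control-break loop (hypothetical helper). It relies on the input
// already being sorted by the grouping key, here the first letter of a name.
function renderByInitial(array $names)
{
    $html = '';
    $currentKey = null;
    foreach ($names as $name) {
        $key = substr($name, 0, 1);
        if ($key !== $currentKey) {
            // Control break: close the previous section (if any)...
            if ($currentKey !== null) {
                $html .= "</ul>\n";
            }
            // ...and start a new one with its own heading.
            $html .= '<h2>' . htmlspecialchars($key) . "</h2>\n<ul>\n";
            $currentKey = $key;
        }
        $html .= '<li>' . htmlspecialchars($name) . "</li>\n";
    }
    if ($currentKey !== null) {
        $html .= "</ul>\n";   // close the last section
    }
    return $html;
}

echo renderByInitial(array('Carl', 'Cindy', 'Patricia', 'Peter', 'Sam', 'Susan'));
```

The bookkeeping with $currentKey and the extra close after the loop is exactly the kind of boilerplate a generic grouping helper can hide.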

Although it's not a big deal to write a simple algorithm that achieves the desired effect, I thought a generic solution to the problem would be useful. So I created a helper class with static methods that transform an array of entries (homogeneous arrays of key/value pairs) into a grouped array, using the unique values of one of the entries' fields as the grouping key. That worked quite well, but I'm not a big fan of calling static class methods when feasible alternatives exist. A more object-oriented approach, in which the grouping code runs as an instance method, seemed to me the better solution. Following this thought, I wrote an extended version of the SPL's ArrayObject, which I'd like to introduce in this post.

Basic grouping (groupBy)

The class I came up with is named Kaloa_Spl_ArrayObject (you can view or download it here). It's designed to be an unobtrusive addition to ArrayObject: in the current version, the parent class's constructor is overridden, but everything else is left intact. To show how it works, I'll define some sample data roughly resembling a list of articles from a blogging application.

$items = array(
    array('year' => 2009, 'month' =>  9, 'title' => 'Hello World!'),
    array('year' => 2009, 'month' =>  9, 'title' => 'At the museum'),
    array('year' => 2009, 'month' =>  9, 'title' => 'Godspeed'),
    array('year' => 2009, 'month' =>  9, 'title' => '2010 Olympics'),
    array('year' => 2010, 'month' =>  1, 'title' => 'Tornado season'),
    array('year' => 2010, 'month' =>  1, 'title' => 'Bailout'),
    array('year' => 2010, 'month' =>  2, 'title' => 'Cheers, Ladies!'),
    array('year' => 2010, 'month' =>  2, 'title' => 'Neglected'),
    array('year' => 2009, 'month' => 11, 'title' => 'Ethics probe'),
    array('year' => 2010, 'month' =>  3, 'title' => 'Commitment to security'),
    array('year' => 2010, 'month' =>  3, 'title' => 'Election'),
    array('year' => 2009, 'month' => 10, 'title' => 'Same-sex couples'),
    array('year' => 2009, 'month' => 10, 'title' => 'Junkyard'),
);

The most interesting new method of Kaloa_Spl_ArrayObject is the grouping method, groupBy. It takes a callback function as its argument and calls it once for every entry in the original array. Here it is used to group the example data by year and month:

$obj = new Kaloa_Spl_ArrayObject($items);

$obj->groupBy(
    function ($item) {
        return array($item['year'], $item['month']);
    }
);

The return value of the callback function is the key of the group to which the corresponding entry will be assigned. If an array is returned, it will be treated as a multi-dimensional key which translates to a multi-level grouping.
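To make the key semantics concrete, here is a minimal plain-array sketch of callback-based grouping. It is not the Kaloa_Spl_ArrayObject implementation: the function name groupByKey is hypothetical, and it returns nested plain arrays instead of nested ArrayObject instances. Groups and entries keep the order in which they first occur in the input.

```php
<?php
// Hypothetical plain-array sketch of callback-based grouping, not the
// Kaloa_Spl_ArrayObject source. A scalar key yields one level of groups;
// an array key such as array($year, $month) yields one level per element.
function groupByKey(array $items, callable $keyFn)
{
    $groups = array();
    foreach ($items as $item) {
        $key  = $keyFn($item);
        $path = is_array($key) ? array_values($key) : array($key);

        // Walk down the nested groups by reference, creating missing levels.
        $node = &$groups;
        foreach ($path as $part) {
            if (!isset($node[$part])) {
                $node[$part] = array();
            }
            $node = &$node[$part];
        }

        $node[] = $item;   // leaf groups are numbered lists in input order
        unset($node);      // break the reference before the next entry
    }
    return $groups;
}

$items = array(
    array('year' => 2009, 'month' =>  9, 'title' => 'Hello World!'),
    array('year' => 2010, 'month' =>  1, 'title' => 'Bailout'),
    array('year' => 2009, 'month' => 11, 'title' => 'Ethics probe'),
    array('year' => 2009, 'month' =>  9, 'title' => 'Godspeed'),
);

$grouped = groupByKey($items, function ($item) {
    return array($item['year'], $item['month']);
});
// $grouped[2009][9] now holds 'Hello World!' and 'Godspeed', in that order.
```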

Displaying the content of $obj using var_dump or print_r shows a structure like this (abridged):

Kaloa_Spl_ArrayObject Object(
    [2009] => Kaloa_Spl_ArrayObject Object(
        [9]  => Kaloa_Spl_ArrayObject Object(...),
        [11] => Kaloa_Spl_ArrayObject Object(...),
        [10] => Kaloa_Spl_ArrayObject Object(...)
    ),
    [2010] => Kaloa_Spl_ArrayObject Object(
        [1] => Kaloa_Spl_ArrayObject Object(...),
        [2] => Kaloa_Spl_ArrayObject Object(...),
        [3] => Kaloa_Spl_ArrayObject Object(...)
    )
)

The third dimension contains a numbered array with all of the original entries that are part of the corresponding group. For instance, the content of $obj[2009][9] would be an array with the four entries from September 2009:

0 => Kaloa_Spl_ArrayObject(
    'year' => 2009,
    'month' => 9,
    'title' => 'Hello World!'
),
1 => Kaloa_Spl_ArrayObject(
    'year' => 2009,
    'month' => 9,
    'title' => 'At the museum'
),
2 => Kaloa_Spl_ArrayObject(
    'year' => 2009,
    'month' => 9,
    'title' => 'Godspeed'
),
3 => Kaloa_Spl_ArrayObject(
    'year' => 2009,
    'month' => 9,
    'title' => '2010 Olympics'
)

Since Kaloa_Spl_ArrayObject subclasses ArrayObject, the data can already be printed in the desired fashion using nested foreach loops:

foreach ($obj as $year => $yearContent) {
    echo '<h1>' . $year . "</h1>\n";
    foreach ($yearContent as $month => $monthContent) {
        echo '  <h2>' . $month . "</h2>\n";
        echo "    <ul>\n";
        foreach ($monthContent as $entry) {
            // Escape the title so it can't break the generated HTML
            echo '      <li>' . htmlspecialchars($entry['title']) . "</li>\n";
        }
        echo "    </ul>\n";
    }
}

The resulting HTML code:

<h1>2009</h1>
  <h2>9</h2>
    <ul>
      <li>Hello World!</li>
      <li>At the museum</li>
      <li>Godspeed</li>
      <li>2010 Olympics</li>
    </ul>
  <h2>11</h2>
    <ul>
      <li>Ethics probe</li>
    </ul>
  <h2>10</h2>
    <ul>
      <li>Same-sex couples</li>
      <li>Junkyard</li>
    </ul>
<h1>2010</h1>
  <h2>1</h2>
    <ul>
      <li>Tornado season</li>
      <li>Bailout</li>
    </ul>
  <h2>2</h2>
    <ul>
      <li>Cheers, Ladies!</li>
      <li>Neglected</li>
    </ul>
  <h2>3</h2>
    <ul>
      <li>Commitment to security</li>
      <li>Election</li>
    </ul>

Basically, that's all there is to it.

Advanced grouping

In some cases, it might be useful to modify entries before they are added to the resulting data structure. This can be achieved by simply editing or removing fields from the argument passed to the callback function. All arguments, including scalar values, are passed by reference.
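In plain PHP, a callback can only change the caller's array or scalar if it declares its parameter by reference (&$item); objects such as ArrayObject instances can additionally be changed through their handle. The following sketch, with a hypothetical single-level helper groupByKeyRef, shows the mechanism. It makes no claim about how Kaloa_Spl_ArrayObject passes its arguments internally.

```php
<?php
// Hypothetical single-level helper illustrating by-reference callbacks.
// Because $keyFn is invoked with the variable $item, a callback declared as
// function (&$item) can rewrite the entry before it is stored.
function groupByKeyRef(array $items, callable $keyFn)
{
    $groups = array();
    foreach ($items as $item) {
        $key = $keyFn($item);      // may modify $item if declared by reference
        $groups[$key][] = $item;   // store the (possibly modified) entry
    }
    return $groups;
}

$upper = groupByKeyRef(
    array('Carl', 'Susan', 'Cindy'),
    function (&$item) {
        $item = strtoupper($item);     // changes the caller's $item
        return substr($item, 0, 1);    // group key: first letter
    }
);
// $upper is array('C' => array('CARL', 'CINDY'), 'S' => array('SUSAN'))
```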

This grouping function will remove the fields "year" and "month" from all entries of the resulting array and will change the content of the "title" field to all uppercase letters.

$obj->groupBy(
    function ($item) {
        $ret = array($item['year'], $item['month']);
        unset($item['year']);
        unset($item['month']);
        $item['title'] = strtoupper($item['title']);
        return $ret;
    }
);

An example using scalar values that will be grouped by the first letter and changed to uppercase:

$items = array('Carl', 'Susan', 'Cindy', 'Peter', 'Steve', 'Patricia', 'Sam');

$obj = new Kaloa_Spl_ArrayObject($items);

$obj->groupBy(
    function ($item) {
        $item = strtoupper($item);
        return substr($item, 0, 1);
    }
);

var_dump($obj);

Output:

object(Kaloa_Spl_ArrayObject)#1 (3) {
  ["C"]=>
  object(Kaloa_Spl_ArrayObject)#4 (2) {
    [0]=>
    string(4) "CARL"
    [1]=>
    string(5) "CINDY"
  }
  ["S"]=>
  object(Kaloa_Spl_ArrayObject)#5 (3) {
    [0]=>
    string(5) "SUSAN"
    [1]=>
    string(5) "STEVE"
    [2]=>
    string(3) "SAM"
  }
  ["P"]=>
  object(Kaloa_Spl_ArrayObject)#6 (2) {
    [0]=>
    string(5) "PETER"
    [1]=>
    string(8) "PATRICIA"
  }
}

Sorting (usort, usortm, uasortm, uksortm)

By now, it may have become apparent that groupBy doesn't sort the resulting array in any way: groups and entries appear in the order in which they first occur in the input (note how month 11 of 2009 comes before month 10 above). Therefore, my second major addition to ArrayObject is more sophisticated sorting functionality that can realign one or more dimensions of the array. All three multi-dimensional sorting methods are based on the different flavours of PHP's built-in usort function (usort, uasort and uksort). Each takes its sorting criteria as an argument, specified either as an anonymous function or as an array of anonymous functions.

Here is an example to illustrate the usage. It works with the data defined in the "Basic grouping" section.

$obj->groupBy(
    function ($item) {    // Group by year and month
        return array($item['year'], $item['month']);
    }
)->uksortm(
    array(
        function ($a, $b) {    // Order first dimension (years) descending
            return $b - $a;
        },
        function ($a, $b) {    // Order second dimension (months) ascending
            return $a - $b;
        }
    )
)->usortm(
    array(
        null,    // Skip first and second dimensions, only realign third
        null,    //  (descending by length of an entry's title)
        function ($a, $b) {
            // Comparators return a negative, zero or positive integer
            return strlen($b['title']) - strlen($a['title']);
        }
    )
);

This notation uses method chaining to hint at the fact that I implemented a fluent interface for all new methods, with the exception of the usort method, which I threw in only because it was the one usort-style method ArrayObject was missing. The chain could, of course, just as well be split into three separate statements, each starting with $obj->.
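ArrayObject itself provides uasort() and uksort() but no usort(). As a hypothetical illustration of both points, the subclass below (FluentArrayObject, not part of SPL or of the class described here) adds a usort() that returns $this, which is all a fluent interface needs to allow chaining.

```php
<?php
// Hypothetical sketch: FluentArrayObject is not part of SPL or of
// Kaloa_Spl_ArrayObject. ArrayObject has uasort() and uksort() but no
// usort(); this subclass adds one and returns $this so calls can be chained.
class FluentArrayObject extends ArrayObject
{
    public function usort(callable $cmp)
    {
        $copy = $this->getArrayCopy();
        usort($copy, $cmp);          // sort by value; keys are renumbered 0..n-1
        $this->exchangeArray($copy);
        return $this;                // returning the object enables chaining
    }
}

$list = new FluentArrayObject(array('x' => 3, 'y' => 1, 'z' => 2));

// Sort descending and copy the result out in a single chained expression.
$values = $list->usort(function ($a, $b) { return $b - $a; })->getArrayCopy();
// $values is array(3, 2, 1)
```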

Besides the groupBy call, there are calls to both uksortm and usortm, because the first two dimensions (years and months) have to be sorted by key, whereas the third one (the entries) should be sorted by value. (By the way: usortm could be replaced by uasortm here, because it doesn't matter whether the entries' keys are renumbered or preserved when the array is only iterated with foreach.) The differences between the usort-like functions are explained in the PHP documentation.
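A quick sketch of how the three built-in flavours differ, using the number of posts per month in 2009 from the sample data, keyed by month in the order groupBy produces them. All three expect a comparison function that returns an integer less than, equal to, or greater than zero.

```php
<?php
// Posts per month in 2009 from the sample data (September, November, October).
$postsPerMonth = array(9 => 4, 11 => 1, 10 => 2);

$ascending = function ($a, $b) {
    return $a - $b;   // negative, zero or positive, as the u*sort functions expect
};

$byValue = $postsPerMonth;
usort($byValue, $ascending);           // by value, keys renumbered
// array(0 => 1, 1 => 2, 2 => 4)

$byValueKeepKeys = $postsPerMonth;
uasort($byValueKeepKeys, $ascending);  // by value, key association preserved
// array(11 => 1, 10 => 2, 9 => 4)

$byKey = $postsPerMonth;
uksort($byKey, $ascending);            // by key
// array(9 => 4, 10 => 2, 11 => 1)
```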

Each of the u*sortm methods ("m" standing for "multi-dimensional") recursively applies the passed functions to the corresponding dimensions of the array. Given an array of three functions, the first one is used to sort the years (first dimension), the second one the months (second dimension), and the third one the entries (third dimension). If no function is needed for a specific dimension, null can be passed, and that dimension is skipped.
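The per-dimension idea can be sketched on plain nested arrays as follows. This is an illustration under stated assumptions rather than the class's code: sortMulti and its $sortFn parameter (the name of the built-in to apply) are hypothetical, and the sample array is a trimmed-down version of the grouped blog data.

```php
<?php
// Hypothetical sketch, not the Kaloa_Spl_ArrayObject source: apply one
// comparison function per nesting level of a plain array.
// $cmps[0] sorts level 0, $cmps[1] level 1, and so on; a null entry skips
// sorting at that level but still descends into it.
// $sortFn is the name of the built-in to use: 'usort', 'uasort' or 'uksort'.
function sortMulti(array &$data, array $cmps, $sortFn, $depth = 0)
{
    if ($depth >= count($cmps)) {
        return;                              // no criteria left for this level
    }
    if ($cmps[$depth] !== null) {
        $sortFn($data, $cmps[$depth]);       // sorts $data in place
    }
    foreach ($data as &$child) {
        if (is_array($child)) {
            sortMulti($child, $cmps, $sortFn, $depth + 1);
        }
    }
    unset($child);                           // drop the loop reference
}

$grouped = array(
    2009 => array(
        9  => array(array('title' => 'Godspeed'), array('title' => 'Hello World!')),
        11 => array(array('title' => 'Ethics probe')),
        10 => array(array('title' => 'Junkyard'), array('title' => 'Same-sex couples')),
    ),
    2010 => array(
        1 => array(array('title' => 'Bailout'), array('title' => 'Tornado season')),
    ),
);

// Years descending, months ascending (by key).
sortMulti($grouped, array(
    function ($a, $b) { return $b - $a; },
    function ($a, $b) { return $a - $b; },
), 'uksort');

// Leave years and months alone; longest title first within each month.
sortMulti($grouped, array(
    null,
    null,
    function ($a, $b) { return strlen($b['title']) - strlen($a['title']); },
), 'usort');
```

Calling it once with 'uksort' and once with 'usort' mirrors the uksortm/usortm chain in the example above.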

Further documentation of the class can be found in the inline DocBlock comments of the source file. If you try it out and have questions, remarks or bug reports, please contact me.