something to do with web development

January 15, 2008

Tree Drawing with the Adjacency List Model

Let's not fake piety, we've all of us stuffed a few hierarchies into the adjacency list model. It's so easy to add parent_id to the end of any table, and the performance issues seem negligible in small data sets. It's only when we're grabbing data to draw the hierarchical tree it represents that we maybe worry about recursion's inherent performance drawbacks. For example, you've probably seen a method like this:

function get_children($parent) {
    $result = mysql_query("SELECT id
        FROM family
        WHERE parent='{$parent}'");
    while($child = mysql_fetch_object($result)) {
        $children[$child->id] = get_children($child->id);
    }
    return $children;
}

One obvious problem is that this function runs a query for every single node it discovers. Another is that the amount of recursion this function produces depends, not on the size of your data set, but on the complexity of the hierarchy your data represents; the more branches your tree has, the deeper its recursion goes. Even if you have a small number of rows such that preorder tree traversal seems overkill, the complexity of your tree may still demand a more efficient artist than get_children. I'd like to propose one alternative for producing a complete tree.

Here's my data: two families of the animal kingdom:

  • Hapalemurinae
    • Hapalemur
      • Hapalemur Alaotrensis
      • Hapalemur Aureus
      • Hapalemur Griseus
    • Prolemur
      • Prolemur Simus
  • Lemurinae
    • Eulemur
      • Eulemur Coronatus
      • Eulemur Fulvus
      • Eulemur Macaco
      • Eulemur Mongoz
      • Eulemur Rubriventer
    • Lemur
      • Lemur Catta
    • Varecia
      • Varecia Rubra
      • Varecia Variegata

To help emphasize the concept I'm promoting, I'm using taxonomic names as IDs and parents in the $family array I'm using to represent this data:

$family = array(
    array('id' => 'Eulemur', 'parent' => 'Lemurinae'),
    array('id' => 'Eulemur Coronatus', 'parent' => 'Eulemur'),
    array('id' => 'Eulemur Fulvus', 'parent' => 'Eulemur'),
    array('id' => 'Eulemur Macaco', 'parent' => 'Eulemur'),
    array('id' => 'Eulemur Mongoz', 'parent' => 'Eulemur'),
    array('id' => 'Eulemur Rubriventer', 'parent' => 'Eulemur'),
    array('id' => 'Hapalemur', 'parent' => 'Hapalemurinae'),
    array('id' => 'Hapalemur Alaotrensis', 'parent' => 'Hapalemur'),
    array('id' => 'Hapalemur Aureus', 'parent' => 'Hapalemur'),
    array('id' => 'Hapalemur Griseus', 'parent' => 'Hapalemur'),
    array('id' => 'Hapalemurinae', 'parent' => ''),
    array('id' => 'Lemur', 'parent' => 'Lemurinae'),
    array('id' => 'Lemur Catta', 'parent' => 'Lemur'),
    array('id' => 'Lemurinae', 'parent' => ''),
    array('id' => 'Prolemur', 'parent' => 'Hapalemurinae'),
    array('id' => 'Prolemur Simus', 'parent' => 'Prolemur'),
    array('id' => 'Varecia', 'parent' => 'Lemurinae'),
    array('id' => 'Varecia Rubra', 'parent' => 'Varecia'),
    array('id' => 'Varecia Variegata', 'parent' => 'Varecia'),
);

This still comes quite close to the result you might expect from a MySQL query.

Here's the recursion-free method I'm proposing:

$branches = array();
$tree = array();

foreach($family as $child) {
    extract($child);

    if(!isset($branches[$id]))
        $branches[$id] = array();

    if(!empty($parent)) {
        if(!isset($branches[$parent]))
            $branches[$parent] = array();
        $branches[$parent][$id] = &$branches[$id];
    }
    else $tree[$id] = &$branches[$id];
}

The results are stored in two arrays: $branches, which contains individual branches of the hierarchy indexed by parent; and $tree, which collects all branches in the context of a complete tree. print_r($tree) and you'll see the whole structure; print_r($branches['Hapalemurinae']) and you'll see only the children of Hapalemurinae.

In the real world, this code would be more complex—each node probably having more properties than its ID—but the underlying principle of assigning nodes to the tree by reference can be used to create a relatively inexpensive serialize()d tree-caching routine for small, complex data structures.