2007-02-24

The problem of how to handle trees in SQL has been talked about alot.
The basic 3 ways are:

store the full path for each entry

store the parent for each node

use nested tree

Nested tree is good for read-many-write-less applications where the tree
doesn't check over time too much as a write-operation is heavy-weight most
of the time.

Referencing the Parent through the full path

Using the variant of the path involves a lot of string handling and is
always slow. See below:

Referencing the Parent by ID

The self referencing variant which is using can't handle all jobs
in the DB.

With the help of Store Procedures we can make it possible as SP provide
recursion what is necessary here.

fetch the first level of childs

fetch the first level of childs for each childs

... and so on until we don't have any new childs

Check this:

Triggers

The triggers in MySQL 5.0.3 are quite limited, but we can already make
some use out of them. The problem we want to handle is optimized search for
domains, sub-domains and so on.

In our example want to concentrate on 'a user might have multiple
email-addresses'.

As a basic operation we want to check which user has a account at in .uk
domain.

This is slow if you have a long list of email addresses. LIKE can only be
optimized if the percent-sign is at the end.

REVERSE() can handle this for us at INSERT and UPDATE time. But we want to
keep this transparent for the user.

Whenever the user INSERTs something into the table the reverse_address field
should be set directly without user-intervention. We need pre-INSERT trigger
which replaces the default value by a REVERSE() of the address field.

Let's insert a new email address for Karl and see if it works.

It works. The same is necessary for the update of the address field.
The trigger does exactly the same.

Does it work ?

It does. Now we have this automagically generated field. The user hasn't
been told about it.

As the REVERSE() puts the percent-sign in front the query results in a
range-request which is compared to the table-scan very fast.

Function and Triggers with DML

Is the same true for DML statements in triggers ?

All the tables are transactional here and as you see the trigger
works and is rollback as expected. Fine.

Porting world-db to trees

To test the recursive SPs we take the world-db from mysql documentation
and take geographic categories as levels in the tree.

planet

continent

region

country

district

city

As the original structure is quite flat, let's blow it up a little bit :).

Ok, let's start with some simple queries on the tree:

Getting the Depth of a Tree II

The last attempt on this topic resulted in a quite slow SP. It walked
the tree and asked every node for its node and walked them too. This
is good for a memory-based tree but bad for a SQL storage.

SQL can do JOINs, let's use them for this job.

The idea is now to join from left to right and count how long you can join

Ok, 3 levels below the 1.

Yeaha, down to a part of a second. Just a little bit of mess with pre-creating
tables because of a bug mentioned above. Otherwise the CREATE TABLE would
be a CREATE TEMPORARY TABLE and it would be in the SP itself.

Rethinking algorithms always wins.

A flexible tree class using Dynamic SQL
I promised this set of functions back in April 2005 at the MySQL UC in Santa Clara where
I talked about the features you see above.

Starting with 5.0.13 the ground work for dynamic SQL was laid out as it was possible
to use PREPARE inside a stored procedure.

The PREPARE statement can take user variable as input and we can use it to generate
SQL statements on the fly, hence the name dynamic SQL. We need this as we don't want to
hard-code the table names into the SPs.

I prepared a set of SPs to work on tree-structures which have this setup:

id is INT, is the node-id and is the PRIMARY KEY of the table

parent_id is a FOREIGN KEY to id in the same table

Part of the tree-SPs are the folloing functions:
- sp_tree_depth
- sp_tree_get_subnodes
- sp_tree_get_parent
- sp_tree_delete_subnodes
- sp_tree_delete_node
- sp_tree_move_subnodes
- sp_tree_move_node

Check out the source file about the parameter to the SPs and how to use them. They are well documented.

Adding a new node was left out as we don't know the structure of the table.

To use the functions check out the test-file provided below:

I have made up 3 files
- the stored procedures itself sp_tree.sql
- the conversion script to turn the world database into a tree structure
- the tests which use the tree SPs on the world_tree table.

All the files are released under the MIT License with a Copyright in 2005, Jan Kneschke, meaning you are free to use it where ever you like as long as you include the copyright and the don't expect that this works.

Show more