12.5.09

Using CTEs to Flatten a Hierarchy

In working with the open source software TestLink, I ran across a problem that I couldn't solve with normal SQL—so I turned to Common Table Expressions.

Common Table Expressions (CTEs) are a feature in SQL Server 2005 and beyond which allows for recursive queries to be done without manually maintaining a temp table and the associated complexities. There are plenty of other sources for information about them on the Internet, so I will get right down to the problem.

TestLink is a test-case management platform, which stores Test Cases. Test Cases exist inside Test Suites, which can be nested within each other, and Test Suites ultimately exist inside a Test Project. This is all stored in a single parent/child table which looks like this:

node_hierarchy:
   [id]
   [name]
   [parent_id]
   [node_type_id]
   [node_order]

Test Case nodes have a node_type_id of 3, Test Suites have a node_type_id of 2, and Test Projects have a node_type_id of 1.

My problem was to get a list of all Test Cases and their Test Projects. It's easy to get a list of all the Test Projects, and equally easy to get a list of all of the Test Cases, but the tricky part is traversing the parent_id trail to figure out what the top-level parent Test Project is for each Test Case—since there can be any number of nested Test Suites between the two.

In comes the CTE:
WITH hierarchy (
id, parent_id,
[name], hierarchy_name,
testcase_id, testcase_name, node_type_id
) AS
(
SELECT
id,
parent_id,
[name],
CAST([name] as varchar(1024)),
id,
[name],
node_type_id
FROM
[dbo].[nodes_hierarchy]
WHERE node_type_id = 3 and id in (8343,8361,8487)
UNION ALL

SELECT
nhtc.id,
nhtc.parent_id,
nhtc.[name],
CAST(h.hierarchy_name + '|' + nhtc.[name] as varchar(1024)),
h.testcase_id,
h.testcase_name,
nt.id
FROM
[dbo].[nodes_hierarchy] nhtc INNER JOIN
hierarchy h ON (h.parent_id = nhtc.id) INNER JOIN
[dbo].[node_types] nt ON (nhtc.node_type_id = nt.id)
)

SELECT *
FROM hierarchy p
order by node_type_id

The first SELECT loads the first level of recursion: the Test Cases. All the test cases (node_type_id = 3) are added. Then the 2nd SELECT recursively joins to the previously added rows in the CTE and matches parent nodes to the test cases, and continues until all the nodes have been matched.

You end up with something like this: (Filtered to only 3 Test Cases)



There are three Test Cases shown, 8343, 8361, and 8487. Then their Test Suites are shown, also with the same Test Case IDs, and finally the Test Project Air Services is listed. It actually appears 3 times, each time with a different testcase_id and testcase_name column. This is perfect, because I can now filter the result set to just show rows with node_type_id = 1, and it will give me all Test Projects and Test Case combinations.

I also threw in the hierarchy_name columns, which does a running concatenation of the node names. It could be useful, but I don't need it for this problem. The key is passing the same testcase_id and testcase_name columns through every recursion.

No comments: