Materialized path or path enumeration model1 stores the path to the tree node as a string or a list by concatenating the keys of the nodes in the path. Simple node inserts and removals, child and other descendants list, and parent and ancestors list queries make materialized path model attractive for large-scale applications. Fetching a subtree or an entire three rows in a single query is easy. An obvious solution to restoring the structure of a tree or subtree is to build a key-value map and append the child to the specific parent. This approach requires additional memory and relies on mutable data structures. Instead, it is enough to sort the lines in nesting order and build the tree structure by adding children to the previous parent.
Sorting by nesting order using database query language is problematic. So, the input data of the algorithm is unsorted:
id | name | path |
---|---|---|
24a23792-… | Collection 1 | [24a23792-…, 00000000-…] |
4daa2217-… | Collection 1.1 | [4daa2217-…, 24a23792-…, 00000000-…] |
c0a3de07-… | Collection 2 | [c0a3de07-…, 00000000-…] |
81b20455-… | Collection 1.2 | [81b20455-…, 24a23792-…, 00000000-…] |
fb42f2c8-… | Collection 1.1.1 | [fb42f2c8-…, 4daa2217-…, 24a23792-…, 00000000-…] |
Materialized paths are stored in reverse order to simplify typical queries (syntax Cassandra Query Language (CQL)):
Find all root nodes
SELECT id, name, path FROM collections WHERE path[1] = 00000000-...
Find all children of the parent
SELECT id, name, path FROM collections WHERE path[1] = ?
Find all descendants of the parent
SELECT id, name, path FROM collections WHERE path CONTAINS ?
Remove the node and its descendants
DELETE FROM collections WHERE path CONTAINS ?
The database rows are converted to the immutable Collection
record:
type Collection = {
id: Guid
name: string
path: Guid array
children: Collection array option
}
Algorithm
The algorithm decomposes into a sequence of independent steps:
Each step corresponds to a high-order function for sequences:
rows
(* Convert *) |> Seq.map make
(* Sort by Path *) |> Seq.sortWith comparer
(* Fold Sorted Sequence *) |> Seq.fold nest []
(* Sort Roots by Name *) |> Seq.sortBy (fun collection -> collection.name)
(* Return Roots Array *) |> Seq.toArray
Sort by Path
Materialized paths comparison starts from the last elements and ends if the remaining paths are empty or the current elements of the paths are different:
let comparer a b =
let rec compareAt i j =
match (i, j) with
| (-1, -1) -> 0
| (-1, _) -> -1
| (_, -1) -> +1
| _ ->
let compareTo = a.path.[i].CompareTo(b.path.[j])
if compareTo = 0 then
compareAt (i - 1) (j - 1)
else
compareTo
compareAt (a.path.Length - 1) (b.path.Length - 1)
Applying the above comparer function to the sequence of collections with Seq.sortWith
produces the following collection sequence:
id | name | path |
---|---|---|
24a23792-… | Collection 1 | [24a23792-…, 00000000-…] |
4daa2217-… | Collection 1.1 | [4daa2217-…, 24a23792-…, 00000000-…] |
fb42f2c8-… | Collection 1.1.1 | [fb42f2c8-…, 4daa2217-…, 24a23792-…, 00000000-…] |
81b20455-… | Collection 1.2 | [81b20455-…, 24a23792-…, 00000000-…] |
c0a3de07-… | Collection 2 | [c0a3de07-…, 00000000-…] |
Fold Sorted Sequence
Each element of the sorted sequence needs to be placed in the parent’s children
collection. However, a child element can be the parent element of another element and therefore cannot be immediately placed in the parent’s children
collection. Instead, it goes onto the stack of parent elements.
If the element at the top of the stack is not the current element’s parent, it is removed from the stack and added to its parent’s children
collection. The collapse of the stack elements continues until only the root elements remain on the stack, or the parent of the current element of the input sequence is at the top of the stack.
let rec nest collections collection =
let inline isRoot collection = collection.path.[1] = Guid.Empty
let inline isParent parent collection = collection.path.[1] = parent.id
match List.tryHead collections with
| Some(current) when
not (isRoot current) && not (isParent current collection) ->
let previous = List.item 1 collections
let rest = List.skip 2 collections
let children =
match previous.children with
| Some(children) -> Array.append children [| current |]
| None -> [| current |]
Array.sortInPlaceBy (fun collection -> collection.name) children
nest ({ previous with children = Some(children) } :: rest) collection
| _ -> collection :: collections
Application of the above nest function to the sequence of collections with Seq.fold produces the root nodes sequence:
[
{
name = "Collection 1", ...,
children = [
{
name = "Collection 1.1", ...,
chidren = [
{ name = "Collection 1.1.1", ... },
{ name = "Collection 1.1.2", ... }
]
},
{
name = "Collection 1.2", ...,
}
]
},
{
name = "Collection 2", ...
}
]
Conclusion
Development in F# requires efforts to decompose the problem, but it is easier to formulate a solution method and check the correctness of individual sub-problems. Functions comparer
and nest
became new building blocks themselves.