Materialized path or path enumeration model1 stores the path to the tree node as a string or a list by concatenating the keys of the nodes in the path. Simple node inserts and removals, child and other descendants list, and parent and ancestors list queries make materialized path model attractive for large-scale applications. Fetching a subtree or an entire three rows in a single query is easy. An obvious solution to restoring the structure of a tree or subtree is to build a key-value map and append the child to the specific parent. This approach requires additional memory and relies on mutable data structures. Instead, it is enough to sort the lines in nesting order and build the tree structure by adding children to the previous parent.

Sorting by nesting order using database query language is problematic. So, the input data of the algorithm is unsorted:

id name path
24a23792-… Collection 1 [24a23792-…, 00000000-…]
4daa2217-… Collection 1.1 [4daa2217-…, 24a23792-…, 00000000-…]
c0a3de07-… Collection 2 [c0a3de07-…, 00000000-…]
81b20455-… Collection 1.2 [81b20455-…, 24a23792-…, 00000000-…]
fb42f2c8-… Collection 1.1.1 [fb42f2c8-…, 4daa2217-…, 24a23792-…, 00000000-…]

Materialized paths are stored in reverse order to simplify typical queries (syntax Cassandra Query Language (CQL)):

Find all root nodes

SELECT id, name, path FROM collections WHERE path[1] = 00000000-...

Find all children of the parent

SELECT id, name, path FROM collections WHERE path[1] = ?

Find all descendants of the parent

SELECT id, name, path FROM collections WHERE path CONTAINS ?

Remove the node and its descendants

DELETE FROM collections WHERE path CONTAINS ?

The database rows are converted to the immutable Collection record:

type Collection = {
    id: Guid
    name: string
    path: Guid array
    children: Collection array option
}

Algorithm

The algorithm decomposes into a sequence of independent steps:

Matrialized Paths to Tree
Matrialized Paths to Tree

Each step corresponds to a high-order function for sequences:

rows
(* Convert *)              |> Seq.map make
(* Sort by Path *)         |> Seq.sortWith comparer
(* Fold Sorted Sequence *) |> Seq.fold nest []
(* Sort Roots by Name *)   |> Seq.sortBy (fun collection -> collection.name)
(* Return Roots Array *)   |> Seq.toArray

Sort by Path

Materialized paths comparison starts from the last elements and ends if the remaining paths are empty or the current elements of the paths are different:

let comparer a b =
    let rec compareAt i j =
        match (i, j) with
        | (-1, -1) -> 0
        | (-1, _) -> -1
        | (_, -1) -> +1
        | _ ->
            let compareTo = a.path.[i].CompareTo(b.path.[j])
            if compareTo = 0 then
                compareAt (i - 1) (j - 1)
            else
                compareTo
    compareAt (a.path.Length - 1) (b.path.Length - 1)

Applying the above comparer function to the sequence of collections with Seq.sortWith produces the following collection sequence:

id name path
24a23792-… Collection 1 [24a23792-…, 00000000-…]
4daa2217-… Collection 1.1 [4daa2217-…, 24a23792-…, 00000000-…]
fb42f2c8-… Collection 1.1.1 [fb42f2c8-…, 4daa2217-…, 24a23792-…, 00000000-…]
81b20455-… Collection 1.2 [81b20455-…, 24a23792-…, 00000000-…]
c0a3de07-… Collection 2 [c0a3de07-…, 00000000-…]

Fold Sorted Sequence

Each element of the sorted sequence needs to be placed in the parent’s children collection. However, a child element can be the parent element of another element and therefore cannot be immediately placed in the parent’s children collection. Instead, it goes onto the stack of parent elements.

If the element at the top of the stack is not the current element’s parent, it is removed from the stack and added to its parent’s children collection. The collapse of the stack elements continues until only the root elements remain on the stack, or the parent of the current element of the input sequence is at the top of the stack.

let rec nest collections collection =
    let inline isRoot collection = collection.path.[1] = Guid.Empty
    let inline isParent parent collection = collection.path.[1] = parent.id

    match List.tryHead collections with
    | Some(current) when
        not (isRoot current) && not (isParent current collection) ->
        let previous = List.item 1 collections
        let rest = List.skip 2 collections
        let children =
            match previous.children with
            | Some(children) -> Array.append children [| current |]
            | None -> [| current |]
        Array.sortInPlaceBy (fun collection -> collection.name) children
        nest ({ previous with children = Some(children) } :: rest) collection
    | _ -> collection :: collections

Application of the above nest function to the sequence of collections with Seq.fold produces the root nodes sequence:

[
    {
        name = "Collection 1", ...,
        children = [
            {
                name = "Collection 1.1", ...,
                chidren = [
                    { name = "Collection 1.1.1", ... },
                    { name = "Collection 1.1.2", ... }
                ]
            },
            {
                name = "Collection 1.2", ...,
            }
        ]
    },
    {
        name = "Collection 2", ...
    }
]

Conclusion

Development in F# requires efforts to decompose the problem, but it is easier to formulate a solution method and check the correctness of individual sub-problems. Functions comparer and nest became new building blocks themselves.