--- title: "WOQL Control Flow: Backtracking, Scope, and Performance" nextjs: metadata: title: "WOQL Control Flow: Backtracking, Scope, and Performance" keywords: woql control flow backtracking scope variables member group_by collect distinct select triple triple_slice performance streaming deterministic non-deterministic datalog description: Deep guide to WOQL's execution model — how backtracking generates solutions, how variable scope works across sub-queries, and how to structure queries for optimal performance with triple, triple_slice, and streaming results. openGraph: images: https://assets.terminusdb.com/docs/technical-documentation-terminuscms-og.png alternates: canonical: https://terminusdb.org/docs/woql-control-flow/ media: [] --- WOQL queries look declarative — you describe *what* you want and the engine figures out the rest. But underneath, the engine makes choices about *how* to find answers, and those choices follow specific rules. Understanding these rules is the difference between a query that streams results in milliseconds and one that materializes an entire dataset into memory before returning anything. This page explains the execution model from the ground up: how solutions are generated, how variables move between inner and outer scopes, and how to structure queries so the engine can do its job efficiently. > **Prerequisites:** This page assumes you've completed the [WOQL Tutorial](/docs/woql-tutorial/) and understand basic concepts like `and`, `or`, `triple`, and `eq`. If terms like "binding" and "variable" are unfamiliar, start there. > **See also:** [WOQL Explanation](/docs/woql-explanation/) | [WOQL Tutorial](/docs/woql-tutorial/) | [Range Queries with triple_slice](/docs/woql-triple-slice/) | [Time Handling](/docs/woql-time-handling/) | [WOQL Class Reference](/docs/woql-class-reference-guide/) --- ## The Execution Model: Solutions and Backtracking ### What is a solution? A WOQL query produces a table of results. Each row is a **solution** — a complete assignment of values to variables that satisfies every constraint in the query. The engine's job is to find all such solutions. ```javascript // This query has exactly 1 solution: eq("v:x", 42) // Result: x = 42 // This query has 0 solutions: and(eq("v:x", 42), eq("v:x", 99)) // No value of x satisfies both constraints simultaneously ``` ### How backtracking works The engine evaluates a query left to right. When it reaches a predicate that can produce multiple results — like `triple`, `or`, or `member` — it picks the first result and continues forward. If a later constraint fails, the engine **backtracks** to the most recent choice point and tries the next option. Think of it as exploring a tree of possibilities: ``` and( triple(v.person, "city", v.city), ← choice point: many people triple(v.person, "age", v.age), ← for each person, look up age greater(v.age, 30) ← filter: does age > 30? ) ``` The engine: 1. Picks the first person/city triple. 2. Looks up that person's age. 3. Checks if age > 30. If yes, emits a solution. If no, backtracks to step 1. 4. Picks the next person/city triple. Repeats. This is **backtracking** — the engine systematically explores all possibilities by going forward when constraints succeed and going backward when they fail. ### Deterministic vs. non-deterministic predicates Every WOQL predicate falls into one of two categories: | Category | Behavior | Examples | |----------|----------|---------| | **Non-deterministic** (generators) | Can produce multiple solutions via backtracking | `triple`, `triple_slice`, `member`, `or`, `sequence`, `path` | | **Deterministic** (filters/transforms) | Produce exactly one solution or fail | `eq`, `greater`, `less`, `not`, `typecast`, `length`, `sum`, `group_by`, `limit`, `distinct` | Non-deterministic predicates are **choice points** — they are where the engine branches. Deterministic predicates either pass or block a solution, but never multiply the number of solutions. Understanding this distinction is crucial for reasoning about how many rows a query will return and where the computational work happens. --- ## Variable Scope: Inner and Outer Queries ### The default: all variables are globally scoped In a flat WOQL query, all variables live in the same scope. A binding made anywhere is visible everywhere: ```javascript and( eq("v:x", 42), eq("v:y", "v:x") // v:x is already bound to 42, so v:y = 42 ) ``` This is simple and intuitive for flat queries. It gets more interesting with sub-queries. ### `select` — restricting which variables escape `select` runs a sub-query but only exposes the listed variables to the outer query. Variables inside the sub-query that are not selected are invisible outside: ```javascript and( select("v:name", and( triple("v:person", "name", "v:name"), triple("v:person", "age", "v:age") // v:age is computed but not exposed ) ), eq("v:age", "unknown") // v:age is unbound here — select hid it ) ``` **When to use `select`:** When a sub-query binds helper variables that would pollute the outer scope or cause unintended unification. It is the primary tool for variable hygiene. ### `group_by` — grouping with scoped aggregation `group_by` runs a sub-query, groups the results by one or more variables, and collects the grouped values into lists. The sub-query's variables are scoped — only the grouping variables and the collected list are visible outside. ```javascript // Count people per city and( group_by( ["city"], // group by these variables ["person"], // collect these into lists "v:city_group", // the collected list and( isa("v:person", "Person"), triple("v:person", "city", "v:city") ) ), length("v:city_group", "v:count") ) ``` **Key scoping behavior:** - `v:city` is the grouping variable — it is bound in the outer scope (one value per group). - `v:person` is the template variable — its individual values are not visible outside, only the collected list is. - `v:city_group` is the result list — visible in the outer scope. - The inner query runs to exhaustion for each group. **Common pitfall:** Using `group_by` with an empty grouping key just to collect all values into a list: ```javascript // Verbose way to collect all ages into a list group_by( [], ["age"], "v:all_ages", triple("v:person", "age", "v:age") ) ``` This works but is semantically misleading — you're not grouping by anything. The proposed `Collect` predicate (see below) makes this intent explicit. ### `Collect` — gathering bindings into a list (proposed) `Collect` is the inverse of `Member`. Where `Member` unpacks a list into individual bindings, `Collect` runs a sub-query to exhaustion and gathers all values of a template variable into a list. ```javascript // Collect all ages into a list WOQL.collect( WOQL.var("age"), WOQL.var("all_ages"), WOQL.triple(WOQL.var("person"), "age", WOQL.var("age")) ) // Result: all_ages = [28, 35, 28, 42, 31] ``` **Scoping behavior:** - The `query` parameter is a self-contained sub-query. It runs independently. - The `into` variable receives the resulting list in the outer scope. - The `template` specifies what to collect from each solution (last to support fluent queries) - Variables in the sub-query that are not in the template are existentially quantified — they exist only to help generate solutions. **The key insight:** `Collect` must **control** the backtracking. You cannot collect values from a variable that is already backtracking in the outer scope — the sub-query that generates the values must be **inside** the `Collect`. This is the same constraint as `group_by` and the `findall` Prolog predicate. ```javascript // This does NOT collect outer backtracking into a list: and( triple("v:person", "age", "v:age"), // generates many bindings WOQL.collect("v:age", ???, "v:list") // too late — v:age is already bound ) // This DOES work — the generating query is inside Collect: WOQL.collect( WOQL.var("age"), WOQL.triple(WOQL.var("person"), "age", WOQL.var("age")), WOQL.var("all_ages") ) ``` ### `member` — unpacking a list into bindings `Member` is the inverse of `Collect`. Given a list, it generates one solution per element: ```javascript and( eq("v:fruits", ["apple", "banana", "cherry"]), member("v:fruit", "v:fruits") ) // Result: 3 rows — fruit = "apple", fruit = "banana", fruit = "cherry" ``` `Member` is non-deterministic — it creates a choice point with as many branches as there are list elements. Each branch produces one binding. **The Member/Collect symmetry:** | Predicate | Direction | Behavior | |-----------|-----------|----------| | `Member` | List → Bindings | Non-deterministic: generates one solution per element | | `Collect` | Bindings → List | Deterministic: runs sub-query to exhaustion, produces one solution | ### `distinct` — deduplication within a scope `distinct` eliminates duplicate values for a specified variable while preserving backtracking for everything else: ```javascript distinct("v:city", triple("v:person", "city", "v:city") ) // Returns each city once, even if multiple people live there ``` **Scoping behavior:** `distinct` observes the values of the specified variable as the inner query generates them, and suppresses solutions that would repeat a previously seen value. The inner query still runs normally — `distinct` is a filter on the output stream. ### `not` — negation as failure `not` succeeds when its inner query **fails** (produces zero solutions). It does not bind any variables: ```javascript and( isa("v:person", "Person"), triple("v:person", "name", "v:name"), not(triple("v:person", "email", "v:email")) ) // People without an email address ``` **Critical scoping rule:** Variables that appear only inside `not` are existentially quantified. `not` checks whether *any* solution exists — it does not expose which specific values matched. If you need to know which values were excluded, use a different pattern (e.g., `opt` + filter). ### `opt` — optional matching `opt` tries to match its inner query. If the inner query succeeds, its bindings flow to the outer scope. If it fails, the outer query continues with those variables unbound: ```javascript and( triple("v:person", "name", "v:name"), opt(triple("v:person", "email", "v:email")) ) // All people, with email if they have one ``` **Scoping behavior:** Unlike `not`, `opt` **does** expose inner variable bindings when the inner query succeeds. When it fails, those variables remain unbound (empty in the result). --- ## Structuring Queries for Performance ### Principle 1: Put the most selective constraint first The engine evaluates left to right. Constraints that eliminate more possibilities early mean less backtracking overall: ```javascript // Slow: scan all triples, then filter and( triple("v:person", "name", "v:name"), triple("v:person", "city", "v:city"), triple("v:person", "age", "v:age"), greater("v:age", 60) ) // Faster: if few people are over 60, this is easiest to reason about // (But only if you can — triple doesn't support filtering and will materialize every triple for the non-grounded variable) ``` In practice, `triple` always scans all matching triples for a given predicate. The real optimization opportunity is choosing between `triple` and `triple_slice`. ### Principle 2: Use `triple_slice` for range constraints `triple` iterates every triple for a predicate. `triple_slice` binary-searches the sorted value dictionary and only touches values in the range: ```javascript // O(n) — scans all timestamps, then filters and( triple("v:doc", "timestamp", "v:time"), greater("v:time", literal("2025-01-01T00:00:00Z", "xsd:dateTime")), less("v:time", literal("2025-02-01T00:00:00Z", "xsd:dateTime")) ) // O(log n) — binary search, only touches matching values triple_slice("v:doc", "timestamp", "v:time", "2025-01-01T00:00:00Z", "2025-02-01T00:00:00Z") ``` Both return the same results. The performance difference grows with dataset size. For time-series data or large numeric ranges, `triple_slice` is almost always the right choice. See the [triple_slice guide](/docs/woql-triple-slice/) for full details on binding modes, type inference, and worked examples. ### Principle 3: Understand what streams and what materializes WOQL predicates fall into two performance categories: **Streaming predicates** yield one solution at a time, using constant memory regardless of how many total solutions exist: - `triple` / `quad` - `triple_slice` / `quad_slice` - `member` - `sequence` - `path` - `and`, `or` (composition) **Materializing predicates** must collect all solutions from their sub-query before producing output. Memory usage scales with the number of inner solutions: - `group_by` — groups all solutions, then emits groups one at a time - `Collect` — collects all template values into a list in memory - `distinct` — tracks all seen values to detect duplicates - `order_by` — must see all solutions before it can sort - `aggregate` / `count` / `sum` / `length` (on query results) **The practical consequence:** If a sub-query inside `group_by` or `Collect` produces millions of solutions, those millions are held in memory. For very large result sets, consider whether you can narrow the sub-query with `limit`, `triple_slice`, or more selective constraints. ### Principle 4: Streaming responses — results flow as they complete When you execute a WOQL query via the API, TerminusDB can stream results back as they are found. The client receives the first solution as soon as it is computed — it does not have to wait for all solutions to be found. This means: - **Streaming queries** (built from `triple`, `and`, `or`, etc.) start returning results immediately. Time-to-first-result is fast even for queries with millions of solutions. - **Materializing queries** (containing `group_by`, `order_by`, etc.) must finish their inner computation before the first result streams out. Time-to-first-result depends on the inner query's total execution time. **Designing for responsiveness:** If your application can process results incrementally (e.g., displaying a table row by row), prefer streaming query shapes. If you need aggregation, do it as late as possible in the query so the streaming portion runs first. ### Principle 5: Leverage bound variables for index access `triple` uses different internal access patterns depending on which arguments are bound, examples: | Subject | Predicate | Object | Access pattern | |---------|-----------|--------|----------------| | bound | bound | unbound | Direct lookup — fast | | bound | unbound | unbound | Iterate predicates for one subject | | unbound | bound | unbound | Iterate subjects for one predicate | | unbound | unbound | unbound | Full scan — slowest | The engine is fastest when it can use bound arguments to narrow the search. A common pattern is to bind the subject first (via a previous `triple` or `isa`), then look up specific properties: ```javascript // Good: v.person is bound by isa, then used to look up properties directly and( isa("v:person", "Person"), triple("v:person", "name", "v:name"), triple("v:person", "age", "v:age") ) ``` When you use `and`, the shared variable `v:person` is bound by the first predicate and used as an index key by the subsequent ones. This is the natural join pattern — and it is efficient because each subsequent `triple` does a direct lookup rather than a scan. ### Principle 6: Compose `Collect` with list operators Rather than building custom aggregation logic, use `Collect` to bridge query results into lists, then apply existing list operators: ```javascript // Min salary and( WOQL.collect( WOQL.var("salary"), WOQL.triple(WOQL.var("emp"), "salary", WOQL.var("salary")), WOQL.var("salaries") ), WOQL.range_min(WOQL.var("salaries"), WOQL.var("lowest")) ) // Count of results and( WOQL.collect( WOQL.var("doc"), WOQL.isa(WOQL.var("doc"), "Person"), WOQL.var("docs") ), WOQL.length(WOQL.var("docs"), WOQL.var("count")) ) // Window of results and( WOQL.collect( WOQL.var("x"), WOQL.sequence(WOQL.var("x"), 0, 100), WOQL.var("all") ), WOQL.slice(WOQL.var("all"), WOQL.var("window"), 10, 20) ) ``` This pattern keeps the API surface small — one new predicate (`Collect`) unlocks all existing list operators (`RangeMin`, `RangeMax`, `Slice`, `Length`, `Sum`, `ListToSet`, `SetDifference`, `SetIntersection`, `SetUnion`) for use over query results. --- ## Predicate Quick Reference: Control Flow Properties | Predicate | Deterministic? | Streams? | Scopes variables? | Description | |-----------|---------------|----------|-------------------|-------------| | `and` | — | Yes | No (shared scope) | All constraints must hold | | `or` | No | Yes | No (shared scope) | Any branch can hold; one row per successful branch | | `triple` | No | Yes | No | Pattern match on subject-predicate-object | | `triple_slice` | No | Yes | No | Range-constrained triple with O(log n) access | | `triple_slice_rev` | No | Yes | No | Range-constrained triple with O(log n) access, reversed | | `triple_first` | No | Yes | No | Convenience range function, it's a limit(1) of the slice | | `triple_last` | No | Yes | No | Convenience range function, it's a limit(1) of the reverse slice | | `member` | No | Yes | No | Generates one binding per list element | | `sequence` | No | Yes | No | Generates values in a range | | `path` | No | Yes | No | Graph traversal with regex-like path expressions | | `select` | — | Yes | **Yes** — hides non-selected variables | Filter which variables are exposed | | `distinct` | — | Partial | No | Deduplicates; must track seen values | | `not` | Yes | — | **Yes** — inner variables are existential | Succeeds when inner query fails | | `opt` | — | Yes | Partial — bindings escape on success | Optional match; continues on failure | | `group_by` | Yes | No (materializes) | **Yes** — template variables are scoped | Groups results, collects into lists | | `collect` | Yes | No (materializes) | **Yes** — sub-query is self-contained | Gathers all template values into a list | | `order_by` | — | No (materializes) | No | Sorts all solutions | | `limit` | — | Yes | No | Caps the number of solutions | | `eq` | Yes | — | No | Unifies two values | | `greater` / `less` | Yes | — | No | Strict comparison filter | --- ## Patterns and Anti-Patterns ### Pattern: Narrow early, expand late ```javascript // Good: isa + triple_slice narrows to a small set, then join and( isa("v:doc", "SensorReading"), triple_slice("v:doc", "timestamp", "v:time", "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"), triple("v:doc", "temperature", "v:temp") ) ``` ### Anti-pattern: Collect everything, then filter ```javascript // Bad: materializes all readings into memory, then filters client-side WOQL.collect( WOQL.var("temp"), WOQL.var("all_temps") WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp")), ) // Then filter all_temps in application code ``` Instead, filter inside the sub-query: ```javascript // Better: only collect what you need WOQL.collect( WOQL.var("temp"), WOQL.var("daily_temps"), WOQL.and( WOQL.triple_slice(WOQL.var("doc"), "timestamp", WOQL.var("time"), "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"), WOQL.triple(WOQL.var("doc"), "temperature", WOQL.var("temp")) ), ) ``` ### Pattern: Use `Member` and `Collect` as inverses ```javascript // Generate a list from a query WOQL.collect( WOQL.var("city"), WOQL.var("all_cities"), WOQL.and( WOQL.isa(WOQL.var("person"), "Person"), WOQL.triple(WOQL.var("person"), "city", WOQL.var("city")) ), ) // Later: iterate over that list WOQL.member(WOQL.var("city"), WOQL.var("unique_cities")) ``` ### Pattern: Streaming joins for responsive UIs When your UI can display results incrementally, prefer streaming query shapes: ```javascript // Streams results immediately — first row arrives fast and( isa("v:person", "Person"), triple("v:person", "name", "v:name"), triple("v:person", "age", "v:age") ) ``` Over materializing shapes: ```javascript // Must compute all results before returning any order_by("v:age", "asc", and( isa("v:person", "Person"), triple("v:person", "name", "v:name"), triple("v:person", "age", "v:age") ) ) ``` Both return the same data. The first starts streaming immediately; the second waits until all results are sorted. --- ## Summary | Concept | Key takeaway | |---------|-------------| | **Backtracking** | The engine explores all possibilities by going forward on success and backward on failure | | **Choice points** | Non-deterministic predicates (`triple`, `member`, `or`, `sequence`) create branches | | **Variable scope** | Variables are global by default; `select`, `group_by`, `Collect`, and `not` create scoped boundaries | | **Streaming vs. materializing** | `triple`, `member`, `and` stream; `group_by`, `Collect`, `order_by` materialize | | **Performance** | Use `triple_slice` for ranges, bind variables early, filter inside sub-queries | | **Member/Collect symmetry** | `Member` unpacks lists into bindings; `Collect` packs bindings into lists | Understanding these mechanics lets you write WOQL queries that are not just correct, but efficient — queries that leverage the engine's strengths rather than fighting against them.