---
name: regocpp-builtins
description: 'Add, update, or remove OPA Rego built-in functions in rego-cpp. Use when: implementing a new builtin, replacing a placeholder with a real implementation, adding a new OPA builtin namespace, updating builtin declarations to match a new OPA version, removing deprecated builtins, or debugging builtin dispatch/registration. Covers the full lifecycle: declaration, implementation, dispatch registration, CMake wiring, and OPA conformance testing.'
argument-hint: 'Describe which builtin(s) to add, update, or remove.'
---

# rego-cpp Built-in Function Development

Add, update, and remove OPA Rego built-in functions in rego-cpp.

## When to Use

- Implementing a new builtin (replacing a placeholder or adding from scratch)
- Adding a new OPA builtin namespace (new `src/builtins/.cc` file)
- Updating builtin declarations to track a new OPA version
- Replacing placeholder stubs with real implementations
- Removing deprecated builtins
- Debugging builtin dispatch or registration issues

## Architecture Overview

Built-in functions follow a three-layer architecture:

```
BuiltInsDef::lookup(name)     ← Dispatch layer (src/builtins.cc)
  → builtins::(name)          ← Namespace router (src/builtins/.cc)
    → _factory()              ← Factory (returns BuiltIn with decl + behavior)
      → (args)                ← Implementation (unwrap args, compute, return)
```

### Key Files

| File | Purpose |
|------|---------|
| `src/builtins/builtins.hh` | Namespace dispatch function declarations |
| `src/builtins.cc` | `BuiltInsDef::lookup`: hand-coded binary dispatch tree |
| `src/builtins/.cc` | One file per OPA namespace (e.g., `crypto.cc`, `jwt.cc`) |
| `src/CMakeLists.txt` | SOURCES list; must include new `.cc` files |
| `include/rego/rego.hh` | Public API: `BuiltIn`, `BuiltInDef`, `UnwrapOpt`, helpers |

### The Binary Dispatch Tree

`BuiltInsDef::lookup` in `src/builtins.cc` uses a **hand-coded binary search tree** (generated by `src/builtins/binary_tree.py`) that routes the namespace prefix of a builtin
name to the corresponding namespace function. When adding a new namespace, this tree must be regenerated or manually updated.

Special case: `"io"` prefix routes to `builtins::jwt(name)` for `io.jwt.*` builtins.

## Procedure

### Adding a New Builtin to an Existing Namespace

1. **Read the existing namespace file** (`src/builtins/.cc`) to understand the patterns in use.

2. **Write the implementation function:**

   ```cpp
   Node my_func(const Nodes& args)
   {
     // Unwrap and validate arguments
     Node x = unwrap_arg(args, UnwrapOpt(0).type(JSONString).func("namespace.my_func"));
     if (x->type() == Error) return x;

     // Extract values
     std::string val = get_string(x);

     // Compute result
     std::string result = do_something(val);

     // Return wrapped result
     return JSONString ^ result;
   }
   ```

3. **Write the factory function:**

   ```cpp
   BuiltIn my_func_factory()
   {
     const Node my_func_decl = bi::Decl
       << (bi::ArgSeq
           << (bi::Arg << (bi::Name ^ "x")
                       << (bi::Description ^ "input string")
                       << (bi::Type << bi::String)))
       << (bi::Result << (bi::Name ^ "y")
                      << (bi::Description ^ "result description")
                      << (bi::Type << bi::String));
     return BuiltInDef::create({"namespace.my_func"}, my_func_decl, my_func);
   }
   ```

4. **Register in the namespace router** (the public function at the bottom of the file):

   ```cpp
   BuiltIn namespace_func(const Location& name)
   {
     // ... existing dispatches ...
     if (view == "my_func")
     {
       return my_func_factory();
     }
     return nullptr;
   }
   ```

5. **Run OPA conformance tests:**

   ```bash
   cd build && ./tests/rego_test -wf opa/v1/test/cases/testdata/v1/
   ```

### Adding a New Namespace

When adding an entirely new OPA namespace (new `.cc` file):

1. **Create `src/builtins/.cc`** following the pattern of existing files. Include the anonymous namespace for internal functions and the `rego::builtins` namespace for the public dispatch function.

2. **Declare the dispatch function** in `src/builtins/builtins.hh`:

   ```cpp
   namespace rego::builtins
   {
     BuiltIn my_namespace(const Location& name);
   }
   ```

3. **Add to the dispatch tree** in `src/builtins.cc`: find the correct position in `BuiltInsDef::lookup` based on the namespace prefix string and add the routing branch. Alternatively, regenerate the tree using `src/builtins/binary_tree.py`.

4. **Add the source file to `src/CMakeLists.txt`:**

   ```cmake
   set(
     SOURCES
     # ... existing sources ...
     builtins/.cc
   )
   ```

5. **Rebuild and test.**

### Replacing a Placeholder with a Real Implementation

Many builtins are registered as `BuiltInDef::placeholder(...)` which returns an error message when called. To replace:

1. **Keep the existing declaration** (`bi::Decl << ...`); it defines the argument and return types.

2. **Write the implementation function** that takes `const Nodes& args` and returns a `Node`.

3. **Change the factory** from:

   ```cpp
   return BuiltInDef::placeholder({"name"}, decl, "message");
   ```

   to:

   ```cpp
   return BuiltInDef::create({"name"}, decl, implementation_function);
   ```

4. **If the builtin requires a platform dependency** (e.g., OpenSSL), use compile-time guards:

   ```cpp
   #ifdef REGOCPP_HAS_CRYPTO
     return BuiltInDef::create({"crypto.sha256"}, sha256_decl, sha256);
   #else
     return BuiltInDef::placeholder({"crypto.sha256"}, sha256_decl, Message);
   #endif
   ```

### Removing a Deprecated Builtin

1. Check the deprecated list in `BuiltInsDef::is_deprecated` in `src/builtins.cc`.
2. Add the builtin name to the `deprecated` vector if not already present.
3. Deprecated builtins return `RegoTypeError` when called, regardless of implementation.

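The procedures above all touch the same chain: name lookup, namespace routing, factory, implementation, plus the deprecation check. As a rough illustration, the self-contained model below mimics that chain with stand-in types. Everything in it is an assumption made for illustration: `ModelBuiltIn`, `make_model`, `crypto_router`, `jwt_router`, `call`, and the `crypto.legacy` builtin are hypothetical; the prefix match uses a plain `if` chain rather than the generated binary tree; and errors are plain strings rather than rego-cpp error nodes. It does not use the rego-cpp API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Stand-in types only: these model the shape of the pattern and are NOT the
// rego-cpp BuiltIn / BuiltInDef classes.
struct ModelBuiltIn
{
  std::string name;
  std::function<std::string(const std::string&)> behavior;
};
using ModelBuiltInPtr = std::shared_ptr<ModelBuiltIn>;

// Factory layer (hypothetical helper): bundles a name with its behavior.
// The behavior just tags its argument so the caller can see which
// implementation was reached.
ModelBuiltInPtr make_model(std::string name, std::string tag)
{
  auto behavior = [tag](const std::string& arg) { return tag + "(" + arg + ")"; };
  return std::make_shared<ModelBuiltIn>(ModelBuiltIn{std::move(name), behavior});
}

// Namespace router layer: matches the part of the name after "crypto.".
// Callers guarantee that the name starts with "crypto.".
ModelBuiltInPtr crypto_router(std::string_view name)
{
  std::string_view view = name.substr(std::string_view("crypto.").size());
  if (view == "sha256") return make_model("crypto.sha256", "sha256_impl");
  if (view == "legacy") return make_model("crypto.legacy", "legacy_impl"); // hypothetical
  return nullptr;
}

// Namespace router for io.jwt.*; anything else under "io" is unknown here.
ModelBuiltInPtr jwt_router(std::string_view name)
{
  constexpr std::string_view prefix = "io.jwt.";
  if (name.substr(0, prefix.size()) != prefix) return nullptr;
  std::string_view view = name.substr(prefix.size());
  if (view == "decode") return make_model("io.jwt.decode", "jwt_decode_impl");
  return nullptr;
}

// Dispatch layer: routes on the namespace prefix. The "io" prefix goes to the
// JWT router, mirroring the special case described earlier. This model only
// handles dotted names.
ModelBuiltInPtr lookup(std::string_view name)
{
  std::size_t dot = name.find('.');
  if (dot == std::string_view::npos) return nullptr;
  std::string_view prefix = name.substr(0, dot);
  if (prefix == "crypto") return crypto_router(name);
  if (prefix == "io") return jwt_router(name);
  return nullptr;
}

// Deprecation list: consulted before dispatch in this model, so a deprecated
// name fails even though a factory exists for it.
bool is_deprecated(std::string_view name)
{
  static const std::vector<std::string_view> deprecated = {"crypto.legacy"};
  return std::find(deprecated.begin(), deprecated.end(), name) != deprecated.end();
}

// Entry point of the model: deprecation check, then lookup, then invocation.
std::string call(std::string_view name, const std::string& arg)
{
  if (is_deprecated(name)) return "error: deprecated builtin";
  ModelBuiltInPtr builtin = lookup(name);
  if (builtin == nullptr) return "error: unknown builtin";
  return builtin->behavior(arg);
}
```

In this model, `call("crypto.sha256", "x")` reaches the `crypto` router, `call("io.jwt.decode", "t")` is routed through the `io` prefix to the JWT router, and `call("crypto.legacy", "x")` is rejected by the deprecation check even though `lookup` still finds a factory for it.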
## Key Patterns

### Argument Unwrapping

```cpp
// Single type
Node x = unwrap_arg(args, UnwrapOpt(0).type(JSONString));

// Multiple accepted types
Node x = unwrap_arg(args, UnwrapOpt(0).types({JSONString, Int, Float}));

// With function name for error messages
Node x = unwrap_arg(args, UnwrapOpt(0).type(JSONString).func("crypto.sha256"));

// With custom error details
Node x = unwrap_arg(args, UnwrapOpt(0).type(JSONString)
    .func("crypto.sha256").specify_number(true));
```

Always check for errors after unwrapping:

```cpp
if (x->type() == Error) return x;
```

### Value Extraction

```cpp
std::string val = get_string(node);   // strips quotes
BigInt ival = get_int(node);
double dval = get_double(node);
bool bval = get_bool(node);

// Optional variants (return std::nullopt on wrong type)
auto maybe_str = try_get_string(node);
auto maybe_int = try_get_int(node);
```

### Result Construction

For **scalar** results, return bare token nodes:

```cpp
return JSONString ^ "result";                 // string result
return Int ^ BigInt(42);                      // integer result
return Float ^ 3.14;                          // float result
return True ^ "true";                         // boolean true
return False ^ "false";                       // boolean false
return Undefined;                             // undefined (no result)
return err(args[0], "error message");         // error
return err(args[0], "msg", EvalTypeError);    // typed error
```

For **compound** results (arrays, objects, nested structures), use the rego API helpers declared in `include/rego/rego.hh`.
These handle all Term/Scalar wrapping and cloning correctly, avoiding well-formedness errors:

```cpp
// Booleans, strings, numbers, null: produce correctly-wrapped Scalar nodes
return boolean(true);          // same as True ^ "true" but self-documenting
return rego::string("hello");  // note: qualify as rego::string to avoid std::string
return number(3.14);
return null();

// Arrays: items are auto-wrapped via Resolver::to_term()
return array({boolean(true), rego::string("ok")});
return array({header_term, payload_term, rego::string(sig_hex)});

// Objects: built from object_item() nodes
return object({
  object_item(rego::string("key"), rego::string("value")),
  object_item(rego::string("count"), number(42.0))
});

// Nested: array of [bool, object, object]
return array({boolean(false), object({}), object({})});
```

**IMPORTANT**: Never manually construct compound result nodes with `NodeDef::create(Array)`, `Term <<`, `Scalar <<`, or `push_back`. These patterns produce nodes that violate well-formedness rules. Always use `array()`, `object()`, `object_item()`, `boolean()`, `rego::string()`, `number()`, and `null()` instead. These helpers call `Resolver::to_term()` internally, which handles all wrapping (Term, Scalar) and cloning correctly regardless of whether the input is a bare token, a Scalar, or an already-wrapped Term.

### Declaration Types

```cpp
bi::String, bi::Number, bi::Boolean, bi::Null, bi::Any                     // Scalar types
bi::DynamicArray << (bi::Type << bi::String)                               // array of strings
bi::DynamicObject << (bi::Type << bi::String) << (bi::Type << bi::Any)     // object
bi::StaticArray << (bi::Type << bi::Boolean) << (bi::Type << bi::String)   // [bool, string]
bi::Set << (bi::Type << bi::String)                                        // set of strings
```

### Shared Code Between Namespaces

When multiple namespaces share implementation logic (e.g., crypto primitives shared between `crypto.*` and `io.jwt.*`):

1. Create a shared internal header: `src/builtins/.hh`
2. Create a shared implementation: `src/builtins/.cc`
3. Add the `.cc` to `src/CMakeLists.txt` SOURCES
4. Include from both namespace files

Use compile-time backend selection for platform-dependent code:

```cmake
set(REGOCPP_CRYPTO_BACKEND "" CACHE STRING "Crypto backend: openssl3, '' (disabled)")
if(REGOCPP_CRYPTO_BACKEND STREQUAL "openssl3")
  find_package(OpenSSL 3.0 REQUIRED)
  target_link_libraries(rego PUBLIC OpenSSL::SSL OpenSSL::Crypto)
  target_compile_definitions(rego PUBLIC REGOCPP_HAS_CRYPTO=1 REGOCPP_CRYPTO_OPENSSL3=1)
endif()
```

### Parsing JSON Inside Builtins

When a builtin needs to inspect or validate JSON data (e.g., JWT headers/payloads, JWK keys), **always use the Trieste JSON parser** (``) instead of manual string searching. Manual JSON parsing (e.g., `json.find("\"field\"")`, character-by-character extraction) is brittle and will break on whitespace variations, escaped characters, nested structures, and field-name substrings.

**Two JSON AST types exist**: use the right one for the task.

| AST type | Namespace | Produced by | Use for |
|----------|-----------|-------------|---------|
| JSON AST | `json::Object`, `json::Array`, `json::String`, ... | `json::reader().synthetic(str).read()` | Internal inspection: field lookup, type checking, claim validation |
| Rego AST | `rego::Object`, `rego::Array`, `rego::JSONString`, ... | `json::reader().synthetic(str) >> json_to_rego(true)` | Return values to the Rego evaluator |

**For internal inspection**, parse into the JSON AST and use `json::select` with RFC 6901 JSON Pointer paths:

```cpp
#include 

// Parse raw JSON string into JSON AST
Node ast = parse_json(json_str);  // json::reader().synthetic(str).read()

// Field lookup: paths use RFC 6901 format with leading "/"
auto alg = ::json::select_string(ast, {"/alg"});     // std::optional
auto exp = ::json::select_number(ast, {"/exp"});     // std::optional
auto ok = ::json::select_boolean(ast, {"/active"});  // std::optional

// Check field existence (select returns Error node if missing)
Node field = ::json::select(ast, {"/enc"});
if (field->type() != Error) { /* field exists */ }

// Check field type
Node aud = ::json::select(ast, {"/aud"});
if (aud->type() == ::json::Array) { /* it's an array */ }

// Nested paths
auto deep = ::json::select_string(ast, {"/foo/bar/baz"});
```

**CRITICAL**: The path argument is a `Location` initialized from a string literal with `{"/field"}` syntax. The leading `/` is required by RFC 6901. Using `Location("field")` without the `/` will fail silently.

**For return values**, use `parse_json_to_term()` (which runs `json_to_rego`) to produce Rego-typed nodes suitable for the evaluator. Parse into the JSON AST first for validation, then convert to Rego terms only at the end when building the return value.

**For Rego Object nodes** (e.g., constraint objects passed as builtin arguments), use `try_get_string(node)` and `try_get_double(node)`. These already handle Term/Scalar unwrapping. Do NOT navigate with `node / Scalar` before calling them.

## Testing

### OPA Conformance Tests

OPA test cases live in `build/opa/v1/test/cases/testdata/v1//`. Directory names match OPA builtin names with no separators (e.g., `cryptohmacsha256`, `jwtdecodeverify`).

```bash
# Run a specific builtin's tests
cd build && ./tests/rego_test -wf opa/v1/test/cases/testdata/v1/

# List available test directories
ls build/opa/v1/test/cases/testdata/v1/ | grep 

# Run all OPA tests (slow)
ctest -R rego_test_opa
```

### Custom Test Cases

Add YAML test cases to `tests/regocpp.yaml` or `tests/bugs.yaml`:

```yaml
- note: mybuiltin/basic
  query: data.test.p = x
  modules:
    - |
      package test
      p := crypto.sha256("hello")
  want_result:
    - x: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

### Error Message Matching

Error messages must match OPA exactly, because conformance tests compare strings literally. When implementing error handling, check OPA's actual error output for the builtin.

## Reference Plans

- [Crypto & JWT Implementation Plan](./references/crypto-jwt-plan.md): phased plan for implementing `crypto.*` and `io.jwt.*` builtins with a shared OpenSSL core