--- name: wren-sql description: Write and correct SQL queries targeting Wren Engine — covers MDL query rules, filter strategies, data types (ARRAY, STRUCT, JSON/VARIANT), date/time functions, Calculated Fields, BigQuery dialect quirks, and error diagnosis. Use when generating or debugging SQL for any Wren Engine data source. license: Apache-2.0 metadata: author: wren-engine version: "1.0" --- # Wren SQL ## Version check Silently fetch `https://raw.githubusercontent.com/Canner/wren-engine/main/skills/versions.json`. Compare the `wren-sql` key with this skill's version (from the frontmatter above). If the remote version is newer, notify the user before proceeding: > A newer version of the **wren-sql** skill is available. > Update with: > ``` > npx skills add Canner/wren-engine --skill wren-sql --agent claude-code > ``` Then continue with the workflow below regardless of update status. --- Wren Engine translates SQL through a semantic layer (MDL — Model Definition Language) before executing it against a backend database. SQL must target MDL model names, not raw database tables. For specific topics, load the relevant reference file: | Topic | Reference | |-------|-----------| | SQL error diagnosis and correction | [references/correction.md](references/correction.md) | | Date/time functions and intervals | [references/datetime.md](references/datetime.md) | | ARRAY, STRUCT, JSON/VARIANT types | [references/types.md](references/types.md) | | BigQuery dialect quirks | [references/bigquery.md](references/bigquery.md) | --- ## Context - You are querying a **semantic layer**, not a database directly. - Only use model/view/column names defined in the MDL — never raw database table references. - Wren Engine uses a generic SQL dialect similar to ANSI SQL (DataFusion/Postgres/DuckDB), but with differences. - Check the `dataSource` field to identify the backend and apply dialect-specific rules if needed. --- ## Core SQL Rules - Only `SELECT` statements. No `DELETE`, `UPDATE`, `INSERT`. - Only use tables and columns from the MDL schema. - Do not include comments in generated SQL. - Prefer CTEs over subqueries. - Identifiers are **case-sensitive**. Quote identifiers containing unicode, special characters (except `_`), or starting with a digit using double quotes. - Examples: `"客户"."姓名"`, `"table-name"."col"`, `"123column"` - Identifier quotes: `"` (double quotes). String literal quotes: `'` (single quotes). - For specific date queries, use a range: ```sql WHERE ts >= CAST('2024-11-01 00:00:00' AS TIMESTAMP WITH TIME ZONE) AND ts < CAST('2024-11-02 00:00:00' AS TIMESTAMP WITH TIME ZONE) ``` - For ranking, use `DENSE_RANK()` + `WHERE`. Include the ranking column in `SELECT`. - Avoid correlated subqueries — use JOINs instead. - Use `SAFE_CAST` when casting might fail: `SAFE_CAST(col AS INT)` --- ## Filter Strategies | Column type | Strategy | |-------------|----------| | Text | `LIKE '%value%'` for partial match | | Numeric | `BETWEEN 30 AND 40` | | Date/Timestamp | `>= '2024-01-01' AND < '2024-02-01'` | | Exact value | `=` or `IN (...)` | | Primary key / indexed | Prefer equality (`=`) | --- ## Supported Cast Types `bool`, `boolean`, `int`, `integer`, `bigint`, `smallint`, `tinyint`, `float`, `double`, `real`, `decimal`, `numeric`, `varchar`, `char`, `string`, `text`, `date`, `time`, `timestamp`, `timestamp with time zone`, `bytea` Example: `CAST(col AS INT)`, `TIMESTAMP '2024-11-09 00:00:00'` --- ## Aggregation - All non-aggregated `SELECT` columns must appear in `GROUP BY` (window functions excepted). - Aggregate conditions go in `HAVING`, not `WHERE`. - Prefer ordinal `GROUP BY` for long column names: ```sql SELECT very_long_column_name AS alias, COUNT(*) FROM t GROUP BY 1 ``` --- ## Sorting and Limiting - `ORDER BY` for sort; `LIMIT` to restrict rows. - When `ORDER BY` appears in a subquery or CTE, always include `LIMIT`. --- ## Subquery Patterns - Prefer CTEs (`WITH` clause) over nested subqueries. - Subquery in `SELECT` must return a single value per row. - Subquery in `WHERE`: use `IN`, `EXISTS`, or comparison operators. - `IN SUBQUERY` in `JOIN` conditions is not supported — use `JOIN ... ON` instead. - `RECURSIVE` CTEs are not supported. --- ## Calculated Fields Columns marked as **Calculated Field** in the MDL have pre-defined computation logic. Use them directly instead of re-implementing the calculation. Read the column comment (e.g., `column expression: avg(reviews.Score)`) to understand what the field represents. ```sql -- Schema has: Rating DOUBLE (Calculated Field: avg(reviews.Score)) -- ReviewCount BIGINT (Calculated Field: count(reviews.Id)) -- Correct — use Calculated Fields directly: SELECT AVG(Rating) FROM orders WHERE ReviewCount > 10 -- Incorrect — do not re-join and re-aggregate manually ```