--- name: ggsql description: Write ggsql queries — a grammar of graphics for SQL. Use when the user wants to create, modify, or understand a ggsql visualization query. allowed-tools: Bash(ggsql:*) argument-hint: "[description of desired visualization]" metadata: author: George Stagg (@georgestagg) version: "1.0" license: MIT --- # ggsql Query Writer ggsql is a SQL extension for declarative data visualization based on Grammar of Graphics principles. It lets users combine SQL data queries with visualization specifications in a single, composable syntax. When the user describes a visualization they want, write a valid ggsql query. Use ONLY syntax documented below. NEVER invent clauses, settings, aesthetics, or layer types. ## Query structure A ggsql query has two parts: 1. **SQL part** (optional): Standard SQL executed on the backend. Any tables, CTEs, or SELECT results are available to the visualization. 2. **VISUALISE part** (required): Begins with `VISUALISE` (or `VISUALIZE`). Everything after this is the visualization query. There are two patterns for combining SQL with VISUALISE: ### Pattern A: SELECT → VISUALISE The last SQL statement is a SELECT. Data flows from its result set into VISUALISE, which has no `FROM` clause. ```ggsql SELECT name, score_a, score_b FROM 'dataset.csv' WHERE value > 50 VISUALISE score_a AS x, score_b AS y [DRAW / PLACE / SCALE / FACET / PROJECT / LABEL clauses] ``` Works with any SQL that ends in a SELECT: bare SELECT, WITH...SELECT, UNION/INTERSECT/EXCEPT. ### Pattern B: VISUALISE FROM VISUALISE provides its own data source via `FROM`. Use when referencing a table, file, CTE, or built-in dataset directly without a trailing SELECT. ```ggsql VISUALISE score_a AS x, score_b AS y FROM 'dataset.csv' DRAW point ``` ```ggsql WITH summary AS (SELECT category, COUNT(*) AS n FROM 'dataset.csv' GROUP BY category) VISUALISE category AS x, n AS y FROM summary DRAW bar ``` ## Data sources Data sources can appear in `VISUALISE ... FROM` or `DRAW ... MAPPING ... FROM`: - **Table/CTE name** (unquoted): `FROM sales`, `FROM my_cte` - **File path** (single-quoted string): `FROM 'data.parquet'`, `FROM 'data.csv'` - **Built-in datasets**: `FROM ggsql:penguins`, `FROM ggsql:airquality` ## VISUALISE clause Marks the start of the visualization. Optionally defines global mappings inherited by all layers. ``` VISUALISE , ... FROM ``` ### Mapping forms - **Explicit**: `column AS aesthetic` — e.g. `revenue AS y` - **Implicit**: `column` — column name must match aesthetic name, e.g. `x` maps to `x` - **Wildcard**: `*` — all columns with names matching aesthetics are mapped - **Constants**: `'red' AS fill`, `42 AS size` — literal values mapped to aesthetic ```ggsql VISUALISE bill_len AS x, bill_dep AS y, species AS fill FROM ggsql:penguins VISUALISE * FROM my_table VISUALISE FROM ggsql:penguins ``` ## DRAW clause Defines a layer. Multiple DRAW clauses stack layers (first = bottom, last = top). ``` DRAW MAPPING , ... FROM REMAPPING AS , ... SETTING => , ... FILTER PARTITION BY , ... ORDER BY , ... ``` All subclauses are optional if VISUALISE provides global mappings and data. ### MAPPING Same syntax as VISUALISE mappings. Layer mappings merge with global mappings (layer takes precedence). Can include `FROM` for layer-specific data. - Use `null` to prevent inheriting a global mapping: `MAPPING null AS color` ### REMAPPING For statistical layers (histogram, density, boxplot, violin, smooth, bar without y). Maps calculated statistics to aesthetics. Each layer documents its available stats and default remapping. ```ggsql DRAW histogram MAPPING body_mass AS x REMAPPING density AS y -- use density instead of default count ``` ### SETTING Set literal aesthetic values or layer parameters. Aesthetics set here bypass scales. ```ggsql DRAW point SETTING size => 5, opacity => 0.7, stroke => 'red' ``` **Position adjustment** is a special setting: ```ggsql SETTING position => 'identity' -- no adjustment (default for most) SETTING position => 'stack' -- stack (default for bar, histogram, area) SETTING position => 'dodge' -- side by side (default for boxplot, violin) SETTING position => 'jitter' -- random offset ``` ### FILTER SQL WHERE condition applied to layer data. Content is passed to the database: ```ggsql DRAW point FILTER sex = 'female' AND body_mass > 4000 ``` ### PARTITION BY Additional grouping columns beyond mapped discrete aesthetics: ```ggsql DRAW line MAPPING Day AS x, Temp AS y PARTITION BY Month ``` ### ORDER BY Controls record order (important for path layers): ```ggsql DRAW path ORDER BY timestamp ``` ## PLACE clause Creates annotation layers with literal values only (no data mappings). Supports tuples for multiple annotations. ``` PLACE SETTING => , ... ``` ```ggsql PLACE point SETTING x => 5, y => 10, color => 'red' PLACE rule SETTING y => 70, linetype => 'dotted' PLACE text SETTING x => (34, 44), y => (66, 49), label => ('Mean = 34', 'Mean = 44') ``` ## SCALE clause Controls how data values are translated to aesthetic values. Sensible defaults are always provided. ``` SCALE FROM TO VIA SETTING => , ... RENAMING =>