--- name: query description: Use when the user wants to query, analyze, or explore data through the Honeydew semantic layer. Covers structured queries and multi-step deep analysis. --- ## Prerequisites Queries run against the workspace and branch set for the current session. Use `get_session_workspace_and_branch` to check the current context. If no workspace/branch is set, use `list_workspaces`, `list_workspace_branches`, and `set_session_workspace_and_branch` to select one. See the `model-exploration` skill for the full workspace/branch tool reference. --- ## Overview Honeydew provides three ways to query data through the semantic layer. Each method suits a different situation — pick the right one based on how well you understand the model and how complex the question is. | Method | Tool | Best For | | -------------------- | ---------------------------------------------- | -------------------------------------------------------------------------------- | | **Structured query** | `get_data_from_fields` / `get_sql_from_fields` | You know the exact fields. Deterministic, full control. | | **Deep analysis** | `ask_deep_analysis_question` | Any natural language question — simple or complex, "why", multi-step, agentic. | --- ## When to Use Each Method ### 1. Structured Query (`get_data_from_fields` / `get_sql_from_fields`) **Use when:** - You know the entity, attribute, and metric names (or can discover them via `list_entities` / `get_entity`) - You need precise control over filters, ordering, and field selection - You want deterministic, reproducible results - You need to validate a newly created metric or attribute - The user specifies exact fields like "show me `detailed_listings.price` by `detailed_listings.room_type`" **Do NOT use when:** - The question requires multi-step reasoning or investigation **How it works:** - `get_data_from_fields` — executes the query and returns data rows - `get_sql_from_fields` — returns the generated SQL without executing (useful for review, debugging, or handing off to other tools) Both take the same field parameters. ### 2. Deep Analysis (`ask_deep_analysis_question`) **Use when:** - The user asks a question in plain English and you don't know the exact field names - The user wants a quick answer without worrying about model details - The question requires multiple steps or investigative reasoning - The user asks "why" something happened (e.g., "why did revenue drop in Q3?") - The user wants trend analysis, anomaly detection, or root cause investigation - The question is open-ended and may require looking at the data from multiple angles - Follow-up questions build on prior analysis (use `conversation_id`) --- ## Decision Flow ``` User asks a data question │ ├─► Do you know the exact field names? │ │ │ ├─► YES → get_data_from_fields (structured, deterministic) │ │ (or get_sql_from_fields to preview SQL without executing) │ │ │ └─► NO → ask_deep_analysis_question (plain English, any complexity) │ └─► Plain English question / investigation / "why" / trends? └─► ask_deep_analysis_question ``` --- ## Method 1: Structured Query ### Field Parameters A structured query uses flat field parameters to define what data to retrieve: - **`attributes`** — dimensions to group by (columns in the output), e.g. `["entity.attribute_name"]` - **`metrics`** — aggregated measures (SUM, COUNT, AVG, etc.), e.g. `["entity.metric_name"]` - **`filters`** — row-level filters applied before aggregation, e.g. `["entity.field = 'value'"]` - **`order_by`** — sort order for results. **Each entry MUST be a quoted string**, as if it were a SQL identifier — e.g. `["\"entity.field\" ASC"]`. Always wrap the field reference in double quotes inside the string. - **`domain`** — optional domain name for query context - **`limit`** — max rows to return (default: 100) - **`offset`** — rows to skip (for pagination) All fields use `entity.field_name` syntax. Cross-entity fields are supported when relations exist. ### Discovering Fields Before building a query, discover the available fields: 1. `list_entities` — see all entities 2. `get_entity` with entity name — see its attributes, metrics, and relations 3. `get_field` with entity and field name — get detailed info about a specific field 4. `list_domains` — see all available domains (useful before passing `domain` parameter) 5. `search_model` with a keyword and `search_mode` (`OR` for broad discovery, `EXACT` for known names) — find fields across the model ### Examples **Simple metric query — total count:** Call `get_data_from_fields` with: - `metrics`: `["detailed_listings.count"]` **Dimension breakdown — listings by room type:** Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.room_type"]` - `metrics`: `["detailed_listings.count"]` - `order_by`: `["\"detailed_listings.count\" DESC"]` **Filtered query — only entire homes:** Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.neighbourhood_cleansed"]` - `metrics`: `["detailed_listings.count"]` - `filters`: `["detailed_listings.room_type = 'Entire home/apt'"]` - `order_by`: `["\"detailed_listings.count\" DESC"]` **Cross-entity query — listings with host info:** Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.room_type", "dim_host.host_is_superhost"]` - `metrics`: `["detailed_listings.count"]` - `order_by`: `["\"detailed_listings.count\" DESC"]` **Using aliases — rename fields or ad-hoc expressions:** You can alias any field or ad-hoc expression using `AS "alias_name"`. This controls the column name in the output. Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.room_type"]` - `metrics`: `["detailed_listings.count AS \"total_listings\"", "AVG(detailed_listings.price) AS \"avg_price\""]` - `order_by`: `["\"total_listings\" DESC"]` Once aliased, use the alias (not the original expression) in `order_by`. **Pagination — large result sets:** Call `get_data_from_fields` with: - `attributes`: (your fields) - `metrics`: (your metrics) - `limit`: 50 (max rows to return) - `offset`: 100 (skip first 100 rows) **Finding duplicate values:** Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.host_name"]` - `metrics`: `["COUNT(detailed_listings.host_name)"]` - `filters`: `["COUNT(detailed_listings.host_name) > 1"]` - `order_by`: `["\"COUNT(detailed_listings.host_name)\" DESC"]` This groups by the attribute, counts occurrences, and filters to only rows that appear more than once — surfacing duplicates. **SQL preview only:** Call `get_sql_from_fields` with the same field parameters to see the generated SQL without executing. ### Filter Syntax Filters use standard comparison expressions: `=`, `>`, `<`, `IN (...)`, `ILIKE`, `SEARCH(...)`, `IS NULL`, booleans, date ranges, and `AND`/`OR` combinations. > For the complete filter expression reference — including SEARCH, date handling, and type casting — see the **filtering** skill. --- ## Method 2: Deep Analysis ### ask_deep_analysis_question Call with: - `question` (required): the analysis question - `agent` (optional): agent name to use as analysis context — use `list_agents` to discover available agents and their associated domains - `conversation_id` (optional): ID from a previous deep analysis call, for follow-up questions ``` question: "Analyze the relationship between host response time and review scores. Are there significant patterns?" agent: "my_agent" ``` Returns: - Markdown analysis report with findings - Supporting data - Suggested follow-up questions - `conversation_id` for continuing the conversation ### After Deep Analysis: Display the UI Link After a successful `ask_deep_analysis_question` call, the response includes a `ui_url` field. **Always display this URL to the user** so they can view the full analysis in the Honeydew application. ### Follow-up Questions Use `conversation_id` from the previous response to ask follow-up questions that build on the prior analysis: ``` question: "Now break this down by room type — does the pattern hold across all types?" agent: "my_agent" conversation_id: "" ``` ### Example Questions **Simple natural language:** - "What are the top 10 neighbourhoods by number of listings?" - "Show me average price by room type." **Complex analysis:** - "Why did the average review score drop for listings in Brooklyn?" - "What factors most influence listing price? Analyze the key drivers." - "Compare superhost vs non-superhost performance across all metrics." - "Identify unusual patterns in listing availability over the past year." - "What are the characteristics of top-performing listings?" --- ## Combining Methods For complex tasks, combine methods in sequence: 1. **Discover** — Use `list_entities` / `get_entity` to understand the model 2. **Query** — Use `get_data_from_fields` for precise, targeted queries 3. **Investigate** — Use `ask_deep_analysis_question` for root cause or trend analysis ### Example Workflow User: "Help me understand pricing patterns for Airbnb listings." 1. Discover entities: `list_entities` → find `detailed_listings` 2. Explore fields: `get_entity` for `detailed_listings` → find `price`, `room_type`, `neighbourhood_cleansed` 3. Targeted query: `get_data_from_fields` → price distribution by neighbourhood for Entire homes only 4. Deep dive: `ask_deep_analysis_question` → "What factors most influence listing price? Analyze correlations with room type, location, amenities, and reviews." --- ## Documentation Lookup Use the `honeydew-docs` MCP tools to search the Honeydew documentation when: - The user asks about query capabilities or features not covered in this skill - You need to understand how the query API interacts with domains, parameters, or governance rules - The user encounters unexpected query behavior and needs deeper context on how the semantic layer resolves queries Search for topics like: "queries", "perspectives", "dynamic datasets", "parameters", "query API". --- ## Tip: Getting Distinct Values for a Field To retrieve the distinct (unique) values of a field, include it in `attributes` and add a count metric in `metrics`. Use the entity's built-in count metric (e.g., `entity.count`) if available, or an ad-hoc count metric using `COUNT(entity.field)` on the field whose distinct values you want — never use `COUNT(*)`. The metric forces aggregation, which groups by the attribute and returns one row per distinct value. **Example — distinct room types:** Call `get_data_from_fields` with: - `attributes`: `["detailed_listings.room_type"]` - `metrics`: `["COUNT(detailed_listings.room_type)"]` - `order_by`: `["\"COUNT(detailed_listings.room_type)\" DESC"]` This returns each unique `room_type` along with its count, ordered by frequency. The count is a useful bonus — it tells you how common each value is — but the key point is that the query returns **one row per distinct value**. This pattern is useful for: - **Exploring filter values** — find out what values exist before writing a filter expression (see the **filtering** skill) - **Validating a new attribute** — after creating a calculated attribute, check its distinct output values to confirm the logic is correct (see the **attribute-creation** skill) - **Understanding data distribution** — see how data is spread across categories --- ## Best Practices - **Start with discovery** — always check `list_entities` / `get_entity` before building queries, so you reference real fields - **Use structured queries for precision** — when you know the fields, `get_data_from_fields` gives you full control and reproducible results - **Use deep analysis for insight** — when the question is about "why" or requires investigating multiple dimensions - **Paginate large results** — use `limit` and `offset` in `get_data_from_fields` to avoid overwhelming output - **Show SQL when debugging** — use `get_sql_from_fields` to inspect the generated query - **Reference fields correctly** — always use `entity.field_name` syntax in field parameters