---
name: e2e-test-review
description: Review Cypress E2E spec files for Metabase conventions, common gotchas, and flakiness/performance issues. Use when reviewing pull requests or diffs containing Cypress spec files in e2e/test/scenarios/.
allowed-tools: Read, Grep, Bash, Glob
---
# E2E Test Review Skill
@./../_shared/cypress-conventions.md
## Review mode detection
Before starting, determine which mode to use:
1. **PR review mode** — if `mcp__github__create_pending_pull_request_review` is available, post issues as one cohesive pending review.
2. **Local review mode** — if not, output a numbered list in the conversation.
## Review process
1. Detect mode.
2. Read the changed spec end-to-end first to understand intent. Don't review line-by-line cold.
3. **(Conditional)** If after reading the spec a specific assertion or selector is **ambiguous** — you can't tell if it's the right anchor, or whether the test actually verifies what its title claims — briefly grep the related component for the test ID / role / text. **Don't read components by default.** Only do this when there's a real signal of confusion in the spec.
4. Scan against the checklist + pattern table below.
5. Number all issues sequentially. Skip nits — only flag what's worth fixing.
### When to hand off instead of review
If the spec references an issue (`metabase#NNNNN`) and the user wants to **fix** flakiness or assess whether the test still reproduces the original bug, that's outside the review skill's scope. Point them at `/fix-flakey-test` (or the dedicated flake-fixing workflow), which knows to fetch the issue and the resolving PR's diff. The review skill stays focused on "is this test well-written and conformant" — it doesn't fetch external issue context by default.
## Review checklist
The checklist mirrors the order of the conventions file. Items marked **(lint)** are also caught by ESLint — flag only when you see them slip through (helper-wrapped, lint-disabled, etc.).
### File and naming
- [ ] Spec lives in `e2e/test/scenarios//`
- [ ] Extension is `.cy.spec.ts` (preferred) or `.cy.spec.js` — don't flag `.js` on existing files
- [ ] `describe` block names the area when relevant: `"area > sub-area > feature (#issue-number)"`
- [ ] No leftover `.only` or `.skip` (pre-commit hook should catch, but flag it if it slips through)
### Helpers and constants
- [ ] Helpers accessed via `const { H } = cy;` — no direct imports from `e2e/support/helpers` **(lint)**
- [ ] Sample DB schema imported from `cypress_sample_database`
- [ ] Instance data IDs imported from `cypress_sample_instance_data`
- [ ] **No hardcoded numeric IDs anywhere** — including for entities the test creates itself. Capture from the create response or alias the intercept.
- [ ] Existing navigation helpers used (`H.openOrdersTable`, `H.visitDashboard(id)`, etc.) instead of raw `cy.visit()` chains
### Selectors
- [ ] Prefers a11y queries (`findByRole`, `findByLabelText`) over `findByText`
- [ ] `findByText` only when an a11y query doesn't fit
- [ ] `findByTestId` for `data-testid` attributes (never raw `cy.get("[data-testid='...']")`)
- [ ] No CSS class names — especially generated ones from styled-components/Mantine (`.css-1abc2d`)
- [ ] No ad-hoc CSS attribute selectors in specs or new helpers (the visualization helpers in `e2e/support/helpers/e2e-visual-tests-helpers.js` are the **only** intentional exception — see "What NOT to flag")
- [ ] No XPath
- [ ] **Positional selectors** (`.eq()`, `.first()`, `.last()`, `:nth-child`) only when the order is the assertion itself, or guarded by a length assertion immediately before. **(lint catches `.last()` and `.eq( cy.findByText("Foo")`), **not** assigned to a `const`
- [ ] Resolved values accessed via `.then()` or aliased with `.as()` + `cy.get("@alias")`
- [ ] Aliases only used when there's distance between lookup and use (otherwise just chain)
### Assertions
- [ ] Assertions target user-visible state (text, URL, aria) — not DOM structure
- [ ] **Negative assertions are paired with a positive one.** A standalone `should("not.exist")` / `should("not.be.visible")` passes by accident if the page hasn't rendered yet. Anchor on a positive signal first.
- [ ] **Multiple text checks on the same parent are collapsed into a `.should("contain", ...).and("contain", ...).and("not.contain", ...)` chain** rather than three separate `findByText().should(...)` queries. Single retry budget, atomic against one DOM snapshot.
- [ ] `expect()` only used inside `cy.then` / `cy.wrap` callbacks
- [ ] `.should("not.exist")` vs `.should("not.be.visible")` used correctly for intent
- [ ] No JS conditionals on cy chains (`.then(el => if (...))`) — `.then()` runs once, `.should()` retries
### `cy.within` (always chained)
- [ ] **Every `cy.within(...)` is chained off a previous selector.** Standalone `cy.within(...)` has no scope and is wrong by construction — flag on sight.
- [ ] **No single-statement `within` callbacks** — if the callback has only one inner command, chain directly off the parent instead. Reserve `within` for two or more commands sharing the scope.
- [ ] **No named parameter on `within` callbacks** — `within((modal) => ...)` is redundant; the subject is inherited automatically by inner commands. If you actually need the jQuery subject, use `.then($el => ...)` instead.
- [ ] `.within()` callback is properly closed; assertions outside the callback don't accidentally rely on the within scope
### Logging and annotation
- [ ] Step annotations use `cy.log("...")`, not `// comments` (visible in command panel, screenshots, videos)
- [ ] `cy.log` lines aren't redundant paraphrases of the next command — they mark phases or non-obvious intent
### Performance
> **Look up actual CI duration before flagging slowness.** `e2e/support/timings.json` holds the latest CI run's per-spec wall time in ms (entries keyed as `../test/scenarios/...`). Read it instead of running the spec — running adds minutes; reading is instant. Compare the spec to its siblings in the same directory to ground the "this is expensive" claim. Per-`it()` granularity is not in this file.
- [ ] **Tiny tests** — multiple `it()` blocks for the same flow with the same `beforeEach` setup are a smell. Each `it()` costs 5–10s of Cypress runner overhead + the `beforeEach` cost; merge into a single flow when possible.
- [ ] **Tiny tests that should be unit tests** — a test that only asserts on element existence/visibility (no flow, no real backend interaction, no cross-screen traversal) belongs in a Jest + RTL unit test, not e2e. Common offenders: token-gated UI checks, "renders the help panel" tests.
- [ ] **Near-duplicate tests** — a new `it()` that shares 80–90% of its setup and steps with a sibling test should extend the existing test, not spawn a clone.
- [ ] **Multiple `cy.visit()` calls** — each is a cold app boot (multi-second). After the first visit, navigate via the UI (click a link, breadcrumb, sidebar item) instead of issuing a second `cy.visit()`.
- [ ] **Redundant with a cheaper-layer test** — when the spec references an issue (e.g. `metabase#12345`) in the test name or `describe`, grep the codebase for the **bare** `#NNNNN` (no `metabase` prefix — backend tests don't always include it). If a Jest spec or backend `_test.clj` already cites the same issue, the e2e test is already redundant; recommend deletion. Recipe:
```bash
rg "#12345" -g '*.spec.{ts,tsx,js,jsx}' -g '!*.cy.spec.*' -g '*_test.clj' -g '*_test.cljc'
```
The `-g '!*.cy.spec.*'` exclusion drops the e2e spec being reviewed (which will obviously match its own reference).
### Cypress framework anti-patterns
- [ ] No `forEach` over cy queries — use `cy.each()` if iteration is needed
- [ ] No mixing native Promises / `async`-`await` with cy chains
- [ ] No reliance on stale element references after re-render — re-query instead
- [ ] No `cy.contains("text")` (single-arg, top-level) — same scoping rule as `findByText`
## Pattern matching table
Quick scan for common issues. **(lint)** marks patterns ESLint already catches in the simple form — flag when you see them slip through.
| Pattern | Issue |
| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| `cy.wait(2000)` | Numeric wait — use intercept alias or `.should("be.visible")` |
| `cy.get(".css-1abc2d")` | Generated CSS class — use a11y query / `findByText` / `findByTestId` |
| `cy.get("[data-testid='foo']")` | Always use `cy.findByTestId('foo')` instead |
| `cy.get("path[fill='#abc']")` in a spec | Ad-hoc chart selector — use `e2e-visual-tests-helpers` or add a new helper there |
| `cy.get("li:nth-child(3)")` | Positional in a CSS selector — anchor on text or role |
| `.last()` / `.eq(-1)` without a prior length assertion | Risk of off-by-one when collection size shifts **(lint)** |
| `cy.visit("/dashboard/10")` | Hardcoded numeric ID — capture from create response or import from `cypress_sample_instance_data` |
| `import { restore } from "e2e/support/helpers"` | Direct helper import — use `const { H } = cy;` **(lint)** |
| `cy.findByText("Save")` at top of `it()` / `beforeEach` | Unscoped text selector — wrap in `cy.contains(selector, text)` or chain off a scoping query **(lint, BUT misses helper-wrapped)** |
| `function clickSave() { cy.findByText("Save").click() }` | Helper-wrapped unscoped text — lint blind spot, scan helpers manually |
| Standalone `cy.within(() => ...)` | `cy.within` must be chained off an existing selector — wrong by construction |
| `cy.intercept` placed AFTER the triggering call | Intercept must precede the action |
| `cy.get(...).then(el => { if (...) })` | JS conditional on cy chain — use `.should()` for retry semantics |
| `await cy.something()` | Cypress chains aren't real Promises |
| `els.forEach(el => cy....)` | Use `cy.each()` instead |
| `{ timeout: 30000 }` on a single command | Likely papering over a race — find the root cause |
| `const button = cy.findByRole(...)` | Assigning `cy.*` return value — `button` is a one-shot chainer, not a DOM element **(lint catches simple cases)** |
| `const foo = () => cy.findByText("Foo")` | This is **fine** — the function form re-enqueues the query on each call |
| `.should("not.exist")` as the only assertion in a step | Negative-only — passes by accident if page hasn't rendered. Anchor on a positive assertion first |
| `H.resetTestTable()` before `H.restore()` | `restore` must come first **(lint)** |
| Multiple `cy.visit()` in one `it()` block | Each is a cold boot — navigate via the UI between screens |
| `it.only(` / `describe.only(` | Pre-commit hook should block — flag as slip |
| New `it()` block with same setup + 80–90% same steps as sibling | Likely a near-duplicate — extend the existing test instead |
| `it("...", () => { cy.findBy*().should("be.visible") })` only | Static-UI-only test — strong signal it should be a Jest unit test, not e2e |
| `cy.log("Visit dashboard"); H.visitDashboard(id);` | Redundant log — paraphrases the next command |
| `// Visit dashboard` followed by `H.visitDashboard(id);` | Use `cy.log("...")` instead — visible in screenshots/videos |
## What NOT to flag
- Don't flag the `.cy.spec.js` extension on existing files — both `.js` and `.ts` are acceptable; `.ts` is preferred only for **new** specs.
- Don't flag CSS attribute selectors (`path[fill='...']`, `[stroke-dasharray='...']`, `text[stroke-width='3']`, etc.) inside `e2e/support/helpers/e2e-visual-tests-helpers.js`. ECharts renders SVG with no `data-testid` and minimal a11y, and that file is the intentional exception. **Do** flag the same patterns when they appear in spec files or in new helpers — those should funnel through the existing helpers or extend that file.
- Don't flag `.first()` / `.eq(N)` / `:nth-child(N)` when the order is genuinely the assertion (e.g. testing sort order) or when a `.should("have.length", n)` immediately precedes them.
- Don't flag `cy.findByText(...)` / `cy.contains(...)` inside a helper body **if** the helper is documented or clearly intended to be called from inside an outer `within(...)` scope at the call site (the inherited within scope makes it safe at runtime). When in doubt, ask the author.
- Don't flag `.only` if you're doing a local-dev review (the user is intentionally focused). At PR / commit-time review, do flag it.
- Don't post "looks good" or congratulatory comments. Only post issues.
- Don't post comments for formatting issues that the linter handles (Prettier/ESLint).
- Don't flag stylistic preferences that don't materially affect flakiness, speed, or correctness.
## Feedback format
Number every issue sequentially starting from `Issue 1`. Format: `**Issue N: [Brief title]**`.
### Local review mode
```markdown
## Issues
**Issue 1: [Brief title]**
File:Line — succinct description
Suggested fix
**Issue 2: [Brief title]**
...
```
### PR review mode
Use the pending review workflow:
1. `mcp__github__create_pending_pull_request_review` — start a draft review.
2. `mcp__github__get_pull_request_diff` — get file paths and line numbers.
3. Identify ALL issues, number them sequentially.
4. `mcp__github__add_pull_request_review_comment_to_pending_review` — post each issue as a separate comment, in **parallel** within a single response.
5. `mcp__github__submit_pending_pull_request_review` with event `"COMMENT"` (not `REQUEST_CHANGES`), no body.
Each comment body starts with `**Issue N: [Brief title]**`.
## Final check
1. Trim issues that won't materially help the author.
2. Verify sequential numbering with no gaps.
3. In PR mode, verify each issue was posted as a separate review comment.