--- name: e2e-tests-studio description: > REQUIRED when modifying any file in packages/playground-ui or packages/playground. Triggers on: React component creation/modification/refactoring, UI changes, new playground features, bug fixes affecting studio UI. Generates Playwright E2E tests that validate PRODUCT BEHAVIOR, not just UI states. model: claude-opus-4-5 --- # E2E Behavior Validation for Frontend Modifications ## Core Principle: Test Product Behavior, Not UI States **CRITICAL**: Tests must verify that product features WORK correctly, not just that UI elements render. ### What NOT to test (UI States): - ❌ "Dropdown opens when clicked" - ❌ "Modal appears after button click" - ❌ "Loading spinner shows during request" - ❌ "Form fields are visible" - ❌ "Sidebar collapses" ### What TO test (Product Behavior): - ✅ "Selecting an LLM provider configures the agent to use that provider" - ✅ "Creating a new agent persists it and shows in the agents list" - ✅ "Running a tool with parameters returns the expected output" - ✅ "Chat messages stream correctly and maintain conversation context" - ✅ "Workflow execution triggers tools in the correct order" ## Prerequisites Requires Playwright MCP server. If the `browser_navigate` tool is unavailable, instruct the user to add it: ```sh claude mcp add playwright -- npx @playwright/mcp@latest ``` ## Step 1: Understand the Feature Intent Before writing ANY test, answer these questions: 1. **What user problem does this feature solve?** 2. **What is the expected outcome when the feature works correctly?** 3. **What data flows through the system?** (user input → API → state → UI) 4. **What should persist after page reload?** 5. **What downstream effects should this action have?** Document these answers as comments in your test file. ## Step 2: Build and Start ```sh pnpm build:cli cd packages/playground/e2e/kitchen-sink && pnpm dev ``` Verify server at http://localhost:4111 ## Step 3: Map Feature to Behavior Tests ### Feature-to-Test Mapping Guide | Feature Category | What to Test | Example Assertion | | -------------------------- | ------------------------------------------------- | ------------------------------------------------------------ | | **Agent Configuration** | Config changes affect agent behavior | Send message → verify response uses selected model | | **LLM Provider Selection** | Selected provider is used in requests | Intercept API call → verify provider in request payload | | **Tool Execution** | Tool runs with correct params & returns result | Execute tool → verify output matches expected transformation | | **Workflow Execution** | Steps execute in order, data flows between steps | Run workflow → verify each step's output feeds next step | | **Chat/Streaming** | Messages persist, context maintained across turns | Multi-turn conversation → verify context awareness | | **MCP Server Tools** | Server tools are callable and return data | Call MCP tool → verify response structure and content | | **Memory/Persistence** | Data survives page reload | Create item → reload → verify item exists | | **Error Handling** | Errors surface correctly to user | Trigger error condition → verify error message + recovery | ## Step 4: Write Behavior-Focused Tests ### Test Structure Template ```ts import { test, expect, Page } from '@playwright/test'; import { resetStorage } from '../__utils__/reset-storage'; import { selectFixture } from '../__utils__/select-fixture'; import { nanoid } from 'nanoid'; /** * FEATURE: [Name of feature] * USER STORY: As a user, I want to [action] so that [outcome] * BEHAVIOR UNDER TEST: [Specific behavior being validated] */ test.describe('[Feature Name] - Behavior Tests', () => { let page: Page; test.beforeEach(async ({ browser }) => { const context = await browser.newContext(); page = await context.newPage(); }); test.afterEach(async () => { await resetStorage(page); }); test('should [verb describing behavior] when [trigger condition]', async () => { // ARRANGE: Set up preconditions // - Navigate to the feature // - Configure any required state // ACT: Perform the user action that triggers the behavior // ASSERT: Verify the OUTCOME, not the UI state // - Check data persistence // - Verify downstream effects // - Confirm API calls made correctly }); }); ``` ### Behavior Test Patterns #### Pattern 1: Configuration Affects Behavior ```ts test('selecting LLM provider should use that provider for agent responses', async () => { // ARRANGE await page.goto('/agents/my-agent/chat'); // Intercept API to verify provider let capturedProvider: string | null = null; await page.route('**/api/chat', route => { const body = JSON.parse(route.request().postData() || '{}'); capturedProvider = body.provider; route.continue(); }); // ACT: Select a different provider await page.getByTestId('provider-selector').click(); await page.getByRole('option', { name: 'OpenAI' }).click(); // Send a message to trigger the agent await page.getByTestId('chat-input').fill('Hello'); await page.getByTestId('send-button').click(); // ASSERT: Verify the selected provider was used await expect.poll(() => capturedProvider).toBe('openai'); }); ``` #### Pattern 2: Data Persistence ```ts test('created agent should persist after page reload', async () => { // ARRANGE await page.goto('/agents'); const agentName = `Test Agent ${nanoid()}`; // ACT: Create new agent await page.getByTestId('create-agent-button').click(); await page.getByTestId('agent-name-input').fill(agentName); await page.getByTestId('save-agent-button').click(); // Wait for creation to complete await expect(page.getByText(agentName)).toBeVisible(); // ASSERT: Verify persistence await page.reload(); await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 }); }); ``` #### Pattern 3: Tool Execution Produces Correct Output ```ts test('weather tool should return formatted weather data', async () => { // ARRANGE await selectFixture(page, 'weather-success'); await page.goto('/tools/weather-tool'); // ACT: Execute tool with parameters await page.getByTestId('param-city').fill('San Francisco'); await page.getByTestId('execute-tool-button').click(); // ASSERT: Verify OUTPUT content, not just that output appears const output = page.getByTestId('tool-output'); await expect(output).toContainText('temperature'); await expect(output).toContainText('San Francisco'); // Verify structured data if applicable const outputText = await output.textContent(); const outputData = JSON.parse(outputText || '{}'); expect(outputData).toHaveProperty('temperature'); expect(outputData).toHaveProperty('conditions'); }); ``` #### Pattern 4: Workflow Step Chaining ```ts test('workflow should pass data between steps correctly', async () => { // ARRANGE await selectFixture(page, 'workflow-multi-step'); const sessionId = nanoid(); await page.goto(`/workflows/data-pipeline?session=${sessionId}`); // ACT: Trigger workflow execution await page.getByTestId('workflow-input').fill('test input data'); await page.getByTestId('run-workflow-button').click(); // ASSERT: Verify each step received correct input from previous step // Wait for completion await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 }); // Check step outputs show data transformation chain const step1Output = await page.getByTestId('step-1-output').textContent(); const step2Output = await page.getByTestId('step-2-output').textContent(); // Verify step 2 received step 1's output as input expect(step2Output).toContain(step1Output); }); ``` #### Pattern 5: Streaming Chat with Context ```ts test('chat should maintain conversation context across messages', async () => { // ARRANGE await selectFixture(page, 'contextual-chat'); const chatId = nanoid(); await page.goto(`/agents/assistant/chat/${chatId}`); // ACT: Multi-turn conversation await page.getByTestId('chat-input').fill('My name is Alice'); await page.getByTestId('send-button').click(); await expect(page.getByTestId('assistant-message').last()).toBeVisible({ timeout: 20000 }); await page.getByTestId('chat-input').fill('What is my name?'); await page.getByTestId('send-button').click(); // ASSERT: Verify context was maintained const response = page.getByTestId('assistant-message').last(); await expect(response).toContainText('Alice', { timeout: 20000 }); }); ``` #### Pattern 6: Error Recovery ```ts test('should show actionable error and allow retry when API fails', async () => { // ARRANGE: Set up failure fixture await selectFixture(page, 'api-failure'); await page.goto('/tools/flaky-tool'); // ACT: Trigger the error await page.getByTestId('execute-tool-button').click(); // ASSERT: Error is shown with recovery option await expect(page.getByTestId('error-message')).toContainText('failed'); await expect(page.getByTestId('retry-button')).toBeVisible(); // Switch to success fixture and retry await selectFixture(page, 'api-success'); await page.getByTestId('retry-button').click(); // Verify recovery worked await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 }); await expect(page.getByTestId('error-message')).not.toBeVisible(); }); ``` ## Step 5: Update Existing Tests When a test file already exists: 1. **Read the existing tests** to understand current coverage 2. **Identify if tests are UI-focused or behavior-focused** 3. **Refactor UI-focused tests** to verify behavior instead: ### Refactoring Example **BEFORE (UI-focused):** ```ts test('dropdown opens when clicked', async () => { await page.getByTestId('model-dropdown').click(); await expect(page.getByRole('listbox')).toBeVisible(); }); ``` **AFTER (Behavior-focused):** ```ts test('selecting model from dropdown updates agent configuration', async () => { // Open dropdown and select model await page.getByTestId('model-dropdown').click(); await page.getByRole('option', { name: 'GPT-4' }).click(); // Verify the selection persists and affects behavior await page.reload(); await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4'); // Optionally: verify the model is used in actual requests // (via request interception or checking response metadata) }); ``` ## Step 6: Kitchen-Sink Fixtures for Behavior Testing Fixtures should represent **realistic scenarios**, not just mock data: ### Fixture Naming Convention ``` -.fixture.ts Examples: - agent-with-tools.fixture.ts - chat-multi-turn-context.fixture.ts - workflow-parallel-execution.fixture.ts - tool-validation-error.fixture.ts - mcp-server-timeout.fixture.ts ``` ### Fixture Content Requirements Each fixture must define: 1. **Scenario description** (what behavior it enables testing) 2. **Expected outcomes** (what assertions should pass) 3. **Edge cases covered** (error states, empty states, etc.) ```ts // fixtures/agent-provider-switch.fixture.ts export const agentProviderSwitch = { name: 'agent-provider-switch', description: 'Tests that switching LLM providers changes agent behavior', // Mock responses for different providers responses: { openai: { content: 'Response from OpenAI', model: 'gpt-4' }, anthropic: { content: 'Response from Anthropic', model: 'claude-3' }, }, expectedBehavior: { // When provider is switched, subsequent messages use new provider providerSwitchAffectsNextMessage: true, // Provider selection persists across page reload providerPersistsOnReload: true, }, }; ``` ## Step 7: Run and Validate ```sh cd packages/playground && pnpm test:e2e ``` ### Test Quality Checklist Before considering tests complete, verify: - [ ] Each test has a clear user story comment - [ ] Tests verify OUTCOMES, not intermediate UI states - [ ] Tests would FAIL if the feature broke (not just if UI changed) - [ ] Persistence is verified via `page.reload()` where applicable - [ ] Error scenarios are covered - [ ] Tests use appropriate timeouts for async operations - [ ] Fixtures represent realistic usage scenarios ## Quick Reference | Step | Command/Action | | --------- | ----------------------------------------------------- | | Build | `pnpm build:cli` | | Start | `cd packages/playground/e2e/kitchen-sink && pnpm dev` | | App URL | http://localhost:4111 | | Routes | `@packages/playground/src/App.tsx` | | Run tests | `cd packages/playground && pnpm test:e2e` | | Test dir | `packages/playground/e2e/tests/` | | Fixtures | `packages/playground/e2e/kitchen-sink/fixtures/` | ## Anti-Patterns to Avoid | ❌ Don't | ✅ Do Instead | | ---------------------------------- | ------------------------------------------------------------ | | Test that modal opens | Test that modal action completes and persists | | Test that button is clickable | Test that clicking button produces expected result | | Test loading spinner appears | Test that loaded data is correct | | Test form validation message shows | Test that invalid form cannot submit AND valid form succeeds | | Test dropdown has options | Test that selecting option changes system behavior | | Test sidebar navigation works | Test that navigated page has correct data/functionality | | Assert element is visible | Assert element contains expected data/state |