--- name: backend-code-review description: Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `src/backend/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports pending-change review, code snippets review, and file-focused review. --- # Backend Code Review ## When to use this skill Use this skill whenever the user asks to **review, analyze, or improve** backend code (e.g., `.py`) under the `src/backend/` directory. Supports the following review modes: - **Pending-change review**: when the user asks to review current changes (inspect staged/working-tree files slated for commit to get the changes). - **Code snippets review**: when the user pastes code snippets (e.g., a function/class/module excerpt) into the chat and asks for a review. - **File-focused review**: when the user points to specific files and asks for a review of those files (one file or a small, explicit set of files, e.g., `src/backend/base/langflow/api/v1/flows.py`). Do NOT use this skill when: - The request is about frontend code or UI (e.g., `.tsx`, `.ts`, `.js`, `src/frontend/`). - The user is not asking for a review/analysis/improvement of backend code. - The scope is not under `src/backend/` (unless the user explicitly asks to review backend-related changes outside `src/backend/`). ## How to use this skill Follow these steps when using this skill: 1. **Identify the review mode** (pending-change vs snippet vs file-focused) based on the user's input. Keep the scope tight: review only what the user provided or explicitly referenced. 2. Follow the rules defined in **Checklist** to perform the review. If no Checklist rule matches, apply **General Review Rules** as a fallback to perform the best-effort review. 3. Compose the final output strictly following the **Required Output Format**. Notes when using this skill: - Always include actionable fixes or suggestions (including possible code snippets). - Use best-effort `File:Line` references when a file path and line numbers are available; otherwise, use the most specific identifier you can. ## Checklist - db schema design: if the review scope includes code/files under `src/backend/base/langflow/services/database/models/` or Alembic migrations under `src/backend/base/langflow/alembic/versions/`, follow [references/db-schema-rule.md](references/db-schema-rule.md) to perform the review - architecture: if the review scope involves route/service/model layering, dependency direction, or moving responsibilities across modules, follow [references/architecture-rule.md](references/architecture-rule.md) to perform the review - service abstraction: if the review scope contains table/model operations (e.g., `select(...)`, `session.execute(...)`, joins, CRUD) and is not already inside a service under `src/backend/base/langflow/services/`, follow [references/repositories-rule.md](references/repositories-rule.md) to perform the review - sqlalchemy patterns: if the review scope involves SQLAlchemy/SQLModel session/query usage, db transaction/crud usage, `session_scope()` usage, or raw SQL usage, follow [references/sqlalchemy-rule.md](references/sqlalchemy-rule.md) to perform the review ## General Review Rules ### 1. Security Review Check for: - SQL injection vulnerabilities (especially raw `text()` queries with string interpolation). Consequence: attacker can read/modify/delete any data in the database. - Server-Side Request Forgery (SSRF) in component HTTP calls. Consequence: attacker uses the server to scan internal networks or access cloud metadata endpoints. - Command injection (especially in subprocess or shell-executing components). Consequence: attacker gains shell access to the server. - Insecure deserialization (pickle, yaml.load without SafeLoader). Consequence: arbitrary code execution on the server. - Hardcoded secrets/credentials. Consequence: secrets leak via git history and are impossible to fully revoke. - Improper authentication/authorization (missing `CurrentActiveUser` dependency). Consequence: unauthenticated users can access protected endpoints. - Insecure direct object references (missing `user_id` scoping on queries). Consequence: user A can read/modify user B's flows, variables, API keys. - Path traversal in file storage operations. Consequence: attacker reads arbitrary server files (e.g., `/etc/passwd`, `.env`). ### 2. Performance Review Check for: - N+1 queries (especially in loops calling `session.execute()`). Consequence: 100 flows = 101 DB queries instead of 2; page load goes from 50ms to 5s. - Missing database indexes on frequently queried columns. Consequence: full table scans on large datasets; queries degrade from O(log n) to O(n). - Memory leaks (unbounded caches, retained references in long-lived services). Consequence: server OOM after hours of operation; pods restart in production. - Blocking operations in async code (`time.sleep()`, synchronous I/O, CPU-bound work without `run_in_executor`). Consequence: entire event loop stalls; all concurrent requests hang until the blocking call completes. - Missing caching opportunities for expensive computations. Consequence: repeated computation of the same result on every request. - Large result sets loaded entirely into memory without pagination. Consequence: memory spike + slow response when user has 10K+ flows. ### 3. Code Quality Review Check for: - Code forward compatibility with Python 3.10-3.13 - Code duplication (DRY violations — extract when the *exact same business rule* is duplicated in 3+ places) - Functions doing too much (SRP violations — if you need "and" to describe it, split it) - Deep nesting / complex conditionals (prefer early returns and guard clauses) - Magic numbers/strings (extract to named constants or enums) - Poor naming: unclear abbreviations, misleading names, generic names (`data`, `result`, `obj`, `temp`). Functions should use verbs (`get`, `create`, `validate`). Booleans should use prefixes (`is_`, `has_`, `can_`, `should_`). - Missing error handling (bare `except`, swallowed exceptions, silent failures) - Incomplete type coverage (use strong typing, avoid `Any` where a concrete type is known) - Use Python 3.10+ union syntax (`X | Y` not `Union[X, Y]`, `X | None` not `Optional[X]`) - Use `TYPE_CHECKING` guard for imports only needed for type annotations (prevents circular imports) - Use `Annotated[Type, Depends(...)]` with project aliases (`CurrentActiveUser`, `DbSession`, `DbSessionReadOnly`) for FastAPI DI - Google-style docstrings (enforced by Ruff): `Args:`, `Returns:`, `Raises:` sections for public functions - Violations of SOLID principles - YAGNI violations (code that anticipates future needs without a present requirement) - Line length exceeding 120 characters (project Ruff config) - Comments that explain WHAT instead of WHY (comments should only explain reasoning, not restate code) - Commented-out code (use version control instead) - Boolean parameters that switch function behavior (split into two named functions instead) - Mutable shared state where immutable alternatives exist (prefer returning new objects over mutation) ### 4. File Structure Review Check for: - Production files exceeding ~500 lines of code (excluding imports, types, and docstrings). Files above 600 lines are a red flag and should be split by responsibility. Why: Files above 500 lines have statistically higher defect rates and take longer to review. They signal multiple responsibilities (SRP violation). In Langflow, services like `DatabaseService` that grow beyond this limit should have their CRUD operations extracted to dedicated modules. - Test files exceeding ~1000 lines. Split by logical grouping if exceeded. - No more than 5 functions with different responsibilities in a single file (per AGENTS-example.md). - Each file has a single reason to exist and a single reason to change (SRP). - No generic file names: `utils.py`, `helpers.py`, `misc.py`, `common.py` as standalone files. Why: A file named `utils.py` becomes a dumping ground for unrelated functions. Within months it has 50+ functions covering formatting, validation, parsing, and HTTP calls — violating SRP. Each function group should be in a file named after its responsibility (`formatting.py`, `validation.py`). ### 5. Testing Review Check for: - Missing test coverage for new code paths - Tests that don't test behavior (testing implementation details) - Flaky test patterns (time-dependent, order-dependent, external-service-dependent) - Proper use of `pytest.mark.asyncio` for async tests - Excessive mocking (prefer real integrations per project conventions) - Coverage target: 80% (minimum acceptable: 75%) - Test anti-patterns: The Liar (passes but doesn't verify claimed behavior), The Mirror (asserts exactly what code does), The Giant (50+ lines setup), The Mockery (tests only mock setup), The Inspector (coupled to implementation), The Chain Gang (depends on execution order), The Flaky (inconsistent results) **Happy path tests are the foundation but are NOT enough.** Tests MUST also challenge the code to find real defects: - **Unexpected inputs**: `None`, `""`, `[]`, `{}`, `0`, `-1`, `UUID("00000000-0000-0000-0000-000000000000")` - **Boundary values**: max length strings, exactly at the limit, one past the limit, zero items, max items - **Malformed data**: missing required fields, extra unexpected fields, wrong types, invalid formats - **Error states**: what happens when the database is down? When an external API returns 500? When the user doesn't exist? - **What should NOT happen**: verify that user A CANNOT access user B's flows. Verify that a deleted flow returns 404. Verify that invalid `endpoint_name` is rejected with 422. - **Error messages and types**: not just that it fails, but that it fails with the RIGHT exception and the RIGHT message - **Concurrency**: what happens when two requests try to update the same flow simultaneously? **Write tests based on REQUIREMENTS/SPEC, not on what the source code currently does.** This is how you catch bugs where the code diverges from expected behavior. **When a test fails:** first ask if the CODE is wrong, not the test. Do NOT silently change a failing assertion to match the current code without understanding WHY. ### 6. Observability Review Check for: - Use the async logger from `lfx.log.logger` with `a`-prefixed methods (`adebug`, `ainfo`, `awarning`, `aerror`, `aexception`). Never use `print()` or stdlib `logging`. - Log at key decision points and boundaries, not inside tight loops - Include: operation name, relevant IDs, outcome (success/failure), duration if relevant - Correct log levels: ERROR (broken, needs attention), WARN (degraded but recoverable), INFO (significant events), DEBUG (diagnostic, off in prod) - **ZERO PII TOLERANCE**: Never log email addresses, user names, phone numbers, tokens, passwords. Only approved identifiers: `user_id`, `flow_id`, `session_id` - No `print()` statements — these go to production logs - Use `{e!s}` for string representation of exceptions in log messages ### 7. Pre-Commit Verification For pending-change reviews, verify the author has run: - `make format_backend` (Ruff formatter) — inconsistent formatting creates noisy diffs that hide real changes in code review. Format first, review second. - `make lint` (MyPy type checking) — type errors caught at lint time are 10x cheaper to fix than runtime crashes in production. Langflow services use duck typing via `Service` base class; MyPy catches mismatches early. - `make unit_tests` (pytest) — a failing test means the change breaks existing behavior. Never merge with failing tests; investigate whether the code or the test is wrong. ## Required Output Format When this skill is invoked, the response must exactly follow one of the two templates: ### Template A (any findings) ```markdown # Code Review Summary Found critical issues need to be fixed: ## 🔴 Critical (Must Fix) ### 1. FilePath: line #### Explanation #### Suggested Fix 1. 2. (optional, omit if not applicable) --- ... (repeat for each critical issue) ... Found suggestions for improvement: ## 🟡 Suggestions (Should Consider) ### 1. FilePath: line #### Explanation #### Suggested Fix 1. 2. (optional, omit if not applicable) --- ... (repeat for each suggestion) ... Found optional nits: ## 🟢 Nits (Optional) ### 1. FilePath: line #### Explanation #### Suggested Fix - --- ... (repeat for each nits) ... ## ✅ What's Good - ``` - If there are no critical issues or suggestions or optional nits or good points, just omit that section. - If the issue number is more than 10, summarize as "Found 10+ critical issues/suggestions/optional nits" and only output the first 10 items. - Don't compress the blank lines between sections; keep them as-is for readability. - If there is any issue that requires code changes, append a brief follow-up question to ask whether the user wants to apply the fix(es) after the structured output. For example: "Would you like me to use the Suggested fix(es) to address these issues?" ### Template B (no issues) ```markdown ## Code Review Summary ✅ No issues found. ```