---
name: system-architecture
description: >-
  System architecture guidance for Python/React full-stack projects. Use during
  the design phase when making architectural decisions: component boundaries,
  service layer design, data flow patterns, database schema planning, and
  technology trade-off analysis. Covers FastAPI layer architecture
  (Routes/Services/Repositories/Models), React component hierarchy, state
  management, and cross-cutting concerns (auth, errors, logging). Produces
  architecture documents and ADRs. Does NOT cover implementation (use
  python-backend-expert or react-frontend-expert) or API contract design (use
  api-design-patterns).
license: MIT
compatibility: 'Python 3.12+, FastAPI 0.115+, React 18+, SQLAlchemy 2.0+, TypeScript 5+'
metadata:
  author: platform-team
  version: '1.0.0'
  sdlc-phase: architecture
allowed-tools: Read Grep Glob
context: fork
---

# System Architecture

## When to Use

Activate this skill when:
- Designing a new module, service, or major feature that requires structural decisions
- Choosing between architectural approaches (e.g., where to place logic, how to structure data flow)
- Planning database schema changes or refactoring existing schema
- Making frontend state management decisions (server state vs client state, context vs store)
- Evaluating technology trade-offs for a new capability
- Creating or reviewing Architecture Decision Records (ADRs)
- Setting up a new project or major subsystem from scratch

Do NOT use this skill for:
- Writing implementation code (use `python-backend-expert` or `react-frontend-expert`)
- API contract design or endpoint specifications (use `api-design-patterns`)
- Testing patterns or strategies (use `pytest-patterns` or `react-testing-patterns`)
- Deployment or infrastructure decisions (use `docker-best-practices` or `deployment-pipeline`)

## Instructions

### Project Layer Architecture

The standard Python/React full-stack architecture follows a layered pattern with strict dependency direction.

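One way to keep a strict dependency direction from eroding is to check imports mechanically. The sketch below is illustrative only: it assumes a hypothetical top-level package named `app` whose sub-packages match the layer directories (`routes`, `services`, `repositories`, `models`, `schemas`), and the `ALLOWED_IMPORTS` map is an assumed reading of the layering, not a rule set defined by this skill.

```python
import ast

# Assumed allow-list: each layer maps to the layers it may import from.
# Adjust to the project's real rules; this map is an illustration only.
ALLOWED_IMPORTS: dict[str, set[str]] = {
    "routes": {"services", "schemas"},
    "services": {"repositories", "schemas", "models"},
    "repositories": {"models"},
    "models": set(),
    "schemas": set(),
}


def imported_layers(source: str, package: str = "app") -> set[str]:
    """Return the sub-package names under `package` that `source` imports.

    Only absolute, dotted imports such as `from app.services.user import X`
    or `import app.services.user` are detected; relative imports are ignored.
    """
    layers: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == package:
                layers.add(parts[1])
    return layers


def find_violations(layer: str, source: str, package: str = "app") -> set[str]:
    """Return the imported layers that `layer` is not allowed to depend on."""
    used = imported_layers(source, package) & set(ALLOWED_IMPORTS)
    return used - ALLOWED_IMPORTS[layer] - {layer}
```

Under these assumptions, a route module that contains `from app.repositories.user import UserRepository` is reported as depending on `repositories`, which is the kind of layer skipping the pattern is meant to prevent. A check like this could run in CI over each file in a layer directory.
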
#### Backend Layers (FastAPI)

```
HTTP Request
    ↓
┌─────────────────────┐
│  Routers (routes/)  │  ← HTTP concerns: request parsing, response formatting, status codes
│                     │    Uses: Depends() for injection, Pydantic schemas for validation
├─────────────────────┤
│  Services           │  ← Business logic: orchestration, validation rules, domain operations
│  (services/)        │    No HTTP awareness. Raises domain exceptions, not HTTPException.
├─────────────────────┤
│  Repositories       │  ← Data access: queries, CRUD operations, database interactions
│  (repositories/)    │    No business logic. Returns model instances or None.
├─────────────────────┤
│  Models (models/)   │  ← SQLAlchemy ORM models: table definitions, relationships, indexes
│  Schemas (schemas/) │  ← Pydantic v2 models: request/response contracts, validation
└─────────────────────┘
    ↓
Database
```

**Dependency direction rules:**
- Routers depend on Services (never on Repositories directly)
- Services depend on Repositories (never on Routers)
- Repositories depend on Models (never on Services)
- Schemas are shared across layers but define no dependencies themselves
- Never skip layers: no direct database access from routes

**Dependency injection pattern:**

```python
from fastapi import APIRouter, Depends
from sqlalchemy.ext.asyncio import AsyncSession

# UserCreate / UserResponse (schemas) and get_user_service (provider) are defined elsewhere.
router = APIRouter()


# Repository depends on AsyncSession (passed to the constructor by a Depends() provider)
class UserRepository:
    def __init__(self, session: AsyncSession) -> None:
        self.session = session


# Service depends on Repository via constructor injection
class UserService:
    def __init__(self, repo: UserRepository) -> None:
        self.repo = repo


# Router depends on Service via Depends()
@router.post("/users", response_model=UserResponse)
async def create_user(
    data: UserCreate,
    service: UserService = Depends(get_user_service),
) -> UserResponse:
    return await service.create_user(data)
```

#### Frontend Layers (React/TypeScript)

```
┌─────────────────────┐
│  Pages (pages/)     │  ← Route-level components: data fetching, layout composition
├─────────────────────┤
│  Layouts            │  ← Page structure: navigation, sidebars, content areas
│  (layouts/)         │
├─────────────────────┤
│  Features           │  ← Domain-specific: UserProfile, OrderList, ChatPanel
│  (features/)        │    Composed from shared components + hooks
├─────────────────────┤
│  Shared Components  │  ← Reusable UI: Button, Modal, Table, Form, Input
│  (components/)      │    No business logic. Configurable via props.
├─────────────────────┤
│  Hooks (hooks/)     │  ← Custom hooks: useAuth, usePagination, useDebounce
│  API (api/)         │  ← API client functions, TanStack Query configurations
├─────────────────────┤
│  Types (types/)     │  ← Shared TypeScript interfaces and type definitions
└─────────────────────┘
```

**Component dependency direction:**
- Pages import Features and Layouts
- Features import Shared Components and Hooks
- Shared Components import only other Shared Components and Types
- Hooks import API functions and Types
- API functions import Types only

### Decision Framework

When facing architectural decisions, follow this structured process:

#### Step 1: Define the Problem

- What capability is needed?
- What are the non-functional requirements? (performance, scalability, maintainability)
- What constraints exist? (team size, timeline, existing infrastructure)

#### Step 2: Identify Options

- List 2-3 viable architectural approaches
- For each option, document:
  - How it works (brief technical description)
  - Advantages
  - Disadvantages
  - Risks

#### Step 3: Evaluate Against Criteria

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Maintainability | High | Can the team understand, modify, and debug this easily? |
| Testability | High | Can each component be tested in isolation? |
| Performance | Medium | Does it meet latency and throughput requirements? |
| Team familiarity | Medium | Does the team have experience with this approach? |
| Operational cost | Low | What are the infrastructure and maintenance costs? |
| Future flexibility | Low | How easily can this evolve as requirements change? |

#### Step 4: Decide and Document

- Choose the option that best satisfies the weighted criteria
- Document the decision in an ADR (see `references/architecture-decision-record-template.md`)
- Record what was NOT chosen and why; this context is valuable for future decisions

#### Step 5: Communicate

- Share the ADR with the team
- Identify any migration or rollout steps needed
- Flag reversibility: is this a one-way door or a two-way door?

### Database Schema Design

#### Design Principles

1. **Start normalized (3NF):** Denormalize only for proven performance bottlenecks, not speculation
2. **One migration per logical change:** Each Alembic migration should represent a single, coherent schema modification
3. **Always include downgrade:** Every migration must have a working `downgrade()` function
4. **Index strategically:**
   - Primary keys (automatic)
   - Foreign keys (always)
   - Columns in WHERE clauses of frequent queries
   - Composite indexes for multi-column lookups
   - Partial indexes for filtered queries (e.g., `WHERE is_active = true`)

#### SQLAlchemy 2.0 Async Patterns

```python
from datetime import datetime

from sqlalchemy import String, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


# Model definition with Mapped types (SQLAlchemy 2.0 style)
class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
    is_active: Mapped[bool] = mapped_column(default=True)
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())

    # Relationships: ALWAYS use eager loading with async
    posts: Mapped[list["Post"]] = relationship(  # Post model is defined elsewhere
        back_populates="author",
        lazy="selectin",  # or "joined"; NEVER the default lazy="select" with async
    )
```

**Async session rules:**
- One `AsyncSession` per request; never share a session across concurrent tasks
- Use `async with` context manager for automatic cleanup
- Map session boundaries to transaction boundaries
- Use `selectin` or `joined` loading: implicit lazy loading performs I/O outside an `await` and raises `MissingGreenlet` under asyncio
- Use `run_sync()` only as a last resort for legacy code

#### Migration Planning

1. Schema change → Generate migration: `alembic revision --autogenerate -m "description"`
2. Review generated migration: verify column types, indexes, constraints
3. Test upgrade: `alembic upgrade head`
4. Test downgrade: `alembic downgrade -1`
5. Test data preservation: ensure existing data survives the round-trip

### Frontend Architecture

#### State Management Decision Tree

```
Is the data from the server?
├── YES → Use TanStack Query (useQuery, useMutation)
│         Configure staleTime, gcTime, query keys
│
└── NO → Is it needed across multiple components?
    ├── YES → Is it complex with actions/reducers?
    │   ├── YES → Use Zustand store
    │   └── NO  → Use React Context
    │
    └── NO → Use useState / useReducer locally
```

**TanStack Query conventions:**
- Query keys: `[resource, ...identifiers]` (e.g., `["users", userId]`, `["posts", { page, limit }]`)
- Use the `queryOptions()` factory to centralize key + fn definitions; this prevents copy-paste key errors
- Set `staleTime` based on data freshness needs (default 0 is too aggressive for most cases)
- Invalidate with `invalidateQueries()` after mutations, never with a manual `refetch()`
- Handle all states: `isPending`, `isError`, `data`

**Component design rules:**
- Props for configuration, hooks for data
- Lift state only as high as needed; no premature context creation
- Keep components under 200 lines; extract sub-components or custom hooks when larger
- Use `children` and composition over deep prop drilling

#### Routing Structure

Organize routes to mirror the URL structure:

```
src/
├── pages/
│   ├── HomePage.tsx            → /
│   ├── LoginPage.tsx           → /login
│   ├── users/
│   │   ├── UserListPage.tsx    → /users
│   │   └── UserDetailPage.tsx  → /users/:id
│   └── settings/
│       └── SettingsPage.tsx    → /settings
```

### Cross-Cutting Concerns

#### Authentication Flow

```
Login Request
    ↓
Backend: Validate credentials → Generate JWT (access + refresh tokens)
    ↓
Frontend: Keep access token in memory; refresh token is set by the backend as an httpOnly cookie
    ↓
API Calls: Attach access token via Authorization header
    ↓
Token Expired: Use refresh token to obtain new access token
    ↓
Refresh Failed: Redirect to login
```

**Architecture decisions for auth:**
- Access tokens: short-lived (15-30 min), stored in memory (not localStorage)
- Refresh tokens: longer-lived (7-30 days), stored in httpOnly cookie
- Backend: FastAPI `Depends()` chain for token validation → user extraction → permission check
- Frontend: Auth context providing `user`, `login()`, `logout()`, `isAuthenticated`

#### Error Handling Strategy

Errors should be handled at the appropriate layer:

| Layer | Error Type | Action |
|-------|-----------|--------|
| Router | `HTTPException` | Return HTTP error response with status code |
| Service | Domain exceptions | Raise custom exceptions (e.g., `UserNotFoundError`) |
| Repository | Database exceptions | Catch and re-raise as domain exceptions or let propagate |
| Frontend | API errors | Display user-friendly messages, retry where appropriate |

**Backend exception hierarchy:**

```python
class AppError(Exception):
    """Base application error."""

class NotFoundError(AppError):
    """Resource not found."""

class ConflictError(AppError):
    """Resource conflict (duplicate, version mismatch)."""

class ValidationError(AppError):
    """Business rule violation."""
```

Router-level exception handler maps domain exceptions to HTTP responses:

```python
from fastapi import Request
from fastapi.responses import JSONResponse

# `app` is the application's FastAPI instance
@app.exception_handler(NotFoundError)
async def not_found_handler(request: Request, exc: NotFoundError) -> JSONResponse:
    return JSONResponse(status_code=404, content={"detail": str(exc)})
```

#### Logging Architecture

**Backend (structlog):**
- Structured JSON logs in production
- Human-readable console in development
- Bind request context (request_id, user_id) at middleware level
- Log at service layer (business events), not repository layer (too noisy)
- Use log levels: DEBUG (development only), INFO (business events), WARNING (recoverable issues), ERROR (failures requiring attention)

**Frontend:**
- `console.*` in development
- Structured error reporting to backend or Sentry in production
- Log user actions for debugging, not for analytics

#### Configuration Management

**Backend (pydantic-settings):**

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str = "redis://localhost:6379"
    jwt_secret: str
    debug: bool = False
```

**Frontend (environment variables):**
- `VITE_API_URL` for API base URL
- Build-time injection via Vite's `import.meta.env`
- No secrets in frontend environment variables

## Examples

### Architecture Decision: Real-Time Notifications

**Problem:** The application needs real-time notifications for users (new messages, status updates).

**Options evaluated:**

| Option | Pros | Cons |
|--------|------|------|
| **WebSocket** | True bidirectional, low latency | Complex connection management, harder to scale |
| **Server-Sent Events (SSE)** | Simple, HTTP-based, auto-reconnect | Unidirectional (server→client only), limited browser connections |
| **Polling** | Simplest implementation, works everywhere | Higher latency, unnecessary server load |

**Decision:** WebSocket for this use case.

**Rationale:** Notifications require low latency and the system will eventually need bidirectional communication (typing indicators, presence). SSE would work for notifications alone but would require a separate solution for future bidirectional needs. Polling introduces unacceptable latency for real-time UX.

**Architecture:**
- Backend: FastAPI WebSocket endpoint with `ConnectionManager` class
- Frontend: Custom `useWebSocket` hook with automatic reconnection
- Scaling: Redis pub/sub for multi-instance message distribution
- Persistence: Store notifications in database for offline users
- Fallback: REST endpoint for notification history and initial load

See `references/architecture-decision-record-template.md` for the full ADR format.

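As one possible shape for the `ConnectionManager` named in the architecture above, here is a minimal single-process sketch. It is an assumption-laden illustration, not the project's implementation: the `JsonSocket` protocol, the method names, and keying connections by an integer `user_id` are all hypothetical, and the Redis pub/sub fan-out and database persistence are left out. The only thing it requires of a socket is an async `send_json` method, which FastAPI's `WebSocket` provides.

```python
from typing import Any, Protocol


class JsonSocket(Protocol):
    """The single method this sketch needs from a WebSocket-like object."""

    async def send_json(self, data: Any) -> None: ...


class ConnectionManager:
    """Tracks open sockets per user within one process (no cross-instance fan-out)."""

    def __init__(self) -> None:
        self._sockets: dict[int, list[JsonSocket]] = {}

    def connect(self, user_id: int, socket: JsonSocket) -> None:
        # A user may have several tabs or devices connected at once.
        self._sockets.setdefault(user_id, []).append(socket)

    def disconnect(self, user_id: int, socket: JsonSocket) -> None:
        # Compare by identity so unrelated sockets are never removed.
        remaining = [s for s in self._sockets.get(user_id, []) if s is not socket]
        if remaining:
            self._sockets[user_id] = remaining
        else:
            self._sockets.pop(user_id, None)

    async def notify(self, user_id: int, payload: dict[str, Any]) -> int:
        """Send `payload` to each socket of one user; return the delivery count."""
        delivered = 0
        # Iterate over a copy because failed sockets are dropped during the loop.
        for socket in list(self._sockets.get(user_id, [])):
            try:
                await socket.send_json(payload)
                delivered += 1
            except Exception:
                # Treat any send failure as a dead connection and forget it.
                self.disconnect(user_id, socket)
        return delivered
```

A return value of `0` from `notify` means the user has no live connection in this process, which is the case where the persisted notification and the REST history endpoint would take over.
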
## Edge Cases

### Monolith vs Microservices

**Default to modular monolith** for teams smaller than 10 developers. A modular monolith provides:
- Clear module boundaries without network overhead
- Shared database with module-specific schemas
- Easy refactoring and code navigation
- Simple deployment and debugging

**Consider microservices** only when:
- Independent scaling is required for specific components
- Different modules need different technology stacks
- Team size exceeds 10 and ownership boundaries are clear
- Deployment independence is a business requirement

**Migration path:** Design module boundaries in the monolith as if they were services (no direct cross-module database access, communicate via service interfaces). This makes extraction to microservices straightforward when needed.

### When to Break the Layer Pattern

The strict Router → Service → Repository pattern should be followed for standard CRUD operations. Acceptable exceptions:

- **Background tasks:** May call services directly without going through a router
- **Event handlers:** Domain event listeners may call services from any context
- **CLI commands:** Management scripts may access services or repositories directly
- **Migrations:** Data migrations may access models directly (no service/repo layer needed)
- **Health checks:** May access the database directly for simple connectivity verification

In all cases, business logic should still live in the service layer. These exceptions are about the entry point, not about bypassing business rules.

### Evolving Architecture

When the architecture needs to change:

1. Write an ADR documenting the motivation and the proposed change
2. Identify all affected modules and their dependencies
3. Plan an incremental migration; never a big-bang rewrite
4. Maintain backward compatibility during transition (strangler fig pattern)
5. Set a deadline for completing the migration and removing legacy code
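
The strangler fig pattern in step 4 can be sketched as a facade that callers depend on while operations move from the old implementation to the new one. This is a minimal illustration under stated assumptions: `StranglerFacade`, its `migrate`/`rollback` methods, and the idea of keying the switch on an operation name are hypothetical choices, not something this skill prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

Req = TypeVar("Req")
Res = TypeVar("Res")


@dataclass
class StranglerFacade(Generic[Req, Res]):
    """Routes each call to the legacy or the new implementation.

    Callers only know the facade, so operations can be moved one at a time
    and the legacy path removed once every operation is in `migrated`.
    """

    legacy: Callable[[str, Req], Res]
    modern: Callable[[str, Req], Res]
    migrated: set[str] = field(default_factory=set)

    def migrate(self, operation: str) -> None:
        """Start serving `operation` from the new implementation."""
        self.migrated.add(operation)

    def rollback(self, operation: str) -> None:
        """Send `operation` back to the legacy implementation."""
        self.migrated.discard(operation)

    def handle(self, operation: str, request: Req) -> Res:
        target = self.modern if operation in self.migrated else self.legacy
        return target(operation, request)
```

Because each operation is switched independently and can be rolled back, the migration stays incremental and backward compatible; the deadline in step 5 then amounts to the point where `migrated` covers every operation and the legacy callable can be deleted.
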