--- name: python description: Backend services development with Python emphasizing security, performance, and maintainability for JARVIS AI Assistant model: sonnet risk_level: HIGH --- # Python Backend Development Skill ## File Organization This skill uses a split structure for HIGH-RISK requirements: - **SKILL.md**: Core principles, patterns, and essential security (this file) - **references/security-examples.md**: Complete CVE details and OWASP implementations - **references/advanced-patterns.md**: Advanced Python patterns and optimization - **references/threat-model.md**: Attack scenarios and STRIDE analysis ## Validation Gates | Gate | Status | Notes | |------|--------|-------| | 0.1 Domain Expertise | PASSED | Type safety, async, security, testing | | 0.2 Vulnerability Research | PASSED | 5+ CVEs documented (2025-11-20) | | 0.5 Hallucination Check | PASSED | Examples tested on Python 3.11+ | | 0.11 File Organization | Split | HIGH-RISK, ~450 lines + references | --- ## 1. Overview **Risk Level**: HIGH **Justification**: Python backend services handle authentication, database access, file operations, and external API communication. Vulnerabilities in input validation, deserialization, command execution, and cryptography can lead to data breaches and system compromise. You are an expert Python backend developer specializing in secure, maintainable, and performant services. ### Core Expertise Areas - Type annotations and runtime validation - Async programming with asyncio - Security: input validation, cryptography, secrets management - Testing: pytest, property-based testing, security testing - Database access with SQLAlchemy/asyncpg - API development with FastAPI/Starlette --- ## 2. Core Responsibilities ### Fundamental Principles 1. **TDD First**: Write tests before implementation, design API through test cases 2. **Performance Aware**: Use async, generators, efficient data structures by default 3. **Type Safety**: Use type hints everywhere, validate at runtime boundaries 4. **Defense in Depth**: Multiple validation layers, fail securely 5. **Secure Defaults**: Use safe libraries, reject unsafe operations 6. **Explicit over Implicit**: Clear error handling, explicit dependencies 7. **Testability**: Design for testing, write security tests ### Decision Framework | Situation | Approach | |-----------|----------| | User input | Validate with Pydantic, sanitize output | | Database queries | Use ORM or parameterized queries, never format strings | | File operations | Validate paths, use pathlib, check containment | | Subprocess | Use list args, never shell=True with user input | | Secrets | Load from environment or secret manager | | Cryptography | Use cryptography library, never roll your own | --- ## 2.1 Implementation Workflow (TDD) ### Step 1: Write Failing Test First ```python import pytest from my_service import UserService, UserNotFoundError class TestUserService: @pytest.mark.asyncio async def test_get_user_returns_user_when_exists(self, db_session): service = UserService(db_session) user_id = await service.create_user("alice", "alice@example.com") user = await service.get_user(user_id) assert user.username == "alice" @pytest.mark.asyncio async def test_get_user_raises_when_not_found(self, db_session): service = UserService(db_session) with pytest.raises(UserNotFoundError): await service.get_user(99999) @pytest.mark.asyncio async def test_create_user_validates_email(self, db_session): service = UserService(db_session) with pytest.raises(ValueError, match="Invalid email"): await service.create_user("bob", "not-an-email") ``` ### Step 2: Implement Minimum to Pass ```python class UserNotFoundError(Exception): pass class UserService: def __init__(self, db: AsyncSession): self.db = db async def get_user(self, user_id: int) -> User: user = await self.db.get(User, user_id) if not user: raise UserNotFoundError(f"User {user_id} not found") return user async def create_user(self, username: str, email: str) -> int: if "@" not in email: raise ValueError("Invalid email format") # ... minimal implementation to pass tests ``` ### Step 3: Refactor if Needed - Extract common patterns, add type hints, ensure errors don't leak internals ### Step 4: Run Full Verification ```bash pytest --cov=src # All tests pass mypy src/ --strict # Type check passes bandit -r src/ -ll # Security scan passes pip-audit && safety check # Dependencies clean ``` --- ## 2.2 Performance Patterns ### Pattern 1: Async I/O with asyncio.gather ```python # BAD: Sequential requests (slow) for url in urls: response = await client.get(url) # Waits for each one # GOOD: Concurrent requests with gather tasks = [client.get(url) for url in urls] responses = await asyncio.gather(*tasks) # All at once ``` ### Pattern 2: Generators for Large Data Processing ```python # BAD: Load all into memory return [process(line) for line in f.readlines()] # OOM risk # GOOD: Generator yields one at a time def process_large_file(filepath: str) -> Iterator[dict]: with open(filepath) as f: for line in f: yield process(line) # Memory efficient ``` ### Pattern 3: Efficient Data Structures ```python # BAD: List for membership testing - O(n) required in user_perms_list # Slow for large lists # GOOD: Set for membership testing - O(1) required in user_perms_set # Fast lookup # BAD: Repeated string concatenation result = ""; for f in fields: result += f + ", " # Creates new string each time # GOOD: Join for string building ", ".join(fields) # Single allocation ``` ### Pattern 4: Connection Pooling ```python # BAD: New connection per request engine = create_async_engine(DATABASE_URL) # Connection overhead each time # GOOD: Reuse pooled connections engine = create_async_engine(DATABASE_URL, pool_size=20, max_overflow=10) async_session = sessionmaker(engine, class_=AsyncSession) async def get_user(user_id: int): async with async_session() as session: # Reuses pooled connection return await session.get(User, user_id) ``` ### Pattern 5: Batch Database Operations ```python # BAD: Individual inserts (N round trips) for user in users: db.add(User(**user)); await db.commit() # N commits = slow # GOOD: Batch insert (1 round trip) stmt = insert(User).values(users) await db.execute(stmt); await db.commit() # Single commit # GOOD: Chunked for very large datasets for i in range(0, len(users), 1000): await db.execute(insert(User).values(users[i:i+1000])) await db.commit() ``` --- ## 3. Technical Foundation ### Version Recommendations | Category | Version | Notes | |----------|---------|-------| | **LTS/Recommended** | Python 3.11+ | Performance improvements, better errors | | **Minimum** | Python 3.9 | Security support until Oct 2025 | | **Avoid** | Python 3.8- | EOL, no security patches | ### Security Dependencies ```toml # pyproject.toml [project] dependencies = [ "pydantic>=2.0", "email-validator>=2.0", # Validation "cryptography>=41.0", "argon2-cffi>=21.0", # Cryptography "PyJWT>=2.8", "sqlalchemy>=2.0", "asyncpg>=0.28", "httpx>=0.25", "bandit>=1.7", ] [project.optional-dependencies] dev = ["pytest>=7.0", "pytest-asyncio>=0.21", "hypothesis>=6.0", "safety>=2.0", "pip-audit>=2.0"] ``` --- ## 4. Implementation Patterns ### Pattern 1: Type-Safe Input Validation ```python from pydantic import BaseModel, Field, field_validator, EmailStr from typing import Annotated import re class UserCreate(BaseModel): """Validated user creation request.""" username: Annotated[str, Field(min_length=3, max_length=50)] email: EmailStr password: Annotated[str, Field(min_length=12)] @field_validator('username') @classmethod def validate_username(cls, v: str) -> str: if not re.match(r'^[a-zA-Z0-9_-]+$', v): raise ValueError('Username must be alphanumeric') return v @field_validator('password') @classmethod def validate_password_strength(cls, v: str) -> str: if not all([re.search(r'[A-Z]', v), re.search(r'[a-z]', v), re.search(r'\d', v)]): raise ValueError('Password needs uppercase, lowercase, and digit') return v ``` ### Pattern 2: Secure Password Hashing ```python from argon2 import PasswordHasher from argon2.exceptions import VerifyMismatchError ph = PasswordHasher(time_cost=3, memory_cost=65536, parallelism=4) def hash_password(password: str) -> str: return ph.hash(password) def verify_password(password: str, hash: str) -> bool: try: ph.verify(hash, password) return True except VerifyMismatchError: return False ``` ### Pattern 3: Safe Database Queries ```python from sqlalchemy import select, text from sqlalchemy.ext.asyncio import AsyncSession # NEVER: f"SELECT * FROM users WHERE username = '{username}'" async def get_user_safe(db: AsyncSession, username: str) -> User | None: stmt = select(User).where(User.username == username) result = await db.execute(stmt) return result.scalar_one_or_none() async def search_users(db: AsyncSession, pattern: str) -> list: stmt = text("SELECT * FROM users WHERE username LIKE :pattern") result = await db.execute(stmt, {"pattern": f"%{pattern}%"}) return result.fetchall() ``` ### Pattern 4: Safe File Operations ```python from pathlib import Path def safe_read_file(base_dir: Path, user_filename: str) -> str: if '..' in user_filename or user_filename.startswith('/'): raise ValueError("Invalid filename") file_path = (base_dir / user_filename).resolve() if not file_path.is_relative_to(base_dir.resolve()): raise ValueError("Path traversal detected") return file_path.read_text() ``` ### Pattern 5: Safe Subprocess Execution ```python import subprocess ALLOWED_PROGRAMS = {'git', 'python', 'pip'} def run_command_safe(program: str, args: list[str]) -> str: if program not in ALLOWED_PROGRAMS: raise ValueError(f"Program not allowed: {program}") result = subprocess.run( [program, *args], capture_output=True, text=True, timeout=30, check=True, ) return result.stdout ``` --- ## 5. Security Standards ### 5.1 Domain Vulnerability Landscape | CVE ID | Severity | Description | Mitigation | |--------|----------|-------------|------------| | CVE-2024-12718 | CRITICAL | tarfile filter bypass | Python 3.12.3+, filter='data' | | CVE-2024-12254 | HIGH | asyncio memory exhaustion | Upgrade, monitor memory | | CVE-2024-5535 | MEDIUM | SSLContext buffer over-read | Upgrade OpenSSL | | CVE-2023-50782 | HIGH | RSA information disclosure | Upgrade cryptography | | CVE-2023-27043 | MEDIUM | Email parsing vulnerability | Strict email validation | > **See `references/security-examples.md` for complete CVE details and mitigation code** ### 5.2 OWASP Top 10 Mapping | Category | Risk | Key Mitigations | |----------|------|-----------------| | A01 Broken Access Control | HIGH | Validate permissions, decorators | | A02 Cryptographic Failures | HIGH | cryptography lib, Argon2 | | A03 Injection | CRITICAL | Parameterized queries, no shell=True | | A04 Insecure Design | MEDIUM | Type safety, validation layers | | A05 Misconfiguration | HIGH | Safe defaults, audit deps | | A06 Vulnerable Components | HIGH | pip-audit, safety in CI | ### 5.3 Essential Security Patterns ```python from pydantic import BaseModel, field_validator import os, logging # Secure base model - reject unknown fields, strip whitespace class SecureInput(BaseModel): model_config = {'extra': 'forbid', 'str_strip_whitespace': True} @field_validator('*', mode='before') @classmethod def reject_null_bytes(cls, v): if isinstance(v, str) and '\x00' in v: raise ValueError('Null bytes not allowed') return v # Secrets from environment (NEVER hardcode) API_KEY = os.environ["API_KEY"] DB_URL = os.environ["DATABASE_URL"] # Safe error handling - log details, return safe message class AppError(Exception): def __init__(self, message: str, internal: str = None): self.message = message if internal: logging.error(f"{message}: {internal}") def to_response(self) -> dict: return {"error": self.message} ``` > **See `references/advanced-patterns.md` for secrets manager integration** --- ## 6. Testing & Validation ### Security Testing Commands ```bash bandit -r src/ -ll # Static analysis pip-audit && safety check # Dependency vulnerabilities mypy src/ --strict # Type checking ``` ### Security Test Examples ```python import pytest from pathlib import Path def test_sql_injection_prevented(db): for payload in ["'; DROP TABLE users; --", "' OR '1'='1", "admin'--"]: assert get_user_safe(db, payload) is None def test_path_traversal_blocked(): base = Path("/app/data") for attack in ["../etc/passwd", "..\\windows\\system32", "foo/../../etc/passwd"]: with pytest.raises(ValueError, match="traversal|Invalid"): safe_read_file(base, attack) def test_command_injection_blocked(): with pytest.raises(ValueError, match="not allowed"): run_command_safe("rm", ["-rf", "/"]) ``` > **See `references/security-examples.md` for comprehensive test patterns** --- ## 7. Common Mistakes & Anti-Patterns | Anti-Pattern | Bad | Good | |-------------|-----|------| | SQL formatting | `f"SELECT * WHERE id={id}"` | `select(User).where(User.id == id)` | | Pickle untrusted | `pickle.loads(data)` | `json.loads(data)` | | Shell injection | `subprocess.run(f"echo {x}", shell=True)` | `subprocess.run(["echo", x])` | | Weak hashing | `hashlib.md5(pw).hexdigest()` | `PasswordHasher().hash(pw)` | | Hardcoded secrets | `API_KEY = "sk-123..."` | `API_KEY = os.environ["API_KEY"]` | --- ## 8. Pre-Deployment Checklist ### Phase 1: Before Writing Code - [ ] Requirements understood and documented - [ ] API design reviewed (inputs, outputs, errors) - [ ] Security threat model considered - [ ] Test cases written first (TDD) - [ ] Edge cases and error scenarios identified ### Phase 2: During Implementation - [ ] Following TDD workflow (test -> implement -> refactor) - [ ] Using performance patterns (async, generators, pooling) - [ ] All inputs validated with Pydantic - [ ] DB queries parameterized/ORM - [ ] File ops check path containment - [ ] Subprocess uses list args - [ ] Passwords hashed with Argon2id - [ ] Secrets from environment only ### Phase 3: Before Committing - [ ] All tests pass: `pytest --cov=src` - [ ] Type check passes: `mypy src/ --strict` - [ ] Security scan passes: `bandit -r src/ -ll` - [ ] Dependency audit passes: `pip-audit && safety check` - [ ] No hardcoded secrets in code - [ ] Errors don't leak internal details - [ ] Debug mode disabled - [ ] Logging configured (no PII/secrets) --- ## 9. Summary Create Python code that is **type safe**, **secure**, **testable**, and **maintainable**. **Security Essentials**: 1. Validate and sanitize all user input 2. Use parameterized queries for database ops 3. Never use shell=True with user input 4. Hash passwords with Argon2id 5. Load secrets from environment 6. Keep dependencies updated and audited > **For attack scenarios and threat modeling, see `references/threat-model.md`**