---
name: python-best-practices
description: Python development best practices including PEP 8 style guidelines, type hints, docstring conventions, and common patterns. Use when writing or modifying Python code.
---

# Python Best Practices

## Purpose

This skill provides guidance on Python development best practices to ensure code quality, maintainability, and consistency across your Python projects.

## When to Use This Skill

Auto-activates when:

- Working with Python files (`*.py`)
- Mentioning "python", "best practices", or "style guide"
- Adding type hints or docstrings
- Refactoring Python code

## Style Guidelines

### PEP 8 Compliance

Follow the PEP 8 style guide for Python code:

- **Indentation**: 4 spaces per indentation level
- **Line Length**: Maximum 79 characters for code, 72 for docstrings/comments
- **Blank Lines**: 2 blank lines between top-level definitions, 1 between methods
- **Imports**: Always at top of file, grouped (stdlib, third-party, local)
- **Naming Conventions**:
  - `snake_case` for functions, variables, modules
  - `PascalCase` for classes
  - `UPPER_SNAKE_CASE` for constants
  - Leading underscore `_name` for internal/private

### Import Organization

Always organize imports in this order:

```python
# 1. Standard library imports
import os
import sys
from pathlib import Path

# 2. Third-party imports
import requests
import numpy as np

# 3. Local application imports
from myapp.core import MyClass
from myapp.utils import helper_function
```

When an import is needed only for type hints, guard it with `TYPE_CHECKING` to avoid circular imports:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from myapp.other_module import OtherClass


def my_function(obj: "OtherClass") -> None:
    """Function that uses OtherClass only for type hints."""
    pass
```

## Type Hints

### Always Use Type Hints

Type hints improve code clarity and let static checkers catch errors early:

```python
def process_data(
    items: list[str],
    max_count: int | None = None,
    verbose: bool = False
) -> dict[str, int]:
    """Process items and return counts.

    Parameters
    ----------
    items : list[str]
        List of items to process
    max_count : int | None, optional
        Maximum items to process, by default None
    verbose : bool, optional
        Enable verbose output, by default False

    Returns
    -------
    dict[str, int]
        Dictionary mapping items to counts
    """
    result: dict[str, int] = {}
    for item in items[:max_count]:
        result[item] = result.get(item, 0) + 1
        if verbose:
            print(f"Processed: {item}")
    return result
```

### Modern Type Syntax (Python 3.10+)

Use modern union syntax with `|` instead of `Union` and `Optional`:

```python
# Good (Python 3.10+)
def get_value(key: str) -> int | None:
    pass


# Avoid (old style)
from typing import Optional


def get_value(key: str) -> Optional[int]:
    pass
```

### Generic Types

Use built-in generic types (Python 3.9+):

```python
# Good (Python 3.9+)
def process_list(items: list[str]) -> dict[str, int]:
    pass


# Avoid (old style)
from typing import List, Dict


def process_list(items: List[str]) -> Dict[str, int]:
    pass
```

## Docstrings

### NumPy Style Docstrings

Use NumPy-style docstrings for consistency:

```python
def calculate_statistics(
    data: list[float],
    include_median: bool = True
) -> dict[str, float]:
    """Calculate statistical measures for a dataset.

    This function computes mean, standard deviation, and optionally
    median for the provided dataset.

    Parameters
    ----------
    data : list[float]
        List of numerical values to analyze
    include_median : bool, optional
        Whether to calculate median, by default True

    Returns
    -------
    dict[str, float]
        Dictionary containing:
        - 'mean': arithmetic mean
        - 'std': standard deviation
        - 'median': median value (if include_median=True)

    Raises
    ------
    ValueError
        If data is empty or contains non-numeric values

    Examples
    --------
    >>> calculate_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
    {'mean': 3.0, 'std': 1.581, 'median': 3.0}

    Notes
    -----
    Standard deviation uses Bessel's correction (ddof=1).
    """
    if not data:
        raise ValueError("Data cannot be empty")
    # Implementation here
    pass
```

### Class Docstrings

```python
from pathlib import Path
from typing import Any


class DataProcessor:
    """Process and transform data from various sources.

    This class provides methods for loading, transforming, and
    validating data from multiple input formats.

    Parameters
    ----------
    source_dir : Path
        Directory containing source data files
    cache_enabled : bool, optional
        Enable result caching, by default True

    Attributes
    ----------
    source_dir : Path
        Directory path for source files
    cache : dict[str, Any] | None
        Cache for processed results, or None when caching is disabled

    Examples
    --------
    >>> processor = DataProcessor(Path("/data"))
    >>> results = processor.process_files()
    """

    def __init__(self, source_dir: Path, cache_enabled: bool = True) -> None:
        """Initialize the DataProcessor."""
        self.source_dir = source_dir
        # The annotation includes None because the cache may be disabled.
        self.cache: dict[str, Any] | None = {} if cache_enabled else None
```

## Error Handling

### Specific Exception Types

Use specific exception types, not bare `except`:

```python
import logging

logger = logging.getLogger(__name__)

# Good
try:
    with open(file_path) as f:
        data = f.read()
except FileNotFoundError:
    logger.error("File not found: %s", file_path)
    raise
except PermissionError:
    logger.error("Permission denied: %s", file_path)
    raise

# Avoid
try:
    with open(file_path) as f:
        data = f.read()
except:  # Too broad!
    pass
```

### Context Managers

Always use context managers for resources:

```python
# Good
with open(file_path) as f:
    content = f.read()

# Avoid
f = open(file_path)
content = f.read()
f.close()  # Easy to forget!
```

### Custom Exceptions

Define custom exceptions for domain-specific errors:

```python
class ValidationError(Exception):
    """Raised when data validation fails."""
    pass


class DataProcessingError(Exception):
    """Raised when data processing encounters an error."""

    def __init__(self, message: str, item_id: str) -> None:
        super().__init__(message)
        self.item_id = item_id
```

## Common Patterns

### Dataclasses for Data Structures

Use `dataclasses` for simple data containers:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """User profile information."""

    username: str
    email: str
    age: int
    tags: list[str] = field(default_factory=list)
    is_active: bool = True

    def __post_init__(self) -> None:
        """Validate fields after initialization."""
        if self.age < 0:
            raise ValueError("Age cannot be negative")
```

### Enums for Fixed Sets

Use `Enum` for fixed sets of values:

```python
from enum import Enum, auto


class Status(Enum):
    """Processing status values."""

    PENDING = auto()
    PROCESSING = auto()
    COMPLETED = auto()
    FAILED = auto()


# Usage
current_status = Status.PENDING
if current_status == Status.COMPLETED:
    print("Done!")
```

### Pathlib for File Operations

Use `pathlib.Path` instead of `os.path`:

```python
from pathlib import Path

# Good
data_dir = Path("/data")
file_path = data_dir / "input.txt"
if file_path.exists():
    content = file_path.read_text()

# Avoid
import os

data_dir = "/data"
file_path = os.path.join(data_dir, "input.txt")
if os.path.exists(file_path):
    with open(file_path) as f:
        content = f.read()
```

### List Comprehensions

Use comprehensions for clarity and performance:

```python
# Good
squared = [x**2 for x in range(10) if x % 2 == 0]

# Avoid
squared = []
for x in range(10):
    if x % 2 == 0:
        squared.append(x**2)
```

## Code Organization

### Module Structure

Organize modules with clear sections:

```python
"""Module for data processing utilities.

This module provides functions for loading, transforming, and
validating data from various sources.
"""

# Standard library imports
import os
import sys
from pathlib import Path

# Third-party imports
import requests
import pandas as pd

# Local imports
from myapp.core import BaseProcessor
from myapp.utils import validate_input

# Constants
MAX_RETRIES = 3
DEFAULT_TIMEOUT = 30


# Exceptions
class ProcessingError(Exception):
    """Raised when processing fails."""
    pass


# Functions
def load_data(source: str) -> pd.DataFrame:
    """Load data from source."""
    pass


def main() -> None:
    """Run the command-line entry point."""
    pass


# Classes
class DataProcessor(BaseProcessor):
    """Process and validate data."""
    pass


# CLI entry point
if __name__ == "__main__":
    main()
```

### Avoid Magic Numbers

Use named constants instead of magic numbers:

```python
import requests

# Good
MAX_RETRIES = 3
TIMEOUT_SECONDS = 30


def fetch_data(url: str) -> dict:
    for attempt in range(MAX_RETRIES):
        response = requests.get(url, timeout=TIMEOUT_SECONDS)
        if response.status_code == 200:
            return response.json()
    # Fail loudly instead of implicitly returning None.
    raise RuntimeError(f"Failed to fetch {url} after {MAX_RETRIES} attempts")


# Avoid
def fetch_data(url: str) -> dict:
    for attempt in range(3):  # What is 3?
        response = requests.get(url, timeout=30)  # Why 30?
        if response.status_code == 200:
            return response.json()
    raise RuntimeError(f"Failed to fetch {url} after 3 attempts")
```

## Testing

### Use pytest for Testing

```python
import pytest

from myapp.processor import DataProcessor


def test_process_valid_data():
    """Test processing with valid input."""
    processor = DataProcessor()
    result = processor.process([1, 2, 3])
    assert result == [2, 4, 6]


def test_process_empty_data():
    """Test processing with empty input."""
    processor = DataProcessor()
    with pytest.raises(ValueError):
        processor.process([])


@pytest.fixture
def sample_data():
    """Provide sample data for tests."""
    return [1, 2, 3, 4, 5]


def test_with_fixture(sample_data):
    """Test using fixture."""
    processor = DataProcessor()
    result = processor.process(sample_data)
    assert len(result) == len(sample_data)
```

## Key Takeaways

1. Follow PEP 8 style guidelines consistently
2. Always use type hints for function signatures
3. Write NumPy-style docstrings for all public functions/classes
4. Use specific exception types, not bare `except`
5. Prefer `pathlib.Path` over `os.path`
6. Use dataclasses and enums for structured data
7. Organize imports: stdlib → third-party → local
8. Avoid magic numbers, use named constants
9. Write tests using pytest
10. Use modern Python syntax (built-in generics from 3.9, `|` unions from 3.10)
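
As a rough illustration of how the takeaways above might combine in one small module, here is a minimal sketch. Everything in it (`build_report`, `Report`, `ReportError`, `Status`, `DEFAULT_TOP_N`) is a hypothetical name invented for this example, not part of any real library, and it assumes Python 3.10+.

```python
"""Word-frequency report: a hypothetical example combining the practices above."""

from collections import Counter
from dataclasses import dataclass, field
from enum import Enum, auto
from pathlib import Path

# Named constant instead of a magic number (hypothetical default).
DEFAULT_TOP_N = 3


class ReportError(Exception):
    """Raised when a report cannot be built."""


class Status(Enum):
    """Outcome of a report run."""

    COMPLETED = auto()
    EMPTY = auto()


@dataclass
class Report:
    """Word counts for a single text file."""

    source: Path
    status: Status
    top_words: list[tuple[str, int]] = field(default_factory=list)


def build_report(source: Path, top_n: int = DEFAULT_TOP_N) -> Report:
    """Count the most frequent words in a text file.

    Parameters
    ----------
    source : Path
        Text file to analyze
    top_n : int, optional
        Number of words to keep, by default DEFAULT_TOP_N

    Returns
    -------
    Report
        Report with status EMPTY if the file contains no words

    Raises
    ------
    ReportError
        If source does not exist
    """
    try:
        text = source.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        # Catch one specific exception and re-raise it as a domain error.
        raise ReportError(f"File not found: {source}") from exc

    words = text.lower().split()
    if not words:
        return Report(source=source, status=Status.EMPTY)
    return Report(source, Status.COMPLETED, Counter(words).most_common(top_n))
```

A pytest test for this sketch could write a small file under the built-in `tmp_path` fixture, call `build_report`, and assert on `status` and `top_words`.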