---
name: Pattern Extraction
description: Detect and extract patterns from codebases for template generation
version: 1.0.0
trigger_phrases:
  - "extract patterns"
  - "detect patterns"
  - "find variables"
  - "identify configuration"
  - "analyze structure"
categories: ["infrastructure", "templating", "analysis"]
---

# Pattern Extraction Skill

## When to Use This Skill

Use pattern extraction when you need to:

- **Analyze existing infrastructure** to create reusable templates
- **Convert hardcoded values** into configurable template variables
- **Identify configuration patterns** across multiple similar files
- **Detect technology stacks** and their conventions
- **Extract naming conventions** and structural patterns
- **Generate template metadata** from existing code
- **Create variable schemas** from inferred types and constraints

Perfect for:

- Converting existing IaC to templates
- Building template libraries from production code
- Standardizing infrastructure patterns
- Automating template generation
- Identifying refactoring opportunities

## Core Capabilities

### 1. Technology Stack Detection

Automatically identify:

- **IaC Frameworks**: Terraform, Pulumi, CloudFormation, ARM, Bicep
- **Cloud Providers**: AWS, Azure, GCP, multi-cloud patterns
- **Service Types**: Kubernetes, Docker, serverless, containers
- **Configuration Formats**: YAML, JSON, HCL, TOML
- **Build Systems**: Helm, Kustomize, Jsonnet
- **CI/CD Platforms**: GitHub Actions, GitLab CI, Jenkins, Harness

### 2. Configuration Pattern Extraction

Extract patterns from:

- Resource definitions and relationships
- Environment-specific configurations
- Naming conventions and tagging strategies
- Security policies and compliance rules
- Network topologies and architectures
- Deployment strategies and workflows

### 3. Variable Identification

Intelligent detection of:

- **Hardcoded Values**: Strings, numbers, booleans that should be variables
- **Repeated Values**: Values appearing multiple times across files
- **Environment Indicators**: dev, staging, prod patterns
- **Naming Patterns**: Prefixes, suffixes, delimiters
- **Secret Patterns**: API keys, passwords, tokens (flag for security)
- **Configuration Schemas**: Type inference from usage

### 4. Structure Analysis

Analyze:

- File organization and directory structure
- Module boundaries and dependencies
- Resource hierarchies and relationships
- Configuration inheritance patterns
- Composition and reuse strategies
- Template inclusion patterns

## Pattern Detection Matrix

| Pattern Type | Indicators | Extraction Method | Output Format |
|--------------|-----------|-------------------|---------------|
| **Environment Values** | `dev`, `staging`, `prod` in names | Context-aware regex | `{{ environment }}` |
| **Resource Names** | Repeated prefixes/suffixes | Token analysis | `{{ project_name }}-{{ resource_type }}` |
| **Region/Location** | `us-east-1`, `westeurope` | Cloud provider patterns | `{{ region }}` |
| **Version Numbers** | Semantic versioning patterns | Regex + validation | `{{ version }}` |
| **Port Numbers** | Common service ports | Port range analysis | `{{ port }}` |
| **Size/Scale** | Instance types, node counts | Capacity patterns | `{{ instance_size }}` |
| **CIDR Blocks** | IP address ranges | Network pattern analysis | `{{ cidr_block }}` |
| **Tags/Labels** | Key-value metadata | Metadata extraction | `{{ tags }}` |
| **Secret References** | Vault paths, secret names | Secret pattern detection | `{{ secret_ref }}` (secure) |
| **Feature Flags** | Boolean toggles | Conditional analysis | `{{ enable_feature }}` |

## Variable Inference Rules

### Before/After Transformation Examples

#### Example 1: Resource Names

**Before:**

```hcl
resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name        = "myapp-web-server-prod"
    Environment = "production"
    Project     = "myapp"
  }
}
```

**After:**

```hcl
resource "aws_instance" "web_server" {
  ami           = "{{ ami_id }}"
  instance_type = "{{ instance_type }}"

  tags = {
    Name        = "{{ project_name }}-web-server-{{ environment }}"
    Environment = "{{ environment }}"
    Project     = "{{ project_name }}"
  }
}
```

**Extracted Variables:**

```yaml
variables:
  ami_id:
    type: string
    description: "AMI ID for the EC2 instance"
    default: "ami-0c55b159cbfafe1f0"
    pattern: "^ami-[a-f0-9]{17}$"
  instance_type:
    type: string
    description: "EC2 instance type"
    default: "t3.medium"
    allowed_values: ["t3.micro", "t3.small", "t3.medium", "t3.large"]
  project_name:
    type: string
    description: "Project identifier used in resource naming"
    default: "myapp"
    pattern: "^[a-z][a-z0-9-]{2,30}$"
  environment:
    type: string
    description: "Deployment environment"
    default: "production"
    allowed_values: ["dev", "staging", "production"]
```

Note that the source uses both `prod` (in the `Name` tag) and `production` (in the `Environment` tag). Mapping both to a single `environment` variable normalizes them, so with the defaults above the rendered `Name` becomes `myapp-web-server-production`.

#### Example 2: Kubernetes Deployment

**Before:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        version: "1.21.0"
    spec:
      containers:
        - name: nginx
          image: nginx:1.21.0
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "128Mi"
              cpu: "250m"
            limits:
              memory: "256Mi"
              cpu: "500m"
```

**After:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ app_name }}-deployment
  namespace: {{ namespace }}
spec:
  replicas: {{ replica_count }}
  selector:
    matchLabels:
      app: {{ app_name }}
  template:
    metadata:
      labels:
        app: {{ app_name }}
        version: "{{ app_version }}"
    spec:
      containers:
        - name: {{ app_name }}
          image: {{ container_image }}:{{ app_version }}
          ports:
            - containerPort: {{ container_port }}
          resources:
            requests:
              memory: "{{ memory_request }}"
              cpu: "{{ cpu_request }}"
            limits:
              memory: "{{ memory_limit }}"
              cpu: "{{ cpu_limit }}"
```

**Extracted Variables:**

```yaml
variables:
  app_name:
    type: string
    description: "Application name"
    default: "nginx"
  namespace:
    type: string
    description: "Kubernetes namespace"
    default: "production"
  replica_count:
    type: integer
    description: "Number of pod replicas"
    default: 3
    min: 1
    max: 10
  app_version:
    type: string
    description: "Application version"
    default: "1.21.0"
    pattern: "^\\d+\\.\\d+\\.\\d+$"
  container_image:
    type: string
    description: "Container image name"
    default: "nginx"
  container_port:
    type: integer
    description: "Container port"
    default: 80
  memory_request:
    type: string
    description: "Memory request"
    default: "128Mi"
  memory_limit:
    type: string
    description: "Memory limit"
    default: "256Mi"
  cpu_request:
    type: string
    description: "CPU request"
    default: "250m"
  cpu_limit:
    type: string
    description: "CPU limit"
    default: "500m"
```

#### Example 3: Azure Bicep

**Before:**

```bicep
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: 'mystorageacct12345'
  location: 'westeurope'
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    allowBlobPublicAccess: false
  }
  tags: {
    environment: 'production'
    costCenter: 'engineering'
    project: 'platform'
  }
}
```

**After:**

```bicep
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: '{{ storage_account_name }}'
  location: '{{ location }}'
  sku: {
    name: '{{ sku_name }}'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: '{{ access_tier }}'
    minimumTlsVersion: '{{ min_tls_version }}'
    supportsHttpsTrafficOnly: {{ https_only }}
    allowBlobPublicAccess: {{ allow_public_access }}
  }
  tags: {
    environment: '{{ environment }}'
    costCenter: '{{ cost_center }}'
    project: '{{ project_name }}'
  }
}
```

**Extracted Variables:**

```yaml
variables:
  storage_account_name:
    type: string
    description: "Storage account name (globally unique)"
    default: "mystorageacct12345"
    pattern: "^[a-z0-9]{3,24}$"
  location:
    type: string
    description: "Azure region"
    default: "westeurope"
    allowed_values: ["westeurope", "northeurope", "eastus", "westus"]
  sku_name:
    type: string
    description: "Storage account SKU"
    default: "Standard_LRS"
    allowed_values: ["Standard_LRS", "Standard_GRS", "Premium_LRS"]
  access_tier:
    type: string
    description: "Storage access tier"
    default: "Hot"
    allowed_values: ["Hot", "Cool"]
  min_tls_version:
    type: string
    description: "Minimum TLS version"
    default: "TLS1_2"
  https_only:
    type: boolean
    description: "Require HTTPS traffic only"
    default: true
  allow_public_access:
    type: boolean
    description: "Allow public blob access"
    default: false
  environment:
    type: string
    description: "Environment name"
    default: "production"
  cost_center:
    type: string
    description: "Cost center for billing"
    default: "engineering"
  project_name:
    type: string
    description: "Project name"
    default: "platform"
```

## Decision Tree

Pattern classification flowchart:

```
START: Analyze value/pattern
      |
      v
┌───────────────────────────────────┐
│ Is it repeated across files?      │
└─────┬─────────────────────┬───────┘
      │ YES                 │ NO
      v                     v
┌─────────────┐      ┌──────────────┐
│ Global Var  │      │ Check Usage  │
└─────────────┘      └──────┬───────┘
                            │
                            v
                ┌───────────────────────┐
                │ Used in conditionals? │
                └──┬─────────────────┬──┘
                   │ YES             │ NO
                   v                 v
              ┌──────────┐    ┌──────────────┐
              │ Feature  │    │ Check Type   │
              │ Flag     │    └──────┬───────┘
              └──────────┘           │
                                     v
                         ┌──────────────────────┐
                         │ Contains 'env'?      │
                         └──┬───────────────┬───┘
                            │ YES           │ NO
                            v               v
                      ┌───────────┐  ┌──────────────┐
                      │ Env Var   │  │ Check Format │
                      └───────────┘  └──────┬───────┘
                                            │
                                            v
                                  ┌──────────────────┐
                                  │ Cloud resource?  │
                                  └──┬───────────┬───┘
                                     │ YES       │ NO
                                     v           v
                               ┌──────────┐ ┌─────────┐
                               │ Resource │ │ Generic │
                               │ ID       │ │ Config  │
                               └──────────┘ └─────────┘
```

## Examples

### Example 1: Extract Terraform AWS Pattern

**Input:**

```bash
extract patterns from ./terraform/aws/ec2-instances.tf
```

**Analysis:**

```
Analyzing: terraform/aws/ec2-instances.tf
├── Technology: Terraform (AWS Provider)
├── Resources Found: 3 (aws_instance, aws_security_group, aws_eip)
├── Variables Identified: 12
└── Patterns Detected:
    ├── Naming Convention: {project}-{resource}-{env}
    ├── Tag Strategy: Environment, Project, ManagedBy
    └── Configuration Reuse: 85% similarity across instances
```

**Extracted Template:**

```hcl
# Template: aws-ec2-instance.tf.tmpl

variable "project_name" {
  type        = string
  description = "Project identifier"
}

variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "instance_config" {
  type = object({
    ami           = string
    instance_type = string
    key_name      = string
    subnet_id     = string
  })
  description = "EC2 instance configuration"
}

resource "aws_instance" "main" {
  ami                    = var.instance_config.ami
  instance_type          = var.instance_config.instance_type
  key_name               = var.instance_config.key_name
  subnet_id              = var.instance_config.subnet_id
  vpc_security_group_ids = [aws_security_group.main.id]

  tags = {
    Name        = "${var.project_name}-instance-${var.environment}"
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "Terraform"
  }
}

resource "aws_security_group" "main" {
  name        = "${var.project_name}-sg-${var.environment}"
  description = "Security group for ${var.project_name} in ${var.environment}"

  tags = {
    Name        = "${var.project_name}-sg-${var.environment}"
    Environment = var.environment
    Project     = var.project_name
  }
}
```

### Example 2: Extract Kubernetes Pattern

**Input:**

```bash
extract patterns from ./k8s/deployments/ --type kubernetes
```

**Analysis:**

```
Analyzing: k8s/deployments/ (15 files)
├── Technology: Kubernetes (v1.24+)
├── Resources Found: 45 total
│   ├── Deployments: 15
│   ├── Services: 12
│   ├── ConfigMaps: 10
│   └── Ingresses: 8
├── Common Patterns:
│   ├── Label Strategy: app, version, environment
│   ├── Resource Requests: 90% use standard sizes
│   ├── Probes: 100% have health checks
│   └── Image Pattern: registry/org/image:tag
└── Variable Candidates: 28 identified
```

**Extracted Template:**

```yaml
# Template: kubernetes-microservice.yaml.tmpl
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ service_name }}-deployment
  namespace: {{ namespace }}
  labels:
    app: {{ service_name }}
    version: "{{ version }}"
    environment: {{ environment }}
spec:
  replicas: {{ replica_count }}
  selector:
    matchLabels:
      app: {{ service_name }}
  template:
    metadata:
      labels:
        app: {{ service_name }}
        version: "{{ version }}"
        environment: {{ environment }}
    spec:
      containers:
        - name: {{ service_name }}
          image: {{ image_registry }}/{{ organization }}/{{ service_name }}:{{ version }}
          ports:
            - containerPort: {{ container_port }}
              protocol: TCP
          env:
            {{#each environment_variables}}
            - name: {{ name }}
              value: "{{ value }}"
            {{/each}}
          resources:
            requests:
              memory: {{ memory_request }}
              cpu: {{ cpu_request }}
            limits:
              memory: {{ memory_limit }}
              cpu: {{ cpu_limit }}
          livenessProbe:
            httpGet:
              path: {{ health_check_path }}
              port: {{ container_port }}
            initialDelaySeconds: {{ liveness_initial_delay }}
            periodSeconds: {{ liveness_period }}
          readinessProbe:
            httpGet:
              path: {{ readiness_check_path }}
              port: {{ container_port }}
            initialDelaySeconds: {{ readiness_initial_delay }}
            periodSeconds: {{ readiness_period }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ service_name }}-service
  namespace: {{ namespace }}
  labels:
    app: {{ service_name }}
spec:
  type: {{ service_type }}
  ports:
    - port: {{ service_port }}
      targetPort: {{ container_port }}
      protocol: TCP
  selector:
    app: {{ service_name }}
```

### Example 3: Multi-File Pattern Extraction

**Input:**

```bash
extract patterns from ./infrastructure/ --recursive --consolidate
```

**Analysis:**

```
Analyzing: infrastructure/ (recursive)
├── Files Scanned: 47
├── Technologies Detected:
│   ├── Terraform (AWS): 23 files
│   ├── Kubernetes: 15 files
│   ├── Helm Charts: 9 files
│   └── Docker Compose: 2 files (excluded - different pattern)
├── Cross-Cutting Patterns:
│   ├── Environment Strategy: 3-tier (dev/staging/prod)
│   ├── Tagging: 100% compliance with org policy
│   ├── Naming: Consistent kebab-case with env suffix
│   └── Secrets: HashiCorp Vault references
└── Template Opportunities:
    ├── AWS Lambda Function: 8 similar resources
    ├── RDS Database: 5 similar resources
    ├── Kubernetes Service: 12 similar resources
    └── ALB Configuration: 6 similar resources
```

**Output:**

```
Generated Templates:
├── templates/
│   ├── aws-lambda-function.tf.tmpl (consolidated from 8 files)
│   ├── aws-rds-instance.tf.tmpl (consolidated from 5 files)
│   ├── kubernetes-service.yaml.tmpl (consolidated from 12 files)
│   └── aws-alb.tf.tmpl (consolidated from 6 files)
└── variables/
    ├── common.yaml (shared across all templates)
    ├── aws-specific.yaml (AWS provider variables)
    └── kubernetes-specific.yaml (K8s variables)

Variable Reuse Analysis:
├── Shared Variables: 15 (45% reuse)
├── Template-Specific: 18 (55% unique)
└── Potential Consolidation: 3 variables can be merged
```

## Best Practices

### 1. Incremental Extraction

Start with a single file or small directory to refine patterns before scaling:

```bash
# Start small
extract patterns from ./terraform/main.tf

# Validate and adjust
review template ./templates/main.tf.tmpl

# Scale up
extract patterns from ./terraform/ --recursive
```

### 2. Variable Naming Conventions

Follow consistent naming:

- Use snake_case for variables
- Prefix with context: `aws_`, `k8s_`, `azure_`
- Suffix with type: `_count`, `_enabled`, `_config`
- Be descriptive: `instance_type` not `type`

### 3. Type Inference Priority

1. **Explicit types** from existing variable definitions
2. **Usage patterns** (e.g., used in math → number)
3. **Value format** (e.g., "true"/"false" → boolean)
4. **Default** to string if ambiguous

### 4. Security-Sensitive Patterns

Always flag potential secrets:

```yaml
# Good - flagged for security review
api_key:
  type: string
  sensitive: true
  description: "API key for external service"
  default: null  # Force explicit value
```

### 5. Validation Rules

Add validation for critical variables:

```yaml
environment:
  type: string
  allowed_values: ["dev", "staging", "prod"]

cidr_block:
  type: string
  pattern: "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}/\\d{1,2}$"

replica_count:
  type: integer
  min: 1
  max: 100
```

### 6. Documentation Generation

Auto-generate docs from extracted patterns:

```markdown
# Generated Template: aws-ec2-instance

## Description

Extracted from 8 similar EC2 instance definitions.
Pattern confidence: 95%

## Variables

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| instance_type | string | No | t3.medium | EC2 instance type |
| environment | string | Yes | - | Deployment environment |

## Usage

    terraform apply -var="instance_type=t3.large" -var="environment=prod"
```

### 7. Pattern Confidence Scoring

Rate extraction confidence:

- **High (90-100%)**: Identical patterns across files
- **Medium (70-89%)**: Similar with minor variations
- **Low (<70%)**: Significant differences, manual review needed

### 8. Iterative Refinement

```bash
# Extract initial patterns
extract patterns from ./infra/ --output ./templates/v1/

# Review and refine
review templates ./templates/v1/ --suggest-improvements

# Re-extract with refinements
extract patterns from ./infra/ --output ./templates/v2/ \
  --naming-convention kebab-case \
  --tag-strategy company-standard
```

## Related Skills

- **[[template-generation]]** - Generate templates from extracted patterns
- **[[variable-schema-design]]** - Design robust variable schemas
- **[[template-validation]]** - Validate generated templates
- **[[naming-convention-analyzer]]** - Analyze and enforce naming conventions
- **[[configuration-consolidation]]** - Merge similar configurations
- **[[security-pattern-detection]]** - Identify security anti-patterns
- **[[compliance-checking]]** - Ensure extracted patterns meet compliance requirements

## Advanced Techniques

### Multi-Stage Extraction Pipeline

```bash
# Stage 1: Initial scan
extract patterns from ./infra/ --stage scan

# Stage 2: Pattern classification
extract patterns from ./infra/ --stage classify

# Stage 3: Variable inference
extract patterns from ./infra/ --stage variables

# Stage 4: Template generation
extract patterns from ./infra/ --stage generate

# Stage 5: Validation
extract patterns from ./infra/ --stage validate
```

### Machine Learning-Enhanced Extraction

Use ML models to improve pattern detection:

- **Clustering**: Group similar configurations
- **Anomaly Detection**: Identify outliers for manual review
- **Type Inference**: Predict variable types from usage
- **Dependency Analysis**: Extract implicit dependencies

### Cross-Repository Pattern Mining

Extract patterns across multiple repositories:

```bash
extract patterns --repos-file ./repos.txt \
  --output ./org-templates/ \
  --consolidate-org-wide
```

## Output Formats

### JSON Schema

```json
{
  "template": "aws-ec2-instance",
  "version": "1.0.0",
  "variables": {
    "instance_type": {
      "type": "string",
      "default": "t3.medium",
      "description": "EC2 instance type"
    }
  }
}
```

### YAML Schema

```yaml
template: aws-ec2-instance
version: 1.0.0
variables:
  instance_type:
    type: string
    default: t3.medium
    description: EC2 instance type
```

### Terraform Variable Definitions

```hcl
variable "instance_type" {
  type        = string
  default     = "t3.medium"
  description = "EC2 instance type"
}
```

## Integration Points

- **CI/CD Pipelines**: Auto-extract patterns on code changes
- **Template Registries**: Publish extracted templates
- **Documentation Systems**: Generate docs from patterns
- **Monitoring**: Track pattern usage and evolution
- **Compliance Tools**: Validate against org standards

## Troubleshooting

### Pattern Detection Issues

**Problem**: Too many false positives
**Solution**: Increase confidence threshold, add exclusion patterns

**Problem**: Missing obvious patterns
**Solution**: Check file encoding, adjust regex patterns, enable debug mode

**Problem**: Inconsistent variable names
**Solution**: Enable smart naming normalization, use naming dictionary

### Template Generation Issues

**Problem**: Templates too generic
**Solution**: Use narrower extraction scope, increase specificity

**Problem**: Templates too specific
**Solution**: Widen extraction scope, enable cross-file consolidation

---

**Version:** 1.0.0
**Last Updated:** 2026-01-19
**Skill Type:** Analysis & Extraction
**Complexity:** Advanced
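
## Appendix: Classification Sketch

The Pattern Detection Matrix, the Repeated Values rule, and the Type Inference Priority list above describe extraction only in prose. The following is a minimal Python sketch of how those three ideas might fit together. It is not the skill's actual implementation: the function names (`infer_type`, `classify`, `repeated_values`), the `RULES` table, and the regular expressions are all hypothetical, and the region rule only recognizes AWS-style names such as `us-east-1`.

```python
import re
from collections import Counter

# Hypothetical rules loosely mirroring the Pattern Detection Matrix.
# Order matters: the first rule whose regex matches the whole value wins.
RULES = [
    ("cidr_block", re.compile(r"\d{1,3}(\.\d{1,3}){3}/\d{1,2}")),
    ("version", re.compile(r"\d+\.\d+\.\d+")),
    ("region", re.compile(r"[a-z]{2}-[a-z]+-\d")),  # AWS-style only (assumption)
    ("environment", re.compile(r"dev|staging|prod|production")),
]


def infer_type(value: str) -> str:
    """Infer a schema type from the value's format, falling back to string."""
    if value in ("true", "false"):
        return "boolean"
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    # Floats and everything else stay as string in this sketch.
    return "string"


def classify(value: str) -> str:
    """Return the name of the first matching rule, or 'generic_config'."""
    for name, pattern in RULES:
        if pattern.fullmatch(value):
            return name
    return "generic_config"


def repeated_values(files: dict[str, list[str]], min_files: int = 2) -> list[str]:
    """Return values that occur in at least `min_files` distinct files."""
    # set() ensures a value is counted once per file, not once per occurrence.
    counts = Counter(v for values in files.values() for v in set(values))
    return sorted(v for v, n in counts.items() if n >= min_files)
```

For example, `classify("us-east-1")` returns `"region"`, while `classify("westeurope")` falls through to `"generic_config"` because the sketch has no Azure region rule. A real extractor would need provider-specific rule sets and the context-aware checks listed in the matrix.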