---
name: distributed-caching
description: Expert skill for distributed cache design, implementation, and optimization using Redis and Memcached. Design cache architectures, configure eviction policies, implement caching patterns (cache-aside, write-through, write-behind), monitor cache performance, and optimize memory usage.
allowed-tools: Bash(*) Read Write Edit Glob Grep WebFetch
metadata:
  author: babysitter-sdk
  version: "1.0.0"
  category: caching
  backlog-id: SK-010
---

# distributed-caching

You are **distributed-caching** - a specialized skill for distributed cache architecture and optimization. This skill provides expert capabilities for designing, implementing, and maintaining high-performance caching layers using Redis, Memcached, and related technologies.

## Overview

This skill enables AI-powered caching operations including:

- Designing Redis data structures and access patterns
- Configuring Redis Cluster and Sentinel for high availability
- Implementing caching patterns (cache-aside, write-through, write-behind)
- Configuring eviction policies (LRU, LFU, TTL-based)
- Monitoring cache hit rates and memory usage
- Debugging cache invalidation issues
- Optimizing memory efficiency

## Prerequisites

- Redis 6.0+ (7.0+ recommended for advanced features)
- Or Memcached 1.6+
- `redis-cli` and Memcached command-line utilities
- Optional: Redis Stack for JSON, Search, and Time Series
- Optional: Redis Enterprise for production deployments

## Capabilities

### 1. Redis Data Structure Design

Design optimal data structures for use cases:

```redis
# String - Simple key-value caching
SET user:1001:profile '{"name":"John","email":"john@example.com"}' EX 3600
GET user:1001:profile

# Hash - Structured data with partial updates
HSET product:5001 name "Widget" price 29.99 stock 150
HGET product:5001 price
HINCRBY product:5001 stock -1

# Sorted Set - Leaderboards and ranking
ZADD leaderboard 1500 "player:1" 2200 "player:2" 1800 "player:3"
ZREVRANGE leaderboard 0 9 WITHSCORES  # Top 10
ZREVRANK leaderboard "player:1"       # 0-based rank, highest score first

# List - Message queues and activity feeds
LPUSH notifications:user:1001 '{"type":"order","id":"ord-123"}'
LRANGE notifications:user:1001 0 19  # Latest 20
LTRIM notifications:user:1001 0 99   # Keep only the newest 100

# Set - Tags, unique visitors, relationships
SADD product:5001:tags "electronics" "sale" "featured"
SINTER user:1001:interests product:5001:tags  # Common interests

# HyperLogLog - Cardinality estimation
PFADD daily:visitors:20260124 "user:1001" "user:1002" "guest:abc"
PFCOUNT daily:visitors:20260124

# Stream - Event sourcing and message streaming
XADD orders * action "created" order_id "ord-123" total "99.99"
XREAD COUNT 10 STREAMS orders 0
XGROUP CREATE orders order-processors $ MKSTREAM
XREADGROUP GROUP order-processors worker-1 COUNT 10 STREAMS orders >
```

### 2. Caching Patterns Implementation

Implement common caching patterns:

```python
import hashlib
import json
import time
from functools import wraps

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# NOTE: `database` and `recommendation_service` are application-specific
# placeholders; substitute your own data-access layer.

# Cache-Aside Pattern (Lazy Loading)
def get_user(user_id):
    cache_key = f"user:{user_id}"

    # Try cache first; compare against None so falsy values still count as hits
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss - fetch from database
    user = database.get_user(user_id)

    # Populate cache with TTL
    r.setex(cache_key, 3600, json.dumps(user))
    return user

# Write-Through Pattern
def update_user(user_id, data):
    cache_key = f"user:{user_id}"

    # Update database first
    database.update_user(user_id, data)

    # Update cache immediately (assumes `data` is the full record, not a patch)
    r.setex(cache_key, 3600, json.dumps(data))
    return data

# Write-Behind (Write-Back) Pattern
def update_user_async(user_id, data):
    cache_key = f"user:{user_id}"

    # Update cache immediately
    r.setex(cache_key, 3600, json.dumps(data))

    # Queue database write for a background worker to apply
    r.lpush("write_queue", json.dumps({
        "operation": "update_user",
        "user_id": user_id,
        "data": data,
        "timestamp": time.time()
    }))

# Read-Through with Cache-Aside decorator
def cached(ttl=3600, prefix="cache"):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build a key that is stable across processes. The built-in hash()
            # is randomized per interpreter for strings, so it cannot be shared.
            raw = json.dumps([args, kwargs], sort_keys=True, default=str)
            digest = hashlib.sha256(raw.encode()).hexdigest()
            key = f"{prefix}:{func.__name__}:{digest}"

            cached_value = r.get(key)
            if cached_value is not None:
                return json.loads(cached_value)

            result = func(*args, **kwargs)
            r.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached(ttl=300, prefix="products")
def get_product_recommendations(user_id, category):
    return recommendation_service.get_recommendations(user_id, category)
```

### 3. Cache Invalidation Strategies

Implement robust cache invalidation:

```python
# Reuses the `r` client defined in the previous section.

# Time-based invalidation (TTL)
r.setex("session:abc123", 1800, session_data)  # 30 minutes

# Event-driven invalidation
def on_user_updated(user_id):
    # Delete specific cache entries
    r.delete(f"user:{user_id}")
    r.delete(f"user:{user_id}:profile")

    # Delete pattern-matched keys. SCAN iterates incrementally, whereas
    # KEYS blocks the server while it walks the whole keyspace.
    keys = list(r.scan_iter(match=f"user:{user_id}:*", count=100))
    if keys:
        r.delete(*keys)

# Tag-based invalidation
def set_with_tags(key, value, ttl, tags):
    pipe = r.pipeline()
    pipe.setex(key, ttl, value)
    for tag in tags:
        pipe.sadd(f"tag:{tag}", key)
    pipe.execute()

def invalidate_by_tag(tag):
    keys = r.smembers(f"tag:{tag}")
    if keys:
        pipe = r.pipeline()
        pipe.delete(*keys)
        pipe.delete(f"tag:{tag}")
        pipe.execute()

# Version-based invalidation
def get_with_version(key, version_key):
    version = r.get(version_key) or "1"
    versioned_key = f"{key}:v{version}"
    return r.get(versioned_key)

def invalidate_version(version_key):
    # Increment the version; old versioned keys are no longer read and
    # age out through their TTL or through eviction.
    r.incr(version_key)
```

### 4. Redis Cluster Configuration

Configure Redis Cluster for scalability:

```conf
# redis-cluster.conf
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
appendfsync everysec

# Memory management
maxmemory 4gb
maxmemory-policy allkeys-lru

# Persistence
save 900 1
save 300 10
save 60 10000

# Replication
replica-read-only yes
min-replicas-to-write 1
min-replicas-max-lag 10
```

```bash
# Create cluster
redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1

# Check cluster status
redis-cli -c -p 7000 cluster info
redis-cli -c -p 7000 cluster nodes

# Rebalance slots
redis-cli --cluster rebalance 127.0.0.1:7000
```

### 5. Redis Sentinel for High Availability

Configure Sentinel for automatic failover:

```conf
# sentinel.conf
sentinel monitor mymaster 127.0.0.1 6379 2
# auth-pass requires a password argument; replace the placeholder
sentinel auth-pass mymaster <your-password>
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

# Notification scripts
sentinel notification-script mymaster /opt/redis/notify.sh
sentinel client-reconfig-script mymaster /opt/redis/reconfig.sh
```

```python
# Python client with Sentinel
from redis.sentinel import Sentinel

sentinel = Sentinel([
    ('sentinel1.example.com', 26379),
    ('sentinel2.example.com', 26379),
    ('sentinel3.example.com', 26379)
], socket_timeout=0.1)

# Get master (all writes go here)
master = sentinel.master_for('mymaster', socket_timeout=0.1)
master.set('key', 'value')

# Get replica for reads (may lag slightly behind the master)
replica = sentinel.slave_for('mymaster', socket_timeout=0.1)
value = replica.get('key')
```

### 6. Eviction Policy Configuration

Configure optimal eviction policies:

```conf
# Set exactly one maxmemory-policy per instance; the lines below are alternatives.

# LRU - Least Recently Used (general purpose)
maxmemory-policy allkeys-lru

# LFU - Least Frequently Used (hot data scenarios)
maxmemory-policy allkeys-lfu
lfu-log-factor 10
lfu-decay-time 1

# Volatile - Only evict keys with TTL
maxmemory-policy volatile-lru
maxmemory-policy volatile-lfu
maxmemory-policy volatile-ttl

# No eviction - Return errors on writes when full
maxmemory-policy noeviction
```

### 7. Cache Performance Monitoring

Monitor cache health and performance:

```bash
# Redis INFO command
redis-cli INFO stats
redis-cli INFO memory
redis-cli INFO replication
redis-cli INFO clients

# Key metrics to monitor
# - hit_rate: keyspace_hits / (keyspace_hits + keyspace_misses)
# - memory_usage: used_memory / maxmemory
# - evicted_keys: Number of keys evicted
# - connected_clients: Current client connections
# - blocked_clients: Clients waiting on blocking operations
```

```python
# Calculate cache hit rate (reuses the `r` client defined earlier)
info = r.info('stats')
hits = info['keyspace_hits']
misses = info['keyspace_misses']
hit_rate = hits / (hits + misses) * 100 if (hits + misses) > 0 else 0
print(f"Cache hit rate: {hit_rate:.2f}%")

# Memory analysis
memory_info = r.info('memory')
print(f"Used memory: {memory_info['used_memory_human']}")
print(f"Peak memory: {memory_info['used_memory_peak_human']}")
print(f"Fragmentation ratio: {memory_info['mem_fragmentation_ratio']}")
```

## MCP Server Integration

This skill can leverage the following MCP servers:

| Server | Description | Installation |
|--------|-------------|--------------|
| mcp-redis (Official) | Redis data management | [GitHub](https://github.com/redis/mcp-redis) |
| Redis Cloud Admin API | Cloud Redis management | See Redis documentation |

## Best Practices

### Cache Design

1. **Key naming conventions** - Use consistent, hierarchical naming (e.g., `entity:id:attribute`)
2. **TTL strategy** - Always set TTLs to prevent unbounded growth
3. **Serialization** - Use efficient formats (MessagePack, Protocol Buffers)
4. **Hot key handling** - Shard hot keys or use local caching

### Data Consistency

1. **Cache-aside for reads** - Safest pattern for most use cases
2. **Write-through for consistency** - When consistency is critical
3. **Eventual consistency** - Accept staleness for performance
4. **Version tagging** - Track data versions for invalidation

### Performance

1. **Pipeline commands** - Batch multiple operations into one round trip
2. **Connection pooling** - Reuse connections
3. **Avoid large values** - Keep values under 100KB
4. **Use appropriate data structures** - Hashes over JSON strings for partial updates

## Process Integration

This skill integrates with the following processes:

- `caching-strategy-design.js` - Cache architecture planning
- Application-level cache optimization workflows
- Performance tuning recommendations

## Output Format

When executing operations, provide structured output:

```json
{
  "operation": "analyze-cache",
  "status": "success",
  "metrics": {
    "hitRate": 94.5,
    "missRate": 5.5,
    "evictionRate": 0.02,
    "memoryUsage": {
      "used": "3.2GB",
      "peak": "3.8GB",
      "maxmemory": "4GB",
      "utilizationPercent": 80
    },
    "connections": {
      "current": 45,
      "blocked": 0,
      "maxClients": 10000
    }
  },
  "recommendations": [
    {
      "category": "memory",
      "issue": "High memory utilization",
      "action": "Consider increasing maxmemory or enabling LFU eviction",
      "priority": "medium"
    }
  ]
}
```

## Error Handling

### Common Issues

| Error | Cause | Resolution |
|-------|-------|------------|
| `OOM command not allowed` | Memory limit reached | Increase `maxmemory` or enable an eviction policy |
| `CLUSTERDOWN` | Cluster not available | Check cluster health and that a majority of master nodes is reachable |
| `MOVED` | Key is served by a different node | Use a cluster-aware client |
| `BUSY` | Lua script running | Wait, or kill the script with `SCRIPT KILL` |
| `LOADING` | Redis is loading the dataset from disk | Wait for the load to complete |

## Constraints

- Monitor memory usage to prevent OOM conditions
- Use connection pooling in applications
- Implement circuit breakers for cache unavailability
- Test cache invalidation thoroughly
- Consider cache stampede prevention
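
The stampede-prevention constraint can be illustrated with a minimal single-flight sketch: when many callers miss on the same key at once, only one of them runs the expensive loader and the rest wait and reuse its result. To stay runnable without a Redis server, the sketch uses an in-process dictionary as a stand-in for the cache; `SingleFlightCache`, `get_or_load`, and `demo` are hypothetical names introduced here, not part of any library. Across multiple processes, the per-key lock would typically be replaced by a short-lived Redis lock (for example `SET lock:<key> <token> NX EX <seconds>`), which this sketch does not implement.

```python
import threading
import time


class SingleFlightCache:
    """In-process stand-in for a cache with per-key single-flight loading."""

    def __init__(self):
        self._data = {}        # key -> (value, expires_at)
        self._key_locks = {}   # key -> lock serializing regeneration of that key
        self._guard = threading.Lock()  # protects the two dicts above

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _get_fresh(self, key):
        with self._guard:
            entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get_or_load(self, key, loader, ttl=60.0):
        entry = self._get_fresh(key)
        if entry is not None:
            return entry[0]

        # Miss: only one caller per key runs the loader; the others block here.
        with self._lock_for(key):
            # Re-check, since another thread may have filled the entry
            # while this one was waiting for the lock.
            entry = self._get_fresh(key)
            if entry is not None:
                return entry[0]

            value = loader()
            with self._guard:
                self._data[key] = (value, time.monotonic() + ttl)
            return value


def demo(num_threads=8):
    """Fire concurrent misses at one key and count how often the loader runs."""
    cache = SingleFlightCache()
    calls = []
    results = []

    def slow_loader():
        calls.append(1)      # record each real "database" hit
        time.sleep(0.05)     # simulate an expensive query
        return {"id": 1001, "name": "John"}

    def worker():
        results.append(cache.get_or_load("user:1001", slow_loader))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results


if __name__ == "__main__":
    loads, results = demo()
    print(f"loader ran {loads} time(s) for {len(results)} concurrent requests")
```

The re-check inside the lock is the important step: without it, every waiting caller would run the loader in turn once the lock is released, which serializes the stampede instead of preventing it.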