# Analytics Integration Plan - Weather MCP Server **Version:** 1.0 **Date:** 2025-11-10 **Status:** Planning Phase ## Executive Summary This document outlines a privacy-first approach to adding optional, anonymous analytics to the Weather MCP Server. The analytics system is designed to help improve the product by understanding tool usage patterns and identifying errors, while strictly protecting user privacy. **Core Principles:** - **Opt-in by default** - Analytics OFF unless explicitly enabled - **Privacy-first** - No PII collection, anonymization at source - **Transparent** - Clear documentation of what's collected - **Minimal** - Only collect what's necessary to improve the product - **Compliant** - GDPR and CCPA compliant design --- ## 1. Design Philosophy ### 1.1 Privacy Principles 1. **Data Minimization**: Collect only what's necessary 2. **Anonymization at Source**: No user identifiers, no exact coordinates 3. **Aggregation Over Individual Events**: Prefer counts over individual records 4. **Transparent Collection**: Users know exactly what's being sent 5. **Easy Opt-Out**: Simple environment variable to disable 6. **No Retroactive Changes**: Analytics scope frozen per major version ### 1.2 Legal Compliance - **GDPR Compliance**: No personal data, no consent needed (recital 26) - **CCPA Compliance**: Anonymous data not subject to CCPA - **Data Protection**: No data can be linked to individuals - **Right to Erasure**: N/A (no personal data stored) ### 1.3 User Benefits Users benefit from analytics through: - **Better Error Detection**: Identify and fix issues faster - **Feature Prioritization**: Improve most-used features - **Performance Optimization**: Optimize common workflows - **API Reliability**: Monitor service health across regions - **Documentation**: Real-world usage informs better docs --- ## 2. Analytics Levels ### 2.1 Configuration Options Users control analytics via `.env` file: ```bash # Disable analytics completely (DEFAULT) ANALYTICS_ENABLED=false # Enable with minimal data collection ANALYTICS_ENABLED=true ANALYTICS_LEVEL=minimal # Enable with standard metrics ANALYTICS_ENABLED=true ANALYTICS_LEVEL=standard # Enable with detailed workflow analysis ANALYTICS_ENABLED=true ANALYTICS_LEVEL=detailed ``` ### 2.2 Level Definitions #### Minimal Level **Purpose**: Basic usage statistics and critical error tracking **Data Collected:** - Tool name (e.g., "get_forecast", "get_alerts") - Success/failure status - Error type (anonymized error class name) - MCP server version - Timestamp (UTC, rounded to nearest hour) **What's NOT collected:** - Coordinates, location names, or any geographic data - User identifiers, session IDs, or device info - API response data or weather information - Exact timestamps **Example Event:** ```json { "version": "1.6.1", "tool": "get_forecast", "status": "success", "timestamp_hour": "2025-11-10T14:00:00Z", "analytics_level": "minimal" } ``` #### Standard Level **Purpose**: Performance monitoring and service health **Includes Minimal Level + :** - Response time (in milliseconds) - API service used (NOAA/Open-Meteo) - Cache hit/miss - Retry count (if applicable) - Country code (derived from first API call only, ISO 3166-1 alpha-2) **Example Event:** ```json { "version": "1.6.1", "tool": "get_forecast", "status": "success", "timestamp_hour": "2025-11-10T14:00:00Z", "response_time_ms": 245, "service": "noaa", "cache_hit": false, "country": "US", "analytics_level": "standard" } ``` #### Detailed Level **Purpose**: Workflow understanding and UX improvement **Includes Standard Level + :** - Tool parameters (anonymized): - Forecast days requested (not location) - Granularity (daily/hourly) - Data sources requested - Boolean flags (e.g., include_normals) - Tool call sequences (workflow patterns) - Session duration (time between first and last tool call) **Example Event:** ```json { "version": "1.6.1", "tool": "get_forecast", "status": "success", "timestamp_hour": "2025-11-10T14:00:00Z", "response_time_ms": 245, "service": "noaa", "cache_hit": false, "country": "US", "parameters": { "days": 7, "granularity": "daily", "source": "auto" }, "session_id": "abc123-hash", // Anonymous session hash "sequence_number": 2, "analytics_level": "detailed" } ``` **Privacy Notes:** - Session ID is a one-way hash with salt (cannot be reversed) - No geographic coordinates ever sent - Location names stripped before analysis - Sessions expire after 1 hour of inactivity --- ## 3. Technical Implementation ### 3.1 Architecture ``` ┌─────────────────────────────────────────┐ │ MCP Server (Weather Tools) │ │ │ │ ┌───────────────────────────────────┐ │ │ │ Tool Handlers │ │ │ │ (forecastHandler, alertsHandler) │ │ │ └───────────┬───────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────┐ │ │ │ Analytics Middleware │ │ │ │ - Capture events │ │ │ │ - Anonymize data │ │ │ │ - Respect privacy level │ │ │ └───────────┬───────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────┐ │ │ │ Analytics Buffer (In-Memory) │ │ │ │ - Batch events (max 100) │ │ │ │ - Flush every 5 minutes │ │ │ │ - Graceful shutdown flush │ │ │ └───────────┬───────────────────────┘ │ │ │ │ └──────────────┼─────────────────────────┘ │ │ HTTPS POST ▼ ┌─────────────────────────────────────────┐ │ Analytics Collection Server │ │ (Separate project, see plan doc) │ └─────────────────────────────────────────┘ ``` ### 3.2 File Structure ``` src/ ├── analytics/ │ ├── index.ts # Main analytics interface │ ├── collector.ts # Event collection and buffering │ ├── anonymizer.ts # Data anonymization utilities │ ├── transport.ts # HTTPS transport to collection server │ └── types.ts # Analytics event types ├── middleware/ │ └── analyticsMiddleware.ts # Wraps tool handlers └── config/ └── analytics.ts # Analytics configuration ``` ### 3.3 Core Components #### 3.3.1 Analytics Collector (`src/analytics/collector.ts`) ```typescript import { EventEmitter } from 'events'; import { logger } from '../utils/logger.js'; import { AnalyticsEvent, AnalyticsConfig } from './types.js'; import { anonymizeEvent } from './anonymizer.js'; import { sendBatch } from './transport.js'; export class AnalyticsCollector extends EventEmitter { private buffer: AnalyticsEvent[] = []; private config: AnalyticsConfig; private flushInterval: NodeJS.Timeout | null = null; private readonly MAX_BUFFER_SIZE = 100; private readonly FLUSH_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes constructor(config: AnalyticsConfig) { super(); this.config = config; if (this.config.enabled) { this.startFlushTimer(); this.setupShutdownHandlers(); } } async trackToolCall( toolName: string, status: 'success' | 'error', metadata: Record = {} ): Promise { if (!this.config.enabled) return; const event = anonymizeEvent({ version: this.config.version, tool: toolName, status, timestamp_hour: this.roundToHour(new Date()), analytics_level: this.config.level, ...metadata }, this.config.level); this.buffer.push(event); if (this.buffer.length >= this.MAX_BUFFER_SIZE) { await this.flush(); } } private async flush(): Promise { if (this.buffer.length === 0) return; const batch = [...this.buffer]; this.buffer = []; try { await sendBatch(batch, this.config.endpoint); logger.info('Analytics batch sent', { count: batch.length, level: this.config.level }); } catch (error) { logger.error('Failed to send analytics batch', { error: error instanceof Error ? error.message : 'Unknown error', count: batch.length }); // Fail silently - analytics should never break the app } } private startFlushTimer(): void { this.flushInterval = setInterval(() => { this.flush().catch(err => { logger.error('Flush timer error', { error: err.message }); }); }, this.FLUSH_INTERVAL_MS); } private setupShutdownHandlers(): void { const shutdown = async () => { if (this.flushInterval) { clearInterval(this.flushInterval); } await this.flush(); }; process.on('SIGTERM', shutdown); process.on('SIGINT', shutdown); } private roundToHour(date: Date): string { const rounded = new Date(date); rounded.setMinutes(0, 0, 0); return rounded.toISOString(); } } ``` #### 3.3.2 Anonymization (`src/analytics/anonymizer.ts`) ```typescript import crypto from 'crypto'; import { AnalyticsEvent, AnalyticsLevel } from './types.js'; export function anonymizeEvent( event: AnalyticsEvent, level: AnalyticsLevel ): AnalyticsEvent { const baseEvent = { version: event.version, tool: event.tool, status: event.status, timestamp_hour: event.timestamp_hour, analytics_level: level }; if (level === 'minimal') { return baseEvent; } const standardEvent = { ...baseEvent, response_time_ms: event.response_time_ms, service: event.service, cache_hit: event.cache_hit, country: event.country // Already anonymized to country level }; if (level === 'standard') { return standardEvent; } // Detailed level return { ...standardEvent, parameters: sanitizeParameters(event.parameters), session_id: hashSessionId(event.session_id), sequence_number: event.sequence_number }; } function sanitizeParameters(params: Record | undefined): Record | undefined { if (!params) return undefined; const safe: Record = {}; // Allowlist of safe parameters const allowedParams = [ 'days', 'granularity', 'source', 'forecast', 'include_normals', 'include_fire_weather', 'include_severe_weather', 'active_only' ]; for (const key of allowedParams) { if (params[key] !== undefined) { safe[key] = params[key]; } } // Never include: latitude, longitude, location names, user input return safe; } function hashSessionId(sessionId: string | undefined): string | undefined { if (!sessionId) return undefined; // One-way hash with server-specific salt const salt = process.env.ANALYTICS_SALT || 'default-salt-change-me'; return crypto .createHash('sha256') .update(sessionId + salt) .digest('hex') .substring(0, 16); // Shortened for storage } export function getCountryFromCoordinates(lat: number, lon: number): string | undefined { // Simple country detection based on first API call // This is approximate and privacy-preserving // US: roughly 25-49°N, 125-66°W if (lat >= 25 && lat <= 49 && lon >= -125 && lon <= -66) { return 'US'; } // Only detect major regions, not exact countries // This prevents user tracking while providing useful regional data return 'OTHER'; // Intentionally vague for privacy } ``` #### 3.3.3 Transport (`src/analytics/transport.ts`) ```typescript import https from 'https'; import { logger } from '../utils/logger.js'; import { AnalyticsEvent } from './types.js'; export async function sendBatch( events: AnalyticsEvent[], endpoint: string ): Promise { return new Promise((resolve, reject) => { const data = JSON.stringify({ events }); const url = new URL(endpoint); const options = { hostname: url.hostname, port: url.port || 443, path: url.pathname, method: 'POST', headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(data), 'User-Agent': `weather-mcp/${events[0]?.version || 'unknown'}` }, timeout: 5000 // 5 second timeout }; const req = https.request(options, (res) => { // Consume response data to free memory res.on('data', () => {}); res.on('end', () => { if (res.statusCode && res.statusCode >= 200 && res.statusCode < 300) { resolve(); } else { reject(new Error(`HTTP ${res.statusCode}`)); } }); }); req.on('error', reject); req.on('timeout', () => { req.destroy(); reject(new Error('Request timeout')); }); req.write(data); req.end(); }); } ``` #### 3.3.4 Middleware Integration (`src/middleware/analyticsMiddleware.ts`) ```typescript import { AnalyticsCollector } from '../analytics/collector.js'; export function wrapHandlerWithAnalytics( toolName: string, handler: (...args: any[]) => Promise, collector: AnalyticsCollector ) { return async (...args: any[]) => { const startTime = Date.now(); try { const result = await handler(...args); const metadata = { response_time_ms: Date.now() - startTime, // Extract metadata from result/args based on analytics level }; await collector.trackToolCall(toolName, 'success', metadata); return result; } catch (error) { const metadata = { response_time_ms: Date.now() - startTime, error_type: error instanceof Error ? error.constructor.name : 'Unknown' }; await collector.trackToolCall(toolName, 'error', metadata); throw error; // Re-throw to maintain normal error handling } }; } ``` ### 3.4 Configuration (`src/config/analytics.ts`) ```typescript import { AnalyticsLevel } from '../analytics/types.js'; export interface AnalyticsConfig { enabled: boolean; level: AnalyticsLevel; endpoint: string; version: string; } export function getAnalyticsConfig(): AnalyticsConfig { const enabled = process.env.ANALYTICS_ENABLED === 'true'; const level = (process.env.ANALYTICS_LEVEL as AnalyticsLevel) || 'minimal'; const endpoint = process.env.ANALYTICS_ENDPOINT || 'https://analytics.weather-mcp.example.com/v1/events'; // Validate level if (!['minimal', 'standard', 'detailed'].includes(level)) { throw new Error(`Invalid ANALYTICS_LEVEL: ${level}. Must be minimal, standard, or detailed.`); } return { enabled, level, endpoint, version: process.env.npm_package_version || 'unknown' }; } ``` ### 3.5 Integration Points #### Modify `src/index.ts`: ```typescript import { AnalyticsCollector } from './analytics/collector.js'; import { getAnalyticsConfig } from './config/analytics.js'; import { wrapHandlerWithAnalytics } from './middleware/analyticsMiddleware.js'; // Initialize analytics const analyticsConfig = getAnalyticsConfig(); const analyticsCollector = new AnalyticsCollector(analyticsConfig); // Log analytics status on startup if (analyticsConfig.enabled) { logger.info('Analytics enabled', { level: analyticsConfig.level, endpoint: analyticsConfig.endpoint }); } else { logger.info('Analytics disabled'); } // Wrap handlers with analytics const wrappedForecastHandler = wrapHandlerWithAnalytics( 'get_forecast', forecastHandler, analyticsCollector ); // Use wrappedForecastHandler in CallToolRequestSchema handler ``` --- ## 4. Security Considerations ### 4.1 Data Security 1. **HTTPS Only**: All analytics sent over TLS 1.2+ 2. **No Authentication**: Truly anonymous (no API keys, tokens, or user IDs) 3. **Input Validation**: All events validated before sending 4. **Rate Limiting**: Client-side throttling to prevent abuse 5. **Timeout Protection**: 5-second timeout on analytics requests ### 4.2 Privacy Protection 1. **No IP Logging**: Collection server must not log client IPs 2. **No User Tracking**: Session IDs are hashed and short-lived 3. **Data Minimization**: Only collect what's documented 4. **No Correlation**: Cannot link events to specific users 5. **Open Source**: Analytics code is auditable ### 4.3 Failure Handling 1. **Silent Failures**: Analytics errors never break the MCP server 2. **No Retries**: Failed batches are discarded (not queued) 3. **Circuit Breaker**: Disable analytics if failures exceed threshold 4. **Graceful Degradation**: MCP server works perfectly with analytics disabled --- ## 5. Testing Strategy ### 5.1 Unit Tests **File:** `tests/unit/analytics.test.ts` ```typescript import { describe, it, expect, beforeEach } from 'vitest'; import { AnalyticsCollector } from '../../src/analytics/collector.js'; import { anonymizeEvent } from '../../src/analytics/anonymizer.js'; describe('Analytics Collector', () => { it('should not collect when disabled', async () => { const collector = new AnalyticsCollector({ enabled: false, level: 'minimal' }); await collector.trackToolCall('test', 'success'); // Verify no network calls made }); it('should respect analytics level', () => { const event = { tool: 'get_forecast', status: 'success', response_time_ms: 100, coordinates: { lat: 40.7, lon: -74.0 } // PII }; const minimal = anonymizeEvent(event, 'minimal'); expect(minimal.response_time_ms).toBeUndefined(); expect(minimal.coordinates).toBeUndefined(); }); it('should hash session IDs', () => { const event = { session_id: 'user-session-123' }; const hashed = anonymizeEvent(event, 'detailed'); expect(hashed.session_id).not.toBe('user-session-123'); expect(hashed.session_id).toHaveLength(16); }); it('should round timestamps to hour', () => { const collector = new AnalyticsCollector({ enabled: true, level: 'minimal' }); const timestamp = collector['roundToHour'](new Date('2025-11-10T14:35:42.123Z')); expect(timestamp).toBe('2025-11-10T14:00:00.000Z'); }); }); ``` ### 5.2 Integration Tests **File:** `tests/integration/analytics.test.ts` ```typescript import { describe, it, expect, beforeEach, afterEach } from 'vitest'; import { createServer } from 'http'; describe('Analytics Integration', () => { let mockServer: any; beforeEach(() => { // Start mock analytics server mockServer = createServer((req, res) => { let body = ''; req.on('data', chunk => body += chunk); req.on('end', () => { const data = JSON.parse(body); // Verify no PII in events for (const event of data.events) { expect(event.latitude).toBeUndefined(); expect(event.longitude).toBeUndefined(); expect(event.user_id).toBeUndefined(); } res.writeHead(200); res.end(); }); }); }); afterEach(() => { mockServer.close(); }); it('should send batched events', async () => { // Test batch sending logic }); }); ``` ### 5.3 Privacy Tests **File:** `tests/unit/privacy.test.ts` ```typescript describe('Privacy Compliance', () => { it('should never send coordinates', () => { // Test all code paths to ensure coordinates are stripped }); it('should never send location names', () => { // Test parameter sanitization }); it('should anonymize all events', () => { // Test anonymization for all analytics levels }); it('should use one-way hashing for session IDs', () => { // Verify hash cannot be reversed }); }); ``` --- ## 6. Documentation Updates ### 6.1 README.md Add new section: ```markdown ## Analytics (Optional) Weather MCP Server supports **optional, anonymous analytics** to help improve the product. Analytics are **disabled by default** and must be explicitly enabled. ### What's Collected? Analytics are completely anonymous and respect your privacy: - **Minimal Level**: Tool usage counts and error types - **Standard Level**: Performance metrics and API service selection - **Detailed Level**: Anonymized workflow patterns **Never Collected:** - Exact coordinates or location names - User identifiers or personal information - Weather data or API responses - Anything that could identify you ### Enabling Analytics Add to your `.env` file: ```bash # Enable minimal analytics (recommended) ANALYTICS_ENABLED=true ANALYTICS_LEVEL=minimal # Or choose standard/detailed ANALYTICS_LEVEL=standard ``` See [Analytics Privacy Policy](docs/ANALYTICS_PRIVACY.md) for details. ### Why Enable Analytics? - Help identify and fix bugs faster - Improve the most-used features - Optimize performance - Inform better documentation Your privacy is paramount. All analytics code is open source and auditable. ``` ### 6.2 New Document: `docs/ANALYTICS_PRIVACY.md` ```markdown # Analytics Privacy Policy ## Overview Weather MCP Server collects anonymous usage statistics to improve the product. Analytics are: - **Opt-in**: Disabled by default - **Anonymous**: No personal data collected - **Transparent**: Exactly what's collected is documented - **Auditable**: All code is open source ## What We Collect [Detailed breakdown of each analytics level with examples] ## What We Don't Collect - Exact coordinates or locations - User identifiers, API keys, or tokens - IP addresses or device information - Weather data or API responses - Anything that could identify you ## Data Retention - Events: 90 days - Aggregated metrics: 2 years - No user profiles or histories ## Your Rights - You can disable analytics anytime - No data is linked to you personally - GDPR/CCPA compliant (anonymous data) ## Questions? Contact: privacy@weather-mcp.example.com ``` ### 6.3 CHANGELOG.md ```markdown ## [1.7.0] - TBD ### Added - Optional anonymous analytics (opt-in via .env) - Three analytics levels: minimal, standard, detailed - Privacy-first design with no PII collection - Analytics privacy policy documentation ### Changed - None (analytics are purely additive) ### Security - All analytics sent over HTTPS - Client-side anonymization before transmission - No user tracking or identification possible ``` --- ## 7. Rollout Plan ### Phase 1: Development (Week 1-2) - [ ] Implement core analytics infrastructure - [ ] Write comprehensive unit tests - [ ] Implement anonymization utilities - [ ] Create privacy tests ### Phase 2: Integration (Week 3) - [ ] Integrate middleware with all tool handlers - [ ] Implement batching and buffering - [ ] Add configuration validation - [ ] Write integration tests ### Phase 3: Documentation (Week 4) - [ ] Update README.md - [ ] Create ANALYTICS_PRIVACY.md - [ ] Update .env.example - [ ] Create migration guide ### Phase 4: Testing (Week 5) - [ ] Internal testing with all analytics levels - [ ] Privacy audit (verify no PII leaks) - [ ] Performance testing (ensure no impact) - [ ] Security review ### Phase 5: Soft Launch (Week 6) - [ ] Release v1.7.0-beta with analytics - [ ] Monitor for issues - [ ] Gather feedback - [ ] Iterate on implementation ### Phase 6: General Availability (Week 7+) - [ ] Release v1.7.0 stable - [ ] Announce in README and GitHub - [ ] Create blog post explaining benefits - [ ] Monitor adoption and feedback --- ## 8. Success Metrics ### For Users - **Transparency Score**: 100% of data collection documented - **Privacy Score**: 0 PII incidents - **Reliability**: Analytics never cause MCP server failures - **Performance**: < 1ms average analytics overhead ### For Development - **Adoption Rate**: Target 20% of users opt-in within 3 months - **Data Quality**: < 5% malformed events - **Insights Generated**: 10+ product improvements from analytics - **Bug Detection**: 30% faster issue identification --- ## 9. Open Questions 1. **Analytics Endpoint**: What domain/subdomain to use? - Suggestion: `analytics.weather-mcp.com` 2. **Public Dashboard**: Should we show aggregated stats publicly? - Recommendation: Yes, builds trust 3. **Opt-in Prompt**: Should we prompt users to enable analytics? - Recommendation: No, keep it purely .env based 4. **Analytics Salt**: How to generate/distribute? - Recommendation: Random per installation, auto-generated 5. **Collection Server Hosting**: Where to host? - See separate plan document --- ## 10. Risk Assessment ### High Risk - **Privacy Breach**: Accidentally collect PII - Mitigation: Comprehensive privacy tests, code review ### Medium Risk - **Performance Impact**: Analytics slow down MCP server - Mitigation: Async batching, silent failures - **User Backlash**: Users distrust analytics - Mitigation: Opt-in, transparency, open source ### Low Risk - **Collection Server Downtime**: Analytics unavailable - Mitigation: Fail silently, doesn't affect MCP server --- ## 11. Alternatives Considered ### Alternative 1: Use Existing Analytics Service (Rejected) **Options**: Google Analytics, Mixpanel, Segment **Rejected because**: - Not privacy-focused (they track users) - Requires user identifiers - Data stored on third-party servers - Not GDPR compliant for MCP use case - Loss of control over data ### Alternative 2: No Analytics (Rejected) **Rejected because**: - Can't identify user pain points - Slower bug detection - No data-driven feature prioritization - Community can't see project health ### Alternative 3: Local-Only Analytics (Rejected) **Option**: Store analytics locally, users manually share **Rejected because**: - Low adoption (users won't share) - Biased data (only motivated users) - No real-time insights --- ## 12. Future Enhancements ### v1.8.0 Possibilities - Public analytics dashboard - Anonymous performance comparisons (your install vs. average) - Opt-in crash reporting with stack traces - A/B testing framework for new features ### v2.0.0 Possibilities - Differential privacy implementation - Homomorphic encryption for sensitive metrics - Federated learning for error prediction - Zero-knowledge proofs for data integrity --- ## Conclusion This analytics implementation balances product improvement needs with strict privacy protection. By making analytics opt-in, transparent, and truly anonymous, we can build user trust while gaining valuable insights to improve Weather MCP Server. **Next Steps:** 1. Review this plan with community 2. Implement Phase 1 (core infrastructure) 3. Create analytics collection server (see separate plan) 4. Proceed through rollout phases **Key Principle**: When in doubt, protect privacy. Analytics should never compromise user trust. --- **Document Version**: 1.0 **Last Updated**: 2025-11-10 **Review Date**: 2025-12-10 (1 month)