--- plugin: seo updated: 2026-01-20 name: data-extraction-patterns description: Common patterns for extracting analytics data from GA4 and GSC with API handling --- plugin: seo updated: 2026-01-20 # Data Extraction Patterns ## When to Use - Setting up analytics data pipelines - Combining data from multiple sources - Handling API rate limits and errors - Caching frequently accessed data - Building data collection workflows ## API Reference ### Google Analytics 4 (GA4) **MCP Server**: `mcp-server-google-analytics` **Key Operations**: ``` get_report({ propertyId: "properties/123456789", dateRange: { startDate: "30daysAgo", endDate: "today" }, dimensions: ["pagePath", "date"], metrics: ["screenPageViews", "averageSessionDuration", "bounceRate"] }) ``` **Useful Metrics**: | Metric | Description | Use Case | |--------|-------------|----------| | screenPageViews | Total page views | Traffic volume | | sessions | User sessions | Visitor count | | averageSessionDuration | Avg time in session | Engagement | | bounceRate | Single-page visits | Content quality | | engagementRate | Engaged sessions % | True engagement | | scrolledUsers | Users who scrolled | Content consumption | **Useful Dimensions**: | Dimension | Description | |-----------|-------------| | pagePath | URL path | | date | Date (for trending) | | sessionSource | Traffic source | | deviceCategory | Desktop/mobile/tablet | ### Google Search Console (GSC) **MCP Server**: `mcp-server-gsc` **Key Operations**: ``` search_analytics({ siteUrl: "https://example.com", startDate: "2025-11-27", endDate: "2025-12-27", dimensions: ["query", "page"], rowLimit: 1000 }) get_url_inspection({ siteUrl: "https://example.com", inspectionUrl: "https://example.com/page" }) ``` **Available Metrics**: | Metric | Description | Use Case | |--------|-------------|----------| | clicks | Total clicks from search | Traffic from Google | | impressions | Times shown in results | Visibility | | ctr | Click-through rate | Snippet effectiveness | | position | Average ranking | SEO success | **Dimensions**: | Dimension | Description | |-----------|-------------| | query | Search query | | page | Landing page URL | | country | User country | | device | Desktop/mobile/tablet | | date | Date (for trending) | ## Parallel Execution Pattern ### Optimal Data Fetch (All Sources) ```markdown ## Parallel Data Fetch Pattern When fetching from multiple sources, issue all requests in a SINGLE message for parallel execution: ┌─────────────────────────────────────────────────────────────────┐ │ MESSAGE 1: Parallel Data Requests │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ [MCP Call 1]: google-analytics.get_report(...) │ │ [MCP Call 2]: google-search-console.search_analytics(...) │ │ │ │ → All execute simultaneously │ │ → Results return when all complete │ │ → ~2x faster than sequential │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Sequential (When Needed) Some operations require sequential execution: ```markdown ## Sequential Pattern (Dependencies) When one request depends on another's result: ┌─────────────────────────────────────────────────────────────────┐ │ MESSAGE 1: Get list of pages │ │ → Returns: ["/page1", "/page2", "/page3"] │ ├─────────────────────────────────────────────────────────────────┤ │ MESSAGE 2: Get details for each page │ │ → Uses page list from Message 1 │ │ → Can parallelize within this message │ └─────────────────────────────────────────────────────────────────┘ ``` ## Rate Limiting ### API Rate Limits | API | Limit | Strategy | |-----|-------|----------| | GA4 | 10 QPS per property | Batch dimensions | | GSC | 1,200 requests/min | Paginate large exports | ### Retry Pattern ```bash #!/bin/bash # Retry with exponential backoff MAX_RETRIES=3 RETRY_DELAY=5 fetch_with_retry() { local url="$1" local attempt=1 while [ $attempt -le $MAX_RETRIES ]; do response=$(curl -s -w "%{http_code}" -o /tmp/response.json "$url") http_code="${response: -3}" if [ "$http_code" = "200" ]; then cat /tmp/response.json return 0 elif [ "$http_code" = "429" ]; then echo "Rate limited, waiting ${RETRY_DELAY}s..." >&2 sleep $RETRY_DELAY RETRY_DELAY=$((RETRY_DELAY * 2)) else echo "Error: HTTP $http_code" >&2 return 1 fi attempt=$((attempt + 1)) done echo "Max retries exceeded" >&2 return 1 } ``` ## Caching Pattern ### Session-Based Cache ```bash # Cache structure SESSION_PATH="/tmp/seo-performance-20251227-143000-example" CACHE_DIR="${SESSION_PATH}/cache" CACHE_TTL=3600 # 1 hour in seconds mkdir -p "$CACHE_DIR" # Cache key generation cache_key() { echo "$1" | md5sum | cut -d' ' -f1 } # Check cache get_cached() { local key=$(cache_key "$1") local cache_file="${CACHE_DIR}/${key}.json" if [ -f "$cache_file" ]; then local age=$(($(date +%s) - $(stat -f%m "$cache_file" 2>/dev/null || stat -c%Y "$cache_file"))) if [ $age -lt $CACHE_TTL ]; then cat "$cache_file" return 0 fi fi return 1 } # Save to cache save_cache() { local key=$(cache_key "$1") local cache_file="${CACHE_DIR}/${key}.json" cat > "$cache_file" } # Usage CACHE_KEY="ga4_${URL}_${DATE_RANGE}" if ! RESULT=$(get_cached "$CACHE_KEY"); then RESULT=$(fetch_from_api) echo "$RESULT" | save_cache "$CACHE_KEY" fi ``` ## Date Range Standardization ### Common Date Ranges ```bash # Standard date range calculations TODAY=$(date +%Y-%m-%d) case "$RANGE" in "7d") START_DATE=$(date -v-7d +%Y-%m-%d 2>/dev/null || date -d "7 days ago" +%Y-%m-%d) ;; "30d") START_DATE=$(date -v-30d +%Y-%m-%d 2>/dev/null || date -d "30 days ago" +%Y-%m-%d) ;; "90d") START_DATE=$(date -v-90d +%Y-%m-%d 2>/dev/null || date -d "90 days ago" +%Y-%m-%d) ;; "mtd") START_DATE=$(date +%Y-%m-01) ;; "ytd") START_DATE=$(date +%Y-01-01) ;; esac END_DATE="$TODAY" ``` ### API-Specific Formats | API | Format | Example | |-----|--------|---------| | GA4 | Relative or ISO | "30daysAgo", "2025-12-01" | | GSC | ISO 8601 | "2025-12-01" | ## Graceful Degradation ### Data Source Fallback ```markdown ## Fallback Strategy When a data source is unavailable: ┌─────────────────────────────────────────────────────────────────┐ │ PRIMARY SOURCE │ FALLBACK │ LAST RESORT │ ├──────────────────────┼─────────────────────┼────────────────────┤ │ GA4 traffic data │ GSC clicks │ Estimate from GSC │ │ GSC search perf │ Manual SERP check │ WebSearch SERP │ │ CWV (CrUX) │ PageSpeed API │ Lighthouse CLI │ └──────────────────────┴─────────────────────┴────────────────────┘ ``` ### Partial Data Output ```markdown ## Analysis Report (Partial Data) ### Data Availability | Source | Status | Impact | |--------|--------|--------| | GA4 | NOT CONFIGURED | Missing engagement metrics | | GSC | AVAILABLE | Full search data | ### Analysis Notes This analysis is based on limited data sources: - Search performance metrics are complete (GSC) - Engagement metrics unavailable (no GA4) **Recommendation**: Configure GA4 for complete analysis. Run `/setup-analytics` to add Google Analytics. ``` ## Unified Data Model ### Combined Output Structure ```json { "metadata": { "url": "https://example.com/page", "fetchedAt": "2025-12-27T14:30:00Z", "dateRange": { "start": "2025-11-27", "end": "2025-12-27" } }, "sources": { "ga4": { "available": true, "metrics": { "pageViews": 2450, "avgTimeOnPage": 222, "bounceRate": 38.2, "engagementRate": 64.5 } }, "gsc": { "available": true, "metrics": { "impressions": 15200, "clicks": 428, "ctr": 2.82, "avgPosition": 4.2 }, "topQueries": [ {"query": "seo guide", "clicks": 156, "position": 4} ] } }, "computed": { "healthScore": 72, "status": "GOOD" } } ``` ## Error Handling ### Common Errors | Error | Cause | Resolution | |-------|-------|------------| | 401 Unauthorized | Invalid/expired credentials | Re-run /setup-analytics | | 403 Forbidden | Missing permissions | Check API access in console | | 429 Too Many Requests | Rate limit | Wait and retry with backoff | | 404 Not Found | Invalid property/site | Verify IDs in configuration | | 500 Server Error | API issue | Retry later, check status page | ### Error Output Pattern ```markdown ## Data Fetch Error **Source**: Google Analytics 4 **Error**: 403 Forbidden **Message**: "User does not have sufficient permissions for this property" ### Troubleshooting Steps 1. Verify Service Account email in GA4 Admin 2. Ensure "Viewer" role is granted 3. Check Analytics Data API is enabled 4. Wait 5 minutes for permission propagation ### Workaround Proceeding with available data sources (GSC). GA4 engagement metrics will not be included in this analysis. ```