--- name: extracting-session-data description: Locates, lists, filters, and extracts structured data from Claude Code native session logs. Supports both single and multiple session analysis. --- # Extracting Session Data Skill ## Core Responsibility Provide raw access to Claude Code session logs stored in `~/.claude/projects/{project-dir}/{session-id}.jsonl`. **Key Principle**: This skill extracts data only - return raw data to calling skills for analysis. Do not analyze or interpret within this skill. ## Available Scripts All scripts located in `scripts/` subdirectory relative to this skill. ### 1. locate-logs.sh Find log directory or specific session file path. ```bash # Get logs directory for current working directory scripts/locate-logs.sh # Get logs directory for specific project scripts/locate-logs.sh /path/to/project # Get specific session log file path scripts/locate-logs.sh /path/to/project abc123-session-id ``` **Use when**: Building dynamic paths, verifying logs exist before processing. ### 2. list-sessions.sh Enumerate all sessions with metadata (ID, size, lines, date, branch). ```bash # List all sessions (table format) scripts/list-sessions.sh # JSON output scripts/list-sessions.sh --format json # Sort by size or lines scripts/list-sessions.sh --sort size scripts/list-sessions.sh --sort lines # Specific project scripts/list-sessions.sh /path/to/project ``` **Output formats**: `table`, `json`, `csv` **Sort options**: `date`, `size`, `lines` **Use when**: Starting retrospective, showing available sessions to user, checking for recent sessions. ### 3. extract-data.sh Parse JSONL logs and extract specific data types. **Available extraction types**: - `metadata` - Session info (ID, timestamps, branch, working dir) - `user-prompts` - All user messages - `tool-usage` - Tool call statistics - `errors` - Failed tool calls with timestamps - `thinking` - Thinking blocks (if extended thinking enabled) - `text-responses` - Assistant text responses only - `statistics` - Session metrics (message counts, tool calls, errors) - `all` - Combined extraction ```bash # Extract from specific session scripts/extract-data.sh --type statistics --session SESSION_ID scripts/extract-data.sh --type errors --session SESSION_ID scripts/extract-data.sh --type tool-usage --session SESSION_ID # Extract from all sessions (omit --session) scripts/extract-data.sh --type statistics # Limit output scripts/extract-data.sh --type user-prompts --limit 10 # Different project scripts/extract-data.sh --type metadata --project /path/to/project ``` **Use when**: Need specific data without loading entire log, generating metrics, identifying errors. ### 4. filter-sessions.sh Find sessions matching criteria. **Filter options**: - `--since DATE` - Sessions modified since date ("2 days ago", "2025-10-20") - `--until DATE` - Sessions modified until date - `--branch NAME` - Sessions on specific git branch - `--min-size SIZE` - Minimum file size ("1M", "500K") - `--max-size SIZE` - Maximum file size - `--min-lines N` - Minimum line count - `--max-lines N` - Maximum line count - `--has-errors` - Only sessions with failed tool calls - `--keyword WORD` - Sessions containing keyword **Output formats**: `list`, `paths`, `json` ```bash # Recent sessions scripts/filter-sessions.sh --since "2 days ago" # Large sessions with errors scripts/filter-sessions.sh --min-lines 500 --has-errors # Sessions on main branch in last week scripts/filter-sessions.sh --branch main --since "7 days ago" # Sessions containing keyword scripts/filter-sessions.sh --keyword "authentication" # Get paths only (for piping) scripts/filter-sessions.sh --since "1 day ago" --format paths ``` **Use when**: User requests analysis of recent sessions, finding sessions for specific feature/branch, identifying problematic sessions. ## Working Process ### Single Session Analysis ```bash # 1. Verify session exists and get metadata scripts/extract-data.sh --type metadata --session SESSION_ID # 2. Get session statistics (to determine size) scripts/extract-data.sh --type statistics --session SESSION_ID # 3. Extract specific data as needed scripts/extract-data.sh --type errors --session SESSION_ID scripts/extract-data.sh --type tool-usage --session SESSION_ID ``` ### Multiple Session Analysis ```bash # 1. Filter to find relevant sessions scripts/filter-sessions.sh --since "7 days ago" --branch main # 2. Extract data from all filtered sessions scripts/extract-data.sh --type statistics # 3. Or iterate through filtered subset SESSIONS=$(scripts/filter-sessions.sh --has-errors --format paths) for session in $SESSIONS; do SESSION_ID=$(basename "$session" .jsonl) scripts/extract-data.sh --type errors --session $SESSION_ID done ``` ### Integration Pattern for Calling Skills When another skill (like retrospecting) needs session data: 1. **Discovery**: Use `list-sessions.sh` or `filter-sessions.sh` to find relevant sessions 2. **Size Check**: Use `extract-data.sh --type statistics` to determine session complexity 3. **Targeted Extraction**: Use `extract-data.sh` with specific types for needed data 4. **Return Raw Data**: Return extracted data to caller for analysis **Example**: ```bash # Get latest session ID LATEST=$(scripts/list-sessions.sh --format json --sort date | jq -r '.[0].sessionId') # Check size before processing STATS=$(scripts/extract-data.sh --type statistics --session $LATEST) LINE_COUNT=$(echo "$STATS" | grep "Total Lines:" | awk '{print $3}') # Extract based on size if [ "$LINE_COUNT" -lt 500 ]; then # Small session: extract detail scripts/extract-data.sh --type errors --session $LATEST scripts/extract-data.sh --type tool-usage --session $LATEST else # Large session: summary only scripts/extract-data.sh --type statistics --session $LATEST fi ``` ## Context Budget Management **CRITICAL: This skill is designed for context efficiency** ### Use Bash Processing, Not Read Tool ```bash # GOOD: Extract via bash, stays in bash context STATS=$(scripts/extract-data.sh --type statistics) # Process $STATS in bash # BAD: Reading full log files Read ~/.claude/projects/-path/session.jsonl # Loads entire file into context unnecessarily ``` ### Check Session Size Before Loading **Never load full session logs into context without checking size first.** ```bash # Always check statistics first scripts/extract-data.sh --type statistics --session SESSION_ID # Shows total lines, message counts, etc. # Decision rules: # - Small (<500 lines): Can extract detail safely # - Medium (500-2000 lines): Use selective extraction # - Large (>2000 lines): Statistics only, offer targeted deep-dives ``` ### Return Raw Data to Caller This skill should: - Execute bash scripts to extract data - Return raw text output to calling skill - Let calling skill manage context for analysis - Avoid interpretation or analysis within this skill ## Output Format Return **raw extracted data** with minimal formatting: ``` # Statistics output Session: abc123-def456-ghi789 Total Lines: 450 User Messages: 12 Assistant Messages: 23 Tool Calls: 45 Errors: 2 # Tool usage output === Tool Usage: abc123-def456-ghi789 === Read 15 Bash 12 Edit 8 Grep 5 Write 3 ``` No analysis, no interpretation - just data extraction. ## Error Handling All scripts exit with non-zero status on errors and output to stderr. Check exit status before processing: ```bash if ! scripts/locate-logs.sh /path/to/project &>/dev/null; then # Handle: logs directory doesn't exist echo "Project has no session logs yet" fi if ! scripts/extract-data.sh --type metadata --session abc123 &>/dev/null; then # Handle: session doesn't exist echo "Session not found" fi ``` Common error messages: - `Error: Logs directory not found: ~/.claude/projects/-path` - `Error: Session file not found: ~/.claude/projects/-path/session-id.jsonl` - `Error: --type is required` - `Error: jq is required but not installed. Install with: brew install jq` ## Path Calculation Claude Code stores sessions using this pattern: ``` ~/.claude/projects/{project-identifier}/{session-id}.jsonl ``` Where `{project-identifier}` is calculated by replacing all `/` with `-` in the absolute working directory path: ```bash # Example: /Users/user/project → -Users-user-project PROJECT_ID=$(echo "${PWD}" | sed 's/\//\-/g') LOGS_DIR="${HOME}/.claude/projects/${PROJECT_ID}" ``` All scripts use `locate-logs.sh` internally for consistent path calculation. ## Anti-Patterns to Avoid **Don't**: - Load full session logs into context without checking size - Parse JSONL manually - use `extract-data.sh` - Hardcode log paths - use `locate-logs.sh` - Analyze or interpret data - return raw data to caller - Process large logs synchronously without user awareness **Do**: - Check session size with `--type statistics` before processing - Use appropriate extraction type for specific needs - Filter sessions before extraction for efficiency - Stream/pipe data when processing multiple sessions - Return raw data for caller to analyze ## Success Criteria Effective use of this skill means: 1. **Efficient Discovery**: Quickly find relevant sessions without manual searching 2. **Targeted Extraction**: Get exactly the data needed, nothing more 3. **Context Preservation**: Avoid loading unnecessary data into context 4. **Raw Data Focus**: Return unprocessed data for caller to analyze 5. **Multi-Session Support**: Handle analysis across timeframes or branches efficiently ## Dependencies **Required**: - `bash` (v4.0+) - `jq` (JSON parser) Scripts check for `jq` and provide installation instructions if missing: ``` Error: jq is required but not installed. Install with: brew install jq ```