---
name: Cleaning Up Research Sessions
description: Safely remove intermediate files from completed research sessions while preserving important data
when_to_use: After research session is complete and consolidated. When research folder has accumulated temporary files. Before archiving or sharing research session.
version: 1.0.0
---

# Cleaning Up Research Sessions

## Overview

Remove intermediate files created during research workflow while preserving all important data.

**Core principle:** Conservative cleanup with user confirmation. Never delete anything important.

## When to Use

Use this skill when:
- Research session is complete and consolidated
- Preparing to archive or share research session folder
- Research folder has accumulated temporary/intermediate files
- User explicitly asks to clean up

**When NOT to use:**
- Research is still in progress
- User hasn't reviewed final outputs yet
- Unsure what files are safe to delete

## Files That Are ALWAYS KEPT

**NEVER delete these (protected list):**

**Core outputs:**
- `SUMMARY.md` - Enhanced findings with methodology
- `relevant-papers.json` - Filtered relevant papers
- `papers-reviewed.json` - Complete screening history
- `papers/` directory - All PDFs and supplementary files
- `citations/citation-graph.json` - Citation relationships

**Methodology documentation:**
- `screening-criteria.json` - Rubric definition (if exists)
- `test-set.json` - Rubric validation papers (if exists)
- `abstracts-cache.json` - Cached abstracts for re-screening (if exists)
- `rubric-changelog.md` - Rubric version history (if exists)

**Auxiliary documentation (if exists):**
- `README.md` - Project overview
- `TOP_PRIORITY_PAPERS.md` - Curated priority list
- `evaluated-papers.json` - Rich structured data

**Project configuration:**
- `.claude/` directory - Permissions and settings
- `*.py` helper scripts that were created - Keep for reproducibility

## Files That May Be Cleaned Up

**Candidates for removal (with confirmation):**
**Intermediate search results:**
- `initial-search-results.json` - Raw PubMed results before screening
  - Safe to delete: Data is in papers-reviewed.json
  - Reason to keep: Shows raw search results for reproducibility

**Temporary files:**
- `*.tmp` files
- `*.swp` files (vim swap files)
- `.DS_Store` (macOS)
- `__pycache__/` (Python cache)
- `*.pyc` (Python compiled)

**Log files:**
- `*.log` files
- `debug-*.txt` files

## Cleanup Workflow

### Step 1: Analyze Research Session

```bash
cd research-sessions/YYYY-MM-DD-description/

# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
```

**Identify files by category:**
- Core outputs (MUST keep)
- Methodology files (SHOULD keep)
- Intermediate files (candidates for cleanup)
- Temporary files (safe to delete)

### Step 2: Present Cleanup Plan to User

**Show what will be deleted:**

```
🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
✅ SUMMARY.md (45 KB)
✅ relevant-papers.json (12 KB)
✅ papers-reviewed.json (28 KB)
✅ papers/ (14 PDFs, 32 MB)
✅ citations/citation-graph.json (5 KB)
✅ screening-criteria.json (2 KB)
✅ abstracts-cache.json (156 KB)

Files that CAN be removed (intermediate):
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
🗑️ .DS_Store (6 KB) - macOS metadata

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)
```

**Options:**
- `y` - Delete intermediate files
- `n` - Cancel cleanup, keep everything
- `review` - Show contents of each file before deciding

### Step 3: Confirm Deletions

**Before deleting ANY file:**

1. **Verify it's not in protected list**
2. **Check file isn't referenced in SUMMARY.md**
3. **Confirm with user one more time**

**Example confirmation:**

```
About to delete:
- initial-search-results.json (8 KB)

This file contains raw PubMed search results.
The data is preserved in papers-reviewed.json, so this is safe to delete.

Confirm deletion?
(y/n)
```

### Step 4: Perform Cleanup

**Delete confirmed files:**

```bash
# Choose ONE of the options below.
# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/

# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/

# Or use rm, only if the user confirms permanent deletion
rm initial-search-results.json
```

**Report results:**

```
✅ Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Protected files preserved:
- All core files kept
- All 14 PDFs kept
- All methodology documentation kept
```

### Step 5: Verify Integrity

**After cleanup, verify critical files:**

```bash
# Check core files exist
test -f SUMMARY.md && echo "✓ SUMMARY.md"
test -f relevant-papers.json && echo "✓ relevant-papers.json"
test -f papers-reviewed.json && echo "✓ papers-reviewed.json"
test -d papers && echo "✓ papers/ directory"

# Count PDFs and compare with the count shown in the cleanup plan
find papers -type f -name '*.pdf' | wc -l

# Verify JSON files are valid
jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
```

**Report to user:**

```
✅ Integrity check passed
- All core files present
- All JSON files valid
- All PDFs intact
```

## Special Cases

### Case 1: Large abstracts-cache.json

**If abstracts-cache.json is very large (>100 MB):**

```
⚠️ abstracts-cache.json is 256 MB

This file enables re-screening if you update the rubric.

Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)

Choice?
(1/2/3)
```

**If user chooses compress:**

```bash
gzip abstracts-cache.json
# Creates abstracts-cache.json.gz
echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
```

### Case 2: Helper Scripts

**If user created helper scripts during research:**

```
📝 Found helper scripts:
- screen_papers.py (created for batch screening)
- deep_dive_papers.py (created for data extraction)

These scripts document your methodology.

Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section

Keep scripts? (y/n)
```

### Case 3: Multiple Research Sessions

**If cleaning up multiple sessions:**

```bash
# Find all research sessions (-mindepth 1 skips research-sessions/ itself)
find research-sessions/ -mindepth 1 -maxdepth 1 -type d

# For each session:
for session in research-sessions/*/; do
  echo "Analyzing: $session"
  # Run cleanup analysis
done
```

**Ask user:**

```
Found 5 completed research sessions.

Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean
```

## Safety Mechanisms

### Protected File List

**Maintain hardcoded list of patterns to NEVER delete:**

```python
PROTECTED_PATTERNS = [
    'SUMMARY.md',
    'relevant-papers.json',
    'papers-reviewed.json',
    'papers/*.pdf',
    'papers/*.zip',
    'citations/citation-graph.json',
    'screening-criteria.json',
    'test-set.json',
    'abstracts-cache.json',
    'rubric-changelog.md',
    'README.md',
    'TOP_PRIORITY_PAPERS.md',
    'evaluated-papers.json',
    '*.py',       # Helper scripts
    '.claude/*',  # Project settings
]
```

**Before deleting any file:**

```python
from fnmatch import fnmatch


def is_protected(filepath):
    """Check if file (path relative to the session root) matches any protected pattern"""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False


def refuse_if_protected(file_to_delete):
    """Return True (and print an error) when the file must not be deleted"""
    # Never delete protected files
    if is_protected(file_to_delete):
        print(f"⚠️ ERROR: {file_to_delete} is protected and cannot be deleted")
        return True
    return False
```

### Dry Run Mode

**Always show what will be deleted before doing it:**

```bash
#
# Dry run (show only, don't delete)
# candidate_files and is_safe_to_delete are placeholders the caller must define
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
  if is_safe_to_delete "$file"; then
    echo "Would delete: $file ($(du -h "$file" | cut -f1))"
  fi
done

echo ""
echo "Proceed with actual deletion? (y/n)"
```

## Integration with Other Skills

**After answering-research-questions workflow:**
1. Complete Phase 8 (consolidation)
2. User reviews SUMMARY.md and relevant-papers.json
3. **Optionally**: Run cleaning-up-research-sessions
4. Archive or share research folder

**Add to answering-research-questions Phase 8:**

```markdown
### Optional: Cleanup

After reviewing outputs, optionally clean up intermediate files:

"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."

If yes: Use `cleaning-up-research-sessions` skill
```

## Common Mistakes

**Deleting papers-reviewed.json:** This is the deduplication database - NEVER delete → Always protect with hardcoded list

**Deleting abstracts-cache.json:** Needed for re-screening → Ask user, default to keep

**Deleting helper scripts:** Important for reproducibility → Keep by default, ask if user wants to remove

**Not showing user what will be deleted:** User needs to see the plan → Always show dry run first

**Deleting without confirmation:** Too risky → Always ask for final confirmation

**Not verifying after cleanup:** Files could be corrupted → Run integrity check after deletion

## Quick Reference

| Action | Command |
|--------|---------|
| List all files with sizes | `find . -type f -exec ls -lh {} \; \| awk '{print $5, $9}' \| sort -rh` |
| Check if JSON valid | `jq empty file.json` |
| Move to trash (macOS) | `mv file ~/.Trash/` |
| Move to trash (Linux) | `mv file ~/.local/share/Trash/files/` |
| Compress large file | `gzip file` |
| Calculate folder size | `du -sh .` |

## Success Criteria

Cleanup successful when:
- All protected files still exist and are valid
- Only confirmed intermediate files deleted
- User knows what was deleted and why
- Space recovered is reported
- Integrity check passed
- Research session is still fully functional

## Example Session

```
User: "Can you clean up the research session folder?"

Agent: "I'll analyze the research session and show you what can be safely cleaned up."

[Analyzes folder]

Agent: "🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
✅ SUMMARY.md (45 KB) - Final findings
✅ relevant-papers.json (12 KB) - Filtered results
✅ papers-reviewed.json (28 KB) - Complete history
✅ papers/ (14 PDFs, 32 MB) - Downloaded papers
✅ citations/citation-graph.json (5 KB) - Citation network
✅ screening-criteria.json (2 KB) - Rubric definition
✅ abstracts-cache.json (156 KB) - Cached abstracts

Files that CAN be removed:
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
   → Data is preserved in papers-reviewed.json
🗑️ .DS_Store (6 KB) - macOS metadata
   → Not needed

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)"

User: "y"

Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

All protected files will be preserved. Confirm? (y/n)"

User: "y"

Agent: [Deletes files]

"✅ Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Running integrity check...
✅ All core files intact
✅ All JSON files valid
✅ All 14 PDFs present

Research session is ready for archiving or sharing."
```

## Next Steps

After cleanup:
- Research folder is clean and ready to archive
- Share folder with collaborators
- Move to long-term storage
- Continue with follow-up research if needed
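
## Appendix: Dry-Run Planning Sketch

The protected-list check and the dry run described under Safety Mechanisms could be combined roughly as follows. This is a minimal sketch, not the skill's actual implementation. It assumes all session files live under one root directory and that patterns are matched against paths relative to that root. The names `CANDIDATE_PATTERNS`, `_matches`, `plan_cleanup`, and `print_dry_run` are hypothetical. It uses `fnmatchcase` rather than `fnmatch` so that matching behaves the same on every platform, and it also matches patterns against the bare file name so that nested files such as `citations/.DS_Store` are recognized.

```python
import os
from fnmatch import fnmatchcase
from pathlib import Path

# Copied from the protected list in this document.
PROTECTED_PATTERNS = [
    "SUMMARY.md", "relevant-papers.json", "papers-reviewed.json",
    "papers/*.pdf", "papers/*.zip", "citations/citation-graph.json",
    "screening-criteria.json", "test-set.json", "abstracts-cache.json",
    "rubric-changelog.md", "README.md", "TOP_PRIORITY_PAPERS.md",
    "evaluated-papers.json", "*.py", ".claude/*",
]

# Hypothetical name; entries come from "Files That May Be Cleaned Up".
CANDIDATE_PATTERNS = [
    "initial-search-results.json", "*.tmp", "*.swp", ".DS_Store",
    "__pycache__/*", "*.pyc", "*.log", "debug-*.txt",
]


def _matches(rel_path, patterns):
    """True if the relative path or its bare file name matches any pattern."""
    name = rel_path.rsplit("/", 1)[-1]
    return any(fnmatchcase(rel_path, p) or fnmatchcase(name, p) for p in patterns)


def plan_cleanup(session_dir):
    """Classify every file under session_dir without deleting anything.

    Returns three sorted lists of POSIX-style relative paths:
    (protected, candidates, unclassified).
    """
    root = Path(session_dir)
    protected, candidates, unclassified = [], [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = (Path(dirpath) / filename).relative_to(root).as_posix()
            if _matches(rel, PROTECTED_PATTERNS):
                protected.append(rel)      # protection always wins
            elif _matches(rel, CANDIDATE_PATTERNS):
                candidates.append(rel)     # shown to the user for confirmation
            else:
                unclassified.append(rel)   # unknown files are kept by default
    return sorted(protected), sorted(candidates), sorted(unclassified)


def print_dry_run(session_dir):
    """Print the plan; the caller still has to ask the user to confirm."""
    protected, candidates, unclassified = plan_cleanup(session_dir)
    print("DRY RUN - No files will be deleted")
    for rel in candidates:
        size = (Path(session_dir) / rel).stat().st_size
        print(f"Would delete: {rel} ({size} bytes)")
    print(f"Protected: {len(protected)}, kept by default: {len(unclassified)}")
```

Calling `print_dry_run("research-sessions/2025-10-11-btk-selectivity")` would list the removal candidates without touching anything. Files that match neither list land in `unclassified` and are kept, which follows the conservative principle stated in the Overview: only explicitly listed intermediate files are ever offered for deletion.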