---
name: context-ingestion
description: Scan project folder structure, validate organization, clone GitHub repository, and generate an inventory of available materials. First step of writer workflow. Use when starting a new manuscript project.
---

# Context Ingestion

Scans the project folder, validates structure, fetches the GitHub repository, and generates an inventory of all available materials.

## Input

User provides path to project folder (or current directory if already there).

## Workflow

```
[Receive project path]
         │
         ▼
[Validate Folder Structure] ─── Check required folders exist
         │
         ▼
[Parse config.md] ─── Extract GitHub URL, constraints
         │
         ▼
[Clone GitHub Repository] ─── Fetch code for analysis
         │
         ▼
[Inventory Materials] ─── List all available files
         │
         ▼
[Extract Ethics Content] ─── If ethics/ exists, generate notes/ethics-summary.md
         │
         ▼
[Generate inventory.md] ─── Structured summary
```

## Step 1: Validate Folder Structure

Check that required folders exist:

```bash
# Required structure
project/
├── papers/      # Must exist (can be empty)
├── data/        # Must exist (can be empty)
├── figures/     # Must exist (can be empty)
├── ethics/      # Optional - Ethics/governance documents (IRB, IACUC, etc.)
└── config.md    # Must exist
```

Validation:

```bash
cd /path/to/project

# Check required folders
[ -d "papers" ] || echo "ERROR: papers/ folder missing"
[ -d "data" ] || echo "ERROR: data/ folder missing"
[ -d "figures" ] || echo "ERROR: figures/ folder missing"
[ -f "config.md" ] || echo "ERROR: config.md missing"
```

If validation fails, inform user what's missing and provide the expected structure template.
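
The checks above print errors but do not signal to the caller whether validation passed. One way to wrap them is sketched below. It assumes a hypothetical helper named `validate_project` (not part of this skill) and assumes that a nonzero exit status should halt the workflow:

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate_project is an illustrative name, not part of the skill.
# Reports every missing required item and returns nonzero if any is absent.
validate_project() {
  local root="${1:-.}"   # project path; defaults to the current directory
  local missing=0
  local dir

  for dir in papers data figures; do
    if [ ! -d "$root/$dir" ]; then
      echo "ERROR: $dir/ folder missing"
      missing=1
    fi
  done

  if [ ! -f "$root/config.md" ]; then
    echo "ERROR: config.md missing"
    missing=1
  fi

  # ethics/ is optional, so its absence is only noted, never treated as an error
  [ -d "$root/ethics" ] || echo "NOTE: ethics/ not present (optional)"

  return "$missing"
}
```

A caller could then run `validate_project /path/to/project || exit 1` and show the expected structure template when the function fails.
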
## Step 2: Parse config.md

Extract configuration values:

```markdown
# Expected config.md format

## GitHub Repository
url: https://github.com/username/repo-name
branch: main
access: private

## Constraints
word_limit: 3500
target_journal: [Target Journal]
citation_style: AMA

## Additional Notes
[Free text notes]
```

Parse and store:
- `github_url`: Repository URL
- `github_branch`: Branch to clone (default: main)
- `github_access`: public or private
- `word_limit`: Target word count
- `target_journal`: Journal name for formatting
- `citation_style`: AMA, Vancouver, APA, etc.

## Step 3: Clone GitHub Repository

For public repositories:

```bash
git clone --depth 1 --branch main https://github.com/username/repo-name.git code/
```

For private repositories, user must have GitHub CLI authenticated:

```bash
gh repo clone username/repo-name code/ -- --depth 1 --branch main
```

If clone fails:
1. Check if `gh` is authenticated: `gh auth status`
2. Provide instructions: "Run `gh auth login` to authenticate"
3. Allow user to proceed without code (Methods section will be limited)

Store cloned repo at: `project/code/`

## Step 4: Inventory Materials

Scan each folder and catalog contents:

### Papers Inventory

```bash
ls -la papers/*.pdf 2>/dev/null | wc -l  # Count PDFs
```

For each PDF, extract basic info:
- Filename
- File size
- (Attempt to extract title from first page if possible)

### Data Inventory

```bash
ls -la data/*.csv data/*.xlsx 2>/dev/null
```

For each data file:
- Filename
- File size
- Row/column count (for CSVs)
- Sheet names (for Excel)

Preview CSV structure:

```bash
head -5 data/results.csv
```

### Figures Inventory

```bash
ls -la figures/*.png figures/*.jpg figures/*.svg 2>/dev/null
```

For each figure:
- Filename
- Dimensions (if determinable)
- File size

### Code Inventory

If GitHub clone succeeded:

```bash
find code/ -name "*.py" -o -name "*.ipynb" -o -name "*.R" | head -20
```

Identify:
- Primary language (Python, R, etc.)
- Notebook files (.ipynb)
- Key script files
- Requirements/dependencies file

### Ethics Inventory (Optional)

If `ethics/` folder exists, scan for governance documents:

```bash
ls -la ethics/*.pdf ethics/*.docx ethics/*.md 2>/dev/null
```

Supported formats:
- `.md` - Read directly with Read tool
- `.pdf` - Read with Claude's native PDF capability
- `.docx` - Extract text using `document-skills:docx` skill

## Step 5: Extract Ethics Content

**Skip this step if `ethics/` folder does not exist or is empty.**

For each document in `ethics/`:
1. Read the document content using the appropriate method for its format
2. Extract comprehensive study information
3. Generate `notes/ethics-summary.md` (create `notes/` first if it does not exist yet)

### Ethics Summary Template

Create `notes/ethics-summary.md`:

```markdown
# Ethics/Governance Document Summary

**Source**: [filename]
**Extracted**: [timestamp]

## Study Identification
- **Protocol Title**: [extracted or "[not found]"]
- **Approval Number**: [extracted or "[not found]"]
- **Approving Body**: [IRB, IACUC, Ethics Committee, etc.]
- **Principal Investigator**: [extracted or "[not found]"]
- **Approval Date**: [extracted or "[not found]"]

## Study Design
- **Study Type**: [interventional/observational/retrospective/computational/etc.]
- **Design**: [RCT, cohort, case-control, cross-sectional, simulation, etc.]
- **Duration**: [study period]

## Population/Subjects
- **Target Population**: [description]
- **Inclusion Criteria**:
  - [criterion 1]
  - [criterion 2]
  - ...
- **Exclusion Criteria**:
  - [criterion 1]
  - [criterion 2]
  - ...
- **Sample Size**: [N with justification if provided]

## Procedures & Interventions
- [Procedure 1]
- [Procedure 2]
- ...

## Endpoints/Outcomes
- **Primary**: [endpoint]
- **Secondary**: [endpoints]

## Statistical Considerations
- **Power Analysis**: [if provided or "[not found]"]
- **Planned Analyses**: [if provided or "[not found]"]

## Notes
[Any additional relevant context, caveats, or sections that were unclear]
```

Mark fields as `[not found]` if not present in the document.

## Step 6: Generate inventory.md

Create structured inventory document:

```markdown
# Project Inventory

Generated: [timestamp]
Project: [folder name]

## Configuration
- **GitHub**: [url] (branch: [branch])
- **Target Journal**: [journal]
- **Word Limit**: [limit]
- **Citation Style**: [style]

## Papers ([count] files)
| Filename | Size | Notes |
|----------|------|-------|
| smith-2023.pdf | 1.2 MB | |
| jones-2022.pdf | 0.8 MB | |

## Data ([count] files)
| Filename | Size | Rows | Columns | Preview |
|----------|------|------|---------|---------|
| results.csv | 45 KB | 156 | 12 | patient_id, age, sex, ... |
| demographics.csv | 12 KB | 156 | 8 | patient_id, age, sex, ... |

## Figures ([count] files)
| Filename | Dimensions | Size |
|----------|------------|------|
| figure1.png | 1200x800 | 340 KB |
| figure2.png | 1000x600 | 210 KB |

## Code Repository
- **URL**: [github url]
- **Language**: Python
- **Key Files**:
  - `analysis.ipynb` - Main analysis notebook
  - `preprocessing.py` - Data preprocessing
  - `models.py` - ML models
- **Dependencies**: pandas, scikit-learn, matplotlib, ...

## Ethics Documents
| Filename | Format | Status |
|----------|--------|--------|
| protocol.pdf | PDF | ✓ Extracted to notes/ethics-summary.md |

*Or: "No ethics documents provided"*

## Summary
| Category | Count | Status |
|----------|-------|--------|
| Papers | [n] | ✓ Ready |
| Data files | [n] | ✓ Ready |
| Figures | [n] | ✓ Ready |
| Code repo | 1 | ✓ Cloned |
| Ethics documents | [n] | ✓ Extracted / Not provided |

## Missing/Warnings
- [List any issues found]
```

## Output

Save to: `project/inventory.md`

Create notes directory structure:

```bash
mkdir -p notes/papers notes/papers-library drafts
```

Return to parent skill with inventory summary.
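
The `[n]` counts in the Summary table of `inventory.md` could be gathered with a small helper before the file is written. This is a sketch only: `count_files` is a hypothetical function name, and the extensions passed to it are assumed to be the ones listed in Step 4.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count_files is an illustrative name, not part of the skill.
# Usage: count_files DIR EXT [EXT...]
# Prints how many regular files in DIR end in one of the given extensions.
# Matching is case-sensitive, so "report.PDF" is not counted for "pdf".
count_files() {
  local dir="$1"
  shift
  local ext f
  local n=0

  for ext in "$@"; do
    for f in "$dir"/*."$ext"; do
      # An unmatched glob stays literal and fails the -f test, so it adds nothing
      if [ -f "$f" ]; then
        n=$((n + 1))
      fi
    done
  done

  echo "$n"
}
```

For example, `count_files papers pdf` yields the Papers count and `count_files figures png jpg svg` the Figures count. A missing or empty folder reports `0`, which can be listed under Missing/Warnings.
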