---
name: project-sharing
description: Prepare organized packages of project files for sharing at different levels - from summary PDFs to fully reproducible archives. Creates copies with cleaned notebooks, documentation, and appropriate file selection. After creating a sharing package, all work continues in the main project directory.
version: 1.1.0
---

# Project Sharing and Output Preparation

Expert guidance for preparing project outputs for sharing with collaborators, reviewers, or repositories. Creates organized packages at different sharing levels while preserving your working directory.

## When to Use This Skill

- Sharing analysis results with collaborators
- Preparing supplementary materials for publications
- Creating reproducible research packages
- Archiving completed projects
- Handoff to other researchers
- Submitting to data repositories

## Core Principles

1. **Work on copies** - Never modify the working directory
2. **Choose appropriate level** - Match sharing depth to audience needs
3. **Document everything** - Include clear guides and metadata
4. **Clean before sharing** - Remove debug code, clear outputs, anonymize if needed
5. **Make it reproducible** - Include dependencies and instructions
6. **⚠️ CRITICAL: After creating a sharing folder, all future work happens in the main project directory, NOT in the sharing folder** - Sharing folders are read-only snapshots

---

## Three Sharing Levels

### Level 1: Summary Only

**Purpose:** Quick sharing for presentations, reports, or high-level review

**What to include:**
- PDF export of final notebook(s)
- Final data/results (CSV, Excel, figures) - optional
- Brief README

**Use when:**
- Sharing results with non-technical stakeholders
- Presentations or talks
- Quick review without reproduction needs
- Space/time constraints

**Structure:**
```
shared-summary/
├── README.md                    # Brief overview
├── analysis-YYYY-MM-DD.pdf      # Notebook as PDF
└── results/
    ├── figures/
    │   ├── fig1-main-result.png
    │   └── fig2-comparison.png
    └── tables/
        └── summary-statistics.csv
```
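The Level 1 deliverable is essentially the final notebook rendered to PDF. A minimal sketch for producing it (the function name and output directory are illustrative; `--to pdf` needs a LaTeX toolchain installed, so fall back to `--to html` if one is not available):

```python
import subprocess
from pathlib import Path

def export_summary_pdf(notebook, out_dir="shared-summary"):
    """Render a notebook to PDF for Level 1 sharing."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "pdf", str(notebook), "--output-dir", out_dir],
        check=True,
    )

export_summary_pdf("notebooks/02-analysis.ipynb")
```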
---

### Level 2: Reproducible

**Purpose:** Enable others to reproduce your analysis from processed data

**What to include:**
- Analysis notebooks (.ipynb) - cleaned
- Scripts for figure generation
- Processed/analysis-ready data
- Requirements file (requirements.txt or environment.yml)
- Detailed README with instructions

**Use when:**
- Sharing with collaborating researchers
- Peer review / manuscript supplementary materials
- Teaching or tutorials
- Standard collaboration needs

**Structure:**
```
shared-reproducible/
├── README.md                       # Setup and reproduction instructions
├── MANIFEST.md                     # File descriptions
├── environment.yml                 # Conda environment OR requirements.txt
├── notebooks/
│   ├── 01-data-processing.ipynb    # Cleaned, outputs cleared
│   ├── 02-analysis.ipynb
│   └── 03-visualization.ipynb
├── scripts/
│   ├── generate_figures.py         # Standalone scripts
│   └── utils.py
└── data/
    ├── processed/
    │   ├── cleaned_data.csv
    │   └── processed_results.tsv
    └── README.md                   # Data provenance
```

---

### Level 3: Full Traceability

**Purpose:** Complete transparency from raw data through all processing steps

**What to include:**
- Starting/raw data
- All processing scripts and notebooks
- All intermediate files
- Final results
- Complete documentation
- Full dependency specification

**Use when:**
- Archiving for future reference
- Regulatory compliance
- High-stakes reproducibility (clinical, policy)
- Data repository submission (Zenodo, Dryad, etc.)
- Complete project handoff

**Structure:**
```
shared-complete/
├── README.md                       # Complete project guide
├── MANIFEST.md                     # Comprehensive file listing
├── environment.yml
├── data/
│   ├── raw/                        # Original, unmodified data
│   │   ├── sample_A_reads.fastq.gz
│   │   └── README.md               # Data source, download date
│   ├── intermediate/               # Processing steps
│   │   ├── 01-filtered/
│   │   ├── 02-aligned/
│   │   └── README.md
│   └── processed/                  # Final analysis-ready
│       └── final_dataset.csv
├── scripts/
│   ├── 01-download-data.sh
│   ├── 02-quality-control.py
│   ├── 03-filtering.py
│   ├── 04-analysis.py
│   └── utils/
├── notebooks/
│   ├── exploratory/                # Early exploration
│   └── final/                      # Publication analyses
├── results/
│   ├── figures/
│   ├── tables/
│   └── supplementary/
└── documentation/
    ├── methods.md                  # Detailed methodology
    ├── changelog.md                # Processing decisions
    └── data-dictionary.md          # Variable definitions
```

---

## Preparation Workflow

### Step 1: Ask User for Sharing Level

**Questions to determine level:**
```
Which sharing level do you need?

1. Summary Only - PDF + final results (quick sharing)
2. Reproducible - Notebooks + scripts + data (standard sharing)
3. Full Traceability - Everything from raw data (archival/compliance)

Additional questions:
- Who is the audience? (colleagues, reviewers, public)
- Are there size constraints?
- Any sensitive data to handle?
- Timeline for sharing?
```

### Step 2: Identify Files to Include

**For each level, identify:**

**Level 1 - Summary:**
- Main analysis notebook(s)
- Key figures (publication-quality)
- Summary tables/statistics
- Optional: Final processed dataset

**Level 2 - Reproducible:**
- All analysis notebooks (not exploratory)
- Figure generation scripts
- Processed/cleaned data
- Environment specification
- Any utility functions/modules

**Level 3 - Full:**
- Raw data (or links if too large)
- All processing scripts
- All notebooks (including exploratory)
- All intermediate files
- Complete documentation

### Step 3: Create Sharing Directory

```bash
# Create dated directory
SHARE_DIR="shared-$(date +%Y%m%d)-[level]"
mkdir -p "$SHARE_DIR"

# Create subdirectories based on level
# ... appropriate structure from above
```

### Step 4: Copy and Clean Files

**For notebooks (.ipynb):**
```python
import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor

def clean_notebook(input_path, output_path):
    """Clean notebook: clear outputs, remove debug cells."""
    # Read notebook
    with open(input_path, 'r') as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear_output = ClearOutputPreprocessor()
    nb, _ = clear_output.preprocess(nb, {})

    # Remove cells tagged as 'debug' or 'remove'
    nb.cells = [cell for cell in nb.cells
                if 'debug' not in cell.metadata.get('tags', [])
                and 'remove' not in cell.metadata.get('tags', [])]

    # Write cleaned notebook
    with open(output_path, 'w') as f:
        nbformat.write(nb, f)
```

**For data files:**
- Copy as-is for small files
- Consider compression for large files
- Check for sensitive information

**For scripts:**
- Remove debugging code
- Add docstrings if missing
- Ensure paths are relative
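For everything other than notebooks, a plain recursive copy that skips cache and checkpoint artifacts is usually enough. A minimal sketch using the standard library (the exclusion patterns and destination paths are illustrative, not prescribed by this skill):

```python
import shutil

# Artifacts that rarely belong in a sharing package (illustrative; adjust as needed)
EXCLUDE = shutil.ignore_patterns(
    ".ipynb_checkpoints", "__pycache__", "*.pyc", ".DS_Store", ".git"
)

def copy_into_package(src, dest):
    """Recursively copy a directory into the sharing package, skipping excluded patterns."""
    shutil.copytree(src, dest, ignore=EXCLUDE, dirs_exist_ok=True)

# Example usage with the Level 2 layout shown above
copy_into_package("scripts", "shared-reproducible/scripts")
copy_into_package("data/processed", "shared-reproducible/data/processed")
```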
### Step 5: Generate Documentation

#### README.md Template

```markdown
# Project: [Project Name]

**Date:** YYYY-MM-DD
**Author:** [Your Name]
**Sharing Level:** [Summary/Reproducible/Full]

## Overview
Brief description of the project and analysis.

## Contents
See MANIFEST.md for detailed file descriptions.

## Requirements
[For Reproducible/Full levels]
- Python 3.X
- See environment.yml for dependencies

## Setup
\`\`\`bash
# Create environment
conda env create -f environment.yml
conda activate project-name
\`\`\`

## Reproduction Steps
[For Reproducible/Full levels]
1. [Description of first step]
   \`\`\`bash
   jupyter notebook notebooks/01-analysis.ipynb
   \`\`\`
2. [Description of second step]

## Data Sources
[For Full level]
- Dataset A: [Source, download date, version]
- Dataset B: [Source, download date, version]

## Contact
[Your email or preferred contact]

## License
[If applicable - e.g., CC BY 4.0, MIT]
```

#### MANIFEST.md Template

```markdown
# File Manifest

Generated: YYYY-MM-DD

## Directory Structure
\`\`\`
shared-YYYYMMDD/
├── README.md    - Project overview and setup
├── MANIFEST.md  - This file
[... complete tree ...]
\`\`\`

## File Descriptions

### Notebooks
- \`notebooks/01-data-processing.ipynb\` - Initial data loading and cleaning
- \`notebooks/02-analysis.ipynb\` - Main statistical analysis
- \`notebooks/03-visualization.ipynb\` - Figure generation for publication

### Data
- \`data/processed/cleaned_data.csv\` - Quality-controlled dataset (N=XXX samples)
  - Columns: [list key columns]
  - Missing values handled by [method]

### Scripts
- \`scripts/generate_figures.py\` - Automated figure generation
  - Usage: \`python generate_figures.py --input data/processed/cleaned_data.csv\`

### Results
- \`results/figures/fig1-main.png\` - Main result showing [description]
- \`results/tables/summary_stats.csv\` - Descriptive statistics

[Continue for all files...]
```
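The directory-structure block in MANIFEST.md is tedious to type by hand. A minimal sketch that drafts it from the finished sharing folder (the function name is illustrative, and the file descriptions still need to be written manually):

```python
from pathlib import Path

def draft_manifest(share_dir, manifest_name="MANIFEST.md"):
    """Write a MANIFEST.md skeleton containing the package's directory tree."""
    share_dir = Path(share_dir)
    lines = ["# File Manifest", "", "## Directory Structure", "```", f"{share_dir.name}/"]
    for path in sorted(share_dir.rglob("*")):
        depth = len(path.relative_to(share_dir).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("    " * depth + path.name + suffix)
    lines += ["```", "", "## File Descriptions", "", "[Describe each file here]"]
    (share_dir / manifest_name).write_text("\n".join(lines) + "\n")
```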
### Step 6: Handle Sensitive Data

**Check for sensitive information:**
- Personally identifiable information (PII)
- Access credentials (API keys, passwords)
- Proprietary data
- Institutional data with sharing restrictions
- Patient/subject identifiers

**Strategies:**
1. **Anonymize** - Remove or hash identifiers
2. **Exclude** - Don't include sensitive files
3. **Aggregate** - Share summary statistics only
4. **Document restrictions** - Note what's excluded and why

**Example anonymization:**
```python
import hashlib

def anonymize_ids(df, id_column='subject_id'):
    """Replace IDs with hashed values."""
    df[id_column] = df[id_column].apply(
        lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
    )
    return df
```

### Step 7: Package and Compress

**For smaller packages (<100MB):**
```bash
# Create zip archive
zip -r shared-YYYYMMDD.zip shared-YYYYMMDD/
```

**For larger packages:**
```bash
# Create tar.gz (better compression)
tar -czf shared-YYYYMMDD.tar.gz shared-YYYYMMDD/

# Or split into parts if very large
tar -czf - shared-YYYYMMDD/ | split -b 1G - shared-YYYYMMDD.tar.gz.part
```

**Document package contents:**
- Total size
- Number of files
- Compression method
- How to extract

### Step 8: Return to Working Directory

**⚠️ IMPORTANT: After creating the sharing package, always work in the main project directory.**

The sharing folder is a **snapshot for distribution only**. Any future development, analysis, or modifications should happen in your original working directory, not in the `shared-*/` folder.

**Claude should:**
- Change directory back to main project: `cd ..` (if needed)
- Confirm working directory: `pwd`
- Continue all work in the original project location
- Treat sharing folders as read-only archives

**Example:**
```bash
# After creating sharing package
cd /path/to/main/project   # Return to working directory
pwd                        # Verify location

# Continue work here, NOT in shared-YYYYMMDD/
```

---

## Best Practices

### Notebook Cleaning

**Before sharing notebooks:**

1. **Clear all outputs**
   ```bash
   jupyter nbconvert --clear-output --inplace notebook.ipynb
   ```

2. **Remove debug cells**
   - Tag cells for removal: Cell → Cell Tags → add "remove"
   - Filter during copy

3. **Add markdown explanations**
   - Ensure each code cell has context
   - Add section headers
   - Document assumptions

4. **Check cell execution order**
   - Run "Restart & Run All" to verify
   - Fix any out-of-order dependencies

5. **Remove absolute paths**
   ```python
   # ❌ Bad
   data = pd.read_csv('/Users/yourname/project/data.csv')

   # ✅ Good
   data = pd.read_csv('../data/data.csv')
   # or
   from pathlib import Path
   data_dir = Path(__file__).parent / 'data'
   ```
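Item 4 above ("Restart & Run All") can also be checked without opening each notebook, by executing a copy top-to-bottom and failing on the first error. A minimal sketch using nbconvert's `ExecutePreprocessor` (the timeout and working directory are assumptions to adjust):

```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

def verify_runs_in_order(notebook_path, workdir="notebooks"):
    """Execute every cell top-to-bottom; raises if any cell errors."""
    with open(notebook_path) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=600)
    ep.preprocess(nb, {"metadata": {"path": workdir}})
    print(f"✓ {notebook_path} runs cleanly from top to bottom")

verify_runs_in_order("notebooks/02-analysis.ipynb")
```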
### File Organization

**Naming conventions for shared files:**
- Use descriptive names: `telomere_analysis_results.csv` not `results.csv`
- Include dates for time-sensitive data: `data_2024-01-15.csv`
- Version if applicable: `analysis_v2.ipynb`
- No spaces: use `-` or `_`

**Size considerations:**
- Document large files in README
- Consider hosting large data separately (institutional storage, Zenodo)
- Provide download links instead of including in package
- Use `.gitattributes` for large file tracking if using Git

### Documentation Requirements

**Minimum documentation for each level:**

**Level 1 - Summary:**
- What the results show
- Key findings
- Date and author

**Level 2 - Reproducible:**
- Setup instructions
- How to run the analysis
- Software dependencies
- Expected runtime
- Data source information

**Level 3 - Full:**
- Complete methodology
- All data sources with versions
- Processing decisions and rationale
- Known issues or limitations
- Contact information

### Dependency Management

**Create requirements file:**

**For pip:**
```bash
# From active environment
pip freeze > requirements.txt

# Or manually curated (better)
cat > requirements.txt << EOF
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
scipy>=1.9.0
EOF
```

**For conda:**
```bash
# Export current environment
conda env export > environment.yml

# Or minimal (recommended)
conda env export --from-history > environment.yml
# Then edit to remove build-specific details
```

---

## Common Scenarios

### Scenario 1: Sharing with Lab Collaborators

**Level:** Reproducible

**Include:**
- Cleaned analysis notebooks
- Processed data
- Figure generation scripts
- environment.yml
- README with reproduction steps

**Don't include:**
- Exploratory notebooks
- Failed analysis attempts
- Debug outputs
- Personal notes

### Scenario 2: Manuscript Supplementary Material

**Level:** Reproducible or Full (depending on journal)

**Include:**
- All notebooks used for figures in paper
- Scripts for each figure panel
- Processed data (or instructions to obtain)
- Complete environment specification
- Detailed methods document

**Best practices:**
- Number notebooks to match paper sections
- Export key figures in publication formats (PDF, high-res PNG)
- Include data dictionary for all variables
- Test reproduction on clean environment

### Scenario 3: Project Archival

**Level:** Full Traceability

**Include:**
- Complete data pipeline from raw to processed
- All versions of analysis
- Meeting notes or decision logs
- External tool versions
- System information

**Organization tips:**
- Use dates in directory names
- Keep chronological changelog
- Document all external dependencies
- Include contact info for questions

### Scenario 4: Data Repository Submission (Zenodo, Figshare)

**Level:** Full Traceability

**Additional considerations:**
- Add LICENSE file (CC BY 4.0, MIT, etc.)
- Include CITATION.cff or CITATION.txt
- Comprehensive metadata
- README with DOI/reference instructions
- Consider maximum file sizes
- Review repository-specific guidelines

---

## Quality Checklist

Before finalizing the sharing package:

### File Quality
- [ ] All notebooks run without errors
- [ ] Notebook outputs cleared
- [ ] No absolute paths in code
- [ ] No hardcoded credentials or API keys
- [ ] File sizes documented
- [ ] Large files compressed or linked

### Documentation
- [ ] README explains setup and usage
- [ ] MANIFEST describes all files
- [ ] Data sources documented
- [ ] Dependencies specified
- [ ] Contact information included
- [ ] License specified (if applicable)

### Reproducibility
- [ ] Requirements file tested in clean environment
- [ ] All data accessible (included or linked)
- [ ] Scripts run in documented order
- [ ] Expected outputs match actual outputs
- [ ] Processing time documented

### Privacy & Sensitivity
- [ ] No sensitive data included
- [ ] Identifiers anonymized if needed
- [ ] Institutional policies checked
- [ ] Collaborator permissions obtained

### Organization
- [ ] Clear directory structure
- [ ] Consistent naming conventions
- [ ] Files logically grouped
- [ ] No duplicate files
- [ ] No unnecessary files (cache, .DS_Store, etc.)

---

## Integration with Other Skills

**Works well with:**
- **folder-organization** - Ensures source project is well-organized before sharing
- **jupyter-notebook-analysis** - Creates notebooks that are share-ready
- **managing-environments** - Documents dependencies properly

**Before using this skill:**
1. Organize working directory (folder-organization)
2. Finalize analysis (jupyter-notebook-analysis)
3. Document environment (managing-environments)

**After using this skill:**
1. Test package in clean environment
2. Share via appropriate channel (email, repository, cloud storage)
3. Keep archived copy for reference

---

## Example Scripts

### Create Sharing Package Script

```python
#!/usr/bin/env python3
"""Create sharing package for project."""

import shutil
from pathlib import Path
from datetime import date

import nbformat
from nbconvert.preprocessors import ClearOutputPreprocessor


def create_sharing_package(level='reproducible', output_dir=None):
    """
    Create sharing package.

    Args:
        level: 'summary', 'reproducible', or 'full'
        output_dir: Output directory name (auto-generated if None)
    """
    # Create output directory
    if output_dir is None:
        output_dir = f"shared-{date.today():%Y%m%d}-{level}"

    share_path = Path(output_dir)
    share_path.mkdir(exist_ok=True)

    print(f"Creating {level} sharing package in {share_path}")

    # Create structure based on level
    if level == 'summary':
        create_summary_package(share_path)
    elif level == 'reproducible':
        create_reproducible_package(share_path)
    elif level == 'full':
        create_full_package(share_path)

    print(f"✓ Package created: {share_path}")
    print(f"  Review and compress: tar -czf {share_path}.tar.gz {share_path}")


def clean_notebook(input_path, output_path):
    """Clean notebook outputs and debug cells."""
    with open(input_path) as f:
        nb = nbformat.read(f, as_version=4)

    # Clear outputs
    clear = ClearOutputPreprocessor()
    nb, _ = clear.preprocess(nb, {})

    # Remove debug cells
    nb.cells = [c for c in nb.cells
                if 'debug' not in c.metadata.get('tags', [])]

    with open(output_path, 'w') as f:
        nbformat.write(nb, f)


# ... implement level-specific functions ...


if __name__ == '__main__':
    import sys
    level = sys.argv[1] if len(sys.argv) > 1 else 'reproducible'
    create_sharing_package(level)
```
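The level-specific functions above are left as placeholders. One possible shape for the reproducible level, as a sketch that reuses `clean_notebook` from the script and assumes the Level 2 layout described earlier (notebooks/, scripts/, data/processed/, environment.yml):

```python
import shutil
from pathlib import Path

def create_reproducible_package(share_path, project_root=Path(".")):
    """Illustrative Level 2 packager: cleaned notebooks, scripts, processed data, environment."""
    # Notebooks: copy with outputs cleared and debug cells removed
    nb_dir = share_path / "notebooks"
    nb_dir.mkdir(parents=True, exist_ok=True)
    for nb in sorted((project_root / "notebooks").glob("*.ipynb")):
        clean_notebook(nb, nb_dir / nb.name)  # clean_notebook defined in the script above

    # Scripts and processed data: copy as-is
    shutil.copytree(project_root / "scripts", share_path / "scripts", dirs_exist_ok=True)
    shutil.copytree(project_root / "data" / "processed",
                    share_path / "data" / "processed", dirs_exist_ok=True)

    # Environment specification, if present
    env = project_root / "environment.yml"
    if env.exists():
        shutil.copy2(env, share_path / "environment.yml")
```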
---

## Summary

**Key principles for project sharing:**

1. 🎯 **Choose the right level** - Match sharing depth to audience needs
2. 📋 **Copy, don't move** - Preserve your working directory
3. 🧹 **Clean thoroughly** - Remove debug code, clear outputs
4. 📝 **Document everything** - README + MANIFEST minimum
5. 🔒 **Check sensitivity** - Anonymize or exclude as needed
6. ✅ **Test before sharing** - Run in clean environment
7. 📦 **Package properly** - Compress and document contents
8. ⚠️ **Work in main directory** - After creating sharing package, ALL future work happens in the original project directory, NOT in the sharing folder

**Remember:** Good sharing practices benefit both collaborators and your future self!

---

## ⚠️ Critical Reminder for Claude

**After creating any sharing package:**

1. **Always return to the main project directory**
2. **Never work in `shared-*/` directories** - These are read-only snapshots
3. **All future edits, analysis, and development happen in the original working directory**
4. **Sharing folders are for distribution only, not active development**

If the user asks to modify files, always check the current directory and ensure you're working in the main project location, not in a sharing package.
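One way to make that check concrete, as a small sketch (it simply refuses to proceed when the working directory sits inside a `shared-*` folder, matching the naming convention used throughout this skill):

```python
from pathlib import Path

def assert_outside_sharing_folder():
    """Raise if the current working directory is inside a shared-* snapshot."""
    cwd = Path.cwd().resolve()
    if any(part.startswith("shared-") for part in cwd.parts):
        raise RuntimeError(
            f"{cwd} is inside a sharing package; return to the main project directory first."
        )
```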