---
name: jupytext
description: This skill should be used when the user asks to "convert notebook to text", "use jupytext", "version control notebooks", "share data between kernels", "set up multi-kernel project", "pair notebooks with Python files", "sync ipynb and py files", or needs multi-kernel projects (Python/R/Stata/SAS) with version-control-friendly notebooks.
---

## Contents

- [Execution Enforcement](#execution-enforcement)
- [Core Concepts](#core-concepts)
- [Multi-Kernel Data Sharing](#multi-kernel-data-sharing)
- [Workflow Integration](#workflow-integration)
- [Project Structure](#project-structure)
- [Kernel Specification](#kernel-specification)
- [Quick Troubleshooting](#quick-troubleshooting)
- [Additional Resources](#additional-resources)
- [Best Practices](#best-practices)

# Jupytext Skill

Jupytext converts Jupyter notebooks to/from text formats (.py, .R, .md), enabling version control and multi-kernel workflows.

## Execution Enforcement

### IRON LAW: NO EXECUTION CLAIM WITHOUT OUTPUT VERIFICATION

Before claiming ANY jupytext script executed successfully, follow this sequence:

1. **EXECUTE** using the papermill pipeline: `jupytext --to notebook --output - script.py | papermill - output.ipynb`
2. **CHECK** for execution errors (papermill exit code and stderr)
3. **VERIFY** output.ipynb exists and is non-empty
4. **INSPECT** outputs using notebook-debug skill verification
5. **CLAIM** success only after verification passes

This is non-negotiable. Claiming "script works" without executing through papermill is LYING to the user.
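The EXECUTE, CHECK, and VERIFY steps above can be scripted. The following is a minimal sketch, not part of jupytext or papermill: `run_pipeline` is a hypothetical helper, it assumes both CLIs are on `PATH`, and the `convert_cmd` / `execute_cmd` overrides are an illustrative addition so the function can be exercised without either tool installed.

```python
import subprocess
from pathlib import Path


def run_pipeline(script, output, convert_cmd=None, execute_cmd=None):
    """Pipe a converter into an executor and report problems (hypothetical helper)."""
    # Defaults mirror the recommended pipeline; both CLIs are assumed to be on PATH.
    convert_cmd = convert_cmd or ["jupytext", "--to", "notebook", "--output", "-", script]
    execute_cmd = execute_cmd or ["papermill", "-", output]

    # Equivalent of `convert_cmd | execute_cmd` in a shell.
    convert = subprocess.Popen(convert_cmd, stdout=subprocess.PIPE)
    execute = subprocess.run(
        execute_cmd, stdin=convert.stdout, capture_output=True, text=True
    )
    convert.stdout.close()
    convert_rc = convert.wait()

    problems = []
    if convert_rc != 0:
        problems.append(f"conversion failed with exit code {convert_rc}")
    if execute.returncode != 0:
        problems.append(f"execution failed with exit code {execute.returncode}")
    out = Path(output)
    if not out.is_file() or out.stat().st_size == 0:
        problems.append(f"{output} is missing or empty")
    # stderr is returned for manual review rather than parsed automatically.
    return problems, execute.stderr
```

An empty `problems` list only covers steps 1 to 3; the outputs inside the notebook still have to be inspected before claiming success.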
### Rationalization Table - STOP If You Think:

| Excuse | Reality | Do Instead |
|--------|---------|------------|
| "I converted to ipynb, so it works" | Conversion ≠ execution | EXECUTE with papermill, not just convert |
| "The .py file looks correct" | Syntax correctness ≠ runtime correctness | RUN and CHECK outputs |
| "I'll let the user execute it" | You're passing broken code | VERIFY before claiming completion |
| "Just a conversion task, no execution needed" | User expects working notebook | EXECUTE to confirm it works |
| "I can use `jupyter nbconvert --execute`" | Papermill has better error handling | USE the recommended papermill pipeline |
| "I'll save the intermediate ipynb first" | Creates clutter | USE the recommended pipeline (no intermediate files) |
| "Exit code 0 means success" | Exit code 0 only shows that execution finished; outputs can still be wrong or missing | CHECK output.ipynb for tracebacks and expected outputs |

### Red Flags - STOP Immediately If You Think:

- "Let me just convert and return the ipynb" → NO. EXECUTE with papermill first.
- "The .py file is simple, can't have errors" → NO. Simple code fails too.
- "I'll execute without papermill" → NO. Use the recommended pipeline.
- "Conversion completed, so job done" → NO. Execution verification required.
### Execution Verification Checklist

Before EVERY "notebook works" claim:

**Conversion:**

- [ ] Correct format specified (py:percent recommended)
- [ ] Conversion command succeeded
- [ ] No syntax errors in conversion

**Execution (MANDATORY):**

- [ ] Used recommended papermill pipeline: `jupytext --to notebook --output - script.py | papermill - output.ipynb`
- [ ] Papermill exit code is 0
- [ ] No errors in stderr
- [ ] output.ipynb file created
- [ ] output.ipynb is non-empty (>100 bytes)

**Output Verification:**

- [ ] Used notebook-debug skill's verification checklist
- [ ] No tracebacks in any cell
- [ ] All cells have execution_count (not null)
- [ ] Expected outputs present (plots, dataframes, metrics)
- [ ] No unexpected warnings or errors

**Multi-Kernel Projects (if applicable):**

- [ ] Correct kernel specified in header
- [ ] Interchange files created (parquet/DTA)
- [ ] Downstream notebooks can read interchange files

**Only after ALL checks pass:**

- [ ] Claim "notebook executed successfully"

### Gate Function: Jupytext Execution

Follow this sequence for EVERY jupytext task involving execution:

```
1. CONVERT → jupytext --to notebook --output -
2. EXECUTE → papermill - output.ipynb (with params if needed)
3. CHECK   → Verify exit code and stderr
4. INSPECT → Use notebook-debug verification
5. VERIFY  → Outputs match expectations
6. CLAIM   → "Notebook works" only after all gates passed
```

**NEVER skip execution gate.** Converting without executing proves nothing about correctness.

### Honesty Framing

**Claiming a jupytext script works without executing it through papermill is LYING.**

This is not just format conversion - verify that the notebook executes correctly. The user expects a working notebook, not just syntactically valid code.
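Two of the output-verification items above ("No tracebacks in any cell" and "All cells have execution_count") can be checked mechanically. The sketch below uses only the standard library and assumes the nbformat v4 JSON layout (`cells`, `cell_type`, `execution_count`, `outputs` with `output_type == "error"`). `find_notebook_problems` and `check_notebook_file` are hypothetical helpers; they complement the notebook-debug skill and do not replace it.

```python
import json


def find_notebook_problems(nb):
    """Return human-readable problems found in a notebook dict (nbformat v4 assumed)."""
    problems = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # nbformat allows `source` to be a string or a list of strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Non-empty code cells that were never run have a null execution_count.
        if source.strip() and cell.get("execution_count") is None:
            problems.append(f"cell {index}: not executed")
        # Error outputs carry the exception name and message.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                problems.append(
                    f"cell {index}: {output.get('ename')}: {output.get('evalue')}"
                )
    return problems


def check_notebook_file(path):
    """Load an executed notebook from disk and scan it."""
    with open(path, encoding="utf-8") as handle:
        return find_notebook_problems(json.load(handle))
```

Calling `check_notebook_file("output.ipynb")` after the pipeline returns an empty list when no cell was skipped or raised. Whether the expected plots, dataframes, and metrics are present still needs a manual look.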
## Core Concepts

### Percent Format (Recommended)

Use percent format (`py:percent`) for all projects:

```python
# %% [markdown]
# # Analysis Title

# %% tags=["parameters"]
# Papermill injects overrides right after this cell, so define parameters before use
input_file = "data.csv"

# %%
import pandas as pd

df = pd.read_csv(input_file)
```

Cell markers: `# %%` for code, `# %% [markdown]` for markdown.

**Markdown dollar signs:** Wrap literal dollar amounts in backticks to prevent LaTeX rendering: write `` # Cost: `$50` `` rather than `# Cost: $50`.

### Project Configuration

Create `jupytext.toml` in project root:

```toml
formats = "ipynb,py:percent"
notebook_metadata_filter = "-all"
cell_metadata_filter = "-all"
```

### Essential Commands

```bash
# Convert notebook to percent-format Python file
jupytext --to py:percent notebook.ipynb

# Convert Python script to Jupyter notebook format
jupytext --to notebook script.py

# Enable bidirectional pairing to keep formats synchronized
jupytext --set-formats ipynb,py:percent notebook.ipynb

# Synchronize paired notebook and text file
jupytext --sync notebook.ipynb
```

### Execution (Recommended Pattern)

**Always pipe to papermill for execution** - no intermediate files:

```bash
# Convert the script to a notebook and execute it in one pipeline
jupytext --to notebook --output - script.py | papermill - output.ipynb

# Convert and execute with parameter injection
jupytext --to notebook --output - script.py | papermill - output.ipynb -p start_date "2024-01-01" -p n_samples 1000

# Convert and execute with detailed logging output
jupytext --to notebook --output - script.py | papermill - output.ipynb --log-output

# Convert and execute, writing the executed notebook to stdout (nothing saved to disk)
jupytext --to notebook --output - script.py | papermill - -
```

Key flags:

- `--output -` tells jupytext to write to stdout
- `papermill - output.ipynb` reads from stdin, writes to file
- `papermill - -` reads from stdin, writes to stdout (for inspection)

**Why this pattern:**

1. No intermediate `.ipynb` files cluttering the workspace
2. Single pipeline - convert and execute together in one command
3. Papermill handles parameters, logging, and error reporting
4. Works in CI/CD pipelines without temp file cleanup

### Debugging Runtime Errors

After execution, use `notebook-debug` skill to inspect tracebacks in the output ipynb.

## Multi-Kernel Data Sharing

Share data between Python/R/Stata/SAS via files:

| Route | Format | Write | Read |
|-------|--------|-------|------|
| Python -> R | Parquet | `df.to_parquet()` | `arrow::read_parquet()` |
| Python -> Stata | DTA | `df.to_stata()` | `use "file.dta"` |
| Any -> Any | CSV | Native | Native |
| SQL queries | DuckDB | Query parquet directly | Query parquet directly |

### Cross-Kernel Pipeline Pattern

```
Python (prep) -> Parquet -> R (stats) -> Parquet -> Python (report)
     |
     v
Stata (.dta) -> Econometrics
```

## Workflow Integration

### Git Pre-commit Hook

Add the following to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/mwouts/jupytext
    rev: v1.16.0
    hooks:
      - id: jupytext
        args: [--sync]  # Synchronize paired formats before commit
```

### Version Control Strategy

Choose one approach:

- **Option A**: Commit only .py files (add `*.ipynb` to `.gitignore`) for minimal repository size
- **Option B**: Commit both formats to give reviewers format choice

### Editor Integration

Configure editors for automatic synchronization:

- **VS Code**: Install Jupytext extension for automatic bidirectional sync
- **JupyterLab**: Right-click notebook and select "Pair Notebook" for synchronization

## Project Structure

Standard multi-kernel project layout:

```
project/
├── jupytext.toml              # Project-wide settings
├── environment.yml            # Conda env with all kernels
├── notebooks/
│   ├── 01_python_prep.py      # Python percent format
│   ├── 02_r_analysis.R        # R percent format
│   └── 03_stata_models.do     # Stata script
├── data/
│   ├── raw/
│   └── processed/             # Parquet/DTA interchange files
└── results/
```

## Kernel Specification

Specify kernel in file header:

```python
# ---
# jupyter:
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %% [markdown]
# # Python Analysis
```

## Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| Sync conflict | Delete .ipynb, regenerate from .py |
| Wrong kernel | Add kernelspec header to .py file |
| Metadata noise | Set `notebook_metadata_filter = "-all"` |
| Cell order lost | Use percent format (preserves structure) |

## Additional Resources

### Reference Files

Detailed patterns and configurations:

- **`references/formats.md`** - All format specifications (percent, light, sphinx, myst, rmd, quarto), cell metadata, configuration options
- **`references/kernels.md`** - Kernel setup (IRkernel, xeus-r, stata_kernel, pystata, saspy), environment configuration, troubleshooting
- **`references/data-sharing.md`** - Cross-kernel data sharing patterns (parquet, dta, csv, duckdb), full pipeline examples, validation patterns

### Example Files

Working code in `examples/`:

- **`examples/python_analysis.py`** - Python percent-format template with common patterns
- **`examples/r_analysis.R`** - R percent-format template for statistical analysis
- **`examples/cross_kernel_pipeline.py`** - Multi-kernel data sharing example

### Scripts

Utility scripts in `scripts/`:

- **`scripts/init_project.sh`** - Initialize jupytext project with standard structure
- **`scripts/sync_all.sh`** - Sync all paired notebooks in project

## Best Practices

1. **Use percent format** - Best balance of readability and cell preservation
2. **Strip metadata for git** - Use metadata filters for cleaner diffs
3. **Use parquet for interchange** - Type-safe, cross-language compatible format
4. **Document kernel requirements** - Include in README or environment.yml
5. **Enable pre-commit hooks** - Ensure synchronization before commits
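
To make the percent-format cell markers described under Core Concepts concrete, the sketch below splits a script into cells by hand. It is a deliberately simplified illustration and not jupytext's actual parser: `split_percent_cells` is a hypothetical function that ignores cell metadata such as tags and skips anything before the first marker (for example, a kernelspec header). Use `jupytext` itself for real conversions.

```python
def split_percent_cells(text):
    """Split percent-format source into (cell_type, source) pairs (simplified sketch)."""
    cells = []
    for line in text.splitlines():
        if line.startswith("# %%"):
            # Everything after the marker is treated as cell options.
            marker = line[4:].strip()
            cell_type = "markdown" if marker.startswith("[markdown]") else "code"
            cells.append({"cell_type": cell_type, "lines": []})
        elif cells:
            cells[-1]["lines"].append(line)
        # Lines before the first marker are ignored in this sketch.

    result = []
    for cell in cells:
        lines = cell["lines"]
        if cell["cell_type"] == "markdown":
            # Markdown cells are stored as comments; drop the leading "# ".
            lines = [l[2:] if l.startswith("# ") else l.lstrip("#") for l in lines]
        result.append((cell["cell_type"], "\n".join(lines).strip()))
    return result


example = '''# %% [markdown]
# # Analysis Title

# %%
import pandas as pd

# %% tags=["parameters"]
input_file = "data.csv"
'''

for cell_type, source in split_percent_cells(example):
    print(cell_type, repr(source))
```

Because every cell boundary is an explicit `# %%` line, a diff of the `.py` file shows exactly which cells changed, which is what makes the format friendly to version control.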