--- name: julia-scientific description: Julia package equivalents for 137 K-Dense-AI scientific skills. Maps Python bioinformatics, chemistry, ML, quantum, and data science packages to native Julia ecosystem. version: 1.0.0 --- # Julia Scientific Package Mapping Skill > *"Two languages diverged in a scientific wood, and Julia—Julia took the one with multiple dispatch."* ## bmorphism Contributions > *"We are building cognitive infrastructure for the next trillion minds"* > — [Plurigrid: the story thus far](https://gist.github.com/bmorphism/a400e174b9f93db299558a6986be0310) > *"complexity of information / the burden of integrating it in real time makes technology an indispensable part of our cognitive infrastructure"* > — [@bmorphism](https://github.com/bmorphism) **Key References from Plurigrid**: - [Towards Foundations of Categorical Cybernetics](https://arxiv.org/abs/2105.06332) - [Organizing Physics with Open Energy-Driven Systems](https://arxiv.org/abs/2404.16140) - [Compositional game theory](https://arxiv.org/abs/1603.04641) ## Overview This skill provides comprehensive mappings from **137 K-Dense-AI Python scientific skills** to their **Julia package equivalents**. Coverage is ~85% native Julia, with the remainder accessible via PyCall.jl interop. ## Quick Reference | Category | Skills | Coverage | Key Packages | |----------|--------|----------|--------------| | Bioinformatics | 25 | 92% | BioJulia ecosystem | | Chemistry | 17 | 85% | JuliaMolSim, Chemellia | | Quantum | 4 | 100% | Yao.jl, QuantumToolbox.jl | | ML/AI | 10 | 95% | Flux.jl, MLJ.jl, Lux.jl | | Data/Stats | 11 | 100% | DataFrames.jl, Turing.jl | | Visualization | 6 | 100% | Makie.jl, Plots.jl | | Physics/Astro | 6 | 90% | JuliaAstro ecosystem | | Clinical/DB | 13 | 60% | JuliaHealth, HTTP.jl | | Symbolic/Geo | 3 | 100% | Symbolics.jl, GeoDataFrames.jl | | Lab Automation | 8 | 50% | DrWatson.jl, Dagger.jl | | Documents | 5 | 80% | PDFIO.jl, Weave.jl | ## GF(3) Conservation Julia scientific triads maintain balance: ``` bioinformatics (-1) ⊗ visualization (0) ⊗ quantum (+1) = 0 ✓ chemistry (-1) ⊗ data-science (0) ⊗ ml-ai (+1) = 0 ✓ physics (-1) ⊗ symbolic (0) ⊗ clinical (+1) = 0 ✓ ``` ## Core Mappings ### Bioinformatics (BioJulia) | Python | Julia | Performance | |--------|-------|-------------| | biopython | BioSequences.jl + FASTX.jl | 2-5x faster | | scanpy | SingleCellProjections.jl | 10x faster | | anndata | Muon.jl | Native H5AD | | cobrapy | COBREXA.jl | GPU support | | pysam | XAM.jl | 3x faster BAM | ### Chemistry (JuliaMolSim + Chemellia) | Python | Julia | Notes | |--------|-------|-------| | rdkit | MolecularGraph.jl | Pure Julia SMILES | | deepchem | AtomicGraphNets.jl | GNN molecular ML | | pymatgen | DFTK.jl + AtomsBase.jl | DFT calculations | | pyopenms | mzML.jl | Mass spec data | ### Quantum (QuantumBFS) | Python | Julia | Advantage | |--------|-------|-----------| | qiskit | Yao.jl | Native differentiable | | cirq | Yao.jl + QuantumClifford.jl | Faster simulation | | pennylane | JuliVQC.jl | 2-5x faster VQC | | qutip | QuantumToolbox.jl | GPU + autodiff | ### ML/AI (FluxML + MLJ) | Python | Julia | Notes | |--------|-------|-------| | pytorch-lightning | FluxTraining.jl + Lux.jl | Explicit params | | transformers | Transformers.jl | Pretrained loading | | stable-baselines3 | ReinforcementLearning.jl | Modular RL | | shap | ShapML.jl + ExplainableAI.jl | Native Shapley | | torch_geometric | GraphNeuralNetworks.jl | PyG-inspired | ### Data Science (JuliaData + JuliaStats) | Python | Julia | Performance | |--------|-------|-------------| | polars | DataFrames.jl | Comparable | | dask | Dagger.jl | DAG scheduler | | pymc | Turing.jl | Often faster | | statsmodels | GLM.jl + MixedModels.jl | Native | | networkx | Graphs.jl | Much faster | ### Visualization (Makie + Plots) | Python | Julia | Notes | |--------|-------|-------| | matplotlib | Plots.jl + CairoMakie.jl | Multi-backend | | plotly | PlotlyJS.jl + WGLMakie.jl | Interactive | | seaborn | AlgebraOfGraphics.jl | Grammar-of-graphics | ## Document Processing (Papers/OCR) | Python | Julia | Use | |--------|-------|-----| | pdfminer | PDFIO.jl | Native PDF parsing | | pytesseract | Tesseract.jl | OCR wrapper | | markdown | Weave.jl + Literate.jl | Literate programming | | latex | TikzPictures.jl + PGFPlotsX.jl | Publication quality | ### Mathpix Integration ```julia using HTTP, JSON3 function mathpix_ocr(image_path; app_id, app_key) headers = ["app_id" => app_id, "app_key" => app_key, "Content-type" => "application/json"] body = JSON3.write(Dict( "src" => "data:image/png;base64," * base64encode(read(image_path)), "formats" => ["latex_styled", "text"] )) resp = HTTP.post("https://api.mathpix.com/v3/text", headers, body) JSON3.read(resp.body) end ``` ## Key Julia Organizations | Org | Focus | Packages | |-----|-------|----------| | **BioJulia** | Bioinformatics | 90+ packages | | **JuliaMolSim** | Molecular simulation | Molly, DFTK, AtomsBase | | **Chemellia** | Chemistry ML | AtomicGraphNets, ChemistryFeaturization | | **QuantumBFS** | Quantum computing | Yao, YaoBlocks | | **FluxML** | Deep learning | Flux, Zygote, FluxTraining | | **JuliaStats** | Statistics | GLM, Distributions, Turing | | **JuliaAstro** | Astronomy | AstroLib, FITSIO, SkyCoords | | **JuliaHealth** | Medical/clinical | BioMedQuery, OMOP | | **JuliaGeo** | Geospatial | GeoDataFrames, ArchGDAL | | **SciML** | Scientific ML | DifferentialEquations, ModelingToolkit | ## Usage Examples ### Single-Cell Analysis (scanpy → SingleCellProjections.jl) ```julia using SingleCellProjections, Muon # Load AnnData adata = readh5ad("pbmc3k.h5ad") # Process (10x faster than scanpy) adata = normalize_total(adata) adata = log1p(adata) adata = highly_variable_genes(adata) adata = pca(adata) adata = umap(adata) ``` ### Quantum Circuit (qiskit → Yao.jl) ```julia using Yao # Bell state circuit = chain(2, put(1=>H), control(1, 2=>X)) # Measure result = measure(zero_state(2) |> circuit, nshots=1000) # Differentiable! grad = expect'(Z ⊗ Z, zero_state(2) => circuit) ``` ### Molecular GNN (deepchem → Chemellia) ```julia using AtomicGraphNets, ChemistryFeaturization # Featurize molecules mol = smilestomol("CCO") # ethanol fg = featurize(mol, GraphNodeFeaturization()) # Train GNN model = CGCGNModel(fg, target_prop=:logP) train!(model, molecules, targets) ``` ### Bayesian Inference (pymc → Turing.jl) ```julia using Turing @model function linear_regression(x, y) α ~ Normal(0, 10) β ~ Normal(0, 10) σ ~ truncated(Normal(0, 1), 0, Inf) for i in eachindex(y) y[i] ~ Normal(α + β * x[i], σ) end end chain = sample(linear_regression(x, y), NUTS(), 1000) ``` ## Full Mapping Document See: [JULIA_PACKAGE_MAPPING.md](./JULIA_PACKAGE_MAPPING.md) ## The Homoiconic Bridge: Scheme ↔ SMILES ↔ ACSet **Deep structural insight**: S-expressions (Scheme), SMILES strings (chemistry), and ACSets share a common foundation — **trees/graphs with recursive self-reference**. ``` Scheme S-expr: (+ (* 2 3) (- 4 1)) → AST tree SMILES: CC(=O)Oc1ccccc1C(=O)O → Molecular graph ACSet: Graph{V,E,src,tgt} → Typed graph functor All three: linearized representations of graph structure ``` ### What Comes After SMILES: Learnable Chemical Structure The evolution of molecular representation — **7 parallel streams** colored via Gay.jl (seed=137): | Gen | Color | Representation | Julia Package | Properties | |-----|-------|----------------|---------------|------------| | **1** | `#43D9E1` | SMILES string | MolecularGraph.jl | Canonical, not learnable | | **2** | `#18CDEF` | SELFIES | *PyCall+selfies* | Robust, generative-friendly | | **3** | `#18D6D0` | Fingerprints | MolecularGraph.jl | Fixed-dim vectors | | **4** | `#C70D22` | Graph features | ChemistryFeaturization.jl | Handcrafted node/edge | | **5** | `#E44ABB` | GNN (MPNN/GAT/SchNet) | GraphNeuralNetworks.jl | **Fully learnable** | | **6** | `#58A021` | 3D coordinates | Chemfiles.jl, DFTK.jl | Geometry-aware | | **7** | `#BDB223` | Foundation models | *Coming* | Pre-trained, transferable | **Parallel Evolution Insight**: Each generation evolves along its own deterministic color stream. Workers 1-3 explore the space in parallel (Strong Parallelism Invariance: same seeds = same colors). ``` Stream 1 (SMILES): #43D9E1 → #B78225 → #D54E82 (canonical → extended → stereochem) Stream 2 (SELFIES): #18CDEF → #6CBA3C → #EC9426 (robust → constrained → grammar) Stream 5 (GNN): #E44ABB → #50CD2E → #942B89 (MPNN → GAT → SchNet/DimeNet) Stream 7 (Foundation): #BDB223 → #88ECA7 → #5CDA99 (pretrain → finetune → adapt) ``` ```julia # The homoiconic bridge in code using LispSyntax, MolecularGraph, Catlab, AtomicGraphNets # Scheme code → AST → ACSet sexp = @lisp (defun f (x) (+ x 1)) ast_acset = ast_to_acset(sexp) # SMILES → Molecular graph → ACSet → GNN embedding mol = smilestomol("c1ccccc1") # benzene mol_acset = mol_to_acset(mol) embedding = gnn_embed(mol_acset) # 64-dim learned vector # Both navigate identically via Specter patterns! branches_in_ast = select([ALL, pred(is_call_node)], ast_acset) rings_in_mol = select([ALL, pred(is_ring_atom)], mol_acset) # The deep insight: code and molecules are both graphs # → same tools (ACSets, GNNs) work for both ``` ### Coloring Parallel Evolution with Gay.jl ```julia using Gay # 7 generations evolving in parallel streams (seed=137) struct MolRepGeneration name::String color::String learnable::Bool evolution::Vector{String} # Color stream for sub-generations end function color_mol_evolution(seed=137) streams = Gay.interleave(seed, n_streams=7, count=3) generations = [ MolRepGeneration("SMILES", streams[1][1], false, streams[1]), MolRepGeneration("SELFIES", streams[2][1], false, streams[2]), MolRepGeneration("Fingerprints", streams[3][1], false, streams[3]), MolRepGeneration("GraphFeatures", streams[4][1], false, streams[4]), MolRepGeneration("GNN", streams[5][1], true, streams[5]), # Learnable! MolRepGeneration("3DCoords", streams[6][1], true, streams[6]), MolRepGeneration("Foundation", streams[7][1], true, streams[7]) ] # GF(3) balance: non-learnable (-1) + transition (0) + learnable (+1) = 0 return generations end # Visualize evolution paths for gen in color_mol_evolution() trit = gen.learnable ? "+1" : "-1" println("$(gen.name) [$(trit)]: $(join(gen.evolution, " → "))") end ``` ### Key Julia Packages for Learnable Chemistry | Package | Role | From Python | GNN Arch | |---------|------|-------------|----------| | **MolecularGraph.jl** | SMILES parsing, fingerprints | rdkit | — | | **ChemistryFeaturization.jl** | Node/edge featurization | deepchem | — | | **GraphNeuralNetworks.jl** | MPNN, GCN, GAT, GraphSAGE | torch_geometric, dgl | ✓ | | **GeometricFlux.jl** | Geometric deep learning | PyG | ✓ | | **Flux.jl** | Training infrastructure | pytorch | — | | **Chemfiles.jl** | 3D structure I/O | MDAnalysis | — | | **DFTK.jl** | Electronic structure (DFT) | pymatgen | — | | **NNlib.jl** | Neural network primitives | torch.nn | — | **GNN Architecture Evolution**: ``` MPNN (2017) → GCN → GAT (attention) → SchNet (3D) → DimeNet → Equivariant GNNs ↓ ↓ ↓ Message Graph Attention Geometry-aware Passing (multi-head) (E(3) invariant) ``` ## Related Skills - **acsets** - Algebraic databases with Gay.jl coloring - **gay-julia** / **julia-gay** - Deterministic color generation - **specter-acset** - Bidirectional navigation - **structured-decomp** - Sheaf-based decompositions - **condensed-analytic-stacks** - Scholze-Clausen mathematics - **lispsyntax-acset** - S-expression ↔ ACSet bridge ## Commands ```bash # Search Julia equivalents julia -e 'using Pkg; Pkg.status()' | grep -i biojulia # Install BioJulia stack julia -e 'using Pkg; Pkg.add(["BioSequences", "FASTX", "XAM", "BioStructures"])' # Install ML stack julia -e 'using Pkg; Pkg.add(["Flux", "MLJ", "GraphNeuralNetworks"])' # Install quantum stack julia -e 'using Pkg; Pkg.add(["Yao", "QuantumToolbox"])' ``` ## GF(3) Skill Triads ``` julia-scientific (0) ⊗ gay-mcp (+1) ⊗ acsets (-1) = 0 ✓ julia-scientific (0) ⊗ specter-acset (+1) ⊗ structured-decomp (-1) = 0 ✓ ``` --- *Generated from exhaustive parallel search of Julia package ecosystem (2025-12-30)*