---
name: "mofaplus-multi-omics"
description: "Multi-Omics Factor Analysis v2 (MOFA+) with mofapy2. Jointly decompose omics layers (scRNA, ATAC, proteomics, methylation) into latent factors capturing major variation. Multi-group designs. AnnData views → MOFA object → train → variance explained → correlate factors with metadata → visualize/cluster → enrich top loadings."
license: "LGPL-3.0"
---

# MOFA+ Multi-Omics Factor Analysis

## Overview

MOFA+ (Multi-Omics Factor Analysis v2) is an unsupervised statistical framework that jointly decomposes multiple omics datasets into a small set of latent factors. Each factor captures an independent source of variation (e.g., cell cycle, a disease phenotype, a technical batch) and is associated with feature weights (loadings) that reveal which genes, peaks, or proteins drive it. The Python package `mofapy2` produces an HDF5 model file compatible with downstream analysis in both Python and R. MOFA+ extends the original MOFA to support multi-group settings where samples belong to distinct cohorts or conditions.

## When to Use

- Integrating two or more omics layers from the same set of cells or samples (e.g., scRNA-seq + scATAC-seq, RNA + proteomics, methylation + RNA)
- Identifying shared and view-specific sources of variation across omics modalities without supervised labels
- Comparing how latent factors differ between patient groups, treatment conditions, or time points in a multi-group analysis
- Reducing multi-omics dimensionality before clustering, trajectory inference, or survival modeling
- Discovering which genomic features (genes, peaks, proteins) drive each factor via sparse loadings
- Annotating latent factors by correlating factor scores with sample metadata (age, stage, treatment response)
- Use **scVI / MultiVI** (scverse) instead when you need deep generative batch correction across modalities with explicit latent space inference and VAE architecture
- Use **LIGER** instead when your primary goal is integrating datasets across technologies (e.g., snRNA-seq + snATAC-seq) with shared and dataset-specific factors via iNMF

## Prerequisites

- **Python packages**: `mofapy2>=0.7`, `anndata>=0.10`, `numpy>=1.24`, `pandas>=2.0`, `scipy>=1.11`, `matplotlib>=3.7`, `seaborn>=0.13`, `muon` (optional, for MuData integration)
- **Data requirements**: One AnnData object per omics view (cells/samples x features). All views must share the same obs (cell/sample) index. Missing samples per group are supported.
- **Environment**: Python 3.9+; trained model is saved as HDF5 and can be analyzed in R via the `MOFA2` Bioconductor package

```bash
pip install mofapy2 anndata muon matplotlib seaborn
```

## Quick Start

```python
import numpy as np
import pandas as pd
import anndata as ad
from mofapy2.run.entry_point import entry_point

# Simulate two omics views, 200 cells, 500 RNA genes, 300 ATAC peaks
np.random.seed(42)
n_cells, n_rna, n_atac = 200, 500, 300

adata_rna  = ad.AnnData(np.abs(np.random.randn(n_cells, n_rna)),
                         obs=pd.DataFrame(index=[f"cell_{i}" for i in range(n_cells)]))
adata_atac = ad.AnnData(np.abs(np.random.randn(n_cells, n_atac)),
                         obs=adata_rna.obs.copy())

ent = entry_point()
ent.set_data_options(scale_groups=False, scale_views=True)
ent.set_data_matrix([[adata_rna.X, adata_atac.X]],
                    likelihoods=["gaussian", "gaussian"],
                    views_names=["RNA", "ATAC"],
                    groups_names=["all_cells"],
                    samples_names=[list(adata_rna.obs_names)])
ent.set_model_options(factors=10)
ent.set_train_options(iter=500, convergence_mode="fast", seed=42)
ent.build()
ent.run()
ent.save("mofa_model.hdf5")
print("Model saved to mofa_model.hdf5")
```

## Workflow

### Step 1: Load and Prepare Multi-Omics Data

Each omics layer is represented as an AnnData object. Align cell indices across modalities, log-normalize RNA counts, and binarize or normalize ATAC/methylation data as appropriate.

```python
import numpy as np
import pandas as pd
import anndata as ad
import scipy.sparse as sp

# --- RNA-seq: 200 cells x 2000 highly variable genes ---
np.random.seed(42)
n_cells = 200
cell_ids = [f"cell_{i:03d}" for i in range(n_cells)]

# Simulate log-normalized counts (in practice: load from h5ad after Scanpy preprocessing)
rna_counts = np.abs(np.random.randn(n_cells, 2000) * 2)
adata_rna = ad.AnnData(
    X=rna_counts,
    obs=pd.DataFrame(
        {"condition": ["A"] * 100 + ["B"] * 100,
         "patient": [f"P{i % 10}" for i in range(n_cells)]},
        index=cell_ids
    ),
    var=pd.DataFrame(index=[f"Gene_{i}" for i in range(2000)])
)

# --- ATAC-seq: same cells x 1000 peaks ---
atac_matrix = (np.random.rand(n_cells, 1000) > 0.8).astype(float)
adata_atac = ad.AnnData(
    X=atac_matrix,
    obs=adata_rna.obs.copy(),
    var=pd.DataFrame(index=[f"Peak_{i}" for i in range(1000)])
)

# Confirm alignment
assert list(adata_rna.obs_names) == list(adata_atac.obs_names), "Cell indices must match"
print(f"RNA: {adata_rna.shape}, ATAC: {adata_atac.shape}")
print(f"Conditions: {adata_rna.obs['condition'].value_counts().to_dict()}")
```

### Step 2: Create the MOFA+ Model Object

Instantiate the `entry_point` and register all data views. Views are provided as a list-of-lists: `data[groups][views]`. Assign meaningful view and group names for interpretability.

```python
from mofapy2.run.entry_point import entry_point

ent = entry_point()

# Configure data options before setting data
ent.set_data_options(
    scale_groups=False,   # Do not rescale variance between groups
    scale_views=True,     # Rescale each view to unit variance (recommended when views differ in scale)
)

# Provide data as list-of-lists: [groups][views]
# Single group → wrap each view in a list
ent.set_data_matrix(
    data=[[adata_rna.X, adata_atac.X]],       # outer list = groups, inner = views
    likelihoods=["gaussian", "bernoulli"],      # gaussian for continuous, bernoulli for binary ATAC
    views_names=["RNA", "ATAC"],
    groups_names=["all_cells"],
    samples_names=[list(adata_rna.obs_names)]  # one list per group
)

print("Data registered. Views: RNA, ATAC | Groups: all_cells")
```

### Step 3: Set Model and Training Options

Configure the number of factors and training hyperparameters. More factors capture finer variation but increase computation and risk overfitting; `convergence_mode="medium"` balances speed and accuracy.

```python
# Model options
ent.set_model_options(
    factors=15,          # Number of latent factors (start with 15; prune inactive ones automatically)
    spikeslab_weights=True,  # Sparse weight prior (ARD+spike-slab); recommended for feature selection
    ard_factors=True,    # Automatic relevance determination per factor per view
    ard_weights=True,    # ARD per feature weight; enables pruning of irrelevant features
)

# Training options
ent.set_train_options(
    iter=1000,                     # Maximum EM iterations
    convergence_mode="medium",     # "fast" (<1000 iter), "medium" (default), "slow" (>5000 iter)
    startELBO=1,                   # Start computing ELBO from iteration 1
    freqELBO=5,                    # Compute ELBO every 5 iterations
    dropR2=0.01,                   # Drop factors explaining < 1% variance (set to None to disable)
    seed=42,
    verbose=False
)

print("Model and training options set")
```

### Step 4: Build and Train the Model

Build the internal data structures, then run variational inference. Training produces a fitted model where each factor's weights and scores are optimized to maximize the evidence lower bound (ELBO).

```python
# Build internal model structure
ent.build()

# Run training (EM algorithm with variational Bayes updates)
ent.run()

# Save trained model to HDF5 — required for downstream analysis
output_path = "mofa_model.hdf5"
ent.save(output_path, overwrite=True)
print(f"Model trained and saved to {output_path}")
```

### Step 5: Load Trained Model and Inspect Variance Explained

Load the HDF5 model and inspect how much variance each factor explains per view. Factors explaining less than ~1-2% total variance are typically noise.

```python
import h5py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def load_mofa_r2(model_path):
    """Extract variance explained (R2) per factor per view from MOFA+ HDF5."""
    with h5py.File(model_path, "r") as f:
        r2 = f["variance_explained"]["r2_per_factor"]
        views = [v.decode() for v in f["views"]["views"][:]]
        groups = [g.decode() for g in f["groups"]["groups"][:]]

        # r2_per_factor: shape (n_groups, n_views, n_factors)
        r2_array = np.stack([r2[g][:] for g in groups], axis=0)  # (groups, views, factors)

    # Average across groups; shape → (n_views, n_factors)
    r2_mean = r2_array.mean(axis=0)
    n_factors = r2_mean.shape[1]
    df = pd.DataFrame(r2_mean * 100,
                      index=views,
                      columns=[f"Factor{i+1}" for i in range(n_factors)])
    return df

r2_df = load_mofa_r2("mofa_model.hdf5")
print("Variance explained (%) per factor per view:")
print(r2_df.round(2))

# Heatmap of variance explained
fig, ax = plt.subplots(figsize=(max(8, r2_df.shape[1] * 0.6), 3))
sns.heatmap(r2_df, annot=True, fmt=".1f", cmap="YlOrRd",
            linewidths=0.5, ax=ax, vmin=0)
ax.set_title("MOFA+ Variance Explained (%) per Factor per View")
ax.set_xlabel("Factor")
ax.set_ylabel("Omics View")
plt.tight_layout()
plt.savefig("mofa_variance_explained.png", dpi=200)
print("Saved mofa_variance_explained.png")
```

### Step 6: Extract Factor Scores and Correlate with Metadata

Factor scores (Z matrix) are the per-sample coordinates in factor space. Correlate scores with continuous or categorical metadata to biologically annotate each factor.

```python
import scipy.stats as stats

def load_mofa_factors(model_path):
    """Load factor scores (Z) from MOFA+ HDF5. Returns DataFrame (samples x factors)."""
    with h5py.File(model_path, "r") as f:
        groups = [g.decode() for g in f["groups"]["groups"][:]]
        factors_list = []
        for g in groups:
            z = f["expectations"]["Z"][g][:]   # shape: (n_factors, n_samples)
            samples = [s.decode() for s in f["samples"][g][:]]
            n_factors = z.shape[0]
            df = pd.DataFrame(z.T, index=samples,
                              columns=[f"Factor{i+1}" for i in range(n_factors)])
            factors_list.append(df)
    return pd.concat(factors_list, axis=0)

factors_df = load_mofa_factors("mofa_model.hdf5")
print(f"Factor scores: {factors_df.shape} (cells x factors)")

# Merge with metadata
meta = adata_rna.obs[["condition", "patient"]].copy()
factors_meta = factors_df.join(meta)

# Point-biserial correlation: factor score vs binary metadata
factor_cols = [c for c in factors_meta.columns if c.startswith("Factor")]
print("\nFactor–Condition correlation (eta-squared approximation):")
for fc in factor_cols[:5]:
    groups_vals = [factors_meta.loc[factors_meta["condition"] == g, fc].values
                   for g in factors_meta["condition"].unique()]
    stat, pval = stats.f_oneway(*groups_vals)
    print(f"  {fc}: F={stat:.2f}, p={pval:.3f}")
```

### Step 7: Visualize Factors — Scatter Plots and Feature Heatmaps

Plot factor score scatter plots colored by metadata, and heatmaps of the top-weighted features (loadings) per factor to understand what each factor captures.

```python
def load_mofa_weights(model_path, view_name):
    """Load feature weights (W) for a specific view. Returns DataFrame (features x factors)."""
    with h5py.File(model_path, "r") as f:
        views = [v.decode() for v in f["views"]["views"][:]]
        view_idx = views.index(view_name)
        # Weights stored per view as (n_factors, n_features)
        w = f["expectations"]["W"][view_name][:]
        features = [ft.decode() for ft in f["features"][view_name][:]]
        n_factors = w.shape[0]
        df = pd.DataFrame(w.T, index=features,
                          columns=[f"Factor{i+1}" for i in range(n_factors)])
    return df

# --- Factor scatter plot ---
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, (fx, fy) in zip(axes, [("Factor1", "Factor2"), ("Factor1", "Factor3")]):
    for cond, grp in factors_meta.groupby("condition"):
        ax.scatter(grp[fx], grp[fy], label=cond, alpha=0.6, s=20)
    ax.set_xlabel(fx)
    ax.set_ylabel(fy)
    ax.legend(title="Condition")
    ax.set_title(f"{fx} vs {fy}")

plt.tight_layout()
plt.savefig("mofa_factor_scatter.png", dpi=200)
print("Saved mofa_factor_scatter.png")

# --- Top-loading heatmap for RNA view ---
weights_rna = load_mofa_weights("mofa_model.hdf5", "RNA")
top_n = 20
top_features = []
for fc in [f"Factor{i+1}" for i in range(min(5, weights_rna.shape[1]))]:
    top_pos = weights_rna[fc].nlargest(top_n // 2).index.tolist()
    top_neg = weights_rna[fc].nsmallest(top_n // 2).index.tolist()
    top_features.extend(top_pos + top_neg)
top_features = list(dict.fromkeys(top_features))  # unique, preserve order

fig, ax = plt.subplots(figsize=(8, max(6, len(top_features) * 0.3)))
sns.heatmap(weights_rna.loc[top_features, [f"Factor{i+1}" for i in range(min(5, weights_rna.shape[1]))]],
            cmap="RdBu_r", center=0, ax=ax, yticklabels=True)
ax.set_title("Top RNA Feature Weights per Factor")
plt.tight_layout()
plt.savefig("mofa_rna_weights_heatmap.png", dpi=200)
print("Saved mofa_rna_weights_heatmap.png")
```

### Step 8: Downstream — Cluster Cells by Factor Scores and Enrichment

Use factor scores as a low-dimensional embedding for clustering, and extract top-weighted genes per factor for pathway enrichment.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import warnings

# --- Cluster cells using factor scores ---
factor_cols = [c for c in factors_df.columns if c.startswith("Factor")]
X_factors = StandardScaler().fit_transform(factors_df[factor_cols].values)

kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    cluster_labels = kmeans.fit_predict(X_factors)

factors_df["mofa_cluster"] = cluster_labels.astype(str)
print(f"K-means clusters (k=4):\n{pd.Series(cluster_labels).value_counts().sort_index()}")

# --- Extract top-weighted genes per factor for enrichment input ---
weights_rna = load_mofa_weights("mofa_model.hdf5", "RNA")

enrichment_inputs = {}
for fc in [f"Factor{i+1}" for i in range(min(5, weights_rna.shape[1]))]:
    # Positive weights: top activating genes; negative: top repressing genes
    top_pos = weights_rna[fc].nlargest(100).index.tolist()
    top_neg = weights_rna[fc].nsmallest(100).index.tolist()
    enrichment_inputs[f"{fc}_positive"] = top_pos
    enrichment_inputs[f"{fc}_negative"] = top_neg

# Save gene lists for external enrichment (e.g., gseapy Enrichr)
for name, genes in list(enrichment_inputs.items())[:2]:
    print(f"\n{name} — top 10: {genes[:10]}")

# Example: run ORA with gseapy (install separately)
# import gseapy
# enr = gseapy.enrichr(gene_list=enrichment_inputs["Factor1_positive"],
#                       gene_sets="GO_Biological_Process_2023",
#                       organism="human", outdir=None)
# print(enr.results.head(5)[["Term", "Adjusted P-value", "Genes"]])

factors_df.to_csv("mofa_factor_scores.csv")
print("\nFactor scores with cluster labels saved to mofa_factor_scores.csv")
```

## Key Parameters

| Parameter | Default | Range / Options | Effect |
|-----------|---------|-----------------|--------|
| `factors` | `15` | `5`–`50` | Number of latent factors to infer; inactive ones pruned by ARD |
| `likelihoods` | (required) | `"gaussian"`, `"bernoulli"`, `"poisson"` | Per-view likelihood; gaussian for normalized continuous, bernoulli for binary ATAC, poisson for raw counts |
| `scale_views` | `True` | `True`/`False` | Rescale each view to unit variance; recommended when views differ in scale or unit |
| `scale_groups` | `False` | `True`/`False` | Rescale variance across groups; set `True` if groups have very different total variances |
| `spikeslab_weights` | `True` | `True`/`False` | Spike-and-slab sparsity prior on weights; enables feature selection via near-zero weights |
| `ard_factors` | `True` | `True`/`False` | Automatic relevance determination per factor per view; prunes factors not used in a view |
| `iter` | `1000` | `200`–`5000` | Maximum EM iterations; convergence usually reached in 200–800 |
| `convergence_mode` | `"medium"` | `"fast"`, `"medium"`, `"slow"` | ELBO convergence tolerance: fast = 1e-4, medium = 1e-6, slow = 1e-8 |
| `dropR2` | `0.01` | `None`, `0.001`–`0.05` | Drop factors explaining less than this fraction of variance; `None` keeps all |
| `startELBO` | `1` | `1`–`100` | Iteration to start ELBO monitoring; set higher to skip initial instability |

## Key Concepts

### Latent Factors and Loadings

Each latent factor Z_k (a vector of length n_samples) represents a source of variation. The corresponding weight matrix W_k (a vector of length n_features per view) contains the loading of each feature on that factor. Factors with spike-and-slab priors produce sparse loadings: most weights are shrunk to near zero, leaving a small set of features that meaningfully drive the factor. A positive weight means higher factor score correlates with higher feature expression.

### Multi-Group vs Single-Group

A "group" in MOFA+ is a set of samples that share the same factor weight matrices W but have independent factor score distributions Z. Use multi-group analysis when:
- Samples come from distinct cohorts with batch-level differences (patients, datasets)
- You want to compare how much each factor explains within vs between groups
- You expect the same biological programs to operate differently across conditions

Single-group analysis (all samples in one group) is appropriate when samples are from a single experiment with no major batch structure.

### Variance Explained Heatmap Interpretation

The R2 heatmap (views x factors) is the primary diagnostic output. Factors should show:
- View-specific R2: a factor driving only RNA captures transcriptional programs; one driving both RNA and ATAC captures chromatin-accessibility-coupled gene expression
- Declining R2: Factor 1 explains the most variance; factors should be inspected in order
- Factors with <1% R2 in all views can generally be ignored

## Common Recipes

### Recipe: Multi-Group Analysis Across Conditions

Use when comparing two or more patient groups or experimental conditions, where you want to identify condition-specific vs shared factors.

```python
from mofapy2.run.entry_point import entry_point
import numpy as np
import pandas as pd
import anndata as ad

# Simulate two groups: condition A (100 cells) and condition B (100 cells)
np.random.seed(0)
n_per_group, n_genes, n_peaks = 100, 1000, 500
groups = {"condA": {}, "condB": {}}

for g in groups:
    groups[g]["rna"] = np.abs(np.random.randn(n_per_group, n_genes))
    groups[g]["atac"] = (np.random.rand(n_per_group, n_peaks) > 0.75).astype(float)

sample_ids_A = [f"A_cell_{i}" for i in range(n_per_group)]
sample_ids_B = [f"B_cell_{i}" for i in range(n_per_group)]

ent = entry_point()
ent.set_data_options(scale_groups=False, scale_views=True)

# Multi-group: data[groups][views] — two groups, each with RNA and ATAC
ent.set_data_matrix(
    data=[[groups["condA"]["rna"], groups["condA"]["atac"]],
          [groups["condB"]["rna"], groups["condB"]["atac"]]],
    likelihoods=["gaussian", "bernoulli"],
    views_names=["RNA", "ATAC"],
    groups_names=["condA", "condB"],
    samples_names=[sample_ids_A, sample_ids_B]
)

ent.set_model_options(factors=10, spikeslab_weights=True, ard_factors=True)
ent.set_train_options(iter=500, convergence_mode="fast", seed=0, verbose=False)
ent.build()
ent.run()
ent.save("mofa_multigroup.hdf5", overwrite=True)

# Compare factor scores between groups
factors_all = load_mofa_factors("mofa_multigroup.hdf5")
factors_all["group"] = ["condA"] * n_per_group + ["condB"] * n_per_group
print(f"Multi-group model trained. Factor scores: {factors_all.shape}")
print(factors_all.groupby("group")[["Factor1", "Factor2"]].mean().round(3))
```

### Recipe: Identify and Annotate Factors by Top-Weighted Genes

Retrieve the top positive and negative loading genes per factor and print a summary table for biological annotation.

```python
import pandas as pd

def annotate_factors(model_path, view_name="RNA", top_n=20, n_factors=5):
    """
    Summarize top-loading features per factor to assist biological annotation.

    Returns a DataFrame with factor names, top positive genes, top negative genes,
    and the absolute weight range (an activity proxy).
    """
    weights = load_mofa_weights(model_path, view_name)
    factor_cols = [f"Factor{i+1}" for i in range(min(n_factors, weights.shape[1]))]

    rows = []
    for fc in factor_cols:
        w = weights[fc]
        top_pos = w.nlargest(top_n).index.tolist()
        top_neg = w.nsmallest(top_n).index.tolist()
        rows.append({
            "Factor": fc,
            "Max_weight": round(w.max(), 4),
            "Min_weight": round(w.min(), 4),
            "Top_positive": ", ".join(top_pos[:5]),
            "Top_negative": ", ".join(top_neg[:5]),
        })

    summary = pd.DataFrame(rows)
    return summary

summary_df = annotate_factors("mofa_model.hdf5", view_name="RNA", top_n=20, n_factors=5)
print("\nFactor annotation summary (RNA view):")
print(summary_df.to_string(index=False))
summary_df.to_csv("mofa_factor_annotation.csv", index=False)
print("\nSaved mofa_factor_annotation.csv")
```

## Expected Outputs

| Output | Description |
|--------|-------------|
| `mofa_model.hdf5` | Trained MOFA+ model — factor scores, weights, ELBO trace, variance explained |
| `mofa_variance_explained.png` | Heatmap of R2 (%) per factor per view; primary diagnostic for factor selection |
| `mofa_factor_scatter.png` | Scatter plots of Factor1 vs Factor2/3 colored by metadata (condition, patient) |
| `mofa_rna_weights_heatmap.png` | Heatmap of top RNA feature weights across the first 5 factors |
| `mofa_factor_scores.csv` | Table of per-cell factor scores (cells x factors) with cluster labels |
| `mofa_factor_annotation.csv` | Factor annotation table: top positive/negative genes per factor |
| Per-factor gene lists | Input for gseapy Enrichr or GSEA to identify enriched pathways per factor |

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `KeyError` in `set_data_matrix` | Mismatched number of groups, views, or sample list dimensions | Ensure `data`, `likelihoods`, `views_names`, `groups_names`, `samples_names` all have consistent lengths |
| All factors show near-zero variance explained | Data not preprocessed or scale mismatch across views | Normalize each view before input; set `scale_views=True`; verify non-zero variance in input matrices |
| Model trains but factor scores are NaN | Convergence failure due to extreme values or near-singular data | Check for Inf/NaN in input matrices; reduce `iter`; try `convergence_mode="fast"` first |
| Too many factors pruned (only 1-2 remain) | `dropR2` threshold too aggressive or insufficient variation in data | Set `dropR2=0.001` or `dropR2=None`; increase data diversity or reduce noise |
| `HDF5` file cannot be read | File truncated due to crash during training | Re-run training; check disk space; use `overwrite=True` in `ent.save()` |
| Factor scores identical across all samples | Single-sample group or zero-variance input view | Confirm at least 2 distinct samples per group; check input matrix is not all zeros |
| Very slow training (>1 hr) | Large feature space (>10k features per view) or many factors | Pre-filter to top HVGs (2000-5000) per view; reduce factors to 10-15; enable `verbose=False` |
| ELBO not converging (oscillates) | Learning rate instability or poorly scaled data | Increase `startELBO`; standardize each view independently; use `convergence_mode="slow"` |
| Weights all near zero for one view | Bernoulli likelihood on continuous data or vice versa | Verify `likelihoods` list matches view data types; use `"gaussian"` for normalized RNA |
| `ModuleNotFoundError: mofapy2` | Package not installed | `pip install mofapy2` |

## References

- [MOFA2 GitHub repository](https://github.com/bioFAM/MOFA2) — Source code for mofapy2 (Python) and MOFA2 R package
- [MOFA2 documentation and tutorials](https://biofam.github.io/MOFA2/) — Official documentation with vignettes for Python and R
- [Argelaguet et al. (2020)](https://doi.org/10.1186/s13059-020-02015-1) — "MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data", *Genome Biology*
- [Argelaguet et al. (2018)](https://doi.org/10.15252/msb.20178124) — Original MOFA paper, *Molecular Systems Biology*
- [mofapy2 PyPI package](https://pypi.org/project/mofapy2/) — Installation and version history