---
name: "mdanalysis-trajectory"
description: "Analyze MD trajectories from GROMACS, AMBER, NAMD, CHARMM, LAMMPS. Reads topology/trajectory into Universe objects; supports RMSD, RMSF, radius of gyration, contact maps, H-bonds, PCA, and custom distance/angle calculations. Use for post-simulation structural analysis; use OpenMM/GROMACS for running simulations."
license: "GPL-2.0"
---

# MDAnalysis — Molecular Dynamics Trajectory Analysis

## Overview

MDAnalysis provides a uniform Python interface for reading and analyzing molecular dynamics trajectories regardless of MD engine (GROMACS, AMBER, NAMD, CHARMM, LAMMPS, OpenMM). It represents molecular systems as `Universe` objects containing an `AtomGroup` with positions, velocities, forces, and topology data. Trajectories are iterated frame-by-frame or analyzed in bulk using analysis modules for RMSD, RMSF, radius of gyration, hydrogen bonds, solvent-accessible surface area, and PCA. MDAnalysis integrates with NumPy, pandas, and matplotlib, making it the standard tool for post-simulation structural analysis in computational chemistry and drug discovery.

## When to Use

- Computing RMSD and RMSF of protein backbone or specific residue groups after MD simulation
- Analyzing ligand binding stability: pocket RMSD, contact persistence, hydrogen bond occupancy
- Performing principal component analysis (PCA) on trajectory conformations
- Computing solvent-accessible surface area (SASA), radius of gyration, and end-to-end distance
- Extracting representative cluster structures from long MD trajectories for visualization
- Use **GROMACS** or **AMBER** analysis tools (`gmx rms`, `cpptraj`) instead for engine-specific analysis within a HPC pipeline
- Use **OpenMM** or **GROMACS** directly for running MD simulations; MDAnalysis is for post-simulation analysis

## Prerequisites

- **Python packages**: `MDAnalysis`, `numpy`, `matplotlib`, `pandas`
- **Input**: topology file (.psf, .prmtop, .gro, .pdb) + trajectory file (.dcd, .trr, .xtc, .nc, .dms)

```bash
# Install MDAnalysis
pip install MDAnalysis

# Install with all analysis extras
pip install "MDAnalysis[analysis]"

# Verify
python -c "import MDAnalysis as mda; print(mda.__version__)"
# 2.7.0
```

## Quick Start

```python
import MDAnalysis as mda
import numpy as np

# Load a GROMACS topology + trajectory
u = mda.Universe("protein.gro", "trajectory.xtc")

print(f"Atoms: {u.atoms.n_atoms}")
print(f"Residues: {u.residues.n_residues}")
print(f"Frames: {u.trajectory.n_frames}")
print(f"First frame positions (first 3 atoms):\n{u.atoms.positions[:3]}")
```

## Core API

### Module 1: Universe and AtomGroup — Loading and Selecting Atoms

Load trajectories and select atom subsets.

```python
import MDAnalysis as mda

# Load topology + trajectory (GROMACS xtc format)
u = mda.Universe("system.gro", "md_production.xtc")

# AtomGroup selections (CHARMM-style selection language)
protein = u.select_atoms("protein")
backbone = u.select_atoms("backbone")
ca_atoms = u.select_atoms("name CA")
ligand = u.select_atoms("resname LIG")
binding_site = u.select_atoms("protein and around 5.0 resname LIG")

print(f"Protein atoms: {protein.n_atoms}")
print(f"CA atoms: {ca_atoms.n_atoms}")
print(f"Ligand atoms: {ligand.n_atoms}")
print(f"Binding site residues: {binding_site.residues.n_residues}")

# Access atom properties at current frame
print(f"CA positions shape: {ca_atoms.positions.shape}")  # (N, 3)
print(f"Protein mass: {protein.total_mass():.1f} Da")
```

### Module 2: Trajectory Iteration — Per-Frame Analysis

Iterate over trajectory frames for time-series analysis.

```python
import MDAnalysis as mda
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")
backbone = u.select_atoms("backbone")

times = []
rg_values = []

for ts in u.trajectory:
    times.append(u.trajectory.time)
    rg_values.append(backbone.radius_of_gyration())

import pandas as pd
df = pd.DataFrame({"time_ps": times, "Rg_A": rg_values})
print(f"Frames analyzed: {len(df)}")
print(f"Mean Rg: {df['Rg_A'].mean():.2f} Å")
print(f"Rg std: {df['Rg_A'].std():.2f} Å")
df.to_csv("radius_of_gyration.csv", index=False)
```

### Module 3: RMSD Analysis — Structural Drift Over Time

Compute backbone RMSD relative to a reference structure.

```python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# RMSD of Cα atoms relative to first frame
rmsd = rms.RMSD(u, select="name CA")
rmsd.run()

# Results: frame, time (ps), RMSD (Å)
results = rmsd.results.rmsd
print(f"Mean RMSD: {results[:, 2].mean():.2f} Å")
print(f"Max RMSD: {results[:, 2].max():.2f} Å")

# Plot
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(results[:, 1] / 1000, results[:, 2], color="steelblue", lw=0.8)
ax.set_xlabel("Time (ns)")
ax.set_ylabel("RMSD (Å)")
ax.set_title("Backbone RMSD")
plt.tight_layout()
plt.savefig("rmsd.png", dpi=150)
print("Saved: rmsd.png")
```

### Module 4: RMSF Analysis — Per-Residue Flexibility

Compute root-mean-square fluctuations to identify flexible regions.

```python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# RMSF per Cα atom (after aligning trajectory)
ca_atoms = u.select_atoms("name CA")
rmsf_analysis = rms.RMSF(ca_atoms)
rmsf_analysis.run()

rmsf_values = rmsf_analysis.results.rmsf
resids = ca_atoms.resids

print(f"Most flexible residue: {resids[np.argmax(rmsf_values)]} ({rmsf_values.max():.2f} Å)")
print(f"Most rigid residue:    {resids[np.argmin(rmsf_values)]} ({rmsf_values.min():.2f} Å)")

# Plot B-factor-like profile
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(resids, rmsf_values, color="coral", lw=1)
ax.fill_between(resids, 0, rmsf_values, alpha=0.3, color="coral")
ax.set_xlabel("Residue ID")
ax.set_ylabel("RMSF (Å)")
ax.set_title("Per-residue RMSF")
plt.tight_layout()
plt.savefig("rmsf.png", dpi=150)
```

### Module 5: Hydrogen Bond Analysis

Identify and count hydrogen bonds between protein and ligand over the trajectory.

```python
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis
import pandas as pd

u = mda.Universe("complex.gro", "trajectory.xtc")

# Protein-ligand hydrogen bond analysis
hbonds = HydrogenBondAnalysis(
    universe=u,
    donors_sel="protein",
    acceptors_sel="resname LIG",
    d_h_cutoff=1.2,      # donor-hydrogen distance cutoff (Å)
    d_a_cutoff=3.0,      # donor-acceptor distance cutoff (Å)
    d_h_a_angle_cutoff=150.0,  # angle cutoff (degrees)
)
hbonds.run()

# Get occupancy: fraction of frames with each H-bond
hbonds.generate_table()
df = pd.DataFrame(hbonds.table)
print(f"Total unique H-bonds observed: {len(df['donor_resid'].unique())}")

# Count H-bonds per frame
counts = hbonds.count_by_time()
print(f"Mean H-bonds per frame: {counts[:, 1].mean():.2f}")
print(f"Max H-bonds in one frame: {counts[:, 1].max():.0f}")
```

### Module 6: PCA and Conformational Clustering

Extract principal modes of motion and cluster conformations.

```python
import MDAnalysis as mda
from MDAnalysis.analysis import pca, align
import numpy as np
import matplotlib.pyplot as plt

u = mda.Universe("protein.gro", "trajectory.xtc")

# Align trajectory to first frame
aligner = align.AlignTraj(u, u, select="backbone", in_memory=True)
aligner.run()

# PCA on backbone Cα atoms
ca = u.select_atoms("name CA")
pc = pca.PCA(u, select="backbone")
pc.run()

# Explained variance
cumvar = np.cumsum(pc.results.variance / pc.results.variance.sum())
n_for_90 = np.searchsorted(cumvar, 0.90) + 1
print(f"PCs to explain 90% variance: {n_for_90}")
print(f"PC1 variance: {pc.results.variance[0] / pc.results.variance.sum() * 100:.1f}%")

# Project trajectory onto PC1-PC2 space
transformed = pc.transform(ca, n_components=2)
plt.scatter(transformed[:, 0], transformed[:, 1], c=range(len(transformed)),
            cmap="viridis", s=2)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.colorbar(label="Frame")
plt.title("PCA Conformational Landscape")
plt.savefig("pca.png", dpi=150)
print("Saved: pca.png")
```

## Key Parameters

| Parameter | Module | Default | Effect |
|-----------|--------|---------|--------|
| `select` (Universe) | AtomGroup | — | MDAnalysis selection language string (e.g., `"protein and name CA"`) |
| `in_memory` | align.AlignTraj | `False` | Load full trajectory into RAM for faster analysis |
| `step` | trajectory loop | `1` | Analyze every Nth frame; `step=10` reduces computation 10× |
| `start/stop` | analysis.run() | all | Frame range; `run(start=100, stop=500)` analyzes frames 100-500 |
| `d_a_cutoff` | HydrogenBondAnalysis | `3.0` | Donor-acceptor distance cutoff (Å) |
| `d_h_a_angle_cutoff` | HydrogenBondAnalysis | `150.0` | H-bond angle cutoff (degrees) |
| `n_components` | PCA.transform | all | Number of PCs to project onto |
| `groupselection` | RMSD | `None` | Secondary selection for fitting; primary used for RMSD calculation |
| `weights` | RMSD/align | `None` | `"mass"` for mass-weighted RMSD |
| `verbose` | analysis.run() | `False` | Print progress bar during analysis |

## Common Workflows

### Workflow 1: Complete Protein-Ligand Binding Stability Analysis

```python
import MDAnalysis as mda
from MDAnalysis.analysis import rms, align
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

u = mda.Universe("complex.gro", "md_100ns.xtc")

# Align trajectory to initial frame
align.AlignTraj(u, u, select="protein and backbone", in_memory=True).run()

protein_bb = u.select_atoms("backbone")
ligand = u.select_atoms("resname LIG")

results = {"time_ns": [], "protein_rmsd": [], "ligand_rmsd": [], "pocket_rg": []}
ref_positions = {"protein": protein_bb.positions.copy(), "ligand": ligand.positions.copy()}

for ts in u.trajectory:
    results["time_ns"].append(ts.time / 1000)
    results["protein_rmsd"].append(
        rms.rmsd(protein_bb.positions, ref_positions["protein"], superposition=False))
    results["ligand_rmsd"].append(
        rms.rmsd(ligand.positions, ref_positions["ligand"], superposition=False))
    results["pocket_rg"].append(
        u.select_atoms("protein and around 6.0 resname LIG").radius_of_gyration())

df = pd.DataFrame(results)
df.to_csv("binding_stability.csv", index=False)
print(f"Ligand mean RMSD: {df['ligand_rmsd'].mean():.2f} Å")
print(f"Pocket stable: {'Yes' if df['pocket_rg'].std() < 0.5 else 'No'}")
```

### Workflow 2: Extract Minimum RMSD Representative Structures

```python
import MDAnalysis as mda
from MDAnalysis.analysis import rms
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")

# Compute RMSD for all frames
rmsd_analysis = rms.RMSD(u, select="backbone")
rmsd_analysis.run()
rmsd_values = rmsd_analysis.results.rmsd[:, 2]

# Find frame with lowest RMSD (most representative)
min_frame = np.argmin(rmsd_values)
print(f"Most representative frame: {min_frame} (RMSD: {rmsd_values[min_frame]:.2f} Å)")

# Extract and save that frame as PDB
u.trajectory[min_frame]
with mda.Writer("representative_structure.pdb", u.atoms.n_atoms) as writer:
    writer.write(u.atoms)
print("Saved: representative_structure.pdb")
```

## Common Recipes

### Recipe 1: Compute Contact Map Between Two Protein Domains

```python
import MDAnalysis as mda
from MDAnalysis.analysis import contacts
import numpy as np

u = mda.Universe("protein.gro", "trajectory.xtc")

# Define two domains
domain_A = u.select_atoms("resid 1-100 and name CA")
domain_B = u.select_atoms("resid 200-300 and name CA")

# Count domain-domain contacts (< 8 Å) over time
contact_counts = []
for ts in u.trajectory[::10]:  # every 10th frame
    dist_matrix = np.sqrt(np.sum(
        (domain_A.positions[:, None] - domain_B.positions[None, :]) ** 2, axis=-1
    ))
    contact_counts.append((dist_matrix < 8.0).sum())

print(f"Mean contacts: {np.mean(contact_counts):.1f}")
print(f"Contact count range: {min(contact_counts)} – {max(contact_counts)}")
```

### Recipe 2: Write Trajectory Subset to New File

```python
import MDAnalysis as mda

u = mda.Universe("system.gro", "long_trajectory.xtc")

# Write only protein atoms, every 10th frame, frames 500-2000
protein = u.select_atoms("protein")
with mda.Writer("protein_subset.xtc", protein.n_atoms) as writer:
    for ts in u.trajectory[500:2000:10]:
        writer.write(protein)

print(f"Subset trajectory written: {(2000-500)//10} frames")
```

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `ValueError: Universe has no file` | Missing trajectory argument | Provide both topology AND trajectory: `mda.Universe(top, traj)` |
| RMSD drift despite alignment | Wrong selection for fitting | Use `"backbone"` for fitting; verify topology matches trajectory |
| `KeyError: 'resname LIG'` | Non-standard residue name | Check `u.residues.resnames`; use exact name from topology |
| Very slow frame iteration | Large trajectory in memory | Use `step=10` to skip frames; convert xtc to smaller DCD |
| Hydrogen bond count is 0 | No explicit hydrogens in topology | Use topology with explicit H atoms; add hydrogens with `psfgen` or `tleap` |
| `NoDataError: xtc has no charges` | Property not in trajectory | Use topology file that contains charges (.psf, .prmtop, .gro with itp) |
| Memory error with `in_memory=True` | Trajectory too large for RAM | Remove `in_memory=True`; increase RAM; or process in chunks |
| Import error on Apple Silicon | Binary not compiled for arm64 | Install via conda-forge: `conda install -c conda-forge MDAnalysis` |

## References

- [MDAnalysis documentation](https://docs.mdanalysis.org/) — official API reference and user guide
- [MDAnalysis GitHub: MDAnalysis/mdanalysis](https://github.com/MDAnalysis/mdanalysis) — source code, tutorials, and issue tracker
- Michaud-Agrawal N et al. (2011) "MDAnalysis: A toolkit for the analysis of molecular dynamics simulations" — *J Comput Chem* 32:2319-2327. [DOI:10.1002/jcc.21787](https://doi.org/10.1002/jcc.21787)
- Gowers RJ et al. (2016) "MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations" — *Proc SciPy 2016*. [DOI:10.25080/majora-629e541a-00e](https://doi.org/10.25080/majora-629e541a-00e)