--- name: "seaborn-statistical-plots" description: "Statistical visualization on matplotlib with native pandas support. Auto aggregation, CIs, grouping for distributions (histplot, kdeplot), categorical (boxplot, violinplot), relational (scatterplot, lineplot), regression (regplot, lmplot), matrix (heatmap, clustermap), grids (pairplot, FacetGrid). Use for quick statistical summaries; matplotlib for fine control; plotly for interactive HTML." license: BSD-3-Clause --- # Seaborn — Statistical Plots ## Overview Seaborn is a Python library for statistical data visualization built on top of matplotlib. It works directly with pandas DataFrames, automatically handles grouping by categorical variables, computes confidence intervals and kernel density estimates, and produces attractive publication-ready figures with minimal configuration. Seaborn separates axes-level functions (embeddable in custom layouts) from figure-level functions (with built-in faceting), enabling both quick exploratory analysis and structured multi-panel figures. ## When to Use - Comparing gene expression, protein abundance, or measurement distributions across experimental conditions (treatment vs. control, cell lines, time points) - Generating grouped box plots, violin plots, or strip plots to show both summary statistics and individual data points simultaneously - Visualizing pairwise correlations in multi-gene or multi-feature datasets as annotated heatmaps - Plotting regression fits with confidence bands between continuous variables (e.g., cell viability vs. drug concentration) - Faceting a single plot type across multiple sample subsets, tissue types, or experimental batches in one call - Rapid exploratory analysis of a new dataset using `pairplot` to survey all pairwise relationships at once - Use `matplotlib` directly when you need pixel-level control over figure elements, complex mixed-type layouts, or non-statistical custom plots - Use `plotly` when the output must be interactive (hover tooltips, zoom, pan) or embedded in a web application ## Prerequisites - **Python packages**: `seaborn>=0.13`, `matplotlib`, `pandas`, `numpy` - **Data requirements**: Pandas DataFrame in long-form (tidy) format; each observation is a row, each variable is a column - **Environment**: Standard Python environment; no GPU or special hardware required ```bash pip install "seaborn>=0.13" matplotlib pandas numpy scipy ``` ## Quick Start ```python import seaborn as sns import matplotlib.pyplot as plt import pandas as pd import numpy as np # Simulate gene expression across conditions rng = np.random.default_rng(42) df = pd.DataFrame({ "gene": ["BRCA1"] * 60 + ["TP53"] * 60, "condition": ["control", "treated"] * 60, "log2_expr": np.concatenate([ rng.normal(5.2, 0.8, 60), rng.normal(6.1, 0.9, 60), ]) }) sns.set_theme(style="ticks", context="notebook") sns.boxplot(data=df, x="gene", y="log2_expr", hue="condition", palette="Set2") plt.ylabel("log2 Expression") plt.title("Gene Expression by Condition") plt.tight_layout() plt.savefig("quickstart_boxplot.png", dpi=150) print("Saved quickstart_boxplot.png") ``` ## Core API ### 1. Distribution Plots Visualize univariate distributions and compare them across groups. `histplot` bins data; `kdeplot` fits a smooth density estimate; `displot` is the figure-level wrapper that adds faceting. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(0) n = 200 df = pd.DataFrame({ "log2_tpm": np.concatenate([rng.normal(4.5, 1.1, n), rng.normal(6.0, 1.3, n)]), "sample": ["tumor"] * n + ["normal"] * n, }) fig, axes = plt.subplots(1, 3, figsize=(15, 4)) # Histogram with density normalization and stacked hue groups sns.histplot(data=df, x="log2_tpm", hue="sample", stat="density", multiple="stack", bins=30, ax=axes[0]) axes[0].set_title("Histogram (stacked)") # KDE with fill — bandwidth controlled by bw_adjust sns.kdeplot(data=df, x="log2_tpm", hue="sample", fill=True, bw_adjust=0.8, alpha=0.4, ax=axes[1]) axes[1].set_title("KDE (filled)") # ECDF — useful for comparing cumulative distributions sns.ecdfplot(data=df, x="log2_tpm", hue="sample", ax=axes[2]) axes[2].set_title("ECDF") plt.tight_layout() plt.savefig("distributions.png", dpi=150) print("Saved distributions.png") ``` ```python # Bivariate KDE: joint distribution of two continuous variables rng = np.random.default_rng(1) df2 = pd.DataFrame({ "log2_rna": rng.normal(5.5, 1.2, 300), "log2_prot": rng.normal(4.8, 1.0, 300) + 0.6 * rng.normal(5.5, 1.2, 300), }) sns.kdeplot(data=df2, x="log2_rna", y="log2_prot", fill=True, levels=8, thresh=0.05, cmap="Blues") plt.xlabel("log2 RNA (TPM)") plt.ylabel("log2 Protein (iBAQ)") plt.title("RNA–Protein Correlation Density") plt.tight_layout() plt.savefig("bivariate_kde.png", dpi=150) print("Saved bivariate_kde.png") ``` ### 2. Categorical Plots Compare distributions or aggregated statistics across categorical groups. Axes-level functions (`boxplot`, `violinplot`, `stripplot`, `swarmplot`, `barplot`) accept an `ax=` parameter for embedding in custom layouts. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(2) conditions = ["DMSO", "Drug A 1uM", "Drug A 10uM", "Drug B 1uM", "Drug B 10uM"] df = pd.DataFrame({ "condition": np.repeat(conditions, 30), "viability": np.concatenate([ rng.normal(100, 5, 30), rng.normal(92, 7, 30), rng.normal(65, 10, 30), rng.normal(88, 8, 30), rng.normal(45, 12, 30), ]) }) fig, axes = plt.subplots(1, 3, figsize=(18, 5)) # Box plot — shows quartiles and outliers sns.boxplot(data=df, x="condition", y="viability", palette="husl", width=0.5, ax=axes[0]) axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=30, ha="right") axes[0].set_title("Box Plot") # Violin — KDE shape + inner quartile lines sns.violinplot(data=df, x="condition", y="viability", inner="quart", palette="muted", ax=axes[1]) axes[1].set_xticklabels(axes[1].get_xticklabels(), rotation=30, ha="right") axes[1].set_title("Violin Plot") # Strip plot overlaid on box — shows all individual points sns.boxplot(data=df, x="condition", y="viability", palette="pastel", width=0.5, ax=axes[2]) sns.stripplot(data=df, x="condition", y="viability", color="black", alpha=0.4, size=3, jitter=True, ax=axes[2]) axes[2].set_xticklabels(axes[2].get_xticklabels(), rotation=30, ha="right") axes[2].set_title("Box + Strip") plt.tight_layout() plt.savefig("categorical.png", dpi=150) print("Saved categorical.png") ``` ```python # Bar plot with mean ± 95% CI and individual points (swarm) fig, ax = plt.subplots(figsize=(8, 5)) sns.barplot(data=df, x="condition", y="viability", estimator="mean", errorbar="ci", palette="Set3", ax=ax) sns.swarmplot(data=df, x="condition", y="viability", color="black", size=3, alpha=0.5, ax=ax) ax.set_ylabel("Cell Viability (%)") ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="right") plt.tight_layout() plt.savefig("barswarm.png", dpi=150) print("Saved barswarm.png") ``` ### 3. Relational Plots Visualize relationships between continuous variables. `scatterplot` and `lineplot` are axes-level; `relplot` is the figure-level wrapper that supports `col` and `row` faceting. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(3) n = 150 df = pd.DataFrame({ "molecular_weight": rng.uniform(200, 800, n), "logP": rng.uniform(-2, 6, n), "pIC50": rng.normal(6.5, 1.2, n), "target_class": rng.choice(["kinase", "GPCR", "protease"], n), "pass_lipinski": rng.choice(["yes", "no"], n, p=[0.7, 0.3]), }) # Scatter with hue (categorical color) + size (continuous) + style (marker) sns.scatterplot(data=df, x="molecular_weight", y="pIC50", hue="target_class", size="logP", style="pass_lipinski", sizes=(30, 120), alpha=0.7) plt.xlabel("Molecular Weight (Da)") plt.ylabel("pIC50") plt.title("Compound Bioactivity by Target Class") plt.tight_layout() plt.savefig("relational_scatter.png", dpi=150) print("Saved relational_scatter.png") ``` ```python # Line plot with automatic mean aggregation and SD error band across replicates timepoints = [0, 1, 2, 4, 8, 24] groups = ["untreated", "low_dose", "high_dose"] rows = [] for grp, base in zip(groups, [100.0, 95.0, 80.0]): for tp in timepoints: for _ in range(5): # 5 replicates rows.append({"timepoint_h": tp, "group": grp, "confluency": base * np.exp(-0.02 * tp * (1 + rng.normal(0, 0.1)))}) time_df = pd.DataFrame(rows) sns.lineplot(data=time_df, x="timepoint_h", y="confluency", hue="group", style="group", errorbar="sd", markers=True, dashes=False) plt.xlabel("Time (h)") plt.ylabel("Confluency (%)") plt.title("Cell Growth Inhibition (mean ± SD, n=5)") plt.tight_layout() plt.savefig("lineplot.png", dpi=150) print("Saved lineplot.png") ``` ### 4. Regression Plots Fit linear (or polynomial/lowess) models and visualize them with confidence bands. `regplot` is axes-level; `lmplot` is figure-level with faceting support. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(4) n = 120 tumor_size = rng.uniform(0.5, 6.0, n) survival_months = 40 - 5 * tumor_size + rng.normal(0, 4, n) grade = rng.choice(["low", "high"], n, p=[0.5, 0.5]) df = pd.DataFrame({"tumor_size_cm": tumor_size, "survival_months": survival_months, "grade": grade}) fig, axes = plt.subplots(1, 2, figsize=(13, 5)) # Linear regression with 95% CI band sns.regplot(data=df, x="tumor_size_cm", y="survival_months", ci=95, scatter_kws={"alpha": 0.4, "s": 25}, ax=axes[0]) axes[0].set_title("Linear Regression (95% CI)") # Residuals plot — check for homoscedasticity sns.residplot(data=df, x="tumor_size_cm", y="survival_months", scatter_kws={"alpha": 0.4, "s": 25}, ax=axes[1]) axes[1].axhline(0, color="red", linestyle="--", linewidth=1) axes[1].set_title("Residuals vs Fitted") plt.tight_layout() plt.savefig("regression.png", dpi=150) print("Saved regression.png") ``` ```python # lmplot — figure-level: separate regression lines per grade (hue) + facets g = sns.lmplot(data=df, x="tumor_size_cm", y="survival_months", hue="grade", col="grade", ci=95, scatter_kws={"alpha": 0.4}, height=4, aspect=1.1) g.set_axis_labels("Tumor Size (cm)", "Survival (months)") g.set_titles("{col_name} grade") g.savefig("lmplot_faceted.png", dpi=150) print("Saved lmplot_faceted.png") ``` ### 5. Matrix Plots Visualize rectangular data as color-encoded matrices. `heatmap` is axes-level; `clustermap` is figure-level and applies hierarchical clustering to rows and columns. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(5) genes = [f"GENE{i}" for i in range(1, 9)] samples = [f"S{i}" for i in range(1, 7)] # Simulate log2 fold-change matrix (rows=genes, cols=samples) lfc = pd.DataFrame( rng.normal(0, 1.5, size=(8, 6)), index=genes, columns=samples ) # Inject a pattern: first 3 genes up in samples 1-3, down in 4-6 lfc.iloc[:3, :3] += 2.5 lfc.iloc[:3, 3:] -= 2.5 # Correlation heatmap of numeric features df_num = pd.DataFrame( rng.standard_normal((80, 5)), columns=["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"] ) df_num["GeneB"] = df_num["GeneA"] * 0.85 + rng.normal(0, 0.3, 80) corr = df_num.corr() fig, axes = plt.subplots(1, 2, figsize=(14, 5)) sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0, square=True, linewidths=0.5, ax=axes[0]) axes[0].set_title("Pearson Correlation Heatmap") sns.heatmap(lfc, cmap="RdBu_r", center=0, annot=True, fmt=".1f", linewidths=0.3, cbar_kws={"label": "log2FC"}, ax=axes[1]) axes[1].set_title("log2 Fold Change Matrix") plt.tight_layout() plt.savefig("heatmaps.png", dpi=150) print("Saved heatmaps.png") ``` ```python # Clustermap with hierarchical clustering and row/column color annotations rng = np.random.default_rng(6) n_genes, n_samples = 30, 16 expr = pd.DataFrame( rng.lognormal(mean=2.0, sigma=1.2, size=(n_genes, n_samples)), index=[f"GENE{i:03d}" for i in range(n_genes)], columns=[f"{'T' if i < 8 else 'N'}{i:02d}" for i in range(n_samples)] ) # Column annotation colors (tumor vs normal) col_colors = ["#D32F2F" if c.startswith("T") else "#1976D2" for c in expr.columns] g = sns.clustermap( np.log2(expr + 1), cmap="viridis", standard_scale=0, # z-score across rows (genes) method="ward", metric="euclidean", col_colors=col_colors, figsize=(12, 10), linewidths=0, cbar_pos=(0.02, 0.8, 0.03, 0.15), cbar_kws={"label": "Row z-score"}, ) g.ax_heatmap.set_xlabel("Sample") g.ax_heatmap.set_ylabel("Gene") plt.savefig("clustermap.png", dpi=150, bbox_inches="tight") print("Saved clustermap.png") ``` ### 6. Multi-Variable Grids Survey all pairwise relationships with `pairplot` or display a bivariate distribution with marginals using `jointplot`. For fully custom grid layouts, use `FacetGrid` directly. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(7) n = 60 df = pd.DataFrame({ "cell_area": rng.normal(350, 60, n * 3), "nucleus_area": rng.normal(90, 15, n * 3), "mean_intensity": rng.exponential(500, n * 3), "aspect_ratio": np.abs(rng.normal(1.3, 0.3, n * 3)), "cell_type": (["HeLa"] * n + ["MCF7"] * n + ["A549"] * n), }) # Pairplot — matrix of pairwise scatter + KDE on diagonal g = sns.pairplot(df, hue="cell_type", corner=True, diag_kind="kde", plot_kws={"alpha": 0.5, "s": 20}) g.savefig("pairplot.png", dpi=150) print("Saved pairplot.png") ``` ```python # Jointplot — bivariate KDE with marginal histograms g = sns.jointplot(data=df, x="cell_area", y="nucleus_area", hue="cell_type", kind="scatter", marginal_kws={"fill": True, "alpha": 0.3}) g.set_axis_labels("Cell Area (µm²)", "Nucleus Area (µm²)") g.savefig("jointplot.png", dpi=150) print("Saved jointplot.png") ``` ```python # FacetGrid — custom layout: KDE of mean_intensity per cell type g = sns.FacetGrid(df, col="cell_type", height=3.5, aspect=1.1, sharey=False) g.map(sns.histplot, "mean_intensity", bins=20, kde=True, color="steelblue") g.set_axis_labels("Mean Intensity (AU)", "Count") g.set_titles("{col_name}") g.tight_layout() g.savefig("facetgrid_intensity.png", dpi=150) print("Saved facetgrid_intensity.png") ``` ## Key Concepts ### Figure-Level vs Axes-Level Functions Seaborn has two tiers of functions with different return types and composability: | Feature | Axes-Level | Figure-Level | |---------|-----------|--------------| | **Examples** | `scatterplot`, `histplot`, `boxplot`, `heatmap`, `regplot` | `relplot`, `displot`, `catplot`, `lmplot` | | **Returns** | `matplotlib.axes.Axes` | `FacetGrid` / `JointGrid` / `PairGrid` | | **Faceting** | Manual (create subplots yourself) | Built-in (`col=`, `row=` params) | | **Sizing** | `figsize=` on parent figure | `height=` + `aspect=` per facet panel | | **Placement** | `ax=` parameter | Cannot be placed in an existing axes | | **Saving** | `plt.savefig(...)` | `g.savefig(...)` | | **Use when** | Combining different plot types in one figure | Quick multi-panel exploratory views | ```python # Axes-level: place in a pre-allocated subplot grid fig, axes = plt.subplots(1, 2, figsize=(12, 5)) sns.violinplot(data=df, x="cell_type", y="cell_area", ax=axes[0]) sns.scatterplot(data=df, x="cell_area", y="nucleus_area", hue="cell_type", ax=axes[1]) ``` ### Long-Form vs Wide-Form Data Seaborn semantic mappings (`hue`, `size`, `style`) require **long-form (tidy) data** where each variable is a column and each observation is a row. Some functions (`heatmap`, `clustermap`, `lineplot`) also accept wide-form. ```python # Wide-form: unsuitable for hue/style mappings # sample_A sample_B sample_C # 0 5.1 6.2 4.8 # Long-form (preferred): melt wide → long wide = pd.DataFrame({"sampleA": [5.1, 4.3], "sampleB": [6.2, 5.9]}) long = wide.melt(var_name="sample", value_name="log2_expr") # → columns: sample, log2_expr ``` ## Common Workflows ### Workflow 1: Differential Expression Scatter with Significance Thresholds **Goal**: Visualize log2 fold-change vs -log10 p-value (volcano-style) with significance annotations, colored by regulation status, and labeled top hits. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(42) n = 500 lfc = rng.normal(0, 1.5, n) pvals = 10 ** (-rng.exponential(1.5, n)) # skewed toward low significance pvals = np.clip(pvals, 1e-20, 1.0) genes = [f"GENE{i:04d}" for i in range(n)] df_de = pd.DataFrame({"gene": genes, "log2fc": lfc, "pvalue": pvals}) df_de["neg_log10_p"] = -np.log10(df_de["pvalue"]) # Classify regulation status lfc_thresh = 1.0 padj_thresh = 0.05 df_de["sig"] = "NS" df_de.loc[(df_de["log2fc"] > lfc_thresh) & (df_de["pvalue"] < padj_thresh), "sig"] = "Up" df_de.loc[(df_de["log2fc"] < -lfc_thresh) & (df_de["pvalue"] < padj_thresh), "sig"] = "Down" palette = {"NS": "#AAAAAA", "Up": "#D32F2F", "Down": "#1976D2"} sns.set_theme(style="ticks", context="paper", font_scale=1.1) fig, ax = plt.subplots(figsize=(8, 6)) sns.scatterplot(data=df_de, x="log2fc", y="neg_log10_p", hue="sig", palette=palette, alpha=0.6, s=18, linewidth=0, ax=ax) # Threshold lines ax.axhline(-np.log10(padj_thresh), color="black", linestyle="--", linewidth=0.8) ax.axvline( lfc_thresh, color="black", linestyle="--", linewidth=0.8) ax.axvline(-lfc_thresh, color="black", linestyle="--", linewidth=0.8) # Label top 5 most significant genes per direction for direction in ["Up", "Down"]: top = df_de[df_de["sig"] == direction].nlargest(5, "neg_log10_p") for _, row in top.iterrows(): ax.text(row["log2fc"], row["neg_log10_p"] + 0.3, row["gene"], fontsize=6, ha="center", va="bottom", color=palette[direction]) # Annotation counts n_up = (df_de["sig"] == "Up").sum() n_down = (df_de["sig"] == "Down").sum() ax.set_title(f"Volcano Plot | Up: {n_up} Down: {n_down}") ax.set_xlabel("log2 Fold Change") ax.set_ylabel("-log10 p-value") sns.despine(trim=True) plt.tight_layout() plt.savefig("volcano_plot.png", dpi=300, bbox_inches="tight") print(f"Volcano: {n_up} up, {n_down} down — saved volcano_plot.png") ``` ### Workflow 2: Multi-Condition Comparison with Grouped Violin + Strip Plots **Goal**: Compare gene expression (or any continuous measurement) across multiple treatments and time points, showing full distributions plus individual replicates. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(99) genes = ["BRCA1", "TP53", "EGFR"] treats = ["DMSO", "Drug A", "Drug B"] timepoints = ["6h", "24h", "48h"] rows = [] for gene in genes: base_expr = {"BRCA1": 7.5, "TP53": 6.2, "EGFR": 8.1}[gene] for treat in treats: treat_shift = {"DMSO": 0.0, "Drug A": -0.8, "Drug B": 0.6}[treat] for tp in timepoints: tp_shift = {"6h": 0.0, "24h": 0.3, "48h": 0.6}[tp] for _ in range(12): rows.append({ "gene": gene, "treatment": treat, "timepoint": tp, "log2_expr": base_expr + treat_shift + tp_shift + rng.normal(0, 0.5), }) df_mc = pd.DataFrame(rows) sns.set_theme(style="whitegrid", context="paper", font_scale=1.0) g = sns.catplot( data=df_mc, x="timepoint", y="log2_expr", hue="treatment", col="gene", kind="violin", inner="quart", dodge=True, palette="Set2", height=4, aspect=0.9, col_order=genes, order=timepoints, ) # Overlay individual points for ax in g.axes.flat: gene_label = ax.get_title() gene_name = gene_label.split(" = ")[-1] if " = " in gene_label else gene_label subset = df_mc[df_mc["gene"] == gene_name] sns.stripplot( data=subset, x="timepoint", y="log2_expr", hue="treatment", dodge=True, jitter=True, size=2.5, alpha=0.4, palette="dark:black", order=timepoints, legend=False, ax=ax, ) g.set_axis_labels("Timepoint", "log2 Expression") g.set_titles("{col_name}") g.add_legend(title="Treatment") sns.despine(trim=True) g.tight_layout() g.savefig("multigroup_violin.png", dpi=300, bbox_inches="tight") print("Saved multigroup_violin.png") ``` ### Workflow 3: Pairwise Feature Exploration for Cell Morphology **Goal**: Quickly survey pairwise relationships in a multi-feature cell morphology dataset using `pairplot`, then examine one key pair with a `jointplot`. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd rng = np.random.default_rng(12) n_per_type = 80 df_morph = pd.DataFrame({ "cell_area_um2": np.concatenate([rng.normal(320, 50, n_per_type), rng.normal(420, 70, n_per_type), rng.normal(280, 40, n_per_type)]), "nucleus_area_um2": np.concatenate([rng.normal(85, 12, n_per_type), rng.normal(110, 18, n_per_type), rng.normal(75, 10, n_per_type)]), "eccentricity": np.abs(np.concatenate([rng.normal(0.6, 0.12, n_per_type), rng.normal(0.8, 0.10, n_per_type), rng.normal(0.5, 0.09, n_per_type)])), "mean_dapi": np.concatenate([rng.exponential(400, n_per_type), rng.exponential(600, n_per_type), rng.exponential(350, n_per_type)]), "cell_line": ["HeLa"] * n_per_type + ["MCF7"] * n_per_type + ["U2OS"] * n_per_type, }) # 1. Pairplot survey g = sns.pairplot(df_morph, hue="cell_line", corner=True, diag_kind="kde", plot_kws={"alpha": 0.5, "s": 15}, palette="Dark2") g.savefig("morphology_pairplot.png", dpi=150) print("Saved morphology_pairplot.png") # 2. Focused jointplot for the most informative pair g2 = sns.jointplot(data=df_morph, x="cell_area_um2", y="nucleus_area_um2", hue="cell_line", kind="scatter", marginal_kws={"fill": True, "alpha": 0.25}, palette="Dark2", alpha=0.6) g2.set_axis_labels("Cell Area (µm²)", "Nucleus Area (µm²)") g2.savefig("morphology_jointplot.png", dpi=150) print("Saved morphology_jointplot.png") ``` ## Key Parameters | Parameter | Function(s) | Default | Range / Options | Effect | |-----------|-------------|---------|-----------------|--------| | `hue` | All plot functions | `None` | Column name (categorical or continuous) | Color-encodes a variable; triggers automatic legend | | `style` | `scatterplot`, `lineplot` | `None` | Categorical column name | Encodes variable with marker shape or line dash pattern | | `size` | `scatterplot`, `lineplot` | `None` | Categorical or continuous column | Encodes variable via point or line size | | `col` / `row` | Figure-level only (`relplot`, `displot`, `catplot`, `lmplot`) | `None` | Categorical column name | Creates one subplot panel per unique value | | `col_wrap` | Figure-level only | `None` | int | Wraps columns onto a new row after N panels | | `estimator` | `barplot`, `pointplot` | `"mean"` | `"mean"`, `"median"`, any callable | Aggregation function applied within each category | | `errorbar` | `barplot`, `lineplot`, `pointplot` | `("ci", 95)` | `"ci"`, `"sd"`, `"se"`, `"pi"`, `None` | Error bar type displayed around the estimate | | `stat` | `histplot` | `"count"` | `"count"`, `"frequency"`, `"density"`, `"probability"` | Normalization applied to histogram bar heights | | `bw_adjust` | `kdeplot`, `violinplot` | `1.0` | `0.1`–`3.0` | KDE bandwidth multiplier; lower=spikier, higher=smoother | | `multiple` | `histplot`, `kdeplot` | `"layer"` | `"layer"`, `"stack"`, `"dodge"`, `"fill"` | How overlapping hue groups are drawn | | `inner` | `violinplot` | `"box"` | `"box"`, `"quart"`, `"point"`, `"stick"`, `None` | Interior annotation inside the violin body | | `standard_scale` | `clustermap` | `None` | `0` (rows), `1` (columns) | Z-score normalization axis before clustering | | `dodge` | `boxplot`, `violinplot`, `stripplot` | Varies | `True`, `False` | Separate hue-grouped elements along the axis | | `context` | `set_theme()` | `"notebook"` | `"paper"`, `"notebook"`, `"talk"`, `"poster"` | Scales font and line widths for output medium | ## Best Practices 1. **Prefer long-form DataFrames with named columns**: Seaborn's semantic mapping (`hue`, `style`, `size`) reads variable names directly from column names. Passing raw arrays loses axis labels and legends. Use `pd.melt()` to convert wide-form data. 2. **Call `set_theme()` once at the top of a script**: This sets the global style, context, and palette for all subsequent plots, ensuring consistency. Reset to defaults with `sns.set_theme()`. ```python sns.set_theme(style="ticks", context="paper", font_scale=1.1, rc={"axes.spines.right": False, "axes.spines.top": False}) ``` 3. **Use axes-level functions for mixed-type custom layouts**: Figure-level functions (`relplot`, `catplot`) create their own figure and cannot be placed in an existing `Axes`. When combining different plot types (e.g., scatter + violin + heatmap), allocate a `plt.subplots()` grid and use axes-level functions with `ax=`. 4. **Use colorblind-safe palettes**: `sns.set_palette("colorblind")` or `palette="colorblind"` produces a palette distinguishable by readers with common color vision deficiencies. For diverging data, use `"RdBu_r"` or `"coolwarm"` with `center=0`. 5. **Overlay individual data points on summary plots**: Violin and bar plots hide distribution shape and sample size. Overlaying a `stripplot` or `swarmplot` with `alpha=0.4` and small `size` conveys data density without obscuring the summary statistic. 6. **Size figure-level plots with `height` and `aspect`, not `figsize`**: Figure-level functions ignore `figsize`. Use `height=` (inches per panel) and `aspect=` (width-to-height ratio per panel). For axes-level, set `figsize` on the `plt.subplots()` call. 7. **Anti-pattern — calling `plt.savefig()` on a figure-level grid**: Figure-level functions return a `FacetGrid`/`JointGrid` object. Save it with `g.savefig("out.png", dpi=300, bbox_inches="tight")`, not `plt.savefig()`, which may capture a blank figure. ## Common Recipes ### Recipe: Publication-Ready Figure with Custom Palette and 300 DPI Export When to use: Preparing a multi-panel figure for journal submission or a slide deck. ```python import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd sns.set_theme(style="ticks", context="paper", font_scale=1.2, rc={"pdf.fonttype": 42, "ps.fonttype": 42}) rng = np.random.default_rng(7) df = pd.DataFrame({ "condition": np.repeat(["Control", "Treated"], 40), "ki67_pct": np.concatenate([rng.normal(18, 4, 40), rng.normal(32, 6, 40)]), "apoptosis": np.concatenate([rng.normal(5, 1.5, 40), rng.normal(12, 2.5, 40)]), }) custom_palette = {"Control": "#4575B4", "Treated": "#D73027"} fig, axes = plt.subplots(1, 2, figsize=(8, 4)) # Panel A sns.boxplot(data=df, x="condition", y="ki67_pct", palette=custom_palette, width=0.45, linewidth=1.2, ax=axes[0]) sns.stripplot(data=df, x="condition", y="ki67_pct", color="black", alpha=0.35, size=3, jitter=True, ax=axes[0]) axes[0].set_ylabel("Ki67 Positive Cells (%)") axes[0].set_xlabel("") axes[0].set_title("A", loc="left", fontweight="bold") # Panel B sns.boxplot(data=df, x="condition", y="apoptosis", palette=custom_palette, width=0.45, linewidth=1.2, ax=axes[1]) sns.stripplot(data=df, x="condition", y="apoptosis", color="black", alpha=0.35, size=3, jitter=True, ax=axes[1]) axes[1].set_ylabel("Apoptotic Cells (%)") axes[1].set_xlabel("") axes[1].set_title("B", loc="left", fontweight="bold") sns.despine(trim=True) plt.tight_layout() plt.savefig("figure1.pdf", dpi=300, bbox_inches="tight") plt.savefig("figure1.png", dpi=300, bbox_inches="tight") print("Saved figure1.pdf and figure1.png at 300 DPI") ``` ### Recipe: Clustered Heatmap with Row and Column Color Annotations When to use: Displaying a gene expression matrix with sample group annotations and hierarchical clustering to reveal co-expression modules. ```python import seaborn as sns import matplotlib.pyplot as plt import matplotlib.patches as mpatches import numpy as np import pandas as pd rng = np.random.default_rng(21) n_genes, n_samples = 40, 20 conditions = ["tumor"] * 10 + ["normal"] * 10 # Simulate expression: 3 co-expression modules expr = pd.DataFrame( rng.lognormal(2.5, 0.8, (n_genes, n_samples)), index=[f"GENE{i:03d}" for i in range(n_genes)], columns=[f"{c[0].upper()}{i:02d}" for i, c in enumerate(conditions)], ) # Module 1: genes 0-13 up in tumor expr.iloc[:14, :10] *= 3.0 # Module 2: genes 14-27 down in tumor expr.iloc[14:28, :10] *= 0.3 # Module 3: genes 28-39 unchanged log_expr = np.log2(expr + 1) # Column colors: tumor=red, normal=blue cond_pal = {"tumor": "#C62828", "normal": "#1565C0"} col_colors = [cond_pal[c] for c in conditions] # Row colors: module membership module_pal = {"up": "#EF9A9A", "down": "#90CAF9", "stable": "#C8E6C9"} row_modules = (["up"] * 14) + (["down"] * 14) + (["stable"] * 12) row_colors = [module_pal[m] for m in row_modules] g = sns.clustermap( log_expr, cmap="RdYlBu_r", center=log_expr.values.mean(), standard_scale=0, # z-score per gene (row) method="ward", metric="euclidean", col_colors=col_colors, row_colors=row_colors, figsize=(14, 12), linewidths=0, cbar_pos=(0.02, 0.85, 0.03, 0.12), cbar_kws={"label": "Row z-score"}, dendrogram_ratio=(0.12, 0.08), ) g.ax_heatmap.set_xlabel("Sample", fontsize=10) g.ax_heatmap.set_ylabel("Gene", fontsize=10) g.ax_heatmap.set_title("Gene Expression Clustermap", fontsize=12, pad=80) # Manual legend for column/row annotations legend_handles = [ mpatches.Patch(color="#C62828", label="Tumor"), mpatches.Patch(color="#1565C0", label="Normal"), mpatches.Patch(color="#EF9A9A", label="Up in tumor"), mpatches.Patch(color="#90CAF9", label="Down in tumor"), mpatches.Patch(color="#C8E6C9", label="Stable"), ] g.ax_heatmap.legend(handles=legend_handles, bbox_to_anchor=(1.25, 1.05), loc="upper left", frameon=False, fontsize=9) plt.savefig("clustermap_annotated.png", dpi=300, bbox_inches="tight") print("Saved clustermap_annotated.png") ``` ## Troubleshooting | Problem | Cause | Solution | |---------|-------|----------| | Legend placed outside plot, clipped in saved file | Figure-level functions place the legend outside by default | Add `bbox_inches="tight"` to `savefig()`: `g.savefig("out.png", dpi=300, bbox_inches="tight")` | | `TypeError: FacetGrid.savefig()` or blank figure saved | Called `plt.savefig()` on a figure-level grid that owns its own figure | Use `g.savefig(...)` instead of `plt.savefig(...)` | | Overlapping x-axis category labels | Long label strings overlap at default rotation | Add `plt.xticks(rotation=45, ha="right")` and `plt.tight_layout()` after the plot call | | `ValueError: Could not interpret value ... for parameter 'hue'` | Data is in wide-form; hue mapping requires long-form | Convert with `df.melt(id_vars=[...], var_name="sample", value_name="expr")` | | KDE bandwidth too smooth (loses bimodality) | Default `bw_adjust=1.0` over-smooths small datasets | Lower to `bw_adjust=0.5`; confirm peaks with `histplot` | | `clustermap` ignores `figsize` | Figure-level functions do not accept `figsize` as a kwarg in older seaborn | Pass `figsize` as a direct argument: `sns.clustermap(..., figsize=(12, 10))` | | Violin plot is a thin line (no shape) | Too few observations for KDE estimation | Switch to `kind="box"` or `kind="strip"`; or use `cut=0` to restrict KDE to data range | | Colors not distinguishable for many groups | Default palette repeats with >6 categories | Use `sns.color_palette("husl", n_colors=N)` or `"tab20"` for up to 20 distinct colors | | Figure-level function ignores `ax=` parameter | Axes-level distinction: figure-level functions create their own figure | Use the corresponding axes-level function (`scatterplot`, `histplot`, etc.) with `ax=` | ## Related Skills - **matplotlib-scientific-plotting** — low-level figure building, custom annotations, non-statistical plot types, and multi-panel layouts that mix seaborn with raw matplotlib - **plotly-interactive-visualization** — interactive charts with hover, zoom, and HTML/Dash export - **pydeseq2-differential-expression** — produces the log2FC and p-values that feed into volcano-style scatter plots - **scikit-image-processing** — generates cell morphology measurements visualized with seaborn categorical/distribution plots - **scientific-visualization** — decision guide for selecting the right chart type and color scheme before coding ## References - [Seaborn official documentation](https://seaborn.pydata.org/) — API reference, tutorial, and gallery - [Seaborn example gallery](https://seaborn.pydata.org/examples/index.html) — visual index of all plot types - [Seaborn GitHub](https://github.com/mwaskom/seaborn) — source code and issue tracker - Waskom ML (2021). "seaborn: statistical data visualization." *Journal of Open Source Software*, 6(60), 3021. [https://doi.org/10.21105/joss.03021](https://doi.org/10.21105/joss.03021)