Chapter 11: Hierarchical (Multilevel) Bayesian Models Data often has a grouped structure (e.g., students in schools). Hierarchical models allow parameters to vary by group, assuming that the group-level parameters themselves come from a shared higher-level distribution. This allows groups to "borrow strength" from each other, leading to more stable estimates. This is also called partial pooling. 11.1 Code Example: The "Eight Schools" Model This is a classic example of a hierarchical model, estimating the effect of coaching programs in 8 different schools. Python # Data on treatment effects from 8 schools J = 8 y_schools = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]) sigma_schools = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]) with pm.Model() as hierarchical_model: # Hyperpriors for the group-level distribution mu = pm.Normal("mu", mu=0, sigma=5) tau = pm.HalfCauchy("tau", beta=5) # Group-level standard deviation # Non-centered parameterization for better sampling theta_offset = pm.Normal("theta_offset", mu=0, sigma=1, shape=J) theta = pm.Deterministic("theta", mu + tau * theta_offset) # School-specific effects # Likelihood obs = pm.Normal("obs", mu=theta, sigma=sigma_schools, observed=y_schools) trace_hierarchical = pm.sample(2000, tune=2000) az.plot_forest(trace_hierarchical, var_names=["theta"], r_hat=True) plt.show() The forest plot shows the posterior for each school's effect (theta). Notice how the estimates are "shrunk" from their raw values towards the overall group mean (mu), especially for schools with high uncertainty (large sigma_schools).