--- title: "Least Squares By Springs" author: "Michael Friendly" date: "`r Sys.Date()`" output: bookdown::html_document2: base_format: rmarkdown::html_vignette number_sections: false toc: true self_contained: true math_method: engine: mathjax url: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js bibliography: "references.bib" vignette: > %\VignetteIndexEntry{Least Squares By Springs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r nomessages, echo = FALSE} knitr::opts_chunk$set( warning = FALSE, message = FALSE, fig.height = 5, fig.width = 5 ) options(digits=4) par(mar=c(3,3,1,1)+.1) options(tinytable_html_mathjax = TRUE) ``` ```{r setup, include=FALSE} library(ggsprings) library(tibble) library(tinytable) library(dplyr) library(matlib) # set basic theme theme_set(theme_minimal(base_size = 16)) ``` ```{r} #| echo: false #| out-width: "70%" #| fig-align: center knitr::include_graphics(here::here("man/figures/horizontal-spring2.png")) ``` ## Introduction The method of least squares fitting is remarkable in its versatility. It grew out of practical problems in astronomy (the orbits of planets, librations of the moon) and geodesy (finding the "shape" of the earth), where astronomers and mathematicians sought to find a way to combine a collection of fallible observations (angular separation between stars) into a a single "best" estimate. Some of the best names in mathematics are associated with this discovery: Newton, Laplace, Legendre, Gauss. @Stigler1981 recounts some of this history. It's original application was to justify the use of the arithmetic average $\bar{x}$ as the value that gave the smallest sum of squares of errors $\text{SSE} =\Sigma (x - \bar{x})^2$, but the same principle gives solutions for linear least squares regression, multiple regression, smoothing splines and non-linear models, all the way to exploratory factor analysis. As a mathematical method of estimation, least squares is also remarkably versatile in the variety of methods of proof that can justify it's application. Minimization of the sum of squares of errors can be solved by calculus, linear algebra or by numerical optimization and even by a simple geometric argument. It is the purpose of this vignette to show how least squares problems can also be solved by the physics of springs and a bit of the history of this idea. No need to invoke a function minimization algorithm, invert a matrix, or solve a system of equations. Just connect your observations to what you want to estimate with springs, and _bingo!_, let the springs give the answer. ### A mechanical solution Among the earliest physical realizations of solving least squares problems by springs, @GainesPalphrey:1932 included the following figures to illustrate a remarkable physical device for fitting a linear relation $y = a + b x$ to observations. In the left figure, seven uniform springs placed at $x = 0 : 6$ are suspended from the observed values $y_i = 5$ and from these light bands are connected to a heavy stepped plumb wheel. The dotted horizontal line represented the least squares fit; it goes through the mean $\bar{y} = 5$ and has the equation $\hat{y} = 5 + 0 x$. The number gauge at the left allows one to read the intercept, $a = 5$ and from the scale markings on the wheel one can read the slope, $b = 0$. ```{r} #| label: fig-Gaines-fig2 #| echo: false #| out-width: "60%" #| fig-align: center #| fig-cap: "Diagrams to illustrate the use of springs in fitting a linear regression relation. Left: all observed $y$ #| values are the same ($y = 5$); right: $y_i = \\{9, 6, 8, 4, 4, 1, 3\\}$. The equation of the line can be read #| from the scale at the left for the intercept and the scale on the wheel for the slope." knitr::include_graphics(here::here("man/figures/Gaines-fig2.png")) ``` In the figure at the right the $y$ values are $y_i = \{9, 6, 8, 4, 4, 1, 3\}$ which has the same sum, $\Sigma y = 35$ The dotted fitted line again passes through the mean $\bar{y} = 5$. The springs connecting the points to the fitted line are stretched or compressed by their deviations, $e_i = y_i - \hat{y}$, and for equilibrium these deviations must sum to zero, $\Sigma e_i = 0$. Each spring stores potential energy proportional to $e_i^2$ and the system is in equilibrium when the sum of energies, $\Sigma e_i^2$ is at its minimum. For this problem, `lm()` gives $\hat{y} = 8.43 - 1.143 \;x$ and these values can be read approximately from the device. ```{r} x <- 0:6 y <- c(9, 6, 8, 4, 4, 1, 3) lm(y~x) |> coef() ``` In reaching equilibrium, the weighted wheel is rotated through an angle proportional to the slope, $b$, whose value is shown on the scale for the wheel. The intercept is the value shown on the scale at the left for the line at $x=0$. Thus, the spring diagram illustrates mechanically the properties of the least squares solution: 1. The fitted line passes through the central value $\bar{x}$ at the value of the mean $\bar{y}$ of the $y_i$. 2. The algebraic sum of the deviations from the line is zero, $\Sigma e_i = \Sigma (y_i - \bar{y}) = 0$. 3. The sum of squares of the deviations $\Sigma e_i^2$ is minimum. ### Early history The idea of a mechanical analog of least squares by springs actually goes back almost to the initial ideas stated in Gauss' _Theoria Motus_ [-@Gauss-1809theoria], as detailed by @Farebrother1999a. @Donkin-1844 was the first to suggest that in finding the best measure of location $\tilde{x}$ in a sample, an observation $x_0$ of weight $g$ exerts a force proportional to $g \cdot (\tilde{x} - x_0)$ and equilibrium requires $\Sigma g \cdot (\tilde{x} - x_0) = 0$. Simon Newcomb [@Newcomb-1873a; @Newcomb-1873b] shortly extended this to the problem of finding, by least squares, the best position of a line describing changes in some astronomical observation which varies uniformly (linearly) over time. He describes the physical analog as that of a rigid rod "attracted by each of the points with a force proportional to the weight of the corresponding observation multiplied by the distance of the point to the rod" (i.e., residual). Newcombe's mechanical model does not refer to springs, but it is remarkable in that: (a) he recognizes weighted least squares, because observations may be more or less reliable, and (b) he recognizes the idea of _influence_ on the rotational forces-- observations further from the centroid exert greater influence on the slope of the line. ## Hooke's Law, potential energy and force The application of springs to problems in statistics depends on understanding the physics of springs and physical systems more generally, using the concepts of potential energy, forces applied by physical objects and how these can balance in a state of equilibrium. ### Linear springs A linear spring is the one whose tension is directly proportional to its length: stretching such a spring to a length $x$ requires the force $F(x) = k\;x$. Force acts in a given direction, so compressing a spring requires a force $- k\;x$. This is illustrated in the figure below, where $x$ indicates the stretching of a spring. ```{r} #| label: fig-hookes-law #| echo: false #| out-width: "50%" #| fig-align: center #| fig-cap: "Hooke's Law for springs. The force required to stretch the spring is proportional to the extension, $x$, represented by the weights. _Source_: [Wikipedia](https://en.wikipedia.org/wiki/Hooke%27s_law)" knitr::include_graphics(here::here("man/figures/Hookes-law-springs.png")) ``` Here the multiplier $k$ is a constant (called the Hooke’s constant) that characterizes a particular spring. A small $k$ means a lax spring, while a large $k$ means a stiff spring. By convention, the unstretched length of such a spring is zero. In the ggSprings package to date, all observations are considered to have the same, arbitrary spring constant. Allowing these to vary, with a `weight` aesthetic would be a natural way to implement weighted least squares. ### Functions, energy, force A spring is something that acts elastically, meaning that it stores (potential) energy if you either stretch it or compress it. How much energy it stores is proportional to the square of the distance it is stretched or compressed. $$ P(x) = \frac12 k \; x^2 $$ In this notation, the force exerted by a spring, $F(x)$ can be seen as the (negative) derivative, or slope at $x$ of potential energy, $$ F(x) = -\frac{d}{dx} \left( \frac12 k \; x^2 \right)= -k \; x $$ A spring is at equilibrium when the potential energy is minimized, or the force exerted is zero. The general relations between mathematics and physics are shown in the table below [@Levi2009, p.27]. ```{r echo = FALSE} tbl <- tibble::tribble( ~Mathematics, ~Physics, "function, $f (x)$", "potential energy, $P(x)$", "derivative, $f\\prime (x)$", "force, $F(x) = -P\\prime(x)$", "$\\min_x f(x) \\implies f\\prime (x) = 0$", "equilibrium, $\\min_x P(x) \\implies F (x) = 0$" ) tt(tbl) ``` These relations are illustrated in the figure below, where positive $x$ means the spring is stretched and negative $x$ represents compression. When the spring is stretched to $x = 3$, the potential energy is $P(x) = \frac12 (3^2) = 4.5$. The line through $(3, 4.5)$ tangent to the curve has slope $P^\prime (x) = x = 3$ so the force the spring exerts is $F(x) = -3$, where the negative sign means a force in the direction of $x=0$. ```{r} #| label: fig-potential #| echo: false #| out-width: "50%" #| fig-align: center #| fig-cap: "Potential energy and force of a spring." # quadratic for potential energy k <- 1/2 x <- seq(-6, 6, 0.5) Px <- k * x^2 linepts <- function(a, b, x) { y <- a + b * x data.frame(x,y) } op <- par(mar = c(4,4,1,1)+.5) plot(x, Px, type = "l", lwd = 2, axes = FALSE, ylab = "", cex.lab = 2) axis(1) axis(2) abline(v=0, lwd = 1.2) text(0, 18, "P(x)", cex = 2, xpd = TRUE) # line through (3, 4.5), extending from x=2 to x=4 # slope = d k X^2 / dx = X = 3 # -> y - 4.5 = 3 * (x - 3) -> y = 3 X - 9 pts <- linepts(-4.5, 3, c(1.5, 3, 4.5)) lines(pts, col = "blue", lwd = 3) points(pts[2,], pch = 16, cex= 2.3, col = "red") text(-3, 14, expression(P(x) == frac(1,2) * X^2), cex = 1.2) text(-2.8, 12, expression(F(x) == 2 * X), cex = 1.2) points(0, 0, pch = 16, cex= 1.5, col = "red") segments(x0 = -1.2, x1 = 1.2, y0 = 0, y1 = 0, col = "blue", lwd=2) text(4 + c(0, .5), 6 + c(-0.7, -3), c(expression(F(x) == -dP/dx), expression(F(x) == - slope[x])), cex = 1.4, srt = 60, col = "blue", xpd = TRUE) par(op) ``` The horizontal line through $(0, 0)$, that is, when $x = 0$ illustrates equilibrium where potential energy is minimized, and the force (slope) is $F(0) = 0$. ## Mean of a sample The sample mean has several nice physical analogs, which stem from the properties that * the sum of deviations ($e_i$) of observations ($x_i$) from the mean $\bar{x}$ equals 0: $\Sigma_i e_i =\Sigma_i ( x_i - \bar{x} ) = 0$. * the sum of squared deviations is the smallest for any choice of $\bar{x} = \min_{\bar{x}} ( x_i - \bar{x} )^2$ ### Demonstration using springs Create a set of observations, sampled randomly on [1, 10] ```{r mean1} set.seed(1234) N <- 8 df <- tibble( x = runif(N, 1, 10), y = seq(1, N) ) means <- colMeans(df) xbar <- means[1] |> print() ``` Then, set the tension of the spring to be the absolute value of the deviation of the observation from the mean. ```{r mean2} df <- df |> mutate(tension = abs(x - xbar), diameter = 0.2) ``` Then, visualize this with springs. In the call to `geom_spring()` for this problem, the spring is specified to go from the value of `x` to `xend = xbar`, while setting `y` and `yend` to the value of `y`. A lot of code is devoted to making this a pretty figure. ```{r mean-springs} ggplot(df, aes(x=x, y=y)) + geom_point(size = 5, color = "red") + geom_segment(x = xbar, xend = xbar, y = 1/2, yend = N + 1/2, linewidth = 3) + geom_spring(aes(x = x, xend = xbar, y = y, yend = y, tension = tension, diameter = diameter), color = "blue", linewidth = 1.2) + labs(x = "Value (x)", y = "Observation number") + ylim(0, N+2) + scale_y_continuous(breaks = 1:N) + annotate("text", x = xbar, y = N + 1, label = "Movable\nrod", size = 5, lineheight = 3/4) ``` ## Bivariate centroid For a bivariate sample, $(x_i, y_i)$, the centroid point, $(\bar{x}, \bar{y})$ is realized physically as the result of attaching springs between the fixed points and a free, movable point. If we make the tension on the spring proportional to it's length or the distance to the centroid, each point will have potential energy proportional to its **squared** distance from the movable one, where the forces balance (sum to zero). Set this up for a sample of 10 points, uniformly distributed on (1, 10). ```{r cent1} set.seed(1234) N <- 10 df <- tibble( x = runif(N, 1, 10), y = runif(N, 1, 10) ) ``` We'll want to calculate tension in relation to the distance from the centroid, given by $(\bar{x}, \bar{y})$, so calculate these now as `xbar`, `ybar`. ```{r cent2} means <- colMeans(df) xbar <- means[1]; ybar <- means[2] ``` Then, for this example, we set the tension as the distance between the point and the mean, in the dataframe `df` so it can be used as an aesthetic in `geom_spring()`. ```{r cent3} df <- df |> mutate(tension = sqrt((x - xbar)^2 + (y - ybar)^2), diameter = 0.4) ``` Visualize the springs: ```{r centroid-springs} ggplot(df, aes(x=x, y=y)) + geom_point(size = 5, color = "red") + geom_spring(aes(x = x, xend = xbar, y = y, yend = ybar, tension = tension / 5, diameter = diameter), color = "blue", linewidth = 1.2) + geom_point(x = xbar, y = ybar, size = 7, shape = 15, color = "black") + scale_x_continuous(breaks = 1:10) + scale_y_continuous(breaks = 1:10) ``` ### Animate this? One way to animate this would be to imagine the springs acting one at a time, sequentially on the position of the moving point, and looping until nothing changes. Not sure how to program this with `gganimate`. ## Least squares regression For linear least squares regression, we have a collection of points $(x_i, y_i)$, and it is desired to fit a "best" line $y = a + b \times x$ that makes the sum of squares of errors, $$ \text{RSS} (a,b) = \sum_i \; \left[y_i - (a + b \times x_i) \right]^2 = \sum_i e_i^2 $$ as small as possible. The representation by springs is shown in \@ref(fig:levi-least-sq-springs) where the observations are drawn as small ovals, connected to the fitted points on the line $y = a + b \times x$, realized by a rigid rod in the mechanical analog. The Guides represent vertical channels that force the springs to work only up or down in the vertical direction. ```{r} #| label: levi-least-sq-springs #| echo: false #| out-width: "70%" #| fig-align: center #| fig-cap: " _Source_: [@Levi2009, p. 33]" knitr::include_graphics(here::here("man/figures/levi-least-sq-springs.png")) ``` Minimizing the sum of squares of the spring lengths means minimizing the potential energy of the mechanical system, and consequently the rod is in equilibrium. This implies that the sum of (vertical) forces $F_i$ is zero, and that the sum of rotational forces, or torques around the intercept, $A$ in the diagram is also zero. \begin{eqnarray} \sum_i F_i = 0 & \implies & \sum_i e_i = 0 (\#eq:force) \\ \sum_i x_i F_i = 0 & \implies & \sum_i x_i e_i = 0 (\#eq:torque) \end{eqnarray} Substituting $e_i = y_i - (a + b x_i)$ gives what are recognized as the "normal equations" for the least squares solution, \begin{eqnarray*} \sum y_i - a \sum x_i - n b & = & 0 \\ \sum x_i y_i - a \sum x_i^2 - b \sum x_i & = & 0 \end{eqnarray*} or, rearranging terms, \begin{eqnarray*} a \sum x_i - n b & = & \sum y_i \\ a \sum x_i^2 - b \sum x_i & = & \sum x_i y_i \end{eqnarray*} These equations may be more recognizable in matrix form. Letting $\mathbf{X} = [\mathbf{1}, \mathbf{x}]$ and $\mathbf{b} = \begin{pmatrix} a \\ b \\ \end{pmatrix}$, we arrive at ```{r} #| results: asis #| echo: false XPX <- matrix(c( "n", "\\sum x", "\\sum x", "\\sum x^2"), 2, 2) XPY <- matrix(c("\\sum y", "\\sum x y"), 2,1) b <- matrix(c("a", "b"), 2, 1) Eqn(latexMatrix(XPX, matrix = "bmatrix"), latexMatrix(b), "& =", latexMatrix(XPY), Eqn_newline(), "\\mathbf{X}^\\top \\mathbf{X} \\mathbf{b}", ' & =', "\\mathbf{X}^\\top \\mathbf{y}", align = TRUE ) ``` from which the matrix solution is $$ \widehat{\mathbf{b}} = ( \mathbf{X}^\top \mathbf{X})^{-1} \; \mathbf{X}^\top \mathbf{y} \; , $$ as long as $(\mathbf{X}^\top \mathbf{X})^{-1}$ exists, i.e., $\text{det}(\mathbf{X}^\top \mathbf{X}) \ne 0$. ### Geometric solution It is of interest to see how the solution using springs is equivalent to a geometric argument. \@ref(fig:least-squares-geometry) shows the vector space representation of a vector $\mathbf{y} of three observations. Any possible regression line corresponds to a vector $\boldsymbol{\beta} = (a, b)$ in the dashed plane that represents linear combinations of $\mathbf{X} = [\mathbf{1}, \mathbf{x}]$. The least squares solution is that which minimizes the length of the residual vector $\mathbf{y} - \widehat{\mathbf{y}} = \mathbf{e}$. ```{r} #| label: least-squares-geometry #| echo: false #| out-width: "50%" #| fig-align: center #| fig-cap: "Geometrical interpretation of least squares regression. The vector $\\mathbf{y}$ to point A #| represents the data for three observations" knitr::include_graphics(here::here("man/figures/least-squares-geometry.png")) ``` From this diagram it can be seen that the residual vector must be orthogonal to $\mathbf{X} = [\mathbf{1}, \mathbf{x}]$, so, \begin{eqnarray*} \mathbf{X} \perp \mathbf{e} & \implies & \mathbf{X}^\top \mathbf{e} = 0 \\ & \implies &\mathbf{X}^\top (\mathbf{y}-\mathbf{X}\;\widehat{\mathbf{b}}) = 0 \\ & \implies &\mathbf{X}^\top \mathbf{y} = \mathbf{X}^\top \mathbf{X}\; \widehat{\mathbf{b}} \\ & \implies &\widehat{\mathbf{b}} = (\mathbf{X}^\top \mathbf{X}) ^\top \mathbf{X}^\top \mathbf{y} \end{eqnarray*} We can also note that the orthogonality condition, $\mathbf{X} = [\mathbf{1}, \mathbf{x}] \perp \mathbf{e}$ means that \begin{eqnarray*} \mathbf{1}^\top \mathbf{e} &= \Sigma e_i = 0 \\ \mathbf{x}^\top \mathbf{e} &= \Sigma x_i e_i = 0 \;\; , \end{eqnarray*} which are the same as \@ref(eq:force) and \@ref(eq:torque). ### Example For a simple example of `geom_springs()` for a regression problem, we plot the first two variables, `x1, y1` in the `anscombe` data set, together with the linear regression line. ```{r anscombe1} simple_plot <- ggplot(anscombe, aes(x = x1, y = y1)) + geom_point(size = 4, color = "red") + geom_smooth(method = lm, formula = y~x, se = FALSE, linewidth = 1.4) simple_plot ``` The current implementation of `geom_springs()` requires that we fit the linear model, add the predicted value of `y1` to the dataset and use that value as `yend` in the call to `geom_spring()` ```{r anscombe2} model <- lm(y1 ~ x1, data = anscombe) anscombe <- anscombe |> mutate(fitted = predict(model)) simple_plot + geom_spring(data=anscombe, aes(xend = x1, yend = fitted), color = "blue", linewidth = 1) ``` Note that `diameter` and `tension` are not specified here. By default, `geom_spring()` uses 0.025 times the range of `x` for the spring diameter, and the distance `abs(y - yend)` from the point to the fitted line as the tension. **NOTE**: Something is off in the calculation of `diameter` in the vignette here. This example works more nicely when run in the console. ## What's missing For regression problems, it is inconvenient to have to fit the same model that has been constructed by `geom_smooth()` and add the `fitted` value to the dataset. The [ggsmoothfit repo](https://github.com/EvaMaeRey/ggsmoothfit/) provides convienient functions, `geom_smooth_fit()` and `geom_smooth_residuals()` that provide some useful methods, and the [README file](https://github.com/EvaMaeRey/ggsmoothfit/blob/main/README.md) defines a `GeomSmoothSpring` that has this capability. However, this is _not_ a package that can be installed from GitHub, and the implementation there depends on another package [statexpress](https://github.com/EvaMaeRey/statexpress) One way to simplify spring plots for models is to use `broom::augment()` for a fitted model to get the data.frame containing all necessary information. ```{r augment} broom::augment(model) ``` Then, the anscombe example can be re-created as follows: ```{r anscombe3} ans_mod <- lm(y1 ~ x1, data = anscombe) ans_aug <- broom::augment(ans_mod) ggplot(ans_aug) + aes(x = x1, y = y1) + geom_point(size = 4, color = "red") + geom_smooth(method = lm, formula = y~x, se = FALSE, linewidth = 1.4) + geom_spring( aes(xend = x1, yend = .fitted), color = "blue", linewidth = 1) ``` ### Smoothing methods & families It is also important to be able to construct spring plots for _any model method_ that can be represented by `geom_smooth(method = AA, formula = y ~ BB)`, where: * `method` can be: `lm`, `glm`, `loess` or nearly any other fitting function, such as `MASS::rlm` for robust fits or `mgcv::gam` for generalized additive models, (**TODO**: Another reason to implement an observation `weight` aesthetic param.) * `formula` can be any model formula involving only `y` and `x`, such as `y ~ x` or `y ~ poly(x, 2)` or even `y ~ splines::ns(x, df =)` For example @GainesPalphrey:1932 actually constructed the machine shown in the figure \@ref{fig:Gaines-fig1} for fitting the relation $y = A \; e^{-k x}$ This solved a practical agricultural problem: > Breeders of dairy cattle are accumulating a large body of yearly records of production based on the milk yield and fat test of individual cows by calendar months. The problem is to measure the rate of milk secretion for each cow by fitting the data of the first _eleven_ full months of lactation with the equation $y = A \; e^{-k x}$, in which $y$ is milk-energy yield per day, and $x$ is time in months with origin at one month after calving. ```{r} #| label: Gaines-fig1 #| echo: false #| out-width: "60%" #| fig-align: center #| fig-cap: "A mechanical spring-driven curve fitting machine," knitr::include_graphics(here::here("man/figures/Gaines-fig1.png")) ``` For this figure, they say, "When the eleven stops, representing the observed y's, are set the machine immediately indicates the fitted $A$ and $k$ constants". ## References