\input fontmac
\input mathmac

\def\F{{\cal F}}

\classicmode

\maketitle{Ergodic theory and arithmetic progressions}{by}{Marcel K. Goh}{4 December 2020}

\advsect Introduction

Let $(Z,+)$ be an abelian group. For any integer $n$ and $z\in Z$ we let $nz$ denote the iterated sum; for
instance, $2z = z+z$ and $-3z = -z-z-z$.
Letting $[a,b]$ denote the discrete interval $\{n\in \ZZ:a\leq n\leq b\}$
for $a,b\in \ZZ$, we define an {\it arithmetic progression} to be a set of the form
$$a+[0,k-1]\cdot r = \{a,a+r,a+2r,\ldots,a+(k-1)r\},\adveq$$
where $k>1$ is the {\it length}, $a\in Z$ is the {\it base point}, and $r\in Z$ is the {\it common difference}
or {\it step}. If $r\neq 0$, the progression is {\it nontrivial}, and this condition is assumed unless otherwise
stated. In these notes, we will study the case $Z = \ZZ$.

The quest to find arithmetic progressions in subsets of $\ZZ$ began with the following celebrated theorem
of B. L. van der Waerden, which answered a conjecture of P. J. H. Baudet.

\parenproclaim Theorem V (van der Waerden, {\rm 1927}). For any integers $r$ and $k$, there exists $N\in \NN$
such that for any partition of $[1,N]$ into disjoint sets $C_1, C_2, \ldots, C_r$, some $C_i$ contains
a $k$-term arithmetic progression.\slug

\noindent
An infinitary formulation of this theorem states that if the positive integers are coloured with $r$ colours,
then some colour contains arbitrarily long arithmetic progressions. P. Erd\H os and P. Tur\'an suspected that
this had to do with the density of the partition. A subset $A$ of the nonnegative integers is said to have
{\it positive (upper) density} if
$$\limsup_{N\to \infty} {\big|A\cap [0,N]\big|\over N+1} > 0.\adveq$$
In the more general setting where $A\subseteq \ZZ^d$ for $d\geq 1$, we get a similar definition by intersecting
with $[-N,N]^d$ and dividing by $(2N+1)^d$. (It does not really matter exactly which interval we intersect
with, since we only care to know whether the density is positive or zero.)
Erd\H os and Tur\'an's conjecture was that van der Waerden's
theorem holds because at least one of the $C_i$ must have density at least $1/r$. This was proved for the case $k=3$
by K. F. Roth (1953) using Fourier-analytic methods. A proof for $k=4$ did not arrive until 1969;
it was due to E. Szemer\'edi, who also proved the general case six years later:

\parenproclaim Theorem S (Szemer\'edi, {\rm 1975}). Any subset of the nonnegative integers with positive upper
density contains arithmetic progressions of arbitrary length. Equivalently, given any real $\delta > 0$ and
$k\in \NN$, there exists $N\in \NN$ such that for any $A\subseteq [0,N]$ with $|A|\geq \delta(N+1)$, there exists
an arithmetic progression of length $k$ contained in $A$.

\noindent
The proof that Szemer\'edi gave was long and complicated, relying on an intricate combinatorial argument.
We will only supply a proof of the fact
that the infinitary version of the theorem is in fact equivalent to the finitary one. (This treatment
can be found in Furstenberg, Katznelson, and Ornstein (1982).)

\proof It is clear that the finitary statement implies the infinitary one. Now suppose that the finitary
statement does not hold; that is, there exists $\delta>0$ and $k\in \NN$ such that for all $N\in \NN$,
there exists $A_N\subseteq [0,N]$ such that $|A_N|\geq \delta(N+1)$ but there is no arithmetic progression
of length $k$ contained in $A$. Note that the presence of arithmetic progressions in a set $A$ does not
change if we shift all the elements of the set by a fixed value: for any $s\in \NN$,
$s+A = \{s + a : a\in A\}$ has the
same number of arithmetic progressions as $A$. So we can translate the sets $A_N$ far enough apart that no
arithmetic progression can belong to two of the $A_N$ at once. Letting $A = \bigcup_{N\in \NN} A_N$,
we find that $A$ has upper density $\geq \delta$, but does not contain arithmetic progressions of length $k$.\slug

Szemer\'edi's quantitative bounds on the size of $N$ relative to $\delta$ and $k$ were not as tight
as Roth's bounds for the $k=3$ case. In fact, Roth's Fourier-analytic approach was generalisable,
but this was not discovered until 1998 for the case
$k=4$ and finally for the general case in 2001 (both of these results, which established much better bounds,
are due to W. T. Gowers). On the other
hand, if we are only interested in the infinitary statement and allow non-constructive proofs, then ergodic
methods can be applied to give much shorter arguments than were employed in Szemer\'edi's original paper.
This approach, devised by H. Furstenberg in 1977, led to many generalisations of Szemer\'edi's result.

\advsect The ergodic formulation

We will need some terminology from the realm of dynamical systems.
Let $(X,\F,\mu)$ be a probability space and let $T:X\to X$ be measure-preserving, i.e.,
$\mu(T^{-n} A) = \mu(A)$ for all $A\in \F$. Note that $T^{-n}A$ is the set of all $x\in X$ such that $Tx\in A$;
one should not be tricked into thinking that $T$ is necessarily invertible!
The quadruple $(X,\F,\mu,T)$ is called a {\it measure-preserving
system}. For a bounded measurable function $f:X\to\RR$, $T$ acts as a shift operator by sending $f$ to $T^n f$,
given by $T^nf(x) = f(T^nx)$. We have the following lemma.

\proclaim Lemma D. Let $(X,\F,\mu)$ be a probability space and let $f$ be a measurable function on $X$
such that $\int_Xf\d\mu>0$ and such that for all $x\in X$, $0\leq f(x)\leq C$ for some constant $C$.
Let $\delta>0$ be such that
$$G = \{x\in X : f(x)\geq \delta\}$$
has positive measure. For any $T:X\to X$ that is measure-preserving,
there exists $E_T$ with $\mu(E_T)\geq \mu(G)/2$ such that the set
$$R_x = \{n\in \NN : T^nf(x) \geq \delta\}$$
has upper density $\geq \mu(G)/2$ for every $x\in E_T$.

\noindent This is a variant of Lemma 1 from a blog post by Terry Tao, dated 10 February 2008. The
proof we supply follows the same outline (with some details added).

\proof First, note that $\int_Xf\d\mu\leq C$, since $X$ is a probability space. Next, notice that
the for any $x\in X$ and any $n$, $T^nf(x)\geq \delta$ if and only if $f(T^nx)\geq \delta$,
which is equivalent to
$x\in T^{-n}G$. Because $\mu$ is shift-invariant, we have
\newcount\eqA
\eqA=\eqcount
$$\mu(G) = \int_X {1\over N} \sum_{n=1}^N \chi_{G}\d\mu = \int_X{1\over N}\sum_{n=1}^N \chi_{T^{-n}G}\d\mu.\adveq$$
The integrand on the right-hand side no more than $1$, so the set
$$A_N = \bigg\{ x\in X : {1\over N} \sum_{n=1}^N \chi_{T^{-n}G}(x)\geq {\mu(G)\over 2} \bigg\}$$
has measure at least $\mu(G)/2$. For otherwise, we would have
$$\int_X {1\over N}\sum_{n=1}^N \chi_{T^{-n}G}\d\mu \leq \int_{A_N} 1
  + \int_{{A_N}^c} {1\over N}\sum_{n=1}^N \chi_{T^{-n}G}\d\mu < {\mu(G)\over 2} + {\mu({A_N}^c)\mu(G)\over 2}
  \leq\mu(G),\adveq$$
contradicting \refeq{\the\eqA}. Note that
$$\limsup_{N\to\infty} A_N = \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty A_N$$
is the set of all $x\in X$ that are in $A_N$ for infinitely many $N$. Put another way, this is the set
of all $x\in X$ for which
$$\limsup_{N\to\infty} {R_x \cap [1,N]\over N} \geq {\mu(G)\over 2}.$$
Since $\mu(A_N)\geq \mu(G)/2$ for every $N$, we have
$$\mu\Big(\limsup_{N\to\infty} A_N\Big) = \mu\Big(\bigcap_{N=1}^\infty \bigcup_{n=N}^\infty A_n\Big)\geq
\mu(G)/2\adveq$$
and $E_T = \limsup_{N\to\infty} A_N$ is a set of positive measure such that for every point $x\in E_T$,
$R_x$ has upper density $\geq \mu(G)/2$.\slug

\noindent We can now formulate Furstenberg's multiple recurrence theorem:

\parenproclaim Theorem F (Furstenberg, {\rm 1977}). Let $(X,\F,\mu,T)$ be a measure-preserving
system and let $f$ be a bounded measurable function such that $\int_X f\d\mu >0$. For any $k\in \NN$, we have
$$\liminf_{N\to\infty} {1\over N} \sum_{s=1}^N \int_X f (T^sf)\cdots (T^{(k-1)s}f)\d\mu>0.\noskipslug\adveq$$

In the particular case that $k=2$, this is a form of Poincar\'e's recurrence theorem. We will not be able
to prove Theorem F directly in these notes, but we will prove the important correspondence between Furstenberg's
multiple recurrence and arithmetic progressions in $\NN_0$.

\proclaim Lemma E. Theorems S and F are equivalent.

\proof First, suppose that Theorem S holds. Let $(X,\F,\mu,T)$ be a measure-preserving system, $f$ a bounded
measurable function with $\int_X f\d\mu>0$, and let $k\in \NN$ be arbitrary. Let $\delta>0$ be small enough that
$G = \{x\in X : f(x)\geq \delta\}$ has positive measure; by Lemma D, there exists a set $E_T$ of measure at least
$\mu(G)/2$ such that $R_x = \{n\in \NN : T^nf(x)\ge\delta\}$ has upper density $\ge\mu(G)/2$ for every $x\in E_T$.
By Theorem S, there exists an $N_0$ such that for any $x\in E_T$, there is a $k$-term arithmetic progression
in $R_x\cap [0,N_0]$. But the set of $k$-term arithmetic progressions
in $[0,N_0]$ is finite, with some cardinality $K$;
as a crude upper bound we have $K\leq {N_0\choose k}$. So there exists
some fixed progression $P_T = \{a, a+r, \ldots, a+(k-1)r\}\subseteq [0,N_0]$
such that the set $E'_T = \{x\in E_T : P_T\subseteq R_x\}$ has $\mu(E'_T)\ge\mu(G)/(2K)$. We have
\newcount\eqC
\eqC=\eqcount
$$\eqalign{
\int_X f(T^rf)\cdots(T^{(k-1)r}f)\d\mu &\geq \int_{T^{-a}E'_T} f(T^rf)\cdots(T^{(k-1)r}f)\d\mu \cr
&= \int_{E'_T} T^af(T^{a+r}f)\cdots(T^{a+(k-1)r}f)\d\mu\cr
&\geq {\delta^k\mu(G)\over 2K}.\cr
}\adveq$$
The progression $P_T$ is particular to the transformation $T$, but the bound depends only on $f$ and $k$,
so in the argument above we could have applied Lemma D to $T^m$ for $m\in \NN$ and obtained \refeq{\the\eqC}
with $T^m$ in place of $T$. Since $r$ cannot exceed $N_0$ no matter what $T^m$ is, we find that
$${1\over N_0} \sum_{s'=1}^{N_0} \int_X f(T^{ms'}f)\cdots(T^{(k-1)ms'})\d\mu \geq {\delta^k\mu(G)\over 2K},\adveq$$
where $m\in \NN$ is arbitrary. For simplicity's sake, suppose that $N\geq {N_0}^2$ and averaging over all
$1\leq m\leq N/N_0$, we have
$${N_0\over N} \sum_{m=1}^{N/N_0}
{1\over N_0} \sum_{s'=1}^{N_0} \int_X f(T^{ms'}f)\cdots(T^{(k-1)ms'})\d\mu \geq {\delta^k\mu(G)\over 2K},\adveq$$
and since every $1\leq s\leq N$ has at most $\min(N_0, N/N_0) = N_0$ representations of the form $ms'$, where
$1\leq m\leq N/N_0$ and $1\leq s'\leq N_0$, this gives
$${1\over N} \sum_{s=1}^{N} \int_X f(T^{s}f)\cdots(T^{(k-1)s})\d\mu \geq {\delta^k\mu(G)\over 2KN_0}\adveq$$
for all large enough $N$, yielding Theorem F.

For the other direction, fix a subset $A\subseteq \NN_0$ with positive upper density and let $k\in \NN$.
We can find a sequence of natural numbers
$N_1,N_2,\ldots$ such that
$$\liminf_{j\to\infty} {A \cap [0,N_j] \over N_j+1} >0.$$
Consider the linear space $\ell^\infty(\NN)$ of all bounded real-valued sequences under the supremum norm. By
the Hahn-Banach theorem applied to the functional $\lambda(x) = \lim_{j\to\infty} x_j$, which is defined
on the subspace of convergent sequences, we can construct a linear functional $\Lambda : \ell^\infty(\NN)\to\RR$
such that
$$\liminf_{j\to\infty} x_j \leq \Lambda(x) \leq \limsup_{j\to\infty} x_j.\adveq$$
Now let $X = \{0,1\}^{\NN_0}$ be the space of all binary sequences indexed on the nonnegative integers
with the product topology and Borel $\sigma$-algebra $\F$. Let $T$ be the left-shift operator given by
$Tx = (x_{n+1})_{n\in \NN_0}$ and let $a\in X$ be given by
$$a_n = \cases{1,&if $n\in A$;\cr0,&otherwise.\cr}\adveq$$
For any sequence of indices $s_1,\ldots,s_m \in \NN_0$, define a measure $\mu_{s_1,\ldots,s_m}$ on $\{0,1\}^m$
given by
$$\mu_{s_1,\ldots,s_m}(E_1\times\ldots\times E_m) = \Lambda\Bigg(\!\!\bigg(
{1\over N_j+1} \sum_{i=0}^{N_j} \chi_{E_1}(T^{s_1}a_i)\cdots\chi_{E_m}(T^{s_m}a_i)\bigg)_{j=1}^\infty\Bigg),\adveq$$
where $E_1,\ldots,E_m\subseteq \{0,1\}$.
Permuting the indices $s_1, \ldots,s_m$ does not change the value on the right-hand
side (multiplication is commutative), and for any $s_{m+1}\in \NN_0$,
$$\eqalign{
\mu_{s_1,\ldots,s_m,s_{m+1}}(E_1\times\ldots\times E_m\times \{0,1\})
&= \Lambda\Bigg(\!\!\bigg(
{1\over N_j+1} \sum_{i=0}^{N_j} \chi_{E_1}(T^{s_1}a_i)\cdots\chi_{E_m}(T^{s_m}a_i)\chi_{\{0,1\}}(T^{s_{m+1}}a_i)
\bigg)_{j=1}^\infty\Bigg)\cr
&= \Lambda\Bigg(\!\!\bigg(
{1\over N_j+1} \sum_{i=0}^{N_j} \chi_{E_1}(T^{s_1}a_i)\cdots\chi_{E_m}(T^{s_m}a_i)\bigg)_{j=1}^\infty\Bigg)\cr
&= \mu_{s_1,\ldots,s_m}(E_1\times\ldots\times E_m).
}\adveq$$
So by the Kolmogorov consistency theorem, there exists a measure $\mu$ on $X$ such that for all $s_1,\ldots s_m\in
\NN_0$, and $E_1,\ldots,E_m\subseteq \{0,1\}$,
$$\mu\big(\{x \in X : x_{s_1}\in E_1,\,\ldots,\,x_{s_m}\in E_m\}\big)
= \mu_{s_1,\ldots,s_m}(E_1\times\ldots\times E_m).$$
This is invariant under the shift $T$, since
$$\eqalign{
\mu\big(T^{-1}\{x \in X : x_{s_1}\in E_1,\,\ldots,\,x_{s_m}\in E_m\}\big)
&= \mu\big(\{x \in X : x_{s_1+1}\in E_1,\,\ldots,\,x_{s_m+1}\in E_m\}\big)\cr
&= \mu_{s_1+1,\ldots,s_m+1}(E_1\times\ldots\times E_m)\cr
&= \Lambda\Bigg(\!\!\bigg(
{1\over N_j+1} \sum_{i=0}^{N_j} \chi_{E_1}(T^{s_1+1}a_i)\cdots\chi_{E_m}(T^{s_m+1}a_i)\bigg)_{j=1}^\infty\Bigg) \cr
&= \Lambda\Bigg(\!\!\bigg(
{1\over N_j+1} \sum_{i=1}^{N_j+1} \chi_{E_1}(T^{s_1}a_i)\cdots\chi_{E_m}(T^{s_m}a_i)\bigg)_{j=1}^\infty\Bigg),\cr
}\adveq$$
and the $j$th term in the inner sequence differs from the original sequence by no more than $1/(N_j+1)$. So the
differences between these two sequences goes to zero and their values under $\Lambda$ are the same.


Let $B$ be the cylinder of all $x\in X$
with $x_0 = 1$. Note that
$$\eqalign{
\mu(B) &= \mu\big(\{x \in X : x_0 = 1\}\big) \cr
&= \Lambda\Bigg(\!\!\bigg({1\over N_j+1} \sum_{i=0}^{N_j} \chi_{\{1\}}(a_i)\bigg)_{j=1}^\infty\Bigg) \cr
&= \Lambda\Bigg(\!\!\bigg({A\cap [0,N_j]\over N_j+1}\bigg)_{j=1}^\infty\Bigg) \cr
&= \liminf_{j\to\infty}{A\cap [0,N_j]\over N_j+1},\cr
}\adveq$$
and by our choice of the sequence $N_j$, this is positive and equals the $\limsup$.
By Theorem F applied to $k$ and the function $\chi_B$,
there exists some $r$ such that
$$\mu(B\cap T^{-r}B \cap \cdots\cap T^{-(k-1)r}B) > 0.$$
Letting $A-s$ denote the set $\{a-s : a\in A\}$, and working backwards, we discover that
$$\eqalign{
\limsup_{j\to\infty}&{A\cap (A-r)\cap \cdots \cap (A-(k-1)r)\cap [0,N_j]\over N_j+1}\cr
&\geq\Lambda\Bigg(\!\!\bigg({A\cap (A-r)\cap \cdots \cap (A-(k-1)r)\cap [0,N_j]\over N_j+1}\bigg)_{j=1}^\infty\Bigg)\cr
&=\Lambda\Bigg(\!\!\bigg(
{1\over N_j+1}\sum_{i=0}^{N_j}\chi_{\{1\}}(a_i)\chi_{\{1\}}(T^ra_i) \cdots \chi_{\{1\}}(T^{(k-1)r}a_i)
\bigg)_{j=1}^\infty\Bigg)\cr
&= \mu\big(\{x\in X : x_0 = 1,\,x_r = 1,\,\ldots,\,x_{(k-1)r} = 1\}\big)\cr
&= \mu(B\cap T^{-r}B \cap \cdots\cap T^{-(k-1)r}B),\cr
}\adveq$$
and since this is positive, there exists an arithmetic progression of length $k$ in $A$\footnote{$^*$}{\eightpoint
Thank you to Terry Tao, whose answers to my questions on his blog helped me work out some of the details in this
proof.}.\slug

The first part of the presented proof draws from the aforementioned blog post of Terry Tao;
the second part is a fleshed-out version of the proof sketch found in Tao and Vu (2006).

\advsect Generalisations

The techniques that Furstenberg used to link arithmetic progressions to recurrence phenomena came to be known
as the Furstenberg correspondence principle and, as mentioned above, it led to various generalisations of
Szemer\'edi's theorem. We list a few of them here, along with their equivalent statements in the domain
of dynamical systems.

\parenproclaim Theorem M (Furstenberg, Katznelson, {\rm 1979}). The following are equivalent.
\medskip
\item{i)} Let $d\geq 1$ and let $A\subseteq \ZZ^d$ have positive upper density, that is,
$$\limsup_{N\to\infty} {A\cap [-N,N]^d\over (2N+1)^d} \geq 0.$$
For any $v_1,\ldots,v_k\in \ZZ^d$, there exist infinitely many pairs $(a,r)\in \ZZ^d$ such that
$\{a+rv_1,\ldots,a+rv_k\} \subseteq A$.
\smallskip
\item{ii)} Let $(X,\F,\mu)$ be a probability space and $k\in \NN$. If $T_1,T_2,\ldots,T_k:X\to X$ are
measure-preserving maps that commute wieh each other and $E$ is a set of positive measure, then there exists
$r>0$ such that ${T_1}^rE\cap {T_2}^rE\cap\cdots\cap{T_k}^rE$ is nonempty.\slug

\noindent This multidimensional version of Szemer\'edi's theorem is often called the constellation theorem because,
roughly speaking, it asserts that a fixed finite constellation (up to translation and scaling)
can be found in any set of points with positive density. A polynomial version of the theorem is also known:

\parenproclaim Theorem P (Bergelson, Leibman, {\rm 1996}). The following are equivalent.
\medskip
\item{i)} Let $P_1, \ldots, P_k : \ZZ \to \ZZ$ be polynomials such that $P_1(0) = \cdots = P_k(0) = 0$.
Let $A\subseteq \ZZ^d$ have positive upper density. Then there exist infinitely many pairs
$(a,r)\in \ZZ^d\times\NN_0$ such that $a+P_1(r), \ldots, a+P_k(r)\in A$.
\smallskip
\item{ii)} Let $(X,\F,\mu)$ be a probability space, let $k\in \NN$, and let $T_1,T_2,\ldots,T_k:X\to X$ be
commuting measure-preserving maps. Let
$P_1, \ldots, P_k : \ZZ \to \ZZ$ be polynomials such that $P_1(0) = \cdots = P_k(0) = 0$. There exists $r>0$
such that
$$T^{-P_1(r)}E + T^{-P_2(r)}E\cap \cdots\cap T^{-P_k(r)}E$$
is nonempty, where $T^{(a_1,\ldots,a_k)}$ is defined to be ${T_1}^{a_1}\cdots {T_k}^{a_k}$.\slug
\medskip

\noindent Lastly, we have the density Hales-Jewett theorem, which tells us that subsets of large enough density must
contain a combinatorial line.

\parenproclaim Theorem D (Furstenberg, Katznelson, {\rm 1991}).
Let $n\geq 1$ and $0<\delta\leq 1$. There exists an integer $d\geq 1$ such that if $A$ is any
subset of $[0,n-1]^d$ of cardinality $|A|\geq \delta n^d$, then $A$ contains a proper arithmetic progression
$\{a, a+v, \ldots, a+(k-1)v\}$ for some $a\in [0,n-1]^d$ and $v\in [0,1]^d$.\slug

\noindent Theorem D also has a an ergodic counterpart, but it is rather
complicated. For many years, the ergodic proof of the density Hales-Jewett theorem was the only one known.
In 2009, in an online initiative called the Polymath Project spearheaded by W. T. Gowers,
a group of over 40 mathematicians found a purely combinatorial proof of Theorem D. The results were published
under the pseudonym D. H. J. Polymath and the success of this experiment
led to over a dozen other online Polymath Projects.

\section References

\bye