{ "cells": [ { "cell_type": "markdown", "id": "a4fcde2b", "metadata": {}, "source": [ "# Goldbach's comet" ] }, { "cell_type": "markdown", "id": "0a36168a", "metadata": {}, "source": [ "## Goldbach's comet with Numba and Datashader\n", "\n", "This Python notebook is about computing and plotting the Goldbach function. It requires some basic mathematical knowledge, but no fancy fancy prime number theory! The main point is to perfom some computations with [Numba](http://numba.pydata.org/) and some efficient plotting with [Datashader](https://datashader.org/).\n", "\n", "Here is the definition of the Goldbach function from [Wikipedia](https://en.wikipedia.org/wiki/Goldbach%27s_comet):\n", "\n", "> The function $g ( E )$ is defined for all even integers $E > 2$ to be the number of different ways in which E can be expressed as the sum of two primes. For example, $g ( 22 ) = 3$ since 22 can be expressed as the sum of two primes in three different ways ( 22 = 11 + 11 = 5 + 17 = 3 + 19).\n", "\n", "The different prime pairs $(p_1, p_2)$ that sum to an even integer $E=p_1+p_2$ are called Goldbach partitions. So $g(E)$ is the count of distinct Goldbach partitions of $E$, without regard to the order of the primes in the pairs.\n", "\n", "Note that for Goldbach's conjecture to be false, there must be $g(E) = 0$ somewhere for $E > 2$. This is very unlikely to occur, but has not been proved yet. Anyway, here are the steps used in this notebook to compute Goldbach function. Given a positive integer $n$:\n", "- For each natural number $k \\leq n$, build a quick way to check if $k$ is a prime or not, and list all the primes smaller or equal to $n$, using a sieve method.\n", "- For each even number $E \\leq n$, compute $g(E)$ by counting the number of cases where $E-p$ is prime for all primes $p$ not larger than $E/2$. " ] }, { "cell_type": "markdown", "id": "1298b785", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "5e584110", "metadata": {}, "outputs": [], "source": [ "from typing import Tuple\n", "\n", "from colorcet import palette\n", "import datashader as ds\n", "from datashader.mpl_ext import dsshow\n", "import matplotlib.pyplot as plt\n", "from numba import jit, njit, prange\n", "import numpy as np\n", "import pandas as pd\n", "from sympy import sieve\n", "\n", "plt.style.use(\"seaborn\")\n", "\n", "FS = (20, 10) # figure size" ] }, { "cell_type": "markdown", "id": "221667db", "metadata": {}, "source": [ "## Prime number sieve\n", "\n", "The `generate_primes` function creates a list of primes smaller or equal to $n$ and a boolean vector `is_prime_vec` of size $n+1$: `is_prime_vec[k]` is `True` if `k` is a prime." ] }, { "cell_type": "code", "execution_count": null, "id": "1fef60d1", "metadata": {}, "outputs": [], "source": [ "def generate_primes(n: int) -> Tuple[np.ndarray, np.ndarray]:\n", "\n", " primes = np.array([i for i in sieve.primerange(n)])\n", " is_prime_vec = np.zeros(n + 1, dtype=np.bool_)\n", " is_prime_vec[primes] = True\n", "\n", " return primes, is_prime_vec\n", "\n", "\n", "n = 11\n", "primes, is_prime_vec = generate_primes(n)\n", "print(f\"primes = {primes}\")\n", "print(f\"is_prime_vec = {is_prime_vec}\")" ] }, { "cell_type": "markdown", "id": "272b6a9b", "metadata": {}, "source": [ "## Evaluate Goldbach function for a given even number\n", "\n", "Now we show how $g(E)$ can be computed for a given value of $E$ with Numba. In the `compute_g` function, we loop over all primes $p \\leq E/2 $ with a `for` loop : if $E-p$ is a prime, $(p, E-P)$ is a partition of $E$. By looping over all primes $p \\leq E/2$, we count all the possible partitions of $E$. The upper bound of the `for` loop is computed using `np.searchsorted`, since primes are sorted within the `primes` array. This returns the index of the largest prime in the array smaller or equal to $E/2$." ] }, { "cell_type": "code", "execution_count": null, "id": "439375f5", "metadata": {}, "outputs": [], "source": [ "@jit(nopython=True)\n", "def compute_g(is_prime_vec: np.ndarray, primes: np.ndarray, E: int) -> int:\n", "\n", " assert E < len(is_prime_vec)\n", " assert E % 2 == 0\n", " E_half = int(0.5 * E)\n", "\n", " # initialization\n", " count = 0 # number of prime pairs\n", "\n", " # we loop over all the prime numbers smaller than or equal to half of E\n", " i_max = np.searchsorted(primes, E_half, side=\"right\")\n", " for i in range(i_max):\n", " if is_prime_vec[E - primes[i]]:\n", " count += 1\n", "\n", " return count" ] }, { "cell_type": "markdown", "id": "2e6025b8", "metadata": {}, "source": [ "We check the function for a few values of $E$:" ] }, { "cell_type": "code", "execution_count": null, "id": "9643da0b", "metadata": {}, "outputs": [], "source": [ "E = 22\n", "primes, is_prime_vec = generate_primes(E)\n", "g = compute_g(is_prime_vec, primes, E)\n", "assert g == 3\n", "\n", "E = 1890 # https://www.ias.ac.in/article/fulltext/reso/019/11/1028-1037\n", "primes, is_prime_vec = generate_primes(E)\n", "g = compute_g(is_prime_vec, primes, E)\n", "assert g == 91\n", "\n", "E = 1_000_000 # https://resources.wolframcloud.com/FunctionRepository/resources/Goldbach/\n", "primes, is_prime_vec = generate_primes(E)\n", "g = compute_g(is_prime_vec, primes, E)\n", "assert g == 5402" ] }, { "cell_type": "markdown", "id": "6f1af1e3", "metadata": {}, "source": [ "## Evaluate Goldbach function over a range of even numbers\n", "\n", "Now we are going to loop over all even values $E \\leq n$ to compute all the corresponding values of $g$. Note that in the following `compute_g_vector` function, the outer loop has a constant step size of 1, in order to use Numba `prange`, which only supports this unit step size. This means that we loop on contiguous $E/2$ integer values instead of even $E$ values. Also, we compute `is_prime_vec` and `primes` only once and use it for all the evaluations of $g$.\n", "\n", "`n` is calculted from the length of `is_prime_vec`. In the arguments, we assume that `primes` is corresponding to the primes of `is_prime_vec`." ] }, { "cell_type": "code", "execution_count": null, "id": "41b8209f", "metadata": {}, "outputs": [], "source": [ "@njit(parallel=True)\n", "def compute_g_vector(is_prime_vec: np.ndarray, primes: np.ndarray) -> np.ndarray:\n", "\n", " n_max = len(is_prime_vec) - 1\n", " n_max_half = int(0.5 * n_max) + 1\n", "\n", " g_vec = np.empty(n_max_half, dtype=np.uint)\n", "\n", " for E_half in prange(n_max_half):\n", " count = 0\n", " E = 2 * E_half\n", " i_max = np.searchsorted(primes, E_half, side=\"right\")\n", " for i in range(i_max):\n", " if is_prime_vec[E - primes[i]]:\n", " count += 1\n", "\n", " g_vec[E_half] = np.uint(count)\n", "\n", " return g_vec\n", "\n", "\n", "n = 10\n", "primes, is_prime_vec = generate_primes(n)\n", "g_vec = compute_g_vector(is_prime_vec, primes)\n", "g_vec" ] }, { "cell_type": "markdown", "id": "265e5aa1", "metadata": {}, "source": [ "The $i$-th value of `g_vec` correponds to $g(2 \\, i)$ with $i \\geq 0 $:\n", "\n", "\n", "| i | E | g_vec |\n", "|--:|----:|---------:|\n", "| 0 | 0 | 0 |\n", "| 1 | 2 | 0 |\n", "| 2 | 4 | 1 |\n", "| 3 | 6 | 1 |\n", "| 4 | 8 | 1 |\n", "| 5 | 10 | 2 |\n", "\n", "We can check $g$ at least for some for some small values of $E$ :" ] }, { "cell_type": "code", "execution_count": null, "id": "48afe739", "metadata": {}, "outputs": [], "source": [ "n = 56\n", "primes, is_prime_vec = generate_primes(n)\n", "g_vec = compute_g_vector(is_prime_vec, primes)\n", "g_vec_ref = [0, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 4, 4, 2, 3, 4, \n", " 3, 4, 5, 4, 3, 5, 3]\n", "np.testing.assert_array_equal(g_vec, g_vec_ref)" ] }, { "cell_type": "markdown", "id": "4bee248b", "metadata": {}, "source": [ "Finally we wrap everything into a function that also loads the values of `g_vec` into a Pandas dataframe for convenience:" ] }, { "cell_type": "code", "execution_count": null, "id": "0da4244a", "metadata": {}, "outputs": [], "source": [ "def compute_g_df(n):\n", " primes, is_prime_vec = generate_primes(n)\n", " g_vec = compute_g_vector(is_prime_vec, primes)\n", " g_df = pd.DataFrame(data={\"E\": 2 * np.arange(len(g_vec)), \"g\": g_vec})\n", " g_df = g_df[g_df.E > 2] # The function g(E) is defined for all even integers E>2\n", " return g_df\n", "\n", "\n", "compute_g_df(11)" ] }, { "cell_type": "markdown", "id": "5f35edf1", "metadata": {}, "source": [ "## First plot of the comet\n", "\n", "We start by computing the Goldbach function up to $E=10000$ : " ] }, { "cell_type": "code", "execution_count": null, "id": "9dc8df10", "metadata": {}, "outputs": [], "source": [ "%%time\n", "n = 20_000\n", "g_df_small = compute_g_df(n)" ] }, { "cell_type": "code", "execution_count": null, "id": "bf8b1703", "metadata": {}, "outputs": [], "source": [ "len(g_df_small)" ] }, { "cell_type": "markdown", "id": "041f8acf", "metadata": {}, "source": [ "And plot it with Matplotlib:" ] }, { "cell_type": "code", "execution_count": null, "id": "56f8300c", "metadata": {}, "outputs": [], "source": [ "ax = g_df_small.plot(x=\"E\", y=\"g\", style=\".\", ms=5, alpha=0.5, legend=False, figsize=FS)\n", "_ = ax.set(\n", " title=\"Goldbach's comet\",\n", " xlabel=\"E\",\n", " ylabel=\"g(E)\",\n", ")\n", "ax.autoscale(enable=True, axis=\"x\", tight=True)\n", "ax.grid(True)" ] }, { "cell_type": "markdown", "id": "3de87029", "metadata": {}, "source": [ "## Plot the comet with Datashader\n", "\n", "Now let's compute the Goldbach function with a larger value of $n$ (this takes about 25 - 30s on my laptop, Intel(R) i7-7700HQ CPU @ 2.80GHz with 8 cores): " ] }, { "cell_type": "code", "execution_count": null, "id": "9e0f46c2", "metadata": {}, "outputs": [], "source": [ "%%time\n", "n = 2_000_000\n", "g_df = compute_g_df(n)" ] }, { "cell_type": "code", "execution_count": null, "id": "20d33acc", "metadata": {}, "outputs": [], "source": [ "len(g_df)" ] }, { "cell_type": "markdown", "id": "a627297c-8977-4fad-8951-b047a0d156ea", "metadata": {}, "source": [ "The plot is made using Datashader's interface for Matplotlib." ] }, { "cell_type": "code", "execution_count": null, "id": "c102e259-8742-488f-b588-c35460bc484f", "metadata": {}, "outputs": [], "source": [ "cmap = palette[\"dimgray\"][::-1]\n", "bg_col = \"white\"" ] }, { "cell_type": "code", "execution_count": null, "id": "f976aee9-f659-4369-9a01-eec698c3cbeb", "metadata": {}, "outputs": [], "source": [ "%%time\n", "fig, ax = plt.subplots(figsize=FS)\n", "_ = dsshow(\n", " g_df,\n", " ds.Point(\"E\", \"g\"),\n", " norm=\"eq_hist\",\n", " cmap=cmap,\n", " aspect=\"auto\",\n", " ax=ax,\n", ")\n", "ax.grid(False)\n", "ax.set_facecolor(bg_col)\n", "_ = ax.set(title=\"Goldbach's comet\", xlabel=\"E\", ylabel=\"g(E)\")" ] }, { "cell_type": "markdown", "id": "3976f24a", "metadata": {}, "source": [ "We can clearly observe some dense lines in this \"comet tail\". In order to visualize this vertical distribution of prime pairs count, we are are going to normalize $g$. As explained on [wikipedia](https://en.wikipedia.org/wiki/Goldbach%27s_comet):\n", "\n", "> An illuminating way of presenting the comet data is as a histogram. The function $g(E)$ can be normalized by dividing by the locally averaged value of $g$, $g_{av}$, taken over perhaps 1000 neighboring values of the even number $E$. The histogram can then be accumulated over a range of up to about 10% either side of a central $E$. " ] }, { "cell_type": "code", "execution_count": null, "id": "fdf920b7", "metadata": {}, "outputs": [], "source": [ "%%time\n", "g_df[\"g_av\"] = g_df[\"g\"].rolling(window=1000, center=True).mean()\n", "g_df[\"g_norm\"] = g_df[\"g\"] / g_df[\"g_av\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "4e3d84f0", "metadata": {}, "outputs": [], "source": [ "%%time\n", "fig, ax = plt.subplots(figsize=FS)\n", "_ = dsshow(\n", " g_df,\n", " ds.Point(\"E\", \"g_norm\"),\n", " norm=\"eq_hist\",\n", " cmap=cmap,\n", " aspect=\"auto\",\n", " ax=ax,\n", ")\n", "ax.grid(False)\n", "ax.set_facecolor(bg_col)\n", "_ = ax.set(title=\"Normalized Goldbach function\", xlabel=\"E\", ylabel=\"g_norm(E)\")" ] }, { "cell_type": "markdown", "id": "ba06b7cd", "metadata": {}, "source": [ "We can also plot the histogram of the comet data, which will lead to some kind of cross section of the above plot:" ] }, { "cell_type": "code", "execution_count": null, "id": "85881bdb", "metadata": {}, "outputs": [], "source": [ "%%time\n", "ax = g_df[\"g_norm\"].hist(bins=1000, alpha=0.5, figsize=FS)\n", "ax.grid(True)\n", "_ = ax.set(\n", " title=\"Histogram of the normalized Goldbach function\",\n", " xlabel=\"Normalized number of prime pairs\",\n", " ylabel=\"Number of occurrences\",\n", ")\n", "_ = plt.xticks(np.arange(0.5, 3.0, 0.1))" ] }, { "cell_type": "markdown", "id": "3e905696", "metadata": {}, "source": [ "## The Hardy-Littlewood estimate\n", "\n", "As described on the [Wikipedia page](https://en.wikipedia.org/wiki/Goldbach%27s_comet) for Goldbach's comet, the number of Goldbach partitions can be estimated using the following formulae from Hardy and Littlewood (1922) :\n", "\n", "$$\\frac{g(E)}{g_{av}} \\approx c \\prod_{F(E/2)} \\frac{p-1}{p-2}$$\n", "\n", "where the product is taken over $F(E/2)$ : all primes p that are factors of $E/2$. The constant $c$ is the twin primes constant :\n", "\n", "$$c = \\prod_{p \\geq 3} \\left( 1 - \\frac{1}{(1-p)^2} \\right)$$\n", "\n", "where the product is taken over all primes larger or equal to 3." ] }, { "cell_type": "code", "execution_count": null, "id": "259d2409", "metadata": {}, "outputs": [], "source": [ "%%time\n", "primes = np.array([i for i in sieve.primerange(n)])\n", "c = np.prod(1.0 - 1.0 / np.power(1.0 - primes[1:], 2))\n", "c" ] }, { "cell_type": "markdown", "id": "25483e73", "metadata": {}, "source": [ "So let's compute this estimate of the normalized Goldbach function with Numba `njit` :" ] }, { "cell_type": "code", "execution_count": null, "id": "2c0824c9", "metadata": {}, "outputs": [], "source": [ "@njit(parallel=True)\n", "def compute_g_hl_vector(primes: np.ndarray, n_max: int, c: float) -> np.ndarray:\n", "\n", " n_max_half = int(0.5 * n_max) + 1\n", "\n", " g_hl_vec = np.empty(n_max_half, dtype=np.float64)\n", "\n", " for E_half in prange(n_max_half):\n", " i_max = np.searchsorted(primes, E_half, side=\"right\")\n", " prod = 1.0\n", " for i in range(1, i_max):\n", " p = primes[i]\n", " if E_half % p == 0: # if p is a factor of E/2\n", " prod *= (p - 1.0) / (p - 2.0)\n", "\n", " g_hl_vec[E_half] = np.float64(prod)\n", " g_hl_vec *= c\n", "\n", " return g_hl_vec" ] }, { "cell_type": "code", "execution_count": null, "id": "fe617020", "metadata": {}, "outputs": [], "source": [ "%%time\n", "g_hl_vec = compute_g_hl_vector(primes, n, c)" ] }, { "cell_type": "code", "execution_count": null, "id": "c6934635", "metadata": {}, "outputs": [], "source": [ "g_hl_df = pd.DataFrame(data={\"E\": 2 * np.arange(len(g_hl_vec)), \"g_norm\": g_hl_vec})" ] }, { "cell_type": "code", "execution_count": null, "id": "d8bd9396", "metadata": {}, "outputs": [], "source": [ "%%time\n", "fig, ax = plt.subplots(figsize=FS)\n", "_ = dsshow(\n", " g_hl_df,\n", " ds.Point(\"E\", \"g_norm\"),\n", " norm=\"eq_hist\",\n", " cmap=cmap,\n", " aspect=\"auto\",\n", " ax=ax,\n", ")\n", "ax.grid(False)\n", "ax.set_facecolor(bg_col)\n", "_ = ax.set(title=\"Hardy-Littlewood estimate of the normalized Goldbach function\", xlabel=\"E\", ylabel=\"g_norm(E) estimate\")" ] }, { "cell_type": "markdown", "id": "d8743e45", "metadata": {}, "source": [ "The vertical distribution of the dense lines seems to be similar to the one from the normalized Goldbach function. This can be checked by computing and plotting both kernel density estimates." ] }, { "cell_type": "code", "execution_count": null, "id": "8cd7124a", "metadata": {}, "outputs": [], "source": [ "%%time\n", "ax = g_df[\"g_norm\"].plot.density(alpha=0.5, figsize=FS, label='Golbach function')\n", "ax = g_hl_df.g_norm.plot.density(alpha=0.5, ax=ax, label='Hardy-Littlewood estimate')\n", "ax.grid(True)\n", "_ = ax.set(\n", " title=\"KDEs of the normalized Goldbach function and its Hardy-Littlewood estimate\",\n", " xlabel=\"Normalized number of prime pairs\",\n", " ylabel=\"Density\",\n", ")\n", "_ = ax.legend()\n", "_ = plt.xticks(np.arange(0.5, 3.0, 0.1))\n", "_ = ax.set_xlim(0.5, 2.5)" ] }, { "cell_type": "markdown", "id": "72b93386", "metadata": {}, "source": [ "## Prime E/2 values only\n", "\n", "Finally, we are going to isolate a part of the most dense line from the comet tail (for a normalized number of prime pairs around 0.66). As explained on [Wikipedia](https://en.wikipedia.org/wiki/Goldbach%27s_comet):\n", " \n", "> Of particular interest is the peak formed by selecting only values of $E/2$ that are prime. [...] The peak is very close to a Gaussian form. " ] }, { "cell_type": "code", "execution_count": null, "id": "23ababa4", "metadata": {}, "outputs": [], "source": [ "g_df[\"E_half\"] = (0.5 * g_df[\"E\"]).astype(int)\n", "g_df_primes = g_df[g_df[\"E_half\"].isin(primes)]" ] }, { "cell_type": "code", "execution_count": null, "id": "b55d93b6", "metadata": {}, "outputs": [], "source": [ "%%time\n", "fig, ax = plt.subplots(figsize=FS)\n", "_ = dsshow(\n", " g_df_primes,\n", " ds.Point(\"E\", \"g_norm\"),\n", " norm=\"eq_hist\",\n", " cmap=cmap,\n", " aspect=\"auto\",\n", " ax=ax,\n", ")\n", "ax.grid(False)\n", "ax.set_facecolor(bg_col)\n", "_ = ax.set(title=\"Normalized Goldbach function for prime E/2 values only\", xlabel=\"E\", ylabel=\"g_norm(E)\")" ] }, { "cell_type": "code", "execution_count": null, "id": "ea99b181", "metadata": {}, "outputs": [], "source": [ "ax = g_df_primes[\"g_norm\"].hist(bins=500, alpha=0.5, figsize=FS)\n", "ax.grid(True)\n", "_ = ax.set(\n", " title=\"Histogram of the normalized Goldbach function for prime E/2 values only\",\n", " xlabel=\"Normalized number of prime pairs\",\n", " ylabel=\"Number of occurrences\",\n", ")" ] }, { "cell_type": "markdown", "id": "8cafaf27-c352-4b0f-96eb-905488f29d1f", "metadata": {}, "source": [ "## Conclusion\n", "\n", "As is often the case with number theory, it appears that the computation of Goldbach function is quite expensive. But we saw that speeding up these CPU bound processes using Numba is straightforward, as well as it is to quickly plot a dataset with a large number of points with Datashader. I am not familiar with this branch of mathematics, and with this kind of new subject, I am amazed how easy it is to launch a notebook, play with these formulas and visualize the data in Python." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }