{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# List vs numpy array\n", "\n", "##### Germain Salvato Vallverdu\n", "---" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import math as m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Utilisation d'une fonction" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def np_func(x):\n", " return (3 * x ** 2 + 2 * x - 1) * np.exp(- x / 2.3) * np.sin(2 * x)\n", "\n", "def m_func(x):\n", " return (3 * x ** 2 + 2 * x - 1) * m.exp(- x / 2.3) * m.sin(2 * x)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(0, 10, 1000)\n", "xl = x.tolist()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "37.7 µs ± 815 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit y = np_func(x)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "642 µs ± 6.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%%timeit\n", "y = list()\n", "for xi in xl:\n", " yi = m_func(xi)\n", " y.append(yi)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17.02917771883289" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "642 / 37.7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Produit scalaire\n", "\n", "Ou produit de matrices ou matrices-vecteurs ou opération sur des vecteurs (sommes, produit) en général." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "x = np.random.random(1000)\n", "y = np.random.random(1000)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.38 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "%timeit np.dot(x, y)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "xl = x.tolist()\n", "yl = y.tolist()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "68.3 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%%timeit\n", "scal = 0\n", "for xi, yi in zip(xl, yl):\n", " scal += xi * yi" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "49.492753623188406" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "68.3 / 1.38" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Centre de gravité" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "coords = np.random.uniform(0, 30, size=(100, 3))\n", "weight = np.random.random(100)\n", "lcoords = coords.tolist()\n", "lweight = weight.tolist()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10.4 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" ] } ], "source": [ "%%timeit\n", "w_coords = coords * weight[:, np.newaxis]\n", "G = w_coords.sum(axis=0) / len(w_coords)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[7.81686019 8.63915013 7.2122051 ]\n" ] } ], "source": [ "w_coords = coords * weight[:, np.newaxis]\n", "G = w_coords.sum(axis=0) / len(w_coords)\n", "print(G)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "106 µs ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%%timeit\n", "G = [0, 0, 0]\n", "for w_i, coords_i in zip(lweight, lcoords):\n", " w_coords_i = [w_i * xi for xi in coords_i]\n", " for i in range(3):\n", " G[i] += w_coords_i[i]\n", "\n", "nat = len(lweight)\n", "for i in range(3):\n", " G[i] /= nat" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.288461538461538" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "107 / 10.4" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[7.816860188194455, 8.639150131333633, 7.2122051018216045]\n" ] } ], "source": [ "G = [0, 0, 0]\n", "for w_i, coords_i in zip(lweight, lcoords):\n", " w_coords_i = [w_i * xi for xi in coords_i]\n", " for i in range(3):\n", " G[i] += w_coords_i[i]\n", "\n", "nat = len(lweight)\n", "for i in range(3):\n", " G[i] /= nat\n", "\n", "print(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matrice de distances\n", "\n", "Un peu plus sioux ... calculer toutes les distances entre atomes. En ajoutant un axe dans le tableau numpy il fait le calcul de toutes les opérations." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "coords = np.random.uniform(0, 30, (5, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Je découpe un peu l'ajout des axes. L'idée c'est d'avoir quelque chose de 5 x 5 à la fin (la matrice)." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 1, 3)\n", "[[[23.40646835 18.6866633 2.21638428]]\n", "\n", " [[ 4.66508894 1.69156122 15.23312689]]\n", "\n", " [[14.324572 9.34691585 20.53275304]]\n", "\n", " [[18.0876806 19.05955999 28.26381997]]\n", "\n", " [[10.03703844 21.41928076 6.58185643]]]\n" ] } ], "source": [ "print(coords[:, np.newaxis, :].shape)\n", "print(coords[:, np.newaxis, :])" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 5, 3)\n", "[[[23.40646835 18.6866633 2.21638428]\n", " [ 4.66508894 1.69156122 15.23312689]\n", " [14.324572 9.34691585 20.53275304]\n", " [18.0876806 19.05955999 28.26381997]\n", " [10.03703844 21.41928076 6.58185643]]]\n" ] } ], "source": [ "print(coords[np.newaxis, :, :].shape)\n", "print(coords[np.newaxis, :, :])" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 5, 3)\n", "[[[ 0. 0. 0. ]\n", " [ 18.74137941 16.99510209 -13.01674261]\n", " [ 9.08189635 9.33974745 -18.31636877]\n", " [ 5.31878775 -0.37289669 -26.04743569]\n", " [ 13.36942991 -2.73261746 -4.36547215]]\n", "\n", " [[-18.74137941 -16.99510209 13.01674261]\n", " [ 0. 0. 0. ]\n", " [ -9.65948306 -7.65535464 -5.29962616]\n", " [-13.42259166 -17.36799878 -13.03069308]\n", " [ -5.3719495 -19.72771955 8.65127046]]\n", "\n", " [[ -9.08189635 -9.33974745 18.31636877]\n", " [ 9.65948306 7.65535464 5.29962616]\n", " [ 0. 0. 0. ]\n", " [ -3.76310861 -9.71264414 -7.73106693]\n", " [ 4.28753355 -12.07236491 13.95089662]]\n", "\n", " [[ -5.31878775 0.37289669 26.04743569]\n", " [ 13.42259166 17.36799878 13.03069308]\n", " [ 3.76310861 9.71264414 7.73106693]\n", " [ 0. 0. 0. ]\n", " [ 8.05064216 -2.35972077 21.68196354]]\n", "\n", " [[-13.36942991 2.73261746 4.36547215]\n", " [ 5.3719495 19.72771955 -8.65127046]\n", " [ -4.28753355 12.07236491 -13.95089662]\n", " [ -8.05064216 2.35972077 -21.68196354]\n", " [ 0. 0. 0. ]]]\n" ] } ], "source": [ "rij = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]\n", "print(rij.shape)\n", "print(rij)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Du coup le calcul des distances avec numpy ça donne :" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 28.45186084 22.47667877 26.58754335 14.3271142 ]\n", " [28.45186084 0. 13.4162627 25.526698 22.20101891]\n", " [22.47667877 13.4162627 0. 12.97173228 18.94076173]\n", " [26.58754335 25.526698 12.97173228 0. 23.24841208]\n", " [14.3271142 22.20101891 18.94076173 23.24841208 0. ]]\n" ] } ], "source": [ "rij2 = (coords[:, np.newaxis, :] - coords[np.newaxis, :, :]) ** 2\n", "d = np.sum(rij2, axis=2) ** 0.5\n", "print(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Avec plus d'atomes" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "coords = np.random.uniform(0, 30, (100, 3))\n", "lcoords = coords.tolist()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "378 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%%timeit\n", "rij2 = (coords[:, np.newaxis, :] - coords[np.newaxis, :, :]) ** 2\n", "d = np.sum(rij2, axis=2) ** 0.5" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7.94 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "nat = len(lcoords)\n", "distances = [[0. for iat in range(nat)] for jat in range(nat)]\n", "for iat in range(nat):\n", " for jat in range(iat + 1, nat):\n", " ix = lcoords[iat]\n", " jx = lcoords[jat]\n", " rij2 = [(ix[i] - jx[i]) **2 for i in range(3)]\n", " d2 = sum(rij2)\n", " distances[iat][jat] = m.sqrt(d2)\n", " distances[jat][iat] = distances[iat][jat]" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21.005291005291006" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7940 / 378" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tu noteras qu'en plus, dans ce cas, l'implémentation numpy fait le double de calculs que l'implémentation standard... Dans le calcul numpy tu calcules la distance i-j et j-i alors que tu évites de faire ça dans l'implementation standard." ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14.6 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "# en calculant toutes les distances\n", "nat = len(lcoords)\n", "distances = [[0. for iat in range(nat)] for jat in range(nat)]\n", "for iat in range(nat):\n", " for jat in range(nat):\n", " ix = lcoords[iat]\n", " jx = lcoords[jat]\n", " rij2 = [(ix[i] - jx[i]) **2 for i in range(3)]\n", " d2 = sum(rij2)\n", " distances[iat][jat] = m.sqrt(d2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "C'est 2 fois plus long ... normal tu as le double de calcul. Du coup numpy est environ 40 fois plus rapide." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "38.62433862433863" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "14.6e3 / 378" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }