{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction to Python\n", "## (Master 2 Mathématiques, modélisation et apprentissage)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- First released in 1991 (Guido van Rossum, Netherlands). \n", "- This course uses Python 3. Python 2 will no longer be maintained as of January 2020.\n", "- Interpreted language with a wide range of uses (scientific computing, web, graphical interfaces, ...) \n", "- **Open source**, growing very fast in recent years, and one of the languages most widely used by developers today.\n", "- Very active user community (Stack Overflow) \n", "- Some software written in Python: BitTorrent, Dropbox..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Interpreted vs compiled languages\n", "- Examples of interpreted languages: Python, Matlab, Scilab, Octave, R\n", "- Examples of compiled languages: C, C++, Java\n", "- Execution speed: interpreted < compiled\n", "\n", " *Why, then, consider Python for scientific computing?*\n", " \n", " \n", "- Implementation time vs execution time: a readable, uncluttered language ⇒ fast development and maintenance\n", "- Execution in Python: fast when the critical parts run in a compiled language: many library functions are compiled, and the code can interface with C/C++/Fortran" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Workflow\n", "\n", "1. Open a development environment (e.g. Spyder, Jupyter Notebook). Not required but recommended.\n", "2. 
Choose one of:\n", " - Type commands at the prompt\n", " - Write a script → run the script\n", " - Write a function → load the function → call the function \n", "\n", "Tutorial/help in the IPython/Jupyter console: **nom_fonction?** (where nom_fonction is the name of the function)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Types and operations in Python" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Let us now look at the main types and operations in Python. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Booleans" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Two possible values: False, True" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "type(False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x = (1<2) # True, of type bool\n", "x=3 # x is rebound to an int (dynamic typing)\n", "type(x) # int" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Comparison operators: ==, !=, >, >=, <, <=" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "2 <= 8 < 15 # comparisons can be chained" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Logical operators: **not, or, and**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "(3 == 3) or (9 > 24) " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "(9 > 24) and (3 == 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 
int, float, complex" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "type(2**100) # int (arbitrary precision)\n", "type(3.6) # float\n", "type(3+2j) # complex" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "2*3 # product\n", "#2**3 # power\n", "#20/3 # float division\n", "#20//3 # integer (floor) division\n", "#20%3 # modulo\n", "#(9+5j).real\n", "#(9+5j).imag\n", "#abs(3+4j) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Strings " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "type('abc') \n", "c1 = 'L’eau vive'\n", "c2 = ' est \"froide\" !'\n", "c1+c2 # concatenation\n", "#c1*2 # repetition \n", "#c1[2] \n", "#c1[-2]\n", "#c1[2:5]\n", "#len(\"abc\") # length \n", "#\"abcde\".split(\"c\") # split at a separator \n", "#\"a-ha\".replace('a','o') # 'o-ho' \n", "#'-'.join(['ci', 'joint']) # 'ci-joint'\n", "#'abracadabra'.count('bra') # 2\n", "#'PETIT'.lower() # 'petit'\n", "#'grand'.upper() # 'GRAND' " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Lists\n", "\n", "list: a heterogeneous, ordered, mutable collection of comma-separated elements enclosed in square brackets."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "my_list=[4,7,3.7,'E',5,7] \n", "#type(my_list) # list\n", "#len(my_list) # length\n", "#my_list[0] # first element\n", "#my_list[1:3] # [7, 3.7]\n", "#[0,1] + [2,4] # [0,1,2,4] (concatenation)\n", "#l = [2,4,5,9,1,6,4]\n", "#k = [x for x in l if x<6] # (filter by condition)\n", "#range(6) # careful: a list in Python 2, a lazy range object in Python 3\n", "#range(2, 9, 2) \n", "#[x for x in range(3,6)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### List operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "nombres = [17, 38, 10, 25, 72]\n", "nombres.sort()\n", "#nombres.append(12)\n", "#nombres.reverse()\n", "#nombres.remove(38)\n", "#print(nombres.index(17))\n", "#nombres[0] = 11\n", "#nombres[1:3] = [14, 17, 2]\n", "#nombres.count(17) # 2 \n", "nombres" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**WARNING**: by default, Python does not copy lists on assignment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x = [4, 2, 10, 9, \"toto\"]\n", "print(x)\n", "y = x # y is only an alias of x, not a copy\n", "y[2] = \"tintin\" # also changes x[2] to \"tintin\"\n", "print(x) \n", "#x = [4, 2, 10, 9, \"toto\"]\n", "#y = x[:] # request a (shallow) copy\n", "#y[2] = \"tintin\" # modifies y but not x\n", "#print(x) # [4, 2, 10, 9, \"toto\"]\n", "#print(y) # [4, 2, \"tintin\", 9, \"toto\"]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Tuples\n", "\n", "tuple: a heterogeneous, ordered, immutable collection."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "t=(5,7)\n", "type(t) # tuple\n", "t[0] # 5\n", "#t[0]=2 # TypeError: tuples do not support item assignment" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Loops and conditionals" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### if – [elif] – [else]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x = 11\n", "if x < 0:\n", "    print(\"x is negative\")\n", "elif x % 2:\n", "    print(\"x is positive and odd\")\n", "else:\n", "    print(\"x is non-negative and even\")\n", "    print(\"Indeed!\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### while" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "N = 0\n", "x = 687687.567476\n", "while (x > 0):\n", "    x//=2 # floor division: x eventually reaches 0\n", "    N+=1\n", "print(\"Approx. 
of log_2(x): \" + str(N-1))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### for" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "for lettre in \"ciao\":\n", "    print(lettre)\n", " \n", "for x in [\"\\n\",2,'a', 3.14,\"\\n\"]: \n", "    print(x) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise \n", "*Write a function f that takes $n$ as input and computes*\n", "$$\\sum_{k=1}^{n-1} k^2$$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def f(n):\n", "    return sum(k**2 for k in range(1, n)) # return the value instead of printing it" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f(3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise\n", "*Write a function fib that computes the n-th term of the Fibonacci sequence initialised with a and b, defined by the recurrence* \n", "$$ x_{n+2} = x_{n+1} + x_n.$$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fib(n,a=0,b=1): #0,1: val. 
by default \n", "    '''n-th term of Fibonacci sequence starting at a,b.''' # docstring, shown by help(fib)\n", "    for i in range(n):\n", "        z = a+b\n", "        a=b\n", "        b=z\n", "    return a " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "fib(7,2,2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Commonly used libraries\n", "\n", "- **NumPy**: manipulation of numerical arrays, basic mathematical functions, simulation of random variables...\n", "- **SciPy**: more advanced mathematical functions (solving equations and differential equations, computing integrals...)\n", "- **Matplotlib**: data visualisation as plots\n", "- scikit-learn: machine learning\n", "- SymPy: symbolic computation\n", "\n", "Numerical computation = manipulating floating-point (decimal) numbers $\\neq$ symbolic computation = manipulating symbolic expressions\n", "\n", "Example: roots of $x^2 - x - 1 = 0$\n", " - symbolic computation: $\\frac{1+ \\sqrt{5}}{2}$ , $\\frac{1- \\sqrt{5}}{2}$\n", " - numerical computation: 1.618034, -0.618034" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Importing libraries or functions\n", "- import ma_bibliotheque\n", "- ma_bibliotheque.la_fonction(...)\n", "\n", "- import ma_bibliotheque as bibli # alias \n", "- bibli.la_fonction(...) \n", "\n", "Less explicit (the library a function comes from is not visible at the call site):\n", "- from ma_bibliotheque import la_fonction\n", "- la_fonction(...)\n", " \n", "- from ma_bibliotheque import * \n", "- la_fonction(...) \n", "- Beware of name clashes when importing several libraries that define functions with the same name..."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The `NumPy` library" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import numpy as np # always start with this command when using the numpy library" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "b = np.array([[8,3,2,4],[5,1,6,0],[9,7,4,1]])\n", "print(b)\n", "#type(b) # numpy.ndarray \n", "#b.dtype # datatype: int\n", "b.shape # (3,4)\n", "#c = np.array([[8,2],[5,6],[9,7]], dtype=complex)\n", "#c.dtype # datatype: complex\n", "#c[0,0] # 8+0j\n", "\n", "# More than 2 dimensions:\n", "#d = np.array([[[8,3],[1,2]],[[5,1],[4,5]],[[9,7],[4,5]]])\n", "#d.shape # (3, 2, 2)\n", "\n", "## Reshaping:\n", "#x = d.reshape(4,3) # array of shape (4,3)\n", "#d.reshape(12,1) # array of shape (12,1)\n", "#d.reshape(12,) # one-dimensional array of length 12\n", "#np.insert(np.arange(4,9),3,17) # 4,5,6,17,7,8" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note: a numpy.matrix class also exists, but using numpy.array is recommended."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Operations on NumPy numerical arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "X = np.arange(start=5,step=3,stop=16) # 5, 8,11,14\n", "#A = np.ones((2,3)) # matrix filled with ones\n", "#B = X.reshape(2,2)\n", "#C = np.zeros((3,2)) # matrix filled with zeros\n", "#D = np.eye(2) # identity matrix\n", "#np.diag([1,2]) # diagonal matrix\n", "#E = C+np.ones(C.shape) # addition: same as C+1\n", "#F = B*D # entry-wise multiplication\n", "#J=np.dot(B,D) # linear algebra product\n", "#G = F.T # transpose matrix\n", "#H = np.exp(G) # like most functions, exp is entry-wise (otherwise use np.vectorize(my_function))\n", "#x = np.array([4, 2, 1, 5, 1, 10])\n", "#y = (x>=3) & (x<=9) & (x!=1) # [T,F,F,T,F,F] (np.logical_and only combines two arrays)\n", "#x[y] \n", "#print(np.mean(np.random.randn(1000)>1.96)) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Linear algebra with NumPy" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "A = np.array([[2, 1, 1], [4, 3, 0]])\n", "B = np.array([[1, 2], [12, 0]])\n", "C = np.array([[1, 2], [12, 0], [-1, 2]])\n", "D = np.array([[1, 2, -4], [2, 0, 0], [1, 2, 3]])\n", "E = np.concatenate((A,B), axis=1)\n", "F = np.concatenate((C,D), axis=1)\n", "G = np.concatenate((E,F),axis=0)\n", "H = np.random.randn(5,5)\n", "I = H*G # entry-wise product \n", "B5e = B**5 # entry-wise power\n", "B5 =np.linalg.matrix_power(B, 5) # matrix power\n", "Bm1 = np.linalg.inv(B)\n", "dB = np.linalg.det(B)\n", "x = np.linalg.solve(B,[3,12]) # solves B @ x = [3, 12]\n", "print(E)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Spectral analysis with NumPy" ] }, { "cell_type": "code", "execution_count": 
null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "A = np.array([[1, 2], [12, 3]])\n", "x = np.linalg.eigvals(A) # eigenvalues\n", "# eigenvalues and eigenvectors:\n", "valp, vectp = np.linalg.eig(A) \n", "\n", "# Methods for Hermitian matrices:\n", "S = np.array([[1, 2], [2, 3]])\n", "y = np.linalg.eigvalsh(S) # eigenvalues\n", "# eigenvalues and eigenvectors:\n", "valp, vectp = np.linalg.eigh(S)\n", " \n", "# Singular Value Decomposition (the third output is the conjugate transpose of V)\n", "U,s,Vh=np.linalg.svd(A) \n", "Ap = U @ np.diag(s) @ Vh # rebuild A from its SVD, with plain arrays\n", "print(A-Ap) # should be close to zero" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Copying NumPy arrays\n", "**Careful once again!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x = np.array([[8,3,2],[5,1,0],[9,7,1]])\n", "y = x # alias, not a copy\n", "x[0,0]+=1\n", "x[0,0]-y[0,0] # 0\n", "z=x.copy() # independent copy\n", "x[0,0]+=1\n", "x[0,0]-z[0,0] # 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Generating discrete random variables with NumPy\n", " - import numpy.random as npr\n", " - my_sample = npr.ma_loi(paramètres, taille_du_tableau) (ma_loi: the chosen distribution; paramètres: its parameters; taille_du_tableau: shape of the output array)\n", "\n", "- npr.randint(low=a,high=b,size=n): uniform random integers in [a, b) \n", "- npr.choice([a1,...,an],p=[p1,...,pn],size=n): independent draws, 
from [a1,...,an] with distribution [p1,...,pn] \n", "- npr.permutation(mon_urne): random permutation of mon_urne \n", "- npr.binomial(N,p,size=n)\n", "- npr.geometric(p,size=n) \n", "- npr.multinomial(n,tableau_des_probas,size=n) \n", "- npr.poisson(alpha,size=n)\n", "\n", "Many more examples at http://docs.scipy.org/doc/numpy/reference/routines.random.html" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Generating continuous random variables with NumPy\n", " - import numpy.random as npr\n", " - my_sample = npr.ma_loi(paramètres, taille_du_tableau)\n", "\n", "- npr.rand(d1,d2,...): d1 x d2 x... array of independent uniform r.v. on [0, 1)\n", "- npr.uniform(low=a,high=b,size=n): independent uniform r.v. on [a, b) \n", "(size=n can be replaced by size=(d1,d2,...), here and everywhere below)\n", "- npr.randn(d1,d2,...): d1 x d2 x... array of independent $\\mathcal{N}(0,1)$ r.v.\n", "- npr.multivariate_normal(mean=V,cov=C,size=n): independent random vectors with distribution $\\mathcal{N} (V, C)$ stored in an array of shape $n\\times N$, where $N$ is the length of $V$ and $N\\times N$ the shape of $C$\n", "- npr.exponential(scale=s,size=n): independent exponential r.v. with mean s\n", "\n", "Many more examples at http://docs.scipy.org/doc/numpy/reference/routines.random.html" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### NumPy functions useful in probability/statistics\n", "- np.mean(x), np.std(x), np.percentile(x, q): mean, standard deviation and q-th percentile of a vector x (sample)\n", "- np.sum(x): sum of the values of x\n", "- np.cumsum(x): vector [x1,x1 +x2,...,x1 +···+xn] of cumulative sums of the coordinates x1, . . . , xn of x \n", "- np.cov(x): n × n covariance matrix of the rows of the n × p array x\n", "- scipy.stats: library providing densities, cumulative distribution functions, quantiles, etc. for classical distributions. 
See http://docs.scipy.org/doc/scipy/reference/stats.html\n", "- matplotlib.pyplot: plotting library " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Loops vs vectorised programming\n", "**-> Avoid loops in Python whenever possible!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from time import time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "n = int(1e7)\n", "# Method 1. Python loop (list comprehension)\n", "t1 = time()\n", "gamma1=sum([1./i for i in range(1,n+1)]) - np.log(n)\n", "t2 = time()\n", "temps1 = t2 - t1\n", "# Method 2. NumPy\n", "t1 = time()\n", "gamma2=np.sum(1. / np.arange(1,n+1)) - np.log(n)\n", "t2 = time()\n", "temps2 = t2 - t1\n", "print(\"Speed-up factor: \", temps1/temps2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from timeit import timeit\n", "N = 100\n", "setup = \"\"\"\n", "import numpy as np\n", "n = int(1e5)\n", "\"\"\"\n", "code_boucle = \"\"\"\n", "np.sum([1. / i for i in range(1, n)]) - np.log(n)\n", "\"\"\"\n", "time_boucle=timeit(code_boucle,setup=setup,number=N)\n", "\n", "code_numpy = \"\"\"\n", "np.sum(1. / np.arange(1, n)) - np.log(n)\n", "\"\"\"\n", "time_numpy=timeit(code_numpy,setup=setup,number=N) \n", "\n", "print(\"Factor: {}\".format(time_boucle/time_numpy))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Plotting with `matplotlib`\n", "\n", " - import matplotlib.pyplot as plt \n", " \n", "\n", "- plt.plot(x,y) draws the piecewise-linear curve through the points with abscissas x and ordinates y (many options), for vectors x, y of the same length,\n", "- plt.hist draws a histogram. 
Two options for the bins: bins= number of bins, or bins= abscissas of the bin edges\n", "- plt.bar draws a bar chart \n", "- plt.scatter(x,y) draws the scatter plot with abscissas x and ordinates y\n", "- plt.stem(x,y) draws vertical bars at abscissas x with heights y\n", "- plt.axis([xmin,xmax,ymin,ymax]) sets the ranges covered by the figure\n", "- plt.axis('scaled') forces the x and y scales to be the same\n", "- plt.show() displays the windows created in the script\n", "- plt.figure() creates a new figure window\n", "- plt.title(\"mon titre\") gives a figure a title\n", "- plt.legend(loc='best') displays the legend of a plot (at the best location)\n", "- plt.subplot subdivides the figure window so that several plots can be shown in it" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Example: plotting a sample from a discrete distribution" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import scipy.stats as sps\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "n, p, N = 20, 0.3, int(1e4)\n", "B = np.random.binomial(n, p, N)\n", "f = sps.binom.pmf(np.arange(n+1), n, p)\n", "plt.hist(B,bins=n+1,density=True,range=(-.5,n+.5),color=\"white\",label=\"empirical distribution\")\n", "plt.stem(np.arange(n+1),f,\"r\",label=\"theoretical distribution\")\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Histogram of a sample from a continuous distribution" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise\n", "\n", "1. Create a vector E containing 10000 independent draws from a $\\mathcal{N}(0,1)$ distribution.\n", "2. 
Plot on the same figure the histogram of this vector and the theoretical Gaussian density, obtained with the norm.pdf function of the scipy library. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "E = np.random.randn(int(1e4)) # sample of 10000 draws\n", "x = np.linspace(-4,4,1000)\n", "f_x = sps.norm.pdf(x) # Gaussian density\n", "plt.plot(x,f_x,\"r\",label=\"Theory\")\n", "# Plot the histogram:\n", "plt.hist(E,bins=50,density=True,label=\"Data\")\n", "plt.legend(loc='best')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The law of large numbers\n", "\n", "### Theorem (Law of Large Numbers), Kolmogorov, 1929 \n", "\n", "Let $(X_i)_{i\\ge 1}$ be independent copies of a random variable $X$:\\begin{eqnarray*} \\mathbb{E}[|X|] <\\infty&\\implies& S_n:=\\frac{X_1+\\dots+X_n}{n}\n", " \\underset{n\\to\\infty}{\\longrightarrow}\n", " \\mathbb{E}[X],\\\\\n", " \\mathbb{E}[|X|] =\\infty&\\implies& S_n:=\\frac{X_1+\\dots+X_n}{n}\\;\\textrm{ diverges}.\\end{eqnarray*} " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise\n", "\n", "1. Draw a vector $X$ whose coordinates are $n$ independent draws from the uniform distribution on $[0,1]$. Plot on the same figure $S_n$ as a function of $n$ and the horizontal line at height $\\mathbb{E}[X]$. Comment on the result. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "wn=10000\n", "S=np.cumsum(np.random.rand(n))/np.arange(1,n+1)\n", "plt.plot(range(1,n+1),S,'r',label=\"S_n\")\n", "plt.plot((1,n),(.5,.5),\"b--\",label=\"Esperance\")\n", "plt.ylabel('S_n')\n", "plt.xlabel(\"n\")\n", "plt.legend(loc='best')\n", "plt.title(\"LGN\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "2. On va maintenant se placer dans le cas d'une distribution dont l'espérance n'est pas forcément finie. Soient $s,U$ v.a. indépendantes, $U$ uniforme sur $[0,1]$ et $s=\\pm 1$ avec probas 0.5,0.5. Alors la v.a. $X:=s U^{-1/\\alpha}$, appelée ici **$\\alpha$-variable aléatoire**, a pour densité $ (\\alpha/2)\\mathbf{1}_{|x|\\ge 1} |x|^{-\\alpha-1}$ et est d'espérance finie ssi $\\alpha>1$. Afficher à nouveau $S_n$ en fonction de $n$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "alpha, n = 0.9, 1000\n", "U = np.random.rand(n)\n", "X = (2*np.random.randint(0,2,n)-1)*U**(-1/alpha)\n", "S=np.cumsum(X)/np.arange(1,n+1)\n", "plt.title(\"LGN: convergence ou pas selon alpha\")\n", "plt.plot((0,n),(0,0),\"b--\")\n", "plt.plot(S,\"r\",label=\"Moyenne empirique\")\n", "plt.xlabel(\"n\")\n", "plt.ylabel(\"S_n\")\n", "plt.legend(loc='best')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "3. On se replace dans le cas où les $X_i$ suivent une loi uniforme sur $[0,1]$. On peut se demander maintenant à quelle vitesse $S_n$ converge vers $\\mu = \\mathbb{E}[X]$ ? On cherche donc $\\beta$\n", " tel que ${n^\\beta(S_n-\\mu)\\longrightarrow \\ell\\ne 0}$,\n", "i.e. $${S_n\\approx \\mu +\\frac{\\ell}{n^\\beta}}.$$\n", "Pour $\\beta$ fixé, afficher sur un même graphique plusieurs courbes $(n, n^\\beta(S_n-\\mu))$ à l'aide d'une boucle for. 
Try for example $\\beta =0.2$, $\\beta = 0.5$, $\\beta = 0.7$. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "beta = 0.8 # also try 0.2, 0.5, 0.7\n", "n=1000\n", "for k in range(10):\n", "    S=np.cumsum(np.random.rand(n))/np.arange(1,n+1)\n", "    T = (S - 0.5)*np.arange(1,n+1)**beta\n", "    plt.plot(range(1,n+1),T)\n", "plt.plot((1,n),(0,0),\"b--\") # reference line at 0, drawn once\n", "\n", "plt.ylabel('n^beta(S_n-mu)')\n", "plt.xlabel(\"n\")\n", "plt.title(\"n^beta (error in the LLN)\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Central limit theorem, Laplace, 1812, Lindeberg, 1920\n", "Let $(X_i)_{i\\ge 1}$ be i.i.d. random variables with mean $\\mu$ and standard deviation $\\sigma$: $$T_n = \\frac{n^{1/2}}{\\sigma}\\left(\\frac{X_1+\\dots+X_n}{n}-\\mu\\right)\\underset{n\\to\\infty}{\\stackrel{law}{\\longrightarrow}} \\mathcal{N}(0,1). $$\n", "\n", "In other words, the empirical mean of the $X_i$ $\\approx$ the theoretical mean, with a Gaussian random error of order ${1/\\sqrt{n}}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "1. Draw $m$ random vectors $(X_1,\\dots, X_n)$ with all the $X_i$ i.i.d. uniform on $[-1,1]$ and plot the histogram of the $m$ resulting values of $T_n$. Plot on the same figure the standard Gaussian density $\\mathcal{N}(0,1)$ (since $T_n$ is already normalised by $\\sigma$). 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "n,m,sigma=1000,10000,3**(-1/2)\n", "X=2*np.random.rand(m,n)-1\n", "S=np.sum(X,axis=1)/(np.sqrt(n)*sigma)\n", "M=max(np.abs(S))\n", "x=np.linspace(-M,M,1000)\n", "y=sps.norm.pdf(x)\n", "plt.plot(x,y,'r',label=\"densite\")\n", "plt.hist(S,bins=int(round(m**(1./3)*M*.5)),density=1,histtype='step',label=\"Histogramme\")\n", "plt.legend(loc='best')\n", "plt.title(\"TCL\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Interprétation \n", "Généralement, une grande somme de petits aléas peu corrélés fluctue autour de sa moyenne selon une\n", "distribution gaussienne.\n", "\n", "**Exemples** : anatomie (ex : taille des individus de sexe donné), QI, nombreuses mesures physiques, données économiques, etc..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Lecture et écriture dans un fichier externe\n", "- x=range(5)\n", "- mon_flux=open(\"my_data.txt\",\"w\") #w=write\n", "- mon_flux.write(str(x))\n", "- mon_flux.close()\n", "\n", "Ecriture a la fin d'un fichier existant:\n", "- mon_flux=open(\"my_data.txt\",\"a\") #a=append\n", "- mon_flux.write(\"\\n\"+str(x+1))\n", "- mon_flux.close()\n", "\n", "Lecture:\n", "- mon_flux=open(\"my_data.txt\",\"r\") #r=read\n", "- y=mon_flux.read()\n", "- print(y)\n", " \n", "Voir aussi np.save, np.load, np.savetxt, csv.reader...\n", "- import numpy as np\n", "- x=np.random.rand(5,3)\n", "- np.save(\"my_npy_file.npy\",x) #creation et ecriture\n", "- x2=np.load(\"my_npy_file.npy\") #lecture" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Lecture dans un fichier externe : exemple " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ 
"f=open(\"PopLynxRegionCanada_1821_1934.dat\",\"r\")\n", "ytxt=f.readlines() # list de str\n", "y=[int(row) for row in ytxt] # convertit str en int\n", "plt.plot(range(1821,1935),y,\"r\")\n", "plt.title(\"Population de lynx\")\n", "plt.tight_layout() # pratique pour l’export\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Python pour l’analyse : exemples" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "import scipy\n", "import matplotlib.pyplot as plt\n", "def f(x):\n", " return np.exp(x)+x\n", "#zeros of f, computed starting at -0.2:\n", "a=scipy.optimize.fsolve(f,-.2)\n", "print(a,f(a))\n", "#integral of f from 0 to 1:\n", "b=scipy.integrate.quad(f,0,1)\n", "print(b)\n", "def g(y,t):\n", " return y\n", "T=np.arange(start=0,stop=1,step=.001)\n", "##solution, at T, of y’=g(y,t), y(T[0])=1:\n", "y=scipy.integrate.odeint(g,1,T)\n", "plt.plot(T,np.log(y),\"r\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Bibliographie : les tutoriels/sites officiels\n", "- Python : https://docs.python.org/2/tutorial/\n", "- NumPy : http://docs.scipy.org/doc/numpy/reference/\n", "- ScipyStats : http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html\n", "- Matplotlib : http://matplotlib.org/users/pyplot_tutorial.html\n", "- NumPy user guide (pdf) : https://docs.scipy.org/doc/numpy-1.8.0/numpy-user-1.8.0.pdf\n", "- Matplotlib user guide (pdf) : http://matplotlib.org/Matplotlib.pdf\n", "- scikit-learn : http://scikit-learn.org/stable/\n", "- SymPy : http://www.sympy.org/fr/index.html\n", "- Anaconda (distribution contenant les interfaces dedéveloppement Spyder et Jupyter) : https://www.continuum.io/downloads" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Quelques 
courses \n", "\n", "- A course from l'X (École polytechnique): http://www.cmap.polytechnique.fr/~gaiffas/intro_python.html\n", "- A course from Lycée Saint-Louis: http://mathprepa.fr/python-project-euler-mpsi/\n", "- A course from INRIA: http://www.labri.fr/perso/nrougier/teaching/index.html\n", "- A course from Orsay: http://www.iut-orsay.u-psud.fr/fr/specialites/mesures_physiques/mphy_pedagogie.html" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }