{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction à Python\n",
"## (Master 2 Mathématiques, modélisation et apprentissage)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- Première version date de 1991 (Guido van Rossum, Pays-Bas). \n",
"- On utilisera Python 3 dans ce cours. Python 2 ne sera plus maintenu à partir de janvier 2020.\n",
"- Langage interprété, avec des usages très variés (calcul scientifique, web, interface graphique,...) \n",
"- **Open Source**, en très forte croissance depuis quelques années, et langage le plus utilisé par les développeurs aujourd'hui.\n",
"- Communauté d’utilisateurs très active (StackOverFlow.com) \n",
"- Quelques softwares écrits en Python : BitTorrent, Dropbox..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Langage interprété vs compilé\n",
"- Exemples de langages interprétés : Python, Matlab, Scilab, Octave, R\n",
"- Exemples de langages compilés : C, C++, Java\n",
"- Vitesse d’exécution : interprété < compilé\n",
"\n",
" *Pourquoi, dans ce cas, considérer Python pour du calcul scientifique ?*\n",
" \n",
" \n",
"- Temps d’implémentation vs temps d’exécution : langage lisible et épuré ⇒ développement et maintenance rapides\n",
"- Exécution en Python : rapide si les passages critiques sont exécutés avec un langage compilé : de nombreuses fonctions sont compilées et le code est interfaçable avec C/C++/FORTRAN"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Fonctionnement\n",
"\n",
"1. Ouverture de l’environnement de développement (ex : Spyder, Jupyter Notebook). Non indispensable mais conseillé.\n",
"2. Au choix :\n",
" - Commande en ligne\n",
" - Ecriture d’un script → exécution du script\n",
" - Ecriture d’une fonction → chargement de la fonction → appel à la fonction \n",
"\n",
"Tutoriel/aide dans la console : **nom_fonction?**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Les types et opérations sous Python"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Voyons maintenant les principaux types et opérations sous Python. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Booléens"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Deux valeurs possibles : False, True"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"type(False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x = (1<2)\n",
"x=3\n",
"type(x)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Opérateurs de comparaison : ==, !=, >, >=, <, <="
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"2 <=8 <15"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Opérateurs logiques : **not, or, and**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"(3 == 3) or (9 > 24) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"(9 > 24) and (3 == 3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### int, float, complex"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"type(2**100) \n",
"type(3.6)\n",
"type(3+2j)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"2*3 # produit\n",
"#2**3 # puissance\n",
"#20/3 # division flottante\n",
"#20//3 # division entière\n",
"#20%3 # modulo\n",
"#(9+5j).real\n",
"#(9+5j).imag\n",
"#abs(3+4j) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### les chaîne de caractères "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"type('abc') \n",
"c1 = 'L’eau vive'\n",
"c2 = ' est \"froide\" !'\n",
"c1+c2 # concatenation\n",
"#c1*2 #repetition \n",
"#c1[2] \n",
"#c1[-2]\n",
"#c1[2:5]\n",
"#len(\"abc\") # longueur \n",
"#\"abcde\".split(\"c\") # scinde \n",
"#\"a−ha\".replace('a','o') # 'o-ho' \n",
"#'-'.join(['ci', 'joint']) # 'ci-joint'\n",
"#'abracadabra'.count('bra') # 2\n",
"#'PETIT'.lower() # 'petit'\n",
"#'grand'.upper() # 'GRAND' "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### les listes\n",
"\n",
"list : collection hétérogène, ordonnée et modifiable d’éléments séparés par des virgules, entourée de crochets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"my_list=[4,7,3.7,'E',5,7] \n",
"#type(my_list) # list\n",
"#len(my_list) # longueur\n",
"#my_list[0] # premier terme\n",
"#my_list[1:3] # [7, 3.7]\n",
"#[0,1] + [2,4] # [0,1,2,4] (concatenation)\n",
"#l = [2,4,5,9,1,6,4]\n",
"#k = [x for x in l if x<6] #(extract under condition)\n",
"#range(6) # attention différent en Python 2 (liste) et 3 (boucle)\n",
"#range(2, 9, 2) \n",
"#[x for x in range(3,6)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### opérations sur les listes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"nombres = [17, 38, 10, 25, 72]\n",
"nombres.sort()\n",
"#nombres.append(12)\n",
"#nombres.reverse()\n",
"#nombres.remove(38)\n",
"#print(nombres.index(17))\n",
"#nombres[0] = 11\n",
"#nombres[1:3] = [14, 17, 2]\n",
"#nombres.count(17) # 2 \n",
"nombres"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"**ATTENTION**, par défaut, en Python, les listes ne sont pas copiées."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x = [4, 2, 10, 9, \"toto\"]\n",
"print(x)\n",
"y = x # y: seulement un alias de x, pas une copie\n",
"y[2] = \"tintin\" # change x(2) en \"tintin\"\n",
"print(x) \n",
"#x = [4, 2, 10, 9, \"toto\"]\n",
"#y = x[:] # On demande une copie\n",
"#y[2] = \"tintin\" # modifie y mais pas x\n",
"#print(x) # [4, 2, 10, 9, \"toto\"]\n",
"#print(y) # [4, 2, \"tintin\", 9, \"toto\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### les tuples\n",
"\n",
"tuple : collection hétérogène, ordonnée, immuable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"t=(5,7)\n",
"type(t) # tuple\n",
"t[0] # 5\n",
"#t[0]=2 # error: item assignment for tuple"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Les boucles et opérateurs"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### if – [elif] – [else]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x = 11\n",
"if x < 0:\n",
" print(\"x est negatif\")\n",
"elif x % 2:\n",
" print(\"x est positif et impair\")\n",
"else:\n",
" print(\"x n'est pas negatif et est pair\")\n",
" print(\"Eh oui !\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### while"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"N = 0\n",
"x = 687687.567476\n",
"while (x > 0):\n",
" x//=2\n",
" N+=1\n",
"print(\"Approx. de log_2(x) : \" + str(N-1))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### for"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"for lettre in \"ciao\":\n",
" print(lettre)\n",
" \n",
"for x in [\"\\n\",2,'a', 3.14,\"\\n\"]: \n",
" print(x) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Les fonctions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Exercice \n",
"*Ecrire une fonction f qui prend $n$ en entrée et calcule*\n",
"$$\\sum_{k=1}^{n-1} k^2$$\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def f(n):\n",
" print(sum([k**2 for k in range(n)])) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"f(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercice\n",
"*Ecrire une fonction f qui calcule le nième terme de la suite de Fibonacci initialisée par a et b, dont la relation de récurrence est* \n",
"$$ x_{n+2} = x_{n+1} + x_n.$$\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def fib(n,a=0,b=1): #0,1: val. par defaut \n",
" '''n-th term of Fibonacci sequence starting at a,b.'''#tutoriel de fib\n",
" for i in range(n):\n",
" z = a+b\n",
" a=b\n",
" b=z\n",
" return a "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"fib(7,2,2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Les Bibliothèques d’usage courant\n",
"\n",
"- **NumPy** : manipulation de tableaux numériques, fonctions mathématiques de base, simulation de variables aléatoires...\n",
"- **SciPy** : fonctions mathématiques plus avancées (résolution d’équations, d’équations différentielles, calcul d’intégrales...)\n",
"- **Matplotlib** : visualisation de données sous forme de graphiques scikit-learn : machine learning\n",
"- SymPy : calcul symbolique\n",
"\n",
"Calcul numérique = manipulations de nombres décimaux $\\neq$ Calcul symbolique = manipulation d’expressions symboliques\n",
"\n",
"Exemple : racines de $x^2 − x − 1 = 0$\n",
" - calcul symbolique : $\\frac{1+ \\sqrt{5}}{2}$ , $\\frac{1- \\sqrt{5}}{2}$\n",
" - calcul numérique : 1.618034, - 0.6180340"
]
},
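{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A minimal sketch contrasting the two approaches on this example, using only NumPy and the standard `math` module (SymPy is not assumed to be installed; if it is, `sympy.solve(x**2 - x - 1, x)` with `x = sympy.Symbol('x')` should return the two exact roots). `np.roots` takes the polynomial coefficients in decreasing order of degree and returns floating-point approximations of the roots."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"\n",
"# Numerical computation: floating-point approximations of the roots of x^2 - x - 1\n",
"numeric_roots = np.sort(np.roots([1, -1, -1]))\n",
"\n",
"# Exact roots (1 - sqrt(5))/2 and (1 + sqrt(5))/2, evaluated in floating point for comparison\n",
"closed_form = np.array([(1 - math.sqrt(5)) / 2, (1 + math.sqrt(5)) / 2])\n",
"\n",
"print(numeric_roots)\n",
"print(np.allclose(numeric_roots, closed_form))  # True"
]
},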
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Import de bibliothèques ou de fonctions\n",
"- import ma_bibliotheque\n",
"- ma_bibliotheque.la_fonction(...)\n",
"\n",
"- import ma_bibliotheque as bibli # raccourci \n",
"- bibli.la_fonction(...) \n",
"\n",
"Moins précis (car la bibliothèque d’origine des fonctions n’est pas précisée à leur appel) :\n",
"from ma_bibliotheque import la_fonction\n",
"- from ma_bibliotheque import la_fonction\n",
"- la_fonction (...)\n",
" \n",
"- from ma_bibliotheque import ∗ \n",
"- la_fonction (...) \n",
"- Attention si on importe plusieurs bibliothèques ayant les mêmes fonctions..."
]
},
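{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A small illustration of these import styles, using the standard `math` module and NumPy as concrete stand-ins for a generic library (any module would do). The last two lines show the name-clash issue: after `from numpy import sqrt`, the name `sqrt` refers to NumPy's function instead of `math.sqrt`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import math                 # full module name: math.sqrt(...)\n",
"import numpy as np          # short alias: np.sqrt(...)\n",
"from math import sqrt       # single function, called without a prefix\n",
"\n",
"print(math.sqrt(16.0))      # 4.0\n",
"print(np.sqrt([4.0, 9.0]))  # [2. 3.]\n",
"print(sqrt(25.0))           # 5.0 (this is math.sqrt)\n",
"\n",
"# Name clash: a later import silently replaces an earlier name.\n",
"# math.sqrt only accepts scalars, numpy.sqrt works entry-wise on arrays.\n",
"from numpy import sqrt\n",
"print(sqrt([4.0, 9.0]))     # [2. 3.] (now numpy.sqrt)"
]
},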
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### La librairie `Numpy`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import numpy as np # toujours commencer par cette commande quand on veut utiliser la bibliothèque numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"b = np.array([[8,3,2,4],[5,1,6,0],[9,7,4,1]])\n",
"print(b)\n",
"#type(b) # numpy.ndarray \n",
"#b.dtype # datatype: int\n",
"b.shape # (3,4)\n",
"#c = np.array([[8,2],[5,6],[9,7]], dtype=complex)\n",
"#c.dtype # datatype: complex\n",
"#c[0,0] # 8+0j\n",
"\n",
"#More than 2 dimensions:\n",
"#d = np.array([[[8,3],[1,2]],[[5,1],[4,5]],[[9,7],[4,5]]])\n",
"#d.shape # (3, 2, 2)\n",
"\n",
" ##Reshaping:\n",
"#x = d.reshape(4,3) # tableau de taille (4,3)\n",
"#d.reshape(12,1) # tableau de taille (12,1)\n",
"#d.reshape(12,) # tableau unidimensionel de taille 12\n",
"#np.insert(np.arange(4,9),3,17) # 4,5,6,17,7,8"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Attention, il existe aussi une classe numpy.matrix mais il est recommandé d'utiliser numpy.array."
]
},
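{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A minimal sketch of why plain arrays are enough for linear algebra: with `numpy.array`, `*` is the entry-wise product, while the `@` operator (available since Python 3.5) or `np.dot` gives the matrix product. The two matrices are arbitrary illustrative values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"A = np.array([[1, 2], [3, 4]])\n",
"B = np.array([[0, 1], [1, 0]])\n",
"\n",
"print(A * B)  # entry-wise product: [[0 2] [3 0]]\n",
"print(A @ B)  # matrix product, same as np.dot(A, B): [[2 1] [4 3]]"
]
},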
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Opérations sur les tableaux de nombres numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"X = np.arange(start=5,step=3,stop=16) # 5, 8,11,14\n",
"#A = np.ones((2,3)) # matrix filled with ones\n",
"#B = X.reshape(2,2)\n",
"#C = np.zeros((3,2)) # matrix filled with zeros\n",
"#D = np.eye(2) # identity matrix\n",
"#np.diag([1,2]) # diagonal matrix\n",
"#E = C+np.ones(C.shape) # addition: same as C+1\n",
"#F = B*D # entry-wise multiplication\n",
"#J=np.dot(B,D) # linear algebra product\n",
"#G = F.T # transpose matrix\n",
"#H = np.exp(G) #as most functions, exp is entry-wise(else use np.vectorize(my_function))\n",
"#x = np.array([4, 2, 1, 5, 1, 10])\n",
"#y=np.logical_and(x>=3, x<= 9, x!=1) # [T,F,F,T,F,F]\n",
"#x[y] \n",
"#print(np.mean(np.random.randn(1000)>1.96)) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### algèbre linéaire avec numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"A = np.array([[2, 1, 1], [4, 3, 0]])\n",
"B = np.array([[1, 2], [12, 0]])\n",
"C = np.array([[1, 2], [12, 0], [-1, 2]])\n",
"D = np.array([[1, 2, -4], [2, 0, 0], [1, 2, 3]])\n",
"E = np.concatenate((A,B), axis=1)\n",
"F = np.concatenate((C,D), axis=1)\n",
"G = np.concatenate((E,F),axis=0)\n",
"H = np.random.randn(5,5)\n",
"I = H*G # produit terme à terme \n",
"B5 = B**5 # puissance terme à terme\n",
"B5 =np.linalg.matrix_power(B, 5)\n",
"Bm1 = np.linalg.inv(B)\n",
"dB = np.linalg.det(B)\n",
"x = np.linalg.solve(B,[3,12]) #résout B*x=[[3],[12]]\n",
"print(E)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### analyse spectrale avec numpy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"A = np.array([[1, 2], [12, 3]])\n",
"x = np.linalg.eigvals(A) # eigenvalues\n",
"# eigenvalues and eigenvectors:\n",
"valp, vectp = np.linalg.eig(A) \n",
"\n",
"#Hermitian matrices methods:\n",
"S = np.array([[1, 2], [2, 3]])\n",
"y = np.linalg.eigvalsh(S) # eigenvalues\n",
"# eigenvalues and eigenvectors:\n",
"valp, vectp = np.linalg.eigh(S)\n",
" \n",
"#Singular Value Decomposition\n",
"U,s,V=np.linalg.svd(A) \n",
"Ap = np.matrix(U)*np.diag(s)*V\n",
"print(A-Ap)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### copie de tableaux numpy\n",
"**Attention encore une fois !!!**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x = np.array([[8,3,2],[5,1,0],[9,7,1]])\n",
"y = x\n",
"x[0,0]+=1\n",
"x[0,0]-y[0,0] # 0\n",
"z=x.copy()\n",
"x[0,0]+=1\n",
"x[0,0]-z[0,0] # 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Génération de variables aléatoires discrètes avec numpy\n",
" - import numpy.random as npr\n",
" - my_sample = npr.ma_loi(paramètres, taille_du_tableau)\n",
"\n",
"- npr.randint(low=a,high=b,size=n) : v.a. unif. sur [ a, b[ \n",
"- npr.choice([a1,...,an],p=[p1,...,pn],size=n) : tirages indép. dans [a1,...,an] de loi [p1,...,pn] \n",
"- npr.permutation(mon_urne) : permutation de mon_urne \n",
"- npr.binomial(N,p,size=n)\n",
"- npr.geometric(p,size=n) \n",
"- npr.multinomial(n,tableau_des_probas,size=n) \n",
"- npr.poisson(alpha,size=n)\n",
"\n",
"Beaucoup d’autres exemples sur http://docs.scipy.org/doc/numpy/reference/routines.random.html"
]
},
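{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A short sketch using a few of these generators, with a fixed seed so that the run is reproducible. The sample size, the parameters and the coin labels `\"H\"`/`\"T\"` are arbitrary illustrative choices; the empirical means are compared with the theoretical ones (3.5 for a fair die, $20 \\times 0.3 = 6$ for $\\mathcal{B}(20, 0.3)$, 4 for a Poisson(4) distribution)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import numpy.random as npr\n",
"\n",
"npr.seed(0)  # fixed seed: same draws at every run\n",
"n = 10000\n",
"\n",
"dice = npr.randint(low=1, high=7, size=n)              # fair die: values 1, ..., 6\n",
"coins = npr.choice([\"H\", \"T\"], p=[0.3, 0.7], size=n)  # biased coin\n",
"binom = npr.binomial(20, 0.3, size=n)                   # binomial B(20, 0.3)\n",
"poiss = npr.poisson(4.0, size=n)                        # Poisson(4)\n",
"\n",
"print(dice.mean(), binom.mean(), poiss.mean())  # close to 3.5, 6 and 4\n",
"print(np.mean(coins == \"T\"))                    # close to 0.7\n",
"print(npr.permutation([1, 2, 3, 4, 5]))          # a random reordering"
]
},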
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Génération de variables aléatoires continues avec numpy\n",
" - import numpy.random as npr\n",
" - my_sample = npr.ma_loi(paramètres, taille_du_tableau)\n",
"\n",
"- npr.rand(d1,d2,...) : tableau d1 x d2 x... de v.a.i. unif. sur [0, 1]\n",
"- npr.uniform(low=a,high=b,size=n) : v.a.i. unif. sur [a, b[ \n",
"(size=n peut être remplacé par size=(d1,d2,...), comme partout dans ce qui suit)\n",
"- npr.randn(d1,d2,...) : tableau d1 x d2 x... de v.a.i. $\\mathcal{N}(0,1)$\n",
"- npr.multivariate_normal(mean=V,cov=C,size=n) : vecteurs aléatoires indépendants de loi $\\mathcal{N} (V, C)$ rangés dans un tableau de taille $n\\times N$,où $N$ est la taille de $V$ et $N\\times N$ celle de $C$\n",
"- npr.exponential(scale=s,size=n) : v.a.i. exponentielles de moyenne s\n",
"\n",
"Beaucoup d’autres exemples sur http://docs.scipy.org/doc/numpy/reference/routines.random.html"
]
},
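{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A sketch along the same lines for continuous distributions, again with an arbitrary fixed seed and illustrative parameters (the mean vector `V` and covariance matrix `C` are made up for the example). `np.cov(W, rowvar=False)` treats each column of `W` as a variable, so it should be close to `C`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import numpy.random as npr\n",
"\n",
"npr.seed(1)\n",
"n = 50000\n",
"\n",
"U = npr.uniform(low=-1.0, high=1.0, size=(1000, 3))  # 1000 x 3 array, uniform on [-1, 1)\n",
"Z = npr.randn(n)                                      # N(0, 1)\n",
"E = npr.exponential(scale=2.0, size=n)                # exponential with mean 2\n",
"V = np.array([0.0, 1.0])\n",
"C = np.array([[1.0, 0.5], [0.5, 2.0]])\n",
"W = npr.multivariate_normal(mean=V, cov=C, size=n)   # n x 2 array, rows ~ N(V, C)\n",
"\n",
"print(U.shape, W.shape)         # (1000, 3) (50000, 2)\n",
"print(Z.mean(), Z.std())        # close to 0 and 1\n",
"print(E.mean())                 # close to 2\n",
"print(np.cov(W, rowvar=False))  # close to C"
]
},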
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Fonctions de numpy utiles en proba/stat\n",
"- np.mean(x), np.std(x), np.percentile(x) : moyenne, écart-type et percentile d’un vecteur x (échantillon)\n",
"- np.sum(x) somme des valeurs de x\n",
"- np.cumsum(x) vecteur [x1,x1 +x2,...,x1 +···+xn] des sommes cumulées des coordonnées x1, . . . , xn de x \n",
"- np.cov(x) matrice n × n de covariance des lignes du tableau x de taille n × p\n",
"- scipy.stats : bibliothèque proposant densités, fonctions de répartition, quantiles, etc... de lois classiques. Cf http://docs.scipy.org/doc/scipy/reference/stats.html\n",
"- matplotlib.pyplot bibliothèque d’affichage graphique "
]
},
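{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A small worked example on a toy sample, with values chosen so that the results are easy to check by hand. Note that `np.std` divides by $n$ by default (pass `ddof=1` to divide by $n-1$), whereas `np.cov` divides by $n-1$ and, by default, treats each row as a variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])\n",
"print(np.mean(x))               # 5.0\n",
"print(np.std(x))                # 2.0 (divides by n)\n",
"print(np.std(x, ddof=1))        # divides by n - 1 instead\n",
"print(np.percentile(x, 50))     # median: 4.5\n",
"print(np.sum(x))                # 40.0\n",
"print(np.cumsum([1, 2, 3, 4]))  # [ 1  3  6 10]\n",
"\n",
"# np.cov: each ROW is a variable, each column an observation\n",
"data = np.array([[1.0, 2.0, 3.0, 4.0],\n",
"                 [2.0, 4.0, 6.0, 8.0]])\n",
"print(np.cov(data))             # 2 x 2 matrix [[5/3, 10/3], [10/3, 20/3]]"
]
},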
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Boucles vs programmation matricielle\n",
"** -> Eviter si possible les boucles en Python !!!**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from time import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"n = int(1e7)\n",
"# Methode 1. Boucle for\n",
"t1 = time()\n",
"gamma1=sum([1./i for i in range(1,n+1)]) - np.log(n)\n",
"t2 = time()\n",
"temps1 = t2 - t1\n",
"# Methode 2. Numpy\n",
"t1 = time()\n",
"gamma2=np.sum(1. / np.arange(1,n+1)) - np.log(n)\n",
"t2 = time()\n",
"temps2 = t2 - t1\n",
"print(\"Facteur de gain: \", temps1/temps2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"from timeit import timeit\n",
"N = 100\n",
"setup = \"\"\"\"\"\n",
"import numpy as np\n",
"n = int(1e5)\n",
"\"\"\"\"\"\n",
"code_boucle = \"\"\"\n",
"np.sum([1. / i for i in range(1, n)]) - np.log(n)\n",
"\"\"\"\n",
"time_boucle=timeit(code_boucle,setup=setup,number=N)\n",
"\n",
"code_numpy = \"\"\"\n",
"np.sum(1. / np.arange(1, n)) - np.log(n)\n",
"\"\"\"\n",
"time_numpy=timeit(code_numpy,setup=setup,number=N) \n",
"\n",
"print(\"Facteur : {}\".format(time_boucle/time_numpy))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Affichage graphique avec `matplotlib`\n",
"\n",
" - import matplotlib.pyplot as plt \n",
" \n",
"\n",
"- plt.plot(x,y) affiche la courbe affine par morceaux reliant les points d’abscisses x et d’ordonnées y (nombreuses options) pour x, y vecteurs de même dimension,\n",
"- plt.hist trace un histogramme. Deux options pour les colonnes : bins= nombre de colonnes ou bins= abscisses des séparations des colonnes\n",
"- plt.bar trace un diagramme en bâtons \n",
"- plt.scatter(x,y) affiche le nuage de points d’abs. x et d’ord. y\n",
"- plt.stem(x,y) affiche des barres verticales d’abs. x et hauteur y\n",
"- plt.axis([xmin,xmax,ymin,ymax]) définit les intervales couverts par la figure\n",
"- plt.axis(’scaled’) impose que les échelles en x et en y soient les mêmes\n",
"- plt.show() affiche les fenêtres créées dans le script\n",
"- plt.figure() crée une nouvelle fenêtre graphique\n",
"- plt.title(\"mon titre\") donne un titre à une figure\n",
"- plt.legend(loc=’best’) affiche la légende d’un graphique (en position optimale)\n",
"- plt.subplot subdivise la fenêtre graphique de façon à y afficher plusieurs graphiques"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Exemple : représentation d’un échantillon de loi discrète"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import scipy.stats as sps\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"n, p, N = 20, 0.3, int(1e4)\n",
"B = np.random.binomial(n, p, N)\n",
"f = sps.binom.pmf(np.arange(n+1), n, p)\n",
"plt.hist(B,bins=n+1,density = 1,range=(-.5,n+.5),color=\"white\",label=\"loi empirique\")\n",
"plt.stem(np.arange(n+1),f,\"r\",label=\"loi theorique\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Histogramme d’un échantillon de loi continue"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercice\n",
"\n",
"1. Créer un vecteur E contenant 10000 réalisations indépendantes d'une loi $\\mathcal{N}(0,1)$.\n",
"2. Aficher sur le même graphique ce vecteur et la loi gaussienne théorique, obtenue grâce à la fonction norm.pdf de la bibliothèque scipy. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"E = np.random.randn(int(1e5))#echantillon\n",
"x = np.linspace(-4,4,1000)\n",
"f_x = sps.norm.pdf(x) #Densite gaussienne\n",
"plt.plot(x,f_x,\"r\",label=\"Theory\")\n",
"#Affichage histo:\n",
"plt.hist(E,bins=50,density=1,label=\"Data\")\n",
"plt.legend(loc='best')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## La loi des grands nombres\n",
"\n",
"### Theorème, (Loi des Grands Nombres), Kolmogorov, 1929 \n",
"\n",
"Soient $(X_i)_{i\\ge 1}$ copies indépendantes d'une même variable aléatoire $X$:\\begin{eqnarray*} \\mathbb{E}[|X|] <\\infty&\\implies& S_n:=\\frac{X_1+\\dots+X_n}{n}%\\underset{n\\to\\infty}{\\stackrel{p.s.}{\\longrightarrow}}\n",
" \\underset{n\\to\\infty}{\\longrightarrow}\n",
" \\mathbb{E}[X],\\\\\n",
" \\mathbb{E}[|X|] =\\infty&\\implies& S_n:=\\frac{X_1+\\dots+X_n}{n}\\;\\textrm{ diverge}.\\end{eqnarray*} "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercice\n",
"\n",
"1. Tirer un vecteur $X$ dont les coordonnées sont $n$ réalisations indépendantes de la loi uniforme sur $[0,1]$. Afficher sur le même graphique la courbe $S_n$ en fonction de $n$ et une droite qui vaut $\\mathbb{E}[X]$ pour tout $n$. Commentez le résultat. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"wn=10000\n",
"S=np.cumsum(np.random.rand(n))/np.arange(1,n+1)\n",
"plt.plot(range(1,n+1),S,'r',label=\"S_n\")\n",
"plt.plot((1,n),(.5,.5),\"b--\",label=\"Esperance\")\n",
"plt.ylabel('S_n')\n",
"plt.xlabel(\"n\")\n",
"plt.legend(loc='best')\n",
"plt.title(\"LGN\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"2. On va maintenant se placer dans le cas d'une distribution dont l'espérance n'est pas forcément finie. Soient $s,U$ v.a. indépendantes, $U$ uniforme sur $[0,1]$ et $s=\\pm 1$ avec probas 0.5,0.5. Alors la v.a. $X:=s U^{-1/\\alpha}$, appelée ici **$\\alpha$-variable aléatoire**, a pour densité $ (\\alpha/2)\\mathbf{1}_{|x|\\ge 1} |x|^{-\\alpha-1}$ et est d'espérance finie ssi $\\alpha>1$. Afficher à nouveau $S_n$ en fonction de $n$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"alpha, n = 0.9, 1000\n",
"U = np.random.rand(n)\n",
"X = (2*np.random.randint(0,2,n)-1)*U**(-1/alpha)\n",
"S=np.cumsum(X)/np.arange(1,n+1)\n",
"plt.title(\"LGN: convergence ou pas selon alpha\")\n",
"plt.plot((0,n),(0,0),\"b--\")\n",
"plt.plot(S,\"r\",label=\"Moyenne empirique\")\n",
"plt.xlabel(\"n\")\n",
"plt.ylabel(\"S_n\")\n",
"plt.legend(loc='best')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"3. On se replace dans le cas où les $X_i$ suivent une loi uniforme sur $[0,1]$. On peut se demander maintenant à quelle vitesse $S_n$ converge vers $\\mu = \\mathbb{E}[X]$ ? On cherche donc $\\beta$\n",
" tel que ${n^\\beta(S_n-\\mu)\\longrightarrow \\ell\\ne 0}$,\n",
"i.e. $${S_n\\approx \\mu +\\frac{\\ell}{n^\\beta}}.$$\n",
"Pour $\\beta$ fixé, afficher sur un même graphique plusieurs courbes $(n, n^\\beta(S_n-\\mu))$ à l'aide d'une boucle for. Essayez par exemple $\\beta =0.2$, $\\beta = 0.5$, $\\beta = 0.7$. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"beta = 0.8\n",
"n=1000\n",
"for k in range(10):\n",
" S=np.cumsum(np.random.rand(n))/np.arange(1,n+1)\n",
" T = (S - 0.5)*np.arange(1,n+1)**beta\n",
" plt.plot(range(1,n+1),T)\n",
" plt.plot((1,n),(0,0),\"b--\")\n",
"\n",
"plt.ylabel('n^beta(S_n-mu)')\n",
"plt.xlabel(\"n\")\n",
"plt.legend(loc='best')\n",
"plt.title(\"n^beta (erreur dans la LGN)\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Théorème de la limite centrale, Laplace, 1812, Lindeberg, 1920\n",
"$(X_i)_{i\\ge 1}$ v.a. indépendantes de même loi d'espérance $\\mu$ et d'écart-type $\\sigma$: $$T_n = \\frac{n^{1/2}}{\\sigma}\\left(\\frac{X_1+\\dots+X_n}{n}-\\mu\\right)\\underset{n\\to\\infty}{\\stackrel{loi}{\\longrightarrow}} \\mathcal{N}(0,1). $$\n",
"\n",
"En d'autres termes, la moyenne empirique des $X_i$ $\\approx$ moyenne théorique, avec une erreur aléatoire gaussienne d'ordre ${1/\\sqrt{n}}$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. Tirer aléatoirement $m$ vecteurs $(X_1,\\dots, X_n)$ avec tous les $X_i$ i.i.d. de loi uniforme sur $[-1,1]$ et afficher l'histogramme des $m$ valeurs de $T_n$ ainsi obtenues. Afficher sur le même graphique la loi gaussienne centrée d'écart-type $\\sigma$. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"n,m,sigma=1000,10000,3**(-1/2)\n",
"X=2*np.random.rand(m,n)-1\n",
"S=np.sum(X,axis=1)/(np.sqrt(n)*sigma)\n",
"M=max(np.abs(S))\n",
"x=np.linspace(-M,M,1000)\n",
"y=sps.norm.pdf(x)\n",
"plt.plot(x,y,'r',label=\"densite\")\n",
"plt.hist(S,bins=int(round(m**(1./3)*M*.5)),density=1,histtype='step',label=\"Histogramme\")\n",
"plt.legend(loc='best')\n",
"plt.title(\"TCL\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Interprétation \n",
"Généralement, une grande somme de petits aléas peu corrélés fluctue autour de sa moyenne selon une\n",
"distribution gaussienne.\n",
"\n",
"**Exemples** : anatomie (ex : taille des individus de sexe donné), QI, nombreuses mesures physiques, données économiques, etc..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Lecture et écriture dans un fichier externe\n",
"- x=range(5)\n",
"- mon_flux=open(\"my_data.txt\",\"w\") #w=write\n",
"- mon_flux.write(str(x))\n",
"- mon_flux.close()\n",
"\n",
"Ecriture a la fin d'un fichier existant:\n",
"- mon_flux=open(\"my_data.txt\",\"a\") #a=append\n",
"- mon_flux.write(\"\\n\"+str(x+1))\n",
"- mon_flux.close()\n",
"\n",
"Lecture:\n",
"- mon_flux=open(\"my_data.txt\",\"r\") #r=read\n",
"- y=mon_flux.read()\n",
"- print(y)\n",
" \n",
"Voir aussi np.save, np.load, np.savetxt, csv.reader...\n",
"- import numpy as np\n",
"- x=np.random.rand(5,3)\n",
"- np.save(\"my_npy_file.npy\",x) #creation et ecriture\n",
"- x2=np.load(\"my_npy_file.npy\") #lecture"
]
},
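{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A runnable version of these steps, written with `with` blocks (which close the file automatically, even if an error occurs) and working in a temporary directory created by `tempfile.mkdtemp()` so that no file is left in the current folder; the file names are illustrative. It also shows `np.savetxt`/`np.loadtxt`, the plain-text counterparts of `np.save`/`np.load`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import numpy as np\n",
"\n",
"folder = tempfile.mkdtemp()  # temporary directory for the example\n",
"path = os.path.join(folder, \"my_data.txt\")\n",
"\n",
"x = list(range(5))\n",
"with open(path, \"w\") as f:  # \"w\": create or overwrite\n",
"    f.write(str(x))\n",
"with open(path, \"a\") as f:  # \"a\": append at the end\n",
"    f.write(\"\\n\" + str([v + 1 for v in x]))\n",
"with open(path, \"r\") as f:  # \"r\": read\n",
"    content = f.read()\n",
"print(content)\n",
"\n",
"# Arrays: NumPy binary format (.npy) and plain-text format\n",
"A = np.random.rand(5, 3)\n",
"np.save(os.path.join(folder, \"my_npy_file.npy\"), A)\n",
"A2 = np.load(os.path.join(folder, \"my_npy_file.npy\"))\n",
"np.savetxt(os.path.join(folder, \"my_array.txt\"), A)\n",
"A3 = np.loadtxt(os.path.join(folder, \"my_array.txt\"))\n",
"print(np.array_equal(A, A2), np.allclose(A, A3))  # True True"
]
},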
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Lecture dans un fichier externe : exemple "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"f=open(\"PopLynxRegionCanada_1821_1934.dat\",\"r\")\n",
"ytxt=f.readlines() # list de str\n",
"y=[int(row) for row in ytxt] # convertit str en int\n",
"plt.plot(range(1821,1935),y,\"r\")\n",
"plt.title(\"Population de lynx\")\n",
"plt.tight_layout() # pratique pour l’export\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Python pour l’analyse : exemples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy\n",
"import matplotlib.pyplot as plt\n",
"def f(x):\n",
" return np.exp(x)+x\n",
"#zeros of f, computed starting at -0.2:\n",
"a=scipy.optimize.fsolve(f,-.2)\n",
"print(a,f(a))\n",
"#integral of f from 0 to 1:\n",
"b=scipy.integrate.quad(f,0,1)\n",
"print(b)\n",
"def g(y,t):\n",
" return y\n",
"T=np.arange(start=0,stop=1,step=.001)\n",
"##solution, at T, of y’=g(y,t), y(T[0])=1:\n",
"y=scipy.integrate.odeint(g,1,T)\n",
"plt.plot(T,np.log(y),\"r\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bibliographie : les tutoriels/sites officiels\n",
"- Python : https://docs.python.org/2/tutorial/\n",
"- NumPy : http://docs.scipy.org/doc/numpy/reference/\n",
"- ScipyStats : http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html\n",
"- Matplotlib : http://matplotlib.org/users/pyplot_tutorial.html\n",
"- NumPy user guide (pdf) : https://docs.scipy.org/doc/numpy-1.8.0/numpy-user-1.8.0.pdf\n",
"- Matplotlib user guide (pdf) : http://matplotlib.org/Matplotlib.pdf\n",
"- scikit-learn : http://scikit-learn.org/stable/\n",
"- SymPy : http://www.sympy.org/fr/index.html\n",
"- Anaconda (distribution contenant les interfaces dedéveloppement Spyder et Jupyter) : https://www.continuum.io/downloads"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Quelques cours \n",
"\n",
"- Un cours de l’X : http://www.cmap.polytechnique.fr/~gaiffas/intro_python.html\n",
"- Un cours du lycée Saint Louis : http://mathprepa.fr/python-project-euler-mpsi/\n",
"- Un cours de l’INRIA : http://www.labri.fr/perso/nrougier/teaching/index.html\n",
"- Un cours d’Orsay : http://www.iut-orsay.u-psud.fr/fr/specialites/mesures_physiques/mphy_pedagogie.html"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}