{"cells":[{"metadata":{},"cell_type":"markdown","source":"\n# TP1 Probabilité et statistique\n\n \n HTML Base Tag Example\n \n \n \n \"Logo\n \n\n\n"},{"metadata":{"trusted":false},"cell_type":"code","source":"from __future__ import print_function\nimport numpy as np\nimport pandas as pd\nfrom ipywidgets import interact, interactive, fixed, interact_manual\nimport ipywidgets as widgets","execution_count":1,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Probabilités - approche fréquentiste\n### Définition par la fréquence relative :\n* une expérience d’ensemble fondamental est exécutée plusieurs fois sous les mêmes conditions.\n* Pour chaque événement E de , n(E) est le nombre de fois où l’événement E survient lors des n premières répétitions de l’expérience.\n* P(E), la probabilité de l’événement E est définie de la manière suivante :\n\n$$P(E)=\\lim_{n\\to\\infty}\\dfrac{n(E)}{n} $$ "},{"metadata":{},"cell_type":"markdown","source":"## Simulation d'un dé parfait"},{"metadata":{"trusted":false},"cell_type":"code","source":"# seed the random number generator\nnp.random.seed(1)\n\n# Example: sampling \n#\n# do not forget that Python arrays are zero-indexed,\n# and the 2nd argument to NumPy arange must be incremented by 1\n# if you want to include that value\nn = 6\nk = 200000\nT=np.random.choice(np.arange(1, n+1), k, replace=True)\nunique, counts = np.unique(T, return_counts=True)\ndic=dict(zip(unique, counts))\ndf=pd.DataFrame(list(dic.items()),columns=['i','Occurence'])\ndf.set_index(['i'], inplace=True)\ndf['Freq']=df['Occurence']/k\ndf['P({i})']='{}'.format(1/6)\ndf","execution_count":2,"outputs":[{"data":{"text/html":"
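<div>
<table>
<thead>
<tr><th></th><th>Occurence</th><th>Freq</th><th>P({i})</th></tr>
<tr><th>i</th><th></th><th></th><th></th></tr>
</thead>
<tbody>
<tr><th>1</th><td>33346</td><td>0.166730</td><td>0.16666666666666666</td></tr>
<tr><th>2</th><td>33221</td><td>0.166105</td><td>0.16666666666666666</td></tr>
<tr><th>3</th><td>33447</td><td>0.167235</td><td>0.16666666666666666</td></tr>
<tr><th>4</th><td>33224</td><td>0.166120</td><td>0.16666666666666666</td></tr>
<tr><th>5</th><td>33379</td><td>0.166895</td><td>0.16666666666666666</td></tr>
<tr><th>6</th><td>33383</td><td>0.166915</td><td>0.16666666666666666</td></tr>
</tbody>
</table>
</div>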
","text/plain":" Occurence Freq P({i})\ni \n1 33346 0.166730 0.16666666666666666\n2 33221 0.166105 0.16666666666666666\n3 33447 0.167235 0.16666666666666666\n4 33224 0.166120 0.16666666666666666\n5 33379 0.166895 0.16666666666666666\n6 33383 0.166915 0.16666666666666666"},"execution_count":2,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## Ajouter de l'intéraction "},{"metadata":{"trusted":false},"cell_type":"code","source":"def dice_sim(k=100):\n n = 6 \n T=np.random.choice(np.arange(1, n+1), k, replace=True)\n unique, counts = np.unique(T, return_counts=True)\n dic=dict(zip(unique, counts))\n df=pd.DataFrame(list(dic.items()),columns=['i','Occurence'])\n df.set_index(['i'], inplace=True)\n df['Freq']=df['Occurence']/k\n df['P({i})']='{0:.3f}'.format(1/6)\n return df\n ","execution_count":3,"outputs":[]},{"metadata":{"trusted":false},"cell_type":"code","source":"dice_sim(100)","execution_count":4,"outputs":[{"data":{"text/html":"
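<div>
<table>
<thead>
<tr><th></th><th>Occurence</th><th>Freq</th><th>P({i})</th></tr>
<tr><th>i</th><th></th><th></th><th></th></tr>
</thead>
<tbody>
<tr><th>1</th><td>19</td><td>0.19</td><td>0.167</td></tr>
<tr><th>2</th><td>16</td><td>0.16</td><td>0.167</td></tr>
<tr><th>3</th><td>17</td><td>0.17</td><td>0.167</td></tr>
<tr><th>4</th><td>15</td><td>0.15</td><td>0.167</td></tr>
<tr><th>5</th><td>15</td><td>0.15</td><td>0.167</td></tr>
<tr><th>6</th><td>18</td><td>0.18</td><td>0.167</td></tr>
</tbody>
</table>
</div>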
","text/plain":" Occurence Freq P({i})\ni \n1 19 0.19 0.167\n2 16 0.16 0.167\n3 17 0.17 0.167\n4 15 0.15 0.167\n5 15 0.15 0.167\n6 18 0.18 0.167"},"execution_count":4,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"interact(dice_sim,k=widgets.IntSlider(min=1000, max=50000, step=500, value=10));","execution_count":5,"outputs":[{"data":{"application/vnd.jupyter.widget-view+json":{"model_id":"a195f2610f9d457c8e4a479ce2fc91d9","version_major":2,"version_minor":0},"text/plain":"interactive(children=(IntSlider(value=1000, description='k', max=50000, min=1000, step=500), Output()), _dom_c…"},"metadata":{},"output_type":"display_data"}]},{"metadata":{},"cell_type":"markdown","source":"## Cas d'un dé truqué"},{"metadata":{"trusted":false},"cell_type":"code","source":"p=[0.1, 0.1, 0.1, 0.1,0.1,0.5]\nsum(p)","execution_count":6,"outputs":[{"data":{"text/plain":"1.0"},"execution_count":6,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"def dice_sim(k=100,q=[[0.1, 0.1, 0.1, 0.1,0.1,0.5],[0.2, 0.1, 0.2, 0.1,0.1,0.3]]):\n n = 6\n qq=q\n T=np.random.choice(np.arange(1, n+1), k, replace=True,p=qq)\n unique, counts = np.unique(T, return_counts=True)\n dic=dict(zip(unique, counts))\n df=pd.DataFrame(list(dic.items()),columns=['i','Occurence'])\n df.set_index(['i'], inplace=True)\n df['Freq']=df['Occurence']/k\n df['P({i})']=['{0:.3f}'.format(j) for j in q]\n return df","execution_count":7,"outputs":[]},{"metadata":{"trusted":false},"cell_type":"code","source":"interact(dice_sim,k=widgets.IntSlider(min=1000, max=50000, step=500, value=10));","execution_count":8,"outputs":[{"data":{"application/vnd.jupyter.widget-view+json":{"model_id":"49aa41fa0f80460fac61e032025cbdb8","version_major":2,"version_minor":0},"text/plain":"interactive(children=(IntSlider(value=1000, description='k', max=50000, min=1000, step=500), 
Dropdown(descript…"},"metadata":{},"output_type":"display_data"}]},{"metadata":{},"cell_type":"markdown","source":"## Exercise 1\n\nTry the previous interaction for several values of `p`.\nState your conclusion:"},{"metadata":{"trusted":false},"cell_type":"code","source":"# Conclusion \n# The simulation behaves the same way for every choice of p:\n# the observed frequency is approximately equal to the probability that was fixed beforehand","execution_count":9,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Random permutation"},{"metadata":{"trusted":false},"cell_type":"code","source":"np.random.seed(2)\n\nm = 1\nn = 10\n\nv = np.arange(m, n+1)\nprint('v =', v)\n\nnp.random.shuffle(v)\nprint('v, shuffled =', v)","execution_count":10,"outputs":[{"name":"stdout","output_type":"stream","text":"v = [ 1 2 3 4 5 6 7 8 9 10]\nv, shuffled = [ 5 2 6 1 8 3 4 7 10 9]\n"}]},{"metadata":{},"cell_type":"markdown","source":"## Exercise 2\nVerify that random permutations are uniform, i.e. that each permutation of the elements of {1,2,3} is generated with probability 1/6.\nIndeed, {1,2,3} has 3! = 6 permutations:\n* 1 2 3\n* 1 3 2\n* 2 1 3\n* 2 3 1\n* 3 1 2\n* 3 2 1\n"},{"metadata":{"trusted":false},"cell_type":"code","source":"k =10\nm = 1\nn = 3\nv = np.arange(m, n+1)\nT=[]\nfor i in range(k):\n np.random.shuffle(v)\n w=np.copy(v) # shuffle works in place, so store a copy\n T.append(w)","execution_count":11,"outputs":[]},{"metadata":{"trusted":false},"cell_type":"code","source":"TT=[str(i) for i in T]\nTT","execution_count":12,"outputs":[{"data":{"text/plain":"['[1 3 2]',\n '[3 1 2]',\n '[1 2 3]',\n '[1 3 2]',\n '[3 1 2]',\n '[3 1 2]',\n '[1 3 2]',\n '[2 1 3]',\n '[3 1 2]',\n '[2 3 1]']"},"execution_count":12,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"k =1000\nm = 1\nn = 3\nv = np.arange(m, n+1)\nT=[]\nfor i in range(k):\n np.random.shuffle(v)\n w=np.copy(v) # shuffle works in place, so store a copy\n T.append(w)\n\nTT=[str(i) for i in T]\nunique, counts = 
np.unique(TT, return_counts=True)\ndic=dict(zip(unique, counts))\ndf=pd.DataFrame(list(dic.items()),columns=['i','Occurence'])\ndf.set_index(['i'], inplace=True)\ndf['Freq']=df['Occurence']/k\ndf['P({i,j,k})']='{0:.3f}'.format(1/6)\ndf","execution_count":13,"outputs":[{"data":{"text/html":"
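<div>
<table>
<thead>
<tr><th></th><th>Occurence</th><th>Freq</th><th>P({i,j,k})</th></tr>
<tr><th>i</th><th></th><th></th><th></th></tr>
</thead>
<tbody>
<tr><th>[1 2 3]</th><td>169</td><td>0.169</td><td>0.167</td></tr>
<tr><th>[1 3 2]</th><td>180</td><td>0.180</td><td>0.167</td></tr>
<tr><th>[2 1 3]</th><td>161</td><td>0.161</td><td>0.167</td></tr>
<tr><th>[2 3 1]</th><td>169</td><td>0.169</td><td>0.167</td></tr>
<tr><th>[3 1 2]</th><td>157</td><td>0.157</td><td>0.167</td></tr>
<tr><th>[3 2 1]</th><td>164</td><td>0.164</td><td>0.167</td></tr>
</tbody>
</table>
</div>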
","text/plain":" Occurence Freq P({i,j,k})\ni \n[1 2 3] 169 0.169 0.167\n[1 3 2] 180 0.180 0.167\n[2 1 3] 161 0.161 0.167\n[2 3 1] 169 0.169 0.167\n[3 1 2] 157 0.157 0.167\n[3 2 1] 164 0.164 0.167"},"execution_count":13,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"### Donner votre conclusion en expliquant le script "},{"metadata":{"trusted":true},"cell_type":"code","source":"## Explication \n\n# la simulation montra que reagit avec la meme maniere\n","execution_count":1,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Probabilité conditionnelle "},{"metadata":{},"cell_type":"markdown","source":"Rappelons que l'interprétation fréquentiste de la probabilité conditionnelle basée sur un grand nombre `n` de répétitions d'une expérience est $ P (A | B) ≈ n_ {AB} / n_ {B} $, où $ n_ {AB} $ est le nombre de fois où $ A \\cap B $ se produit et $ n_ {B} $ est le nombre de fois où $ B $ se produit. Essayons cela par simulation et vérifions les résultats de l'exemple 2.2.5. Utilisons donc [`numpy.random.choice`] (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html) pour simuler les familles` n`, chacun avec deux enfants.\n"},{"metadata":{"trusted":false},"cell_type":"code","source":"np.random.seed(34)\n\nn = 10**5\nchild1 = np.random.choice([1,2], n, replace=True) \nchild2 = np.random.choice([1,2], n, replace=True) \n\nprint('child1:\\n{}\\n'.format(child1))\n\nprint('child2:\\n{}\\n'.format(child2))","execution_count":15,"outputs":[{"name":"stdout","output_type":"stream","text":"child1:\n[2 1 1 ... 1 2 1]\n\nchild2:\n[2 2 2 ... 2 2 1]\n\n"}]},{"metadata":{},"cell_type":"markdown","source":"Ici, «child1» est un «tableau NumPy» de longueur «n», où chaque élément est un 1 ou un 2. En laissant 1 pour «fille» et 2 pour «garçon», ce «tableau» représente le sexe du enfant aîné dans chacune des familles «n». 
Likewise, `child2` represents the sex of the younger child in each family.\n"},{"metadata":{"trusted":false},"cell_type":"code","source":"np.random.choice([\"girl\", \"boy\"], n, replace=True)","execution_count":16,"outputs":[{"data":{"text/plain":"array(['boy', 'boy', 'boy', ..., 'boy', 'boy', 'boy'], dtype='