{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "internals": { "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "Learning Machine Learning with the World Cup\n", "============================================\n", "\n", "Alternative talk title: Antonio\n", "\n", "Juan Pedro Fisanotti" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "internals": { "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "Goal of the talk:\n", "=================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "internals": { "frag_number": 2 }, "slideshow": { "slide_type": "fragment" } }, "source": [ "- Learn some basic Machine Learning concepts\n", "- Reinforce them with a real example" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Machine Learning\n", "================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "What is it?\n", "===========" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- A branch of artificial intelligence\n", "- Programs that \"learn from data\"\n", "- Closely related to statistics and mathematical methods" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Unsupervised learning\n", "=====================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Main idea: recognize patterns or **clusters** in the data.\n", "\n", "The algorithm doesn't interpret the groups; that part is up to us." ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/unsupervised_learning_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/unsupervised_learning_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Supervised learning\n", "===================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Main idea: find a **function** that explains the output data in terms of the input data.\n", "\n", "We have **both** sets of data: the inputs and their known outputs." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/supervised_learning_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Supervised learning - Regression\n", "================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "An ordinary, run-of-the-mill function that returns **numbers**."
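, "\n",
"\n",
"A minimal regression sketch, assuming NumPy and made-up toy data (not the data from the talk): `np.polyfit` fits a straight line by least squares, and the prediction for a new input is a plain number.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Made-up toy data: y is roughly 2*x + 1\n",
"x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])\n",
"y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])\n",
"\n",
"# Fit y = a*x + b by least squares (degree-1 polynomial)\n",
"a, b = np.polyfit(x, y, 1)\n",
"\n",
"# Regression: the output for a new input is a number\n",
"prediction = a * 5.0 + b\n",
"print(round(a, 2), round(b, 2), round(prediction, 2))\n",
"```"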
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/regression_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/regression_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Supervised learning - Classification\n", "====================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "A function that returns **labels**." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/clasification_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/clasification_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Some important concepts in supervised learning\n", "==============================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "The idea of a hypothesis\n", "========================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/idea_hipotesis_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/idea_hipotesis_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "A hypothesis is a **candidate** function: one that might be the function we are looking for to predict the outputs.\n", "\n", "Learning algorithms try **many** hypotheses and return the best one they find." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "The idea of a model\n", "===================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/idea_modelo_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/idea_modelo_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "A model is the type of function, the **template**, that we use to build our hypotheses.\n", "\n", "A given learning algorithm usually works with one specific type of function, or with a family of similar types."
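, "\n",
"\n",
"A toy sketch of both ideas, with made-up data and a brute-force search standing in for a real learning algorithm: the model is the template `a * x + b`, each `(a, b)` pair is one hypothesis, and we keep the hypothesis with the lowest squared error.\n",
"\n",
"```python\n",
"import itertools\n",
"\n",
"# Made-up toy data: inputs and the outputs we want to explain\n",
"xs = [0, 1, 2, 3]\n",
"ys = [1, 3, 5, 7]\n",
"\n",
"def model(a, b):\n",
"    # The model is a template: fixing a and b yields one hypothesis\n",
"    return lambda x: a * x + b\n",
"\n",
"def error(h):\n",
"    # Sum of squared errors of hypothesis h on the data\n",
"    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))\n",
"\n",
"# Try many candidate hypotheses (integer a and b in [-5, 5]) and keep the best\n",
"candidates = itertools.product(range(-5, 6), repeat=2)\n",
"best_a, best_b = min(candidates, key=lambda ab: error(model(*ab)))\n",
"print(best_a, best_b)\n",
"```\n",
"\n",
"Real algorithms search far more cleverly (for example, with gradient descent), but the roles of model and hypothesis stay the same."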
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Training and prediction\n", "=======================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/entrenamiento.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/prediccion.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "* First we train, to find a good hypothesis\n", "* Then we can predict new cases using the hypothesis we found" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Parametric and non-parametric models\n", "====================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Roughly speaking:\n", "\n", "* **Parametric model**: builds an f() from a fixed set of parameters, and that **alone** is enough to make predictions.\n", "\n", "* **Non-parametric model**: has an f() that looks up the training data. To predict, you need the f() **plus** the data."
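, "\n",
"\n",
"A small sketch of the difference, assuming NumPy, made-up 1-D data and hand-written helper functions (not any library's API). The parametric predictor only keeps the two numbers of a fitted line; the k-nearest-neighbors predictor needs the stored training data every time it predicts.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Made-up 1-D training data with binary labels\n",
"X = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])\n",
"y = np.array([0, 0, 0, 1, 1, 1])\n",
"\n",
"# Parametric: fit a line once; afterwards only a and b are needed\n",
"a, b = np.polyfit(X, y, 1)\n",
"\n",
"def predict_parametric(x):\n",
"    # Turn the line into a label by thresholding at 0.5\n",
"    return int(a * x + b >= 0.5)\n",
"\n",
"# Non-parametric: k-nearest neighbors consults X and y on every call\n",
"def predict_knn(x, k=3):\n",
"    nearest = np.argsort(np.abs(X - x))[:k]  # indices of the k closest points\n",
"    return int(np.round(y[nearest].mean()))  # majority vote (binary labels, odd k)\n",
"\n",
"print(predict_parametric(2.5), predict_knn(2.5))\n",
"print(predict_parametric(6.5), predict_knn(6.5))\n",
"```"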
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/parametricos_vs_no_parametricos.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Overfitting\n", "===========" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_3.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_4.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_5.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/high_variance_6.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Some supervised learning algorithms\n", "===================================\n", "\n", "(just examples; there are many more)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "K-Nearest Neighbors\n", "===================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ 
"![](files/k_neighbors_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/k_neighbors_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/k_neighbors_3.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/k_neighbors_4.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "- It is **non-parametric**: it always needs the training data.\n", "- It is very **simple**: there is no training time, it just stores the data." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Neural networks\n", "===============" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/neural_networks_1.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "![](files/neural_networks_2.svg)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "- They are **parametric**: they learn a function and can predict with it alone, without keeping the data.\n", "- They have to be trained, which is a **heavy** and **slow** process.\n", "- They can learn very complex functions (in theory, they can approximate practically **any** function)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "The World Cup\n", "=============" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": 
"fragment" } }, "source": [ "- People kicking a **ball**.\n", "- Apparently, getting the ball into the other team's **goal** is a good thing. Nobody quite knows why.\n", "- Every so often there is a competition to see which country has the best ball-kickers." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "The problem\n", "===========" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "I made the mistake of talking too much at a lunch with my girlfriend's family (it happens to me a lot).\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- Family prediction pool!!\n", "- Participation is mandatory\n", "- The loser cooks for everyone\n", "- I know as much about cooking as I do about football" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "The solution\n", "============" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- Build a **classifier**\n", "- Inputs: matches\n", "- Outputs: who wins each match" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 1: get the data\n", "====================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- This step is usually a struggle, and it was (where the data lives, format, completeness, etc.).\n", "- Best source: Wikipedia\n", "- Format: HTML, not very standardized --> scraping + copy/paste --> CSV" ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 2: preprocess the data\n", "===========================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- \"**Numerize**\" team names --> replace them with team statistics\n", "- **Normalize** values\n", "- Counteract the bias from team **order** (which team is listed first)\n", "- **Focus** on what it can predict best: wins vs. draws." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 3: is the data classifiable?\n", "=================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- You can't plot all the dimensions at once\n", "- Some obvious intuitions were confirmed" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 4: choose an algorithm\n", "===========================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- K-nearest neighbors?\n", "- Neural networks?" ] },
{ "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 5: training\n", "================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- Backpropagation\n", "- Avoid overfitting" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "Step 6: predict!\n", "================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "- Don't forget to apply the same preprocessing (normalization, etc.) to the new matches\n", "- Predict both ways (A vs B and B vs A)\n", "- All against all (every possible pairing)\n", "- Retraining between stages" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "While we're at it, compete online on El Ega."
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "The result\n", "==========" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Family prediction pool:" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "**First place!!**" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "El Ega (200+ people):" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "**First place!!!!!!!**" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Conclusions\n", "===========" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Machine Learning can **easily** be applied to real, not-so-big problems, with results that are **good enough** for the problem (in this case, better than the human players).\n", "\n", "Don't be afraid of it."
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "?\n", "====" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "Useful links\n", "============\n", "\n", "The World Cup predictor code, rendered with sample data and plots:\n", "\n", "http://nbviewer.ipython.org/github/fisadev/world_cup_learning/blob/master/learn.ipynb\n", "\n", "Its source code on GitHub:\n", "\n", "https://github.com/fisadev/world_cup_learning" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2+" } }, "nbformat": 4, "nbformat_minor": 0 }