{"cells":[{"cell_type":"markdown","source":"# Feature Selection: Wrapper Methods\n\nIn wrapper methods, feature selection is driven by the specific machine learning algorithm that we want to fit to a given dataset.\n\nEvaluating every possible combination of features quickly becomes infeasible, so these methods usually follow a greedy search: candidate feature subsets are evaluated against an evaluation criterion, one step at a time. The evaluation criterion is simply a performance measure that depends on the type of problem. For regression it can be p-values, R-squared or adjusted R-squared; for classification it can be accuracy, precision, recall, F1 score, etc. Finally, the method selects the combination of features that gives the best results for the specified machine learning algorithm.\n\n![whapeer.gif]()","metadata":{"id":"_M75CFhDyXXg","cell_id":"d5b7f6a40bbd418fbf14242fe717338f","deepnote_cell_type":"markdown"}},{"cell_type":"markdown","source":"The most common methods are:\n1. Forward selection\n2. Backward elimination\n3. Bi-directional elimination (stepwise)\n\nLet's now analyze these methods with an example using the Boston house-prices dataset available in sklearn. The dataset contains 506 observations of 14 attributes: 13 features plus the target variable (the median home value). 
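Before loading the data, it helps to see why the three methods above are called greedy. Instead of scoring every subset of features, they add or remove one feature per step. The pure-Python sketch below compares an exhaustive search with a greedy forward search. The helpers `exhaustive_search`, `greedy_forward` and `toy_score` are hypothetical illustrations. `toy_score` stands in for fitting a real model: it uses invented weights with a fixed per-feature penalty, loosely mimicking a size-penalized criterion such as adjusted R-squared. Its weights are not estimated from the Boston data.

```python
from itertools import combinations


def exhaustive_search(features, score):
    # Score every non-empty subset: 2**p - 1 candidates for p features.
    best_subset, best_score, n_evals = None, float('-inf'), 0
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            n_evals += 1
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, n_evals


def greedy_forward(features, score):
    # Forward selection: at each step add the single feature that raises
    # the score the most; stop when no remaining feature improves it.
    selected, current, n_evals = [], float('-inf'), 0
    remaining = list(features)
    while remaining:
        candidates = []
        for f in remaining:
            n_evals += 1
            candidates.append((score(tuple(selected + [f])), f))
        best_score, best_f = max(candidates)
        if best_score <= current:
            break
        selected.append(best_f)
        remaining.remove(best_f)
        current = best_score
    return tuple(selected), n_evals


# Hypothetical weights, for illustration only (not fitted to any data).
WEIGHTS = {'LSTAT': 5.0, 'RM': 4.0, 'PTRATIO': 2.0, 'AGE': 0.1, 'INDUS': 0.05}


def toy_score(subset):
    # Reward useful features and penalize model size with a fixed cost of 0.5 each.
    return sum(WEIGHTS[f] for f in subset) - 0.5 * len(subset)


features = list(WEIGHTS)
print(exhaustive_search(features, toy_score))  # (('LSTAT', 'RM', 'PTRATIO'), 31)
print(greedy_forward(features, toy_score))     # (('LSTAT', 'RM', 'PTRATIO'), 14)
```

Both searches find the same subset here. However, the exhaustive search scores $2^5 - 1 = 31$ subsets, while the greedy search scores only 14. In general, the greedy search is not guaranteed to find the best subset. Its advantage is cost: with the 13 Boston features, exhaustive search means $2^{13} - 1 = 8191$ subsets, compared with at most 13 + 12 + ... + 1 = 91 models for forward selection.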
The dataset can be imported with the load_boston() function from the sklearn.datasets module.","metadata":{"id":"tfSn3TAxyt8R","cell_id":"e00ea4a890d644febf97c553f7c88eee","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,\n# so this cell needs scikit-learn < 1.2 (see the warning in the output).\nfrom sklearn.datasets import load_boston\nboston = load_boston()\nprint(boston.data.shape) # dataset dimensions\nprint(boston.feature_names) # feature names\nprint(boston.target) # target variable\nprint(boston.DESCR) # dataset description","metadata":{"id":"6ptXwgv9Nbvg","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"d7493c525e6e49fd8d83abfdbb0dd1d4","outputId":"c7382232-66f6-427b-aa5f-e565c4e740d2","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":451,"user_tz":300,"timestamp":1642854243684},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"(506, 13)\n['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'\n 'B' 'LSTAT']\n[24. 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 15. 18.9 21.7 20.4\n 18.2 19.9 23.1 17.5 20.2 18.2 13.6 19.6 15.2 14.5 15.6 13.9 16.6 14.8\n 18.4 21. 12.7 14.5 13.2 13.1 13.5 18.9 20. 21. 24.7 30.8 34.9 26.6\n 25.3 24.7 21.2 19.3 20. 16.6 14.4 19.4 19.7 20.5 25. 23.4 18.9 35.4\n 24.7 31.6 23.3 19.6 18.7 16. 22.2 25. 33. 23.5 19.4 22. 17.4 20.9\n 24.2 21.7 22.8 23.4 24.1 21.4 20. 20.8 21.2 20.3 28. 23.9 24.8 22.9\n 23.9 26.6 22.5 22.2 23.6 28.7 22.6 22. 22.9 25. 20.6 28.4 21.4 38.7\n 43.8 33.2 27.5 26.5 18.6 19.3 20.1 19.5 19.5 20.4 19.8 19.4 21.7 22.8\n 18.8 18.7 18.5 18.3 21.2 19.2 20.4 19.3 22. 20.3 20.5 17.3 18.8 21.4\n 15.7 16.2 18. 14.3 19.2 19.6 23. 18.4 15.6 18.1 17.4 17.1 13.3 17.8\n 14. 14.4 13.4 15.6 11.8 13.8 15.6 14.6 17.8 15.4 21.5 19.6 15.3 19.4\n 17. 15.6 13.1 41.3 24.3 23.3 27. 50. 50. 50. 22.7 25. 50. 
23.8\n 23.8 22.3 17.4 19.1 23.1 23.6 22.6 29.4 23.2 24.6 29.9 37.2 39.8 36.2\n 37.9 32.5 26.4 29.6 50. 32. 29.8 34.9 37. 30.5 36.4 31.1 29.1 50.\n 33.3 30.3 34.6 34.9 32.9 24.1 42.3 48.5 50. 22.6 24.4 22.5 24.4 20.\n 21.7 19.3 22.4 28.1 23.7 25. 23.3 28.7 21.5 23. 26.7 21.7 27.5 30.1\n 44.8 50. 37.6 31.6 46.7 31.5 24.3 31.7 41.7 48.3 29. 24. 25.1 31.5\n 23.7 23.3 22. 20.1 22.2 23.7 17.6 18.5 24.3 20.5 24.5 26.2 24.4 24.8\n 29.6 42.8 21.9 20.9 44. 50. 36. 30.1 33.8 43.1 48.8 31. 36.5 22.8\n 30.7 50. 43.5 20.7 21.1 25.2 24.4 35.2 32.4 32. 33.2 33.1 29.1 35.1\n 45.4 35.4 46. 50. 32.2 22. 20.1 23.2 22.3 24.8 28.5 37.3 27.9 23.9\n 21.7 28.6 27.1 20.3 22.5 29. 24.8 22. 26.4 33.1 36.1 28.4 33.4 28.2\n 22.8 20.3 16.1 22.1 19.4 21.6 23.8 16.2 17.8 19.8 23.1 21. 23.8 23.1\n 20.4 18.5 25. 24.6 23. 22.2 19.3 22.6 19.8 17.1 19.4 22.2 20.7 21.1\n 19.5 18.5 20.6 19. 18.7 32.7 16.5 23.9 31.2 17.5 17.2 23.1 24.5 26.6\n 22.9 24.1 18.6 30.1 18.2 20.6 17.8 21.7 22.7 22.6 25. 19.9 20.8 16.8\n 21.9 27.5 21.9 23.1 50. 50. 50. 50. 50. 13.8 13.8 15. 13.9 13.3\n 13.1 10.2 10.4 10.9 11.3 12.3 8.8 7.2 10.5 7.4 10.2 11.5 15.1 23.2\n 9.7 13.8 12.7 13.1 12.5 8.5 5. 6.3 5.6 7.2 12.1 8.3 8.5 5.\n 11.9 27.9 17.2 27.5 15. 17.2 17.9 16.3 7. 7.2 7.5 10.4 8.8 8.4\n 16.7 14.2 20.8 13.4 11.7 8.3 10.2 10.9 11. 9.5 14.5 14.1 16.1 14.3\n 11.7 13.4 9.6 8.7 8.4 12.8 10.5 17.1 18.4 15.4 10.8 11.8 14.9 12.6\n 14.1 13. 13.4 15.2 16.1 17.8 14.9 14.1 12.7 13.5 14.9 20. 16.4 17.7\n 19.5 20.2 21.4 19.9 19. 19.1 19.1 20.1 19.9 19.6 23.2 29.8 13.8 13.3\n 16.7 12. 14.6 21.4 23. 23.7 25. 21.8 20.6 21.2 19.1 20.6 15.2 7.\n 8.1 13.6 20.1 21.8 24.5 23.1 19.7 18.3 21.2 17.5 16.8 22.4 20.6 23.9\n 22. 11.9]\n.. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:** \n\n :Number of Instances: 506 \n\n :Number of Attributes: 13 numeric/categorical predictive. 
Median Value (attribute 14) is usually the target.\n\n :Attribute Information (in order):\n - CRIM per capita crime rate by town\n - ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n - INDUS proportion of non-retail business acres per town\n - CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n - NOX nitric oxides concentration (parts per 10 million)\n - RM average number of rooms per dwelling\n - AGE proportion of owner-occupied units built prior to 1940\n - DIS weighted distances to five Boston employment centres\n - RAD index of accessibility to radial highways\n - TAX full-value property-tax rate per $10,000\n - PTRATIO pupil-teacher ratio by town\n - B 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town\n - LSTAT % lower status of the population\n - MEDV Median value of owner-occupied homes in $1000's\n\n :Missing Attribute Values: None\n\n :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980. N.B. Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems. \n \n.. topic:: References\n\n - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. 
Morgan Kaufmann.\n\n"},{"output_type":"stream","name":"stderr","text":"/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.\n\n The Boston housing prices dataset has an ethical problem. You can refer to\n the documentation of this function for further details.\n\n The scikit-learn maintainers therefore strongly discourage the use of this\n dataset unless the purpose of the code is to study and educate about\n ethical issues in data science and machine learning.\n\n In this special case, you can fetch the dataset from the original\n source::\n\n import pandas as pd\n import numpy as np\n\n\n data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n raw_df = pd.read_csv(data_url, sep=\"\\s+\", skiprows=22, header=None)\n data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n target = raw_df.values[1::2, 2]\n\n Alternative datasets include the California housing dataset (i.e.\n :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing\n dataset. 
You can load the datasets as follows::\n\n from sklearn.datasets import fetch_california_housing\n housing = fetch_california_housing()\n\n for the California housing dataset and::\n\n from sklearn.datasets import fetch_openml\n housing = fetch_openml(name=\"house_prices\", as_frame=True)\n\n for the Ames housing dataset.\n \n warnings.warn(msg, category=FutureWarning)\n"}],"execution_count":1},{"cell_type":"markdown","source":"Let's convert this raw data into a DataFrame that holds the feature values under their feature names, together with the target variable.","metadata":{"id":"VZHZMmqozVXV","cell_id":"0b383a4662a3485ab34b18a5d6529f78","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import pandas as pd\nbos = pd.DataFrame(boston.data, columns = boston.feature_names)\nbos['Price'] = boston.target\nX = bos.drop(columns=\"Price\") # feature matrix\ny = bos['Price'] # target variable\nbos.head()","metadata":{"id":"Jw7Z9pM0zYX2","colab":{"height":206,"base_uri":"https://localhost:8080/"},"cell_id":"0e566c45452846d0822bc1e589e1c5cc","outputId":"a36f70a3-412e-4133-b23b-d83d3d763694","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":497,"user_tz":300,"timestamp":1642854247373},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/html":"\n
\n ","text/plain":" CRIM ZN INDUS CHAS NOX ... TAX PTRATIO B LSTAT Price\n0 0.00632 18.0 2.31 0.0 0.538 ... 296.0 15.3 396.90 4.98 24.0\n1 0.02731 0.0 7.07 0.0 0.469 ... 242.0 17.8 396.90 9.14 21.6\n2 0.02729 0.0 7.07 0.0 0.469 ... 242.0 17.8 392.83 4.03 34.7\n3 0.03237 0.0 2.18 0.0 0.458 ... 222.0 18.7 394.63 2.94 33.4\n4 0.06905 0.0 2.18 0.0 0.458 ... 222.0 18.7 396.90 5.33 36.2\n\n[5 rows x 14 columns]"},"metadata":{},"execution_count":2}],"execution_count":2},{"cell_type":"code","source":"X","metadata":{"id":"nmALkm_46yZp","colab":{"height":424,"base_uri":"https://localhost:8080/"},"cell_id":"9ac90433a2a14d79a5b8e3cda5a68151","outputId":"63bc5166-db1d-468a-ca0a-1d461a621964","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":732,"user_tz":300,"timestamp":1642854249896},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/html":"\n
\n ","text/plain":" CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT\n0 0.00632 18.0 2.31 0.0 0.538 ... 1.0 296.0 15.3 396.90 4.98\n1 0.02731 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 396.90 9.14\n2 0.02729 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 392.83 4.03\n3 0.03237 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 394.63 2.94\n4 0.06905 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 396.90 5.33\n.. ... ... ... ... ... ... ... ... ... ... ...\n501 0.06263 0.0 11.93 0.0 0.573 ... 1.0 273.0 21.0 391.99 9.67\n502 0.04527 0.0 11.93 0.0 0.573 ... 1.0 273.0 21.0 396.90 9.08\n503 0.06076 0.0 11.93 0.0 0.573 ... 1.0 273.0 21.0 396.90 5.64\n504 0.10959 0.0 11.93 0.0 0.573 ... 1.0 273.0 21.0 393.45 6.48\n505 0.04741 0.0 11.93 0.0 0.573 ... 1.0 273.0 21.0 396.90 7.88\n\n[506 rows x 13 columns]"},"metadata":{},"execution_count":3}],"execution_count":3},{"cell_type":"code","source":"y","metadata":{"id":"zhmk4xaP6zV8","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"254324ae9cdb4c6aaca2b60aba043d8f","outputId":"1f138417-b5c5-4b52-fc55-9e429662c1aa","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":5,"user_tz":300,"timestamp":1642854251719},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"0 24.0\n1 21.6\n2 34.7\n3 33.4\n4 36.2\n ... \n501 22.4\n502 20.6\n503 23.9\n504 22.0\n505 11.9\nName: Price, Length: 506, dtype: float64"},"metadata":{},"execution_count":4}],"execution_count":4},{"cell_type":"markdown","source":"## Forward selection\n\nIn forward selection, we start with a null model, fit one model for each individual feature, and select the feature with the lowest p-value. Next, we fit models with two features by combining the feature already selected with each of the remaining features, and again select the feature with the lowest p-value. Then we fit models with three features by combining the two previously selected features with each remaining feature. This process is repeated until no remaining feature enters the model with a p-value below the significance level.\n\nIn summary, the steps of forward selection are:\n\n1. Choose a significance level (e.g. SL = 0.05, i.e. 95% confidence).\n\n2. Fit all possible simple regression models, considering one feature at a time; with n features there are n such models. Select the feature with the lowest p-value.\n\n3. Fit all possible models with one extra feature added to the previously selected features.\n\n4. Again, select the feature with the lowest p-value. If $p_v < SL$, go to Step 3; otherwise, stop.","metadata":{}},{"cell_type":"code","source":"import pandas as pd\nimport statsmodels.api as sm\n\ndef forward_selection(data, target, significance_level=0.05):\n    # Start from the null model and add one feature per iteration\n    initial_features = data.columns.tolist()\n    best_features = []\n    while len(best_features) < len(initial_features):\n        remaining_features = [f for f in initial_features if f not in best_features]\n        new_pval = pd.Series(index=remaining_features, dtype=float)\n        # Fit one OLS model per candidate and store the candidate's p-value\n        for new_column in remaining_features:\n            model = sm.OLS(target, sm.add_constant(data[best_features + [new_column]])).fit()\n            new_pval[new_column] = model.pvalues[new_column]\n        min_p_value = new_pval.min()\n        # Add the best candidate only if it is significant; otherwise stop\n        if min_p_value < significance_level:\n            best_features.append(new_pval.idxmin())\n        else:\n            break\n    return best_features","metadata":{},"outputs":[],"execution_count":null},{"cell_type":"code","source":"forward_selection(X, y)","metadata":{},"outputs":[],"execution_count":null},{"cell_type":"code","source":"!pip install mlxtend","metadata":{},"outputs":[{"output_type":"stream","name":"stdout","text":"Requirement already satisfied: scikit-learn>=0.18 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.0.2)\nRequirement already satisfied: pandas>=0.17.1 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.1.5)\nRequirement already satisfied: matplotlib>=1.5.1 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (3.2.2)\nRequirement already satisfied: numpy>=1.10.4 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.19.5)\nRequirement already satisfied: scipy>=0.17 in /usr/local/lib/python3.7/dist-packages (from mlxtend) (1.4.1)\nRequirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=1.5.1->mlxtend) (3.0.6)\nRequirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=1.5.1->mlxtend) (0.11.0)\nRequirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=1.5.1->mlxtend) (2.8.2)\nRequirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=1.5.1->mlxtend) (1.3.2)\nRequirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.17.1->mlxtend) (2018.9)\nRequirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib>=1.5.1->mlxtend) (1.15.0)\nRequirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.18->mlxtend) (1.1.0)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.18->mlxtend) 
(3.0.0)\n"}],"execution_count":7},{"cell_type":"code","source":"# Workaround for older mlxtend releases that still import sklearn.externals.joblib,\n# which newer scikit-learn versions no longer provide\nimport sys\nimport joblib\nsys.modules['sklearn.externals.joblib'] = joblib","metadata":{"id":"XfDg04ZrKt8G","cell_id":"87f1024307ba4e5d9d1c168c73af659f","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":14,"user_tz":300,"timestamp":1642854269914},"deepnote_cell_type":"code"},"outputs":[],"execution_count":8},{"cell_type":"code","source":"# Libraries\nfrom mlxtend.feature_selection import SequentialFeatureSelector as SFS\nfrom sklearn.linear_model import LinearRegression\n\n# Sequential Forward Selection (SFS)\nsfs = SFS(LinearRegression(),\n          k_features=11,\n          forward=True,\n          floating=False,\n          scoring='r2',\n          cv=0)  # cv=0 disables cross-validation","metadata":{"id":"APKIL2fl0enV","cell_id":"652596ae707f49dc8555f25cd88027c4","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":605,"user_tz":300,"timestamp":1642854272747},"deepnote_cell_type":"code"},"outputs":[],"execution_count":9},{"cell_type":"markdown","source":"The SequentialFeatureSelector() function accepts the following main arguments:\n\n* LinearRegression() is the estimator used throughout the process. Likewise, it can be any classification algorithm.\n\n* k_features is the number of features to select. Any value from 1 up to the total number of features is valid; a good value can be found by analyzing and plotting the scores obtained for different numbers of features.\n\n* forward=True and floating=False select the plain (non-floating) sequential forward selection technique.\n\n* The scoring argument specifies the evaluation criterion. For regressors the default is the $R^2$ score ('r2'), although other scikit-learn regression scorers can be passed as well. Similarly, for classification it can be accuracy, precision, recall, F1 score, etc.\n\n* The cv argument sets the number of folds for k-fold cross-validation; cv=0, as used here, disables cross-validation, so each subset is scored on the same data it was fitted on.","metadata":{"id":"cPLrRXnB0mVy","cell_id":"ad8f2c0331d44439916ada380637a7c6","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"sfs.fit(X, y)\nsfs.k_feature_names_ # final list of selected features","metadata":{"id":"vXyZUkKR0-0H","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"2789a994d8dc4298ac28f667880f7d22","outputId":"13b5f638-9bcf-4281-d797-bcf7c09bff35","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":620,"user_tz":300,"timestamp":1642854275431},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"('CRIM',\n 'ZN',\n 'CHAS',\n 'NOX',\n 'RM',\n 'DIS',\n 'RAD',\n 'TAX',\n 'PTRATIO',\n 'B',\n 'LSTAT')"},"metadata":{},"execution_count":10}],"execution_count":10},{"cell_type":"markdown","source":"## Backward elimination\n\nIn backward elimination, we start with the full model (including all independent variables) and then remove the least significant feature, i.e. the one with the highest p-value, as long as that p-value exceeds the significance level. This process is repeated until only significant features remain.\n\nIn summary, the steps of backward elimination are:\n\n1. Choose a significance level (e.g. SL = 0.05, i.e. 95% confidence).\n\n2. Fit the full model with all the features.\n\n3. Consider the feature with the highest p-value. If $p > SL$, go to Step 4; otherwise, stop.\n\n4. Remove the feature under consideration.\n\n5. Fit a model without this feature and repeat the whole process from Step 3.\n\nNow let's do the same with the Boston housing data.\n\n![gif2.gif]()","metadata":{"id":"zvhK-H7K1ULX","cell_id":"71f803e3923246bbb4b024b88bdeab33","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import statsmodels.api as sm\n\ndef backward_elimination(data, target, significance_level=0.05):\n    # Start from the full model and drop one feature per iteration\n    features = data.columns.tolist()\n    while len(features) > 0:\n        features_with_constant = sm.add_constant(data[features])\n        # Skip the intercept's p-value (first entry)\n        p_values = sm.OLS(target, features_with_constant).fit().pvalues.iloc[1:]\n        max_p_value = p_values.max()\n        if max_p_value >= significance_level:\n            # Remove the least significant feature and refit\n            excluded_feature = p_values.idxmax()\n            features.remove(excluded_feature)\n        else:\n            break\n    return features","metadata":{"id":"1G4A-KmQ1mtc","cell_id":"6fa29c20316743ceac2fd8862b26fcc4","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":619,"user_tz":300,"timestamp":1642854280549},"deepnote_cell_type":"code"},"outputs":[],"execution_count":11},{"cell_type":"code","source":"backward_elimination(X,y)","metadata":{"id":"grZpRRre1tnT","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"146556c34dce4c5da9c890cee3982320","outputId":"8f5209c2-7200-4abc-b0b2-1c7d79aed481","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":483,"user_tz":300,"timestamp":1642854284297},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"['CRIM',\n 'ZN',\n 'CHAS',\n 'NOX',\n 'RM',\n 'DIS',\n 'RAD',\n 'TAX',\n 'PTRATIO',\n 'B',\n 
'LSTAT']"},"metadata":{},"execution_count":12}],"execution_count":12},{"cell_type":"markdown","source":"## Bi-directional elimination (stepwise)\n\nThis is similar to forward selection, with one difference. Every time a new feature is added, the method also re-checks the significance of the features already in the model. If any previously selected feature has become insignificant, it removes that feature through backward elimination.\n\nIt is therefore a combination of forward selection and backward elimination.\n\nIn summary, the steps of bi-directional elimination are:\n\n1. Choose a significance level to enter and to exit the model (e.g. $SL_{in} = 0.05$ and $SL_{out} = 0.05$, i.e. 95% confidence).\n\n2. Perform the next step of forward selection (the newly added feature must have $p < SL_{in}$ to enter the model).\n\n3. Perform all the steps of backward elimination (any previously added feature with $p > SL_{out}$ is removed from the model).\n\n4. Repeat Steps 2 and 3 until we obtain a final, stable set of features.\n\nLet's do the same with the Boston housing data.\n\n![gif3.gif]()","metadata":{"id":"5YFar6SQ2rfs","cell_id":"fb08a84663a443259154bcbcf4f191b2","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"def stepwise_selection(data, target, SL_in=0.05, SL_out=0.05):\n    initial_features = data.columns.tolist()\n    best_features = []\n    while len(best_features) < len(initial_features):\n        # Forward step: find the most significant feature not yet in the model\n        remaining_features = [f for f in initial_features if f not in best_features]\n        new_pval = pd.Series(index=remaining_features, dtype=float)\n        for new_column in remaining_features:\n            model = sm.OLS(target, sm.add_constant(data[best_features + [new_column]])).fit()\n            new_pval[new_column] = model.pvalues[new_column]\n        min_p_value = new_pval.min()\n        if min_p_value < SL_in:\n            best_features.append(new_pval.idxmin())\n            # Backward step: drop features that are no longer significant\n            while len(best_features) > 0:\n                best_features_with_constant = sm.add_constant(data[best_features])\n                p_values = sm.OLS(target, best_features_with_constant).fit().pvalues.iloc[1:]\n                max_p_value = p_values.max()\n                if max_p_value >= SL_out:\n                    excluded_feature = p_values.idxmax()\n                    best_features.remove(excluded_feature)\n                else:\n                    break\n        else:\n            break\n    return best_features","metadata":{"id":"sxrV9JH83Kcq","cell_id":"27323ec074bd4ec6a9e3b480e25b29ae","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":411,"user_tz":300,"timestamp":1642854338286},"deepnote_cell_type":"code"},"outputs":[],"execution_count":16},{"cell_type":"code","source":"stepwise_selection(X,y)","metadata":{"id":"o00UFGmj3NfE","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"49f45af329814ea2aab28368abc3f858","outputId":"bdde06ae-d83d-4941-e2e3-542269a78a11","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":485,"user_tz":300,"timestamp":1642854341346},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stderr","text":"/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:6: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. 
Specify a dtype explicitly to silence this warning.\n \n"},{"output_type":"execute_result","data":{"text/plain":"['LSTAT',\n 'RM',\n 'PTRATIO',\n 'DIS',\n 'NOX',\n 'CHAS',\n 'B',\n 'ZN',\n 'CRIM',\n 'RAD',\n 'TAX']"},"metadata":{},"execution_count":17}],"execution_count":17},{"cell_type":"markdown","source":"# Metrics for classification algorithms","metadata":{"id":"7E11V-Eb20tN","cell_id":"f3733c8c56f0410196b5f921833d008c","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"from sklearn.datasets import load_breast_cancer\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.model_selection import train_test_split\nfrom sklearn import metrics\nimport pandas as pd\nimport numpy as np\nfrom matplotlib import pyplot as plt\nimport seaborn as sns\nsns.set_style('whitegrid')","metadata":{"id":"VLq1HCao22dh","cell_id":"f775083a202f4497b611584a40c3315b","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":878,"user_tz":300,"timestamp":1642856182078},"deepnote_cell_type":"code"},"outputs":[],"execution_count":19},{"cell_type":"code","source":"# Load the breast cancer dataset\ndata = load_breast_cancer()\n# Define the design matrix X and the response vector y.\n# abs(target - 1) flips the labels: scikit-learn encodes malignant as 0 and benign as 1,\n# so after the flip malignant tumours are the positive class (1)\nX = pd.DataFrame(data['data'], columns=data['feature_names'])\ny = abs(pd.Series(data['target'])-1)\n# Split into training/test sets (80/20)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)\n# Create a Random Forest model with default parameters\nmodelo = RandomForestClassifier(random_state=1)\nmodelo.fit(X_train, y_train)\n# Get the model's predictions on X_test\npreds = modelo.predict(X_test)","metadata":{"id":"dWhIMamm2_GC","cell_id":"f5697580cefb4670b294240d1fab2133","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":413,"user_tz":300,"timestamp":1642856281921},"deepnote_cell_type":"code"},"outputs":[],"execution_count":21},{"cell_type":"code","source":"fig, ax = plt.subplots(figsize=(10, 6))\n# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator replaces it\nmetrics.ConfusionMatrixDisplay.from_estimator(modelo, X_test, y_test, display_labels=['Negative', 'Positive'], ax=ax)","metadata":{"id":"yZzLOBUf3XUx","colab":{"height":368,"base_uri":"https://localhost:8080/"},"cell_id":"15cfb046f69e4413a6c08007e49116bb","outputId":"0d389117-320f-460a-9776-def52d8371f9","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":631,"user_tz":300,"timestamp":1642856346359},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stderr","text":"/usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function plot_confusion_matrix is deprecated; Function `plot_confusion_matrix` is deprecated in 1.0 and will be removed in 1.2. Use one of the class methods: ConfusionMatrixDisplay.from_predictions or ConfusionMatrixDisplay.from_estimator.\n warnings.warn(msg, category=FutureWarning)\n"},{"output_type":"execute_result","data":{"text/plain":""},"metadata":{},"execution_count":24},{"output_type":"display_data","data":{"text/plain":"
"},"metadata":{}},{"output_type":"display_data","data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAVgAAAEGCAYAAAAg6I3HAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de1xUdf4/8NfhfpPLIOCNrTRTUzdTStgUBX4ichHCC2brmtsuWiqJbjdJ8/LAbNN0kbWW1XbRddWWCExFUUoxULxBpqGt5ireoO8gKteBmc/vD9bZSJgZdA4zjK9nj/N4eM7M+Zz3oXp5+JzP+RxJCCFARERGZ2XqAoiILBUDlohIJgxYIiKZMGCJiGTCgCUikomNqQvoDEpOfgU7mx9NXYbRqdResLO2vPMCgH9/42TqEmTh0d0NN6/fMnUZsnhkcC8MGTLkvvf//vQnaNJ0Mei7to5D0Ldv3/s+lqEYsAaws/kR/b2TTF2G0Z2tSLbI8wKA16bf//+o5mxO+nSkTk83dRmySDmz7IH2b9J0Mfi/53M3sx/oWIZiwBKRhRDQQGPqIlpgwBKRRRAAGoXaKG398MMPSExM1K6XlZUhISEBMTExSExMxNWrV9GzZ0+sXbsWbm5ubbbDm1xEZCGar2AN+Uef3r17Izs7G9nZ2cjMzISjoyPGjBmDtLQ0BAQEIDc3FwEBAUhLS9PZDgOWiCyCAKAWwqClPQ4fPgxfX1/07NkTeXl5iImJAQDExMRg//79OvdlFwERWQwNDAvPyspKxMbGatfj4uIQFxfX6nd37dqFyMhIAIBSqYS3tzcAwMvLC0qlUudxGLBEZBEEALWBAatQKJCZman3eyqVCl9++SUWLFhwz2eSJEGSJJ37s4uAiCyEgMbAxVD5+fkYOHAgunbtCgDw9PRERUUFAKCiogIKhULn/gxYIrIIQgCNQhi0GGrXrl2IiIjQrgcHByMrKwsAkJWVhZCQEJ37M2CJyCLc7SIwZDFEbW0tCgsLERoaqt0WHx+PgoIChIaGorCwEPHx8TrbYB8sEVmE5lEExmvPyckJRUVFLbZ5eHggPd3wJ+kYsERkMczrOS4GLBFZiOYuAt139TsaA5aILELzo7KGBWxHxTADlogshGTwFWxHBR8DlogsggCgMfAKtqMwYInIYrAPlohIBgKAxsyG9jNgichCSOwiICKSg0YAKmFt6jJaYMASkYWQ2EVARCQHPmhARCQjteAVLBGR0TWPIuAVLBGR0QlIUAnzijTzqoaI6L7xJhcRkSya54NlFwERkSzUvIIlIjI+AQkajiIgIpIHr2CJiGQghIRGPipLRGR8zTe5zOsK1ryqISJ6ABpIBi2GuH37NhISEhAWFoZx48ahuLgYVVVVmDFjBkJDQzFjxgzcunVLZxsMWCKyCAIS1MLKoMUQycnJGDlyJPbs2YPs7Gz06dMHaWlpCAgIQG5uLgICApCWlqazDQYsEVmE5slerAxa9Llz5w6OHTuGiRMnAgDs7Ozg6uqKvLw8xMTEAABiYmKwf/9+ne2wD5aILIaxJty+cuUKFAoF3n77bZw9exYDBw5EUlISlEolvL29AQBeXl5QKpU62+EVLBFZBAEJjcLGoKWyshKxsbHaZfv27S3aampqwnfffYcXXngBWVlZcHR0vKc7QJIkSJLuQOcVLBFZhPbMB6tQKJCZmdnm5926dUO3bt3w1FNPAQDCwsKQlpYGT09PVFRUwNvbGxUVFVAoFDqPwytYIrIMovlJLkMWfby8vNCtWzf88MMPAIDDhw+jT58+CA4ORlZWFgAgKysLISEhOtvhFSwRWQRjv9Fg0aJF+MMf/oDGxkb4+vrivffeg0ajwbx585CRkYEePXpg7dq1OttgwBKRxTB4LgIDcnjAgAGtdiOkp6c
bXA8DlogsQvNNLj4qS0RkdOb4qCwDlogshGS0cbDGwoAlIotw90kuc8KAJSLLIIz3JJexMGCJyCIIvvSQiEgeAkCjhgFLRCQLvpOLiEgGApJRn+QyBgbsQ6LsvD1WzHpUu37jsh3Gzvoeh6p74Mg+V9jaCXR/pAEL1pTBxU1tukKpTX6jbyPoV+9hWMEt5GxV4NNUH1OXZHbM7SaXbNfT/fr1w8qVK7XrGzduxLp164x+nI8//rjF+pQpU4x+DEvg+3gDPtp/Dh/tP4fUvedg76jBL4N7YmjgHaR9dRYf551Dz94N2LbO29SlUiusrARmr7iKouJ4/H50PwRFV+EXfetNXZZ5ETDaZC/GItuR7OzskJubi8rKSrkOAQD4y1/+0mJ927Ztsh7PEpQc6oLujzRA0cMZw0bfgfV/f48ZMKwW/3fd1rTFUav6PV2La/+xQ22dJ5oarXAg2x0BY3W/D+phI2Dcd3IZg2wBa2Njg7i4uFYnRqisrMTcuXMxYcIETJgwASdOnNBunzFjBiIiIpCUlISgoCBtQL/66quIjY1FRESEdnLcVatWob6+HtHR0ViwYAEA4OmnnwYAJCYm4sCBA9pjvvXWW9izZw/UajXef/99TJgwAVFRUQ9lIB/IdsfomKp7tu/dqsAzwXdMUBHp49mtET9es9Ou/991W3Tt3mjCisyPgIRGjbVBS0eRtQ/2xRdfxPjx4/G73/2uxfbk5GRMnz4dfn5+uHbtGl5++WXk5OQgNTUV/v7+mDlzJvLz85GRkaHdZ8WKFXB3d0d9fT0mTpyI0NBQ/OEPf8CWLVuQnZ19z7HDw8ORk5OD0aNHQ6VS4fDhw1iyZAkyMjLQpUsXfPbZZ1CpVJgyZQqee+45+Pr6tnkeKrUXzlYkG+8HY0JNjRoU7P0CI+PHor6pp/a8cv9ailp1JXqM+BXOVphXP9b9mJPuZOoSjKq79zfw6noWN+o8MSd9Onp1Pw5310uY02uCqUszK+bWBytrwLq4uCA6OhqbNm2Cg4ODdnthYSHOnz+vXa+urkZNTQ1OnDiB1NRUAEBgYCDc3Ny039m8eTP27dsHALh+/TouXboEDw+PNo8dGBiI5ORkqFQq5Ofnw8/PDw4ODigoKMC5c+ewd+9eAM0vN7t06ZLOgLWz/hH9vZPu74dgZgr3uKLfL7vi2QHLcbYiGf29k5C7XYEfjnhi5fbzcHDabeoSjeK16UNMXYJRDRhWg18vuIGKRiVSp6cjbk45AGB7quFT55m7lDPLHmj/u10E5kT2UQTTp0/XvvfmLo1Gg08//RT29vYGtVFUVITCwkJs374djo6OmDZtGhoaGnTuY29vj2effRaHDh1CTk4OwsPDAQBCCLzzzjsYOXLk/Z9UJ3Ygy6NF98Cxr7rgX+u98UHmv+HgJExYGelyrsQJPR9ToeqyEja2GoyOrsLK2Y+YuizzIsxvshfZb6e5u7sjLCysxa/7I0aMwObNm7XrpaWlAIChQ4ciJycHAPD111/j1q3mTvw7d+7Azc0Njo6OuHDhAkpKSrT72tjYoLGx9b6o8PBwZGZm4vjx49pAHTFiBLZu3ard5+LFi6itrTXiGZuv+lornDzUBSPC/xewf07qhdpqK7wd9zhe+X/98Kc3e5mwQmqLRi3hz0k94T80DX89eA75X7jj0vcO+nd8yJjbKIIOGQf729/+Flu2bNGuJyUlYdmyZYiKioJarYafnx+WLVuGOXPmYP78+dixYweGDBkCLy8vuLi4IDAwENu2bcO4cePw2GOPYciQ//36N3nyZIwfPx5PPvkkVq9e3eK4zz33HN544w2EhITAzq75BsGkSZNw9epVxMbGQggBDw8PrF+/viN+DCbn4KRBxpnTLbb9vbDURNVQex370hVfFc5F6nTL6RYwJgGg6WF5kqu4uFj7565du+Kbb77RrisUilbfZdOlSxds3LgRNjY2KC4uxrfffqsNxg0bNrR6nNdffx2vv/56q8e1tbX
F0aNHW3zfysoK8+fPx/z58+/vxIjILAnOB6vbtWvXMG/ePGg0Gtja2mL58uWmLomIOhEGrA6PPvqo9pW4RETtwvlgiYjkIcCAJSKSjTHHwQYHB8PZ2RlWVlawtrZGZmYmqqqqkJiYiKtXr6Jnz55Yu3Zti/H6P2det9yIiO6TgIQmjZVBi6HS09ORnZ2NzMxMAEBaWhoCAgKQm5uLgIAApKWl6dyfAUtEFkPz34cN9C33Ky8vDzExMQCAmJgY7N+/X+f32UVARBZBtOMmV2VlZYunS+Pi4hAXF3fP915++WVIkqT9XKlUwtu7eUpPLy8vKJVKncdhwBKRxRAGBqxCodD+2t+WrVu3wsfHB0qlEjNmzEDv3r1bfC5JEiRJ9/HYRUBEFsKwuWANvRHm49P8xghPT0+MGTMGp06dgqenJyoqKgAAFRUVUCgUOttgwBKRRbg7TMsYfbC1tbWorq7W/rmgoAB9+/ZFcHCwdqx+VlYWQkJCdLbDLgIishhqI722W6lUYvbs2c1tqtWIjIxEYGAgBg8ejHnz5iEjIwM9evRo9ZH/n2LAEpFFEMLwPlh9fH19sWPHjnu2e3h4tPqWlrYwYInIQnCyFyIi2QgzmzOeAUtEFuGhfGUMEVFHMdZNLmNhwBKRZRDsIiAikoWAZLRRBMbCgCUii8GAJSKSCYdpERHJQIB9sERE8hCAhqMIiIjkYWYXsAxYIrIUHEVARCQfM7uEZcASkUVovsnVSa5gly9frvN1CO+8844sBRER3RcBaDSdJGAHDRrUkXUQET24znIF+/zzz7dYr6urg6Ojo+wFERHdL3MbB6t30FhxcTHCw8Mxbtw4AMDZs2exZMkSuesiImo/YeDSQfQG7IoVK7Bx40a4u7sDAPr374/jx4/LXhgRUbv895UxhiwdxaBRBN27d2+xbmVlXk9LEBEB6HzDtLp3746TJ09CkiQ0NjZi06ZN6NOnT0fURkRkMAEJwsxGEei9FF2yZAm2bNmC8vJyjBw5EqWlpVi8eHFH1EZE1E6SgYth1Go1YmJiMHPmTABAWVkZJk2ahDFjxmDevHlQqVQ699d7BatQKLB69WqDCyIiMhkjdxHc/Y29uroaALBq1Sq89NJLiIiIwOLFi5GRkYGpU6e2ub/eK9iysjLMmjUL/v7+CAgIwCuvvIKysjLjnQERkTEYOoLAwBC+ceMGDhw4gIkTJzY3LwSOHDmCsWPHAmgeypqXl6ezDb0Bu2DBAoSFheHrr7/GoUOHEBYWhvnz5xtWIRFRRxKSQUtlZSViY2O1y/bt2+9pasWKFXj99de1N/Vv3rwJV1dX2Ng0/+LfrVs3lJeX6yxHbxdBXV0dYmJitOvR0dHYuHFju86ZiKgjGPqggUKhQGZmZpuff/XVV1AoFBg0aBCKioruu542A7aqqgoAEBgYiLS0NISHh0OSJOzevRujRo267wMSEcnGSKMITp48iS+//BL5+floaGhAdXU1kpOTcfv2bTQ1NcHGxgY3btyAj4+PznbaDNjY2FhIkgTx378Stm3bpv1MkiQsWLDAKCdCRGQUApCMdJNrwYIF2owrKirCJ598gtWrVyMhIQF79+5FREQEPv/8cwQHB+tsp82A/fLLL41TKRFRR5H5QYPXX38diYmJWLt2LQYMGIBJkybp/L5BT3J9//33OH/+fIsxXz/tlyUiMgsyPAY7fPhwDB8+HADg6+uLjIwMg/fVG7CpqakoKirChQsXMGrUKOTn52PYsGEMWCIyP2b2qKzeYVp79+5Feno6unbtivfeew/Z2dm4c+dOR9RGRGQ4AUBj4NJB9F7B2tvbw8rKCjY2NqiuroanpyeuX7/eEbUREbWD1Hkm3L5r0KBBuH37NiZNmoTY2Fg4OTnh6aef7ojaiIgMJsF4owiMRW/A3p1c+4UXXsDIkSNRXV2N/v37y10XEVH7dZaAPXPmTJs7nTlzBgMHDpSlICIiS9FmwK5cubLNnSRJwqZNm2QpyBy
d/84NC8LGmroMo3vlQ8s8LwD4d8qjpi5BFvW/cMa/U4abugyz1Wm6CDZv3tyRdRARPRgBoz0qaywGPWhARNQpdJYrWCKizqbTdBEQEXU6Zhawep/kEkIgOzsbqampAIBr167h1KlTshdGRNQuRn6jgTEY9NLDkpIS7Nq1CwDg7OyMpUuXyl4YEVF7ScKwpaPoDdhTp07h3Xffhb29PQDAzc0NjY2NshdGRNRuGsmwpYPo7YO1sbGBWq2GJDUXVVlZqX1HDRGROel0N7mmTZuG2bNnQ6lUYs2aNdizZw/mzZvXEbURERmug/tXDaE3YMePH4+BAwfiyJEjEEJg/fr16NOnT0fURkRksE452cu1a9fg6OiIoKCgFtt69Ogha2FERO3W2QJ25syZ2j83NDTgypUreOyxx7SjCoiIzIXUgZNpG0JvwH7xxRct1s+cOYN//vOfshVERGQp2v0k18CBA/mgARGZn854k+tvf/ub9s8ajQbfffcdvL29ZS2KiOh+GOsmV0NDA1588UWoVCqo1WqMHTsWCQkJKCsrw/z581FVVYWBAwfij3/8I+zs7NpsR++A1pqaGu2iUqkwatQorF+/3jhnQURkTEZ6VNbOzg7p6enYsWMHsrKycOjQIZSUlGDVqlV46aWXsG/fPri6uup9hbfOK1i1Wo2amhq8+eabhpwaEZFpGekKVpIkODs7AwCamprQ1NQESZJw5MgRrF69GgDw/PPPIzU1FVOnTm2znTYDtqmpCTY2Njh58qRxKiYikpMwfBRBZWUlYmNjtetxcXGIi4tr8R21Wo3Y2FhcvnwZU6dOha+vL1xdXWFj0xyb3bp1Q3l5uc7jtBmwkyZNwueff47+/ftj1qxZCAsLg5OTk/bz0NBQw86EiKiDGNoHq1AokJmZqfM71tbWyM7Oxu3btzF79mz88MMP7a5H700ulUoFDw8PFBUVtdjOgCUisyPDKAJXV1cMHz4cJSUluH37tva3+xs3bsDHx0fnvm0GrFKpxN/+9jf07dsXkiRBiP9VfnfiFyIis2KkgK2srISNjQ1cXV1RX1+PwsJC/P73v8fw4cOxd+9eRERE4PPPP0dwcLDOdtoMWI1Gg5qaGuNUS0QkM2PO9VpRUYG33noLarUaQgiEhYUhKCgIjz/+OBITE7F27VoMGDAAkyZN0tlOmwHr5eWFOXPmGKdaIqKOYKSA7d+/P7Kysu7Z7uvrq3do1k+1GbA/7RIgIuoMOs1cBH//+987sAwiIiMws+vCNgPW3d29I+sgInowHfy+LUPwtd1EZDkYsEREMmHAEhHJg10EREQy6JTv5CIi6hQ644TbRESdBgOWiEge7CIgIpILA5aISAbtmHC7ozBgichisIuAiEguDFgiIpkwYImI5MEuAiIiGUgCkDTmlbAMWCKyHOaVrwxYIrIc7CIgIpILA5aISAZ8owERkYyMFLDXr1/HG2+8AaVSCUmSMHnyZEyfPh1VVVVITEzE1atX0bNnT6xduxZubm5ttmNlnHKIiExP0hi26GNtbY233noLu3fvxvbt2/HPf/4T58+fR1paGgICApCbm4uAgACkpaXpbIcBS0QWQxKGLfp4e3tj4MCBAAAXFxf07t0b5eXlyMvLQ0xMDAAgJiYG+/fv19kOuwiIyDIIAMKwPoLKykrExsZq1+Pi4hAXF9fqd69cuYLS0lI89dRTUCqV8Pb2BgB4eXlBqVTqPA4DlogsQnteGaNQKJCZman3ezU1NUhISMDChQvh4uLS8niSBEmSdO7PgH1IfbIzH46u3+LJrdVQqyXM+7W/qUuin5EaNej1p+8gNQlAI1A9RAEMHoZea7+DVYMaAGB9pxH1j7jg+u+fMHG1ZsKIowgaGxuRkJCAqKgohIaGAgA8PT1RUVEBb29vVFRUQKFQ6GzDJAE7YMAAPPHEE1Cr1ejduzfef/99ODo6Grx/eXk5kpOTkZKSgtLSUlRUVGDUqFEAgLy8PFy4cAHx8fFylW8
xDn+TgJS5X5m6DGqDsJFwZe4ACHtrQK2B79rvUPHdDVyZ96T2O903fo/qwR4mrNK8GGs+WCEEkpKS0Lt3b8yYMUO7PTg4GFlZWYiPj0dWVhZCQkJ0tmOSm1wODg7Izs7Gzp07YWtri23btrVrfx8fH6SkpAAASktLcfDgQe1nISEhDFeyDJLUHK4AJLUA1ALNvwg3s6prguP3t1HDgG0mjDeK4MSJE8jOzsaRI0cQHR2N6OhoHDx4EPHx8SgoKEBoaCgKCwv1Zo3Juwj8/Pxw7tw5VFVVYeHChSgrK4OjoyOWLVuG/v374+jRo0hOTgbQ3Ofxj3/8A1VVVZg1axYyMzORkpKC+vp6nDhxAjNnzkR9fT1Onz6NxMREjB8/Hnl5ebCyskJtbS3GjRuH/fv34/r161i6dClu3rwJBwcHLF++HH369DHxT6JjCQH4//LP6LPlFnI+88WezF6mLolaoxH4xQenYftjPapG+sD7SR/g2ysAAOdvb6L2CVdoHE3+v7GZEAbf5NLnbi61Jj093eB2TPpvpqmpCfn5+Rg5ciTWrVuHJ598EuvXr8fhw4fx5ptvIjs7G5988gkWL16MYcOGoaamBvb29tr97ezskJCQgNOnT2Px4sUAoO247tKlizag/f39ceDAAYwYMQK2trZYtGgRli5dikcffRTffPMNli5dik2bNrVZp7u3C175MEreH0YHO31pJLr4PIJblVcwfV4qhkWMQeWtx01dltHUdbczdQnG8w8/NFQ3YP+iHNheq8X7g4cBAPb8Yyf6Pf8sHhv8cF0c6MInuQDU19cjOjoaQPPfFBMnTsTkyZOxbt06AEBAQACqqqpQXV2NoUOHYuXKldqOZmdnZ4OPEx4ejt27d8Pf3x+7du3C1KlTUVNTg+LiYrz22mva76lUKp3tVFVU46P5X9zHmZq3Vz50x0fzD+DHmQ6or81B5uZHTV2S0ZQufNTUJRidorsE30PnsOVJDayqG/Ho6Ws4EOcD8W2VqUszipyQiAdvhAH7vz5YQ8THx2PUqFE4ePAgXnjhBWzYsKHFVawuwcHBWLNmDaqqqnDmzBn4+/ujrq4Orq6uBh/fEtk7NMHK6n9/HuqvxNa/8irI3FjfaYSwlqBxsoGk0sDp3G24j/QAoESXkkrUDHKHsOWzQlpmOBeB2fzb8fPzw44dOwAARUVF8PDwgIuLCy5fvox+/fohPj4egwcPxsWLF1vs5+zsjJqamlbbdHZ2xqBBg5CcnIzRo0fD2toaLi4u6NWrF3JycgA03y08e/asvCdnZjw8VfjjJ0cR6Pce1mwuwrGvvXCisKupy6Kfsb7diJ7rSvGLlafgu/o0avu54hcBjwIAXE4qcWeop2kLNDMSmifcNmTpKGbTOz5nzhwsXLgQUVFRcHR0xMqVKwE0dygXFRVBkiT07dsXgYGBqKio0O43fPhwpKWlITo6GjNnzryn3fDwcLz22mvYvHmzdtsHH3yAJUuW4KOPPkJTUxPCw8PRv39/+U/STNy46oS5U36FVz6MssiuD0uh6umEsjcHt/rZ1YQnW93+0DOzK1iTBGxxcfE929zd3bF+/fp7ti9atOiebb169cLOnTu1+3322WctPv/pI3BhYWH33A309fXFxo0b76t2IjJf5tZFYDZXsERED0QA4Du5iIhkYl75yoAlIsvBLgIiIjnwtd1ERDIyr3xlwBKR5ZCMNBeBsTBgichyGGm6QmNhwBKRRZCE4BUsEZFszCtfGbBEZDk4ioCISC7sIiAikoEw3ju5jIUBS0SWg1ewREQyMa98ZcASkYUQApLGvPoIzOaNBkRED0xj4KLH22+/jYCAAERGRmq3VVVVYcaMGQgNDcWMGTNw69Ytve0wYInIYtx92EDfok9sbCw2bNjQYltaWhoCAgKQm5uLgIAApKWl6W2HAUtElkMIwxY9nnnmGbi5ubXYlpeXh5iYGABATEwM9u/fr7cd9sESkWUQkHUUgVKphLe3NwD
Ay8sLSqVS7z4MWCKyHAbe46qsrGzx7r64uDjExcUZfBhJkiBJkt7vMWCJyCJIMHwUgUKhQGZmZrva9/T0REVFBby9vVFRUQGFQqF3H/bBEpFluNtFYIQ+2NYEBwcjKysLAJCVlYWQkBC9+zBgichyGClg58+fjylTpuDixYsIDAzEv/71L8THx6OgoAChoaEoLCxEfHy83nbYRUBElsNIzxl8+OGHrW5PT09vVzsMWCKyDIKvjCEiksn996/KhQFLRJZDbV5zETBgicgyyPygwf1gwBKR5WDAEhHJQQB8JxcRkUwE+2CJiIxPgDe5iIjkwWFaRETyYcASEcmAw7SIiGRkZi89ZMASkYVgHywRkTw4ioCISD6C42CJiOTAJ7mIiOTBUQRERDLiKAIiIpnwCpaIyPiEEBBqtanLaIEBS0SWgze5iIjkIMxuukIrUxdARGQUAhAaYdBiiPz8fIwdOxZjxoxBWlrafZXEgCUiyyE0hi16qNVqLFu2DBs2bMCuXbuwc+dOnD9/vt3lMGCJyDL89yaXIYs+p06dwiOPPAJfX1/Y2dkhIiICeXl57S6JfbAG+EV/H3yYN9vUZcjCUs/LkuWERJi6BFk0NDQ80P5P+vdDypllBn33xx9/RFJSknY9Li4OcXFx2vXy8nJ069ZNu+7j44NTp061uyYGrAGGDBli6hKISI++ffsa/N0BAwYgMDBQxmqasYuAiOhnfHx8cOPGDe16eXk5fHx82t0OA5aI6GcGDx6M//znPygrK4NKpcKuXbsQHBzc7nbYRUBE9DM2NjZYvHgxfve730GtVmPChAnt6oK4SxLCzB7eJSKyEOwiICKSCQOWiEgmDNhOol+/fli5cqV2fePGjVi3bp3Rj/Pxxx+3WJ8yZYrRj/EwGjBgAKKjoxEZGYmEhATU1dW1a//y8nIkJCQAAEpLS3Hw4EHtZ3l5eff9KCfJiwHbSdjZ2SE3NxeVlZWyHucvf/lLi/Vt27bJeryHhYODA7Kzs7Fz507Y2tq2++fq4+ODlJQUAPcGbEhICOLj441aLxkHA7aTsLGxQVxcHNLT0+/5rLKyEnPnzsWECRMwYcIEnDhxQrt9xowZiIiIQFJSEoKCgrQB/eqrryI2NhYRERHYvn07AGDVqlWor69HdHQ0FixYAAB4+umnAQCJiYk4cOCA9phvvUVE8soAAAfySURBVPUW9uzZA7Vajffffx8TJkxAVFQUA9kAfn5+uHTpEqqqqvDqq68iKioKkydPxtmzZwEAR48eRXR0NKKjoxETE4Pq6mpcuXIFkZGRUKlUSElJwe7duxEdHY3du3cjMzMTy5Ytw507dxAUFATNf2f1r62txahRo9DY2IjLly/j5ZdfRmxsLKZOnYoLFy6Y8kfw8BDUKQwZMkTcuXNHBAUFidu3b4sNGzaIlJQUIYQQ8+fPF8eOHRNCCHH16lURFhYmhBBi6dKl4uOPPxZCCHHw4EHxxBNPCKVSKYQQ4ubNm0IIIerq6kRERISorKzUHufnxxVCiNzcXPHGG28IIYRoaGgQgYGBoq6uTmzbtk38+c9/1m5//vnnxeXLl2X7OXRWd3+OjY2NYtasWWLLli1i2bJlYt26dUIIIQoLC8X48eOFEELMnDlTHD9+XAghRHV1tWhsbBRlZWUiIiJCCCHEZ599JpYuXapt+6frs2bNEocPHxZCCLFr1y6xcOFCIYQQv/nNb8TFixeFEEKUlJSIadOmyXzGJIQQHAfbibi4uCA6OhqbNm2Cg4ODdnthYWGLmX6qq6tRU1ODEydOIDU1FQAQGBgINzc37Xc2b96Mffv2AQCuX7+OS5cuwcPDo81jBwYGIjk5GSqVCvn5+fDz84ODgwMKCgpw7tw57N27FwBw584dXLp0Cb6+vkY9987u7m8GQPMV7MSJEzF58mRtP3pAQACqqqpQXV2NoUOHYuXKlYiKikJoaCicnZ0NPk54eDh2794Nf39/7Nq1C1OnTkVNTQ2Ki4v
x2muvab+nUqmMe4LUKgZsJzN9+nTExsYiNjZWu02j0eDTTz+Fvb29QW0UFRWhsLAQ27dvh6OjI6ZNm6Z3og17e3s8++yzOHToEHJychAeHg6g+TUd77zzDkaOHHn/J/UQuNsHa4j4+HiMGjUKBw8exAsvvIANGzYY/O82ODgYa9asQVVVFc6cOQN/f3/U1dXB1dXV4OOT8bAPtpNxd3dHWFgYMjIytNtGjBiBzZs3a9dLS0sBAEOHDkVOTg4A4Ouvv8atW7cANF9lurm5wdHRERcuXEBJSYl2XxsbGzQ2NrZ67PDwcGRmZuL48ePaQB0xYgS2bt2q3efixYuora014hlbLj8/P+zYsQNA8196Hh4ecHFxweXLl9GvXz/Ex8dj8ODBuHjxYov9nJ2dUVNT02qbzs7OGDRoEJKTkzF69GhYW1vDxcUFvXr10v63IITQ9veSvBiwndBvf/tb3Lx5U7uelJSE06dPIyoqCuHh4di6dSsAYM6cOSgoKEBkZCT27NkDLy8vuLi4IDAwEE1NTRg3bhxWr17dYrawyZMnY/z48dqbXD/13HPP4dixY/jVr34FOzs7AMCkSZPw+OOPIzY2FpGRkVi8eDHUZvbiOXM1Z84cnDlzBlFRUVi9erV2GF56ejoiIyMRFRUFGxube2Z9Gj58OM6fP6+9yfVz4eHh2LFjh/a3DAD44IMPkJGRgfHjxyMiIgL79++X9+QIAB+VtWgqlQpWVlawsbFBcXExlixZwl8TiToQ+2At2LVr1zBv3jxoNBrY2tpi+fLlpi6J6KHCK1giIpmwD5aISCYMWCIimTBgiYhkwoAlo3jQ2aJ+6u48B0DzEDRd76MvKirCyZMn232M4ODgVifOaWv7T92dn8FQ69atw8aNG9u1D1kGBiwZhb7Zopqamu6r3eTkZDz++ONtfn706FEUFxffV9tEcuMwLTI6Pz8/nDt3DkVFRfjTn/4EV1dXXLx4Ebt378aqVatw9OhRqFQqvPjii5gyZQqEEFi+fDkKCgrQvXt32NraatuaNm0a3njjDQwePBj5+flYs2YN1Go1PDw8kJycjG3btsHKygo7duzAokWL0Lt3b7z77ru4du0aAGDhwoUYNmwYbt68iQULFqC8vBxDhgyBIYNnXn31Vdy4cQMNDQ34zW9+g7i4OO1nK1asQEFBAbp27Yo1a9ZAoVDg8uXLWLp0KW7evAkHBwcsX74cffr0Mf4PmDoP080zQ5aktdmijhw5Ip566int7Fptzby1d+9e8dJLL4mmpiZx48YNMWzYMJGTkyOEEOLXv/61OHXqlFAqlSIwMFDb1t3ZwFJSUsSGDRu0dbQ1s9jy5cu1M1d99dVXLWYW+6mgoCC9M4498cQTIjs7WwghxLp167QzWbU1Y9XPa6SHB69gyShamy2quLgYgwcP1s6s1dbMW8eOHUNERASsra3h4+MDf3//e9ovKSmBn5+fti13d/dW62hrZrFjx45pZxYbPXp0i5nF2tLWjGNWVlbax1Cjo6MxZ84czlhFrWLAklG0NVuUk5OT9s+ijZm3fjo7/4Nq78xibWnPjGOSJEEIwRmr6B68yUUdpq2Zt5555hnk5ORArVajoqICRUVF9+w7ZMgQHD9+HGVlZQCAqqoqAPfOLNXWzGLPPPMMvvjiCwDNgX53ZrG26JpxTKPRaK/Cv/jiCwwbNowzVlGrGLDUYdqaeWvMmDF45JFHEB4ejjfffLPF7F53KRQKLFu2DHPnzsX48eORmJgIAAgKCsK+ffsQHR2N48ePtzmz2OzZs3H8+HFERERg37596NGjh85adc045uTkhFOnTiEyMhJHjhzB7NmzAXDGKroX5yIgIpIJr2CJiGTCgCUikgkDlohIJgxYIiKZMGCJiGTCgCUikgkDlohIJv8fUNTJfzkskIMAAAAASUVORK5CYII=\n","text/plain":"
"},"metadata":{}}],"execution_count":24},{"cell_type":"code","source":"confusion = metrics.confusion_matrix(y_test, preds)\n# For binary labels, ravel() returns the counts in the order (tn, fp, fn, tp)\nconfusion.ravel()","metadata":{"id":"ZXpzbem33oOw","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"1fbf204070ed4ae980a1ee475c29d3c0","outputId":"f8077d61-f42a-4846-bd93-ea964be59ca0","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":419,"user_tz":300,"timestamp":1642856364786},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"array([72,  0,  5, 37])"},"metadata":{},"execution_count":25}],"execution_count":25},{"cell_type":"code","source":"accuracy = metrics.accuracy_score(y_test, preds)\naccuracy ","metadata":{"id":"L0DPTTJA3twJ","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"465dd4f3b1934b9ba75a8c6a5fc56cd8","outputId":"50b5b274-5dd7-4ca0-907d-6ae1380e0f92","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":8,"user_tz":300,"timestamp":1642856377171},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"0.956140350877193"},"metadata":{},"execution_count":26}],"execution_count":26},{"cell_type":"code","source":"# Precision is evaluated separately for each class\nprecision_positiva = metrics.precision_score(y_test, preds, pos_label=1)\nprecision_negativa = metrics.precision_score(y_test, preds, pos_label=0)\nprecision_positiva, precision_negativa 
","metadata":{"id":"m5SUP2wc3vDj","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"f694e4714cdb4e52a33dae51ab1e790f","outputId":"7d253c66-bbbc-488e-da13-c3e2865f7d80","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":770,"user_tz":300,"timestamp":1642856400075},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"(1.0, 0.935064935064935)"},"metadata":{},"execution_count":27}],"execution_count":27},{"cell_type":"code","source":"# Recall of class 1 is the sensitivity; recall of class 0 is the specificity\nrecall_sensibilidad = metrics.recall_score(y_test, preds, pos_label=1)\nrecall_especificidad = metrics.recall_score(y_test, preds, pos_label=0)\nrecall_sensibilidad, recall_especificidad","metadata":{"id":"oCMxTEyJ35cd","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"7caf4b591bb74af8805639315bb7f490","outputId":"9408d9b5-df42-4274-d87a-3463cd4cab57","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":519,"user_tz":300,"timestamp":1642856450736},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"(0.8809523809523809, 1.0)"},"metadata":{},"execution_count":28}],"execution_count":28},{"cell_type":"code","source":"# F1 score for each class\nf1_positivo = metrics.f1_score(y_test, preds, pos_label=1)\nf1_negativo = metrics.f1_score(y_test, preds, pos_label=0)\nf1_positivo, 
f1_negativo ","metadata":{"id":"xHGJQGQt4EED","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"0c63d1f353074470a5ee1604411a5e9f","outputId":"c9f580a6-8bd3-4c72-9690-034b7eef4ca1","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david 
francisco bustos usta"},"status":"ok","elapsed":409,"user_tz":300,"timestamp":1642856477945},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"(0.9367088607594937, 0.9664429530201343)"},"metadata":{},"execution_count":29}],"execution_count":29},{"cell_type":"code","source":"# All the metrics in a single report\nprint(metrics.classification_report(y_test, preds))","metadata":{"id":"W6kWKLxy4JMs","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"63b77dd1ae384d08834b412bb484a3dc","outputId":"426b09cc-caaa-4310-f90f-c623c7742322","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":425,"user_tz":300,"timestamp":1642856495777},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"              precision    recall  f1-score   support\n\n           0       0.94      1.00      0.97        72\n           1       1.00      0.88      0.94        42\n\n    accuracy                           0.96       114\n   macro avg       0.97      0.94      0.95       114\nweighted avg       0.96      0.96      0.96       114\n\n"}],"execution_count":30},{"cell_type":"markdown","source":"# Metrics for regression algorithms","metadata":{"id":"SOo4lnO-42My","cell_id":"fb20f75a62094fa6bba6053ed3f6d79b","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"import matplotlib.pyplot as plt\nimport numpy as np\nfrom sklearn import datasets, linear_model\nfrom sklearn.metrics import mean_squared_error, r2_score\n\n# Load an example dataset\ndiabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)\ndiabetes_X","metadata":{"id":"ftIruD6i432O","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"779346c4f7844623a1f0313faffe46d1","outputId":"46118290-635e-4577-b1ea-05aebed67d94","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos 
usta"},"status":"ok","elapsed":535,"user_tz":300,"timestamp":1642856703317},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226,\n 0.01990842, -0.01764613],\n [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,\n -0.06832974, -0.09220405],\n [ 0.08529891, 0.05068012, 0.04445121, ..., -0.00259226,\n 0.00286377, -0.02593034],\n ...,\n [ 0.04170844, 0.05068012, -0.01590626, ..., -0.01107952,\n -0.04687948, 0.01549073],\n [-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962,\n 0.04452837, -0.02593034],\n [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,\n -0.00421986, 0.00306441]])"},"metadata":{},"execution_count":32}],"execution_count":32},{"cell_type":"code","source":"diabetes_y","metadata":{"id":"JpXUkZ8o5dPU","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"27d8ec8b537f40138856829c7990e494","outputId":"b3c810a4-b1bd-4d1c-a30c-a6987635bca5","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":607,"user_tz":300,"timestamp":1642856833716},"deepnote_cell_type":"code"},"outputs":[{"output_type":"execute_result","data":{"text/plain":"array([151., 75., 141., 206., 135., 97., 138., 63., 110., 310., 101.,\n 69., 179., 185., 118., 171., 166., 144., 97., 168., 68., 49.,\n 68., 245., 184., 202., 137., 85., 131., 283., 129., 59., 341.,\n 87., 65., 102., 265., 276., 252., 90., 100., 55., 61., 92.,\n 259., 53., 190., 142., 75., 142., 155., 225., 59., 104., 182.,\n 128., 52., 37., 170., 170., 61., 144., 52., 128., 71., 163.,\n 150., 97., 160., 178., 48., 270., 202., 111., 85., 42., 170.,\n 200., 252., 113., 143., 51., 52., 210., 65., 141., 55., 134.,\n 42., 111., 98., 164., 48., 96., 90., 162., 150., 279., 92.,\n 83., 128., 102., 302., 198., 95., 53., 134., 144., 232., 81.,\n 104., 
59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,\n 173., 180., 84., 121., 161., 99., 109., 115., 268., 274., 158.,\n 107., 83., 103., 272., 85., 280., 336., 281., 118., 317., 235.,\n 60., 174., 259., 178., 128., 96., 126., 288., 88., 292., 71.,\n 197., 186., 25., 84., 96., 195., 53., 217., 172., 131., 214.,\n 59., 70., 220., 268., 152., 47., 74., 295., 101., 151., 127.,\n 237., 225., 81., 151., 107., 64., 138., 185., 265., 101., 137.,\n 143., 141., 79., 292., 178., 91., 116., 86., 122., 72., 129.,\n 142., 90., 158., 39., 196., 222., 277., 99., 196., 202., 155.,\n 77., 191., 70., 73., 49., 65., 263., 248., 296., 214., 185.,\n 78., 93., 252., 150., 77., 208., 77., 108., 160., 53., 220.,\n 154., 259., 90., 246., 124., 67., 72., 257., 262., 275., 177.,\n 71., 47., 187., 125., 78., 51., 258., 215., 303., 243., 91.,\n 150., 310., 153., 346., 63., 89., 50., 39., 103., 308., 116.,\n 145., 74., 45., 115., 264., 87., 202., 127., 182., 241., 66.,\n 94., 283., 64., 102., 200., 265., 94., 230., 181., 156., 233.,\n 60., 219., 80., 68., 332., 248., 84., 200., 55., 85., 89.,\n 31., 129., 83., 275., 65., 198., 236., 253., 124., 44., 172.,\n 114., 142., 109., 180., 144., 163., 147., 97., 220., 190., 109.,\n 191., 122., 230., 242., 248., 249., 192., 131., 237., 78., 135.,\n 244., 199., 270., 164., 72., 96., 306., 91., 214., 95., 216.,\n 263., 178., 113., 200., 139., 139., 88., 148., 88., 243., 71.,\n 77., 109., 272., 60., 54., 221., 90., 311., 281., 182., 321.,\n 58., 262., 206., 233., 242., 123., 167., 63., 197., 71., 168.,\n 140., 217., 121., 235., 245., 40., 52., 104., 132., 88., 69.,\n 219., 72., 201., 110., 51., 277., 63., 118., 69., 273., 258.,\n 43., 198., 242., 232., 175., 93., 168., 275., 293., 281., 72.,\n 140., 189., 181., 209., 136., 261., 113., 131., 174., 257., 55.,\n 84., 42., 146., 212., 233., 91., 111., 152., 120., 67., 310.,\n 94., 183., 66., 173., 72., 49., 64., 48., 178., 104., 132.,\n 220., 
57.])"},"metadata":{},"execution_count":34}],"execution_count":34},{"cell_type":"code","source":"from sklearn.linear_model import LinearRegression\nfrom sklearn.model_selection import train_test_split\n\nX_train, X_test, y_train, y_test = train_test_split(diabetes_X, diabetes_y, test_size=0.2, random_state=2)\n# Create the model\nlr = LinearRegression()\n# Fit the model on X_train and y_train\nlr.fit(X_train, y_train)\n# Predict on X_test\ny_pred = lr.predict(X_test)","metadata":{"id":"k2Nk6Ifm5BfI","cell_id":"941965808ca046079e3ae20e472b7358","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":425,"user_tz":300,"timestamp":1642856868868},"deepnote_cell_type":"code"},"outputs":[],"execution_count":36},{"cell_type":"code","source":"from sklearn.metrics import mean_absolute_error\nprint(\"MAE\",mean_absolute_error(y_test,y_pred))","metadata":{"id":"D3NaAE2R5nMB","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"3af4b8c6edba4cac85828e96fb882b3b","outputId":"6e46fe3d-e0c8-401b-f25f-92e627f3d5e6","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":521,"user_tz":300,"timestamp":1642856904334},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"MAE 45.21292481299676\n"}],"execution_count":37},{"cell_type":"markdown","source":"Advantages of MAE\n\n- The MAE is expressed in the same units as the output variable.\n- It is more robust to outliers than MSE.\n\nDisadvantages of MAE\n\n- The absolute value in MAE is not differentiable at zero error, so gradient-based optimizers such as gradient descent have to rely on subgradients (or a smooth approximation) when MAE is used as a loss function.","metadata":{"id":"NR1vcHgs50zP","cell_id":"6ba3137f05c54062958d00ab2e04e4fe","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"from sklearn.metrics import mean_squared_error\nprint(\"MSE\",mean_squared_error(y_test,y_pred))","metadata":{"id":"cj5UJCmA56lw","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"3b02535043c349e9a6ded1a6914f40a3","outputId":"ab268a9a-d5a9-4a4b-a804-aae1051ca009","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":398,"user_tz":300,"timestamp":1642856965261},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"MSE 3094.4295991207027\n"}],"execution_count":38},{"cell_type":"markdown","source":"Advantages of MSE\n\n- The MSE is differentiable everywhere, so it can easily be used as a loss function.\n\nDisadvantages of MSE\n\n- The MSE is expressed in the squared units of the output. For example, if the output variable is in meters (m), the MSE is in square meters.\n- Because errors are squared, outliers in the dataset are penalized more heavily and inflate the MSE. In short, MSE is not robust to outliers, which was an advantage of MAE.","metadata":{"id":"MkAGfRPN5-hd","cell_id":"83f30dbbe5b74bd8a4d5ec6de518f516","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"print(\"RMSE\",np.sqrt(mean_squared_error(y_test,y_pred)))","metadata":{"id":"6Q2zMkxt6I0c","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"fa72c365e5c846069c2848aee5856229","outputId":"6088a870-2a76-467e-f94c-14863fdc0715","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":628,"user_tz":300,"timestamp":1642857020691},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"RMSE 55.62759745954073\n"}],"execution_count":39},{"cell_type":"markdown","source":"Advantages of RMSE\n\n- It is expressed in the same units as the output variable, which makes the loss easier to interpret.\n\nDisadvantages of RMSE\n\n- It is not as robust to outliers as MAE.\n\nTo compute RMSE, we apply NumPy's square-root function, `np.sqrt`, to the MSE.","metadata":{"id":"KeHptGUV6UGh","cell_id":"167844ecaa71474195d49f169c9a82d7","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"# Note: this is the logarithm of the RMSE, not the standard RMSLE\nprint(\"log(RMSE)\", np.log(np.sqrt(mean_squared_error(y_test, y_pred))))","metadata":{"id":"VeHmFCmx6a7j","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"cf4ffac64dd2438a92ae31a092d177b5","outputId":"0b197090-71c3-43f9-eef9-a382472593e5","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":5,"user_tz":300,"timestamp":1642857084991},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"log(RMSE) 
4.018679435298041\n"}],"execution_count":40},{"cell_type":"markdown","source":"Taking the logarithm shrinks the scale of the error, which can help when the output variable varies over a large scale.\n\nThe cell above applies the logarithm to the computed RMSE. Note that this is not the standard RMSLE (root mean squared logarithmic error), which applies the logarithm to the targets and predictions before computing the error: `RMSLE = sqrt(mean((log(1 + y) - log(1 + y_pred))**2))`. In scikit-learn it can be computed as `np.sqrt(mean_squared_log_error(y_test, y_pred))`, which requires non-negative targets and predictions.","metadata":{"id":"vPE4P3x_6b6J","cell_id":"42bfe01b07b24228989e018ce918c433","deepnote_cell_type":"markdown"}},{"cell_type":"code","source":"from sklearn.metrics import r2_score\nr2 = r2_score(y_test,y_pred)\nprint(r2)","metadata":{"id":"H7SWJ-zh6naK","colab":{"base_uri":"https://localhost:8080/"},"cell_id":"0869a8cf72174870ba1cb4355cc530ce","outputId":"ba0b2461-3676-444d-c087-3b1777b38c40","executionInfo":{"user":{"userId":"04741209928239412574","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14Gi4e7mWJaOA2l-1KUn-omyigRGSrm83lG6XLzS5=s64","displayName":"david francisco bustos usta"},"status":"ok","elapsed":418,"user_tz":300,"timestamp":1642857135576},"deepnote_cell_type":"code"},"outputs":[{"output_type":"stream","name":"stdout","text":"0.4399387660024644\n"}],"execution_count":41},{"cell_type":"markdown","source":"R² is a metric that indicates how well your model performs, not a loss in an absolute sense: it equals 1 for perfect predictions, 0 for a model that always predicts the mean of the target, and it can be negative for a model that does worse than that.\n\nIn contrast, MAE and MSE depend on the context (the units and scale of the target), as we have seen, whereas the R² score is independent of that context.\n","metadata":{"id":"BDXQNh976n-1","cell_id":"7eb220acd5be49edbbc381ed5e35f5d7","deepnote_cell_type":"markdown"}}],"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Clase 19.ipynb","provenance":[],"authorship_tag":"ABX9TyOXbdO5+gV0wrZyoeNgg45f","collapsed_sections":[]},"deepnote":{},"kernelspec":{"name":"python3","display_name":"Python 
3"},"language_info":{"name":"python"},"deepnote_notebook_id":"6cd42cd996ac49c49f23724646783c51","deepnote_execution_queue":[]}}