{ "cells": [ { "cell_type": "markdown", "id": "b6e3c0c7", "metadata": {}, "source": [ "## Introduction\n", "\n", "To build a cancer diagnosis prediction model, I used a K-Nearest Neighbors (KNN) classifier with 3 neighbors. My primary goal was to identify cancer cases accurately by reducing the number of false negatives, which represent undetected cancer cases. I therefore focused on improving recall, the fraction of actual positive cases that the model correctly identifies.\n" ] }, { "cell_type": "code", "execution_count": 269, "id": "d13293ad", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 270, "id": "ae14f335", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(r\"C:\\Users\\Teni\\Desktop\\Git-Github\\Datasets\\KNN\\breast-cancer.csv\")" ] }, { "cell_type": "markdown", "id": "38488593", "metadata": {}, "source": [ "### Data Info Analysis \n" ] }, { "cell_type": "code", "execution_count": 271, "id": "1fcdea81", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | NaN |
| 1 | 842517 | M | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | NaN |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | NaN |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | NaN |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | NaN |

5 rows × 33 columns
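
The preview that follows shows `diagnosis` holding `1` where the rows above hold `M`. The cell that made this change is not shown. A minimal sketch of one way to do it is given below, assuming `M` (malignant) maps to 1 and `B` (benign) maps to 0; the three-row `sample` frame is a hypothetical stand-in for the real file, not the author's data-loading code.

```python
import pandas as pd

# Hypothetical three-row stand-in for the real dataset.
sample = pd.DataFrame({
    "id": [842302, 842517, 92751],
    "diagnosis": ["M", "M", "B"],
})

# Assumed mapping: malignant -> 1 (the positive class), benign -> 0.
sample["diagnosis"] = sample["diagnosis"].map({"M": 1, "B": 0})

print(sample["diagnosis"].tolist())
```

Encoding the malignant class as 1 keeps it as the positive class, so recall later measures how many cancer cases are caught.
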
| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | 1 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | NaN |
| 1 | 842517 | 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | NaN |
| 2 | 84300903 | 1 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | NaN |
| 3 | 84348301 | 1 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | NaN |
| 4 | 84358402 | 1 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | NaN |

5 rows × 33 columns
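
In the rows shown above, `Unnamed: 32` holds only `NaN`, and the next preview reports 32 columns instead of 33. The cell that removed the column is not shown. One plausible way to drop it is sketched below, assuming the column is empty for every row; the two-row `sample` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: one real feature plus an all-NaN column.
sample = pd.DataFrame({
    "radius_mean": [17.99, 20.57],
    "Unnamed: 32": [np.nan, np.nan],
})

# Drop every column whose values are all missing.
cleaned = sample.dropna(axis=1, how="all")

print(cleaned.columns.tolist())
```

Using `how="all"` removes only fully empty columns, so a feature with a few missing values would be kept. Dropping by name with `drop(columns=["Unnamed: 32"])` would work as well.
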
| | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | 1 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | ... | 25.380 | 17.33 | 184.60 | 2019.0 | 0.16220 | 0.66560 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
| 1 | 842517 | 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | ... | 24.990 | 23.41 | 158.80 | 1956.0 | 0.12380 | 0.18660 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
| 2 | 84300903 | 1 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | ... | 23.570 | 25.53 | 152.50 | 1709.0 | 0.14440 | 0.42450 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
| 3 | 84348301 | 1 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | ... | 14.910 | 26.50 | 98.87 | 567.7 | 0.20980 | 0.86630 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
| 4 | 84358402 | 1 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | ... | 22.540 | 16.67 | 152.20 | 1575.0 | 0.13740 | 0.20500 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 564 | 926424 | 1 | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | ... | 25.450 | 26.40 | 166.10 | 2027.0 | 0.14100 | 0.21130 | 0.4107 | 0.2216 | 0.2060 | 0.07115 |
| 565 | 926682 | 1 | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | ... | 23.690 | 38.25 | 155.00 | 1731.0 | 0.11660 | 0.19220 | 0.3215 | 0.1628 | 0.2572 | 0.06637 |
| 566 | 926954 | 1 | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | ... | 18.980 | 34.12 | 126.70 | 1124.0 | 0.11390 | 0.30940 | 0.3403 | 0.1418 | 0.2218 | 0.07820 |
| 567 | 927241 | 1 | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | ... | 25.740 | 39.42 | 184.60 | 1821.0 | 0.16500 | 0.86810 | 0.9387 | 0.2650 | 0.4087 | 0.12400 |
| 568 | 92751 | 0 | 7.76 | 24.54 | 47.92 | 181.0 | 0.05263 | 0.04362 | 0.00000 | 0.00000 | ... | 9.456 | 30.37 | 59.16 | 268.6 | 0.08996 | 0.06444 | 0.0000 | 0.0000 | 0.2871 | 0.07039 |

569 rows × 32 columns
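
The introduction names recall as the metric to improve, with `diagnosis == 1` as the positive class. To show what that metric measures, here is a minimal sketch that computes recall as TP / (TP + FN) by hand. The `recall` helper and the label lists are hypothetical illustrations, not results from this dataset.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted as 1."""
    # True positives: cancer cases the model caught.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: cancer cases the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Made-up labels: four malignant cases, one of which is missed.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

score = recall(y_true, y_pred)
print(score)  # 0.75
```

The false positive in the last position does not affect the score. Recall penalizes only missed cancer cases, which is why it suits the goal of reducing false negatives.
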