{"cells":[{"metadata":{},"cell_type":"markdown","source":"
PREDICTING BREAST CANCER IN WISCONSIN
\n\n\nUsing data from digitized images of breast masses in the state of Wisconsin, this notebook applies feature selection and builds models with several different algorithms to predict whether a breast mass is benign or malignant."},{"metadata":{},"cell_type":"markdown","source":"|  | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | NaN |
| 1 | 842517 | M | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | NaN |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | NaN |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | NaN |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | NaN |

5 rows × 33 columns

|  | id | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | symmetry_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5.690000e+02 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | ... | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 569.000000 | 0.0 |
| mean | 3.037183e+07 | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.096360 | 0.104341 | 0.088799 | 0.048919 | 0.181162 | ... | 25.677223 | 107.261213 | 880.583128 | 0.132369 | 0.254265 | 0.272188 | 0.114606 | 0.290076 | 0.083946 | NaN |
| std | 1.250206e+08 | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 | 0.079720 | 0.038803 | 0.027414 | ... | 6.146258 | 33.602542 | 569.356993 | 0.022832 | 0.157336 | 0.208624 | 0.065732 | 0.061867 | 0.018061 | NaN |
| min | 8.670000e+03 | 6.981000 | 9.710000 | 43.790000 | 143.500000 | 0.052630 | 0.019380 | 0.000000 | 0.000000 | 0.106000 | ... | 12.020000 | 50.410000 | 185.200000 | 0.071170 | 0.027290 | 0.000000 | 0.000000 | 0.156500 | 0.055040 | NaN |
| 25% | 8.692180e+05 | 11.700000 | 16.170000 | 75.170000 | 420.300000 | 0.086370 | 0.064920 | 0.029560 | 0.020310 | 0.161900 | ... | 21.080000 | 84.110000 | 515.300000 | 0.116600 | 0.147200 | 0.114500 | 0.064930 | 0.250400 | 0.071460 | NaN |
| 50% | 9.060240e+05 | 13.370000 | 18.840000 | 86.240000 | 551.100000 | 0.095870 | 0.092630 | 0.061540 | 0.033500 | 0.179200 | ... | 25.410000 | 97.660000 | 686.500000 | 0.131300 | 0.211900 | 0.226700 | 0.099930 | 0.282200 | 0.080040 | NaN |
| 75% | 8.813129e+06 | 15.780000 | 21.800000 | 104.100000 | 782.700000 | 0.105300 | 0.130400 | 0.130700 | 0.074000 | 0.195700 | ... | 29.720000 | 125.400000 | 1084.000000 | 0.146000 | 0.339100 | 0.382900 | 0.161400 | 0.317900 | 0.092080 | NaN |
| max | 9.113205e+08 | 28.110000 | 39.280000 | 188.500000 | 2501.000000 | 0.163400 | 0.345400 | 0.426800 | 0.201200 | 0.304000 | ... | 49.540000 | 251.200000 | 4254.000000 | 0.222600 | 1.058000 | 1.252000 | 0.291000 | 0.663800 | 0.207500 | NaN |

8 rows × 32 columns
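
The summary above reports a count of 0.0 for `Unnamed: 32`, so that column holds no values, and `id` is an identifier rather than a measurement. A minimal sketch of how such columns might be removed before modelling, assuming pandas and a tiny hand-built frame that reuses a few values from the preview table (the names `df`, `clean`, `X`, and `y` are illustrative, not taken from the notebook):

```python
import numpy as np
import pandas as pd

# Three rows copied from the preview table; only a few columns are kept for brevity.
df = pd.DataFrame({
    'id': [842302, 842517, 84300903],
    'diagnosis': ['M', 'M', 'M'],
    'radius_mean': [17.99, 20.57, 19.69],
    'texture_mean': [10.38, 17.77, 21.25],
    'Unnamed: 32': [np.nan, np.nan, np.nan],
})

# Drop columns that contain no values at all, then the identifier column.
clean = df.dropna(axis=1, how='all').drop(columns=['id'])

# Separate the label from the numeric features.
y = clean['diagnosis']
X = clean.drop(columns=['diagnosis'])
print(list(X.columns))
```

Dropping by emptiness rather than by name is one possible choice; it also catches any other all-NaN column that a stray trailing delimiter in the CSV might produce.
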

|  | diagnosis | features | value |
|---|---|---|---|
| 0 | M | radius_mean | 1.096100 |
| 1 | M | radius_mean | 1.828212 |
| 2 | M | radius_mean | 1.578499 |
| 3 | M | radius_mean | -0.768233 |
| 4 | M | radius_mean | 1.748758 |

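
The `value` column appears to hold standardized features: its first five entries agree with (x - mean) / std when the `radius_mean` mean and standard deviation from the summary table are used. The sketch below checks that arithmetic by hand; the function name `z_score` is illustrative, and the constants are copied from the tables in this notebook.

```python
# Mean and standard deviation of radius_mean, copied from the summary table.
MEAN_RADIUS = 14.127292
STD_RADIUS = 3.524049

def z_score(x, mean, std):
    # Standardize one value: distance from the mean in units of standard deviation.
    return (x - mean) / std

# radius_mean for the first five rows of the preview table.
raw = [17.99, 20.57, 19.69, 11.42, 20.29]
scaled = [z_score(x, MEAN_RADIUS, STD_RADIUS) for x in raw]

for x, z in zip(raw, scaled):
    print(f'{x:6.2f} -> {z: .4f}')
```

In pandas, a long table with one row per diagnosis, feature name, and value is typically produced with `pd.melt`, although the exact call used in this notebook is not shown.
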

|  | Algorithm | Accuracy_Score | F1_Score | Log_Loss |
|---|---|---|---|---|
| 0 | Random Forest | 0.947368 | 0.947087 | 0.334504 |
| 1 | Random Forest w/ KBest | 0.935673 | 0.935071 | 0.730524 |
| 2 | Support Vector Machine | 0.631579 | 0.488964 | 0.670893 |
| 3 | Logistic Regression | 0.883041 | 0.881165 | 0.318784 |
| 4 | Decision Tree | 0.941520 | 0.941520 | 1.659396 |
| 5 | K Nearest Neighbor | 0.941520 | 0.929312 | 0.752288 |
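
The Support Vector Machine row stands out: an accuracy of 0.631579 paired with an F1 score of 0.488964 is what a model would score if it predicted the majority class for every sample. The sketch below checks this by hand. It assumes a 171-sample test set with 108 benign cases and a class-weighted F1; both assumptions are inferred from the figures in the table, not stated in the notebook, and the helper names are illustrative.

```python
def f1_for_class(y_true, y_pred, label):
    # F1 = 2*TP / (2*TP + FP + FN), treating one class as the positive class.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    # Average the per-class F1 scores, weighting each class by its share of y_true.
    n = len(y_true)
    return sum(
        y_true.count(c) / n * f1_for_class(y_true, y_pred, c)
        for c in sorted(set(y_true))
    )

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test split: 108 benign (B) and 63 malignant (M) samples.
y_true = ['B'] * 108 + ['M'] * 63
# A degenerate classifier that always answers benign.
y_pred = ['B'] * 171

print(round(accuracy(y_true, y_pred), 6))     # 0.631579
print(round(weighted_f1(y_true, y_pred), 6))  # 0.488964
```

The same weighted score is what `sklearn.metrics.f1_score` returns with `average='weighted'`. If the SVM did collapse to a single class, one possible cause is training on unscaled features, so standardizing the inputs before fitting may be worth trying.
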