{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.model_selection import train_test_split\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Breast Cancer Wisconsin (Diagnostic) Database\n", "=============================================\n", "\n", "Notes\n", "-----\n", "Data Set Characteristics:\n", " :Number of Instances: 569\n", "\n", " :Number of Attributes: 30 numeric, predictive attributes and the class\n", "\n", " :Attribute Information:\n", " - radius (mean of distances from center to points on the perimeter)\n", " - texture (standard deviation of gray-scale values)\n", " - perimeter\n", " - area\n", " - smoothness (local variation in radius lengths)\n", " - compactness (perimeter^2 / area - 1.0)\n", " - concavity (severity of concave portions of the contour)\n", " - concave points (number of concave portions of the contour)\n", " - symmetry \n", " - fractal dimension (\"coastline approximation\" - 1)\n", "\n", " The mean, standard error, and \"worst\" or largest (mean of the three\n", " largest values) of these features were computed for each image,\n", " resulting in 30 features. For instance, field 3 is Mean Radius, field\n", " 13 is Radius SE, field 23 is Worst Radius.\n", "\n", " - class:\n", " - WDBC-Malignant\n", " - WDBC-Benign\n", "\n", " :Summary Statistics:\n", "\n", " ===================================== ====== ======\n", " Min Max\n", " ===================================== ====== ======\n", " radius (mean): 6.981 28.11\n", " texture (mean): 9.71 39.28\n", " perimeter (mean): 43.79 188.5\n", " area (mean): 143.5 2501.0\n", " smoothness (mean): 0.053 0.163\n", " compactness (mean): 0.019 0.345\n", " concavity (mean): 0.0 0.427\n", " concave points (mean): 0.0 0.201\n", " symmetry (mean): 0.106 0.304\n", " fractal dimension (mean): 0.05 0.097\n", " radius (standard error): 0.112 2.873\n", " texture (standard error): 0.36 4.885\n", " perimeter (standard error): 0.757 21.98\n", " area (standard error): 6.802 542.2\n", " smoothness (standard error): 0.002 0.031\n", " compactness (standard error): 0.002 0.135\n", " concavity (standard error): 0.0 0.396\n", " concave points (standard error): 0.0 0.053\n", " symmetry (standard error): 0.008 0.079\n", " fractal dimension (standard error): 0.001 0.03\n", " radius (worst): 7.93 36.04\n", " texture (worst): 12.02 49.54\n", " perimeter (worst): 50.41 251.2\n", " area (worst): 185.2 4254.0\n", " smoothness (worst): 0.071 0.223\n", " compactness (worst): 0.027 1.058\n", " concavity (worst): 0.0 1.252\n", " concave points (worst): 0.0 0.291\n", " symmetry (worst): 0.156 0.664\n", " fractal dimension (worst): 0.055 0.208\n", " ===================================== ====== ======\n", "\n", " :Missing Attribute Values: None\n", "\n", " :Class Distribution: 212 - Malignant, 357 - Benign\n", "\n", " :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n", "\n", " :Donor: Nick Street\n", "\n", " :Date: November, 1995\n", "\n", "This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.\n", "https://goo.gl/U2Uwz2\n", "\n", "Features are computed from a digitized image of a fine needle\n", "aspirate (FNA) of a breast mass. They describe\n", "characteristics of the cell nuclei present in the image.\n", "\n", "Separating plane described above was obtained using\n", "Multisurface Method-Tree (MSM-T) [K. P. Bennett, \"Decision Tree\n", "Construction Via Linear Programming.\" Proceedings of the 4th\n", "Midwest Artificial Intelligence and Cognitive Science Society,\n", "pp. 97-101, 1992], a classification method which uses linear\n", "programming to construct a decision tree. Relevant features\n", "were selected using an exhaustive search in the space of 1-4\n", "features and 1-3 separating planes.\n", "\n", "The actual linear program used to obtain the separating plane\n", "in the 3-dimensional space is that described in:\n", "[K. P. Bennett and O. L. Mangasarian: \"Robust Linear\n", "Programming Discrimination of Two Linearly Inseparable Sets\",\n", "Optimization Methods and Software 1, 1992, 23-34].\n", "\n", "This database is also available through the UW CS ftp server:\n", "\n", "ftp ftp.cs.wisc.edu\n", "cd math-prog/cpo-dataset/machine-learn/WDBC/\n", "\n", "References\n", "----------\n", " - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n", " for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n", " Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n", " San Jose, CA, 1993.\n", " - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and \n", " prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n", " July-August 1995.\n", " - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n", " to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n", " 163-171.\n", "\n" ] } ], "source": [ "cancer = load_breast_cancer()\n", "print(cancer.DESCR)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['mean radius' 'mean texture' 'mean perimeter' 'mean area'\n", " 'mean smoothness' 'mean compactness' 'mean concavity'\n", " 'mean concave points' 'mean symmetry' 'mean fractal dimension'\n", " 'radius error' 'texture error' 'perimeter error' 'area error'\n", " 'smoothness error' 'compactness error' 'concavity error'\n", " 'concave points error' 'symmetry error' 'fractal dimension error'\n", " 'worst radius' 'worst texture' 'worst perimeter' 'worst area'\n", " 'worst smoothness' 'worst compactness' 'worst concavity'\n", " 'worst concave points' 'worst symmetry' 'worst fractal dimension']\n", "['malignant' 'benign']\n" ] } ], "source": [ "print(cancer.feature_names)\n", "print(cancer.target_names)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 1.79900000e+01, 1.03800000e+01, 1.22800000e+02, ...,\n", " 2.65400000e-01, 4.60100000e-01, 1.18900000e-01],\n", " [ 2.05700000e+01, 1.77700000e+01, 1.32900000e+02, ...,\n", " 1.86000000e-01, 2.75000000e-01, 8.90200000e-02],\n", " [ 1.96900000e+01, 2.12500000e+01, 1.30000000e+02, ...,\n", " 2.43000000e-01, 3.61300000e-01, 8.75800000e-02],\n", " ..., \n", " [ 1.66000000e+01, 2.80800000e+01, 1.08300000e+02, ...,\n", " 1.41800000e-01, 2.21800000e-01, 7.82000000e-02],\n", " [ 2.06000000e+01, 2.93300000e+01, 1.40100000e+02, ...,\n", " 2.65000000e-01, 4.08700000e-01, 1.24000000e-01],\n", " [ 7.76000000e+00, 2.45400000e+01, 4.79200000e+01, ...,\n", " 0.00000000e+00, 2.87100000e-01, 7.03900000e-02]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cancer.data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(569, 30)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cancer.data.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.0" } }, "nbformat": 4, "nbformat_minor": 2 }