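{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# K-Nearest Neighbors Algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The nearest-neighbor algorithm, also called k-nearest neighbors (kNN, k-Nearest Neighbor) classification, is one of the simplest classification methods in data mining. As the name says, it looks at the k nearest neighbors: each sample can be represented by the k samples closest to it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Common distance metrics: Euclidean distance; Manhattan distance; Minkowski distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cross-validation: pick the setting (such as k) with the lowest error rate on held-out test data; that setting does not necessarily have the lowest error rate on the training set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Limitations of kNN: as the number of dimensions grows, distances become less meaningful; with a large dataset, the algorithm becomes very slow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How much training data does a machine learning program need? First look at the dimensionality and the features: they determine how large the training set has to be." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Algorithm steps:  \n", "For each unknown point:  \n", "=> Compute the distance from the unknown point to every point whose class is known  \n", "=> Sort the distances in ascending order  \n", "=> Take the first k points, i.e. the k points closest to the unknown point  \n", "=> Count how many of those k points belong to each class  \n", "=> Assign the unknown point the class that occurs most often among those k points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Advantages:  \n", "Simple, effective, and easy to understand" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Disadvantages:  \n", "kNN has to keep the entire dataset, so it uses a lot of memory and places heavy demands on the hardware when the dataset is large;  \n", "It has to compute the distance from every unknown point to every known point, which can be very time-consuming;  \n", "The classification result is hard to interpret" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting a single test sample" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enter the dataset (simulated by hand here). The first column of X is the tumor size and the second column is the tumor time; y records whether the tumor is benign (0) or malignant (1)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "raw_data_X = [[3.393533211, 2.331273381],\n", "              [3.110073273, 1.786360121],\n", "              [1.343892307, 3.362874429],\n", "              [3.580243273, 4.671037091],\n", "              [2.274392744, 2.873335573],\n", "              [7.474390402, 4.673011339],\n", "              [5.772024290, 3.560262131],\n", "              [9.122354845, 2.568264233],\n", "              [7.722344298, 3.479979792],\n", "              [7.978408784, 0.773246244]\n", "             ]\n", "raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", 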
"output_type": "stream", "text": [ "[[3.39353321 2.33127338]\n", " [3.11007327 1.78636012]\n", " [1.34389231 3.36287443]\n", " [3.58024327 4.67103709]\n", " [2.27439274 2.87333557]\n", " [7.4743904 4.67301134]\n", " [5.77202429 3.56026213]\n", " [9.12235485 2.56826423]\n", " [7.7223443 3.47997979]\n", " [7.97840878 0.77324624]]\n", "[0 0 0 0 0 1 1 1 1 1]\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAGaBJREFUeJzt3X9w3PWd3/Hny1iJI0Kki1EaY1leqFOugC9ABBfKDENR2hAHzLVHrmac5BxuqpZyjTDtZELUAcOM5pJJJrZSesnocC5wUSApJBdMOe5cBy5JZ4CTHbAghoYWZGTo2TGxwCdI/OPdP/a7X6+Ffqxsffe7q309Zna03+9+vPsaj63Xfr+f7w9FBGZmZgAL8g5gZma1w6VgZmYpl4KZmaVcCmZmlnIpmJlZyqVgZmYpl4KZmaVcCmZmlnIpmJlZamHeAWbr9NNPj0KhkHcMM7O6sn379l9GRNtM4+quFAqFAkNDQ3nHMDOrK5JGKhnn3UdmZpZyKZiZWcqlYGZmqbqbU5jMoUOHGB0d5a233so7yrQWLVpEe3s7TU1NeUcxM5vUvCiF0dFRTjvtNAqFApLyjjOpiGD//v2Mjo5y5pln5h3HzGxS82L30VtvvcXixYtrthAAJLF48eKa35oxs8Y2L0oBqOlCKKmHjGY1Z+LdIX23yEzNm1Iws3lowwZYv/5YEUQUlzdsyDPVvOZSmEOPPPIIZ599NitWrOCLX/xi3nGqZnB4kMKmAgtuX0BhU4HB4cG8I9l8EAEHDkB//7FiWL++uHzggLcYMjIvJpprwZEjR7jxxhvZunUr7e3tXHTRRaxevZpzzjkn72iZGhwepHtLN+OHxgEYGRuhe0s3AGtXrs0zmtU7CTZuLD7v7y8+AHp6iuu9OzYTDbmlkMU32yeffJIVK1Zw1lln8Y53vIM1a9bwwx/+cA7S1rbebb1pIZSMHxqnd1tvTolsXikvhhIXQqYarhRK32xHxkYIIv1me7LFsGfPHpYtW5Yut7e3s2fPnpONW/N2j+2e1XqzWSntMipXPsdgc67hSiGrb7YxyT/SRjjaqKOlY1brzSpWPofQ0wNHjxZ/ls8x2JxruFLI6ptte3s7L7/8cro8OjrKGWeccVLvWQ/6uvpobmo+bl1zUzN9XX05JbJ5Q4LW1uPnEDZuLC63tnoXUkYabqK5o6WDkbG3X0H2ZL/ZXnTRRfziF7/gxRdfZOnSpdx333185zvfOan3rAelyeTebb3sHttNR0sHfV19nmS2ubFhQ3GLoFQApWJwIWQm81KQdAowBOyJiKsmvLYO+DJQ2vl+Z0TclWWevq6+446Wgbn5Zrtw4ULuvPNOPvrRj3LkyBGuv/56zj333JONWxfWrlzrErDsTCwAF0KmqrGl0APsAt4zxevfjYg/rkIOINtvtqtWrWLVqlUn/T5mZnnJtBQktQMfB/qAm7P8rNnwN1szs8llPdG8CfgccHSaMb8vaaek+yUtm2acmZllLLNSkHQVsDcitk8zbAtQiIjfAf4ncPcU79UtaUjS0L59+zJIa2ZmkO2WwqXAakkvAfcBV0j6dvmAiNgfEb9OFv8M+NBkb
xQRAxHRGRGdbW1tGUY2M2tsmZVCRNwSEe0RUQDWAD+KiE+Wj5G0pGxxNcUJaTMzy0nVz1OQdAcwFBEPAp+VtBo4DLwGrKt2HjMzO6YqZzRHxGOlcxQi4takEEpbE+dGxAcj4p9HxHPVyJOF66+/nve9732cd955eUcxMzthDXeZCyCTOzmtW7eORx555KTfx8wsT41XChndyemyyy7jve9970nHMzPLU2OVgu/kZGY2rca6IJ7v5GRmNq3G2lIA38nJzGwajVcKvpOTmdmUGqsUMryT03XXXccll1zC888/T3t7O5s3b57D4GZm1dF4cwqT3ckJTvpOTvfee+8chTQzy09jlQL4Tk5mZtNorN1HJb6Tk5nZpOZNKUQdTBTXQ0Yza2zzohQWLVrE/v37a/qXbkSwf/9+Fi1alHcUM7MpzYs5hfb2dkZHR6n1G/AsWrSI9vb2vGOYmU1pXpRCU1MTZ555Zt4xzMzq3rzYfWRmZnPDpWBmZqnMS0HSKZJ+JumhSV57p6TvSnpB0hOSClnnMTOzqVVjS6GHqe+9/EfAryJiBbAR+FIV8piZ2RQyLQVJ7cDHgbumGHINcHfy/H6gS/KZZGZmecl6S2ET8Dng6BSvLwVeBoiIw8AYsDjjTGZmNoXMSkHSVcDeiNg+3bBJ1r3tDDRJ3ZKGJA3V+rkIZmb1LMsthUuB1ZJeAu4DrpD07QljRoFlAJIWAi3AaxPfKCIGIqIzIjrb2toyjGxm1tgyK4WIuCUi2iOiAKwBfhQRn5ww7EHgD5Pn1yZjavdaFWZm81zVz2iWdAcwFBEPApuBv5D0AsUthDXVzmNmZsdUpRQi4jHgseT5rWXr3wI+UY0MZmY2M5/RbGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmKZeCWS2aeFsR32bEqsSlYFZrNmyA9euPFUFEcXnDhjxTWYNwKZjVkgg4cAD6+48Vw/r1xeUDB7zFYJnL7CY7khYBPwbemXzO/RFx24Qx64AvA3uSVXdGxF1ZZTKreRJs3Fh83t9ffAD09BTXS/lls4agrG6JLEnAqRFxUFIT8FOgJyIeLxuzDuiMiD+u9H07OztjaGhozvOa1ZQIWFC2IX/0qAthOhHH//1MXDYkbY+IzpnGZbb7KIoOJotNycPbvmYzKe0yKlc+x2DH8xzMnMp0TkHSKZKeAvYCWyPiiUmG/b6knZLul7QsyzxmNa98DqGnp7iF0NNz/ByDHeM5mDmX2ZwCQEQcAc6X1Ar8QNJ5EfFM2ZAtwL0R8WtJ/x64G7hi4vtI6ga6ATo6OrKMbJYvCVpbj59DKM0xtLZ6l8hEnoOZc5nNKbztg6TbgH+IiK9M8fopwGsR0TLd+3hOwRqC95HPjudgZpT7nIKktmQLAUnvAj4CPDdhzJKyxdXArqzymNWVib/Q/Atuap6DmVNZziksAR6VtBP4O4pzCg9JukPS6mTMZyU9K+lp4LPAuqzCDA4PUthUYMHtCyhsKjA4PJjVR5lZtXgOZs5lNqcQETuBCyZZf2vZ81uAW7LKUDI4PEj3lm7GD40DMDI2QveWbgDWrlyb9cebWVY8BzPnqjanMFdOZE6hsKnAyNjI29Yvb1nOSze9NEfJzCw3noOZUe5zCrVk99juWa03szrjOZg50xCl0NEy+WGsU603M2tUDVEKfV19NDc1H7euuamZvq6+nBKZmdWmhiiFtSvXMnD1AMtbliPE8pblDFw94ElmM7MJGmKi2cys0Xmi2czMZs2lYGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmqSxvx7lI0pOSnk7urnb7JGPeKem7kl6Q9ISkQlZ5zMxsZlluKfwauCIiPgicD1wp6cMTxvwR8KuIWAFsBL6UYR4zM5tBZqUQRQeTxabkMfHqe9cAdyfP7we6JN8dw
8wsL5nOKUg6RdJTwF5ga0Q8MWHIUuBlgIg4DIwBiyd5n25JQ5KG9u3bl2VkM7OGlmkpRMSRiDgfaAculnTehCGTbRW87VreETEQEZ0R0dnW1pZFVDMzo0pHH0XEAeAx4MoJL40CywAkLQRagNeqkcnMzN4uy6OP2iS1Js/fBXwEeG7CsAeBP0yeXwv8KOrtrj9mZvNIllsKS4BHJe0E/o7inMJDku6QtDoZsxlYLOkF4Gbg8xnmqSmDw4MUNhVYcPsCCpsKDA4P5h3JzIyFWb1xROwELphk/a1lz98CPpFVhlo1ODxI95Zuxg+NAzAyNkL3lm4A3zfazHI145aCpH8iaZukZ5Ll35H0X7KPNn/1butNC6Fk/NA4vdt6c0pkZlZUye6jPwNuAQ5BugWwJstQ893usd2zWm9mVi2VlEJzRDw5Yd3hLMI0io6WjlmtNzOrlkpK4ZeS/jHJ+QOSrgVezTTVPNfX1UdzU/Nx65qbmunr6sspkZlZUSUTzTcCA8BvS9oDvAh8MtNU81xpMrl3Wy+7x3bT0dJBX1efJ5nNLHeq9LQASacCCyLijWwjTa+zszOGhobyjGBmVnckbY+IzpnGzbilkJyA9mmgACwsXa8uIj57khnNzKzGVLL76GHgcWAYOJptHDMzy1MlpbAoIm7OPImZmeWukqOP/kLSv5W0RNJ7S4/Mk5mZWdVVsqXwG+DLQC/HLmsdwFlZhTIzs3xUUgo3Aysi4pdZhzEzs3xVsvvoWWB8xlFmZjb3Jp42kPHdBSrZUjgCPCXpUeDXpZU+JNXMLGMbNsCBA7BxI0jFQli/Hlpbi69loJJS+MvkYWZm1RJRLIT+/uLyxo3FQujvh56e4uua7I7GJ2fGUoiIu0/kjSUtA+4B3k/x/IaBiOifMOZy4IcUL50B8P2IuONEPs/MbF6RikUAxSIolUNPz7Ethyw+dqrLXEj6XkT8gaRhjh11VBIR8cFp31haAiyJiB2STgO2A78XET8vG3M58J8j4qpKA/syF2bWUCJgQdn079GjJ1QIlV7mYrqJ5p7k5y7g6rLHauD5md44Il6NiB3J8zeS91k6058zM7NEaQ6h3Pr1mU42T1kKEVG6PPaKiBgpe7wE/PZsPkRSgeKtOZ+Y5OVLJD0t6a8knTub9zUzm7dKhVCaQzh6tPizvz/TYphyTkHSDcB/AM6StLPspdOA/1XpB0h6N/AAcFNEvD7h5R3A8og4KGkVxQntD0zyHt1AN0BHh29EY2YNQCoeZVQ+h1CaY2htzWVOoQX4LeBPgM+XvfRGRLxW0ZtLTcBDwF9HxFcrGP8S0DndiXKeUzCzhjLxKKMTPOropC+dHRFjwBhw3aw/vRhAwGZg11SFIOn9wN9HREi6mOLurP0n8nlmZvPSxALIaAuhpJLzFE7UpcCngGFJTyXrvgB0AETEN4BrgRskHQbeBNZEpXf9MTOzOZdZKUTET4FpKy0i7gTuzCqDmZnNTiXXPjIzswbhUrCKDA4PUthUYMHtCyhsKjA4PJh3JDPLQJZzCjZPDA4P0r2lm/FDxYvljoyN0L2lG4C1K9fmGc3M5pi3FGxGvdt600IoGT80Tu+23pwSmVlWXAo2o91ju2e13szql0vBZtTRMvlZ5FOtN7P65VKwGfV19dHc1HzcuuamZvq6+nJKZGZZcSnYjNauXMvA1QMsb1mOEMtbljNw9YAnmc3moSmvfVSrfO0jM7PZm4v7KZiZWYNxKZiZWcqlYGZmKZeCmZmlXApmZpZyKZiZWcqlYGZmqcxKQdIySY9K2iXpWUk9k4yRpK9JekHSTkkXZpXHzMxmluWlsw8D/ykidkg6DdguaWtE/LxszMeADySP3wW+nvw0M7McZLalEBGvRsSO5PkbwC5g6YRh1wD3RNHjQKukJVllMjOz6VVlTkFSAbgAeGLCS0uBl8uWR3l7cZiZWZVkXgqS3g08ANwUEa9PfHmSP/K2izFJ6pY0JGlo3759WcQ0MzMyL
gVJTRQLYTAivj/JkFFgWdlyO/DKxEERMRARnRHR2dbWlk1YMzPL9OgjAZuBXRHx1SmGPQh8OjkK6cPAWES8mlUmMzObXpZHH10KfAoYlvRUsu4LQAdARHwDeBhYBbwAjAOfyTCPmZnNILNSiIifMvmcQfmYAG7MKoOZmc2Oz2g2M7OUS8HMzFIuBTMzS7kUGtjg8CCFTQUW3L6AwqYCg8ODeUcys5xlefSR1bDB4UG6t3QzfmgcgJGxEbq3dAOwduXaPKOZWY68pdCgerf1poVQMn5onN5tvTklMrNa4FJoULvHds9qvZk1BpdCg+po6ZjVejNrDC6FBtXX1UdzU/Nx65qbmunr6sspkZnVApdCg1q7ci0DVw+wvGU5QixvWc7A1QOeZDZrcCpeaaJ+dHZ2xtDQUN4xzMzqiqTtEdE50zhvKZiZWcqlYGZmKZeCmZmlXApmZpZyKZiZWSrL23F+U9JeSc9M8frlksYkPZU8bs0qi5mZVSbLC+J9C7gTuGeaMT+JiKsyzGBmZrOQ2ZZCRPwYeC2r9zczs7mX95zCJZKelvRXks7NOYuZWcPL834KO4DlEXFQ0irgL4EPTDZQUjfQDdDR4Qu2mZllJbcthYh4PSIOJs8fBpoknT7F2IGI6IyIzra2tqrmNDNrJLmVgqT3S1Ly/OIky/688piZWYa7jyTdC1wOnC5pFLgNaAKIiG8A1wI3SDoMvAmsiXq7Op+Z2TyTWSlExHUzvH4nxUNWzcysRuR99JGZmdUQl4KZmaVcCmZmlnIpmJlZyqVgZmYpl4KZmaVcCmZmlnIpmJlZyqVgZmYpl4KZmaVcCmZmlnIpmJlZyqVgZmYpl4KZmaVcCmZmlnIpmJlZKrNSkPRNSXslPTPF65L0NUkvSNop6cKsspiZWWWy3FL4FnDlNK9/DPhA8ugGvp5hFjMzq0BmpRARPwZem2bINcA9UfQ40CppSVZ5zMxsZnnOKSwFXi5bHk3WvY2kbklDkob27dtXlXBmZo0oz1LQJOtisoERMRARnRHR2dbWlnEsM7PGlWcpjALLypbbgVdyymJmZuRbCg8Cn06OQvowMBYRr+aYx8ys4S3M6o0l3QtcDpwuaRS4DWgCiIhvAA8Dq4AXgHHgM1llMTOzymRWChFx3QyvB3BjVp9vZmaz5zOazcws5VIwM7OUS8HMGkfE9MvmUjCzBrFhA6xff6wIIorLGzbkmarmuBTMbP6LgAMHoL//WDGsX19cPnDAWwxlMjv6yMysZkiwcWPxeX9/8QHQ01Ncr8kusNCYFHXWkJ2dnTE0NJR3DDOrRxGwoGwHydGjDVMIkrZHROdM47z7yMwaQ2mXUbnyOQYDXApm1gjK5xB6eopbCD09x88xGOA5BTNrBBK0th4/h1CaY2htbZhdSJXwnIKZNY6I4wtg4vI85jkFM7OJJhZAgxTCbLgUzMws5VIwM7OUS8HMzFIuBTMzS7kUzMws5VIwM7NU3Z2nIGkfMAKcDvwy5zhTcbYT42wnxtlOTKNlWx4RbTMNqrtSKJE0VMmJGHlwthPjbCfG2U6Ms03Ou4/MzCzlUjAzs1Q9l8JA3gGm4WwnxtlOjLOdGGebRN3OKZiZ2dyr5y0FMzObY3VXCpK+KWmvpGfyzjKRpGWSHpW0S9KzknryzlQiaZGkJyU9nWS7Pe9ME0k6RdLPJD2Ud5Zykl6SNCzpKUk1dd12Sa2S7pf0XPLv7pK8MwFIOjv5+yo9Xpd0U965SiStT/4fPCPpXkmL8s5UIqknyfVsHn9ndbf7SNJlwEHgnog4L+885SQtAZZExA5JpwHbgd+LiJ/nHA1JAk6NiIOSmoCfAj0R8XjO0VKSbgY6gfdExFV55ymR9BLQGRE1d0y7pLuBn0TEXZLeATRHxIG8c5WTdAqwB/jdiBipgTxLKf77Pyci3pT0PeDhiPhWvslA0nnAfcDFwG+AR4AbIuIX1cpQd1sKEfFj4LW8c0wmIl6NiB3J8zeAXcDSf
FMVRdHBZLEpedTMNwJJ7cDHgbvyzlIvJL0HuAzYDBARv6m1Qkh0Af+nFgqhzELgXZIWAs3AKznnKfmnwOMRMR4Rh4G/Bf5VNQPUXSnUC0kF4ALgiXyTHJPsnnkK2AtsjYiayQZsAj4HHM07yCQC+BtJ2yV15x2mzFnAPuDPk91ud0k6Ne9Qk1gD3Jt3iJKI2AN8BdgNvAqMRcTf5Jsq9QxwmaTFkpqBVcCyagZwKWRA0ruBB4CbIuL1vPOURMSRiDgfaAcuTjZVcyfpKmBvRGzPO8sULo2IC4GPATcmuzBrwULgQuDrEXEB8A/A5/ONdLxkl9Zq4L/nnaVE0m8B1wBnAmcAp0r6ZL6piiJiF/AlYCvFXUdPA4ermcGlMMeS/fUPAIMR8f2880wm2cXwGHBlzlFKLgVWJ/vu7wOukPTtfCMdExGvJD/3Aj+guL+3FowCo2VbfPdTLIla8jFgR0T8fd5BynwEeDEi9kXEIeD7wD/LOVMqIjZHxIURcRnFXeVVm08Al8KcSiZzNwO7IuKreecpJ6lNUmvy/F0U/2M8l2+qooi4JSLaI6JAcVfDjyKiJr65STo1OWiAZNfMv6S4iZ+7iPh/wMuSzk5WdQG5H9QwwXXU0K6jxG7gw5Kak/+zXRTn/2qCpPclPzuAf02V//4WVvPD5oKke4HLgdMljQK3RcTmfFOlLgU+BQwn++4BvhARD+eYqWQJcHdyJMgC4HsRUVOHftaofwT8oPi7g4XAdyLikXwjHec/AoPJbpr/C3wm5zypZJ/4vwD+Xd5ZykXEE5LuB3ZQ3DXzM2rr7OYHJC0GDgE3RsSvqvnhdXdIqpmZZce7j8zMLOVSMDOzlEvBzMxSLgUzM0u5FMzMLOVSMDsJyaUlzsk7h9lc8SGpZmaW8paCWYWSs5v/R3JPimck/RtJj0nqlLS67N4Bz0t6MfkzH5L0t8nF9P46uby6Wc1yKZhV7krglYj4YHIvj/TM5oh4MCLOTy44+DTwleQ6WP8VuDYiPgR8E+jLI7hZperuMhdmORqm+Mv+S8BDEfGT5PIXKUmfA96MiP+WXIX2PGBrMu4UipdqNqtZLgWzCkXE/5b0IYrXuP8TScddg19SF/AJije+ARDwbETUxC0yzSrh3UdmFZJ0BjAeEd+meJOWC8teWw78KfAHEfFmsvp5oK1032RJTZLOrXJss1nxloJZ5VYCX5Z0lOIVLG+gWA4A64DFHLui6isRsUrStcDXJLVQ/P+2CXi22sHNKuVDUs3MLOXdR2ZmlnIpmJlZyqVgZmYpl4KZmaVcCmZmlnIpmJlZyqVgZmYpl4KZmaX+PwmjlCLmDDhqAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#转为向量传入训练集\n", "X_train = np.array(raw_data_X)\n", "y_train = np.array(raw_data_y)\n", "print (X_train)\n", "print (y_train)\n", "\n", "#绘制散点图\n", "plt.scatter(X_train[y_train == 0,0], X_train[y_train == 0,1], c='g', marker='o', label='0')\n", "plt.scatter(X_train[y_train == 1,0], X_train[y_train == 1,1], c='r', marker='x', label='1')\n", "plt.xlabel('size')\n", "plt.ylabel('time')\n", "plt.legend(loc='upper left')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAGAhJREFUeJzt3X+QXXd93vH3I1sFLxBtitVibEtbCiUNOGDYIVDPMB5EChjbUGISMwICk+m2lJS1aGFi1MGCmR2GMYMkSgujYIoNWwg1P2q7TohrIEBn7HRlbMvGuHWLJfyjscBI4MgB2fr0j3v3aLVeaXflPXt2te/XzJ2959yv7n1GI+nR+X7PPSdVhSRJAKu6DiBJWjosBUlSw1KQJDUsBUlSw1KQJDUsBUlSw1KQJDUsBUlSw1KQJDVO7jrAfJ166qk1NDTUdQxJWlZ27tz5k6paO9u4ZVcKQ0NDTExMdB1DkpaVJLvnMs7pI0lSw1KQJDUsBUlSw1KQJDUsBUlSw1KQJDUsBUlL2/S7Q3q3yFZZCpKWri1bYNOmw0VQ1dvesqXLVCc0S0FP2viucYa2DbHqQ6sY2jbE+K7xriPpRFAF+/bB9u2Hi2HTpt72vn0eMbRk2X2jWUvL+K5xRq4d4cDBAwDs3r+bkWtHANh41sYuo2m5S2Dr1t7z7dt7D4DR0d7+pLtsJ7DUMmvb4eHh8jIXS8fQtiF273/it+fXr1nPvZfcu/iBdOKpglVTJjUOHbIQjkOSnVU1PNs4p4/0pOzZv2de+6V5mZwymmrqGoMWnKWgJ2XdmnXz2i/N2dQ1hNHR3hHC6OiRawxacJaCnpSxDWMMrB44Yt/A6gHGNox1lEgnjAQGB49cQ9i6tbc9OOgUUktcU9CTNr5rnM03bmbP/j2sW7OOsQ1jLjJr4VQdWQDTtzUnc11TaL0UkpwETAD3V9X50157B3A5cH9/1yer6jPHej9LQZLmb66lsBinpI4CdwG/dpTX/7Sq/mgRckiSZtHqmkKSM4DXA8f8378kaWloe6F5G/B+4NAxxvxuktuTXJ3kzJbzSJKOobVSSHI+8FBV7TzGsGuBoar6LeC/A1ce5b1Gkkwkmdi7d28LaSVJ0O6RwjnAhUnuBb4EvCrJF6YOqKqfVtUv+5t/Arx0pjeqqh1VNVxVw2vXrm0xsiStbK2VQlVdWlVnVNUQcDHwzap669QxSU6bsnkhvQVpSVJHFv2CeEk+DExU1TXAe5JcCDwGPAy8Y7HzSJIO88trkrQCeEE8SdK8WQqSpIalIElqWAqSpIalIElqWAqSpIalIElqWAqSpIalIElqWAqSpIalIElqWA
qSpIalIElqWAqSpIalIElqWAqSpEbrpZDkpCTfT3LdDK89JcmfJrknyc1JhtrOI0k6usU4Uhjl6Pde/kPgZ1X1XGAr8NFFyCNJOopWSyHJGcDrgc8cZcgbgCv7z68GNiRJm5kkSUfX9pHCNuD9wKGjvH468GOAqnoM2A88s+VMkqSjaK0UkpwPPFRVO481bIZ9NcN7jSSZSDKxd+/eBcsoSTpSm0cK5wAXJrkX+BLwqiRfmDbmPuBMgCQnA2uAh6e/UVXtqKrhqhpeu3Zti5ElaWVrrRSq6tKqOqOqhoCLgW9W1VunDbsG+IP+84v6Y55wpCBJWhwnL/YHJvkwMFFV1wBXAJ9Pcg+9I4SLFzuPJOmwRSmFqvo28O3+8w9O2f+3wJsXI4MkaXZ+o1mS1LAUJEkNS0GS1LAUJEkNS0GS1LAUJEkNS0GS1LAUJEkNS0GS1LAUJEkNS0GS1LAUJEkNS0GS1LAUpKVo+m1FvM2IFomlIC01W7bApk2Hi6Cqt71lS5eptEJYCtJSUgX79sH27YeLYdOm3va+fR4xqHWt3WQnyVOB7wBP6X/O1VV12bQx7wAuB+7v7/pkVX2mrUzSkpfA1q2959u39x4Ao6O9/Ul32bQipK1bIicJ8LSqeiTJauB7wGhV3TRlzDuA4ar6o7m+7/DwcE1MTCx4XmlJqYJVUw7kDx2yEI6l6sjfn+nbIsnOqhqebVxr00fV80h/c3X/4bGvNJvJKaOppq4x6EiuwSyoVtcUkpyU5FbgIeCGqrp5hmG/m+T2JFcnObPNPNKSN3UNYXS0d4QwOnrkGoMOcw1mwbW2pgBQVY8DL04yCHwtyQur6o4pQ64FvlhVv0zyL4ErgVdNf58kI8AIwLp169qMLHUrgcHBI9cQJtcYBgedEpnONZgF19qawhM+KLkM+Juq+thRXj8JeLiq1hzrfVxT0IrgHPn8uAYzq87XFJKs7R8hkOQU4NXAD6eNOW3K5oXAXW3lkZaV6f+g+Q/c0bkGs6DaXFM4DfhWktuB/0lvTeG6JB9OcmF/zHuS3JnkNuA9wDvaCjO+a5yhbUOs+tAqhrYNMb5rvK2PkrRYXINZcK2tKVTV7cDZM+z/4JTnlwKXtpVh0viucUauHeHAwQMA7N6/m5FrRwDYeNbGtj9eUltcg1lwi7amsFCOZ01haNsQu/fvfsL+9WvWc+8l9y5QMkmdcQ1mVp2vKSwle/bvmdd+ScuMazALZkWUwro1M5/GerT9krRSrYhSGNswxsDqgSP2DaweYGzDWEeJJGlpWhGlsPGsjey4YAfr16wnhPVr1rPjgh0uMkvSNCtioVmSVjoXmiVJ82YpSJIaloIkqWEpSJIaloIkqWEpSJIaloIkqWEpSJIaloIkqWEpSJIard1kJ8lTge8AT+l/ztVVddm0MU8BrgJeCvwU+P2quretTNJ8ff3793P5N+7mgX2P8uzBU3jfa57PG88+vetYUmvaPFL4JfCqqnoR8GLgtUlePm3MHwI/q6rnAluBj7aYR5qXr3//fi796i7u3/coBdy/71Eu/eouvv79+7uOJrWmtVKonkf6m6v7j+lX33sDcGX/+dXAhsS7Y2hpuPwbd/PowceP2Pfowce5/Bt3d5RIal+rawpJTkpyK/AQcENV3TxtyOnAjwGq6jFgP/DMGd5nJMlEkom9e/e2GVlqPLDv0Xntl04ErZZCVT1eVS8GzgBeluSF04bMdFTwhGt5V9WOqhququG1a9e2EVV6gmcPnjKv/dKJYFHOPqqqfcC3gddOe+k+4EyAJCcDa4CHFyOTNJv3veb5nLL6pCP2nbL6JN73mud3lEhqX2ulkGRtksH+81OAVwM/nDbsGuAP+s8vAr5Zy+2uPzphvfHs0/nIm87i9MFTCHD64Cl85E1nefaRTmitnZIKnAZcmeQkeuXz5aq6LsmHgYmquga4Avh8knvoHSFc3GKeJWV81zibb9zMnv17WLdmHWMbxrw96BL0xrNPtw
S0orRWClV1O3D2DPs/OOX53wJvbivDUjW+a5yRa0c4cPAAALv372bk2hEAi0FSp2adPkryj5LcmOSO/vZvJfl37Uc7cW2+cXNTCJMOHDzA5hs3d5RIknrmsqbwJ8ClwEFojgBWzDRPG/bs3zOv/ZK0WOZSCgNV9VfT9j3WRpiVYt2adfPaL0mLZS6l8JMk/5D+9weSXAQ82GqqE9zYhjEGVg8csW9g9QBjG8Y6SiRJPXNZaH43sAP4jST3Az8C3tpqqhPc5GKyZx9JWmoy168FJHkasKqqftFupGMbHh6uiYmJLiNI0rKTZGdVDc82btYjhf4X0N4ODAEnT16vrqre8yQzSpKWmLlMH10P3ATsAg61G0eS1KW5lMJTq+q9rSeRJHVuLmcffT7JP09yWpK/O/loPZkkadHN5UjhV8DlwGYOX9a6gOe0FUqS1I25lMJ7gedW1U/aDiNJ6tZcpo/uBA7MOkqStPCmf22g5bsLzOVI4XHg1iTfAn45udNTUiWpZVu2wL59sHUrJL1C2LQJBgd7r7VgLqXw9f5DkrRYqnqFsH17b3vr1l4hbN8Oo6O91zPTHY2fnFlLoaquPJ43TnImcBXwLHrfb9hRVdunjTkX+K/0Lp0B8NWq+vDxfJ4knVCSXhFArwgmy2F09PCRQxsfe7TLXCT5clX9XpJdHD7raFJV1YuO+cbJacBpVXVLkmcAO4E3VtUPpow5F/i3VXX+XAN7mQtJK0oVrJqy/Hvo0HEVwlwvc3GshebR/s+7gAumPC4E7p7tjavqwaq6pf/8F/338b6GkjRXk2sIU23a1Opi81FLoaomL4/93KraPeVxL/Ab8/mQJEP0bs158wwvvyLJbUn+LMkL5vO+knTCmiyEyTWEQ4d6P7dvb7UYjrqmkORdwL8CnpPk9ikvPQP4H3P9gCRPB74CXFJVP5/28i3A+qp6JMl59Ba0nzfDe4wAIwDr1nkjGkkrQNI7y2jqGsLkGsPgYCdrCmuAXwc+AvzxlJd+UVUPz+nNk9XAdcA3qurjcxh/LzB8rC/KuaYgaUWZfpbRcZ519KQvnV1V+4H9wFvm/em9AAGuAO46WiEkeRbw11VVSV5Gbzrrp8fzeZJ0QppeAC0dIUyay/cUjtc5wNuAXUlu7e/7ALAOoKo+DVwEvCvJY8CjwMU117v+SJIWXGulUFXfA45ZaVX1SeCTbWWQJM3PXK59JElaISwFzcn4rnGGtg2x6kOrGNo2xPiu8a4jSWpBm2sKOkGM7xpn5NoRDhzsXSx39/7djFw7AsDGszZ2GU3SAvNIQbPafOPmphAmHTh4gM03bu4okaS2WAqa1Z79e+a1X9LyZSloVuvWzPwt8qPtl7R8WQqa1diGMQZWDxyxb2D1AGMbxjpKJKktloJmtfGsjey4YAfr16wnhPVr1rPjgh0uMksnoKNe+2ip8tpHkjR/C3E/BUnSCmMpSJIaloIkqWEpSJIaloIkqWEpSJIaloIkqdFaKSQ5M8m3ktyV5M4kozOMSZJPJLknye1JXtJWHknS7Nq8dPZjwL+pqluSPAPYmeSGqvrBlDGvA57Xf/w28Kn+T0lSB1o7UqiqB6vqlv7zXwB3AadPG/YG4KrquQkYTHJaW5kkSce2KGsKSYaAs4Gbp710OvDjKdv38cTikCQtktZLIcnTga8Al1TVz6e/PMMvecLFmJKMJJlIMrF37942YkqSaLkUkqymVwjjVfXVGYbcB5w5ZfsM4IHpg6pqR1UNV9Xw2rVr2wkrSWr17KMAVwB3VdXHjzLsGuDt/bOQXg7sr6oH28okSTq2Ns8+Ogd4G7Arya39fR8A1gFU1aeB64HzgHuAA8A7W8wjSZpFa6VQVd9j5jWDqWMKeHdbGSRJ8+M3miVJDUtBktSwFCRJDUthBRvfNc7QtiFWfWgVQ9uGGN813nUkSR1r8+wjLWHju8YZuXaEAwcPALB7/25Grh0BYONZG7uMJqlDHimsUJtv3NwUwqQDBw
+w+cbNHSWStBRYCivUnv175rVf0spgKaxQ69asm9d+SSuDpbBCjW0YY2D1wBH7BlYPMLZhrKNEkpYCS2GF2njWRnZcsIP1a9YTwvo169lxwQ4XmaUVLr0rTSwfw8PDNTEx0XUMSVpWkuysquHZxnmkIElqWAqSpIalIElqWAqSpIalIElqtHk7zs8meSjJHUd5/dwk+5Pc2n98sK0skqS5afOCeJ8DPglcdYwx362q81vMIEmah9aOFKrqO8DDbb2/JGnhdb2m8IoktyX5syQv6DiLJK14Xd5P4RZgfVU9kuQ84OvA82YamGQEGAFYt84LtklSWzo7Uqiqn1fVI/3n1wOrk5x6lLE7qmq4qobXrl27qDklaSXprBSSPCtJ+s9f1s/y067ySJJanD5K8kXgXODUJPcBlwGrAarq08BFwLuSPAY8Clxcy+3qfJJ0gmmtFKrqLbO8/kl6p6xKkpaIrs8+kiQtIZaCJKlhKUiSGpaCJKlhKUiSGpaCJKlhKUiSGpaCJKlhKUiSGpaCJKlhKUiSGpaCJKlhKUiSGpaCJKlhKUiSGpaCJKnRWikk+WySh5LccZTXk+QTSe5JcnuSl7SVRZI0N20eKXwOeO0xXn8d8Lz+YwT4VItZJElz0FopVNV3gIePMeQNwFXVcxMwmOS0tvJIkmbX5ZrC6cCPp2zf19/3BElGkkwkmdi7d++ihJOklajLUsgM+2qmgVW1o6qGq2p47dq1LceSpJWry1K4DzhzyvYZwAMdZZEk0W0pXAO8vX8W0suB/VX1YId5JGnFO7mtN07yReBc4NQk9wGXAasBqurTwPXAecA9wAHgnW1lkSTNTWulUFVvmeX1At7d1udLkubPbzRLkhqWgiSpYSlIWjmqjr0tS0HSCrFlC2zadLgIqnrbW7Z0mWrJsRQknfiqYN8+2L79cDFs2tTb3rfPI4YpWjv7SJKWjAS2bu0937699wAYHe3tz0wXWFiZUsusIYeHh2tiYqLrGJKWoypYNWWC5NChFVMISXZW1fBs45w+krQyTE4ZTTV1jUGApSBpJZi6hjA62jtCGB09co1BgGsKklaCBAYHj1xDmFxjGBxcMVNIc+GagqSVo+rIApi+fQJzTUGSppteACukEObDUpAkNSwFSVLDUpAkNSwFSVLDUpAkNSwFSVJj2X1PIcleYDdwKvCTjuMcjdmOj9mOj9mOz0rLtr6q1s42aNmVwqQkE3P5IkYXzHZ8zHZ8zHZ8zDYzp48kSQ1LQZLUWM6lsKPrAMdgtuNjtuNjtuNjthks2zUFSdLCW85HCpKkBbbsSiHJZ5M8lOSOrrNMl+TMJN9KcleSO5OMdp1pUpKnJvmrJLf1s32o60zTJTkpyfeTXNd1lqmS3JtkV5Jbkyyp67YnGUxydZIf9v/cvaLrTABJnt///Zp8/DzJJV3nmpRkU//vwR1JvpjkqV1nmpRktJ/rzi5+z5bd9FGSVwKPAFdV1Qu7zjNVktOA06rqliTPAHYCb6yqH3QcjSQBnlZVjyRZDXwPGK2qmzqO1kjyXmAY+LWqOr/rPJOS3AsMV9WSO6c9yZXAd6vqM0n+DjBQVfu6zjVVkpOA+4HfrqrdSyDP6fT+/P9mVT2a5MvA9VX1uW6TQZIXAl8CXgb8Cvhz4F1V9b8XK8OyO1Koqu8AD3edYyZV9WBV3dJ//gvgLuD0blP1VM8j/c3V/ceS+R9BkjOA1wOf6TrLcpHk14BXAlcAVNWvlloh9G0A/s9SKIQpTgZOSXIyMAA80HGeSf8YuKmqDlTVY8BfAv9sMQMsu1JYLpIMAWcDN3eb5LD+9MytwEPADVW1ZLIB24D3A4e6DjKDAv4iyc4kI12HmeI5wF7gP/Wn3T6T5Gldh5rBxcAXuw4xqaruBz4G7AEeBPZX1V90m6pxB/DKJM9MMgCcB5y5mAEshRYkeTrwFeCSqvp513kmVdXjVfVi4AzgZf1D1c4lOR94qKp2dp3lKM6pqpcArwPe3Z
/CXApOBl4CfKqqzgb+BvjjbiMdqT+ldSHwX7rOMinJrwNvAP4B8GzgaUne2m2qnqq6C/gocAO9qaPbgMcWM4OlsMD68/VfAcar6qtd55lJf4rh28BrO44y6Rzgwv7c/ZeAVyX5QreRDquqB/o/HwK+Rm++dym4D7hvyhHf1fRKYil5HXBLVf1110GmeDXwo6raW1UHga8C/6TjTI2quqKqXlJVr6Q3Vb5o6wlgKSyo/mLuFcBdVfXxrvNMlWRtksH+81Po/cX4Ybepeqrq0qo6o6qG6E01fLOqlsT/3JI8rX/SAP2pmX9K7xC/c1X1/4AfJ3l+f9cGoPOTGqZ5C0to6qhvD/DyJAP9v7Mb6K3/LQlJ/l7/5zrgTSzy79/Ji/lhCyHJF4FzgVOT3AdcVlVXdJuqcQ7wNmBXf+4e4ANVdX2HmSadBlzZPxNkFfDlqlpSp34uUX8f+Frv3w5OBv5zVf15t5GO8K+B8f40zf8F3tlxnkZ/Tvx3gH/RdZapqurmJFcDt9Cbmvk+S+vbzV9J8kzgIPDuqvrZYn74sjslVZLUHqePJEkNS0GS1LAUJEkNS0GS1LAUJEkNS0F6EvqXlvjNrnNIC8VTUiVJDY8UpDnqf7v5v/XvSXFHkt9P8u0kw0kunHLvgLuT/Kj/a16a5C/7F9P7Rv/y6tKSZSlIc/da4IGqelH/Xh7NN5ur6pqqenH/goO3AR/rXwfr3wMXVdVLgc8CY10El+Zq2V3mQurQLnr/2H8UuK6qvtu//EUjyfuBR6vqP/SvQvtC4Ib+uJPoXapZWrIsBWmOqup/JXkpvWvcfyTJEdfgT7IBeDO9G98ABLizqpbELTKluXD6SJqjJM8GDlTVF+jdpOUlU15bD/xH4Peq6tH+7ruBtZP3TU6yOskLFjm2NC8eKUhzdxZweZJD9K5g+S565QDwDuCZHL6i6gNVdV6Si4BPJFlD7+/bNuDOxQ4uzZWnpEqSGk4fSZIaloIkqWEpSJIaloIkqWEpSJIaloIkqWEpSJIaloIkqfH/AXda72rzsjFvAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#要预测的点\n", "x = np.array([5, 3])\n", "\n", "#图中观测\n", "plt.scatter(X_train[y_train == 0,0], X_train[y_train == 0,1], c='g', marker='o', label='良性')\n", "plt.scatter(X_train[y_train == 1,0], X_train[y_train == 1,1], c='r', marker='x', label='恶性')\n", "plt.scatter(x[0], x[1])\n", "plt.xlabel('size')\n", "plt.ylabel('time')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## KNN过程" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.740095064966033, 2.246050932042296, 3.6740714900551215, 2.1927321139070988, 2.728548843438044, 2.986900534321874, 0.9538947320237519, 4.144901113488964, 2.764333387560458, 3.718783561121629]\n" ] } ], "source": [ "#求距离\n", "from math import sqrt\n", "\n", "distance=[]\n", "\n", "for x_train in X_train:\n", " d = sqrt(np.sum((x_train - x)**2))\n", " distance.append(d)\n", "print (distance)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([6, 0, 3, 1, 4, 8, 5, 2, 9, 7], dtype=int64)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#从距离数组中获得最近的k个点,如果将X排序,与y则不对应,此处想获得的是索引,所以可以用argsort方法进行排序获得其索引找到最近的k个点在哪\n", "\n", "nearest = np.argsort(distance)\n", "nearest" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 0, 0, 0, 0, 1]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#令k = 6\n", "k = 6\n", "topK_k = [y_train[i] for i in nearest[:6]]\n", "topK_k" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({0: 4, 1: 2})\n", "[(0, 4)]\n", "[(0, 4), (1, 2)]\n" ] } ], "source": [ "#求投票结果(每种结果各有多少个,比例)\n", "\n", "from collections 
import Counter\n", "\n", "# Counter returns a dict-like object: the keys are the distinct values in the list and the values are how many times each one occurs\n", "votes = Counter(topK_k)\n", "print(votes)\n", "\n", "# most_common(n) returns the n entries with the most votes as (value, count) pairs, most frequent first\n", "print(votes.most_common(1))\n", "print(votes.most_common(2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The predicted class is the label with the most votes\n", "predict_y = votes.most_common(1)[0][0]\n", "print(predict_y)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }