"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"- Linear regression\n",
"- Logistic regression\n",
"- Decision trees\n",
"- SVM\n",
"- Naive Bayes\n",
"---\n",
"- K-nearest neighbors (KNN)\n",
"- K-means\n",
"- Random forests\n",
"- Dimensionality reduction\n",
"- Gradient Boosting and AdaBoost\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"> # Linear regression with sklearn\n",
"***\n",
"\n",
"王成军 (Wang Chengjun)\n",
"\n",
"wangchengjun@nju.edu.cn\n",
"\n",
"Computational Communication (计算传播网) http://computational-communication.com"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
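"# Linear Regression\n",
"- Typically used to estimate the value of a continuous variable (house prices, number of calls, total sales, etc.).\n",
"- It models the relationship between the independent variable X and the dependent variable Y by fitting the best straight line through the data.\n",
"- This best-fit line is called the regression line and is described by the linear equation $Y = \\beta X + C$.\n",
"- The coefficients $\\beta$ and $C$ are estimated by least squares, i.e. by choosing the values that minimize the sum of squared residuals $\\sum_i (y_i - \\beta x_i - C)^2$."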
]
},
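The least-squares fit described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single feature, not the notebook's own method (which uses sklearn's `linear_model`); the helper name `least_squares_line` and the sample points are hypothetical. It uses the closed-form solution: the slope $\beta$ is the covariance of X and Y divided by the variance of X, and the intercept $C$ makes the line pass through the point of means.

```python
import numpy as np

def least_squares_line(x, y):
    """Hypothetical helper: return (beta, C) minimizing sum((y - beta*x - C)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    beta = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    C = y_mean - beta * x_mean
    return beta, C

# Points lying exactly on Y = 2X + 1 should recover beta = 2 and C = 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
beta, C = least_squares_line(x, y)
print(beta, C)  # 2.0 1.0
```

With several features the same minimization is usually solved in matrix form, for example with `np.linalg.lstsq`.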
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-04-22T08:22:22.109042Z",
"start_time": "2019-04-22T08:22:20.811040Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import sklearn\n",
"from sklearn import datasets\n",
"from sklearn import linear_model\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.preprocessing import scale"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-04-22T08:22:24.400103Z",
"start_time": "2019-04-22T08:22:24.390296Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
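"# Load the Boston house-price dataset (506 samples, 13 features).\n",
"# Note: load_boston was removed in scikit-learn 1.2, so this cell needs scikit-learn < 1.2.\n",
"boston = datasets.load_boston()\n",
"y = boston.target  # median home value, in $1000s\n",
"X = boston.data"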
]
},
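`load_boston` is not available in recent scikit-learn releases. The sketch below therefore fits `linear_model.LinearRegression` on a synthetic stand-in generated with `datasets.make_regression`, using 506 samples and 13 features to match the Boston data's shape. The data are made up purely for illustration and are an assumption, not the notebook's dataset. With `noise=0.0`, the fitted coefficients should reproduce the generating ones and the training R² should be 1.

```python
import numpy as np
from sklearn import datasets, linear_model

# Synthetic stand-in for the Boston data (assumption): 506 samples, 13 features.
# coef=True also returns the true coefficients used to generate y.
X, y, true_coef = datasets.make_regression(
    n_samples=506, n_features=13, noise=0.0, coef=True, random_state=0
)

# Ordinary least squares: y is modeled as X @ coef_ + intercept_
model = linear_model.LinearRegression()
model.fit(X, y)

# With noise=0.0 the estimates should match the generating coefficients
print(np.round(model.coef_[:3], 3))
print(np.round(true_coef[:3], 3))

# Coefficient of determination R^2 on the training data (1.0 is a perfect fit)
r2 = model.score(X, y)
print(round(r2, 4))
```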
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-04-22T08:22:25.362696Z",
"start_time": "2019-04-22T08:22:25.356162Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',\n",
" 'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')"

| | title | link | author | author_page | click | reply | time |
|---|---|---|---|---|---|---|---|
| 0 | 【民间语文第161期】宁波px启示:船进港湾人应上岸 | /post-free-2849477-1.shtml | 贾也 | http://www.tianya.cn/50499450 | 194675 | 2703 | 2012-10-29 07:59 |
| 1 | 宁波镇海PX项目引发群体上访 当地政府发布说明(转载) | /post-free-2839539-1.shtml | 无上卫士ABC | http://www.tianya.cn/74341835 | 88244 | 1041 | 2012-10-24 12:41 |
| | Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| | Survived |
|---|---|
| 892 | 0 |
| 893 | 0 |
| 894 | 1 |