{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature Engineering and Feature Selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lab introduces data preprocessing through three stages: feature extraction, feature transformation, and feature selection. Feature extraction converts raw data into features suitable for modeling, feature transformation reshapes the data to improve an algorithm's accuracy, and feature selection removes features that contribute nothing useful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Key Topics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Feature extraction\n", "- Feature transformation\n", "- Feature selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some examples in this lab use a dataset from the company Renthop. First, load the dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the data and decompress it\n", "!wget -nc \"../../data/renthop_train.json.gz\"\n", "!gunzip \"renthop_train.json.gz\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-03-15T14:06:07.067528Z", "start_time": "2018-03-15T14:06:02.181930Z" } }, "outputs": [], "source": [ "import warnings\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-03-15T14:06:07.067528Z", "start_time": "2018-03-15T14:06:02.181930Z" }, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n",
"|  | bathrooms | bedrooms | building_id | created | description | display_address | features | latitude | listing_id | longitude | manager_id | photos | price | street_address | interest_level |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 10 | 1.5 | 3 | 53a5b119ba8f7b61d4e010512e0dfc85 | 2016-06-24 07:54:24 | A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ... | Metropolitan Avenue | [] | 40.7145 | 7211212 | -73.9425 | 5ba989232d0489da1b5f2c45f6688adc | [https://photos.renthop.com/2/7211212_1ed4542e... | 3000 | 792 Metropolitan Avenue | medium |\n",
"| 10000 | 1.0 | 2 | c5c8a357cba207596b04d1afd1e4f130 | 2016-06-12 12:19:27 |  | Columbus Avenue | [Doorman, Elevator, Fitness Center, Cats Allow... | 40.7947 | 7150865 | -73.9667 | 7533621a882f71e25173b27e3139d83d | [https://photos.renthop.com/2/7150865_be3306c5... | 5465 | 808 Columbus Avenue | low |\n",
"| 100004 | 1.0 | 1 | c3ba40552e2120b0acfc3cb5730bb2aa | 2016-04-17 03:26:41 | Top Top West Village location, beautiful Pre-w... | W 13 Street | [Laundry In Building, Dishwasher, Hardwood Flo... | 40.7388 | 6887163 | -74.0018 | d9039c43983f6e564b1482b273bd7b01 | [https://photos.renthop.com/2/6887163_de85c427... | 2850 | 241 W 13 Street | high |\n",
"| 100007 | 1.0 | 1 | 28d9ad350afeaab8027513a3e52ac8d5 | 2016-04-18 02:22:02 | Building Amenities - Garage - Garden - fitness... | East 49th Street | [Hardwood Floors, No Fee] | 40.7539 | 6888711 | -73.9677 | 1067e078446a7897d2da493d2f741316 | [https://photos.renthop.com/2/6888711_6e660cee... | 3275 | 333 East 49th Street | low |\n",
"| 100013 | 1.0 | 4 | 0 | 2016-04-28 01:32:41 | Beautifully renovated 3 bedroom flex 4 bedroom... | West 143rd Street | [Pre-War] | 40.8241 | 6934781 | -73.9493 | 98e13ad4b495b9613cef886d79a6291f | [https://photos.renthop.com/2/6934781_1fa4b41a... | 3350 | 500 West 143rd Street | low |\n",
"