{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 2: Predicting house prices" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In this module, we focused on using regression to predict a continuous value (house prices) from features of the house (square feet of living space, number of bedrooms, ...). We also built an IPython notebook for predicting house prices, using data from King County, USA, the region where the city of Seattle is located.\n", "\n", "In this assignment, we are going to build a more accurate regression model for predicting house prices by including more features of the house. In the process, we will also become more familiar with how the Python language can be used for data exploration, data transformation, and machine learning. These techniques will be key to building intelligent applications." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Learning outcomes\n", "- Execute programs with the IPython notebook\n", "- Load and transform real, tabular data\n", "- Compute summaries and statistics of the data\n", "- Build a regression model using features of the data" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import graphlab\n", "\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Load data" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sales = graphlab.SFrame(\"home_data.gl/\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Explore data" ] },
{ "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9118\n", "21613\n", "0.421875722945\n" ] } ], "source": [ "# Houses with more than 2000 and at most 4000 sq. ft. of living space\n", "sales_filtered = sales[(sales[\"sqft_living\"] > 2000) & (sales[\"sqft_living\"] <= 4000)]\n", "print len(sales_filtered)\n", "print len(sales)\n", "# Fraction of all houses that fall in this range\n", "print float(len(sales_filtered))/len(sales)" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Render GraphLab Canvas output inline in the notebook\n", "graphlab.canvas.set_target(\"ipynb\")\n", "sales.show(view=\"Scatter Plot\", x=\"sqft_living\", y=\"price\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Create a simple regression model" ] },
{ "cell_type": "code", "execution_count": 9, 
"metadata": { "collapsed": true }, "outputs": [], "source": [ "# Hold out about 20% of the rows for testing; seed=0 makes the split reproducible\n", "training_data, testing_data = sales.random_split(0.8, seed=0)" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROGRESS: Linear regression:\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: Number of examples : 17384\n", "PROGRESS: Number of features : 1\n", "PROGRESS: Number of unpacked features : 1\n", "PROGRESS: Number of coefficients : 2\n", "PROGRESS: Starting Newton Method\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | 1 | 2 | 1.015003 | 4349521.926170 | 262943.613754 |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: SUCCESS: Optimal solution found.\n", "PROGRESS:\n" ] } ], "source": [ "# validation_set=None: fit on all of training_data, with no automatic validation hold-out\n", "sqft_model = graphlab.linear_regression.create(training_data, target='price', features=['sqft_living'], validation_set=None)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the model" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "543054.042563\n" ] } ], "source": [ "# Average sale price in the test set, a reference scale for the RMSE below\n", "print testing_data[\"price\"].mean()" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'max_error': 4143550.8825285938, 'rmse': 255191.02870527358}\n" ] } ], "source": [ "# Test-set maximum absolute error and root-mean-square error, in the units of price\n", "print sqft_model.evaluate(testing_data)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize the predictions" ] 
}, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEACAYAAABoJ6s/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYVdWZ5/HvC0UJohRVppARQyCtRo0mggLtpdMFiihO\nq+kZ0cwkQGI3PYk9yXR3RmSSCOTWYk8S7Uk0wxMTxUSRNuPIjEQIkZpOR4iYYOMFgVw0VBlRuZRt\ni8rlnT/2Opx1dp1z6tTlXKrq93me87Br7fuuYr9nrfXutc3dERERqYQh1T4AEREZPBR0RESkYhR0\nRESkYhR0RESkYhR0RESkYhR0RESkYroMOmZ2mpltMbNfhn87zOzTZtZoZuvMbLuZrTWzhmidRWa2\n08y2mdmlUflkM9tqZjvM7LaovN7MVoZ1NprZ+GjevLD8djObG5VPMLNNYd79ZlbXN5dERETKpcug\n4+473H2Su08GzgX+FXgIuAlY7+7vAx4DFgGY2ZnAHOAM4HLgDjOzsLk7gevd/TTgNDObFcqvB/a6\n+6nAbcCtYVuNwM3AFGAasDgKbsuAr4Vt7Q/bEBGRGtbd5rVLgF+7+y7gKuCeUH4PcHWYvhJY6e6H\n3P0FYCcw1czGAse7++aw3IponXhbDwIzwvQsYJ27d7j7fmAdcFmYNwP4YbT/D3fzXEREpMK6G3Su\nBe4L0ye6+24Ad38ZGBPKxwG7onXaQ9k4oC0qbwtlOeu4+2Ggw8yaCm3LzE4A9rn7kWhbJ3XzXERE\npMJKDjpmNoykFvMPoSg9fk5fjqdjXS9S0jIiIlJDutP5fjnwC3d/Lfy828xOdPfdoenslVDeDrw7\nWu/kUFaoPF7nJTMbCoxy971m1g60pNbZ4O57zKzBzIaE2k68rRxmpsHlRER6wN37/Mt9d5rXPgLc\nH/28GpgfpucBD0fl14WMtInAKcAToQmuw8ymhsSCual15oXpa0gSEwDWAjNDgGkEZoYygA1h2fT+\nO3F3fdxZvHhx1Y+hVj66FroWuhbJp63NmT07+bS1ZcvLpaSajpkdS5JEsCAqXgasMrNPAC+SZKzh\n7s+Z2SrgOeAg8CnPnsENwN3AcGCNuz8ayu8C7jWzncAe4LqwrX1m9iXgSZLmu6WeJBRAkj23Mszf\nErYhIiLdsGABrFmTnX7kkfLur6Sg4+5vAs2psr0kgSjf8n8L/G2e8l8AZ+cpf5sQtPLMu5skUKXL\nf0uSRi0iIv2EHqgcRFpaWqp9CDVD1yJL1yJrMF6L5cuTGk5mutysnG13tcDMfKCfo4hIXzMzvMqJ\nBCIiIr2ioCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMi\nIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWj\noCMiIhWjoCMiIhVTUtAxswYz+wcz22Zmz5rZNDNrNLN1ZrbdzNaaWUO0/CIz2xmWvzQqn2xmW81s\nh5ndFpXXm9nKsM5GMxsfzZsXlt9uZnOj8glmtinMu9/M6np/OUREpJxKrencDqxx9zOADwLPAzcB\n6939fcBjwCIAMzsTmAOcAVwO3GFmFrZzJ3C9u58GnGZms0L59cBedz8VuA24NWyrEbgZmAJMAxZH\nwW0Z8LWwrf1hGyIiUsO6DDpmNgr4I3f/HoC7H3L3
DuAq4J6w2D3A1WH6SmBlWO4FYCcw1czGAse7\n++aw3IponXhbDwIzwvQsYJ27d7j7fmAdcFmYNwP4YbT/D5d81iIiUhWl1HQmAq+Z2ffM7JdmttzM\njgVOdPfdAO7+MjAmLD8O2BWt3x7KxgFtUXlbKMtZx90PAx1m1lRoW2Z2ArDP3Y9E2zqplBMWEZHq\nKaUfpA6YDNzg7k+a2TdImtY8tVz6596wrhcpaRkAlixZcnS6paWFlpaW7h+RiMgA1traSmtra9n3\nU0rQaQN2ufuT4ecfkgSd3WZ2orvvDk1nr4T57cC7o/VPDmWFyuN1XjKzocAod99rZu1AS2qdDe6+\nJyQ3DAm1nXhbncRBR0REOkt/IV+6dGlZ9tNl81poQttlZqeFoouBZ4HVwPxQNg94OEyvBq4LGWkT\ngVOAJ0ITXIeZTQ2JBXNT68wL09eQJCYArAVmhgDTCMwMZQAbwrLp/YuISI0y965bxczsg8B3gGHA\nb4CPA0OBVSQ1lBeBOaGzHzNbRJJNdhD4jLuvC+XnAncDw0my4T4Tyo8B7gUmAXuA60ISAmY2H/gc\nSfPdl919RSifCKwEGoEtwEfd/WCeY/dSzlFERLLMDHcvuRuj5O0O9Buygo6ISPeVK+hoRAIREakY\nBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0R\nEakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakYBR0REakY\nBR0REakYBR0REakYBR0REamYkoKOmb1gZv9sZlvM7IlQ1mhm68xsu5mtNbOGaPlFZrbTzLaZ2aVR\n+WQz22pmO8zstqi83sxWhnU2mtn4aN68sPx2M5sblU8ws01h3v1mVtfbiyEiIuVVak3nCNDi7pPc\nfWoouwlY7+7vAx4DFgGY2ZnAHOAM4HLgDjOzsM6dwPXufhpwmpnNCuXXA3vd/VTgNuDWsK1G4GZg\nCjANWBwFt2XA18K29odtiIhIDSs16FieZa8C7gnT9wBXh+krgZXufsjdXwB2AlPNbCxwvLtvDsut\niNaJt/UgMCNMzwLWuXuHu+8H1gGXhXkzgB9G+/9wieciIiJVUmrQceDHZrbZzP4slJ3o7rsB3P1l\nYEwoHwfsitZtD2XjgLaovC2U5azj7oeBDjNrKrQtMzsB2OfuR6JtnVTiuYiISJWU2g9yobv/3sya\ngXVmtp0kEMXSP/eGdb1IScsAsGTJkqPTLS0ttLS0dP+IREQGsNbWVlpbW8u+n5KCjrv/Pvz7qpn9\nb2AqsNvMTnT33aHp7JWweDvw7mj1k0NZofJ4nZfMbCgwyt33mlk70JJaZ4O77zGzBjMbEmo78bY6\niYOOiIh0lv5CvnTp0rLsp8vmNTM71syOC9MjgUuBp4HVwPyw2Dzg4TC9GrguZKRNBE4BnghNcB1m\nNjUkFsxNrTMvTF9DkpgAsBaYGQJMIzAzlAFsCMum9y8iIjXK3Iu3ioXA8RBJ81kd8AN3vyX0uawi\nqaG8CMwJnf2Y2SKSbLKDwGfcfV0oPxe4GxgOrHH3z4TyY4B7gUnAHuC6kISAmc0HPhf2/2V3XxEd\n10qgEdgCfNTdD+Y5fu/qHEVEJJeZ4e4ld2OUvN2BfkNW0BER6b5yBR2NSCAiIhWjoCMiIhWjoCMi\nIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWj\noCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMiIhWjoCMi\nIhWjoCMiIhVTctAxsyFm9kszWx1+bjSzdWa23czWmllDtOwiM9tpZtvM7NKofLKZbTWzHWZ2W1Re\nb2YrwzobzWx8
NG9eWH67mc2NyieY2aYw734zq+vNhRARkfLrTk3nM8Bz0c83Aevd/X3AY8AiADM7\nE5gDnAFcDtxhZhbWuRO43t1PA04zs1mh/Hpgr7ufCtwG3Bq21QjcDEwBpgGLo+C2DPha2Nb+sA0R\nEalhJQUdMzsZmA18Jyq+CrgnTN8DXB2mrwRWuvshd38B2AlMNbOxwPHuvjkstyJaJ97Wg8CMMD0L\nWOfuHe6+H1gHXBbmzQB+GO3/w6Wci4iIVE+pNZ1vAP8V8KjsRHffDeDuLwNjQvk4YFe0XHsoGwe0\nReVtoSxnHXc/DHSYWVOhbZnZCcA+dz8SbeukEs9FRESqpMt+EDO7Atjt7k+ZWUuRRb3IvO6yrhcp\naRkAlixZcnS6paWFlpaW7h+RiMgA1traSmtra9n3U0rn+4XAlWY2GxgBHG9m9wIvm9mJ7r47NJ29\nEpZvB94drX9yKCtUHq/zkpkNBUa5+14zawdaUutscPc9ZtZgZkNCbSfeVidx0BERkc7SX8iXLl1a\nlv102bzm7v/N3ce7+3uB64DH3P1jwP8B5ofF5gEPh+nVwHUhI20icArwRGiC6zCzqSGxYG5qnXlh\n+hqSxASAtcDMEGAagZmhDGBDWDa9fxERqVG9STO+BVhlZp8AXiTJWMPdnzOzVSSZbgeBT7l7punt\nBuBuYDiwxt0fDeV3Afea2U5gD0lww933mdmXgCdJmu+WhoQCSLLnVob5W8I2RESkhlk2HgxMZuYD\n/RxFRPqameHuJfedl0ojEoiISMUo6IiISMUo6IiISMUo6IiISMUo6Ij0M+3tcMUVyae94NNpIrVJ\n2Wsi/cwVV8CaNcn07NnwyCPVPR4ZmJS9JiIi/Z5qOiL9THs7LFiQTC9fDuPGFV9epCfKVdNR0BER\nkU7UvCYiIv2ego6IiFSMgo6IiFSMgo5IpBaegSl2DLVwfCK9oUQCkUhvn4Hpi8yyYsegZ3SkUpRI\nINIPLFiQBIU1a7LBR0SyevMSN5EBZ/ny3JpKrR1DLRyfSG+oeU2kD+nBTRko9HBoDynoiIh0n/p0\nRESk31PQERGRilHQERGRilHQERGRilHQEZGq0QgLg0+XQcfMjjGzn5vZFjN71sy+GsobzWydmW03\ns7Vm1hCts8jMdprZNjO7NCqfbGZbzWyHmd0Wldeb2cqwzkYzGx/NmxeW325mc6PyCWa2Kcy738z0\nzJFIP6OHaQefLoOOu78NTHf3ScAHgBlmdiFwE7De3d8HPAYsAjCzM4E5wBnA5cAdZpZJu7sTuN7d\nTwNOM7NZofx6YK+7nwrcBtwattUI3AxMAaYBi6Pgtgz4WtjW/rANERGpYSU1r7n7m2HymLDOPuAq\n4J5Qfg9wdZi+Eljp7ofc/QVgJzDVzMYCx7v75rDcimideFsPAjPC9Cxgnbt3uPt+YB1wWZg3A/hh\ntP8Pl3IuIlI7li9PxpCbPVsjLAwWJTVJmdkQ4BfAHwDfdvfnzOxEd98N4O4vm9mYsPg4YGO0enso\nOwS0ReVtoTyzzq6wrcNm1mFmTXF5vC0zOwHY5+5Hom2dVMq5iEjtGDdOg5YONiUFnXBzn2Rmo4C1\nZtYCpB/z78vH/kt5CrbkJ2WXLFlydLqlpYWWlpbuH5EMehriRgay1tZWWltby76fbnW+u/vrZrYG\nOA/YnanthKazV8Ji7cC7o9VODmWFyuN1XjKzocAod99rZu1AS2qdDe6+x8wazGxICIjxtjqJg45I\nT2U6vTPT6cE3FYSkP0t/IV+6dGlZ9lNK9tq7Mp33ZjYCmAlsAVYD88Ni84CHw/Rq4LqQkTYROAV4\nwt1fBjrMbGpILJibWmdemL6GJDEBYC0wMwSYxrDvtWHehrBsev8iFaHMK5HuK6Wm82+Ae0KgGALc\n6+4/MbMtwCoz+wTwIknGGqG/ZxXwHHAQ+FQ04uYNwN3AcGCNuz8ayu8C7jWznc
Ae4LqwrX1m9iXg\nSZLmu6UhoQCS7LmVYf6WsA2RsknXbBRoRLpPo0yL9JD6eGQg06sNekhBR0Sk+/RqAxER6fcUdGTQ\n0DhfIvm9/vbrHDn62GN5KejIoNGbbDMFLBlI3jn8Dt984puM/OpIbKnRcEsD6369riL71iCZIiVI\nP6Ojp+ilv/nxr3/Mjetv5KmXn8op/9R5n+ILf/wFxh43tiLHoaAjg0Y65bkvVSOTTdlzUszOPTv5\n/IbPs+rZVTnll7z3Em65+BbOPencqhyXstekptXKjbWr47jiimxNaPbsytSEqrFPqV2vv/06f/ez\nv+PLP/1yTvn4hvHcesmtXPP+axhipfeolCt7TTUdqWm10qylgSml1hw+cpj7nr6PG9ffyMtvvJwz\n74stX+Svzv8rjqs/rkpHV5iCjgi9r1GVs+mulvYp1bWpbRML1y/kH1/8x5zyj37go3yx5YtMbJxY\npSMrnZrXpKZVqnlNTVVSi9pfb2dJ6xK+s+U7OeVTx01l2SXLaJnQUrZ9q3lNBiU1a8lgcuDgAb75\nxDe5cf2NOeWjh4/m1ktu5eOTPk7dkP5921ZNR4TaSViQwcXdWb19NTeuv5Ede3bkzPvs+Z/lpotu\n4oRjT6jKsammI5JHb4OFgo1U2jOvPMOinyzi/+74vznlV77vSr4y4yucNeasKh1ZZaimI/1ab/ti\n1Jcj5bbnzT189adf5eubvp5Tfvq7TmfZJcv4k9P+hOTNMbVFNR2RKlBNSLrr4OGDfHfLd1m4fiEd\nb3ccLR9iQ7j1klu5YeoNDK8bXsUjrC7VdKRfK3fzmmpCUooNv93AwvUL2fzS5pzyBZMXcPMf38y4\nUf3v24pqOiJ59Da7Tdlx0hO/3fdbPr/h89z39H055S0TWrjl4luYdvK0Kh1Z7VNNR2patZu3erL/\nah+z9L033nmDr2/8OotbF+eUn3T8SSy7ZBkfOesjDB0ytEpHVx56c2gPKej0b/2xeas/HrPkOuJH\neOCZB1i4fiG7Xt+VM+8LH/oCn73gs4w6ZlSVjq4y1LwmUqLe1DSefDIJFJAEjvPO6/7+DxzIP10t\nqnmV5smXnuSm9Tfxk9/+JKf82vdfy5emf4lTTzi1Skc2sKimIxXXnZtgZtm33gJ3GDGi63V6U9MY\nMwZefTWZbm6GV14pbb34nPbvh8cfT6ZnzICf/KTwepWgmld+L7/xMl/8f1/kzifvzCmfNHYSyy5Z\nxsw/mFmlI6sNqunIgNGdkaMzHf3xjbM7o01v3pysW+5v+PE5NTdny4f3YWasaiy98/aht7nzyTtZ\nuH4h7xx+52j5yGEjWXbJMv783D+nfmh9FY9wcFDQkQEnM/ry5s1JrSXzeupSAtWaNbnNaz1x9tnZ\nYNOXoz/39DUPg3U0anfnR7/6EQvXL+SZV57JmffpqZ/mcx/6HGNGjqnS0Q1i7l70A5wMPAY8CzwN\nfDqUNwLrgO3AWqAhWmcRsBPYBlwalU8GtgI7gNui8npgZVhnIzA+mjcvLL8dmBuVTwA2hXn3A3UF\njt+ltrS1uc+enXza2sqzTlube3Oze9Iol6xXzuPsyTl11+zZvTufwWDbq9v8Tx/4U2cJOZ/Lv3+5\nb/n9lmofXr8S7p1dxojufkoJOmOBc8L0ceHmfzqwDLgxlC8EbgnTZwJbSGpRE4Bfke07+jkwJUyv\nAWaF6U8Cd4Tpa4GVng1svwYagNGZ6TDvAeCaMH0n8BcFjr8svxCpbfENurm5Z4Gg1m7ylQhs/c3e\nN/f6wh8v7BRk3nv7e/3BZx/0I0eOVPsQ+61yBZ0um9fc/WXg5TD9hpltC7Wfq4A/DovdA7QCNwFX\nhqBxCHjBzHYCU83sReB4d888srsCuJqklnQVkEmAfxD4H2F6FrDO3TsAzGwdcFkIODOAj0T7XwL8\nz67ORyqjlvofpkwZGP0ftfYgazV+x4ePHG
bFP6/gxvU38tqbr+XM++qMr/LpaZ9mZP3I8h+I9Fi3\n+nTMbAJwDkmz1onuvhuSwGRmmcbRcSRNZBntoewQ0BaVt4XyzDq7wrYOm1mHmTXF5fG2zOwEYJ+7\nH4m2dVJ3zkXKq9qvme6Lfox4GzfckGS2Qc9TqQeaSv2OH/3Vo3zpH7/E47sezymff858lrYsZXzD\n+PLsWMqi5KBjZseR1EI+E2o86TzkvsxLLiVNr+RUviVLlhydbmlpoaWlpftHJDWlq2/ZfVEriLcR\np1LPnl16KrV0389+9zMu+t5FncrPG3MB37hiGReN7zxPeq+1tZXW1tay76ekoGNmdSQB5153fzgU\n7zazE919t5mNBTL/DduBd0ernxzKCpXH67xkZkOBUe6+18zagZbUOhvcfY+ZNZjZkFDbibfVSRx0\npDLy1TT6sjmm2jWpvlJLzZDd1VdZcXve3MMf/P0f5IzInNH4+h+x7/b1cLieMbPhok/2fD9SXPoL\n+dKlS8uyn1JrOt8FnnP326Oy1cB8koSCecDDUfkPzOwbJM1jpwBPuLuHZrOpwGZgLvD30TrzSBIN\nriHJloOkv+crZtYADAFmkvQbAWwIyz6Q2r/UgHw1jVICRS3ehNvb4dRToaMDRo7seSp1Pv05ePa0\nNnnEjzD3obn84Okf5J3/m0//homNE4HwfNbh3hyl1JyuMg2AC4HDwFMkWWm/JOnMbwLWk2SzrQNG\nR+ssIslaS6dMn0uSdr0TuD0qPwZYFco3AROiefND+Q5yU6YnkgSpHSSBZ1iB4+99GoeUrFiGVSnZ\nYKVmjHWVyVVqplcpy5Uzi63WMuTKZcVTKzplmGU+D217qOB6ytirHsqUvaZhcKRPxSMHTJ+eDFsD\n2eaXrmox8frNzbBlS7Jcd2tA8XZmzMh9WDNet5QhYi6+GB57LLutvhzWphZrdn1h26vbOPOOM/PO\n+89T/zO3X3Z7Tb4tU7I0DI70Oz/7GbwTRhvJNB0Vao6Jx1hraoK9e5OO+8x6vWmGevrpbBJAqevG\nwSAetLOvv7/UWhp0T3W81cHoZaPzzps4eiJP/aenBvyozFIaBR3pU+khaNIKfbMvNHZZb48DkqCx\nYUMynR6LrVBneKHjydTcBGxp4S/BW/5iC+eMPaeCRyP9hZrXpCwKNZOVUp5ulhs3rnevHMgEujgQ\ndjXacqnNc4PJZ9d9lq9t/FreedPGTWPTn22q8BFJOeklbj2koFMd6RoN5K8BZW7+8fJLl8Lixdl1\n00GplOH58+1/0qTSg05/6msp17Fubt/M1O9MLTj/7c+/XZOjMven310tU9DpIQWd2hAHjWHD4ODB\nZDpfx3w6wCxfnhswmpuToW2K3VDSNSrIXT9Tw+qtWrjB9dX7ct469BYjvlK4/bC/NJnp/UF9o1xB\nZ0hfb1CkK8cdl53esiW5SbQXfLQ3ualnAkZ9fe7rCkrx6qu5tavujsXW3p4cY77jzPT9rFmTBMau\nzqUvFDue7jr9m6djSw1bap0Czuf+6HP4Yj/66Q8BR/qBcuRh19IHPadTcfGzFZs3J/9On+4+Y0Z2\nOvNsSr6RoNPPZqRHjC5l9OjMevX12eXr63v2vEehZ2nSr0+o1PM26eNpa0uuaXNzco2Lnd+3N3+7\n4PMyLKHbx1+Lz9HU4jH1R1Tr1Qb9/aOg0z3p/7D5/gMX+0+dvhHne6fN5s25waCrm3U6iMXbbGpK\nfp4+Pf8NJg5w06d379wzCgWduDw+n74KOvmC9+zZSWBJ76vQMb6w74WiQWbmn+ztVdAcLA+3Dkbl\nCjpKmZYc6edhoPPzMcWemYmbwgpZvDj7/E4p0s+yTJmS3f/evcm/GzbAxz6WfYgz4957Sx8frNB5\nlTLG2IUXdn4Qtrfi44kTMKZPz2bydd6Xs2bqEKzAsFkfePoR3rUvWdkdnopeqBn3lYmUi4LOAFUL\nHdzNzc
lNM85ES2tqgg9+MLlhl3qzywSB9etzg9czz3ReNn4WZ8GCnl2LQttIB6PeXuN8GXf5jBiR\nG4SvffBa1kxdBfkSzZ6/Clb+b5qbk6C1Nc8ipSRmFDJYX4UtvVCO6lMtfRikzWs9bfbobfNapvmr\nuTmZLmUfPZXuG5oxI/8+8jVHdfe8KtGMlO67mjEjOcd089r3Nz1atMns8JHDebeXbkZTs5gUg/p0\nFHRKle5X6enAmT0JDpVs4y/WgV7ohttVH0i87UzAOv/8vuuzaWtLtpmvHyo+pnhf+w/sLxpkfr33\n1wX3VSihIw5o6myXfBR0FHRKlr7hxt+Su/MNvicBJF4nk62W3m8lsovi42hq6lxjaGrKnZ+u8cTz\n40+hjLliwaTQcaWva+a6NDe784W6gkHmmz//Zt9fMJEUBR0FnZKlg0VPA0s6eJUSPOKyuPkrvlmn\nA0L6Rl0oKGXK4/TrQrWzGTNyA0fm3PKla6eXiZvjSm2KKiWYpJv64uX+evWSgkHmPd94T9FrLlIO\nCjoKOiVL35yLNbWV0rxWaP2uakKFbrD5mpEy89NNg/X12aazfOulax5dNauV0rdRaJmmpp7VYOJ5\nmd9J4/u2Fm0ym3XFm3mDeanNpiK9Va6go+y1ASiTYpweCiaToZTOkkqPfXbDDTB/Phw6BGec0Xn7\n8QCaGZs3J+WZd9/MnQs//Wn+4ys0EvW+fbnD3UCSnfbYYzBnDuzc2Xlbr76azPvZzzrPO+usbBrz\n0qXJ9TicegvlqFHJ4KHDh2ezr846KzsqdUMDnHtuMn/p0sLZasuXJ+f89NPJW0YPHEiG+DFLyhhy\nEG6uZ0P+S8L5zz3OxlXnA8nvam3qVQxx+nQtqoVsSeknyhHJaunDIKzpZJTyUGO+2ke+Bzfj5rVC\nyxaryeTrCynWd5L+xPsZNqzzvHib+WpucbPa0KHZ6Qsu6HzdCjUbFqplFBxB4c+mFazJ/OUjf1lw\nn/Gx5rum+Zo6q62SCSRSGaimI92VfoYiXw2lVFOmdD1wYqa2k2YG+/fDBz4Aa9dmX0swbhycc07n\nBzoheX7n4EH4l39Jfj722OwzOQcP5g4aetxxhUe1hqSGEteEjhzJTj//fFIDyqw3blz+F6ulH3qN\n38tztBZy9g84+TsfTZ6XyfPMjC92IHusV3wr/z7znUtfPxMkUjXliGS19GEQ13TSin1bjr9pP/JI\nMr+xMakJFOrQLzTuWFfZX7FCHfvpRABwN8utocTPA6XPrdBQPOB+/PHZ6cbG0r6h563dHfdS0X6Z\nCy/dnTfhoRK1gkonHCjBYeBBNR0ppNT29Pi1y2+/nXxbnzsXVqzo/E17ypTO22tvT/pPnnwSRo5M\n+nvSQ96sW5cMR1OoBpP29tudy+rq8q/rnowyfcklnV/u1tGRXS59TK+9lvvzqafC2LHJdPxW0Yx8\n13P5cvjzBc6Ppg6h4Ag+//AAPDsn+9K3Olj+3dJrJX3ZL9Kb13v3xEB57bZUQDkiWS19GAQ1nVK/\nORdKBS756TiQAAASFElEQVQ1jTrdV5N5viVfraaxsXPf0LBhSS0qll63rs599Oj8xxn3DWW+Vcfr\n19fn748qVNuKn62ZPDnZb329+8iR2WX/zYI/L1iTafrrli5TxfP9Pro7sGhPqI9FegvVdKQ78n1r\nzrxyuafimhLA0KGds+Qy9u3rvP7Bg0lW3JYtyc8LFsDrr2fn19XBCy8kNaV07cMMGhuz79HJ7K8u\n+gseOTLJfuuqhnX22dn9Z5bt6Ah9Rh/4Pu/86ceOLvv79Dl84SB1Q5Kd5ssChGwfUSGVqBVoTDSp\nWeWIZLX0YRDUdPJ9c873TbetLbcPI1OraGzMfTgz3/hp+fppMrWWtrakjyXucynUF5OvzyXzGTUq\n2dZ55+Wv3WTE5xbXii64oHOWWab20tDQuX9q9mx3hu8r2i8z7YrnCl73
QrWJnvZvqF9EagllqumU\nctO+C9gNbI3KGoF1wHZgLdAQzVsE7AS2AZdG5ZNJBrndAdwWldcDK8M6G4Hx0bx5YfntwNyofAKw\nKcy7H6grcvx9/9voBwrdEOMmtnQQyYwOkK9zvdCL1/INK9ObT11d57JiD7R2NZjnBRdk559/frgu\nRYIMH/m3bpYNUJs3lza+20BJCBDJqGbQuQg4JxV0lgE3humFwC1h+kxgC8krEyYAvwIszPs5MCVM\nrwFmhelPAneE6WuBlZ4NbL8GGoDRmekw7wHgmjB9J/AXRY6/DL+O2ldKv0Gxp/Pjm2lbW+G+kjhA\ndffT2Oh+7rnFl6mvLz52XFejYjc3uzPnT4sGmnznlsk6S1+jvqjRdIf6ZqRaqhZ0kn3znlTQeR44\nMUyPBZ4P0zcBC6PlfgRMC8s8F5VfB9wZph8FpoXpocAr6WU8G1yuDdOvAkPC9B8CjxY59r79TVRQ\nqYNI5luv2E06s80LLkg+jY2dH7jMpFSn39SZro0UalIr5dPQUDxpAJLjS4/h1uWQPSdvLF6bGfFa\np/0MG1baeVX6xq+gI9VSrqDT00SCMe6+O9zRXzazMaF8HEkTWUZ7KDsEtEXlbaE8s86usK3DZtZh\nZk1xebwtMzsB2OfuR6JtndTD86hpcUf3hg2lpb62t+cOJZN56VjcqTx8eDL/1VeTdOO4833YsCRd\n+phjkp8/8YncFGSzZN6hQ8nPSVzvmTjNOTZyZJIg0NEBjz+emyzw6qtJEsFZZyXHetZZ8L17DjHh\nu8MKPpTJw3fBlk8AyRAzHYfplPZ88GCSqJBJgIjPq6kpSZo4++zKd8p3JyFAQ9FIf9BX2Wu9uPV0\nYn20zFFLliw5Ot3S0kJLS0v3jqhMenuTyLf+xz7W+TmVfK+gzkhnnY0enXwKjfPlDv/6r907zu5K\nbz8T4GL7/0vyJ7ABmPDd1My3GuCW/dTVdV63oyP/9iB3pIKM5uYk265aN/DuZLpV+tkcGVhaW1tp\nbW0t/45KqQ7RuXltG7nNa9s8f/Pao2Sb17ZF5aU2r307WufbZJvXXiG3ee1HRY697+qbfayrppPN\nm5MO+vr6pBM83aQUNztNn56UpUdoTo+Vlun8L9TxP3Roac+6QHJMM2b0rl+n5M9FXy3aZNYw+kiv\n9xE3rw0blm3ayzQz1nqHvpripC9R5eY1I7d2sRqYT5JQMA94OCr/gZl9g6R57BTgCXf30Gw2FdgM\nzAX+PlpnHkmiwTVA5imLtcBXzKwBGALMDEENki+415AkFMT777cOHOg8BtjixbB3b1LW2Nj52/Yz\nz2Snt2yBMWOSMc4yLroo+Xf//qRZzD2pBV1wQefRljMOH04+meWLGT48+ZSl5jNqF/z1+MLzv/Us\nvHrm0R8LtNR1S1wDmjkz+TdTc4hHxJ40KTtid/p3Uq4mrlK2q2dzpF/oKioB9wEvAW8DvwM+TpJZ\ntp4klXkdMDpafhFJ1lo6Zfpc4GmS1Ojbo/JjgFWhfBMwIZo3P5TvIDdleiJJkNpBEniGFTn+vv8K\n0Ee6Sv2Nv7k2NeW+uCz93Ey+ju+unu7v7WfUqD7eZpGaDJd9pmznkR7JIM6Yy5QVGmcurVy1DdVi\npNKoVk3H3f9DgVmXFFj+b4G/zVP+C+DsPOVvA3MKbOtu4O485b8labbrN/J9U43b69NPsbe3J7Wf\nTL/E3r3Zzv45c5J+l0wtCPLXSg4dyq359LV4NIEe+U/nwNh/Ljx/SRdVrTzy9ePEGhqyo14DDBmS\nHb06s+477yS1zLjmsHRpknSR7i8Tke7JPEMzYJmZV/Mc49cJZG5Ys2d3Hsb+wIHkZph5mVixl3ZZ\naOjsd7+6D3wfoiFmOvnym3BoRI83X1cHd90Fd9/deRideJl8gQaSDLVMs2NDQzK46NlnZwdETb8U\nL1+CQV81rxV7VYMy06QSzAx371bS
VknbVdApr3zjksVBJ54/e3b2DZT/9E/ZG2O/dUwHLBpdeP73\n18CvLu/TXWaCQeYtngcOwBtv5F+2vj55s2gmdbtQLSnz+yoWUAp9eehpcEj/XfQkE00p1NIb5Qo6\nGvCzjNrbc1+Y1twMp5ySPHtyzDHJa5BHRF/sM68aKOWVADVrSZG/0TfGwH/fXfZDiAcMbWwsvNw7\n7+SmSY8aldtkGevqBp6vZlrttGWlUEstUtApo/htk5lv4JMmZfsTNm6E6dOThw/37k2Wfeqp6h1v\njyw4D076ReH5PeiX6amhQ5Omtfnzs2VmSY2mUK3x0KHkdzNlStJvs3gxbNqUDT5NTZ2bOytxA1cm\nmgxUCjplkq7lnHpqEnDSLxTbsiW3sz/fS81qyvifwic+VHj+7b+Gfe+t3PFEDh+GD384t6/r9NOT\nV1JngohZEpxGjMi+Cjt+FXdXzWiFZIJEunmtp/ri9QcKXFKL1KfTh+KbVfxGyvr6pBkn7i8olAzQ\nVfZVxdkRWDy08PzNn4RH7qjc8ZTIDN71rqQ5c2MYmCmu8Rx9uyelBRb1j8hgoz6dfqDQy8XyNe0U\nioM18R2gWL8MVLTJLJ+hQ+HYY5PAXihAe3gQNl2zzBg+XH0cItWgoNOH4jdr9rS2UmikgLK69G/g\ngq8Xnv/Fg3Ckdv5Uhg7NNo11JQ7i77yTJBacf35uc1N7e5J88MwzuSnSMXXKi/SN2rmT9GOZppd/\nLvKcY03paoiZFT+G3+R99rfmNDYmfWKl1hDr6joHjAULsk2hjz2moCJSTgo6PZBu349TdGtWsSaz\n106Db26v3LH00kMPwbe+lUwvXw6//z1ceGHhDLV4HLmzO42JUZrM7/mZZ+Ctt5K/gUL9Our/ESlM\niQQ9kH5wL06xrRk318GQIm11Ve6XKUV9ffIsUyYRoK4umT7vvM7Lxr+TpqYkyJjBOefA3/xNNo16\nzZrO67e3J0MLPflk0ld05pnJMEPpgFHqA5t98WCnSLUpkaBK4m+tmec44lToxx8v7/hmJTvrfvj3\nhYbJA5btgQNNlTuePnDhhbkPz156aRIw8g0Rc+BA8rxNvj6ZK67IPi+1eHH+ILBzZ1JTeued5HcK\namYTKQcFnS7EHcjx+GmZJpuqBZy6A/D5YwvPX/vfYePfVO54+oBZ8jn+eHj/+5Ppt95KHqAdMSJ3\nDLLM72TSpOTfzO9l+PDuN2fFD/EWU+pzL3o+RqQwBZ0eqkqrZI2nMvdWZvD+978/9+2lxZqo8gWL\ndE2oO0GgqSlpksv3cGepD2z2xYOdIgOV+nRSio3uu3Rp8Q7rPvcf/i2cVuTuteQI3Xxzd79QX5+8\ngC4zBl1TE2zdmq3BtLd3HhQ1M5RNesiaUvpU1PEv0pn6dCok3/MYmXepzJpVfByvXhvzNHzqA4Xn\n37EVXulh+lU/MnJk7jNPe/fm9q+MG5fUROKAk+81A6VSzUSkchR0gsy33U2bsmWZG1/5Xt7lsGRI\n4dlb/yP8r++XY8dVV+h12EOGwLp1yTUv1ZQpuQFHfSoitUvNa0G+997MmAE/+UnSvLNvXx8d0ADv\nlynkgguS1zlknvqH3Fc41NXBhz6UzTy7+OLCzWugJjGRclPzWhVkBoQ844xsGm23dZXK/KW34PAx\nPdx4bZg8Oen4f/rpztl9kDR/rVrVOWgUG3pmxYriQUVNYiL9k2o6QXt7kn6bfv8N5JZ3qau3ZfaD\nIWamT09esbBpU+5LzvJpboZXXkmm8z3TBKqJiPRHel11D3Une63Qg6D5vr3nKNZktmM23Ne/vpLH\nGV+Za/LWW9lz/5d/SZq7Ro5M+l/yjRAgIv2bgk4P9XQYnHx9PEd1NSpzDfTLjB6dBMlS+qKGDk1q\ncy++CGedBffeq5qJyGCnPp08zOwy4DZgCHCXuy8rx35Gn7SH/QveVXiBW/bCW43l2HUnBWtbJCMu\n
T5qU+2BjoX4RdcSLSDX025qOmQ0BdgAXAy8Bm4Hr3P351HLdrukc8SPMuW8+P/zVvfkXeOBB2Pbv\nenLYeQ0dCtOmJdM7dyZvu9y2LTvEzrnnwu9+l0zffXd2hOUbbkg64994I2niSnfWp7W2ttLS0tJn\nx92f6Vpk6Vpk6VpkqabT2VRgp7u/CGBmK4GrgOeLrlXEa2++RvPfNXcqn7zjIX5539VAEiCOPx6G\nNee+CjmfYcPg4MFk+rzz4De/SfpDRoxImrGGD8/WStLBolhNJH6GZc+e0s9P/6GydC2ydC2ydC3K\nrz8HnXHArujnNpJA1GNDbSjXnXUdV7/vaua8fw5mSZBvb4cFodYRB4BMYDhwIMn2ytRSRoxIgklv\nMriUEiwiA1F/Djp9rnFEI/f/u/s7lRcKAKUEBgUOEZGs/tyn84fAEne/LPx8E+DpZAIz658nKCJS\nZUqZjpjZUGA7SSLB74EngI+4+7aqHpiIiBTUb5vX3P2wmf0lsI5syrQCjohIDeu3NR0REel/ioyr\n37+Z2WVm9ryZ7TCzhdU+nnIws5PN7DEze9bMnjazT4fyRjNbZ2bbzWytmTVE6ywys51mts3MLo3K\nJ5vZ1nC9bqvG+fSWmQ0xs1+a2erw86C8DgBm1mBm/xDO71kzmzZYr0c4t2fDefzAzOoHy7Uws7vM\nbLeZbY3K+uzcw7VcGdbZaGbjuzwodx9wH5Jg+ivgPcAw4Cng9GofVxnOcyxwTpg+jqSP63RgGXBj\nKF8I3BKmzwS2kDSrTgjXKFPb/TkwJUyvAWZV+/x6cD3+Cvg+sDr8PCivQzj2u4GPh+k6oGEwXo9w\nD/gNUB9+fgCYN1iuBXARcA6wNSrrs3MHPgncEaavBVZ2dUwDtaZz9MFRdz8IZB4cHVDc/WV3fypM\nvwFsA04mOdd7wmL3AFeH6StJ/igOufsLwE5gqpmNBY53981huRXROv2CmZ0MzAa+ExUPuusAYGaj\ngD9y9+8BhPPsYHBej9eBd4CRZlYHjADaGSTXwt3/CUiPwNiX5x5v60GSxK6iBmrQyffg6IAeXczM\nJpB8o9kEnOjuuyEJTMCYsFj6urSHsnEk1yijP16vbwD/FYg7KQfjdQCYCLxmZt8LzY3LzexYBuH1\ncPd9wNeA35GcV4e7r2cQXovImD4896PruPthYL+ZNRXb+UANOoOKmR1H8i3jM6HGk84OGdDZImZ2\nBbA71PqKPVcwoK9DpA6YDHzL3ScD/wrcxCD7uwAws/eSNLu+BziJpMbzHxmE16KIvjz3Lp/rGahB\npx2IO7RODmUDTmgyeBC4190fDsW7zezEMH8sEF6zRjvw7mj1zHUpVN5fXAhcaWa/Ae4HZpjZvcDL\ng+w6ZLQBu9z9yfDzD0mC0GD7uwA4D/iZu+8N38QfAi5gcF6LjL4896PzwrOTo9x9b7GdD9Sgsxk4\nxczeY2b1wHXA6iofU7l8F3jO3W+PylYD88P0PODhqPy6kHEyETgFeCJUsTvMbKqZGTA3Wqfmuft/\nc/fx7v5ekt/1Y+7+MeD/MIiuQ0ZoOtllZqeFoouBZxlkfxfBduAPzWx4OIeLgecYXNfCyK2B9OW5\nrw7bALgGeKzLo6l2dkUZszYuI/mD2wncVO3jKdM5XggcJsnO2wL8Mpx3E7A+nP86YHS0ziKSrJRt\nwKVR+bnA0+F63V7tc+vFNfljstlrg/k6fJDky9dTwP8iyV4blNeDpK/vWWArSaf3sMFyLYD7SF79\n8jZJv9bHgca+OnfgGGBVKN8ETOjqmPRwqIiIVMxAbV4TEZEapKAjIiIVo6AjIiIVo6AjIiIVo6Aj\nIiIVo6AjIiIVo6AjIiIVo6AjIiIV8/8BbfTmPBoKRnAAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": 
"display_data" } ], "source": [ "# Plot actual test-set prices (dots) against the simple model's predictions (line)\n", "plt.plot(testing_data['sqft_living'], testing_data['price'], '.',\n", "         testing_data['sqft_living'], sqft_model.predict(testing_data), '-')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<tr><th>name</th><th>index</th><th>value</th><th>stderr</th></tr>\n", "<tr><td>(intercept)</td><td>None</td><td>-47114.0206702</td><td>4923.34437753</td></tr>\n", "<tr><td>sqft_living</td><td>None</td><td>281.957850166</td><td>2.16405465323</td></tr>\n", "</table>\n", "[2 rows x 4 columns]<br/>
" ], "text/plain": [ "Columns:\n", "\tname\tstr\n", "\tindex\tstr\n", "\tvalue\tfloat\n", "\tstderr\tfloat\n", "\n", "Rows: 2\n", "\n", "Data:\n", "+-------------+-------+----------------+---------------+\n", "| name | index | value | stderr |\n", "+-------------+-------+----------------+---------------+\n", "| (intercept) | None | -47114.0206702 | 4923.34437753 |\n", "| sqft_living | None | 281.957850166 | 2.16405465323 |\n", "+-------------+-------+----------------+---------------+\n", "[2 rows x 4 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sqft_model.get(\"coefficients\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a more elaborate model: more features" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode', 'condition', 'grade', 'waterfront', 'view', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'lat', 'long', 'sqft_living15', 'sqft_lot15']\n", "PROGRESS: Linear regression:\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: Number of examples : 17384\n", "PROGRESS: Number of features : 18\n", "PROGRESS: Number of unpacked features : 18\n", "PROGRESS: Number of coefficients : 127\n", "PROGRESS: Starting Newton Method\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | 1 | 2 | 0.042353 | 3469012.450686 | 154580.940736 |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: SUCCESS: 
Optimal solution found.\n", "PROGRESS:\n", "{'max_error': 3556849.413858208, 'rmse': 156831.1168021901}\n" ] } ], "source": [ "my_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode']\n", "#sales[my_features].show()\n", "\n", "advanced_features = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'zipcode',\n", "    'condition',      # condition of house\n", "    'grade',          # measure of quality of construction\n", "    'waterfront',     # waterfront property\n", "    'view',           # type of view\n", "    'sqft_above',     # square feet above ground\n", "    'sqft_basement',  # square feet in basement\n", "    'yr_built',       # year built\n", "    'yr_renovated',   # year renovated\n", "    'lat', 'long',    # latitude and longitude of the parcel\n", "    'sqft_living15',  # average living space (sq. ft.) of the 15 nearest neighbors\n", "    'sqft_lot15',     # average lot size of the 15 nearest neighbors\n", "]\n", "\n", "print advanced_features\n", "\n", "advanced_features_model = graphlab.linear_regression.create(training_data, target='price', features=advanced_features, validation_set=None)\n", "print advanced_features_model.evaluate(testing_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sales.show(view='BoxWhisker Plot', x='zipcode', y='price')\n", "houses = sales[sales[\"zipcode\"]==\"98039\"]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PROGRESS: Linear regression:\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: Number of examples : 17384\n", "PROGRESS: Number of features : 6\n", "PROGRESS: Number of unpacked features : 6\n", "PROGRESS: Number of coefficients : 115\n", "PROGRESS: Starting Newton Method\n", "PROGRESS: --------------------------------------------------------\n", "PROGRESS: 
+-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | Iteration | Passes | Elapsed Time | Training-max_error | Training-rmse |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: | 1 | 2 | 0.030431 | 3763208.270523 | 181908.848367 |\n", "PROGRESS: +-----------+----------+--------------+--------------------+---------------+\n", "PROGRESS: SUCCESS: Optimal solution found.\n", "PROGRESS:\n" ] } ], "source": [ "my_features_model = graphlab.linear_regression.create(training_data, target='price', features=my_features, validation_set=None)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'max_error': 4143550.8825285938, 'rmse': 255191.02870527358}\n", "{'max_error': 3486584.509381705, 'rmse': 179542.4333126903}\n" ] } ], "source": [ "print sqft_model.evaluate(testing_data)\n", "print my_features_model.evaluate(testing_data)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[620000, ... ]\n", "[629584.8197281545]\n", "[721918.9333272863]\n" ] } ], "source": [ "# Compare one house's actual price with the predictions of the sqft-only and the six-feature model\n", "house1 = sales[sales['id'] == '5309101200']\n", "print house1['price']\n", "print sqft_model.predict(house1)\n", "print my_features_model.predict(house1)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }