{ "metadata": { "name": "", "signature": "sha256:dc4e1e4e75c25643df954f1b026cbc2ed3787f213fdbe08b76dc2fdd334011f3" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prediction Using Different Machine Learning Methods\n", "## Random Forests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we'll train a Random Forest regression model to predict building energy consumption from historical energy data and several weather variables. Both the energy data and the weather data are daily." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline \n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "pd.options.display.mpl_style = 'default'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import seaborn as sns\n", "import scipy as sp\n", "import sklearn\n", "import sklearn.cross_validation\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.cross_validation import cross_val_score\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 },
{ "cell_type": "code", "collapsed": false, "input": [ "# Read in the original data, drop the date-range columns, and discard rows with missing values:\n", "electricity = pd.read_excel('Data/dailyElectricityWithFeatures.xlsx')\n", "electricity = electricity.drop(['startDay', 'endDay'], axis=1)\n", "#electricity = electricity.drop('humidityRatio-kg/kg',1).drop('coolingDegrees',1).drop('heatingDegrees',1).drop('dehumidification',1).drop('occupancy',1)\n", "electricity = electricity.dropna()\n", "\n", "chilledWater = pd.read_excel('Data/dailyChilledWaterWithFeatures.xlsx')\n", "chilledWater = chilledWater.drop(['startDay', 'endDay'], axis=1)\n", "chilledWater = chilledWater.dropna()\n", "\n", "steam = pd.read_excel('Data/dailySteamWithFeatures.xlsx')\n", "steam = steam.drop(['startDay', 'endDay'], axis=1)\n", "steam = steam.dropna()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# Center the data by subtracting each column's mean (no scaling is applied):\n", "normalized_electricity = electricity - electricity.mean()\n", "normalized_chilledWater = chilledWater - chilledWater.mean()\n", "normalized_steam = steam - steam.mean()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we add a column that flags each day as either a working day or a weekend/holiday: working days are set to 0, and weekends and holidays to 1. US public holidays are listed here: http://www.officeholidays.com/countries/usa/ We could also flag vacation periods when school is out of session, but we won't, since we don't have that information." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Initialize all days to 0 (working day)\n", "normalized_electricity['day_type'] = np.zeros(len(normalized_electricity))\n", "normalized_chilledWater['day_type'] = np.zeros(len(normalized_chilledWater))\n", "normalized_steam['day_type'] = np.zeros(len(normalized_steam))\n", "\n", "# Set weekends (Saturday = 5, Sunday = 6) to 1; .loc avoids chained assignment\n", "normalized_electricity.loc[normalized_electricity.index.dayofweek >= 5, 'day_type'] = 1\n", "normalized_chilledWater.loc[normalized_chilledWater.index.dayofweek >= 5, 'day_type'] = 1\n", "normalized_steam.loc[normalized_steam.index.dayofweek >= 5, 'day_type'] = 1\n", "\n", "# Set holidays to 1\n", "holidays = ['2014-01-01','2014-01-20','2014-05-26','2014-07-04','2014-09-01','2014-11-11','2014-11-27','2014-12-25','2013-01-01',\n", "            '2013-01-21','2013-05-27','2013-07-04','2013-09-02','2013-11-11','2013-11-28','2013-12-25','2012-01-01','2012-01-16',\n", "            '2012-05-28','2012-07-04','2012-09-03','2012-11-12','2012-11-22','2012-12-25']\n", "\n", "for holiday in holidays:\n", "    # Compare calendar dates so that any time-of-day component in the index is ignored\n", "    holiday_date = pd.Timestamp(holiday).date()\n", "    normalized_electricity.loc[normalized_electricity.index.date == holiday_date, 'day_type'] = 1\n", "    normalized_chilledWater.loc[normalized_chilledWater.index.date == holiday_date, 'day_type'] = 1\n", "    normalized_steam.loc[normalized_steam.index.date == holiday_date, 'day_type'] = 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analysis of electricity data." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Split train and test data:\n", "elect_train = pd.DataFrame(data=normalized_electricity, index=np.arange('2012-01', '2014-01', dtype='datetime64[D]')).dropna()\n", "elect_test = pd.DataFrame(data=normalized_electricity, index=np.arange('2014-01', '2014-11', dtype='datetime64[D]')).dropna()\n", "\n", "XX_elect_train = elect_train.drop('electricity-kWh', axis = 1).reset_index().drop('index', axis = 1)\n", "XX_elect_test = elect_test.drop('electricity-kWh', axis = 1).reset_index().drop('index', axis = 1)\n", "\n", "YY_elect_train = elect_train['electricity-kWh']\n", "YY_elect_test = elect_test['electricity-kWh']\n", "\n", "print XX_elect_train.shape, XX_elect_test.shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(634, 13) (294, 13)\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "XX_elect_train.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n",
"|   | RH-% | T-C | Tdew-C | pressure-mbar | solarRadiation-W/m2 | windDirection | windSpeed-m/s | humidityRatio-kg/kg | coolingDegrees | heatingDegrees | dehumidification | occupancy | day_type |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 8.212490 | -4.533170 | -2.381563 | -6.369746 | -67.924759 | 28.249822 | 0.562182 | -0.001941 | -3.841443 | 2.072677 | -0.000885 | -0.671879 | 1 |\n",
"| 1 | -12.481350 | -5.873750 | -8.392976 | -16.701268 | -75.852295 | 45.912866 | 2.358178 | -0.003322 | -3.841443 | 3.413256 | -0.000885 | -0.371879 | 0 |\n",
"| 2 | -25.939684 | -14.915417 | -18.430476 | -9.201268 | -67.477295 | 95.079532 | 2.693826 | -0.005410 | -3.841443 | 12.454923 | -0.000885 | -0.371879 | 0 |\n",
"| 3 | -26.898017 | -18.790417 | -22.413809 | -3.076268 | -64.435629 | 78.829532 | 1.571140 | -0.005847 | -3.841443 | 16.329923 | -0.000885 | -0.371879 | 0 |\n",
"| 4 | -21.523017 | -12.290417 | -15.322143 | -9.284601 | -72.435629 | 50.496199 | 1.605862 | -0.004991 | -3.841443 | 9.829923 | -0.000885 | -0.371879 | 0 |\n", "