{ "metadata": { "name": "", "signature": "sha256:ee9b143bd3a2938a92d10cb6f1f35e426398ed56f6e9b5917714a36506b0cf51" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prediction Accuracy of Different Machine Learning Methods\n", "\n", "### Linear Regression\n", "\n", "Advantage of the method: Simple and fast.\n", "\n", "Disadvantage of the method: Poor results on large data sets, for example in hourly prediction.\n", "\n", "### Support Vector Regression\n", "\n", "\n", "### Gaussian Process Regression\n", "\n", "\n", "\n", "### Random Forests\n", "\n", "\n", "### K-Nearest Neighbours\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Please note that in the Random Forests and K-Nearest Neighbours methods, the training set is larger and the test set is smaller than in the other methods.\n", "\n", "\n", "* For KNN and RF predictions, the accuracy might increase if the features are carefully selected for each type of energy. For example, steam has nothing to do with dehumidification and cooling degrees, which are features designed for chilled water. Moreover, pressure, solar radiation, wind direction and wind speed do not have an impact on energy use according to our exploratory analysis, so there is no point in including them in the prediction.\n", "\n", "\n", "* Due to time limitations, we did not perform hourly prediction for all the methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Daily Consumption\n", "\n", "#### Chilled Water and Steam Prediction\n", "* According to our exploratory analysis, daily chilled water and steam consumption have a good linear relationship with cooling and heating degrees. Even simple linear regression could predict daily chilled water and steam consumption quite well. 
\n", "\n", "\n", "* Gaussian Process Regression and Random Forests perform slightly better than other methods in daily chilled water prediction. However, the Random Forests prediction used a larger training set and a smaller test set. The training-to-test ratio is 2.4 : 1 for Random Forests and 1.1 : 1 for the other methods. Therefore, it is not a fair comparison. The accuracy of RF could be lower with the same training and test sets.\n", "\n", "#### Daily Electricity\n", "* Daily electricity is not correlated with weather. Occupancy, schedules and study patterns have a large impact on daily electricity.\n", "\n", "\n", "* Gaussian Process Regression outperforms other methods.\n", "\n", "### Hourly Consumption\n", "\n", "* Hourly prediction is much more difficult than daily prediction. First, the data sample is large, so training a model is very time-consuming, especially for computationally expensive methods. Second, the noise and variance in hourly consumption are much larger than in daily consumption.\n", "\n", "\n", "* Gaussian Process Regression did a good job predicting the hourly energy demand.\n", "\n", "\n", "* The Linear Regression prediction used a larger training set and a smaller test set. The training-to-test ratio ranges from 2.8 : 1 to 8.7 : 1 for Linear Regression and is 1.1 : 1 for Gaussian Process Regression. Therefore, it is not a fair comparison. The accuracy of Linear Regression for hourly prediction could be lower with the same training and test sets.\n", "\n", "\n", "* Due to time limitations, we did not try all the methods for hourly prediction.\n", "\n", "\n", "### The winner is Gaussian Process Regression.\n", "\n", "* There is no doubt that Gaussian Process Regression outperforms the other methods, even in an unfair comparison. However, this does not mean Gaussian Process Regression is superior to other methods in general. The features used in Gaussian Process Regression are different from those used in other methods. 
The result may simply reflect that we chose the right features for Gaussian Process Regression.\n", "\n", "\n", "A sample image of Gaussian Process Regression prediction." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Discussion\n", "\n", "* We spent a lot of time cleaning the raw data. As a result, we could only manage prediction for one building.\n", "\n", "\n", "* If we had more time, we would like to optimize the set of features included in the prediction. For example, we could include more features, such as weather data from the previous hour or even the previous two hours, because the cooling and heating process is dynamic and there might be some delay in the systems' response to weather. We could probably also exclude some irrelevant features to reduce the time cost of training a model.\n", "\n", "\n", "* We should have used the same training and test time scale for all methods. Please note that in daily prediction, for the Random Forests and K-Nearest Neighbours methods, the training set is larger and the test set is smaller than for the other methods. In hourly prediction, the training-to-test ratio (number of training points divided by number of test points) of Linear Regression is much higher than that of Gaussian Process Regression. These are not fair comparisons.\n", "\n", "\n", "* We tried our best to explain everything in the notebooks. However, time for this project was very limited, and some parts of the work are trivial and difficult to explain. If anything is unclear to you, do not hesitate to send us an email. We are happy to explain further. Please forgive any typos." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }