{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Stock Market\n", "\n", "In this project, you'll work with data from the S&P500 Index. [The S&P500](https://en.wikipedia.org/wiki/S%26P_500) is a stock market index. Before we get into what an index is, we'll need to start with the basics of the stock market.\n", "\n", "Some companies are publicly traded, which means that anyone can **buy and sell their shares** on the open market. A share entitles the owner to some control over the direction of the company and to a percentage (or share) of the earnings of the company. When you buy or sell shares, it's common known as **trading a stock**. The price of a share is based on supply and demand for a given stock.\n", "\n", "**Indexes** aggregate the prices of multiple stocks together, and allow you to see how the market as a whole performs.\n", "\n", "You'll be using historical data on the price of the S&P500 Index to make predictions about future prices. Predicting whether an index goes up or down helps forecast how the stock market as a whole performs. Since stocks tend to correlate with how well the economy as a whole is performs, it can also help with economic forecasts.\n", "\n", "In this project, our dataset contain index prices. Each row in the file contains a daily record of the price of the S&P500 Index from *1950* to *2015*. The dataset is stored in sphist.csv.\n", "\n", "| Columns | Description |\n", "| ----------- | ----------- |\n", "| **Date** | The date of the record. |\n", "| Open | The opening price of the day (when trading starts) |\n", "| High | The highest trade price during the day |\n", "| Low | The lowest trade price during the day |\n", "| Close | The closing price for the day (when trading is finished) |\n", "| Volume | The number of shares traded |\n", "| Adj Close | The daily closing price, adjusted retroactively to include any corporate actions. |\n", "\n", "You'll be using this dataset to develop a predictive model. You'll train the model with data from *1950-2012* and try to make predictions from *2013-2015*.\n", "\n", "## Overview of the dataset\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateOpenHighLowCloseVolumeAdj Close
165891950-01-0316.66000016.66000016.66000016.6600001260000.016.660000
165881950-01-0416.85000016.85000016.85000016.8500001890000.016.850000
165871950-01-0516.93000016.93000016.93000016.9300002550000.016.930000
165861950-01-0616.98000016.98000016.98000016.9800002010000.016.980000
165851950-01-0917.08000017.08000017.08000017.0800002520000.017.080000
165841950-01-1017.03000117.03000117.03000117.0300012160000.017.030001
165831950-01-1117.09000017.09000017.09000017.0900002630000.017.090000
165821950-01-1216.76000016.76000016.76000016.7600002970000.016.760000
165811950-01-1316.67000016.67000016.67000016.6700003330000.016.670000
165801950-01-1616.71999916.71999916.71999916.7199991460000.016.719999
\n", "
" ], "text/plain": [ " Date Open High Low Close Volume \\\n", "16589 1950-01-03 16.660000 16.660000 16.660000 16.660000 1260000.0 \n", "16588 1950-01-04 16.850000 16.850000 16.850000 16.850000 1890000.0 \n", "16587 1950-01-05 16.930000 16.930000 16.930000 16.930000 2550000.0 \n", "16586 1950-01-06 16.980000 16.980000 16.980000 16.980000 2010000.0 \n", "16585 1950-01-09 17.080000 17.080000 17.080000 17.080000 2520000.0 \n", "16584 1950-01-10 17.030001 17.030001 17.030001 17.030001 2160000.0 \n", "16583 1950-01-11 17.090000 17.090000 17.090000 17.090000 2630000.0 \n", "16582 1950-01-12 16.760000 16.760000 16.760000 16.760000 2970000.0 \n", "16581 1950-01-13 16.670000 16.670000 16.670000 16.670000 3330000.0 \n", "16580 1950-01-16 16.719999 16.719999 16.719999 16.719999 1460000.0 \n", "\n", " Adj Close \n", "16589 16.660000 \n", "16588 16.850000 \n", "16587 16.930000 \n", "16586 16.980000 \n", "16585 17.080000 \n", "16584 17.030001 \n", "16583 17.090000 \n", "16582 16.760000 \n", "16581 16.670000 \n", "16580 16.719999 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "from datetime import datetime\n", "import numpy as np\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_absolute_error\n", "# Read our data\n", "df = pd.read_csv(\"sphist.csv\")\n", "# Convert the Date column into a Pandas date type\n", "df[\"Date\"] = pd.to_datetime(df[\"Date\"])\n", "df.sort_values(by=\"Date\", ascending=True, inplace=True)\n", "\n", "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating indicators\n", "\n", "Stock market data is sequential and each observation comes a day after the previous observation. Thus, the observations are not all independent and you can't treat them as such. The time series nature of the data means that we can generate indicators to make our model more accurate. Our goal is to teach the model how to predict the current price from historical prices.\n", "Let's select 3 indicators : \n", "- The average price from the past **5** days.\n", "- The average price for the past **30** days.\n", "- The *ratio* between the average price for the past 5 days, and the average price for the past 30 days." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateOpenHighLowCloseVolumeAdj CloseClose_Day_5Close_Day_30Ratio_Close_5_30Volume_Day_5Volume_Day_30Ratio_Volume_5_30
165891950-01-0316.66000016.66000016.66000016.6600001260000.016.6600000.0000000.0NaN0.000000e+000.0NaN
165881950-01-0416.85000016.85000016.85000016.8500001890000.016.8500000.0000000.0NaN0.000000e+000.0NaN
165871950-01-0516.93000016.93000016.93000016.9300002550000.016.9300000.0000000.0NaN0.000000e+000.0NaN
165861950-01-0616.98000016.98000016.98000016.9800002010000.016.9800000.0000000.0NaN0.000000e+000.0NaN
165851950-01-0917.08000017.08000017.08000017.0800002520000.017.0800000.0000000.0NaN0.000000e+000.0NaN
165841950-01-1017.03000117.03000117.03000117.0300012160000.017.03000116.9457140.0inf2.145714e+060.0inf
165831950-01-1117.09000017.09000017.09000017.0900002630000.017.09000016.9600000.0inf2.390000e+060.0inf
165821950-01-1216.76000016.76000016.76000016.7600002970000.016.76000016.9342860.0inf2.595714e+060.0inf
165811950-01-1316.67000016.67000016.67000016.6700003330000.016.67000016.9042860.0inf2.440000e+060.0inf
165801950-01-1616.71999916.71999916.71999916.7199991460000.016.71999916.8871430.0inf2.408571e+060.0inf
165791950-01-1716.86000116.86000116.86000116.8600011790000.016.86000116.8542860.0inf2.272857e+060.0inf
165781950-01-1816.85000016.85000016.85000016.8500001570000.016.85000016.8314290.0inf2.131429e+060.0inf
165771950-01-1916.87000116.87000116.87000116.8700011170000.016.87000116.8042860.0inf1.961429e+060.0inf
165761950-01-2016.90000016.90000016.90000016.9000001440000.016.90000016.8271430.0inf1.728571e+060.0inf
165751950-01-2316.92000016.92000016.92000016.9200001340000.016.92000016.8542860.0inf1.431429e+060.0inf
165741950-01-2416.86000116.86000116.86000116.8600011250000.016.86000116.8571430.0inf1.465714e+060.0inf
165731950-01-2516.74000016.74000016.74000016.7400001700000.016.74000016.8385720.0inf1.374286e+060.0inf
165721950-01-2616.73000016.73000016.73000016.7300001150000.016.73000016.8342860.0inf1.328571e+060.0inf
165711950-01-2716.82000016.82000016.82000016.8200001250000.016.82000016.8557140.0inf1.395714e+060.0inf
165701950-01-3017.02000017.02000017.02000017.0200001640000.017.02000016.8771430.0inf1.431429e+060.0inf
\n", "
" ], "text/plain": [ " Date Open High Low Close Volume \\\n", "16589 1950-01-03 16.660000 16.660000 16.660000 16.660000 1260000.0 \n", "16588 1950-01-04 16.850000 16.850000 16.850000 16.850000 1890000.0 \n", "16587 1950-01-05 16.930000 16.930000 16.930000 16.930000 2550000.0 \n", "16586 1950-01-06 16.980000 16.980000 16.980000 16.980000 2010000.0 \n", "16585 1950-01-09 17.080000 17.080000 17.080000 17.080000 2520000.0 \n", "16584 1950-01-10 17.030001 17.030001 17.030001 17.030001 2160000.0 \n", "16583 1950-01-11 17.090000 17.090000 17.090000 17.090000 2630000.0 \n", "16582 1950-01-12 16.760000 16.760000 16.760000 16.760000 2970000.0 \n", "16581 1950-01-13 16.670000 16.670000 16.670000 16.670000 3330000.0 \n", "16580 1950-01-16 16.719999 16.719999 16.719999 16.719999 1460000.0 \n", "16579 1950-01-17 16.860001 16.860001 16.860001 16.860001 1790000.0 \n", "16578 1950-01-18 16.850000 16.850000 16.850000 16.850000 1570000.0 \n", "16577 1950-01-19 16.870001 16.870001 16.870001 16.870001 1170000.0 \n", "16576 1950-01-20 16.900000 16.900000 16.900000 16.900000 1440000.0 \n", "16575 1950-01-23 16.920000 16.920000 16.920000 16.920000 1340000.0 \n", "16574 1950-01-24 16.860001 16.860001 16.860001 16.860001 1250000.0 \n", "16573 1950-01-25 16.740000 16.740000 16.740000 16.740000 1700000.0 \n", "16572 1950-01-26 16.730000 16.730000 16.730000 16.730000 1150000.0 \n", "16571 1950-01-27 16.820000 16.820000 16.820000 16.820000 1250000.0 \n", "16570 1950-01-30 17.020000 17.020000 17.020000 17.020000 1640000.0 \n", "\n", " Adj Close Close_Day_5 Close_Day_30 Ratio_Close_5_30 Volume_Day_5 \\\n", "16589 16.660000 0.000000 0.0 NaN 0.000000e+00 \n", "16588 16.850000 0.000000 0.0 NaN 0.000000e+00 \n", "16587 16.930000 0.000000 0.0 NaN 0.000000e+00 \n", "16586 16.980000 0.000000 0.0 NaN 0.000000e+00 \n", "16585 17.080000 0.000000 0.0 NaN 0.000000e+00 \n", "16584 17.030001 16.945714 0.0 inf 2.145714e+06 \n", "16583 17.090000 16.960000 0.0 inf 2.390000e+06 \n", "16582 16.760000 16.934286 0.0 inf 2.595714e+06 \n", "16581 16.670000 16.904286 0.0 inf 2.440000e+06 \n", "16580 16.719999 16.887143 0.0 inf 2.408571e+06 \n", "16579 16.860001 16.854286 0.0 inf 2.272857e+06 \n", "16578 16.850000 16.831429 0.0 inf 2.131429e+06 \n", "16577 16.870001 16.804286 0.0 inf 1.961429e+06 \n", "16576 16.900000 16.827143 0.0 inf 1.728571e+06 \n", "16575 16.920000 16.854286 0.0 inf 1.431429e+06 \n", "16574 16.860001 16.857143 0.0 inf 1.465714e+06 \n", "16573 16.740000 16.838572 0.0 inf 1.374286e+06 \n", "16572 16.730000 16.834286 0.0 inf 1.328571e+06 \n", "16571 16.820000 16.855714 0.0 inf 1.395714e+06 \n", "16570 17.020000 16.877143 0.0 inf 1.431429e+06 \n", "\n", " Volume_Day_30 Ratio_Volume_5_30 \n", "16589 0.0 NaN \n", "16588 0.0 NaN \n", "16587 0.0 NaN \n", "16586 0.0 NaN \n", "16585 0.0 NaN \n", "16584 0.0 inf \n", "16583 0.0 inf \n", "16582 0.0 inf \n", "16581 0.0 inf \n", "16580 0.0 inf \n", "16579 0.0 inf \n", "16578 0.0 inf \n", "16577 0.0 inf \n", "16576 0.0 inf \n", "16575 0.0 inf \n", "16574 0.0 inf \n", "16573 0.0 inf \n", "16572 0.0 inf \n", "16571 0.0 inf \n", "16570 0.0 inf " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "indicators = [5, 30]\n", "def add_indicator(df,indicators,targets):\n", " for target in targets:\n", " for index, row in df.iterrows():\n", " size = len(df[df['Date'] < row['Date']])\n", " for indicator in indicators:\n", " column = \"{}_Day_{}\".format(target,indicator)\n", " #new_column = \"Volume_Day_{}\".format(indicator)\n", " # print(column)\n", " if 
size < indicator:\n", " df.loc[index, column] = 0\n", " else:\n", " df.loc[index, column] = np.mean(\n", " df.loc[index+indicator:index-1, target])\n", " column1 =\"Ratio_{}_{}_{}\".format(target,indicators[0],indicators[1])\n", " df[column1] = df.iloc[:,-2] / df.iloc[:,-1]\n", "add_indicator(df,indicators,[\"Close\",\"Volume\"])\n", "df.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning and Splitting up our data\n", "Since we're computing indicators that use historical data, there are some rows where there isn't enough historical data to generate them. let's clean our data depending on the select columns and number of days." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Date 0\n", "Open 0\n", "High 0\n", "Low 0\n", "Close 0\n", "Volume 0\n", "Adj Close 0\n", "Close_Day_5 0\n", "Close_Day_30 0\n", "Ratio_Close_5_30 0\n", "Volume_Day_5 0\n", "Volume_Day_30 0\n", "Ratio_Volume_5_30 0\n", "dtype: int64\n", "(16560, 13)\n" ] } ], "source": [ "def clean(df,day,columns):\n", " # Remove data before 1951-01-03\n", " #df = df[df[\"Date\"] > datetime(year=1951, month=1, day=2)]\n", " for column in columns:\n", " col = \"{}_Day_{}\".format(column,day)\n", " df.drop(df[(df[col] == 0)].index,\n", " axis=0, inplace=True)\n", " df.dropna(axis=0, inplace=True)\n", "\n", "clean_df = df.copy()\n", "clean(clean_df,30,[\"Close\",\"Volume\"])\n", "\n", "print(clean_df.isnull().sum())\n", "print(clean_df.shape)\n", "# Generate the train and test dataset\n", "train = clean_df[clean_df[\"Date\"] < datetime(year=2013, month=1, day=1)]\n", "test = clean_df[clean_df[\"Date\"] > datetime(year=2013, month=1, day=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the splitting part, we're going :\n", "- train should contain any rows in the data with a date less than 2013-01-01\n", "- test should contain any rows with a date greater than or equal to 2013-01-01" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(15821, 13)\n", "(739, 13)\n" ] } ], "source": [ "# Generate the train and test dataset\n", "train = clean_df[clean_df[\"Date\"] < datetime(year=2013, month=1, day=1)]\n", "test = clean_df[clean_df[\"Date\"] > datetime(year=2013, month=1, day=1)]\n", "print(train.shape)\n", "print(test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making Prediction\n", "\n", "The **linear regression model** is going to be used to train the train dataset and predict the test dataset and the error metric is **Mean Absolute Error** (MAE)." 
] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAE = 11.350741886520453\n", "MAE = 11.351021603941877\n" ] } ], "source": [ "def model(target,features,train,test):\n", " lr = LinearRegression()\n", " lr.fit(train[features], train[target])\n", "\n", " predictions = lr.predict(test[features])\n", " mae = mean_absolute_error(test['Close'], predictions)\n", " print(\"MAE = {}\".format(mae))\n", "\n", "features = [\"Close_Day_5\", \"Close_Day_30\"]\n", "target = \"Close\"\n", "#\"ratio_5_365\",\"Volume_Day_5\", \"Volume_Day_365\"\n", "model(target,features,train,test)\n", "features = [\"Close_Day_5\", \"Close_Day_30\",\"Ratio_Close_5_30\"]\n", "target = \"Close\"\n", "#\"ratio_5_365\",\"Volume_Day_5\", \"Volume_Day_365\"\n", "model(target,features,train,test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By the result above we can say the ratio doesn't have significant effect in reducing error. Let's be sure by checking the correlation coefficients." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Open 0.999901\n", "High 0.999954\n", "Low 0.999956\n", "Close 1.000000\n", "Volume 0.774267\n", "Adj Close 1.000000\n", "Close_Day_5 0.999892\n", "Close_Day_30 0.999297\n", "Ratio_Close_5_30 0.005923\n", "Volume_Day_5 0.783925\n", "Volume_Day_30 0.788661\n", "Ratio_Volume_5_30 -0.004822\n", "Name: Close, dtype: float64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clean_df.corr()[\"Close\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The coefficients (*Ratio_Close_5_30 = 0.005923 and Ratio_Volume_5_30 = -0.004822*) confirm the assertion from above.\n", "Let's add more 2 more indicators :\n", "- The average volume from the past **5** days.\n", "- The average volume for the past **30** days." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAE = 11.3073026169537\n" ] } ], "source": [ "features = [\"Close_Day_5\", \"Close_Day_30\",\"Volume_Day_5\", \"Volume_Day_30\"]\n", "#\"ratio_5_365\",\"Volume_Day_5\", \"Volume_Day_365\"\n", "model(target,features,train,test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a small improvement of our model from *MAE = 11.350741886520453* to *MAE = 11.3073026169537* . We can also make significant structural improvements to the algorithm.\n", "\n", "## Predictions only one day ahead\n", "\n", "About this improvement, we train a model using data from *1951-01-03 to 2013-01-02*, make predictions for *2013-01-03*, and then train another model using data from *1951-01-03 to 2013-01-03*, make predictions for 2013-01-04, and so on. This more closely simulates what you'd do if you were trading using the algorithm." 
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAE = 3.7842744203948424\n" ] } ], "source": [ "# Modification of model function\n", "def model(df,row,target,features):\n", " #print(row)\n", " lr = LinearRegression()\n", " train = df[df[\"Date\"] < row[\"Date\"]]\n", " test = df[df[\"Date\"] == row[\"Date\"]]\n", " if train.empty :\n", " return 0\n", " else:\n", " lr.fit(train[features], train[target])\n", " prediction = lr.predict(test[features])\n", " mae = mean_absolute_error(test['Close'], prediction)\n", " return mae\n", " \n", " \n", "\n", "# get the MAEs of our new model\n", "maes = clean_df.apply(lambda row : model(clean_df,row,target,features),axis=1)\n", "mae = np.mean(maes)\n", "print(\"MAE = {}\".format(mae))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see a big improvement in the reduction's error, by that we can conclure the accuracy of the model will improve by making predictions only one day ahead.\n", "\n", "## Other idea\n", "\n", "In the goal to improve the accuracy of the prediction, we can :\n", "- Try other techniques, like a random forest, and see if they perform better.\n", "- Incorporate outside data, such as the weather in New York City (where most trading happens) the day before and the amount of Twitter activity around certain stocks.\n", "- Make the system real-time by writing an automated script to download the latest data when the market closes and make predictions for the next day.\n", "- Make the system \"higher-resolution\". You're currently making daily predictions, but you could make hourly, minute-by-minute, or second-by-second predictions. This requires obtaining more data, though. You could also make predictions for individual stocks instead of the S&P500." ] } ], "metadata": { "kernelspec": { "display_name": "data_science", "language": "python", "name": "data_science" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }