{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview\n", "This project develops a reinforcement learning-based algorithmic trading strategy: a trading agent that learns by interacting with historical market data and makes buy, hold, or sell decisions based on current market conditions. A tabular Q-learning algorithm learns which action (buy, hold, or sell) yields the highest expected reward in each state. The agent is trained over repeated episodes in which it balances exploring new actions against exploiting what it has already learned, and it is rewarded according to the profitability of its decisions.\n", "\n", "The reinforcement learning strategy is compared with a buy-and-hold baseline. In the recorded run, it achieves a higher total return and Sharpe ratio than buy-and-hold and a slightly smaller maximum drawdown, although that drawdown is still substantial. Because each backtest step draws a randomly selected stock, these figures change from run to run, so a single run should not be read as conclusive. Overall, the project demonstrates how reinforcement learning techniques can be applied to algorithmic trading and highlights the potential of learning trading strategies from historical data, while acknowledging the complexity and computational requirements of reinforcement learning algorithms.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1\n", "Collect historical market data for a set of assets (e.g., stocks, cryptocurrencies). Preprocess the data to remove outliers, handle missing values, and format it into a suitable input for the reinforcement learning model." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " AMZN AXP AMGN AAPL \\\n", "Date \n", "2022-09-29 00:00:00-04:00 114.800003 134.829849 217.547775 141.273102 \n", "2022-09-30 00:00:00-04:00 113.000000 132.011749 214.680923 137.029358 \n", "2022-10-03 00:00:00-04:00 115.879997 137.011948 219.481247 141.243378 \n", "2022-10-04 00:00:00-04:00 121.089996 142.335068 221.938553 144.862442 \n", "2022-10-05 00:00:00-04:00 120.949997 141.268494 222.700500 145.159912 \n", "\n", " BA CAT CSCO CVX \\\n", "Date \n", "2022-09-29 00:00:00-04:00 125.330002 160.746567 38.699997 136.735626 \n", "2022-09-30 00:00:00-04:00 121.080002 158.983109 38.156269 135.696655 \n", "2022-10-03 00:00:00-04:00 126.050003 165.910980 39.386806 143.309326 \n", "2022-10-04 00:00:00-04:00 133.509995 174.050064 40.262924 148.881912 \n", "2022-10-05 00:00:00-04:00 132.110001 172.838882 40.426598 149.731979 \n", "\n", " GS HD ... NKE PG \\\n", "Date ... \n", "2022-09-29 00:00:00-04:00 283.136993 267.565887 ... 93.491486 123.847023 \n", "2022-09-30 00:00:00-04:00 280.211060 265.268372 ... 81.516960 121.489410 \n", "2022-10-03 00:00:00-04:00 286.043854 272.728302 ... 83.752991 123.664177 \n", "2022-10-04 00:00:00-04:00 301.075134 278.361664 ... 86.930504 125.194252 \n", "2022-10-05 00:00:00-04:00 295.462280 278.640442 ... 
89.343056 124.328163 \n", "\n", " TRV UNH CRM VZ \\\n", "Date \n", "2022-09-29 00:00:00-04:00 149.782745 497.780243 146.809998 34.812500 \n", "2022-09-30 00:00:00-04:00 148.349609 494.072540 143.839996 34.208866 \n", "2022-10-03 00:00:00-04:00 152.097092 504.315186 147.899994 35.280991 \n", "2022-10-04 00:00:00-04:00 156.357788 511.808838 155.729996 35.866600 \n", "2022-10-05 00:00:00-04:00 155.457230 515.624146 156.229996 35.497215 \n", "\n", " V WMT DIS DOW \n", "Date \n", "2022-09-29 00:00:00-04:00 177.911240 43.257965 97.133430 40.799973 \n", "2022-09-30 00:00:00-04:00 175.530014 42.423885 94.023567 40.587479 \n", "2022-10-03 00:00:00-04:00 179.482269 43.349556 96.814468 41.834751 \n", "2022-10-04 00:00:00-04:00 183.434525 43.912155 101.110474 43.072800 \n", "2022-10-05 00:00:00-04:00 185.430420 43.477119 100.472549 42.555408 \n", "\n", "[5 rows x 30 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import yfinance as yf\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "import pandas as pd\n", "\n", "# List of DJIA ticker symbols\n", "djia_tickers = [\n", " 'AMZN', 'AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD',\n", " 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT',\n", " 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WMT', 'DIS', 'DOW'\n", "]\n", "\n", "# Create an empty DataFrame to store closing prices\n", "closing_prices = pd.DataFrame()\n", "\n", "# Fetch closing prices for each ticker symbol\n", "for ticker in djia_tickers:\n", " ticker_data = yf.Ticker(ticker)\n", " ticker_df = ticker_data.history(period='365d')\n", " closing_prices[ticker] = ticker_df['Close']\n", "\n", "# Drop rows with missing data\n", "closing_prices = closing_prices.dropna()\n", "\n", "closing_prices.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2\n", "Implement a reinforcement learning 
algorithm (e.g., Q-learning, Deep Q-Networks) to learn trading strategies. Design the state space, action space, and reward structure for the trading agent. Train the reinforcement learning model using the preprocessed historical data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9.46252188e-02 1.14806924e-01 1.38531837e-01]\n", " [ 1.29176448e-01 9.76606225e-02 7.16049612e-02]\n", " [ 1.15946476e-01 8.28487120e-02 4.84086391e-02]\n", " ...\n", " [-3.61182166e-04 7.75066180e-07 -1.20506786e-03]\n", " [-2.00493522e-04 0.00000000e+00 -1.13303229e-02]\n", " [ 0.00000000e+00 0.00000000e+00 0.00000000e+00]]\n" ] } ], "source": [ "# Define the state space: daily returns (rows = trading days, columns = stocks).\n", "# Each row index (time step) serves as a discrete state for the Q-table.\n", "states = closing_prices.pct_change().dropna().values\n", "\n", "# Define the action space\n", "actions = ['buy', 'hold', 'sell']\n", "\n", "# Define the reward structure: a long position earns the price change,\n", "# a short position earns its negative, and holding earns nothing\n", "def calculate_reward(action, price_change):\n", "    if action == 'buy':\n", "        return price_change\n", "    elif action == 'sell':\n", "        return -price_change\n", "    else:\n", "        return 0\n", "\n", "# Initialize the Q-table (one row per time step, one column per action)\n", "q_table = np.zeros((len(states), len(actions)))\n", "\n", "# Set hyperparameters\n", "alpha = 0.1  # Learning rate\n", "gamma = 0.9  # Discount factor\n", "epsilon = 0.1  # Exploration rate\n", "\n", "# Training loop\n", "num_episodes = 1000\n", "for episode in range(num_episodes):\n", "    state = 0  # Start from the first time step\n", "    done = False\n", "\n", "    while not done:\n", "        # Choose an action using an epsilon-greedy policy\n", "        if np.random.uniform(0, 1) < epsilon:\n", "            action = np.random.choice(actions)\n", "        else:\n", "            action = actions[np.argmax(q_table[state])]\n", "\n", "        # Take the action and observe the next state and reward\n", "        stock_index = np.random.randint(0, states.shape[1])  # Randomly select a stock\n", "        price_change = states[state, stock_index]\n", "        reward = calculate_reward(action, price_change)\n", "\n", "        # 
Update the Q-value with the Q-learning (temporal-difference) rule\n", "        next_state = state + 1  # Move to the next time step\n", "        q_table[state, actions.index(action)] += alpha * (\n", "            reward + gamma * np.max(q_table[next_state]) - q_table[state, actions.index(action)]\n", "        )\n", "\n", "        # Move to the next state\n", "        state = next_state\n", "\n", "        # Stop one row early: the update above reads q_table[next_state]\n", "        if state == len(states) - 1:\n", "            done = True\n", "\n", "# Print the learned Q-table\n", "print(q_table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The script uses tabular Q-learning to learn a trading policy. It initializes a Q-table and iteratively updates it from the observed states, actions, and rewards, exploring with an epsilon-greedy policy: with probability epsilon it picks a random action, otherwise the action with the highest Q-value. Note that each state is simply the index of a trading day and each reward comes from a randomly selected stock, so the Q-table learns one value per day and action rather than a rule that depends on observed market conditions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3.1\n", "Evaluate the performance of the trained trading agent using a separate testing dataset. Implement backtesting to assess the profitability and risk-adjusted returns of the trading strategy. 
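
The performance metrics computed in the backtesting cells below can be sketched as small standalone functions. This is a minimal sketch, not the notebook's own code. It assumes a NumPy array of daily portfolio values, a zero risk-free rate, and 252 trading days per year (the annualization factor the notebook uses). The helper names `chronological_split`, `compute_total_return`, `compute_sharpe`, and `compute_max_drawdown` are hypothetical.

```python
import numpy as np


def chronological_split(values, train_frac=0.8):
    # Hypothetical helper: split time-ordered data without shuffling,
    # so every test observation comes after every training observation.
    cut = int(len(values) * train_frac)
    return values[:cut], values[cut:]


def compute_total_return(portfolio_values, initial_value):
    # Fractional gain or loss relative to the starting capital.
    return (portfolio_values[-1] - initial_value) / initial_value


def compute_sharpe(portfolio_values, periods_per_year=252):
    # Annualized Sharpe ratio of per-step simple returns, risk-free rate assumed zero.
    values = np.asarray(portfolio_values, dtype=float)
    returns = values[1:] / values[:-1] - 1.0
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)


def compute_max_drawdown(portfolio_values):
    # Largest peak-to-trough decline, reported as a negative fraction.
    values = np.asarray(portfolio_values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return (values / running_peak - 1.0).min()


# Illustrative, made-up equity curve (not results from this notebook).
equity = np.array([10000.0, 10100.0, 9900.0, 10200.0, 9800.0, 10300.0])
train, test = chronological_split(equity)
print(len(train), len(test))                              # 4 2
print(round(compute_total_return(equity, 10000.0), 4))    # 0.03
print(round(compute_max_drawdown(equity), 4))             # -0.0392
print(round(compute_sharpe(equity), 2))
```

Passing `ddof=1` matches the sample standard deviation that pandas `Series.std` computes by default. As a result, `compute_sharpe` agrees with the `pct_change`-based Sharpe ratio in the backtests. The drawdown helper uses a running maximum, like the `cummax`-based calculation in Part 4.1.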
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total reward on testing data: 0.07467751259669841\n" ] } ], "source": [ "# Split the data into training and testing sets\n", "train_data = closing_prices.iloc[:int(len(closing_prices) * 0.8)]\n", "test_data = closing_prices.iloc[int(len(closing_prices) * 0.8):]\n", "\n", "# Define the state space for training and testing\n", "train_states = train_data.pct_change().dropna().values\n", "test_states = test_data.pct_change().dropna().values\n", "\n", "# Train the Q-learning agent using the training data\n", "# Initialize the Q-table for training\n", "q_table_train = np.zeros((len(train_states), len(actions)))\n", "\n", "# Set hyperparameters\n", "alpha = 0.1 # Learning rate\n", "gamma = 0.9 # Discount factor\n", "epsilon = 0.1 # Exploration rate\n", "\n", "# Training loop\n", "num_episodes = 1000\n", "for episode in range(num_episodes):\n", " state = 0 # Start from the first time step\n", " done = False\n", " \n", " while not done:\n", " # Choose an action using epsilon-greedy policy\n", " if np.random.uniform(0, 1) < epsilon:\n", " action = np.random.choice(actions)\n", " else:\n", " action = actions[np.argmax(q_table_train[state])]\n", " \n", " # Take the action and observe the next state and reward\n", " stock_index = np.random.randint(0, train_states.shape[1]) # Randomly select a stock\n", " price_change = train_states[state, stock_index]\n", " reward = calculate_reward(action, price_change)\n", " \n", " # Update the Q-value\n", " next_state = state + 1 # Move to the next time step\n", " q_table_train[state, actions.index(action)] += alpha * (\n", " reward + gamma * np.max(q_table_train[next_state]) - q_table_train[state, actions.index(action)]\n", " )\n", " \n", " # Move to the next state\n", " state = next_state\n", " \n", " # Check if the episode is done\n", " if state == len(train_states) - 1:\n", " done = True\n", "\n", "# 
Evaluate the trained agent on the testing data with a greedy policy (no exploration)\n", "total_reward = 0\n", "\n", "for state in range(len(test_states)):\n", "    # Use the Q-table learned on the training split only (q_table from Part 2 was\n", "    # trained on the full dataset, including the test period)\n", "    action = actions[np.argmax(q_table_train[state])]\n", "\n", "    # Take the action and observe the reward\n", "    stock_index = np.random.randint(0, test_states.shape[1])\n", "    price_change = test_states[state, stock_index]\n", "    reward = calculate_reward(action, price_change)\n", "\n", "    # Accumulate the rewards\n", "    total_reward += reward\n", "\n", "# Print the total reward obtained on the testing data\n", "print(\"Total reward on testing data:\", total_reward)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Backtesting Results:\n", " Action Price_Change Reward Portfolio_Value\n", "0 sell 0.007740 -0.007740 9912.604800\n", "1 buy 0.005091 0.005091 9953.159317\n", "2 buy -0.000896 -0.000896 9934.287690\n", "3 hold -0.009465 0.000000 9934.287690\n", "4 sell 0.011137 -0.011137 9813.719394\n", ".. ... ... ... ...\n", "67 buy 0.006580 0.006580 8690.620577\n", "68 hold 0.003262 0.000000 8690.620577\n", "69 hold 0.002609 0.000000 8690.620577\n", "70 buy 0.000409 0.000409 8685.483530\n", "71 buy 0.005291 0.005291 8722.751037\n", "\n", "[72 rows x 4 columns]\n", "\n", "Performance Metrics:\n", "Total Return: -0.1277248963127362\n", "Sharpe Ratio: -3.0731337193606145\n" ] }, { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create a DataFrame to store the backtesting results\n", "backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Reward', 'Portfolio_Value'])\n", "\n", "# Set the initial portfolio value\n", "initial_portfolio_value = 10000\n", "portfolio_value = initial_portfolio_value\n", "\n", "# Set the transaction cost as a fraction of portfolio value, charged on every buy or sell step (e.g., commission, slippage)\n", "transaction_cost = 0.001\n", "\n", "# Perform backtesting on the test data\n", "for state in range(len(test_states)):\n", "    # Choose the action with the highest Q-value\n", "    action = actions[np.argmax(q_table_train[state])]\n", "\n", "    # Take the action and observe the price change and reward\n", "    stock_index = np.random.randint(0, test_states.shape[1])\n", "    price_change = test_states[state, stock_index]\n", "    reward = calculate_reward(action, price_change)\n", "\n", "    # Update the portfolio value: buy goes long, sell goes short, hold leaves it unchanged\n", "    if action == 'buy':\n", "        portfolio_value *= (1 + price_change - transaction_cost)\n", "    elif action == 'sell':\n", "        portfolio_value *= (1 - price_change - transaction_cost)\n", "\n", "    # Store the backtesting results\n", "    new_result = pd.DataFrame({\n", "        'Action': [action],\n", "        'Price_Change': [price_change],\n", "        'Reward': [reward],\n", "        'Portfolio_Value': [portfolio_value]\n", "    })\n", "    backtest_results = pd.concat([backtest_results, new_result], ignore_index=True)\n", "\n", "# Calculate the total return\n", "total_return = (portfolio_value - initial_portfolio_value) / initial_portfolio_value\n", "\n", "# Calculate the annualized Sharpe ratio (252 trading days, risk-free rate assumed zero)\n", "daily_returns = backtest_results['Portfolio_Value'].astype(float).pct_change()\n", "sharpe_ratio = np.sqrt(252) * daily_returns.mean() / daily_returns.std()\n", "\n", "# Print the backtesting results and performance metrics\n", "print(\"Backtesting Results:\")\n", "print(backtest_results)\n", "print(\"\\nPerformance Metrics:\")\n", "print(\"Total Return:\", 
total_return)\n", "print(\"Sharpe Ratio:\", sharpe_ratio)\n", "\n", "# Visualize the backtesting results\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(backtest_results['Portfolio_Value'])\n", "plt.title('Portfolio Value Over Time')\n", "plt.xlabel('Time Step')\n", "plt.ylabel('Portfolio Value')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3.2\n", "Compare the performance of the reinforcement learning-based strategy with baseline strategies (e.g., buy-and-hold)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reinforcement Learning Strategy:\n", "Total Return: 0.11098795325480733\n", "Sharpe Ratio: 1.722366460147563\n", "\n", "Buy-and-Hold Strategy:\n", "Total Return: 0.0799976737496936\n", "Sharpe Ratio: 1.7053007904465927\n" ] } ], "source": [ "# Create DataFrames to store the backtesting results\n", "rl_backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Reward', 'Portfolio_Value'])\n", "bh_backtest_results = pd.DataFrame(columns=['Action', 'Price_Change', 'Portfolio_Value'])\n", "\n", "# Set the initial portfolio values\n", "rl_portfolio_value = initial_portfolio_value\n", "bh_portfolio_value = initial_portfolio_value\n", "\n", "# Set the transaction cost (e.g., commission, slippage)\n", "transaction_cost = 0.001\n", "\n", "# Perform backtesting for the reinforcement learning strategy\n", "for state in range(len(test_states)):\n", "    # Choose the action with the highest Q-value\n", "    action = actions[np.argmax(q_table_train[state])]\n", "\n", "    # Take the action and observe the price change and reward\n", "    stock_index = np.random.randint(0, test_states.shape[1])\n", "    price_change = test_states[state, stock_index]\n", "    reward = calculate_reward(action, price_change)\n", "\n", "    # Calculate the new portfolio value based on the action and price change\n", "    if action == 'buy':\n", "        rl_portfolio_value *= (1 + 
price_change - transaction_cost)\n", "    elif action == 'sell':\n", "        rl_portfolio_value *= (1 - price_change - transaction_cost)\n", "\n", "    # Store the backtesting results for the reinforcement learning strategy\n", "    new_rl_result = pd.DataFrame({\n", "        'Action': [action],\n", "        'Price_Change': [price_change],\n", "        'Reward': [reward],\n", "        'Portfolio_Value': [rl_portfolio_value]\n", "    })\n", "    rl_backtest_results = pd.concat([rl_backtest_results, new_rl_result], ignore_index=True)\n", "\n", "# Perform backtesting for the buy-and-hold strategy\n", "for state in range(len(test_states)):\n", "    # The baseline stays invested every step (no trades, no transaction costs).\n", "    # Note: it draws its own random stock each step, independently of the RL loop above.\n", "    action = 'hold'\n", "\n", "    # Observe the price change\n", "    stock_index = np.random.randint(0, test_states.shape[1])\n", "    price_change = test_states[state, stock_index]\n", "\n", "    # Calculate the new portfolio value based on the price change\n", "    bh_portfolio_value *= (1 + price_change)\n", "\n", "    # Store the backtesting results for the buy-and-hold strategy\n", "    new_bh_result = pd.DataFrame({\n", "        'Action': [action],\n", "        'Price_Change': [price_change],\n", "        'Portfolio_Value': [bh_portfolio_value]\n", "    })\n", "    bh_backtest_results = pd.concat([bh_backtest_results, new_bh_result], ignore_index=True)\n", "\n", "# Calculate the total returns\n", "rl_total_return = (rl_portfolio_value - initial_portfolio_value) / initial_portfolio_value\n", "bh_total_return = (bh_portfolio_value - initial_portfolio_value) / initial_portfolio_value\n", "\n", "# Calculate the annualized Sharpe ratios (252 trading days, risk-free rate assumed zero)\n", "rl_daily_returns = rl_backtest_results['Portfolio_Value'].astype(float).pct_change()\n", "rl_sharpe_ratio = np.sqrt(252) * rl_daily_returns.mean() / rl_daily_returns.std()\n", "\n", "bh_daily_returns = bh_backtest_results['Portfolio_Value'].astype(float).pct_change()\n", "bh_sharpe_ratio = np.sqrt(252) * bh_daily_returns.mean() / bh_daily_returns.std()\n", "\n", "# Print the performance metrics for both strategies\n", "print(\"Reinforcement Learning 
Strategy:\")\n", "print(\"Total Return:\", rl_total_return)\n", "print(\"Sharpe Ratio:\", rl_sharpe_ratio)\n", "\n", "print(\"\\nBuy-and-Hold Strategy:\")\n", "print(\"Total Return:\", bh_total_return)\n", "print(\"Sharpe Ratio:\", bh_sharpe_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the total returns and Sharpe ratios shows the relative performance of the reinforcement learning strategy against the buy-and-hold baseline. If the reinforcement learning strategy has both a higher total return and a higher Sharpe ratio, it outperformed buy-and-hold on both an absolute and a risk-adjusted basis in this backtest.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4.1\n", "Visualize the trading decisions made by the agent over time. Analyze the cumulative returns, Sharpe ratio, maximum drawdown, and other relevant metrics." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Reinforcement Learning Strategy:\n", "Cumulative Returns: 0.10675032975910395\n", "Maximum Drawdown: -0.08461035028173602\n", "\n", "Buy-and-Hold Strategy:\n", "Cumulative Returns: 0.09097306322775656\n", "Maximum Drawdown: -0.09308351791831726\n" ] } ], "source": [ "# Visualize the trading decisions and portfolio values over time\n", "plt.figure(figsize=(12, 8))\n", "plt.subplot(2, 1, 1)\n", "plt.plot(rl_backtest_results.index, rl_backtest_results['Portfolio_Value'], label='Reinforcement Learning')\n", "plt.plot(bh_backtest_results.index, bh_backtest_results['Portfolio_Value'], label='Buy-and-Hold')\n", "plt.title('Portfolio Value Over Time')\n", "plt.xlabel('Time')\n", "plt.ylabel('Portfolio Value')\n", "plt.legend()\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.plot(rl_backtest_results.index, rl_backtest_results['Action'].map({'buy': 1, 'hold': 0, 'sell': -1}), marker='o', linestyle='None', label='Reinforcement Learning')\n", "plt.title('Trading Decisions Over Time')\n", "plt.xlabel('Time')\n", "plt.ylabel('Action (1 = buy, 0 = hold, -1 = sell)')\n", "plt.legend()\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "# Cumulative returns measured from the initial capital, so they include the first test step\n", "rl_cumulative_returns = rl_backtest_results['Portfolio_Value'].iloc[-1] / initial_portfolio_value - 1\n", "bh_cumulative_returns = bh_backtest_results['Portfolio_Value'].iloc[-1] / initial_portfolio_value - 1\n", "\n", "# Maximum drawdown: largest peak-to-trough decline, as a negative fraction\n", "rl_max_drawdown = (rl_backtest_results['Portfolio_Value'] / rl_backtest_results['Portfolio_Value'].cummax() - 1).min()\n", "bh_max_drawdown = (bh_backtest_results['Portfolio_Value'] / bh_backtest_results['Portfolio_Value'].cummax() - 1).min()\n", "\n", "# Print additional performance metrics\n", "print(\"Reinforcement Learning Strategy:\")\n", "print(\"Cumulative 
Returns:\", rl_cumulative_returns)\n", "print(\"Maximum Drawdown:\", rl_max_drawdown)\n", "\n", "print(\"\\nBuy-and-Hold Strategy:\")\n", "print(\"Cumulative Returns:\", bh_cumulative_returns)\n", "print(\"Maximum Drawdown:\", bh_max_drawdown)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotting the portfolio values and trading decisions over time shows how the reinforcement learning strategy behaves compared to the buy-and-hold strategy. The upper panel compares the two portfolio values, and the lower panel marks the agent's action at each step (1 = buy, 0 = hold, -1 = sell), showing when trades occurred and how they affected the portfolio value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4.2\n", "Provide insights into the strengths and weaknesses of the developed trading strategy.\n", "\n", "\n", "Strengths of the Reinforcement Learning Strategy:\n", "- Higher Cumulative Returns: In the recorded backtest, the reinforcement learning strategy generated higher cumulative returns than the buy-and-hold strategy, indicating that it was more profitable overall during the backtesting period.\n", "- Adaptability: Reinforcement learning algorithms can, in principle, learn and adapt to changing market conditions.\n", "\n", "Weaknesses of the Reinforcement Learning Strategy:\n", "- Maximum Drawdown: Although the reinforcement learning strategy's maximum drawdown (about 8.5%) is slightly smaller than that of buy-and-hold (about 9.3%), it is still substantial, which points to the need for effective risk management techniques.\n", "- Complexity and Computational Requirements: Reinforcement learning algorithms can be complex and computationally intensive.\n", "\n", "\n", "Strengths of the Buy-and-Hold Strategy:\n", "- Simplicity: The buy-and-hold strategy remains straightforward and easy to implement. 
It does not require active trading decisions or complex algorithms.\n", "- Comparable Maximum Drawdown: Buy-and-hold experienced a slightly larger maximum drawdown than the reinforcement learning strategy, but the difference is small, and both strategies faced significant drawdowns.\n", "\n", "Weaknesses of the Buy-and-Hold Strategy:\n", "- Lower Cumulative Returns: The buy-and-hold strategy generated lower cumulative returns, suggesting that passively holding the assets was less profitable during the backtesting period.\n", "- Lack of Adaptability: The buy-and-hold strategy does not actively adapt to changing market conditions. It relies on the long-term growth of the asset and may miss short-term trading opportunities." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "dev2", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 2 }