{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sampling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this first exercise, we will investigate how to evaluate the Q-value of each action available in a 5-armed bandit. Its main purpose is to give you an intuition for the limits of sampling and for the central limit theorem.\n", "\n", "Let's start by importing numpy and matplotlib:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sampling an n-armed bandit\n", "\n", "Let's now create the n-armed bandit. The only thing we need to do is to randomly choose 5 true Q-values $Q^*(a)$.\n", "\n", "![](../img/bandit-example.png)\n", "\n", "To be generic, let's define nb_actions=5 and create an array corresponding to the index of each action (0, 1, 2, 3, 4) for plotting purposes." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "nb_actions = 5\n", "actions = np.arange(nb_actions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Create a numpy array Q_star with nb_actions values, normally distributed with a mean of 0 and standard deviation of 1 (as in the lecture)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng()\n", "Q_star = rng.normal(0, 1, nb_actions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Plot the Q-values. Identify the optimal action $a^*$.\n", "\n", "*Tip:* you could plot the array Q_star with plt.plot, but that would be ugly. Check the documentation of the plt.bar method." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal action: 0\n" ] }, { "data": { "image/png": 