{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial for analyzing instrumental learning data with the HDDMrl module\n",
"This is a tutorial for using the HDDMrl module to simultaneously estimate reinforcement learning parameters and decision parameters within a fully hierarchical bayesian estimation framework, including steps for sampling, assessing convergence, model fit, parameter recovery, and posterior predictive checks (model validation). The module uses the reinforcement learning drift diffusion model (RLDDM), a reinforcement learning model that replaces the standard “softmax” choice function with a drift diffusion process. The softmax and drift diffusion process is equivalent for capturing choice proportions, but the DDM also takes RT distributions into account; options are provided to also only fit RL parameters without RT. The RLDDM estimates trial-by-trial drift rate as a scaled difference in expected rewards (expected reward for upper bound alternative minus expected reward for lower bound alternative). Expected rewards are updated with a delta learning rule using either a single learning rate or with separate learning rates for positive and negative prediction errors. The model also includes the standard DDM-parameters. The RLDDM is described in detail in [Pedersen, Frank & Biele (2017).](http://ski.clps.brown.edu/papers/PedersenEtAl_RLDDM.pdf) (Note this approach differs from Frank et al (2015) who used HDDM to fit instrumental learning but did not allow for simultaneous estimation of learning parameters). \n",
"\n",
"## OUTLINE \n",
"\n",
"[1. Background](#1.-Background)
\n",
"[2. Installing the module](#2.-Installing-the-module)
\n",
"[3. How the RLDDM works](#3.-How-the-RLDDM-works)
\n",
"[4. Structuring data](#4.-Structuring-data)
\n",
"[5. Running basic model](#5.-Running-basic-model)
\n",
"[6. Checking results](#6.-Checking-results)
\n",
"[7. Posterior predictive checks](#7.-Posterior-predictive-checks)
\n",
"[8. Parameter recovery](#8.-Parameter-recovery)
\n",
"[9. Separate learning rates for positive and negative prediction errors](#9.-Separate-learning-rates-for-positive-and-negative-prediction-errors)
\n",
"[10. depends_on vs. split_by](#10.-depends_on-vs.-split_by)
\n",
"[11. Probabilistic binary outcomes vs. normally distributed outcomes](#11.-Probabilistic-binary-outcomes-vs.-normally-distributed-outcomes)
\n",
"[12. HDDMrlRegressor](#12.-HDDMrlRegressor)
\n",
"[13. Regular RL without RT](#13.-Regular-RL-without-RT)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Background\n",
"Traditional RL models typically assume static decision processes (e.g. softmax), and the DDM typically assumes static decision variables (stimuli are modeled with the same drift rate across trials). The RLDDM combines dynamic decision variables from RL and dynamic choice process from DDM by assuming trial-by-trial drift rate that depends on the difference in expected rewards, which are updated on each trial by a rate of the prediction error dependent on the learning rate. The potential benefit of the RLDDM is thus to gain a better insight into decision processes in instrumental learning by also accounting for speed of decision making."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Installing the module\n",
"The new version of HDDM (version 0.7.1) that includes the RLDDM is currently not uploaded to conda. So to install you would either have to use pip:
\n",
"'pip install hddm'
\n",
"OR
\n",
"docker: 'pull madslupe/hddm', which runs hddm in jupyter notebook
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. How the RLDDM works\n",
"The main idea of the RLDDM is that reward-based choices can be captured by an accumulation-to-bound process where drift rate is proportional to the difference in expected reward between options, and where expected rewards subsequently are updated in a trial-by-trial basis via reinforcement learning.
\n",
"__drift rate on each trial depends on difference in expected rewards for the two alternatives (q_up and q_low):__
\n",
"drift rate = (q_up - q_low) * scaling
\n",
"_where the scaling parameter describes the weight to put on the difference in q-values._
\n",
"__expeceted reward (q) for chosen option is updated according to delta learning rule :__
\n",
"q_chosen = q_chosen + alpha * (feedback-q_chosen)
\n",
"_where alpha weights the rate of learning on each trial._
\n",
"So in principle all you need is the Wiener first passage time likelihood-function. The reason why HDDM is useful (and hence also HDDMrl) is that it automates the process of setting up and running your model, which tends to be very time consuming. So after structuring the data it is simple to run a model with HDDMrl. In particular it separates subjects and conditions (using the split_by-column, see next section) so that the updating process works correctly, which can be especially difficult to do for RL models. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Structuring data\n",
"The HDDMrl module was created to make it easier to model instrumental learning data with the RLDDM. If you are familiar with using HDDM it shouldn't be a big step to start using HDDMrl. Please refresh your memory by starting with the [tutorial for HDDM](http://ski.clps.brown.edu/hddm_docs/index.html) first (especially critical if you have not used HDDM at all). Running HDDMrl does require a few extra steps compared to HDDM, and because the model includes increased potential for parameter colinearity (typically learning rate and the scaling parameter on drift rate) it is even more important to assess model fit, which will be covered below. Here are the most important steps for structuring your dataframe:\n",
"1. Sort trials in ascending order within subject and condition, to ensure proper updating of expected rewards.\n",
"2. Include a column called __'split_by'__ which identifies the different task conditions (__as integers__), to ensure reward updating will work properly for each condition without mixing values learned from one trial type to another. \n",
"3. Code the response column with [stimulus-coding] (http://ski.clps.brown.edu/hddm_docs/howto.html#code-subject-responses). Although stimulus-coding and accuracy-coding often are the same in instrumental learning it is important that the upper and lower boundaries are represented by the same alternative within a condition, because expected rewards are linked to the thresholds/boundaries.\n",
"4. __feedback__-column. This should be the reward received for the chosen option on each trial.\n",
"5. __q_init__. Adjusting initial q-values is something that can improve model fit quite a bit. To allow the user to set their own initial values we therefore require that the dataframe includes a column called q_init. The function will set all initial q-values according to the first value in q_init. So this is not the most elegant method of allowing users to set inital value for expected rewards, but it works for now.\n",
"\n",
"#### Required columns in data:\n",
"* __rt__: in seconds, same as in HDDM\n",
"* __response__: 0 or 1. defines chosen stimulus, not accuracy.\n",
"* __split_by__: needs to be an integer. Split_by defines conditions (trial-types) that should have the same variables (e.g. Q values) within subject: the trials need to be split by condition to ensure proper updating of expected rewards that do not bleed over into other trial-types. (e.g. if you have stimulus A and get reward you want that updated value to impact choice only for the next stimulus A trial but not necessarily the immediate trial afterwards, which may be of a different condition) \n",
"* __subj_idx__: same as in HDDM, but even more important here because it is used to split trials\n",
"* __feedback__: feedback on the current trial. can be any value.\n",
"* __q_init__: used to initialize expected rewards. can be any value, but an unbiased initial value should be somewhere between the minimum and and maximum reward values (e.g. 0.5 for tasks with rewards of 0 and 1)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Running basic model\n",
"To illustrate how to run the model we will use example data from the learning phase of the probabilistic selection task (PST). During the learning phase of the PST subjects choose between two stimuli presented as Hiragana-letters (here represented as letters from the latin alphabet). There are three conditions with different probabilities of receiving reward (feedback=1) and non-reward (feedback=0). In the AB condition A is rewarded with 80% probability, B with 20%. In the CD condition C is rewarded with 70% probability and D with 30%, while in the EF condition E is rewarded with a 60% probability and F with 40%. The dataset is included in the data-folder in your installation of HDDM."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
" np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"tanh\n",
"tanh\n",
"tanh\n",
"linear\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/krishn/anaconda3/envs/hddm/lib/python3.6/site-packages/IPython/parallel.py:13: ShimWarning: The `IPython.parallel` package has been deprecated since IPython 4.0. You should import from ipyparallel instead.\n",
" \"You should import from ipyparallel instead.\", ShimWarning)\n"
]
}
],
"source": [
"# import\n",
"import pandas as pd\n",
"import numpy as np\n",
"import hddm\n",
"from scipy import stats\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import pymc\n",
"import kabuki\n",
"\n",
"sns.set(style=\"white\")\n",
"%matplotlib inline\n",
"from tqdm import tqdm\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=np.VisibleDeprecationWarning)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | subj_idx | \n", "response | \n", "cond | \n", "rt | \n", "trial | \n", "split_by | \n", "feedback | \n", "q_init | \n", "
---|---|---|---|---|---|---|---|---|
0 | \n", "42 | \n", "0.0 | \n", "CD | \n", "1.255 | \n", "1.0 | \n", "1 | \n", "0.0 | \n", "0.5 | \n", "
1 | \n", "42 | \n", "1.0 | \n", "EF | \n", "0.778 | \n", "1.0 | \n", "2 | \n", "0.0 | \n", "0.5 | \n", "
2 | \n", "42 | \n", "1.0 | \n", "AB | \n", "0.647 | \n", "1.0 | \n", "0 | \n", "1.0 | \n", "0.5 | \n", "
3 | \n", "42 | \n", "1.0 | \n", "AB | \n", "0.750 | \n", "2.0 | \n", "0 | \n", "1.0 | \n", "0.5 | \n", "
4 | \n", "42 | \n", "0.0 | \n", "EF | \n", "0.772 | \n", "2.0 | \n", "2 | \n", "1.0 | \n", "0.5 | \n", "