{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "cf396dee", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n" ] }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 1;\n", " var nbb_unformatted_code = \"%load_ext nb_black\\n\\nimport warnings\\n\\nwarnings.simplefilter(action=\\\"ignore\\\", category=FutureWarning)\\n\\nimport pymc3 as pm\\nimport pandas as pd\\nimport numpy as np\\nimport seaborn as sns\\nimport arviz as az\\nimport theano\\nimport theano.tensor as tt\\nimport matplotlib.pyplot as plt\\nimport statsmodels.stats.api as sms\\nimport altair as alt\";\n", " var nbb_formatted_code = \"%load_ext nb_black\\n\\nimport warnings\\n\\nwarnings.simplefilter(action=\\\"ignore\\\", category=FutureWarning)\\n\\nimport pymc3 as pm\\nimport pandas as pd\\nimport numpy as np\\nimport seaborn as sns\\nimport arviz as az\\nimport theano\\nimport theano.tensor as tt\\nimport matplotlib.pyplot as plt\\nimport statsmodels.stats.api as sms\\nimport altair as alt\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%load_ext nb_black\n", "\n", "import warnings\n", "\n", "warnings.simplefilter(action=\"ignore\", category=FutureWarning)\n", "\n", "import pymc3 as pm\n", "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import arviz as az\n", "import theano\n", "import theano.tensor as tt\n", "import matplotlib.pyplot as plt\n", "import statsmodels.stats.api as sms\n", "import altair as alt" ] 
}, { "cell_type": "code", "execution_count": 2, "id": "c0a187df", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Week_nr</th>\n", "      <th>Total</th>\n", "      <th>Total_scaled</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>1</th><td>1</td><td>158016</td><td>15.8016</td></tr>\n", "    <tr><th>2</th><td>2</td><td>218733</td><td>21.8733</td></tr>\n", "    <tr><th>3</th><td>3</td><td>268026</td><td>26.8026</td></tr>\n", "    <tr><th>4</th><td>4</td><td>311895</td><td>31.1895</td></tr>\n", "    <tr><th>5</th><td>5</td><td>358098</td><td>35.8098</td></tr>\n", "    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n", "    <tr><th>130</th><td>130</td><td>2308213</td><td>230.8213</td></tr>\n", "    <tr><th>131</th><td>131</td><td>2332833</td><td>233.2833</td></tr>\n", "    <tr><th>132</th><td>132</td><td>2346868</td><td>234.6868</td></tr>\n", "    <tr><th>133</th><td>133</td><td>2359112</td><td>235.9112</td></tr>\n", "    <tr><th>134</th><td>134</td><td>2368328</td><td>236.8328</td></tr>\n", "  </tbody>\n", "</table>\n", "<p>134 rows × 3 columns</p>\n", "</div>", "
" ], "text/plain": [ " Week_nr Total Total_scaled\n", "1 1 158016 15.8016\n", "2 2 218733 21.8733\n", "3 3 268026 26.8026\n", "4 4 311895 31.1895\n", "5 5 358098 35.8098\n", ".. ... ... ...\n", "130 130 2308213 230.8213\n", "131 131 2332833 233.2833\n", "132 132 2346868 234.6868\n", "133 133 2359112 235.9112\n", "134 134 2368328 236.8328\n", "\n", "[134 rows x 3 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 2;\n", " var nbb_unformatted_code = \"data = pd.read_csv(\\\"./data/archon_arcana_weekly_20210619.csv\\\", thousands=\\\",\\\")\\ndata[\\\"Week_nr\\\"] = data.index\\ndata = data[data.Week_nr > 0] # ignore first line which is day 3, start with week 1\\nmodel_data = data[[\\\"Week_nr\\\", \\\"Total\\\"]].copy()\\nmodel_data[\\\"Total_scaled\\\"] = (\\n model_data[\\\"Total\\\"] / 10000\\n) # PyMC3 doesn't seem to like very large numbers, so lets bring them down a few orders of magnitude\\nmodel_data\";\n", " var nbb_formatted_code = \"data = pd.read_csv(\\\"./data/archon_arcana_weekly_20210619.csv\\\", thousands=\\\",\\\")\\ndata[\\\"Week_nr\\\"] = data.index\\ndata = data[data.Week_nr > 0] # ignore first line which is day 3, start with week 1\\nmodel_data = data[[\\\"Week_nr\\\", \\\"Total\\\"]].copy()\\nmodel_data[\\\"Total_scaled\\\"] = (\\n model_data[\\\"Total\\\"] / 10000\\n) # PyMC3 doesn't seem to like very large numbers, so lets bring them down a few orders of magnitude\\nmodel_data\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data = 
pd.read_csv(\"./data/archon_arcana_weekly_20210619.csv\", thousands=\",\")\n", "data[\"Week_nr\"] = data.index\n", "data = data[data.Week_nr > 0]  # ignore first line which is day 3, start with week 1\n", "model_data = data[[\"Week_nr\", \"Total\"]].copy()\n", "model_data[\"Total_scaled\"] = (\n", " model_data[\"Total\"] / 10000\n", ")  # PyMC3 doesn't seem to like very large numbers, so let's bring them down a few orders of magnitude\n", "model_data" ] }, { "cell_type": "markdown", "id": "105ba65d", "metadata": {}, "source": [ "## What if COVID-19 hadn't happened: how many decks would have been sold?\n", "\n", "We'll take model 5 from the previous notebook and add a twist: rather than including the start of COVID-19 in the model, we'll make it a fixed value using ```pm.Data```. This will allow us to change the value later on (to something in the future), so we can create a model that behaves as if the pandemic never happened." ] }, { "cell_type": "code", "execution_count": 4, "id": "cd1b2f19", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Auto-assigning NUTS sampler...\n", "Initializing NUTS using adapt_diag...\n", "Multiprocess sampling (4 chains in 4 jobs)\n", "NUTS: [dt_interest, mm_interest, wc_interest, aoa_interest, cota_interest, decay_factor, weekly_registrations_precovid, weekly_registrations_covid, sigma]\n" ] }, { "data": { "text/html": [ "\n", "
\n", "  100.00% [8000/8000 18:32<00:00 Sampling 4 chains, 420 divergences]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1135 seconds.\n", "There were 11 divergences after tuning. Increase `target_accept` or reparameterize.\n", "There were 21 divergences after tuning. Increase `target_accept` or reparameterize.\n", "There were 385 divergences after tuning. Increase `target_accept` or reparameterize.\n", "The acceptance probability does not match the target. It is 0.6861368264329725, but should be close to 0.8. Try to increase the number of tuning steps.\n", "There were 3 divergences after tuning. Increase `target_accept` or reparameterize.\n", "The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.\n", "The estimated number of effective samples is smaller than 200 for some parameters.\n" ] }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 4;\n", " var nbb_unformatted_code = \"len_observed = len(model_data)\\nwith pm.Model() as model_5:\\n # priors\\n sigma = pm.Exponential(\\\"sigma\\\", lam=1.0) # Sigma for likelihood function\\n\\n # here we set the start of covid manually (based on previous notebook)\\n # this way it can be changed after fitting the model\\n covid_start = pm.Data(\\\"covid_start\\\", 68)\\n\\n # We know from the previous that mu before and during covid should be around 0.8 and 3.0 respectively\\n # Sigma is reduced here not to diverge to far from these values\\n BoundNormal_0 = pm.Bound(pm.Normal, lower=0)\\n weekly_registrations_covid = BoundNormal_0(\\n \\\"weekly_registrations_covid\\\", mu=0.8, sigma=0.5\\n )\\n weekly_registrations_precovid = BoundNormal_0(\\n \\\"weekly_registrations_precovid\\\", mu=3, sigma=0.5\\n )\\n\\n weekly_registrations_base = pm.math.switch(\\n covid_start >= model_data.Week_nr,\\n 
weekly_registrations_precovid,\\n weekly_registrations_covid,\\n )\\n\\n # Model extra registrations due to shifting interest (like new sets being released)\\n # The interest factor is calculated on a weekly basis\\n decay_factor = pm.Exponential(\\\"decay_factor\\\", lam=1.0)\\n\\n cota_interest = pm.HalfNormal(\\\"cota_interest\\\", sigma=2)\\n aoa_interest = pm.HalfNormal(\\\"aoa_interest\\\", sigma=2)\\n wc_interest = pm.HalfNormal(\\\"wc_interest\\\", sigma=2)\\n mm_interest = pm.HalfNormal(\\\"mm_interest\\\", sigma=2)\\n dt_interest = pm.HalfNormal(\\\"dt_interest\\\", sigma=2)\\n\\n cota_surplus = pm.Deterministic(\\n \\\"cota_surplus\\\", cota_interest * weekly_registrations_base[0]\\n )\\n aoa_surplus = pm.Deterministic(\\n \\\"aoa_surplus\\\", aoa_interest * weekly_registrations_base[27]\\n )\\n wc_surplus = pm.Deterministic(\\n \\\"wc_surplus\\\", wc_interest * weekly_registrations_base[50]\\n )\\n mm_surplus = pm.Deterministic(\\n \\\"mm_surplus\\\", mm_interest * weekly_registrations_base[85]\\n )\\n dt_surplus = pm.Deterministic(\\n \\\"dt_surplus\\\", dt_interest * weekly_registrations_base[126]\\n )\\n\\n interest_decayed = [cota_interest]\\n\\n for i in range(len_observed - 1):\\n new_element = interest_decayed[i] * decay_factor\\n if i == 27:\\n new_element += aoa_interest\\n if i == 50:\\n new_element += wc_interest\\n if i == 85:\\n new_element += mm_interest\\n if i == 126:\\n new_element += dt_interest\\n interest_decayed.append(new_element)\\n\\n # there were 150k decks registered the first week, that is the initial value\\n y0 = tt.zeros(len_observed)\\n y0 = tt.set_subtensor(y0[0], 15)\\n\\n outputs, _ = theano.scan(\\n fn=lambda t, y, ws, intfac: tt.set_subtensor(\\n y[t], (ws[t] * (1 + intfac[t])) + y[t - 1]\\n ),\\n sequences=[tt.arange(1, len_observed)],\\n outputs_info=y0,\\n non_sequences=[weekly_registrations_base, interest_decayed],\\n n_steps=len_observed - 1,\\n )\\n\\n total_registrations = 
pm.Deterministic(\\\"total_registrations\\\", outputs[-1])\\n\\n # Likelihood\\n likelihood = pm.Normal(\\n \\\"y\\\", mu=total_registrations, sigma=sigma, observed=model_data.Total_scaled\\n )\\n\\n # posterior\\n trace_5 = pm.sample(1000, cores=4, chains=4, init=\\\"adapt_diag\\\")\";\n", " var nbb_formatted_code = \"len_observed = len(model_data)\\nwith pm.Model() as model_5:\\n # priors\\n sigma = pm.Exponential(\\\"sigma\\\", lam=1.0) # Sigma for likelihood function\\n\\n # here we set the start of covid manually (based on previous notebook)\\n # this way it can be changed after fitting the model\\n covid_start = pm.Data(\\\"covid_start\\\", 68)\\n\\n # We know from the previous that mu before and during covid should be around 0.8 and 3.0 respectively\\n # Sigma is reduced here not to diverge to far from these values\\n BoundNormal_0 = pm.Bound(pm.Normal, lower=0)\\n weekly_registrations_covid = BoundNormal_0(\\n \\\"weekly_registrations_covid\\\", mu=0.8, sigma=0.5\\n )\\n weekly_registrations_precovid = BoundNormal_0(\\n \\\"weekly_registrations_precovid\\\", mu=3, sigma=0.5\\n )\\n\\n weekly_registrations_base = pm.math.switch(\\n covid_start >= model_data.Week_nr,\\n weekly_registrations_precovid,\\n weekly_registrations_covid,\\n )\\n\\n # Model extra registrations due to shifting interest (like new sets being released)\\n # The interest factor is calculated on a weekly basis\\n decay_factor = pm.Exponential(\\\"decay_factor\\\", lam=1.0)\\n\\n cota_interest = pm.HalfNormal(\\\"cota_interest\\\", sigma=2)\\n aoa_interest = pm.HalfNormal(\\\"aoa_interest\\\", sigma=2)\\n wc_interest = pm.HalfNormal(\\\"wc_interest\\\", sigma=2)\\n mm_interest = pm.HalfNormal(\\\"mm_interest\\\", sigma=2)\\n dt_interest = pm.HalfNormal(\\\"dt_interest\\\", sigma=2)\\n\\n cota_surplus = pm.Deterministic(\\n \\\"cota_surplus\\\", cota_interest * weekly_registrations_base[0]\\n )\\n aoa_surplus = pm.Deterministic(\\n \\\"aoa_surplus\\\", aoa_interest * 
weekly_registrations_base[27]\\n )\\n wc_surplus = pm.Deterministic(\\n \\\"wc_surplus\\\", wc_interest * weekly_registrations_base[50]\\n )\\n mm_surplus = pm.Deterministic(\\n \\\"mm_surplus\\\", mm_interest * weekly_registrations_base[85]\\n )\\n dt_surplus = pm.Deterministic(\\n \\\"dt_surplus\\\", dt_interest * weekly_registrations_base[126]\\n )\\n\\n interest_decayed = [cota_interest]\\n\\n for i in range(len_observed - 1):\\n new_element = interest_decayed[i] * decay_factor\\n if i == 27:\\n new_element += aoa_interest\\n if i == 50:\\n new_element += wc_interest\\n if i == 85:\\n new_element += mm_interest\\n if i == 126:\\n new_element += dt_interest\\n interest_decayed.append(new_element)\\n\\n # there were 150k decks registered the first week, that is the initial value\\n y0 = tt.zeros(len_observed)\\n y0 = tt.set_subtensor(y0[0], 15)\\n\\n outputs, _ = theano.scan(\\n fn=lambda t, y, ws, intfac: tt.set_subtensor(\\n y[t], (ws[t] * (1 + intfac[t])) + y[t - 1]\\n ),\\n sequences=[tt.arange(1, len_observed)],\\n outputs_info=y0,\\n non_sequences=[weekly_registrations_base, interest_decayed],\\n n_steps=len_observed - 1,\\n )\\n\\n total_registrations = pm.Deterministic(\\\"total_registrations\\\", outputs[-1])\\n\\n # Likelihood\\n likelihood = pm.Normal(\\n \\\"y\\\", mu=total_registrations, sigma=sigma, observed=model_data.Total_scaled\\n )\\n\\n # posterior\\n trace_5 = pm.sample(1000, cores=4, chains=4, init=\\\"adapt_diag\\\")\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "len_observed = len(model_data)\n", "with pm.Model() as model_5:\n", " # priors\n", " sigma = 
pm.Exponential(\"sigma\", lam=1.0)  # Sigma for likelihood function\n", "\n", " # here we set the start of covid manually (based on previous notebook)\n", " # this way it can be changed after fitting the model\n", " covid_start = pm.Data(\"covid_start\", 68)\n", "\n", " # We know from the previous notebook that mu before and during covid should be around 3.0 and 0.8 respectively\n", " # Sigma is kept small here so the estimates don't diverge too far from these values\n", " BoundNormal_0 = pm.Bound(pm.Normal, lower=0)\n", " weekly_registrations_covid = BoundNormal_0(\n", " \"weekly_registrations_covid\", mu=0.8, sigma=0.5\n", " )\n", " weekly_registrations_precovid = BoundNormal_0(\n", " \"weekly_registrations_precovid\", mu=3, sigma=0.5\n", " )\n", "\n", " weekly_registrations_base = pm.math.switch(\n", " covid_start >= model_data.Week_nr,\n", " weekly_registrations_precovid,\n", " weekly_registrations_covid,\n", " )\n", "\n", " # Model extra registrations due to shifting interest (like new sets being released)\n", " # The interest factor is calculated on a weekly basis\n", " decay_factor = pm.Exponential(\"decay_factor\", lam=1.0)\n", "\n", " cota_interest = pm.HalfNormal(\"cota_interest\", sigma=2)\n", " aoa_interest = pm.HalfNormal(\"aoa_interest\", sigma=2)\n", " wc_interest = pm.HalfNormal(\"wc_interest\", sigma=2)\n", " mm_interest = pm.HalfNormal(\"mm_interest\", sigma=2)\n", " dt_interest = pm.HalfNormal(\"dt_interest\", sigma=2)\n", "\n", " cota_surplus = pm.Deterministic(\n", " \"cota_surplus\", cota_interest * weekly_registrations_base[0]\n", " )\n", " aoa_surplus = pm.Deterministic(\n", " \"aoa_surplus\", aoa_interest * weekly_registrations_base[27]\n", " )\n", " wc_surplus = pm.Deterministic(\n", " \"wc_surplus\", wc_interest * weekly_registrations_base[50]\n", " )\n", " mm_surplus = pm.Deterministic(\n", " \"mm_surplus\", mm_interest * weekly_registrations_base[85]\n", " )\n", " dt_surplus = pm.Deterministic(\n", " \"dt_surplus\", dt_interest * weekly_registrations_base[126]\n", " 
)\n", "\n", " interest_decayed = [cota_interest]\n", "\n", " for i in range(len_observed - 1):\n", " new_element = interest_decayed[i] * decay_factor\n", " if i == 27:\n", " new_element += aoa_interest\n", " if i == 50:\n", " new_element += wc_interest\n", " if i == 85:\n", " new_element += mm_interest\n", " if i == 126:\n", " new_element += dt_interest\n", " interest_decayed.append(new_element)\n", "\n", " # there were 150k decks registered the first week, that is the initial value\n", " y0 = tt.zeros(len_observed)\n", " y0 = tt.set_subtensor(y0[0], 15)\n", "\n", " outputs, _ = theano.scan(\n", " fn=lambda t, y, ws, intfac: tt.set_subtensor(\n", " y[t], (ws[t] * (1 + intfac[t])) + y[t - 1]\n", " ),\n", " sequences=[tt.arange(1, len_observed)],\n", " outputs_info=y0,\n", " non_sequences=[weekly_registrations_base, interest_decayed],\n", " n_steps=len_observed - 1,\n", " )\n", "\n", " total_registrations = pm.Deterministic(\"total_registrations\", outputs[-1])\n", "\n", " # Likelihood\n", " likelihood = pm.Normal(\n", " \"y\", mu=total_registrations, sigma=sigma, observed=model_data.Total_scaled\n", " )\n", "\n", " # posterior\n", " trace_5 = pm.sample(1000, cores=4, chains=4, init=\"adapt_diag\")" ] }, { "cell_type": "code", "execution_count": 52, "id": "e25a70b8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "output __str__ = [ 15. 
17.10610287 18.87279496 20.40422542 21.77258495\n", " 23.02791231 24.20489176 25.32756456 26.41259487 27.47153339\n", " 28.51238646 29.54070365 30.56033163 31.57393671 32.58336703\n", " 33.58990363 34.59443446 35.59757499 36.59975184 37.60126071\n", " 38.60230659 39.60303153 40.60353403 41.60388233 42.60412375\n", " 43.60429109 44.60440709 45.60448749 48.20031234 50.30645384\n", " 52.0731727 53.60462172 54.97299411 56.2283304 57.40531602\n", " 58.52799311 59.61302639 60.67196697 61.71282146 62.74113964\n", " 63.7607683 64.77437386 65.78380451 66.79034134 67.79487232\n", " 68.79801296 69.80018989 70.80169882 71.80274473 72.8034697\n", " 73.80397221 76.40008965 78.50643395 80.27329339 81.80483984\n", " 83.17327977 84.42866287 85.60568095 86.72838053 87.8134294\n", " 88.87238078 89.91324277 90.94156614 91.9611984 92.97480645\n", " 93.98423883 94.99077686 95.99530867 96.99844989 98.00062721\n", " 99.00213642 100.00318252 101.00390762 102.00441023 103.0047586\n", " 104.00500008 105.00516746 106.00528348 107.0053639 108.00541964\n", " 109.00545828 110.00548506 111.00550362 112.00551649 113.00552541\n", " 114.00553159 116.60130499 118.70741083 120.47410498 122.00553687\n", " 123.37389739 124.62922544 125.80620536 126.92887849 128.01390903\n", " 129.07284771 130.11370089 131.14201815 132.16164618 133.1752513\n", " 134.18468165 135.19121826 136.1957491 137.19888964 138.2010665\n", " 139.20257538 140.20362125 141.2043462 142.2048487 143.205197\n", " 144.20543842 145.20560577 146.20572176 147.20580216 148.20585789\n", " 149.20589652 150.20592329 151.20594185 152.20595472 153.20596363\n", " 154.20596981 155.2059741 156.20597707 157.20597913 158.20598055\n", " 159.20598154 160.20598223 162.80175182 164.90785502 166.67454734\n", " 168.20597796 169.5743376 170.82966504 172.00664454]\n" ] }, { "data": { "text/plain": [ "Print{message='output', attrs=('__str__',), global_fn=}.0" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { 
"application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 52;\n", " var nbb_unformatted_code = \"theano.printing.Print(\\\"output\\\")(outputs[-1])\";\n", " var nbb_formatted_code = \"theano.printing.Print(\\\"output\\\")(outputs[-1])\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "theano.printing.Print(\"output\")(outputs[-1])" ] }, { "cell_type": "code", "execution_count": 5, "id": "94cf34c7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "  100.00% [4000/4000 01:54<00:00]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 5;\n", " var nbb_unformatted_code = \"from pymc3_altair import plot_fit_altair\\n\\nwith model_5:\\n pm.set_data({\\\"covid_start\\\": 68})\\n\\nchart_with_covid = plot_fit_altair(model_5, trace_5, model_data)\\nchart_with_covid\";\n", " var nbb_formatted_code = \"from pymc3_altair import plot_fit_altair\\n\\nwith model_5:\\n pm.set_data({\\\"covid_start\\\": 68})\\n\\nchart_with_covid = plot_fit_altair(model_5, trace_5, model_data)\\nchart_with_covid\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pymc3_altair import plot_fit_altair\n", "\n", "with model_5:\n", " pm.set_data({\"covid_start\": 68})\n", "\n", "chart_with_covid = plot_fit_altair(model_5, trace_5, model_data)\n", "chart_with_covid" ] }, { "cell_type": "code", "execution_count": 6, "id": "e42e2524", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "  100.00% [4000/4000 01:51<00:00]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 6;\n", " var nbb_unformatted_code = \"with model_5:\\n pm.set_data({\\\"covid_start\\\": 500})\\n\\nchart_without_covid = plot_fit_altair(model_5, trace_5, model_data)\\nchart_without_covid\";\n", " var nbb_formatted_code = \"with model_5:\\n pm.set_data({\\\"covid_start\\\": 500})\\n\\nchart_without_covid = plot_fit_altair(model_5, trace_5, model_data)\\nchart_without_covid\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with model_5:\n", " pm.set_data({\"covid_start\": 500})\n", "\n", "chart_without_covid = plot_fit_altair(model_5, trace_5, model_data)\n", "chart_without_covid" ] }, { "cell_type": "code", "execution_count": 7, "id": "987db584", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.HConcatChart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "application/javascript": [ "\n", " setTimeout(function() {\n", " var nbb_cell_id = 7;\n", " var nbb_unformatted_code = \"combined = chart_with_covid.properties(\\n title=\\\"Final model with COVID-19\\\"\\n) | chart_without_covid.properties(title=\\\"Final model without COVID-19\\\")\\ncombined.save(\\\"./altair_output/covid_start_data.json\\\")\\ncombined\";\n", " var nbb_formatted_code = \"combined = chart_with_covid.properties(\\n title=\\\"Final model with COVID-19\\\"\\n) | chart_without_covid.properties(title=\\\"Final model without COVID-19\\\")\\ncombined.save(\\\"./altair_output/covid_start_data.json\\\")\\ncombined\";\n", " var nbb_cells = Jupyter.notebook.get_cells();\n", " for (var i = 0; i < nbb_cells.length; ++i) {\n", " if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n", " if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n", " nbb_cells[i].set_text(nbb_formatted_code);\n", " }\n", " break;\n", " }\n", " }\n", " }, 500);\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "combined = chart_with_covid.properties(\n", " title=\"Final model with COVID-19\"\n", ") | chart_without_covid.properties(title=\"Final model without COVID-19\")\n", "combined.save(\"./altair_output/covid_start_data.json\")\n", "combined" ] }, { "cell_type": "markdown", "id": "68bac03b", "metadata": {}, "source": [ "Setting the ```pm.Data``` value is by far the easiest way to change a model, but it doesn't offer much flexibility. In this case it is very likely that releasing a new set during COVID-19 had a very different effect on sales than it would have had under normal circumstances, so the impact of the last two sets might be exaggerated. That makes this an optimistic model; it would also be nice to model what would happen if Mass Mutation and Dark Tidings had performed like Worlds Collide (the set with the lowest impact on sales). This requires another approach (I think; let me know if there is an elegant solution to this).\n", "\n", "Another method is to create a new model, with priors based on the results from the previous one, excluding the COVID-19 factor. Rather than sampling it to get posteriors, we'll use ```sample_prior_predictive()``` to run the model with these parameters and see what comes out at the end. This allows for some additional flexibility the first approach didn't provide.\n", "\n", "We'll use two versions here: an optimistic one, where the interest in Mass Mutation (mm_interest) and Dark Tidings (dt_interest) from model_5 is used as-is. However, as these sets were released during COVID-19, this might not be an accurate representation of how they would have impacted sales during normal times. Therefore a pessimistic model, where MM and DT perform the same as the weakest previous set (Worlds Collide), is built as well." 
] }, { "cell_type": "code", "execution_count": null, "id": "3c35a833", "metadata": {}, "outputs": [], "source": [ "with pm.Model() as model_6:\n", " # priors\n", " sigma = pm.Normal(\"sigma\", mu=1.17, sigma=0.07) # Sigma for likelihood function\n", "\n", " weekly_registrations = pm.Normal(\"weekly_registrations\", mu=1.55, sigma=0.06)\n", "\n", " # Model extra registrations due to shifting interest (like new sets being released)\n", " # The interest factor is calculated on a weekly basis\n", " decay_factor = pm.Normal(\"decay_factor\", mu=0.84, sigma=0.01)\n", "\n", " cota_interest = pm.Normal(\"cota_interest\", mu=4.11, sigma=0.19)\n", " aoa_interest = pm.Normal(\"aoa_interest\", mu=1.7, sigma=0.16)\n", " wc_interest = pm.Normal(\"wc_interest\", mu=0.95, sigma=0.13)\n", " mm_interest = pm.Normal(\"mm_interest\", mu=3.18, sigma=0.31)\n", " dt_interest = pm.Normal(\"dt_interest\", mu=3.02, sigma=0.35)\n", "\n", " interest_decayed = [cota_interest]\n", "\n", " for i in range(len_observed - 1):\n", " new_element = interest_decayed[i] * decay_factor\n", " if i == 27:\n", " new_element += aoa_interest\n", " if i == 50:\n", " new_element += wc_interest\n", " if i == 85:\n", " new_element += mm_interest\n", " if i == 126:\n", " new_element += dt_interest\n", " interest_decayed.append(new_element)\n", "\n", " # there were 150k decks registered the first week, that is the initial value\n", " y0 = tt.zeros(len_observed)\n", " y0 = tt.set_subtensor(y0[0], 15)\n", "\n", " outputs, _ = theano.scan(\n", " fn=lambda t, y, intfac: tt.set_subtensor(\n", " y[t], (weekly_registrations * (1 + intfac[t])) + y[t - 1]\n", " ),\n", " sequences=[tt.arange(1, len_observed)],\n", " outputs_info=y0,\n", " non_sequences=[interest_decayed],\n", " n_steps=len_observed - 1,\n", " )\n", "\n", " total_registrations = pm.Deterministic(\"total_registrations\", outputs[-1])\n", "\n", " # Likelihood\n", " likelihood = pm.Normal(\n", " \"y\", mu=total_registrations, sigma=sigma, 
observed=model_data.Total_scaled\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "f3b9ee25", "metadata": {}, "outputs": [], "source": [ "def plot_prior_predictor(m, df):\n", " with m:\n", " prior_predictor = pm.sample_prior_predictive()\n", "\n", " fig, (ax1) = plt.subplots(1, 1, figsize=(7.5, 8))\n", " ax1.plot(prior_predictor[\"y\"].T, color=\"0.5\", alpha=0.1)\n", " ax1.plot(df.Total_scaled, color=\"red\", alpha=1)\n", " ax1.set(\n", " xlabel=\"Weeks (since release)\",\n", " ylabel=\"Registered Decks (x10 000)\",\n", " title=\"Model Fit\",\n", " )\n", "\n", " plt.show()\n", "\n", "\n", "plot_prior_predictor(model_6, model_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "ad31862c", "metadata": {}, "outputs": [], "source": [ "from pymc3_altair import plot_prior_predictor_altair\n", "\n", "\n", "chart_prediction_1 = plot_prior_predictor_altair(model_6, model_data)\n", "chart_prediction_1.properties(width=\"container\").save(\n", " \"./altair_output/prediction_1.json\"\n", ")\n", "chart_prediction_1" ] }, { "cell_type": "code", "execution_count": null, "id": "962abf93", "metadata": {}, "outputs": [], "source": [ "with pm.Model() as model_7:\n", " # priors\n", " sigma = pm.Normal(\"sigma\", mu=1.17, sigma=0.07) # Sigma for likelihood function\n", "\n", " weekly_registrations = pm.Normal(\"weekly_registrations\", mu=1.55, sigma=0.06)\n", "\n", " # Model extra registrations due to shifting interest (like new sets being released)\n", " # The interest factor is calculated on a weekly basis\n", " decay_factor = pm.Normal(\"decay_factor\", mu=0.84, sigma=0.01)\n", "\n", " cota_interest = pm.Normal(\"cota_interest\", mu=4.11, sigma=0.19)\n", " aoa_interest = pm.Normal(\"aoa_interest\", mu=1.7, sigma=0.16)\n", " wc_interest = pm.Normal(\"wc_interest\", mu=0.95, sigma=0.13)\n", " mm_interest = pm.Normal(\n", " \"mm_interest\", mu=0.95, sigma=0.13\n", " ) # Maybe people's buying behaviour responds differently to\n", " dt_interest = 
pm.Normal(\n", " \"dt_interest\", mu=0.95, sigma=0.13\n", " ) # the release of a new set, worst case scenario is setting these very low\n", "\n", " interest_decayed = [cota_interest]\n", "\n", " for i in range(len_observed - 1):\n", " new_element = interest_decayed[i] * decay_factor\n", " if i == 27:\n", " new_element += aoa_interest\n", " if i == 50:\n", " new_element += wc_interest\n", " if i == 85:\n", " new_element += mm_interest\n", " if i == 126:\n", " new_element += dt_interest\n", " interest_decayed.append(new_element)\n", "\n", " # there were 150k decks registered the first week, that is the initial value\n", " y0 = tt.zeros(len_observed)\n", " y0 = tt.set_subtensor(y0[0], 15)\n", "\n", " outputs, _ = theano.scan(\n", " fn=lambda t, y, intfac: tt.set_subtensor(\n", " y[t], (weekly_registrations * (1 + intfac[t])) + y[t - 1]\n", " ),\n", " sequences=[tt.arange(1, len_observed)],\n", " outputs_info=y0,\n", " non_sequences=[interest_decayed],\n", " n_steps=len_observed - 1,\n", " )\n", "\n", " total_registrations = pm.Deterministic(\"total_registrations\", outputs[-1])\n", "\n", " # Likelihood\n", " likelihood = pm.Normal(\n", " \"y\", mu=total_registrations, sigma=sigma, observed=model_data.Total_scaled\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "eedcad89", "metadata": {}, "outputs": [], "source": [ "def plot_prior_predictor_side_by_side(m, m2, df):\n", " with m:\n", " prior_predictor = pm.sample_prior_predictive()\n", " with m2:\n", " prior_predictor2 = pm.sample_prior_predictive()\n", "\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 8), sharey=True)\n", " ax1.plot(prior_predictor[\"y\"].T, color=\"0.5\", alpha=0.1)\n", " ax1.plot(df.Total_scaled, color=\"red\", alpha=1)\n", " ax1.set(\n", " xlabel=\"Weeks (since release)\",\n", " ylabel=\"Registered Decks (x10 000)\",\n", " title=\"No COVID-19 Model A (optimistic)\",\n", " )\n", "\n", " ax2.plot(prior_predictor2[\"y\"].T, color=\"0.5\", alpha=0.1)\n", " 
ax2.plot(df.Total_scaled, color=\"red\", alpha=1)\n", " ax2.set(\n", " xlabel=\"Weeks (since release)\",\n", " ylabel=\"Registered Decks (x10 000)\",\n", " title=\"No COVID-19 Model B (pessimistic)\",\n", " )\n", "\n", " plt.show()\n", "\n", "\n", "plot_prior_predictor_side_by_side(model_6, model_7, model_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "5736df13", "metadata": {}, "outputs": [], "source": [ "chart_prediction_2 = plot_prior_predictor_altair(model_7, model_data)\n", "chart_prediction_2.properties(width=\"container\").save(\n", " \"./altair_output/prediction_2.json\"\n", ")\n", "chart_prediction_2" ] }, { "cell_type": "code", "execution_count": null, "id": "b423f98f", "metadata": {}, "outputs": [], "source": [ "combined = chart_prediction_1.properties(\n", " title=\"No COVID-19 (Optimistic model)\"\n", ") | chart_prediction_2.properties(title=\"No COVID-19 (Pessimistic model)\")\n", "combined.save(\"./altair_output/combined_predictions.json\")\n", "combined" ] }, { "cell_type": "markdown", "id": "fd3a7d4b", "metadata": {}, "source": [ "So the optimistic model predicts that in a universe without COVID-19 there likely would have been over 3 million registered decks by now, possibly as many as 3.7 million. This would mean roughly 0.5 to 1.5 million more decks registered over the last 18 or so months.\n", "\n", "The pessimistic model is more modest, though the total number of registrations would still have come very close to 3 million, with roughly 0.5 to 1 million additional decks sold since the first lockdowns due to COVID-19.\n", "\n", "However, these models run from the very first week. That isn't necessary, as we know exactly how many decks were sold prior to the lockdown, so we don't have to include all the uncertainty before that point when forecasting sales in a COVID-19-free world." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }