{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "Collapsed": "false" }, "outputs": [], "source": [ "# import some basic libraries\n", "import os\n", "import pandas as pd\n", "import seaborn as sns\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "# 2. Categorical Predictors\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "The syntax for handling categorical predictors is **different** between standard regression models/two-stage-models (i.e. `Lm` and `Lm2`) and multi-level models (`Lmer`) in `pymer4`. This is because formula parsing is passed to R for `Lmer` models, but handled by Python for other models. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "Lm and Lm2 Models\n", "-----------------\n", "`Lm` and `Lm2` models use [patsy](https://patsy.readthedocs.io/en/latest/>) to parse model formulae. Patsy is very powerful and has built-in support for handling categorical coding schemes by wrapping a predictor in then `C()` *within* the module formula. Patsy can also perform some pre-processing such as scaling and standardization using special functions like `center()`. Here are some examples.\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# import sample data and model\n", "from pymer4.utils import get_resource_path\n", "from pymer4.models import Lm\n", "\n", "# IV3 is a categorical predictors with 3 levels in the sample data\n", "df = pd.read_csv(os.path.join(get_resource_path(), 'sample_data.csv'))" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Dummy-coded/Treatment contrasts" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Formula: DV~C(IV3,levels=[1.0,0.5,1.5])\n", "\n", "Family: gaussian\t Estimator: OLS\n", "\n", "Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n", "\n", "Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n", "\n", "Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEDFT-statP-valSig
Intercept42.72138.33447.1082.23356119.1290.000***
C(IV3, levels=[1.0, 0.5, 1.5])[T.0.5]1.463-4.7417.6673.1585610.4630.643
C(IV3, levels=[1.0, 0.5, 1.5])[T.1.5]-3.419-9.6222.7853.158561-1.0820.280
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n", "Intercept 42.721 38.334 47.108 2.233 561 19.129 0.000 ***\n", "C(IV3, levels=[1.0, 0.5, 1.5])[T.0.5] 1.463 -4.741 7.667 3.158 561 0.463 0.643 \n", "C(IV3, levels=[1.0, 0.5, 1.5])[T.1.5] -3.419 -9.622 2.785 3.158 561 -1.082 0.280 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Estimate a model using Treatment contrasts (dummy-coding)\n", "# with '1.0' as the reference level\n", "# This is the default of the C() function \n", "model = Lm(\"DV ~ C(IV3, levels=[1.0, 0.5, 1.5])\", data=df)\n", "model.fit()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Orthogonal Polynomial Contrasts" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Formula: DV~C(IV3,Poly)\n", "\n", "Family: gaussian\t Estimator: OLS\n", "\n", "Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n", "\n", "Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n", "\n", "Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEDFT-statP-valSig
Intercept42.06939.53744.6021.28956132.6270.000***
C(IV3, Poly).Linear-3.452-7.8380.9352.233561-1.5460.123
C(IV3, Poly).Quadratic-0.798-5.1853.5882.233561-0.3570.721
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n", "Intercept 42.069 39.537 44.602 1.289 561 32.627 0.000 ***\n", "C(IV3, Poly).Linear -3.452 -7.838 0.935 2.233 561 -1.546 0.123 \n", "C(IV3, Poly).Quadratic -0.798 -5.185 3.588 2.233 561 -0.357 0.721 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Patsy can do this using the Poly argument to the \n", "# C() function\n", "model = Lm('DV ~ C(IV3, Poly)', data=df)\n", "model.fit()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Sum-to-zero contrasts" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Formula: DV~C(IV3,Sum)\n", "\n", "Family: gaussian\t Estimator: OLS\n", "\n", "Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n", "\n", "Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n", "\n", "Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEDFT-statP-valSig
Intercept42.06939.53744.6021.28956132.6270.000***
C(IV3, Sum)[S.0.5]2.115-1.4675.6971.8235611.1600.247
C(IV3, Sum)[S.1.0]0.652-2.9304.2341.8235610.3570.721
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n", "Intercept 42.069 39.537 44.602 1.289 561 32.627 0.000 ***\n", "C(IV3, Sum)[S.0.5] 2.115 -1.467 5.697 1.823 561 1.160 0.247 \n", "C(IV3, Sum)[S.1.0] 0.652 -2.930 4.234 1.823 561 0.357 0.721 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Similar to before but with the Sum argument\n", "model = Lm('DV ~ C(IV3, Sum)', data=df)\n", "model.fit()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Scaling/Centering" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Formula: DV~center(IV2)*C(IV3,Sum)\n", "\n", "Family: gaussian\t Estimator: OLS\n", "\n", "Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n", "\n", "Number of observations: 564\t R^2: 0.511\t R^2_adj: 0.507\n", "\n", "Log-likelihood: -2528.051 \t AIC: 5068.102\t BIC: 5094.113\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEDFT-statP-valSig
Intercept42.05140.26843.8330.90855846.3290.000***
C(IV3, Sum)[S.0.5]0.580-1.9423.1021.2845580.4520.652
C(IV3, Sum)[S.1.0]0.383-2.1362.9031.2825580.2990.765
center(IV2)0.7460.6850.8070.03155824.0120.000***
center(IV2):C(IV3, Sum)[S.0.5]0.050-0.0370.1370.0445581.1320.258
center(IV2):C(IV3, Sum)[S.1.0]-0.057-0.1440.0290.044558-1.3060.192
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n", "Intercept 42.051 40.268 43.833 0.908 558 46.329 0.000 ***\n", "C(IV3, Sum)[S.0.5] 0.580 -1.942 3.102 1.284 558 0.452 0.652 \n", "C(IV3, Sum)[S.1.0] 0.383 -2.136 2.903 1.282 558 0.299 0.765 \n", "center(IV2) 0.746 0.685 0.807 0.031 558 24.012 0.000 ***\n", "center(IV2):C(IV3, Sum)[S.0.5] 0.050 -0.037 0.137 0.044 558 1.132 0.258 \n", "center(IV2):C(IV3, Sum)[S.1.0] -0.057 -0.144 0.029 0.044 558 -1.306 0.192 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Moderation with IV2, but centering IV2 first\n", "model = Lm('DV ~ center(IV2) * C(IV3, Sum)', data=df)\n", "model.fit()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "Please refer to the [patsy documentation](https://patsy.readthedocs.io/en/latest/categorical-coding.html) for more details when working categorical predictors in `Lm` or `Lm2` models.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "## Lmer Models\n", "\n", "`Lmer()` models currently have support for handling categorical predictors in one of three ways based on how R's `factor()` works (see the note at the end of this tutorial):\n", "\n", "- Dummy-coded factor levels (treatment contrasts) in which each model term is the difference between a factor level and a selected reference level\n", "- Orthogonal polynomial contrasts in which each model term is a polynomial contrast across factor levels (e.g. linear, quadratic, cubic, etc)\n", "- Custom contrasts for each level of a factor, which should be provided in the manner expected by R.\n", "\n", "To make re-parameterizing models easier, factor codings are passed as a dictionary to the `factors` argument of a model's `.fit()`. This obviates the need for adjusting data-frame properties as in R. Note that this is **different** from `Lm` and `Lm2` models above which expect factor codings in their formulae (because patsy does). \n", "\n", "Each of these ways also enables you to easily compute post-hoc comparisons between factor levels, as well as interactions between continuous predictors and each factor level. See tutorial 3 for more on post-hoc tests.\n", "\n", "## \\*Compatibility Note\\*:\n", "Due to recent updates in `rpy2` and `pandas`, both of which are dependencies of `pymer4`, the contrast syntax shown below will raise errors and fail to run. But for the time being you can use `rpy2==3.1.0` and `pandas>0.23<1.0` on your own system to ensure compatiblity. Thanks for your patience while I implement the necessary fixes!\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from pymer4.models import Lmer\n", "# We're going to fit a multi-level logistic regression using the \n", "# dichotomous DV_l variable and the same categorical predictor (IV3)\n", "# as before\n", "model = Lmer('DV_l ~ IV3 + (IV3|Group)', data=df, family='binomial')" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Dummy-coding factors\n", "\n", "First we'll use dummy-coding/treatment contrasts with 1.0 as the reference level. This will compute two coefficients: 0.5 > 1.0 and 1.5 > 1.0. \n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "boundary (singular) fit: see ?isSingular \n", "\n", "Formula: DV_l~IV3+(IV3|Group)\n", "\n", "Family: binomial\t Inference: parametric\n", "\n", "Number of observations: 564\t Groups: {'Group': 47.0}\n", "\n", "Log-likelihood: -389.003 \t AIC: 796.006\n", "\n", "Random effects:\n", "\n", " Name Var Std\n", "Group (Intercept) 0.022 0.148\n", "Group IV30.5 0.060 0.246\n", "Group IV31.5 0.038 0.196\n", "\n", " IV1 IV2 Corr\n", "Group (Intercept) IV30.5 -1.0\n", "Group (Intercept) IV31.5 -1.0\n", "Group IV30.5 IV31.5 1.0\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEOROR_2.5_ciOR_97.5_ciProbProb_2.5_ciProb_97.5_ciZ-statP-valSig
(Intercept)-0.129-0.4190.1620.1480.8790.6581.1760.4680.3970.540-0.8670.386
IV30.50.129-0.2830.5400.2101.1370.7531.7160.5320.4300.6320.6120.541
IV31.5-0.128-0.5390.2830.2100.8800.5831.3270.4680.3680.570-0.6120.541
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n", "(Intercept) -0.129 -0.419 0.162 0.148 0.879 0.658 1.176 0.468 0.397 0.540 -0.867 0.386 \n", "IV30.5 0.129 -0.283 0.540 0.210 1.137 0.753 1.716 0.532 0.430 0.632 0.612 0.541 \n", "IV31.5 -0.128 -0.539 0.283 0.210 0.880 0.583 1.327 0.468 0.368 0.570 -0.612 0.541 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# See compatiblity note!\n", "model.fit(factors={\n", " 'IV3': ['1.0', '0.5', '1.5']\n", "})" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Polynomial contrast coding\n", "\n", "Second we'll use orthogonal polynomial contrasts. This is accomplished using the `ordered=True` argument and specifying the order of the *linear* contrast in increasing order. R will automatically compute higher order polynomial contrats that are orthogonal to this linear contrast. In this example, since there are 3 factor levels this will result in two polynomial terms: a linear contrast we specify below corresponding to 0.5 < 1.0 < 1.5 and an orthogonal quadratic contrast automatically determined by R, corresponding to 0.5 > 1 < 1.5\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "boundary (singular) fit: see ?isSingular \n", "\n", "boundary (singular) fit: see ?isSingular \n", "\n", "Formula: DV_l~IV3+(IV3|Group)\n", "\n", "Family: binomial\t Inference: parametric\n", "\n", "Number of observations: 564\t Groups: {'Group': 47.0}\n", "\n", "Log-likelihood: -389.003 \t AIC: 796.006\n", "\n", "Random effects:\n", "\n", " Name Var Std\n", "Group (Intercept) 0.000 0.000\n", "Group IV3.L 0.001 0.035\n", "Group IV3.Q 0.032 0.180\n", "\n", " IV1 IV2 Corr\n", "Group (Intercept) IV3.L NaN\n", "Group (Intercept) IV3.Q NaN\n", "Group IV3.L IV3.Q -1.0\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEOROR_2.5_ciOR_97.5_ciProbProb_2.5_ciProb_97.5_ciZ-statP-valSig
(Intercept)-0.128-0.2940.0370.0850.8790.7451.0380.4680.4270.509-1.5180.129
IV3.L-0.182-0.4690.1060.1470.8340.6261.1120.4550.3850.526-1.2380.216
IV3.Q0.000-0.2920.2920.1491.0000.7471.3390.5000.4280.5720.0011.000
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n", "(Intercept) -0.128 -0.294 0.037 0.085 0.879 0.745 1.038 0.468 0.427 0.509 -1.518 0.129 \n", "IV3.L -0.182 -0.469 0.106 0.147 0.834 0.626 1.112 0.455 0.385 0.526 -1.238 0.216 \n", "IV3.Q 0.000 -0.292 0.292 0.149 1.000 0.747 1.339 0.500 0.428 0.572 0.001 1.000 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# See compatiblity note!\n", "model.fit(factors={\n", " 'IV3': ['0.5', '1.0', '1.5']},\n", " ordered=True\n", ")" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### Custom contrasts\n", "\n", "`Lmer` models can also take custom factor contrasts based on how they are expected by R (see the note at the end of this tutorial for how contrasts work in R). Remember that there can be at most k-1 model terms representing any k level factor without over-parameterizing a model. If you specify a custom contrast, R will generate set of orthogonal contrasts for the rest of your model terms. \n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "boundary (singular) fit: see ?isSingular \n", "\n", "boundary (singular) fit: see ?isSingular \n", "\n", "boundary (singular) fit: see ?isSingular \n", "\n", "boundary (singular) fit: see ?isSingular \n", "\n", "boundary (singular) fit: see ?isSingular \n", "\n", "Formula: DV_l~IV3+(IV3|Group)\n", "\n", "Family: binomial\t Inference: parametric\n", "\n", "Number of observations: 564\t Groups: {'Group': 47.0}\n", "\n", "Log-likelihood: -389.003 \t AIC: 796.006\n", "\n", "Random effects:\n", "\n", " Name Var Std\n", "Group (Intercept) 0.022 0.148\n", "Group IV30.5 0.060 0.246\n", "Group IV31.5 0.038 0.196\n", "\n", " IV1 IV2 Corr\n", "Group (Intercept) IV30.5 -1.0\n", "Group (Intercept) IV31.5 -1.0\n", "Group IV30.5 IV31.5 1.0\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEOROR_2.5_ciOR_97.5_ciProbProb_2.5_ciProb_97.5_ciZ-statP-valSig
(Intercept)-0.129-0.4190.1620.1480.8790.6581.1760.4680.3970.540-0.8670.386
IV30.50.129-0.2830.5400.2101.1370.7531.7160.5320.4300.6320.6120.541
IV31.5-0.128-0.5390.2830.2100.8800.5831.3270.4680.3680.570-0.6120.541
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n", "(Intercept) -0.129 -0.419 0.162 0.148 0.879 0.658 1.176 0.468 0.397 0.540 -0.867 0.386 \n", "IV30.5 0.129 -0.283 0.540 0.210 1.137 0.753 1.716 0.532 0.430 0.632 0.612 0.541 \n", "IV31.5 -0.128 -0.539 0.283 0.210 0.880 0.583 1.327 0.468 0.368 0.570 -0.612 0.541 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# See compatiblity note!\n", "\n", "# Compare level '1.0' to the mean of levels '0.5' and '1.5'\n", "# and let R determine the second contrast orthogonal to it\n", "\n", "model.fit(factors={\n", " 'IV3': {'1.0': 1, '0.5': -.5, '1.5': -.5}\n", "})" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### User-created contrasts (without R)\n", "\n", "Another option available to you is fitting a model with *only* your desired contrast(s) rather than a full set of k-1 contrasts. Contrary to how statistics is usually taught, you don't ever *have to* include a full set of k-1 contrasts for a k level factor! The upside to doing this is that you won't need to rely on R to compute anything for you (aside from the model fit), and you will have a model with exactly the number of terms as contrasts you desire, giving you complete control. The downside is that post-hoc tests will no longer be available (see tutorial 3 for more information on post-hoc tests), but it's unlikely you're doing post-hoc tests if you are computing a subset of specific contrasts anyway. This is also a useful approach if you don't want to use patsy's formula syntax with `Lm` and `Lm2` as noted above.\n", "\n", "This can be accomplished by creating new columns in your dataframe to test specific hypotheses and is trivial to do with pandas [map](https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.Series.map.html) and [assign](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html/) methods. For example, here we manually compute a linear contrast by creating a new column in our dataframe and treating it as a continuous variable.\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GroupIV1DV_lDVIV2IV3IV3_custom_lin
0120.007.9365084.5634920.5-1
1120.0015.2777780.0000001.00
2120.010.0000000.0000001.51
3120.019.5238100.0000000.5-1
4112.500.0000000.0000001.00
\n", "
" ], "text/plain": [ " Group IV1 DV_l DV IV2 IV3 IV3_custom_lin\n", "0 1 20.0 0 7.936508 4.563492 0.5 -1\n", "1 1 20.0 0 15.277778 0.000000 1.0 0\n", "2 1 20.0 1 0.000000 0.000000 1.5 1\n", "3 1 20.0 1 9.523810 0.000000 0.5 -1\n", "4 1 12.5 0 0.000000 0.000000 1.0 0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a new column in the dataframe with a custom (linear) contrast\n", "df = df.assign(\n", " IV3_custom_lin=df['IV3'].map({\n", " 0.5: -1,\n", " 1.0: 0,\n", " 1.5: 1\n", " })\n", ")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "Now we can use this variable as a continuous predictor without the need for the `factors` argument. Notice how the z-stat and p-value of the estimate are the same as the linear polynomial contrast estimated above. The coefficients differ in scale only because R uses [~-0.707, ~0, ~0.707] for its polynomial contrasts rather than [-1, 0, 1] like we did.\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "Collapsed": "false", "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "boundary (singular) fit: see ?isSingular \n", "\n", "Formula: DV_l~IV3_custom_lin+(IV3_custom_lin|Group)\n", "\n", "Family: binomial\t Inference: parametric\n", "\n", "Number of observations: 564\t Groups: {'Group': 47.0}\n", "\n", "Log-likelihood: -389.016 \t AIC: 788.031\n", "\n", "Random effects:\n", "\n", " Name Var Std\n", "Group (Intercept) 0.0 0.0\n", "Group IV3_custom_lin 0.0 0.0\n", "\n", " IV1 IV2 Corr\n", "Group (Intercept) IV3_custom_lin NaN\n", "\n", "Fixed effects:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Estimate2.5_ci97.5_ciSEOROR_2.5_ciOR_97.5_ciProbProb_2.5_ciProb_97.5_ciZ-statP-valSig
(Intercept)-0.128-0.2940.0370.0850.880.7451.0380.4680.4270.509-1.5170.129
IV3_custom_lin-0.128-0.3310.0750.1040.880.7181.0770.4680.4180.519-1.2390.215
\n", "
" ], "text/plain": [ " Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n", "(Intercept) -0.128 -0.294 0.037 0.085 0.88 0.745 1.038 0.468 0.427 0.509 -1.517 0.129 \n", "IV3_custom_lin -0.128 -0.331 0.075 0.104 0.88 0.718 1.077 0.468 0.418 0.519 -1.239 0.215 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Estimate model\n", "model = Lmer('DV_l ~ IV3_custom_lin + (IV3_custom_lin|Group)', data=df, family='binomial')\n", "model.fit()" ] }, { "cell_type": "markdown", "metadata": { "Collapsed": "false" }, "source": [ "### A note on how contrasts in R work\n", "\n", "

Note

This is just for folks curious about how contrasts in R work

\n", "\n", "Specifying multiple custom contrasts in R has always been a point of confusion amongst users. This because the `contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) as one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For a longer explanation with examples see [this reference](https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>) and [this reference](https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>). For these situations pymer4 offers a few utility functions to convert between these matrix types if desired in `pymer4.utils`: `R2con()` and `con2R()`.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }