{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"Collapsed": "false"
},
"outputs": [],
"source": [
"# import some basic libraries\n",
"import os\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"# 2. Categorical Predictors\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"The syntax for handling categorical predictors is **different** between standard regression models/two-stage-models (i.e. `Lm` and `Lm2`) and multi-level models (`Lmer`) in `pymer4`. This is because formula parsing is passed to R for `Lmer` models, but handled by Python for other models. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Lm and Lm2 Models\n",
"-----------------\n",
"`Lm` and `Lm2` models use [patsy](https://patsy.readthedocs.io/en/latest/>) to parse model formulae. Patsy is very powerful and has built-in support for handling categorical coding schemes by wrapping a predictor in then `C()` *within* the module formula. Patsy can also perform some pre-processing such as scaling and standardization using special functions like `center()`. Here are some examples.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# import sample data and model\n",
"from pymer4.utils import get_resource_path\n",
"from pymer4.models import Lm\n",
"\n",
"# IV3 is a categorical predictors with 3 levels in the sample data\n",
"df = pd.read_csv(os.path.join(get_resource_path(), 'sample_data.csv'))"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Dummy-coded/Treatment contrasts"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Formula: DV~C(IV3,levels=[1.0,0.5,1.5])\n",
"\n",
"Family: gaussian\t Estimator: OLS\n",
"\n",
"Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n",
"\n",
"Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n",
"\n",
"Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" DF | \n",
" T-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" Intercept | \n",
" 42.721 | \n",
" 38.334 | \n",
" 47.108 | \n",
" 2.233 | \n",
" 561 | \n",
" 19.129 | \n",
" 0.000 | \n",
" *** | \n",
"
\n",
" \n",
" C(IV3, levels=[1.0, 0.5, 1.5])[T.0.5] | \n",
" 1.463 | \n",
" -4.741 | \n",
" 7.667 | \n",
" 3.158 | \n",
" 561 | \n",
" 0.463 | \n",
" 0.643 | \n",
" | \n",
"
\n",
" \n",
" C(IV3, levels=[1.0, 0.5, 1.5])[T.1.5] | \n",
" -3.419 | \n",
" -9.622 | \n",
" 2.785 | \n",
" 3.158 | \n",
" 561 | \n",
" -1.082 | \n",
" 0.280 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n",
"Intercept 42.721 38.334 47.108 2.233 561 19.129 0.000 ***\n",
"C(IV3, levels=[1.0, 0.5, 1.5])[T.0.5] 1.463 -4.741 7.667 3.158 561 0.463 0.643 \n",
"C(IV3, levels=[1.0, 0.5, 1.5])[T.1.5] -3.419 -9.622 2.785 3.158 561 -1.082 0.280 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Estimate a model using Treatment contrasts (dummy-coding)\n",
"# with '1.0' as the reference level\n",
"# This is the default of the C() function \n",
"model = Lm(\"DV ~ C(IV3, levels=[1.0, 0.5, 1.5])\", data=df)\n",
"model.fit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Orthogonal Polynomial Contrasts"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Formula: DV~C(IV3,Poly)\n",
"\n",
"Family: gaussian\t Estimator: OLS\n",
"\n",
"Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n",
"\n",
"Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n",
"\n",
"Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" DF | \n",
" T-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" Intercept | \n",
" 42.069 | \n",
" 39.537 | \n",
" 44.602 | \n",
" 1.289 | \n",
" 561 | \n",
" 32.627 | \n",
" 0.000 | \n",
" *** | \n",
"
\n",
" \n",
" C(IV3, Poly).Linear | \n",
" -3.452 | \n",
" -7.838 | \n",
" 0.935 | \n",
" 2.233 | \n",
" 561 | \n",
" -1.546 | \n",
" 0.123 | \n",
" | \n",
"
\n",
" \n",
" C(IV3, Poly).Quadratic | \n",
" -0.798 | \n",
" -5.185 | \n",
" 3.588 | \n",
" 2.233 | \n",
" 561 | \n",
" -0.357 | \n",
" 0.721 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n",
"Intercept 42.069 39.537 44.602 1.289 561 32.627 0.000 ***\n",
"C(IV3, Poly).Linear -3.452 -7.838 0.935 2.233 561 -1.546 0.123 \n",
"C(IV3, Poly).Quadratic -0.798 -5.185 3.588 2.233 561 -0.357 0.721 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Patsy can do this using the Poly argument to the \n",
"# C() function\n",
"model = Lm('DV ~ C(IV3, Poly)', data=df)\n",
"model.fit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Sum-to-zero contrasts"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Formula: DV~C(IV3,Sum)\n",
"\n",
"Family: gaussian\t Estimator: OLS\n",
"\n",
"Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n",
"\n",
"Number of observations: 564\t R^2: 0.004\t R^2_adj: 0.001\n",
"\n",
"Log-likelihood: -2728.620 \t AIC: 5463.241\t BIC: 5476.246\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" DF | \n",
" T-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" Intercept | \n",
" 42.069 | \n",
" 39.537 | \n",
" 44.602 | \n",
" 1.289 | \n",
" 561 | \n",
" 32.627 | \n",
" 0.000 | \n",
" *** | \n",
"
\n",
" \n",
" C(IV3, Sum)[S.0.5] | \n",
" 2.115 | \n",
" -1.467 | \n",
" 5.697 | \n",
" 1.823 | \n",
" 561 | \n",
" 1.160 | \n",
" 0.247 | \n",
" | \n",
"
\n",
" \n",
" C(IV3, Sum)[S.1.0] | \n",
" 0.652 | \n",
" -2.930 | \n",
" 4.234 | \n",
" 1.823 | \n",
" 561 | \n",
" 0.357 | \n",
" 0.721 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n",
"Intercept 42.069 39.537 44.602 1.289 561 32.627 0.000 ***\n",
"C(IV3, Sum)[S.0.5] 2.115 -1.467 5.697 1.823 561 1.160 0.247 \n",
"C(IV3, Sum)[S.1.0] 0.652 -2.930 4.234 1.823 561 0.357 0.721 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Similar to before but with the Sum argument\n",
"model = Lm('DV ~ C(IV3, Sum)', data=df)\n",
"model.fit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Scaling/Centering"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Formula: DV~center(IV2)*C(IV3,Sum)\n",
"\n",
"Family: gaussian\t Estimator: OLS\n",
"\n",
"Std-errors: non-robust\tCIs: standard 95%\tInference: parametric \n",
"\n",
"Number of observations: 564\t R^2: 0.511\t R^2_adj: 0.507\n",
"\n",
"Log-likelihood: -2528.051 \t AIC: 5068.102\t BIC: 5094.113\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" DF | \n",
" T-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" Intercept | \n",
" 42.051 | \n",
" 40.268 | \n",
" 43.833 | \n",
" 0.908 | \n",
" 558 | \n",
" 46.329 | \n",
" 0.000 | \n",
" *** | \n",
"
\n",
" \n",
" C(IV3, Sum)[S.0.5] | \n",
" 0.580 | \n",
" -1.942 | \n",
" 3.102 | \n",
" 1.284 | \n",
" 558 | \n",
" 0.452 | \n",
" 0.652 | \n",
" | \n",
"
\n",
" \n",
" C(IV3, Sum)[S.1.0] | \n",
" 0.383 | \n",
" -2.136 | \n",
" 2.903 | \n",
" 1.282 | \n",
" 558 | \n",
" 0.299 | \n",
" 0.765 | \n",
" | \n",
"
\n",
" \n",
" center(IV2) | \n",
" 0.746 | \n",
" 0.685 | \n",
" 0.807 | \n",
" 0.031 | \n",
" 558 | \n",
" 24.012 | \n",
" 0.000 | \n",
" *** | \n",
"
\n",
" \n",
" center(IV2):C(IV3, Sum)[S.0.5] | \n",
" 0.050 | \n",
" -0.037 | \n",
" 0.137 | \n",
" 0.044 | \n",
" 558 | \n",
" 1.132 | \n",
" 0.258 | \n",
" | \n",
"
\n",
" \n",
" center(IV2):C(IV3, Sum)[S.1.0] | \n",
" -0.057 | \n",
" -0.144 | \n",
" 0.029 | \n",
" 0.044 | \n",
" 558 | \n",
" -1.306 | \n",
" 0.192 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE DF T-stat P-val Sig\n",
"Intercept 42.051 40.268 43.833 0.908 558 46.329 0.000 ***\n",
"C(IV3, Sum)[S.0.5] 0.580 -1.942 3.102 1.284 558 0.452 0.652 \n",
"C(IV3, Sum)[S.1.0] 0.383 -2.136 2.903 1.282 558 0.299 0.765 \n",
"center(IV2) 0.746 0.685 0.807 0.031 558 24.012 0.000 ***\n",
"center(IV2):C(IV3, Sum)[S.0.5] 0.050 -0.037 0.137 0.044 558 1.132 0.258 \n",
"center(IV2):C(IV3, Sum)[S.1.0] -0.057 -0.144 0.029 0.044 558 -1.306 0.192 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Moderation with IV2, but centering IV2 first\n",
"model = Lm('DV ~ center(IV2) * C(IV3, Sum)', data=df)\n",
"model.fit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Please refer to the [patsy documentation](https://patsy.readthedocs.io/en/latest/categorical-coding.html) for more details when working categorical predictors in `Lm` or `Lm2` models.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"## Lmer Models\n",
"\n",
"`Lmer()` models currently have support for handling categorical predictors in one of three ways based on how R's `factor()` works (see the note at the end of this tutorial):\n",
"\n",
"- Dummy-coded factor levels (treatment contrasts) in which each model term is the difference between a factor level and a selected reference level\n",
"- Orthogonal polynomial contrasts in which each model term is a polynomial contrast across factor levels (e.g. linear, quadratic, cubic, etc)\n",
"- Custom contrasts for each level of a factor, which should be provided in the manner expected by R.\n",
"\n",
"To make re-parameterizing models easier, factor codings are passed as a dictionary to the `factors` argument of a model's `.fit()`. This obviates the need for adjusting data-frame properties as in R. Note that this is **different** from `Lm` and `Lm2` models above which expect factor codings in their formulae (because patsy does). \n",
"\n",
"Each of these ways also enables you to easily compute post-hoc comparisons between factor levels, as well as interactions between continuous predictors and each factor level. See tutorial 3 for more on post-hoc tests.\n",
"\n",
"## \\*Compatibility Note\\*:\n",
"Due to recent updates in `rpy2` and `pandas`, both of which are dependencies of `pymer4`, the contrast syntax shown below will raise errors and fail to run. But for the time being you can use `rpy2==3.1.0` and `pandas>0.23<1.0` on your own system to ensure compatiblity. Thanks for your patience while I implement the necessary fixes!\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from pymer4.models import Lmer\n",
"# We're going to fit a multi-level logistic regression using the \n",
"# dichotomous DV_l variable and the same categorical predictor (IV3)\n",
"# as before\n",
"model = Lmer('DV_l ~ IV3 + (IV3|Group)', data=df, family='binomial')"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Dummy-coding factors\n",
"\n",
"First we'll use dummy-coding/treatment contrasts with 1.0 as the reference level. This will compute two coefficients: 0.5 > 1.0 and 1.5 > 1.0. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"boundary (singular) fit: see ?isSingular \n",
"\n",
"Formula: DV_l~IV3+(IV3|Group)\n",
"\n",
"Family: binomial\t Inference: parametric\n",
"\n",
"Number of observations: 564\t Groups: {'Group': 47.0}\n",
"\n",
"Log-likelihood: -389.003 \t AIC: 796.006\n",
"\n",
"Random effects:\n",
"\n",
" Name Var Std\n",
"Group (Intercept) 0.022 0.148\n",
"Group IV30.5 0.060 0.246\n",
"Group IV31.5 0.038 0.196\n",
"\n",
" IV1 IV2 Corr\n",
"Group (Intercept) IV30.5 -1.0\n",
"Group (Intercept) IV31.5 -1.0\n",
"Group IV30.5 IV31.5 1.0\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" OR | \n",
" OR_2.5_ci | \n",
" OR_97.5_ci | \n",
" Prob | \n",
" Prob_2.5_ci | \n",
" Prob_97.5_ci | \n",
" Z-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" (Intercept) | \n",
" -0.129 | \n",
" -0.419 | \n",
" 0.162 | \n",
" 0.148 | \n",
" 0.879 | \n",
" 0.658 | \n",
" 1.176 | \n",
" 0.468 | \n",
" 0.397 | \n",
" 0.540 | \n",
" -0.867 | \n",
" 0.386 | \n",
" | \n",
"
\n",
" \n",
" IV30.5 | \n",
" 0.129 | \n",
" -0.283 | \n",
" 0.540 | \n",
" 0.210 | \n",
" 1.137 | \n",
" 0.753 | \n",
" 1.716 | \n",
" 0.532 | \n",
" 0.430 | \n",
" 0.632 | \n",
" 0.612 | \n",
" 0.541 | \n",
" | \n",
"
\n",
" \n",
" IV31.5 | \n",
" -0.128 | \n",
" -0.539 | \n",
" 0.283 | \n",
" 0.210 | \n",
" 0.880 | \n",
" 0.583 | \n",
" 1.327 | \n",
" 0.468 | \n",
" 0.368 | \n",
" 0.570 | \n",
" -0.612 | \n",
" 0.541 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n",
"(Intercept) -0.129 -0.419 0.162 0.148 0.879 0.658 1.176 0.468 0.397 0.540 -0.867 0.386 \n",
"IV30.5 0.129 -0.283 0.540 0.210 1.137 0.753 1.716 0.532 0.430 0.632 0.612 0.541 \n",
"IV31.5 -0.128 -0.539 0.283 0.210 0.880 0.583 1.327 0.468 0.368 0.570 -0.612 0.541 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# See compatiblity note!\n",
"model.fit(factors={\n",
" 'IV3': ['1.0', '0.5', '1.5']\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Polynomial contrast coding\n",
"\n",
"Second we'll use orthogonal polynomial contrasts. This is accomplished using the `ordered=True` argument and specifying the order of the *linear* contrast in increasing order. R will automatically compute higher order polynomial contrats that are orthogonal to this linear contrast. In this example, since there are 3 factor levels this will result in two polynomial terms: a linear contrast we specify below corresponding to 0.5 < 1.0 < 1.5 and an orthogonal quadratic contrast automatically determined by R, corresponding to 0.5 > 1 < 1.5\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"boundary (singular) fit: see ?isSingular \n",
"\n",
"boundary (singular) fit: see ?isSingular \n",
"\n",
"Formula: DV_l~IV3+(IV3|Group)\n",
"\n",
"Family: binomial\t Inference: parametric\n",
"\n",
"Number of observations: 564\t Groups: {'Group': 47.0}\n",
"\n",
"Log-likelihood: -389.003 \t AIC: 796.006\n",
"\n",
"Random effects:\n",
"\n",
" Name Var Std\n",
"Group (Intercept) 0.000 0.000\n",
"Group IV3.L 0.001 0.035\n",
"Group IV3.Q 0.032 0.180\n",
"\n",
" IV1 IV2 Corr\n",
"Group (Intercept) IV3.L NaN\n",
"Group (Intercept) IV3.Q NaN\n",
"Group IV3.L IV3.Q -1.0\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" OR | \n",
" OR_2.5_ci | \n",
" OR_97.5_ci | \n",
" Prob | \n",
" Prob_2.5_ci | \n",
" Prob_97.5_ci | \n",
" Z-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" (Intercept) | \n",
" -0.128 | \n",
" -0.294 | \n",
" 0.037 | \n",
" 0.085 | \n",
" 0.879 | \n",
" 0.745 | \n",
" 1.038 | \n",
" 0.468 | \n",
" 0.427 | \n",
" 0.509 | \n",
" -1.518 | \n",
" 0.129 | \n",
" | \n",
"
\n",
" \n",
" IV3.L | \n",
" -0.182 | \n",
" -0.469 | \n",
" 0.106 | \n",
" 0.147 | \n",
" 0.834 | \n",
" 0.626 | \n",
" 1.112 | \n",
" 0.455 | \n",
" 0.385 | \n",
" 0.526 | \n",
" -1.238 | \n",
" 0.216 | \n",
" | \n",
"
\n",
" \n",
" IV3.Q | \n",
" 0.000 | \n",
" -0.292 | \n",
" 0.292 | \n",
" 0.149 | \n",
" 1.000 | \n",
" 0.747 | \n",
" 1.339 | \n",
" 0.500 | \n",
" 0.428 | \n",
" 0.572 | \n",
" 0.001 | \n",
" 1.000 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n",
"(Intercept) -0.128 -0.294 0.037 0.085 0.879 0.745 1.038 0.468 0.427 0.509 -1.518 0.129 \n",
"IV3.L -0.182 -0.469 0.106 0.147 0.834 0.626 1.112 0.455 0.385 0.526 -1.238 0.216 \n",
"IV3.Q 0.000 -0.292 0.292 0.149 1.000 0.747 1.339 0.500 0.428 0.572 0.001 1.000 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# See compatiblity note!\n",
"model.fit(factors={\n",
" 'IV3': ['0.5', '1.0', '1.5']},\n",
" ordered=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### Custom contrasts\n",
"\n",
"`Lmer` models can also take custom factor contrasts based on how they are expected by R (see the note at the end of this tutorial for how contrasts work in R). Remember that there can be at most k-1 model terms representing any k level factor without over-parameterizing a model. If you specify a custom contrast, R will generate set of orthogonal contrasts for the rest of your model terms. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"boundary (singular) fit: see ?isSingular \n",
"\n",
"boundary (singular) fit: see ?isSingular \n",
"\n",
"boundary (singular) fit: see ?isSingular \n",
"\n",
"boundary (singular) fit: see ?isSingular \n",
"\n",
"boundary (singular) fit: see ?isSingular \n",
"\n",
"Formula: DV_l~IV3+(IV3|Group)\n",
"\n",
"Family: binomial\t Inference: parametric\n",
"\n",
"Number of observations: 564\t Groups: {'Group': 47.0}\n",
"\n",
"Log-likelihood: -389.003 \t AIC: 796.006\n",
"\n",
"Random effects:\n",
"\n",
" Name Var Std\n",
"Group (Intercept) 0.022 0.148\n",
"Group IV30.5 0.060 0.246\n",
"Group IV31.5 0.038 0.196\n",
"\n",
" IV1 IV2 Corr\n",
"Group (Intercept) IV30.5 -1.0\n",
"Group (Intercept) IV31.5 -1.0\n",
"Group IV30.5 IV31.5 1.0\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" OR | \n",
" OR_2.5_ci | \n",
" OR_97.5_ci | \n",
" Prob | \n",
" Prob_2.5_ci | \n",
" Prob_97.5_ci | \n",
" Z-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" (Intercept) | \n",
" -0.129 | \n",
" -0.419 | \n",
" 0.162 | \n",
" 0.148 | \n",
" 0.879 | \n",
" 0.658 | \n",
" 1.176 | \n",
" 0.468 | \n",
" 0.397 | \n",
" 0.540 | \n",
" -0.867 | \n",
" 0.386 | \n",
" | \n",
"
\n",
" \n",
" IV30.5 | \n",
" 0.129 | \n",
" -0.283 | \n",
" 0.540 | \n",
" 0.210 | \n",
" 1.137 | \n",
" 0.753 | \n",
" 1.716 | \n",
" 0.532 | \n",
" 0.430 | \n",
" 0.632 | \n",
" 0.612 | \n",
" 0.541 | \n",
" | \n",
"
\n",
" \n",
" IV31.5 | \n",
" -0.128 | \n",
" -0.539 | \n",
" 0.283 | \n",
" 0.210 | \n",
" 0.880 | \n",
" 0.583 | \n",
" 1.327 | \n",
" 0.468 | \n",
" 0.368 | \n",
" 0.570 | \n",
" -0.612 | \n",
" 0.541 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n",
"(Intercept) -0.129 -0.419 0.162 0.148 0.879 0.658 1.176 0.468 0.397 0.540 -0.867 0.386 \n",
"IV30.5 0.129 -0.283 0.540 0.210 1.137 0.753 1.716 0.532 0.430 0.632 0.612 0.541 \n",
"IV31.5 -0.128 -0.539 0.283 0.210 0.880 0.583 1.327 0.468 0.368 0.570 -0.612 0.541 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# See compatiblity note!\n",
"\n",
"# Compare level '1.0' to the mean of levels '0.5' and '1.5'\n",
"# and let R determine the second contrast orthogonal to it\n",
"\n",
"model.fit(factors={\n",
" 'IV3': {'1.0': 1, '0.5': -.5, '1.5': -.5}\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### User-created contrasts (without R)\n",
"\n",
"Another option available to you is fitting a model with *only* your desired contrast(s) rather than a full set of k-1 contrasts. Contrary to how statistics is usually taught, you don't ever *have to* include a full set of k-1 contrasts for a k level factor! The upside to doing this is that you won't need to rely on R to compute anything for you (aside from the model fit), and you will have a model with exactly the number of terms as contrasts you desire, giving you complete control. The downside is that post-hoc tests will no longer be available (see tutorial 3 for more information on post-hoc tests), but it's unlikely you're doing post-hoc tests if you are computing a subset of specific contrasts anyway. This is also a useful approach if you don't want to use patsy's formula syntax with `Lm` and `Lm2` as noted above.\n",
"\n",
"This can be accomplished by creating new columns in your dataframe to test specific hypotheses and is trivial to do with pandas [map](https://pandas.pydata.org/pandas-docs/version/0.25/reference/api/pandas.Series.map.html) and [assign](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html/) methods. For example, here we manually compute a linear contrast by creating a new column in our dataframe and treating it as a continuous variable.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Group | \n",
" IV1 | \n",
" DV_l | \n",
" DV | \n",
" IV2 | \n",
" IV3 | \n",
" IV3_custom_lin | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 20.0 | \n",
" 0 | \n",
" 7.936508 | \n",
" 4.563492 | \n",
" 0.5 | \n",
" -1 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 20.0 | \n",
" 0 | \n",
" 15.277778 | \n",
" 0.000000 | \n",
" 1.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 20.0 | \n",
" 1 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 1.5 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 20.0 | \n",
" 1 | \n",
" 9.523810 | \n",
" 0.000000 | \n",
" 0.5 | \n",
" -1 | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 12.5 | \n",
" 0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 1.0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Group IV1 DV_l DV IV2 IV3 IV3_custom_lin\n",
"0 1 20.0 0 7.936508 4.563492 0.5 -1\n",
"1 1 20.0 0 15.277778 0.000000 1.0 0\n",
"2 1 20.0 1 0.000000 0.000000 1.5 1\n",
"3 1 20.0 1 9.523810 0.000000 0.5 -1\n",
"4 1 12.5 0 0.000000 0.000000 1.0 0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a new column in the dataframe with a custom (linear) contrast\n",
"df = df.assign(\n",
" IV3_custom_lin=df['IV3'].map({\n",
" 0.5: -1,\n",
" 1.0: 0,\n",
" 1.5: 1\n",
" })\n",
")\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"Now we can use this variable as a continuous predictor without the need for the `factors` argument. Notice how the z-stat and p-value of the estimate are the same as the linear polynomial contrast estimated above. The coefficients differ in scale only because R uses [~-0.707, ~0, ~0.707] for its polynomial contrasts rather than [-1, 0, 1] like we did.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"Collapsed": "false",
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"boundary (singular) fit: see ?isSingular \n",
"\n",
"Formula: DV_l~IV3_custom_lin+(IV3_custom_lin|Group)\n",
"\n",
"Family: binomial\t Inference: parametric\n",
"\n",
"Number of observations: 564\t Groups: {'Group': 47.0}\n",
"\n",
"Log-likelihood: -389.016 \t AIC: 788.031\n",
"\n",
"Random effects:\n",
"\n",
" Name Var Std\n",
"Group (Intercept) 0.0 0.0\n",
"Group IV3_custom_lin 0.0 0.0\n",
"\n",
" IV1 IV2 Corr\n",
"Group (Intercept) IV3_custom_lin NaN\n",
"\n",
"Fixed effects:\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Estimate | \n",
" 2.5_ci | \n",
" 97.5_ci | \n",
" SE | \n",
" OR | \n",
" OR_2.5_ci | \n",
" OR_97.5_ci | \n",
" Prob | \n",
" Prob_2.5_ci | \n",
" Prob_97.5_ci | \n",
" Z-stat | \n",
" P-val | \n",
" Sig | \n",
"
\n",
" \n",
" \n",
" \n",
" (Intercept) | \n",
" -0.128 | \n",
" -0.294 | \n",
" 0.037 | \n",
" 0.085 | \n",
" 0.88 | \n",
" 0.745 | \n",
" 1.038 | \n",
" 0.468 | \n",
" 0.427 | \n",
" 0.509 | \n",
" -1.517 | \n",
" 0.129 | \n",
" | \n",
"
\n",
" \n",
" IV3_custom_lin | \n",
" -0.128 | \n",
" -0.331 | \n",
" 0.075 | \n",
" 0.104 | \n",
" 0.88 | \n",
" 0.718 | \n",
" 1.077 | \n",
" 0.468 | \n",
" 0.418 | \n",
" 0.519 | \n",
" -1.239 | \n",
" 0.215 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig\n",
"(Intercept) -0.128 -0.294 0.037 0.085 0.88 0.745 1.038 0.468 0.427 0.509 -1.517 0.129 \n",
"IV3_custom_lin -0.128 -0.331 0.075 0.104 0.88 0.718 1.077 0.468 0.418 0.519 -1.239 0.215 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Estimate model\n",
"model = Lmer('DV_l ~ IV3_custom_lin + (IV3_custom_lin|Group)', data=df, family='binomial')\n",
"model.fit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
},
"source": [
"### A note on how contrasts in R work\n",
"\n",
"Note
This is just for folks curious about how contrasts in R work
\n",
"\n",
"Specifying multiple custom contrasts in R has always been a point of confusion amongst users. This because the `contrasts()` command in R doesn't actually expect contrast weights (i.e. a design matrix) as one would intuit. Rather, it is made for generating contrast coding schemes which are the inverse of the contrast weight matrix. For a longer explanation with examples see [this reference](https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html>) and [this reference](https://github.com/ejolly/R/blob/master/Guides/Contrasts_in_R.md>). For these situations pymer4 offers a few utility functions to convert between these matrix types if desired in `pymer4.utils`: `R2con()` and `con2R()`.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}