{
"metadata": {
"name": "",
"signature": "sha256:82dd99dce72b88d1c4a58e01087fa61159cab3f0a670f553b971e33e4623abf6"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convert A Numeric Categorical Variable With Patsy\n",
"\n",
"- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
"- **Date:** -\n",
"- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
"- **Note:** Originally from: Data Origami."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import modules"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import patsy"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create dataframe"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Country codes are stored as integers, but they are labels, not quantities\n",
"raw_data = {'countrycode': [1, 2, 3, 2, 1]}\n",
"df = pd.DataFrame(raw_data, columns=['countrycode'])\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
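"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>countrycode</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>2</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>2</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>1</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"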
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
" countrycode\n",
"0 1\n",
"1 2\n",
"2 3\n",
"3 2\n",
"4 1"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
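"### Convert the countrycode variable into three binary variables\n",
"\n",
"`C()` tells patsy to treat the numeric `countrycode` column as categorical, and `-1` removes the intercept so that each of the three codes gets its own indicator column."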
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"patsy.dmatrix('C(countrycode)-1', df, return_type='dataframe')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
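"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>C(countrycode)[1]</th>\n",
"      <th>C(countrycode)[2]</th>\n",
"      <th>C(countrycode)[3]</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>1</td>\n",
"      <td>0</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>0</td>\n",
"      <td>1</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>0</td>\n",
"      <td>0</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>0</td>\n",
"      <td>1</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>1</td>\n",
"      <td>0</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"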
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
" C(countrycode)[1] C(countrycode)[2] C(countrycode)[3]\n",
"0 1 0 0\n",
"1 0 1 0\n",
"2 0 0 1\n",
"3 0 1 0\n",
"4 1 0 0"
]
}
],
"prompt_number": 10
}
],
"metadata": {}
}
]
}