{ "metadata": { "name": "", "signature": "sha256:82dd99dce72b88d1c4a58e01087fa61159cab3f0a670f553b971e33e4623abf6" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert A Numeric Categorical Variable With Patsy\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:** Originally from Data Origami." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import patsy" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Country codes are stored as integers, but they label categories rather than quantities\n", "raw_data = {'countrycode': [1, 2, 3, 2, 1]}\n", "df = pd.DataFrame(raw_data, columns=['countrycode'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>countrycode</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td></tr>\n",
"    <tr><th>1</th><td>2</td></tr>\n",
"    <tr><th>2</th><td>3</td></tr>\n",
"    <tr><th>3</th><td>2</td></tr>\n",
"    <tr><th>4</th><td>1</td></tr>\n",
"  </tbody>\n",
"</table>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "   countrycode\n", "0            1\n", "1            2\n", "2            3\n", "3            2\n", "4            1" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert the countrycode variable into three binary variables" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# C() treats countrycode as categorical; -1 drops the intercept so each of the three levels gets its own column\n", "patsy.dmatrix('C(countrycode)-1', df, return_type='dataframe')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>C(countrycode)[1]</th><th>C(countrycode)[2]</th><th>C(countrycode)[3]</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>0</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>1</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "   C(countrycode)[1]  C(countrycode)[2]  C(countrycode)[3]\n", "0                  1                  0                  0\n", "1                  0                  1                  0\n", "2                  0                  0                  1\n", "3                  0                  1                  0\n", "4                  1                  0                  0" ] } ], "prompt_number": 10 } ], "metadata": {} } ] }