{ "metadata": { "name": "", "signature": "sha256:2a61c11fc4cfaa8c34ade4ea86130ce85b4a74c7f26da063290965df33b40f69" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert A Categorical Variable Into Dummy Variables\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],\n", "            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],\n", "            'sex': ['male', 'female', 'male', 'female', 'female']}\n", "df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'sex'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ " first_name last_name sex\n", "0 Jason Miller male\n", "1 Molly Jacobson female\n", "2 Tina Ali male\n", "3 Jake Milner female\n", "4 Amy Cooze female\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a set of dummy variables from the sex variable" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df_sex = pd.get_dummies(df['sex'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Join the dummy variables to the main dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df_new = pd.concat([df, df_sex], axis=1)\n", "df_new" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ " first_name last_name sex female male\n", "0 Jason Miller male 0 1\n", "1 Molly Jacobson female 1 0\n", "2 Tina Ali male 0 1\n", "3 Jake Milner female 1 0\n", "4 Amy Cooze female 1 0\n", "\n", "[5 rows x 5 columns]" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative way to join the new columns" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df_new = df.join(df_sex)\n", "df_new" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ " first_name last_name sex female male\n", "0 Jason Miller male 0 1\n", "1 Molly Jacobson female 1 0\n", "2 Tina Ali male 0 1\n", "3 Jake Milner female 1 0\n", "4 Amy Cooze female 1 0\n", "\n", "[5 rows x 5 columns]" ] } ], "prompt_number": 19 } ], "metadata": {} } ] }