{
"metadata": {
"name": "",
"signature": "sha256:974bfc5c75092df1374cb6bc96de6c35c986bc95f5d9b4cb2e042c38be14da42"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convert A String Categorical Variable To A Numeric Variable Naively\n",
"\n",
"- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
"- **Date:** -\n",
"- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
"- **Note:** Originally from: Data Origami."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import modules"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create dataframe"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"raw_data = {'patient': [1, 1, 1, 2, 2], \n",
" 'obs': [1, 2, 3, 1, 2], \n",
" 'treatment': [0, 1, 0, 1, 0],\n",
" 'score': ['strong', 'weak', 'normal', 'weak', 'strong']} \n",
"df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
" patient obs treatment score\n",
"0 1 1 0 strong\n",
"1 1 2 1 weak\n",
"2 1 3 0 normal\n",
"3 2 1 1 weak\n",
"4 2 2 0 strong"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a function that maps each label in df['score'] to a number"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
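"def score_to_numeric(x):\n",
"    # Map each score label to an ordered integer rank\n",
"    if x == 'strong':\n",
"        return 3\n",
"    if x == 'normal':\n",
"        return 2\n",
"    if x == 'weak':\n",
"        return 1\n",
"    # Any other label falls through, so the function returns None\n",
"    return None"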
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
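{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same mapping can also be written as a dictionary lookup. This is a minimal sketch and not part of the original recipe: the names `SCORE_RANKS` and `score_to_numeric_dict` are hypothetical, and `dict.get` is assumed to stand in for the `if` chain because it likewise returns `None` for any label missing from the mapping. Keeping the ranks in one dict makes it easier to add or reorder labels later."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical name: the same label-to-rank mapping as score_to_numeric\n",
"SCORE_RANKS = {'weak': 1, 'normal': 2, 'strong': 3}\n",
"\n",
"def score_to_numeric_dict(x):\n",
"    # dict.get returns None for labels not in SCORE_RANKS,\n",
"    # matching the fall-through behaviour of the if chain\n",
"    return SCORE_RANKS.get(x)\n",
"\n",
"print(score_to_numeric_dict('normal'))\n",
"print(score_to_numeric_dict('medium'))"
],
"language": "python",
"metadata": {},
"outputs": []
},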
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Apply the function to the score variable"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['score_num'] = df['score'].apply(score_to_numeric)\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
" patient obs treatment score score_num\n",
"0 1 1 0 strong 3\n",
"1 1 2 1 weak 1\n",
"2 1 3 0 normal 2\n",
"3 2 1 1 weak 1\n",
"4 2 2 0 strong 3"
]
}
],
"prompt_number": 7
},
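{
"cell_type": "markdown",
"metadata": {},
"source": [
"pandas also offers vectorized ways to produce the same ranking. The sketch below is an alternative under stated assumptions, not the method used above: `Series.map` with a dict turns unmapped labels into `NaN`, while an ordered `pd.Categorical` stores each label's 0-based position in `categories` (with `-1` for labels outside that list). The column names `score_map` and `score_code` are hypothetical, and the dataframe is rebuilt so the cell runs on its own."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Rebuild the example dataframe so this cell runs on its own\n",
"raw_data = {'patient': [1, 1, 1, 2, 2],\n",
"            'obs': [1, 2, 3, 1, 2],\n",
"            'treatment': [0, 1, 0, 1, 0],\n",
"            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}\n",
"df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])\n",
"\n",
"# Option 1 (hypothetical column score_map): Series.map with a dict;\n",
"# labels missing from the dict become NaN\n",
"df['score_map'] = df['score'].map({'weak': 1, 'normal': 2, 'strong': 3})\n",
"\n",
"# Option 2 (hypothetical column score_code): an ordered Categorical;\n",
"# .codes holds 0-based positions in categories (-1 for unknown labels),\n",
"# so add 1 to match the 1-3 scale of score_to_numeric\n",
"score_cat = pd.Categorical(df['score'], categories=['weak', 'normal', 'strong'], ordered=True)\n",
"df['score_code'] = score_cat.codes + 1\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": []
},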
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}