{ "metadata": { "name": "", "signature": "sha256:974bfc5c75092df1374cb6bc96de6c35c986bc95f5d9b4cb2e042c38be14da42" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert A String Categorical Variable To A Numeric Variable Naively\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:** Originally from: Data Origami." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'patient': [1, 1, 1, 2, 2], \n", " 'obs': [1, 2, 3, 1, 2], \n", " 'treatment': [0, 1, 0, 1, 0],\n", " 'score': ['strong', 'weak', 'normal', 'weak', 'strong']} \n", "df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobstreatmentscore
0 1 1 0 strong
1 1 2 1 weak
2 1 3 0 normal
3 2 1 1 weak
4 2 2 0 strong
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ " patient obs treatment score\n", "0 1 1 0 strong\n", "1 1 2 1 weak\n", "2 1 3 0 normal\n", "3 2 1 1 weak\n", "4 2 2 0 strong" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a function that converts all values of df['score'] into numbers" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def score_to_numeric(x):\n", " if x=='strong':\n", " return 3\n", " if x=='normal':\n", " return 2\n", " if x=='weak':\n", " return 1" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply the function to the score variable" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df['score_num'] = df['score'].apply(score_to_numeric)\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
patientobstreatmentscorescore_num
0 1 1 0 strong 3
1 1 2 1 weak 1
2 1 3 0 normal 2
3 2 1 1 weak 1
4 2 2 0 strong 3
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " patient obs treatment score score_num\n", "0 1 1 0 strong 3\n", "1 1 2 1 weak 1\n", "2 1 3 0 normal 2\n", "3 2 1 1 weak 1\n", "4 2 2 0 strong 3" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }