{
 "metadata": {
  "name": "",
  "signature": "sha256:5c1510b1e2064dcb9aae50c0ffd6da4529234fccf2690257c3730da3893f97df"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Pandas: Long To Wide Format\n",
      "\n",
      "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
      "- **Date:** -\n",
      "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
      "- **Note:**"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Import modules"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "raw_data = {'patient': [1, 1, 1, 2, 2],\n",
      "            'obs': [1, 2, 3, 1, 2],\n",
      "            'treatment': [0, 1, 0, 1, 0],\n",
      "            'score': [6252, 24243, 2345, 2342, 23525]}\n",
      "df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])\n",
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div>\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr><th></th><th>patient</th><th>obs</th><th>treatment</th><th>score</th></tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr><th>0</th><td>1</td><td>1</td><td>0</td><td>6252</td></tr>\n",
        "    <tr><th>1</th><td>1</td><td>2</td><td>1</td><td>24243</td></tr>\n",
        "    <tr><th>2</th><td>1</td><td>3</td><td>0</td><td>2345</td></tr>\n",
        "    <tr><th>3</th><td>2</td><td>1</td><td>1</td><td>2342</td></tr>\n",
        "    <tr><th>4</th><td>2</td><td>2</td><td>0</td><td>23525</td></tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 4 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 68,
       "text": [
        "   patient  obs  treatment  score\n",
        "0        1    1          0   6252\n",
        "1        1    2          1  24243\n",
        "2        1    3          0   2345\n",
        "3        2    1          1   2342\n",
        "4        2    2          0  23525\n",
        "\n",
        "[5 rows x 4 columns]"
       ]
      }
     ],
     "prompt_number": 68
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Pivot so each row is a patient, each column is an observation number, and each cell is a score\n",
      "\n",
      "Patient 2 has no third observation, so that cell is `NaN`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.pivot(index='patient', columns='obs', values='score')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div>\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr><th>obs</th><th>1</th><th>2</th><th>3</th></tr>\n",
        "    <tr><th>patient</th><th></th><th></th><th></th></tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr><th>1</th><td>6252</td><td>24243</td><td>2345</td></tr>\n",
        "    <tr><th>2</th><td>2342</td><td>23525</td><td>NaN</td></tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>2 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 72,
       "text": [
        "obs         1      2     3\n",
        "patient\n",
        "1        6252  24243  2345\n",
        "2        2342  23525   NaN\n",
        "\n",
        "[2 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 72
    }
   ],
   "metadata": {}
  }
 ]
}