{ "metadata": { "name": "", "signature": "sha256:11993ebaa7aa1c5ebb5c3eaff271ec7cd7eb035bd2793e76d9837603a767a572" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Breaking Up A String Into Columns Using Regex In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:** Originally based on [this tutorial in nbviewer](http://nbviewer.ipython.org/github/swcarpentry/notebooks/blob/master/regex-intro.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import re\n", "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe of raw strings" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a dataframe with a single column of strings\n", "data = {'raw': ['Arizona 1 2014-12-23 3242.0',\n", " 'Iowa 1 2010-02-23 3453.7',\n", " 'Oregon 0 2014-06-20 2123.0',\n", " 'Maryland 0 2014-03-14 1123.6',\n", " 'Florida 1 2013-01-15 2134.0',\n", " 'Georgia 0 2012-07-14 2345.6']}\n", "df = pd.DataFrame(data, columns = ['raw'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|   | raw |
|---|---|
| 0 | Arizona 1 2014-12-23 3242.0 |
| 1 | Iowa 1 2010-02-23 3453.7 |
| 2 | Oregon 0 2014-06-20 2123.0 |
| 3 | Maryland 0 2014-03-14 1123.6 |
| 4 | Florida 1 2013-01-15 2134.0 |
| 5 | Georgia 0 2012-07-14 2345.6 |

6 rows × 1 columns
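The split columns in the next table can be produced with `Series.str.extract`, which returns the text captured by a regex group. The sketch below is an assumption about how this step might look, not the original cell. It rebuilds the dataframe so it runs on its own, and it uses one illustrative pattern per column: a lone digit between spaces, a YYYY-MM-DD date, a decimal number at the end of the string, and the leading word.

```python
import pandas as pd

# Same raw strings as the dataframe above
data = {'raw': ['Arizona 1 2014-12-23 3242.0',
                'Iowa 1 2010-02-23 3453.7',
                'Oregon 0 2014-06-20 2123.0',
                'Maryland 0 2014-03-14 1123.6',
                'Florida 1 2013-01-15 2134.0',
                'Georgia 0 2012-07-14 2345.6']}
df = pd.DataFrame(data, columns=['raw'])

# Each pattern has one capture group; expand=False returns a Series
# holding the text of that group's first match in every row.
# Illustrative patterns, assuming every string has the same layout.
df['female'] = df['raw'].str.extract(' ([0-9]) ', expand=False)
df['date'] = df['raw'].str.extract('([0-9]{4}-[0-9]{2}-[0-9]{2})', expand=False)
df['score'] = df['raw'].str.extract('([0-9]+[.][0-9]+)$', expand=False)
df['state'] = df['raw'].str.extract('^([A-Za-z]+)', expand=False)

print(df)
```

All four new columns hold strings (object dtype), because `str.extract` returns the matched text; a row with no match gets `NaN` in that column.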

|   | raw | female | date | score | state |
|---|---|---|---|---|---|
| 0 | Arizona 1 2014-12-23 3242.0 | 1 | 2014-12-23 | 3242.0 | Arizona |
| 1 | Iowa 1 2010-02-23 3453.7 | 1 | 2010-02-23 | 3453.7 | Iowa |
| 2 | Oregon 0 2014-06-20 2123.0 | 0 | 2014-06-20 | 2123.0 | Oregon |
| 3 | Maryland 0 2014-03-14 1123.6 | 0 | 2014-03-14 | 1123.6 | Maryland |
| 4 | Florida 1 2013-01-15 2134.0 | 1 | 2013-01-15 | 2134.0 | Florida |
| 5 | Georgia 0 2012-07-14 2345.6 | 0 | 2012-07-14 | 2345.6 | Georgia |

6 rows × 5 columns
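As an alternative to searching for each field separately, a single anchored pattern with named groups can split every string in one pass, and `str.extract` names the result's columns after the groups. This is a sketch under stated assumptions: every row follows the same state, digit, date, decimal layout, and state names are single words (a name like New York would not match). It also converts the extracted text to integer, datetime, and float dtypes, since `str.extract` returns strings.

```python
import pandas as pd

df = pd.DataFrame({'raw': ['Arizona 1 2014-12-23 3242.0',
                           'Iowa 1 2010-02-23 3453.7',
                           'Oregon 0 2014-06-20 2123.0',
                           'Maryland 0 2014-03-14 1123.6',
                           'Florida 1 2013-01-15 2134.0',
                           'Georgia 0 2012-07-14 2345.6']})

# One anchored pattern; each named group becomes a column of the result.
# Assumes a single-word state, a one-digit flag, a date, then a decimal.
pattern = ('^(?P<state>[A-Za-z]+) +'
           '(?P<female>[0-9]) +'
           '(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2}) +'
           '(?P<score>[0-9]+[.][0-9]+)$')
parts = df['raw'].str.extract(pattern, expand=True)

# str.extract returns strings; convert each column to a fitting dtype
parts['female'] = parts['female'].astype(int)
parts['date'] = pd.to_datetime(parts['date'])
parts['score'] = parts['score'].astype(float)

# Attach the parts in the same column order as the table above
df = df.join(parts[['female', 'date', 'score', 'state']])
print(df.dtypes)
```

Anchoring with `^` and `$` means a malformed row yields `NaN` in every column instead of a partial match; `astype(int)` would then raise, so such rows would need to be dropped or filled before converting.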