{ "metadata": { "name": "", "signature": "sha256:bccbe4c0f89ae449bb56549fd91930d8f993fb8168d10e54cd0a5b92f5da099e" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Split Combined Lat/Long Coordinate Variables Into Seperate Variables In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an example dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'geo': ['40.0024, -105.4102', '40.0068, -105.266', '39.9318, -105.2813', np.nan]}\n", "df = pd.DataFrame(raw_data, columns = ['geo'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geo
0 40.0024, -105.4102
1 40.0068, -105.266
2 39.9318, -105.2813
3 NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 42, "text": [ " geo\n", "0 40.0024, -105.4102\n", "1 40.0068, -105.266\n", "2 39.9318, -105.2813\n", "3 NaN" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split the geo variable into seperate lat and lon variables" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create two lists for the loop results to be placed\n", "lat = []\n", "lon = []\n", "\n", "# For each row in a varible,\n", "for row in df['geo']:\n", " # Try to,\n", " try:\n", " # Split the row by comma and append\n", " # everything before the comma to lat\n", " lat.append(row.split(',')[0])\n", " # Split the row by comma and append\n", " # everything after the comma to lon\n", " lon.append(row.split(',')[1])\n", " # But if you get an error\n", " except:\n", " # append a missing value to lat\n", " lat.append(np.NaN)\n", " # append a missing value to lon\n", " lon.append(np.NaN)\n", "\n", "# Create two new columns from lat and lon\n", "df['latitude'] = lat\n", "df['longitude'] = lon" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
geolatitudelongitude
0 40.0024, -105.4102 40.0024 -105.4102
1 40.0068, -105.266 40.0068 -105.266
2 39.9318, -105.2813 39.9318 -105.2813
3 NaN NaN NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ " geo latitude longitude\n", "0 40.0024, -105.4102 40.0024 -105.4102\n", "1 40.0068, -105.266 40.0068 -105.266\n", "2 39.9318, -105.2813 39.9318 -105.2813\n", "3 NaN NaN NaN" ] } ], "prompt_number": 44 } ], "metadata": {} } ] }