{
"metadata": {
"name": "",
"signature": "sha256:bccbe4c0f89ae449bb56549fd91930d8f993fb8168d10e54cd0a5b92f5da099e"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Split Combined Lat/Long Coordinate Variables Into Seperate Variables In Pandas\n",
"\n",
"- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
"- **Date:** -\n",
"- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
"- **Note:**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preliminaries"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 41
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an example dataframe"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"raw_data = {'geo': ['40.0024, -105.4102', '40.0068, -105.266', '39.9318, -105.2813', np.nan]}\n",
"df = pd.DataFrame(raw_data, columns = ['geo'])\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" geo | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 40.0024, -105.4102 | \n",
"
\n",
" \n",
" 1 | \n",
" 40.0068, -105.266 | \n",
"
\n",
" \n",
" 2 | \n",
" 39.9318, -105.2813 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 42,
"text": [
" geo\n",
"0 40.0024, -105.4102\n",
"1 40.0068, -105.266\n",
"2 39.9318, -105.2813\n",
"3 NaN"
]
}
],
"prompt_number": 42
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split the geo variable into seperate lat and lon variables"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create two lists for the loop results to be placed\n",
"lat = []\n",
"lon = []\n",
"\n",
"# For each row in a varible,\n",
"for row in df['geo']:\n",
" # Try to,\n",
" try:\n",
" # Split the row by comma and append\n",
" # everything before the comma to lat\n",
" lat.append(row.split(',')[0])\n",
" # Split the row by comma and append\n",
" # everything after the comma to lon\n",
" lon.append(row.split(',')[1])\n",
" # But if you get an error\n",
" except:\n",
" # append a missing value to lat\n",
" lat.append(np.NaN)\n",
" # append a missing value to lon\n",
" lon.append(np.NaN)\n",
"\n",
"# Create two new columns from lat and lon\n",
"df['latitude'] = lat\n",
"df['longitude'] = lon"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 43
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View the dataframe"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" geo | \n",
" latitude | \n",
" longitude | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 40.0024, -105.4102 | \n",
" 40.0024 | \n",
" -105.4102 | \n",
"
\n",
" \n",
" 1 | \n",
" 40.0068, -105.266 | \n",
" 40.0068 | \n",
" -105.266 | \n",
"
\n",
" \n",
" 2 | \n",
" 39.9318, -105.2813 | \n",
" 39.9318 | \n",
" -105.2813 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 44,
"text": [
" geo latitude longitude\n",
"0 40.0024, -105.4102 40.0024 -105.4102\n",
"1 40.0068, -105.266 40.0068 -105.266\n",
"2 39.9318, -105.2813 39.9318 -105.2813\n",
"3 NaN NaN NaN"
]
}
],
"prompt_number": 44
}
],
"metadata": {}
}
]
}