{ "metadata": { "name": "", "signature": "sha256:af9ee9dd222419dc741cf392ce4d71f84211dd93c10fbcbc42fd4ca492850400" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Coding Country Dyads From Headlines\n", "\n", "This snippet was written by [Chris R. Albon](http://www.chrisralbon.com/) and is part of his collection of [well-documented Python snippets](https://github.com/chrisalbon/code_py). All code is written in Python 3 in an IPython notebook and offered under the [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import the packages (i.e. plug-ins) we want.\n", "\n", "Unlike Stata and R, Python does not come with a row/column data structure out of the box. To organize our data that way, we import the powerful [pandas](http://pandas.pydata.org/) package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# import the pandas package and call it \"pd\".\n", "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create some fake headlines" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a list of strings (i.e. text), each one a headline.\n", "headlines = ['Germany Attacks Poland Through The Mountains.',\n", " 'Britan Declares War On Germany',\n", " 'Britan Asks US For Help',\n", " 'Japan Attacks US Base In Hawaii',\n", " 'US Declares War On Germany']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now for the data wrangling\n", "\n", "This might look complicated, but it isn't. 
The basic idea is that we tell Python to \"loop\" through each headline and, whenever it finds the name of a certain country, record that the country appeared. This particular loop actually does two things: first, it marks \"yes\" or \"no\" for each country depending on whether the headline mentions it; second, it adds every country it finds to that headline's dyad.\n", "\n", "I've added comments." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a list variable (hence the square brackets) called poland\n", "poland = []\n", "\n", "# Create a list variable called germany\n", "germany = []\n", "\n", "# Create a list variable called britan\n", "britan = []\n", "\n", "# Create a list variable called japan\n", "japan = []\n", "\n", "# Create a list variable called US\n", "US = []\n", "\n", "# Create a list variable called dyad\n", "dyad = []\n", "\n", "# Now for the loop\n", "\n", "# For each row in the variable called headlines,\n", "for row in headlines:\n", "\n", "    # create an empty list called dyad_member, and then,\n", "    dyad_member = []\n", "\n", "    # if 'Poland' is in the headline,\n", "    if 'Poland' in row:\n", "        # append 'yes' to the poland list variable, and,\n", "        poland.append('yes')\n", "        # append 'Poland' to the dyad_member variable\n", "        dyad_member.append('Poland')\n", "    # otherwise,\n", "    else:\n", "        # just append 'no' to the poland list variable\n", "        poland.append('no')\n", "\n", "    # the code below does exactly the same thing that\n", "    # we just did with poland, but for each other country\n", "\n", "    if 'Germany' in row:\n", "        germany.append('yes')\n", "        dyad_member.append('Germany')\n", "    else:\n", "        germany.append('no')\n", "\n", "    if 'Britan' in row:\n", "        britan.append('yes')\n", "        dyad_member.append('britan')\n", "    else:\n", "        britan.append('no')\n", "\n", "    if 'Japan' in row:\n", "        japan.append('yes')\n", "        dyad_member.append('Japan')\n", "    else:\n", "        japan.append('no')\n", "\n", "    if 'US' in row:\n", "        US.append('yes')\n", "        dyad_member.append('US')\n", "    else:\n",
"        US.append('no')\n", "\n", "    # append the variable dyad_member to the dyad variable\n", "    dyad.append(dyad_member)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# view the dyad variable just to make sure we did everything right\n", "dyad" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "[['Poland', 'Germany'],\n", " ['Germany', 'britan'],\n", " ['britan', 'US'],\n", " ['Japan', 'US'],\n", " ['Germany', 'US']]" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "# Create list variables, country1 and country2\n", "country1 = []\n", "country2 = []\n", "\n", "# For each row in the variable dyad,\n", "for row in dyad:\n", "    # append the first country listed to country1\n", "    country1.append(row[0])\n", "    # append the second country listed to country2\n", "    country2.append(row[1])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "# Now to turn our work into a dataframe with rows and columns\n", "\n", "# create a dataframe called df\n", "df = pd.DataFrame()\n", "\n", "# create a column called headlines, from the variable headlines\n", "df['headlines'] = headlines\n", "\n", "# create a column called country1, from the variable country1\n", "df['country1'] = country1\n", "\n", "# create a column called country2, from the variable country2\n", "df['country2'] = country2\n", "\n", "# create a column called poland, from the variable poland\n", "df['poland'] = poland\n", "\n", "# create a column called germany, from the variable germany\n", "df['germany'] = germany\n", "\n", "# create a column called britan, from the variable britan\n", "df['britan'] = britan\n", "\n", "# create a column called japan, from the variable japan\n", "df['japan'] = japan\n", "\n", "# create a column called US, from the 
variable US\n", "df['US'] = US" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's view our new dataset" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# View the dataframe\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>headlines</th><th>country1</th><th>country2</th><th>poland</th><th>germany</th><th>britan</th><th>japan</th><th>US</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Germany Attacks Poland Through The Mountains.</td><td>Poland</td><td>Germany</td><td>yes</td><td>yes</td><td>no</td><td>no</td><td>no</td></tr>\n",
"    <tr><th>1</th><td>Britan Declares War On Germany</td><td>Germany</td><td>britan</td><td>no</td><td>yes</td><td>yes</td><td>no</td><td>no</td></tr>\n",
"    <tr><th>2</th><td>Britan Asks US For Help</td><td>britan</td><td>US</td><td>no</td><td>no</td><td>yes</td><td>no</td><td>yes</td></tr>\n",
"    <tr><th>3</th><td>Japan Attacks US Base In Hawaii</td><td>Japan</td><td>US</td><td>no</td><td>no</td><td>no</td><td>yes</td><td>yes</td></tr>\n",
"    <tr><th>4</th><td>US Declares War On Germany</td><td>Germany</td><td>US</td><td>no</td><td>yes</td><td>no</td><td>no</td><td>yes</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " headlines country1 country2 poland \\\n", "0 Germany Attacks Poland Through The Mountains. Poland Germany yes \n", "1 Britan Declares War On Germany Germany britan no \n", "2 Britan Asks US For Help britan US no \n", "3 Japan Attacks US Base In Hawaii Japan US no \n", "4 US Declares War On Germany Germany US no \n", "\n", " germany britan japan US \n", "0 yes no no no \n", "1 yes yes no no \n", "2 no yes no yes \n", "3 no no yes yes \n", "4 yes no no yes " ] } ], "prompt_number": 7 } ], "metadata": {} } ] }