---
aliases:
- /2013/09/run-hadoop-python-jobs-on-amazon-with-mrjob
categories:
- git
- python
date: 2013-09-02 02:36
layout: post
slug: run-hadoop-python-jobs-on-amazon-with-mrjob
title: Run Hadoop Python jobs on Amazon with MrJob
---
First we need to install mrjob with:

```
pip install mrjob
```

I am starting with a simple example of word counting. Previously I implemented this directly using the Hadoop streaming interface, so the mapper and reducer were standalone scripts (`mapper.py` and `reducer.py`) that read from standard input and print to standard output. The MrJob version of the same word count is here:

https://github.com/zonca/python-wordcount-hadoop/blob/master/mrjob/word_count_mrjob.py
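A MapReduce word count typically works like this: the mapper emits a `(word, 1)` pair for every word, Hadoop sorts and groups the pairs by word, and the reducer sums each group. Below is a minimal plain-Python sketch of that flow that runs without Hadoop or mrjob. It is not the code from the repository. The `mapper` and `reducer` generators have the same shape as the `mapper(self, key, value)` and `reducer(self, key, values)` methods of an MRJob subclass (minus `self`). The tokenizing regex and the lowercasing are assumptions. `run_word_count` is a hypothetical helper that stands in for Hadoop's shuffle step using a sort plus `itertools.groupby`.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumption: a word is a run of letters, digits, underscores or apostrophes,
# and counts are case-insensitive. The repository code may tokenize differently.
WORD_RE = re.compile(r"[\w']+")


def mapper(_, line):
    """Emit (word, 1) for every word in one line of input text."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all the partial counts received for one word."""
    yield word, sum(counts)


def run_word_count(lines):
    """Hypothetical helper: simulate one MapReduce pass (map, shuffle, reduce)."""
    # Map phase: for raw text input, mrjob passes None as the key.
    pairs = [pair for line in lines for pair in mapper(None, line)]
    # Shuffle phase: Hadoop sorts pairs by key so that equal keys are adjacent.
    pairs.sort(key=itemgetter(0))
    results = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        # Reduce phase: each key receives all of its values together.
        for key, total in reducer(word, (count for _, count in group)):
            results[key] = total
    return results


if __name__ == "__main__":
    text = ["The man and the mammals", "man, male and mammals"]
    print(run_word_count(text))
    # {'and': 2, 'male': 1, 'mammals': 2, 'man': 2, 'the': 2}
```

In an MRJob subclass only the mapper and reducer are written. mrjob takes care of the reading, sorting and grouping that `run_word_count` fakes here, whether the job runs locally or on EMR.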
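Run it locally with:

```
python word_count_mrjob.py gutemberg/20417.txt.utf-8
```

or with the `local` runner, which simulates Hadoop by running the mappers and reducers in separate subprocesses:

```
python word_count_mrjob.py --runner=local gutemberg/20417.txt.utf-8
```

and on Amazon Elastic MapReduce (EMR) with the `emr` runner:

```
python word_count_mrjob.py --runner=emr --aws-region=us-west-2 gutemberg/20417.txt.utf-8
```

I launch the EMR run by sourcing a `runemr.sh` script:

```
. runemr.sh
```

MrJob first creates the instances: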
```
using configs in /home/zonca/.mrjob.conf
using existing scratch bucket mrjob-ecd1d07aeee083dd
using s3://mrjob-ecd1d07aeee083dd/tmp/ as our scratch dir on S3
creating tmp directory /tmp/mrjobjob.zonca.20130901.192250.785550
Copying non-input files into s3://mrjob-ecd1d07aeee083dd/tmp/mrjobjob.zonca.20130901.192250.785550/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-2E83MO9QZQILB
Created new job flow j-2E83MO9QZQILB
Job launched 30.9s ago, status STARTING: Starting instances
```

MrJob then bootstraps the instances, runs the step, and creates an SSH tunnel to the job tracker:
```
Job launched 123.9s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 250.5s ago, status RUNNING: Running step (mrjobjob.zonca.20130901.192250.785550: Step 1 of 1)
Opening ssh tunnel to Hadoop job tracker
Connect to job tracker at: http://localhost:40630/jobtracker.jsp
```

An excerpt of the resulting word counts:

```
"maladies" 1
"malaria" 5
"male" 18
"maleproducing" 1
"males" 5
"mammal" 10
"mammalInstinctive" 1
"mammalian" 4
"mammallike" 1
"mammals" 87
"mammoth" 5
"mammoths" 1
"man" 152
```

I've been positively impressed by how easy it is to implement and run a MapReduce job with MrJob, without needing to manage EC2 instances or the Hadoop installation directly.