{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Walkthrough for S3-helper function\n", "

\n", " Forget where you put some sensitive information on your cloud database?\n", "
Tired of downloading files from S3 only to preview them in Excel or TextPad?\n", "
Want to touch and reshape data but feel it's caged off from you?\n", "
Look no further, your s3 problems are solved!\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents \n", "
\n", " This Jupyter notebook gives a walkthrough of several handy functions from the s3 module.\n", "
With best intentions, these functions mirror the use of standard libraries, while employing the backend of popular open source projects.\n", "

The notebook highlights 7 functions:\n", "
\n", "\n", "1. List files (with wildcard) in an S3 bucket/key using ls()\n", "2. Read files into a string or bytes using read() and open() \n", "3. Read csv and json files on S3 into Pandas dataframes using read_csv() and read_json()\n", "4. Write csv and json files from Pandas dataframes to S3 using to_csv() and to_json()\n", "5. Write local files to S3 using write()\n", "6. Save and load Scikit-Learn classifiers \n", "7. Move files to new buckets and keys using mv()\n", "\n", "The only requirements are setting AWS environment variables or setting up the AWS CLI, and installing the modules listed in `requirements.txt`.\n", "\n", "For this tutorial, we'll use the red wine quality dataset from the UCI Center for Machine Learning and Intelligent Systems." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n", "import s3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Listing files in an S3 bucket and key using ls( )\n", "

\n", " s3.ls lists all the files and directories in a bucket/key, akin to os.listdir().\n", "
see the code\n", "

" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "s3_path = 's3://prod-datalytics/playground/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It takes either a bucket or a bucket and key." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['s3://prod-datalytics/playground/json_bourne.json',\n", " 's3://prod-datalytics/playground/wine_is_fine.csv',\n", " 's3://prod-datalytics/playground/wine_is_not_fine.tsv']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.ls(s3_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "s3.ls also supports shell-style wildcard patterns, like glob.glob()." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['s3://prod-datalytics/playground/wine_is_fine.csv']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.ls(s3_path + '*.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a programmatic method of getting S3 file paths, we can start doing some cool stuff.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read files in S3 with open() \n", "\n", "see the code" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'s3://prod-datalytics/playground/wine_is_fine.csv'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f = s3.ls(s3_path + '*.csv')[0]\n", "f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can open the file as a streaming body of bytes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.open(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is sometimes helpful, but typically we want to read a file's contents, as with Python's native: \n", "
 with open(filename, 'r') as f: \n", "
   f.read()\n", "
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality\\n7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.read(f, encoding='utf-8')[:200] # displays the first 200 characters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more structured data, we can leverage Pandas' parsing engines...\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read S3 files to memory with read_csv( ) and read_json( ) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " s3.read_csv and s3.read_json behave like their Pandas namesakes, \n", " which they use as a backbone.\n", "
\n", " Using this handy function, you have data displayed in a nice tabular format:
\n", " see the code\n", "

" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>fixed acidity</th><th>volatile acidity</th><th>citric acid</th><th>residual sugar</th><th>chlorides</th><th>free sulfur dioxide</th><th>total sulfur dioxide</th><th>density</th><th>pH</th><th>sulphates</th><th>alcohol</th><th>quality</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>7.4</td><td>0.70</td><td>0.00</td><td>1.9</td><td>0.076</td><td>11.0</td><td>34.0</td><td>0.9978</td><td>3.51</td><td>0.56</td><td>9.4</td><td>5</td></tr>\n",
"    <tr><th>1</th><td>7.8</td><td>0.88</td><td>0.00</td><td>2.6</td><td>0.098</td><td>25.0</td><td>67.0</td><td>0.9968</td><td>3.20</td><td>0.68</td><td>9.8</td><td>5</td></tr>\n",
"    <tr><th>2</th><td>7.8</td><td>0.76</td><td>0.04</td><td>2.3</td><td>0.092</td><td>15.0</td><td>54.0</td><td>0.9970</td><td>3.26</td><td>0.65</td><td>9.8</td><td>5</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " fixed acidity volatile acidity citric acid residual sugar chlorides \\\n", "0 7.4 0.70 0.00 1.9 0.076 \n", "1 7.8 0.88 0.00 2.6 0.098 \n", "2 7.8 0.76 0.04 2.3 0.092 \n", "\n", " free sulfur dioxide total sulfur dioxide density pH sulphates \\\n", "0 11.0 34.0 0.9978 3.51 0.56 \n", "1 25.0 67.0 0.9968 3.20 0.68 \n", "2 15.0 54.0 0.9970 3.26 0.65 \n", "\n", " alcohol quality \n", "0 9.4 5 \n", "1 9.8 5 \n", "2 9.8 5 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = s3.read_csv(f, sep=',')\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "a csv is the most simple use case, we can handle alternative delimiters and json files too." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['s3://prod-datalytics/playground/json_bourne.json',\n", " 's3://prod-datalytics/playground/wine_is_fine.csv',\n", " 's3://prod-datalytics/playground/wine_is_not_fine.tsv']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "files = s3.ls(s3_path)\n", "files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "here are tab-separated values (tsv)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We can read the s3://prod-datalytics/playground/wine_is_not_fine.tsv tsv easily.\n" ] }, { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>fixed acidity</th><th>volatile acidity</th><th>citric acid</th><th>residual sugar</th><th>chlorides</th><th>free sulfur dioxide</th><th>total sulfur dioxide</th><th>density</th><th>pH</th><th>sulphates</th><th>alcohol</th><th>quality</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1596</th><td>6.3</td><td>0.510</td><td>0.13</td><td>2.3</td><td>0.076</td><td>29.0</td><td>40.0</td><td>0.99574</td><td>3.42</td><td>0.75</td><td>11.0</td><td>6</td></tr>\n",
"    <tr><th>1597</th><td>5.9</td><td>0.645</td><td>0.12</td><td>2.0</td><td>0.075</td><td>32.0</td><td>44.0</td><td>0.99547</td><td>3.57</td><td>0.71</td><td>10.2</td><td>5</td></tr>\n",
"    <tr><th>1598</th><td>6.0</td><td>0.310</td><td>0.47</td><td>3.6</td><td>0.067</td><td>18.0</td><td>42.0</td><td>0.99549</td><td>3.39</td><td>0.66</td><td>11.0</td><td>6</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " fixed acidity volatile acidity citric acid residual sugar chlorides \\\n", "1596 6.3 0.510 0.13 2.3 0.076 \n", "1597 5.9 0.645 0.12 2.0 0.075 \n", "1598 6.0 0.310 0.47 3.6 0.067 \n", "\n", " free sulfur dioxide total sulfur dioxide density pH sulphates \\\n", "1596 29.0 40.0 0.99574 3.42 0.75 \n", "1597 32.0 44.0 0.99547 3.57 0.71 \n", "1598 18.0 42.0 0.99549 3.39 0.66 \n", "\n", " alcohol quality \n", "1596 11.0 6 \n", "1597 10.2 5 \n", "1598 11.0 6 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"We can read the {} tsv easily.\".format(files[-1]))\n", "\n", "df = s3.read_csv(files[-1], sep='\\t')\n", "df.tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "here's a json file" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We can also read the s3://prod-datalytics/playground/json_bourne.json file easily.\n" ] }, { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>alcohol</th><th>chlorides</th><th>citric acid</th><th>density</th><th>fixed acidity</th><th>free sulfur dioxide</th><th>pH</th><th>quality</th><th>residual sugar</th><th>sulphates</th><th>total sulfur dioxide</th><th>volatile acidity</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1289</th><td>10.2</td><td>0.068</td><td>0.30</td><td>0.99914</td><td>7.0</td><td>20.0</td><td>3.30</td><td>5</td><td>4.5</td><td>1.17</td><td>110.0</td><td>0.60</td></tr>\n",
"    <tr><th>607</th><td>10.5</td><td>0.092</td><td>0.41</td><td>0.99820</td><td>8.8</td><td>26.0</td><td>3.31</td><td>6</td><td>3.3</td><td>0.53</td><td>52.0</td><td>0.48</td></tr>\n",
"    <tr><th>675</th><td>10.2</td><td>0.064</td><td>0.39</td><td>0.99840</td><td>9.3</td><td>12.0</td><td>3.26</td><td>5</td><td>2.2</td><td>0.65</td><td>31.0</td><td>0.41</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " alcohol chlorides citric acid density fixed acidity \\\n", "1289 10.2 0.068 0.30 0.99914 7.0 \n", "607 10.5 0.092 0.41 0.99820 8.8 \n", "675 10.2 0.064 0.39 0.99840 9.3 \n", "\n", " free sulfur dioxide pH quality residual sugar sulphates \\\n", "1289 20.0 3.30 5 4.5 1.17 \n", "607 26.0 3.31 6 3.3 0.53 \n", "675 12.0 3.26 5 2.2 0.65 \n", "\n", " total sulfur dioxide volatile acidity \n", "1289 110.0 0.60 \n", "607 52.0 0.48 \n", "675 31.0 0.41 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"We can also read the {} file easily.\".format(files[0]))\n", "\n", "df = s3.read_json(files[0])\n", "df.sample(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "they're actually all the same file-- in different formats!
\n", "If you're new to Pandas, you'll be happy to learn that it is the de facto tool for data manipulation in Python.
" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "alcohol float64\n", "chlorides float64\n", "citric acid float64\n", "density float64\n", "fixed acidity float64\n", "free sulfur dioxide float64\n", "pH float64\n", "quality int64\n", "residual sugar float64\n", "sulphates float64\n", "total sulfur dioxide float64\n", "volatile acidity float64\n", "dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Getting basic stats and distributions is a function call away." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>alcohol</th><td>1599.0</td><td>10.422983</td><td>1.065668</td><td>8.40000</td><td>9.5000</td><td>10.20000</td><td>11.100000</td><td>14.90000</td></tr>\n",
"    <tr><th>chlorides</th><td>1599.0</td><td>0.087467</td><td>0.047065</td><td>0.01200</td><td>0.0700</td><td>0.07900</td><td>0.090000</td><td>0.61100</td></tr>\n",
"    <tr><th>citric acid</th><td>1599.0</td><td>0.270976</td><td>0.194801</td><td>0.00000</td><td>0.0900</td><td>0.26000</td><td>0.420000</td><td>1.00000</td></tr>\n",
"    <tr><th>density</th><td>1599.0</td><td>0.996747</td><td>0.001887</td><td>0.99007</td><td>0.9956</td><td>0.99675</td><td>0.997835</td><td>1.00369</td></tr>\n",
"    <tr><th>fixed acidity</th><td>1599.0</td><td>8.319637</td><td>1.741096</td><td>4.60000</td><td>7.1000</td><td>7.90000</td><td>9.200000</td><td>15.90000</td></tr>\n",
"    <tr><th>free sulfur dioxide</th><td>1599.0</td><td>15.874922</td><td>10.460157</td><td>1.00000</td><td>7.0000</td><td>14.00000</td><td>21.000000</td><td>72.00000</td></tr>\n",
"    <tr><th>pH</th><td>1599.0</td><td>3.311113</td><td>0.154386</td><td>2.74000</td><td>3.2100</td><td>3.31000</td><td>3.400000</td><td>4.01000</td></tr>\n",
"    <tr><th>quality</th><td>1599.0</td><td>5.636023</td><td>0.807569</td><td>3.00000</td><td>5.0000</td><td>6.00000</td><td>6.000000</td><td>8.00000</td></tr>\n",
"    <tr><th>residual sugar</th><td>1599.0</td><td>2.538806</td><td>1.409928</td><td>0.90000</td><td>1.9000</td><td>2.20000</td><td>2.600000</td><td>15.50000</td></tr>\n",
"    <tr><th>sulphates</th><td>1599.0</td><td>0.658149</td><td>0.169507</td><td>0.33000</td><td>0.5500</td><td>0.62000</td><td>0.730000</td><td>2.00000</td></tr>\n",
"    <tr><th>total sulfur dioxide</th><td>1599.0</td><td>46.467792</td><td>32.895324</td><td>6.00000</td><td>22.0000</td><td>38.00000</td><td>62.000000</td><td>289.00000</td></tr>\n",
"    <tr><th>volatile acidity</th><td>1599.0</td><td>0.527821</td><td>0.179060</td><td>0.12000</td><td>0.3900</td><td>0.52000</td><td>0.640000</td><td>1.58000</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " count mean std min 25% \\\n", "alcohol 1599.0 10.422983 1.065668 8.40000 9.5000 \n", "chlorides 1599.0 0.087467 0.047065 0.01200 0.0700 \n", "citric acid 1599.0 0.270976 0.194801 0.00000 0.0900 \n", "density 1599.0 0.996747 0.001887 0.99007 0.9956 \n", "fixed acidity 1599.0 8.319637 1.741096 4.60000 7.1000 \n", "free sulfur dioxide 1599.0 15.874922 10.460157 1.00000 7.0000 \n", "pH 1599.0 3.311113 0.154386 2.74000 3.2100 \n", "quality 1599.0 5.636023 0.807569 3.00000 5.0000 \n", "residual sugar 1599.0 2.538806 1.409928 0.90000 1.9000 \n", "sulphates 1599.0 0.658149 0.169507 0.33000 0.5500 \n", "total sulfur dioxide 1599.0 46.467792 32.895324 6.00000 22.0000 \n", "volatile acidity 1599.0 0.527821 0.179060 0.12000 0.3900 \n", "\n", " 50% 75% max \n", "alcohol 10.20000 11.100000 14.90000 \n", "chlorides 0.07900 0.090000 0.61100 \n", "citric acid 0.26000 0.420000 1.00000 \n", "density 0.99675 0.997835 1.00369 \n", "fixed acidity 7.90000 9.200000 15.90000 \n", "free sulfur dioxide 14.00000 21.000000 72.00000 \n", "pH 3.31000 3.400000 4.01000 \n", "quality 6.00000 6.000000 8.00000 \n", "residual sugar 2.20000 2.600000 15.50000 \n", "sulphates 0.62000 0.730000 2.00000 \n", "total sulfur dioxide 38.00000 62.000000 289.00000 \n", "volatile acidity 0.52000 0.640000 1.58000 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything is indexed!
\n", "Here we get a quick calculation for the 75th percentile of alcohol content." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "11.1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()['alcohol']['75%']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's easy to filter a dataframe:
\n", "Here we're going to get all the heavily alcoholic wines..." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>alcohol</th><th>chlorides</th><th>citric acid</th><th>density</th><th>fixed acidity</th><th>free sulfur dioxide</th><th>pH</th><th>quality</th><th>residual sugar</th><th>sulphates</th><th>total sulfur dioxide</th><th>volatile acidity</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>45</th><td>13.1</td><td>0.054</td><td>0.15</td><td>0.9934</td><td>4.6</td><td>8.0</td><td>3.90</td><td>4</td><td>2.1</td><td>0.56</td><td>65.0</td><td>0.52</td></tr>\n",
"    <tr><th>95</th><td>12.9</td><td>0.058</td><td>0.17</td><td>0.9932</td><td>4.7</td><td>17.0</td><td>3.85</td><td>6</td><td>2.3</td><td>0.60</td><td>106.0</td><td>0.60</td></tr>\n",
"    <tr><th>131</th><td>13.0</td><td>0.049</td><td>0.09</td><td>0.9937</td><td>5.6</td><td>17.0</td><td>3.63</td><td>5</td><td>2.3</td><td>0.63</td><td>99.0</td><td>0.50</td></tr>\n",
"    <tr><th>132</th><td>13.0</td><td>0.049</td><td>0.09</td><td>0.9937</td><td>5.6</td><td>17.0</td><td>3.63</td><td>5</td><td>2.3</td><td>0.63</td><td>99.0</td><td>0.50</td></tr>\n",
"    <tr><th>142</th><td>14.0</td><td>0.050</td><td>0.00</td><td>0.9916</td><td>5.2</td><td>27.0</td><td>3.68</td><td>6</td><td>1.8</td><td>0.79</td><td>63.0</td><td>0.34</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " alcohol chlorides citric acid density fixed acidity \\\n", "45 13.1 0.054 0.15 0.9934 4.6 \n", "95 12.9 0.058 0.17 0.9932 4.7 \n", "131 13.0 0.049 0.09 0.9937 5.6 \n", "132 13.0 0.049 0.09 0.9937 5.6 \n", "142 14.0 0.050 0.00 0.9916 5.2 \n", "\n", " free sulfur dioxide pH quality residual sugar sulphates \\\n", "45 8.0 3.90 4 2.1 0.56 \n", "95 17.0 3.85 6 2.3 0.60 \n", "131 17.0 3.63 5 2.3 0.63 \n", "132 17.0 3.63 5 2.3 0.63 \n", "142 27.0 3.68 6 1.8 0.79 \n", "\n", " total sulfur dioxide volatile acidity \n", "45 65.0 0.52 \n", "95 106.0 0.60 \n", "131 99.0 0.50 \n", "132 99.0 0.50 \n", "142 63.0 0.34 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_alcoholic = df[df['alcohol'] > df.describe()['alcohol']['75%']]\n", "df_alcoholic.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also stupid easy to plot, as Pandas builds on the Matplotlib package." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# This line is run once, typically at the beginning of the notebook, to enable plotting.\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZMAAAEPCAYAAACHuClZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYnGV9//H3N9nTZHO2Ichpw5mEUxJKSIuHjYhE+lMs\nHiBWQU1jFDZQRUukmqQilmDBJlAMiZGEakjqgSoeCEay5YoWlxJCwN0gVBMBNZtWRUCESL6/P557\ndp+dnZmd2WdmZ2bzeV3Xc+3Mc7xns7m/c5/N3REREUliRKUTICIitU/BREREElMwERGRxBRMREQk\nMQUTERFJTMFEREQSK2swMbO1ZrbXzHbmOWelmT1hZjvMbMZA15rZUjN72sy2h21uOT+DiIgMrNwl\nk9uB83IdNLM3A8e6+/HAQuALBV57k7vPDNs9JUutiIgMSlmDibtvA36b55QLgDvCuT8GxpnZ5AKu\ntVKmU0REkql0m8nhwFOx98+EfQNpC9ViXzSzceVJmoiIFKrSwWQwbgWOcffpwK+BmyqcHhGRg15d\nhZ//DHBk7P0RYV9O7r4v9nYNcHeuc81ME4+JiAyCuxfVnDAUJRMjdxvHt4BLAMxsNvA7d9+b71oz\nOzT29kLgsXwPd/ea3ZYuXVrxNBys6a/ltCv9ld9qPf2DUdaSiZltAFqBV5nZL4ClQAPg7r7a3b9r\nZueb2ZPAC8D7813r7rcDN5jZdOAAsJuoF5iIiFRQWYOJu7+7gHPairnW3S9Jmi4RESmtWmyAP2i0\ntrZWOgmJ1HL6azntoPRXWq2nfzBssPVjtcDMfDh/PhGRcjAzvAob4EVEZJhTMBERkcQUTEREJDEF\nExERSUzBREREElMwERGRxBRMREQkMQUTERFJTMFEREQSUzAREZHEFExERCQxBRMREUlMwURERBJT\nMBERkcQUTEREJDEFExERSUzBREREEitrMDGztWa218x25jlnpZk9YWY7zGzGQNea2QQzu9fMHjez\nzWY2rpyfQUREBlbuksntwHm5DprZm4Fj3f14YCHwhQKuXQxscfcTgfuAT5QuuSIiMhhlDSbuvg34\nbZ5TLgDuCOf+GBhnZpMHuPYCYH14vR54W8kSLCIig1LpNpPDgadi758J+/I5xN33Arj7r4FDypQ2\nEREpUKWDSSl4pRMgInKwq6vw858Bjoy9PyLsy2evmU12971mdijQne/kZcuW9bxubW2ltbV1cCkV\nERmm2tvbaW9vT3QPcy/vF3szmwLc7e6nZjl2PnC5u/+Vmc0G/sXdZ+e71syWA79x9+VmdjUwwd0X\n53i2l/vziYgMN2aGu1tR15QzszWzDUAr8CpgL7AUaADc3VeHc24B5gIvAO939+25rnX3281sIvDv\nRCWaPcC73P13OZ6vYCIiUqSqCyaVpmAiIlK8wQST4dAALyIiFaZgIiIiiSmYiIhIYgomIiKSmIKJ\niIgkpmAiIiKJKZiIiEhiCiYiIpKYgomIiCSmYCIiIokpmIiISGIKJiIikpiCiYiIJKZgIiIiiSmY\niIhIYgomIiKSmIKJiIgkpmAiIiKJKZiIiEhiZQ0mZrbWzPaa2c4856w0syfMbIeZTY/tn2tmu8zs\np2Z2dWz/UjN72sy2h21uOT+DiIgMrNwlk9uB83IdNLM3A8e6+/HAQmBV2D8CuCVcezIwz8xOil16\nk7vPDNs9ZUu9iIgUpKzBxN23Ab/Nc8oFwB3h3B8D48xsMjALeMLd97j7fmBjODfNypRkEREZhEq3\nmRwOPBV7/3TYl2t/WluoFvuimY0rfzJFRCSfukonIEMhJY5bgU+7u5vZZ4CbgPm5Tl62bFnP69bW\nVlpbWxMmUURkeGlvb6e9vT3RPczdS5OaXA8wawHudvfTshxbBWx1903h/S7g9cDRwDJ3nxv2Lwbc\n3ZcXeu9w3Mv9+UREhhszw92Lak4YimouI3eJ41vAJQBmNhv4n
bvvBR4EjjOzFjNrAC4O52Jmh8au\nvxB4rFwJFxGRwpS1msvMNgCtwKvM7BfAUqCBqJSx2t2/a2bnm9mTwAvA+4kOvmJmbcC9RAFvrbt3\nhdveELoQHwB2E/UCExGRCip7NVclqZpLRKR41VrNJSIiw5yCiYiIJKZgIoOyb98+HnzwQfbt21fp\npIhIFVAwkaLdeecmWlpO4txzP0RLy0nceeemSidJRCpMDfBSlH379tHSchIvvrgVOA3YSSo1hz17\ndjFp0qRKJ09ESkAN8FJ2u3fvpqFhClEgATiN+voWdu/eXblEiUjFKZhIUaZMmcLLL+8G0qsK7GT/\n/j1MmTKlcokSkYpTMJGiTJo0ibVrbyWVmsPYsTNJpeawdu2tquISOcipzUQGZd++fezevZspU6Yo\nkIgMM4NpM1EwkSHX1dVFR0cHs2bNYurUqZVOjohkUAO8VL1Fi/6OadPO4H3v+yzTpp3BokVXDniN\nxrSIVD+VTGTIdHV1MW3aGcADpLsVw2w6Ox/KWUK5885NzJ9/GQ0NUcP/2rW3Mm/eRUOYapGDj0om\nUtU6OjqAI4l3K4Yjwv7+9u3bx/z5l/Hii1t59tmHePHFrcyff1nJSygq+Ygkp2AiQ2bWrFlEqzH3\ndiuGp8P+/oZiTItG84uUhoKJDJmpU6fS1rYAmA2cAMymrW1Bziquco9pGaqSj8jBQMFkmChHVU05\n7nnzzSvYtu37LFkyj23bvs/NN6/IeW65x7RoNL9ICbn7sN2ijzf8bdiw0VOpiT5u3ExPpSb6hg0b\nB7ymu7vbOzo6vLu7u2T3rFRaB6u7u9tTqYkOjzi4wyOeSk0s+XNEak3IO4vLb4u9oJa2gyGYDCZD\nHChDL1cmW42Zd/p3MXbsjJIGTZFaNphgomquGldsVU0h7QTlqP7Zt28f3/3ud6mraynpfZOaN+8i\n9uzZxZYtt7Fnzy51OxYZpLIGEzNba2Z7zWxnnnNWmtkTZrbDzKbH9s81s11m9lMzuzq2f4KZ3Wtm\nj5vZZjMbV87PUO2KbaTuDRT1wHqgvl+GXuqG73SPqUWLbuS553blvW9XVxfr16+nq6trUM8ajEmT\nJnHmmWdqWhiRJIotyhSzAa8BpgM7cxx/M/Cd8Pos4IHwegTwJNBClOvtAE4Kx5YDfx9eXw1cn+f5\nJS78Vadiqmq6u7t9xIiUQ8rhBIeUjxjR1K+qqVTVP/2rtpY7pHzMmOn97tvWdmWfdLW1XTGoZ4pI\nMlRjm0kICLmCySrgotj7LmAyUd/R78X2LwauDq93AZPD60OBXXmeXcrfb1XL10jd2dnp69at887O\nTu/s7AwZdm+7BaS8s7OzqHsWqqOjw8eNmxmeFW2jR5/i69at63PfYtIlIuU1mGBSV/KiTnEOJxrF\nlvZ02Jdtf3pk22R33wvg7r82s0OGIqHVbtKkSVmraRYt+jtuuWU10cjzpzjnnLPJNQo9c7xHrnsW\no2+VWTSFyiuv/JLzzz+/z73zjY7XZJAi1a/SwSRTUXPBBHkn31q2bFnP69bWVlpbWwfxiNrU1dUV\nAknvXFg/+MEsolrE3sw93yj0pNJjRebPn0N9fQv79+/JOlak7+j44tKl6fBFkmlvb6e9vT3ZTYot\nyhS7UVw11y56q7nuie2PV3N10beaqyvPs0tX7qtB69atC+0P2xyWhJ/H+znnnBuqlI4fsG2iVGM8\nCrlPW9sVBacrrVzjYUQOZlRpm8kU4NEcx86ntwF+Nr0N8CPpbYBvIGqAnxqOLY8FFjXA5xG1Q9T1\nyaBhZE/bSbodJZdVq1Z7Y+N4HzNm6MZgFJKutGoctyIyHFRdMAE2AL8EXgJ+AbwfWAh8MHbOLSFw\nPALMjO2fCzwOPAEsju2fCGwJx+4Fxud5ful/yzVk27ZtWRu1t23b1u/czEx81arV/a6ttow6W+P+\n2LEzvKOjoyLpKSYQilSzq
gsmld6GSzAZbFXTkiVLQonEY9txvmTJkj7nZXbJnT//b72xcazD6X2u\nHTNmesUy6myGsmQy0L+BujXLcKJgMgyDSZI2gXwlk3Tm2HvOVoeO8LPRm5tPceibUTc2jq+qkol7\n9PtpaBjncKzDKK+vH13y6riB/g1ydWvetGlT1f2+RAqhYDLMgkkpvnmfeur0kNEd55DyU0+d3idz\njDLiCSFwzAw/Dwn7l4f3pzmkfNWq1SX/fEkb97u7u72pabzDVxy6S146KeTfoLejg/cpATY1HaVO\nAVKTFEyGUTDp7u72devW+Zgxpw66TaA3I1wXenOt81RqYsh805nj1qzfqpcu/UdPpSb66NGneGPj\n2JIHklL1wip3u0kh989VMoHOqmxrEhmIgskwCSbpjHbMmBkhU1o+qJJJtoywsXGap1Lxb9Ed/dpV\nGhqm9pQYhnLq982bNxf9rHK3mxR6/8xuzXBFWYKbyFBQMBkGwSRb5gUpHz36lKK/wWe/1yiHprwl\nk3RmWa5gki3IwbHe3HzioEop5Z5GvtD7d3Z2+sqVK0PnhertBScyEAWTYRBMejPazlA91eljxkzv\nN5dVoTZs2OhNTRM8aqCe4LDR05Mtjh59uqdSE72t7Yp+mWW+aqikQSZ7kJuQqM2jXIFvMPfXGilS\n6xRMhkEwKWRW32Izzuuuu85hcghQHrajfcmSJT33iN8zX9VOqdo60vdpbj4tlJY21ly1UL5/h1IG\nt3IHSpFMZQsmwFuAEcXevNJbrQSTYmb1jWfmjY1jfeHCD/cbJBfPfPqWTMY7XNdTtZVrcF2uRufN\nmzcX3T4xUIa7efPmfh0CGhvH+rZt26o6Ax2qaVw0XYxUQjmDyZeB/wFuIKwrUgtbLQSTaLBbk8NR\nDk1h3qzMbqbH91Rz9Wbm6euOdGjqGSQXz3yamib4yJHNWdtM3vSmN+dMU66SyeLF13jUxbhvkMlV\niig0I0yf19R0tEPK6+uPckh5KnVqVWagQzVYUtPFSKWUtZoLGBumQnkA+C/gg8CYYh84lFu1B5Oo\nFNIY2gtmhp8NYV//ksnmzZu9ufn0UF3V/7q77747jA/JDB7dsQAww+ErBa8Tn673X7VqdShBTAj3\n7Xb4ijc1ZR/IWGxG2NnZ6Y2N40Opqfoy0HgJa6imcam26WLk4DGYYFLwsr3u/nvga8BG4NXAXwPb\nzWxRofeQvrZs2UI0p2U78FD4WcdZZ80kmvfyBGA273znW7n//m1ccMFFvPDCE0TL7fa/7i1veRsv\nv/xn9F0T5NXA98P7dqIC5gxefHECt922JmfaMtdGnzlzOo2NxwBfIFpAcwqwhFdeOcCWLff1uz5a\nBvjwjLQclnO99+eff56mpmOA5nDv0q4//+CDD/ZZ574Y6WWHzz33Q7S0nMT27TtKuqxxLqVePlmk\nrAqJOMAFwF3Ao8DHgUPC/lHA7mIj2FBtVGHJJP4Nd9OmTaEtw2PbsV5f3+yf+9yNvnDhh72hYbSP\nGnVCqNJaHhqqG7NeF5VqJnhmqSZqKzk0vD41nJPKWarIle6opFFYyaHYlROLvX+himlzyNa+k6uE\ntWrV6iHpsaWeYVIJlLHNZD3wuhzHzin2oUO1VVswyczYVq1a7XV1YzIy3DEOWz2Vmuh1dc0O4xxO\nDD9HOax0+HTWjBqOCcFmYqjOGhUCz7pw377nNzefVFSVyYYNG8MYir5tOtmqXjo6OjyVOjqWlone\n0PBq37x5c885mZl3+v51da9yaPJUqvixNXHFVLXlCjr5qpqGqpeVenPJUCtnMFleyL5q26opmOT7\nhhu1cxznveNA3FOpqQ4jQxCZGYJFb3dhMI/PudX7vrc9IyrNpANK5uzBx3t9fXPRGVRv20b+DLpv\nSeO68DmOyzuOJXPm3Xe+86IB05cvoy20zSFf0Kn2RnAFGimHcgaT7Vn2ZV09sZq2agom+TK
2KIMe\nGzJe9/Sgwt6BhjdmlETucqh3uCaUOu4Kx+kTYMwawvu7PKrq6lsy+dznbhzUZym06iXqljw+BLTe\nZzc1je+XQUfdl5v6pTHf2iADVWEVGggGCjrVWtWkbsNSLiUPJsCHQzvJH4haAdPbz4EvF/uwod6q\nKZgMlLH1zu10tPevwhrnURVWt8PF3ncOqEM8qko61GFsCCL1ITh92uGwcDx93yjQfOADCxJ9noEW\ngkofX7NmTeiB1ptRNzef0G9fKnWKR92jPbZFXaIH8/tMKyQQFHKvaisBVHuJSWpbOYLJOKKuNXcS\nLaGb3iYW+6BKbNUUTNxzZ2x9q4SyTWc+NZRExmQJNOmSR71H1Vl1DiM9lTrRozXf4+dvdWj0L3/5\nyz1pGkwmma3tJ36PzOqqESP6ljhKUTIpptts5uj+bJ+3Wksfuajb8MFlqL/MlCOYjA0/J2bbin3Y\nUG/VFkzcs/9RrFmzxhsaJjvc7XBTlgCQrq76dMiguz2a7bc7lDQOyyit1IeMuS0Eot4MJz0jsHv/\noHDttdcN+Mearc0EUj5mTDTA8HOfy6ySi443No7NOvdXfF/mzLv5VisczDfzQqrFCvkPWw2lFJVM\nDh6VqM4sRzD5dvj5c+Bn4Wd6+1mxDxvqrRaCybnnvjlk/K8KPyc7vNp7q67SbSfHea7BitG+rd67\nUmLKo3aKkWFb1y/D6VsaSl83ypuaxudtA8nWmytaPKvD4RGvr2/Ocvx4X7lyZdZut5n7illHvZjS\nRKky32pqp6i10pQUr1JfGsrWAJ9kA+YCu4CfAldnOT4e+AbwCNHo+mmxY1eGNptHgStj+5cCTwPb\nwzY3x7NL/TtOJDMj+tjH/t57q6+O86iRPP3Nfon3Nlx3h8Cx1bN18e1tN0mvlHhoCErx0sqofl1e\nU6ljMq6b4rlGx+cbBxK9j0bZNzeflLVkUkhwGIxCSwlJq4XS84hVW2mg2FJSNZSqpHCVqs4sZ2+u\ns4Hm8Po9wE3AUQVcNwJ4MrSz1AM7yJjbi2i+r0+F1ycCW8Lrk0NjfyPRcO/vA8d4bzD5aAHPL8sv\nejD6f8NY7r3VV5njTPD+3Xk3hvOP9L7Toxzv2aZfybbvpptu6klP7tUBt+UcN9L7R50ey5IOVJ/y\ndOkmlZroH/jAAi+0umqoJPmG1zvD8YlezNxk1aaaSlVSmGFXMgmZugGnAw8DlwP/WcB1s4Hvxd4v\nziydAN8Gzo69fxKYBLwDWBPb/0ngY94bTK4q4Pml/y0PUt/MuNNhtEfVWpkj2Y/z3mqreDXUp7xv\nd+GN3tvwfljs+m6Hwz0qmfS971VXXdUnPanUqRnnHO8wxuvrR+cpmfSd3ffd736Pxxvb04EjX3VV\nKb4dJ+k4UEy1UN/PnS4hVk/JpFCFZkoquVSfSlRnljOYbA8/lwDz4/sGuO7twOrY+/cAKzPOuQ64\nMbyeBbwMzABOCtVjE4imbfkRsMJ7g8nPQ0nni8C4HM8vz2+6QPEMtfc/83KPqrNeHYJGZulgvMPJ\nHvXEmu25uwunQjBKlw7eHQJMerr5vsv9QsrvvvvunrRlbzOJ3tfXj8m6hG62yR9L3QheiCT3KDaz\n7F/NsNFhlDc3n1ZT3+4LqS5RyaV61XxvLu/NlP8T+ERo9zg0VF89WsB1hQSTMcCXQtvHeuDHwGnh\n2PuB/yaaofBfgZvC/kmAhdefAdbmeL4vXbq0Z9u6dWt5fvNZZHaPbWu7wletWh32pTPuL4RSxfgQ\nFCY6rA5BJj16/ZGQ2Z+cUYo41qNR7vHgki3gpAPRmH4ZRG8a4+uWpzPL0wfs8VRsfW4piuxDXezP\n9rympvGDWq++kgb6val32MFt69atffLKcgaTQ4GPAq8N748CLingutnAPbH3/aq5slzzc2B0lv3X\nAR/Ksr+FHKPxK1UyydUesWnTJm9uPsWj3lVHheBxnKc
HG0ZBJj025CseNYy7R9VimffLnFr+OIcj\nMgLOKR415N/VU+rIN01IdE7fKezzZSjFZkClaEysRIPkcOk1le9zaNyKxJUtmAx2I2o4TzfAN4Rq\nqakZ54wD6sPrBcC62LFJ3hu8Oukd93Jo7JyPABtyPL/Ev+LCrFu3zvvPhXWcv+Y1rw9BYZr3DjZc\nF7bGsKVLKd3e22uqw6HFeydNHOdRY3w8EDSFfVszgkM64Mxw6OjJILJlHtkCUmaGkm1yxqHsnlup\nb9DDpS0h1+dQyUTiylkyuRB4AngW+D3wHPD7Aq+dCzwerl8c9i0EPhhezw7Hu4jWSxkXu/Z+4DGi\nRv/W2P47iDoF7AD+A5ic49nl+l3ntW1b5sjzdGaf3tfhUXfeBofm8PM4jxrlR3nvSPB0G8gU7y25\ndIQgVOe9VVTptpeTw890t+B4m8nAJZPGxvEZS+j2zVBy1akXk9GW4lv+cCkpVBv9XiWtnMHkycwS\nRS1slQomHR0dYenZ3unXR4w4xHu7lW7LCAYph+net/G8PpQS0u0n9eH4jPCzLhz7syyBa5SbNXpD\nw7gw51XKGxpe7U1N4/uMcs+WeQw85Uvyb66V6s0lA9PvVdzLG0x+WOyNq2GrVDDJ1lMqGjmezvSv\nyRIAUh4fqR69nxJKK6kQWJo86vY7Jmx3edQmktnF91hfs2aNd3d3+7XXXueNjeO8vn6qR+uqHz1g\nqSLbPtWpixw8yhlMVgCbgHmhyutC4MJiHzbUW6WCiXv/b/3XXntdKK2MD1Vb/dtUosCQft87RUlU\n9ZWeHmVcKJWkx6fE21a85/ympvGxtVIyG9m3lqytorOzU99kRYaZwQSTOgozlmga+jfF9jnRNCgS\ndHV10dHRwaxZs5g37yLe+MY3sHv37p41u5csuZZo7OdEotlgdhKtdb4TeAY4JtxpZzg+hagX9KvD\n8caw72fAU7HrryZqejoM+A3wJf74x6lceeXrefnlifRdh70FaO5ZV33SpEkFfbZJkyaxdu2tzJ8/\nh/r6Fvbv38P8+e/ljDNeQ0NDtFb52rW3Mm/eRYP51ckQif+NTp06tdLJkeGk2OhTSxtDWDLJNq4k\nrrdRfmsocZzlfVdKPMqjLsLZBxzmHrR4SqwNpe9UK6NGner9p1UZXMkkLV0F1tnZqd4/NWagv1GR\nNMpYzXUC8APgsfD+NOCTxT5sqLehCia5xpXEpxJZsmSJ984GnJ5YcUKo8hoX2zcxVGelA8sEj5a9\n7buYVFQNdkMIFmNCNVj/zD1aR35CCDpRT69sa5DEFdIIqzaU2lLI36hI2mCCyYgCCzBriEbA7w85\n9E7g4hIWkGpaR0cHcCR9q5OOCPsjZ555JvA7YCvwUPj5InCAqAd0fN9IYDlQR9RregG91VqEn08C\nnyWabeZlokkE3kNU3XU8MJv589/DHXespanJaW5+mcbGeq699go+//nr+chHFnPuuR+ipeUk7rxz\nU08677xzEy0tJ2U9FjdlSlS1FU/T/v17eqr0pLoU8jcqkkghEQd4MPx8OLZvR7GRa6g3hrxkkh6E\neFe/b30dHR0+YkTmOh+He+biVXCsn332a73vdCru/deF3xir7mr2/hND9h1TEl9pMFf1VLHdfzUu\noXaoZCLFoIzVXN8DjqV3wsd3EJsNuFq3oQom7u6nnDLD4+NGTj11ep9MPPt/5uas/8FXrlzpnZ2d\nfs4553q8XWXOnDf4iBFHef9pVNJjTvr2EBt4Kvm+5w2m6krjEmpHMStZysGtnMHkGGALUY+uZ4Bt\nQEuxDxvqbehLJn2DQn19szc3n9gzWDCVOjqUHk4LpY4bPRqMONajsSLjHRp6Zhmurx/jUXvKiaGk\nkllaSc8yPMp7200GnmK8VCUTqT3FrGQpB6/BBJO8XYPN7KOxt98lqtQfAbxANCPwTYnq2IaJLVu2\nAEfQtz76cPbv383
+/X8CnE9/+p8YObIOuIuojePtwBuBDwOrgd8Cf6St7UNMnTqVe++9l/37XyGK\n2/XAGUQLUXYBrUTdi38Trl8Rzus91tj4f6xdu6pf199sXXzXrr2157x8x6T2TZ06VV2CpSwGGmcy\nJvw8ETgT+CbRQIn3Amq5CyZPnkz2cSMponksf8P+/X/kqquuYMWKtzNy5GE8//zLRBn/FKCeESP2\ncf/9P+Dss8+O3fmwcL/1RI2n9cAfgbuJ+j98MNzjq+G804A3ADO4447beNe73pU1vZljYOLBIt8x\nEZFc8gYTd/9HADO7H5jp7s+F98uA75Q9dTVizpw5jBjhHDgwGzicKJD8iagkkQ4uf8GMGaezZ88u\nNm7cyOLFt/CHP/wH8Dwwmvr6/0dXV1dPMDnyyCOpr+9m//6dRGuG/ZyodHIkUc+u/TQ2foH6+u/w\n/PPP0BvIfkVDw/PMmTMnb5onTZqUM1DkOyYikk2hXYMnE9XNpL0c9glR5vvlL6+joWEksAf4ONFI\n83i116sZP348W7bcx9VXL+MPfzgA/AVRTeFf8tJLxoIFV3Deeedz552bOOOM11BX9ypgNiNH/hVR\n3H+AqKvwA0A9P/jBd7nvvi+xatUKUqk5NDefTio1h3XrblMwEJEhlV6tMP9JZv8AvIuowh/gbcAm\nd/+nMqYtMTPzQj5fqezbt4/bblvDddd9jj/+8WXgv0iXTBoaXseOHf/FGWe8hhdf3ErUvrGQaOhO\n73kwm8bGBl566f7YOSOAPyNa6DLtBNat+wcuvfTSnmerakpESsHMcHcr6ppCM1szmwm8Nry9390f\nLjJ9Q26og0navn37uPzyRXz1q98iXe3V1raASy55D+ee+yGeffYeoiXubwZuJBqwmHY8jY0HeOml\nB8I5W+nbAN8bdDo7H1JjqoiU3GCCSaETPeLu24nWaZcCfPvb3yfqANcMvMDatW/nsss+xEsv/QxY\nRRRkzgUW0b/hvgH4PlHjfLqqbAHR6PYjgKdpa1ugQCIiVaPQNhMpwu7du2lomELU0+pMoJX6+ha+\n/vW7OHDAiRaKfBK4HbgVeB1RkDiLN72pldtvv42mpsuBXfROVzKfxsZ6Vq5cRGfnQ9x884oh/lQi\nIrkVXM1ViypZzdXSclJoG4lKHE1Nr8dsRJ99UfvIJF56aR91dS2MHPlrbr99FfPmXdTT/vLZz97Y\nZ8xHfIp3TScuIuVQ1jaTWlSpYALRhInz51/WEwiuueYq/vmfv86zz/a2j4wePZ2XXvof9u//IekA\nk0rNYc+RLLO7AAAQZ0lEQVSeXT2N6Lka1hct+jtuuWU16a7CbW0LVFoRkZKoymBiZnOBfyGqUlvr\n7sszjo8nmvL2WKIpcz/g7p3h2JXA34ZT17j7yrB/AtHKjy3AbuBd7v5slmdXLJhA30AA9CutNDa+\nnoaGI3nuuZ0914wdO5MtW24Lswxn19XVxbRptdsgr55nItVtMMGkrG0mZjYCuAU4DzgZmGdmJ2Wc\ndg3RbMSnA5cC6YBxMjAf+HNgOvAWM0svRbgY2OLuJwL3EU2PX3ZdXV2sX7+erq6ugs6fNGkSZ555\nZs8gwLVrbyWVmsPYsTNJpeawYsUN/OlP6QGHUOg07rU8nXihU9yLSI0pdjKvYjai7kffi71fDFyd\ncc63gbNj758kWqv2HUSlkfT+TwIfC693AZPD60OBXTmeX+T0ZrmVapW6zFl2BzONe61OJ66JJEVq\nA+WaNXiwG9Fshqtj798DrMw45zrgxvB6FtHo+hlEgyx2Ea3+NAr4EbAinPfbjHv8JsfzS/KLLXfm\nPZhp3GtxOnGtzihSGwYTTAoeZ1JG1wMrzGw78CjwMPCKu+8ys+VEAy6eT+/PcY+cDSPLli3red3a\n2kpra2vRCcxXrVSKNorBzIV1880ruOyyD9VUb66+qzNGbT1anVGk8trb22lvb090j
7I2wJvZbGCZ\nu88N7xcTRbzlea75OXCquz+fsf864Cl3X2VmXUCru+81s0OBre7eLzctVQN8rTd4V5PMXm6Z3Z1F\npPKqrjeXmY0kmpnwHOBXRNPWz3P3rtg544A/uPt+M1tA1H7yvnBskrvvM7OjgHuA2e7++1Bi+Y27\nLzezq4EJ7r44y/NLEkwAFi26kltuWUN8BLq64g6OenOJVLeqCybQ0zV4Bb1dg683s4VEJZTVofSy\nHjgA/ASY76Gbb5j6fiLRbIgfcff2sH8i8O9EdU97iLoG/y7Ls0sWTECDBEXk4FCVwaSSKj3ORESk\nFlXdOBMRETk4KJiIiEhiCiYiIpKYgomIiCSmYCIiIokpmIiISGIKJiIikpiCiYiIJKZgIiIiiSmY\niIhIYgomIiKSmIKJiIgkpmAiIiKJKZiIiEhiCiYiIpKYgomIiCSmYCIiIokpmIiISGIKJiIikljZ\ng4mZzTWzXWb2UzO7Osvx8Wb2DTN7xMweMLNpsWMfMbPHzGynmX3FzBrC/qVm9rSZbQ/b3HJ/DhER\nya2swcTMRgC3AOcBJwPzzOykjNOuAR5299OBS4GV4drDgEXATHc/DagDLo5dd5O7zwzbPeX8HCIi\nkl+5SyazgCfcfY+77wc2AhdknDMNuA/A3R8HppjZpHBsJNBsZnXAKOCXseusrCkXEZGClTuYHA48\nFXv/dNgX9whwIYCZzQKOAo5w918CNwK/AJ4BfufuW2LXtZnZDjP7opmNK9cHEBGRgdVVOgHA9cAK\nM9sOPAo8DLxiZuOJSjEtwLPA18zs3e6+AbgV+LS7u5l9BrgJmJ/t5suWLet53draSmtraxk/iohI\n7Wlvb6e9vT3RPczdS5OabDc3mw0sc/e54f1iwN19eZ5rfgacBswFznP3BWH/e4Gz3L0t4/wW4O7Q\nrpJ5Ly/n5xMRGY7MDHcvqimh3NVcDwLHmVlL6Il1MfCt+AlmNs7M6sPrBcD97v48UfXWbDNrMjMD\nzgG6wnmHxm5xIfBYmT+HiIjkUdZqLnd/xczagHuJAtdad+8ys4XRYV8NTAXWm9kB4CeE6ip37zCz\nrxFVe+0PP1eHW99gZtOBA8BuYGE5P4eIiORX1mquSlM1l4hI8aqxmktERA4CCiYiIpKYgomIiCSm\nYCIiIokpmIiISGIKJiIikpiCiYiIJKZgIiIiiSmYiIhIYgomIiKSmIKJiIgkpmAiIiKJKZiIiEhi\nCiYiIpKYgomIiCSmYCIiIokpmIiISGIKJiIikpiCiYiIJFb2YGJmc81sl5n91MyuznJ8vJl9w8we\nMbMHzGxa7NhHzOwxM9tpZl8xs4awf4KZ3Wtmj5vZZjMbV+7PISIiuZU1mJjZCOAW4DzgZGCemZ2U\ncdo1wMPufjpwKbAyXHsYsAiY6e6nAXXAxeGaxcAWdz8RuA/4RDk/h4iI5Ffuksks4Al33+Pu+4GN\nwAUZ50wjCgi4++PAFDObFI6NBJrNrA4YBTwT9l8ArA+v1wNvK99HEBGRgZQ7mBwOPBV7/3TYF/cI\ncCGAmc0CjgKOcPdfAjcCvyAKIr9z9x+Eaw5x970A7v5r4JCyfQIRERlQXaUTAFwPrDCz7cCjwMPA\nK2Y2nqgE0gI8C3zNzN7t7huy3MNz3XzZsmU9r1tbW2ltbS1dykVEhoH29nba29sT3cPcc+bDiZnZ\nbGCZu88N7xcD7u7L81zzM+A0YC5wnrsvCPvfC5zl7m1m1gW0uvteMzsU2OruU7Pcy8v5+UREhiMz\nw92tmGvKXc31IHCcmbWEnlgXA9+Kn2Bm48ysPrxeANzv7s8TVW/NNrMmMzPgHKArXPYt4H3h9aXA\nN8v8OUREJI+yVnO5+ytm1gbcSxS41rp7l5ktjA77amAqsN7MDgA/AeaHazvM7GtE1V77w8/V4dbL\ngX83sw8Ae4B3lfNziIhIfmWt5qo0VXOJiBSvG
qu5RETkIKBgIiIiiSmYiIhIYgomIiKSmIKJiIgk\npmAiIiKJKZiIiEhiCiYiIpKYgomIiCSmYCIiIokpmIiISGIKJiIikpiCiYiIJKZgIiIiiSmYiIhI\nYgomIiKSmIKJiIgkpmAiIiKJlT2YmNlcM9tlZj81s6uzHB9vZt8ws0fM7AEzmxb2n2BmD5vZ9vDz\nWTO7IhxbamZPh2PbzWxuuT+HiIjkVtZgYmYjgFuA84CTgXlmdlLGadcAD7v76cClwEoAd/+pu89w\n95nAGcALwDdi193k7jPDdk85P0eltLe3VzoJidRy+ms57aD0V1qtp38wyl0ymQU84e573H0/sBG4\nIOOcacB9AO7+ODDFzCZlnPNG4H/c/enYvqIWu69Ftf4HWcvpr+W0g9JfabWe/sEodzA5HHgq9v7p\nsC/uEeBCADObBRwFHJFxzkXAnRn72sxsh5l90czGlS7JIiJSrGpogL8emGBm24HLgYeBV9IHzawe\neCvw1dg1twLHuPt04NfATUOXXBERyWTuXr6bm80Glrn73PB+MeDuvjzPNT8HTnX358P7twKXpe+R\n5fwW4G53Py3LsfJ9OBGRYczdi2pKqCtXQoIHgeNChv8r4GJgXvyEUEX1B3ffb2YLgP9MB5JgHhlV\nXGZ2qLv/Ory9EHgs28OL/WWIiMjglDWYuPsrZtYG3EtUpbbW3bvMbGF02FcDU4H1ZnYA+AkwP329\nmY0ianz/YMatbzCz6cABYDewsJyfQ0RE8itrNZeIiBwcqqEBvuQGGihZzczsCDO7z8x+YmaPpgdq\n1hozGxEGlH6r0mkplpmNM7OvmllX+Hc4q9JpKoaZfcTMHjOznWb2FTNrqHSa8jGztWa218x2xvZN\nMLN7zexxM9tczT02c6T/hvD3s8PMvm5mYyuZxlyypT127CozO2BmEwu517ALJgUOlKxmfwI+6u4n\nA38BXF5j6U+7EuisdCIGaQXwXXefCpwOdFU4PQUzs8OARcDM0CmljqitsprdTvT/NW4xsMXdTyQa\nh/aJIU9V4bKl/17g5NDj9AmqN/3Z0o6ZHQGcC+wp9EbDLphQ2EDJquXuv3b3HeH180QZWebYnKoW\n/hDPB75Y6bQUK3yDfK273w7g7n9y999XOFnFGgk0m1kdMAr4ZYXTk5e7bwN+m7H7AmB9eL0eeNuQ\nJqoI2dLv7lvc/UB4+wD9x85VhRy/e4DPAx8v5l7DMZgUMlCyJpjZFGA68OPKpqRo6T/EWmyQOxr4\nXzO7PVTTrTazVKUTVSh3/yVwI/AL4Bngd+6+pbKpGpRD3H0vRF+wgEMqnJ4kPgB8r9KJKFQYjvGU\nuz9azHXDMZgMC2Y2GvgacGVGV+mqZmZ/BewNpSuj9qa9qQNmAv8a5oX7A1GVS00ws/FE3+pbgMOA\n0Wb27sqmqiRq8YsJZvYPwH5331DptBQifHG6Blga313ItcMxmDxDNCVL2hFhX80I1RNfA/7N3b9Z\n6fQU6WzgrWb2M6LxQXPM7I4Kp6kYTxN9K/vv8P5rRMGlVrwR+Jm7/8bdXyGaHPUvK5ymwdhrZpMh\nGlcGdFc4PUUzs/cRVffWUjA/FpgCPBIGkB8BPGRmA5YMh2Mw6RkoGXqxXAzUWo+iLwGd7r6i0gkp\nlrtf4+5HufsxRL/7+9z9kkqnq1ChauUpMzsh7DqH2upI8Atgtpk1mZkRpb8WOhBklmK/BbwvvL4U\nqPYvVX3SH5bF+DjwVnd/qWKpKkxP2t39MXc/1N2Pcfejib5czXD3AYP5sAsm4dtYeqDkT4CN7l4L\n/5kAMLOzgb8B3hBbz0XrtQytK4CvmNkOot5cn61wegrm7h1EpamHiSZRNWB1RRM1ADPbAPwIOMHM\nfmFm7yeas+9cM3ucKCBeX8k05pMj/TcDo4Hvh//Dt1Y0kTnkSHucU2A1lwYtiohIYsOuZCIiIkNP\nwURERBJTM
BERkcQUTEREJDEFExERSUzBREREElMwESmQmS00s/dk2d9iZkXNY5Rx/VYzq6VR9iL9\nlHvZXpGqZWbmRQy0cvfb8h0uQZIqxsxGhgG/IoOikokcNEIJYpeZrQ8liSPM7Fwz+5GZ/beZbQpL\nRWNm14cFpnaY2Q1h31Iz+2h4fUY49jBweewZl5rZzbH3d5vZ68LrW82sIyx6Fp9IL1d6s6XhdjO7\nMHbOc+Gnhft3hsWkvpM+z8w+ZWY/tmixrFWxa7ea2efNrINo1L/IoCmYyMHmOOAWdz+VaEbgTwLn\nuPufAw8BHw0ry73N3U8Jixt9Jst9vgRc7u4zshzLVUq5xt1nEU3R0mpmp+RKZIFpiD/r7cBR7j4N\nuIRoYbW0m939rLBY1qgws3NavbvPcvfP50qLSCEUTORgs8fdHwyvZwPTgB+GEsYlRDNOPwu8aGZf\nNLO/Bl6M38CiJWTHufsPw65/K/DZF5vZQ0TzZk0LWy5505DF2cBXoWeyyq2xY+eY2QNhadY5RCuQ\npm0qMO0ieanNRA42L8ReG3Cvu/9N5klmNotogsF3Ek0cek7mKTnu/yf6fklrCvebAlwFnOHuvzez\n29PHsnH3V3Kkoef+YVbgvOu7m1kj8K9Ey/j+MlSvxZ/7QvYrRYqjkokcbOJB4AHgbDM7FsDMRpnZ\n8WbWDIx393uAjwKnxW/g7s8CvzWz9Doh8R5eu4HpoQ3jSKJlpAHGAs8Dz4V1Ot6cN5G507Ab+PPw\n+gKgPrz+IfD28NzJQGvY30RUFfZ/Fi249o58zxUZLJVM5GDT057h7v8bFjC6M3yDd6I2lOeAb5pZ\n+hv8R7Lc5wPAl8zsANFyB+l7/tDMdhMtf9BF1A6Du+8MU9p3ES0rvS1bmmLG5EjDmrD/YWAzvSWL\nrwNvCM99Kjz3WXd/1sy+GPb/CugY4Lkig6Ip6EWGCTNrdvcXQuP9j4GzC1nUSKQUVDIRGT6+bdEa\n8PXApxVIZCipZCIiIompAV5ERBJTMBERkcQUTEREJDEFExERSUzBREREElMwERGRxP4/ce2w8MVD\nHSUAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_alcoholic.plot(kind='scatter', x='residual sugar', y='density')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is that outlier?" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_alcoholic[df_alcoholic['residual sugar'] > 12]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After processing and normalizing the data, we may want to upload this new file to s3.\n", "
top" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write DataFrames to S3 with to_csv( ) and to_json( ) \n", "

\n", " s3.to_csv and s3.to_json are almost identical to the Pandas methods they are built on.\n", "
The difference is that s3.to_csv takes the dataframe as its first argument, rather than being called as a method on the dataframe.\n", "
see the code\n", "

" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# where will the file get stored?\n", "s3_target = 's3://prod-datalytics/playground/wine_list.tsv.gz'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use our filtered dataset to write a new file to s3.
\n", "Because s3.to_csv accepts the Pandas to_csv args, we have a lot of control over the output format." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\"File uploaded to 's3://prod-datalytics/playground/wine_list.tsv.gz'\"" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.to_csv(df_alcoholic, s3_target, sep='\\t',\n", " index=False, compression='gzip')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write local files to S3 with disk_2_s3( ) \n", "We can send local files to s3 too. First, let's write a file to local disk using the built-in Pandas `to_csv()`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "local_file = 'wine_list.tsv.gz'" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_alcoholic.to_csv(local_file, sep='\\t', index=False, compression='gzip')" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\"'wine_list.tsv.gz' loaded to 's3://prod-datalytics/playground/wine_list.tsv.gz'\"" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.disk_2_s3(file=local_file,\n", " s3_path=s3_target)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# remove the local copy now that it has been uploaded\n", "os.remove(local_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving and Loading Scikit-Learn Classifiers \n", "If you're into machine learning, you're in luck!\n", "
see the code" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example, let's just use a vanilla random forest model." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf = RandomForestClassifier()\n", "clf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is where we'd train and evaluate the model..." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ACCURACY ON TRAINING SET: 0.99\n", "ACCURACY OF TEST SET: 0.61\n" ] } ], "source": [ "# fit the model!\n", "# clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On my first run (not shown), I got a test set accuracy of only 61%, which is pretty bad.
\n", "You should try to beat that score!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "'''\n", "Write some code here:\n", "look into train_test_split, GridSearchCV, and KFold from Scikit-Learn.\n", "\n", "This is also a great dataset to practice:\n", "scaling values (see StandardScaler)\n", "dimensionality reduction (see PCA)\n", "and a linear model (see Lasso or LogisticRegression)\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you're happy with the performance, we can persist the model as a pickle file." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\"'clf.pkl' loaded to 's3://prod-datalytics/playground/models/clf.pkl'\"" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.dump_clf(clf, 's3://prod-datalytics/playground/models/clf.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And re-use it when the time is right!" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.load_clf('s3://prod-datalytics/playground/models/clf.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Movin' Files between buckets and keys \n", "In the interest of good file-keeping, let's move our saved classifier to its own special folder (key).
" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'CopyObjectResult': {'ETag': '\"fd28ec0656661ce2a86373b097a95b89\"',\n", " 'LastModified': datetime.datetime(2017, 3, 2, 0, 13, 43, tzinfo=tzutc())},\n", " 'CopySourceVersionId': 'ov3ei3i4mEGFOcBMEmi2g8atvAbVHKJx',\n", " 'ResponseMetadata': {'HTTPStatusCode': 200,\n", " 'HostId': 'ax6Q2HTAn+86P6wz6v2MWX3ZLsYoksdpqgcJtyKaXcEur80A4awZMiEEDuMLzzcydYNoyX3wBGQ=',\n", " 'RequestId': 'AE98FD0B9CE85D9F'},\n", " 'VersionId': 'OJX.ffdLzhrSD5kYc3wyMAWhtSWBIawN'}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.cp(old_path='s3://prod-datalytics/playground/models/clf.pkl',\n", " new_path='s3://prod-datalytics/production_space/models/clf.pkl',)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To move the file instead (copying it and deleting the old instance), we use `mv` rather than `cp`." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'DeleteMarker': True,\n", " 'ResponseMetadata': {'HTTPStatusCode': 204,\n", " 'HostId': 'wevPToOM9kIl8QId3r9NzheWcq5c1Rw43kUnH9Js7Ja8N3Ah/8G5DxzfKO9JVaL4uZ8RMkSkN5o=',\n", " 'RequestId': '9E3FC788589304B4'},\n", " 'VersionId': 'tnPVq8F1usk.sBxewcQ2SUWaYOrG3KXN'}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s3.mv(old_path='s3://prod-datalytics/playground/models/clf.pkl',\n", " new_path='s3://prod-datalytics/production_space/models/clf.pkl',)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "top" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, 
"nbformat": 4, "nbformat_minor": 0 }