{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Image Classification from scratch with TPUs on Cloud ML Engine using ResNet\n", "\n", "This notebook demonstrates how to do image classification from scratch on a flowers dataset using TPUs and the resnet trainer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "PROJECT = 'cloud-training-demos' # REPLACE WITH YOUR PROJECT ID\n", "BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME\n", "REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1\n", "\n", "# do not change these\n", "os.environ['PROJECT'] = PROJECT\n", "os.environ['BUCKET'] = BUCKET\n", "os.environ['REGION'] = REGION\n", "os.environ['TFVERSION'] = '1.9'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gcloud config set project $PROJECT\n", "gcloud config set compute/region $REGION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert JPEG images to TensorFlow Records\n", "\n", "My dataset consists of JPEG images in Google Cloud Storage. I have two CSV files that are formatted as follows:\n", " image-name, category\n", "\n", "Instead of reading the images from JPEG each time, we'll convert the JPEG data and store it as TF Records.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | head -5 > /tmp/input.csv\n", "cat /tmp/input.csv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | sed 's/,/ /g' | awk '{print $2}' | sort | uniq > /tmp/labels.txt\n", "cat /tmp/labels.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clone the TPU repo\n", "\n", "Let's git clone the repo and get the preprocessing and model files. The model code has imports of the form:\n", "
```\n", "import resnet_model as model_lib\n", "```\n", "\n", "We will need to change this to:\n", "```
\n", "from . import resnet_model as model_lib\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile copy_resnet_files.sh\n", "#!/bin/bash\n", "rm -rf tpu\n", "git clone https://github.com/tensorflow/tpu\n", "cd tpu\n", "TFVERSION=$1\n", "echo \"Switching to version r$TFVERSION\"\n", "git checkout r$TFVERSION\n", "cd ..\n", " \n", "MODELCODE=tpu/models/official/resnet\n", "OUTDIR=mymodel\n", "rm -rf $OUTDIR\n", "\n", "# preprocessing\n", "cp -r imgclass $OUTDIR # brings in setup.py and __init__.py\n", "cp tpu/tools/datasets/jpeg_to_tf_record.py $OUTDIR/trainer/preprocess.py\n", "\n", "# model: fix imports\n", "for FILE in $(ls -p $MODELCODE | grep -v /); do\n", " CMD=\"cat $MODELCODE/$FILE \"\n", " for f2 in $(ls -p $MODELCODE | grep -v /); do\n", " MODULE=`echo $f2 | sed 's/.py//g'`\n", " CMD=\"$CMD | sed 's/^import ${MODULE}/from . import ${MODULE}/g' \"\n", " done\n", " CMD=\"$CMD > $OUTDIR/trainer/$FILE\"\n", " eval $CMD\n", "done\n", "find $OUTDIR\n", "echo \"Finished copying files into $OUTDIR\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bash ./copy_resnet_files.sh $TFVERSION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enable TPU service account\n", "\n", "Allow Cloud ML Engine to access the TPU and bill to your project" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile enable_tpu_mlengine.sh\n", "SVC_ACCOUNT=$(curl -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n", " https://ml.googleapis.com/v1/projects/${PROJECT}:getConfig \\\n", " | grep tpuServiceAccount | tr '\"' ' ' | awk '{print $3}' )\n", "echo \"Enabling TPU service account $SVC_ACCOUNT to act as Cloud ML Service Agent\"\n", "gcloud projects add-iam-policy-binding $PROJECT \\\n", " --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent\n", "echo \"Done\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!bash ./enable_tpu_mlengine.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Try preprocessing locally" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel\n", " \n", "rm -rf /tmp/out\n", "python -m trainer.preprocess \\\n", " --train_csv /tmp/input.csv \\\n", " --validation_csv /tmp/input.csv \\\n", " --labels_file /tmp/labels.txt \\\n", " --project_id $PROJECT \\\n", " --output_dir /tmp/out --runner=DirectRunner" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls -l /tmp/out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run it over full training and evaluation datasets. This will happen in Cloud Dataflow." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel\n", "gsutil -m rm -rf gs://${BUCKET}/tpu/resnet/data\n", "python -m trainer.preprocess \\\n", " --train_csv gs://cloud-ml-data/img/flower_photos/train_set.csv \\\n", " --validation_csv gs://cloud-ml-data/img/flower_photos/eval_set.csv \\\n", " --labels_file /tmp/labels.txt \\\n", " --project_id $PROJECT \\\n", " --output_dir gs://${BUCKET}/tpu/resnet/data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above preprocessing step will take 15-20 minutes. 
Wait for the job to finish before you proceed. Navigate to the [Cloud Dataflow section of the GCP web console](https://console.cloud.google.com/dataflow) to monitor job progress; the preprocessing job will be listed there until it completes. You can also check on it from this notebook, as sketched in the next cell." ] },
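{ "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is an optional sketch (not part of the original lab): it uses gcloud to list the Dataflow jobs that are still running, so you can check on the preprocessing job without leaving the notebook. It assumes the gcloud Dataflow commands are available in this environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "# Sketch: the preprocessing job should appear here until it finishes.\n", "gcloud dataflow jobs list --status=active" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can simply copy my already preprocessed files and proceed to the next step:\n", "```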
\n", "gsutil -m cp gs://cloud-training-demos/tpu/resnet/data/* gs://${BUCKET}/tpu/resnet/copied_data\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gsutil ls gs://${BUCKET}/tpu/resnet/data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train on the Cloud" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "echo -n \"--num_train_images=$(gsutil cat gs://cloud-ml-data/img/flower_photos/train_set.csv | wc -l) \"\n", "echo -n \"--num_eval_images=$(gsutil cat gs://cloud-ml-data/img/flower_photos/eval_set.csv | wc -l) \"\n", "echo \"--num_label_classes=$(cat /tmp/labels.txt | wc -l)\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "TOPDIR=gs://${BUCKET}/tpu/resnet\n", "OUTDIR=${TOPDIR}/trained\n", "JOBNAME=imgclass_$(date -u +%y%m%d_%H%M%S)\n", "echo $OUTDIR $REGION $JOBNAME\n", "gsutil -m rm -rf $OUTDIR # Comment out this line to continue training from the last time\n", "gcloud ml-engine jobs submit training $JOBNAME \\\n", " --region=$REGION \\\n", " --module-name=trainer.resnet_main \\\n", " --package-path=$(pwd)/mymodel/trainer \\\n", " --job-dir=$OUTDIR \\\n", " --staging-bucket=gs://$BUCKET \\\n", " --scale-tier=BASIC_TPU \\\n", " --runtime-version=$TFVERSION --python-version=3.5 \\\n", " -- \\\n", " --data_dir=${TOPDIR}/data \\\n", " --model_dir=${OUTDIR} \\\n", " --resnet_depth=18 \\\n", " --train_batch_size=128 --eval_batch_size=32 --skip_host_call=True \\\n", " --steps_per_eval=250 --train_steps=1000 \\\n", " --num_train_images=3300 --num_eval_images=370 --num_label_classes=5 \\\n", " --export_dir=${OUTDIR}/export" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above training job will take 15-20 minutes. \n", "Wait for the job to finish before you proceed. \n", "Navigate to [Cloud ML Engine section of GCP web console](https://console.cloud.google.com/mlengine) \n", "to monitor job progress.\n", "\n", "The model should finish with a 80-83% accuracy (results will vary):\n", "```\n", "Eval results: {'global_step': 1000, 'loss': 0.7359053, 'top_1_accuracy': 0.82954544, 'top_5_accuracy': 1.0}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gsutil ls gs://${BUCKET}/tpu/resnet/trained/export/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can look at the training charts with TensorBoard:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "OUTDIR = 'gs://{}/tpu/resnet/trained/'.format(BUCKET)\n", "from google.datalab.ml import TensorBoard\n", "TensorBoard().start(OUTDIR)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TensorBoard().stop(11531)\n", "print(\"Stopped Tensorboard\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These were the charts I got (I set smoothing to be zero):\n", "\n", "As you can see, the final blue dot (eval) is quite close to the lowest training loss, indicating that the model hasn't overfit. The top_1 accuracy on the evaluation dataset, however, is 80% which isn't that great. 
More data would help.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploying and predicting with the model\n", "\n", "Deploy the model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "MODEL_NAME=\"flowers\"\n", "MODEL_VERSION=resnet\n", "MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/tpu/resnet/trained/export/ | tail -1)\n", "echo \"Deleting/deploying $MODEL_NAME $MODEL_VERSION from $MODEL_LOCATION ... this will take a few minutes\"\n", "\n", "# Comment/uncomment the appropriate lines to run. The first time around, you will need only the two create calls.\n", "# During development, however, you might need to replace a version by deleting it and creating it again.\n", "\n", "#gcloud ml-engine versions delete --quiet ${MODEL_VERSION} --model ${MODEL_NAME}\n", "#gcloud ml-engine models delete ${MODEL_NAME}\n", "gcloud ml-engine models create ${MODEL_NAME} --regions $REGION\n", "gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=$TFVERSION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use saved_model_cli to find out what inputs the model expects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "saved_model_cli show --dir $(gsutil ls gs://${BUCKET}/tpu/resnet/trained/export/ | tail -1) --tag_set serve --signature_def serving_default" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the model expects image_bytes, which is typically base64-encoded." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To predict with the model, let's take one of the example images available on Google Cloud Storage and send its contents, base64-encoded, in a JSON request:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import base64, json\n", "import io\n", "import tensorflow as tf\n", "\n", "with tf.gfile.GFile('gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg', 'rb') as ifp:\n", "  with io.open('test.json', 'w') as ofp:\n", "    image_data = ifp.read()\n", "    img = base64.b64encode(image_data).decode('utf-8')\n", "    json.dump({\"image_bytes\": {\"b64\": img}}, ofp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls -l test.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Send it to the prediction service:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "gcloud ml-engine predict --model=flowers --version=resnet --json-instances=./test.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does CLASS no. 3 correspond to?
(remember that the class indices are 0-based)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "head -4 /tmp/labels.txt | tail -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's how you would invoke those predictions without using gcloud:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from googleapiclient import discovery\n", "from oauth2client.client import GoogleCredentials\n", "import base64\n", "import tensorflow as tf\n", "\n", "with tf.gfile.GFile('gs://cloud-ml-data/img/flower_photos/sunflowers/1022552002_2b93faf9e7_n.jpg', 'rb') as ifp:\n", "  credentials = GoogleCredentials.get_application_default()\n", "  api = discovery.build('ml', 'v1', credentials=credentials,\n", "                        discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')\n", "\n", "  request_data = {'instances':\n", "    [\n", "      {\"image_bytes\": {\"b64\": base64.b64encode(ifp.read()).decode('utf-8')}}\n", "    ]}\n", "\n", "  parent = 'projects/%s/models/%s/versions/%s' % (PROJECT, 'flowers', 'resnet')\n", "  response = api.projects().predict(body=request_data, name=parent).execute()\n", "  print(\"response={0}\".format(response))" ] },
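{ "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is a small convenience sketch (not part of the original lab): it maps the predicted class index back to a flower name using /tmp/labels.txt. It assumes the response contains integer `classes` and per-class `probabilities` fields, as the gcloud output above suggests; adjust the keys if your deployed signature differs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: translate the numeric prediction into a human-readable flower name.\n", "# Assumes /tmp/labels.txt (written earlier) and the 'response' variable from the previous cell,\n", "# with 'classes' (an integer index) and 'probabilities' (a list of floats) in each prediction.\n", "with open('/tmp/labels.txt') as f:\n", "  labels = [line.strip() for line in f]\n", "pred = response['predictions'][0]\n", "idx = int(pred['classes'])\n", "print('Predicted label: {} (probability={:.3f})'.format(labels[idx], pred['probabilities'][idx]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "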
\n", "```\n", "# Copyright 2018 Google Inc. All Rights Reserved.\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.6" } }, "nbformat": 4, "nbformat_minor": 2 }