{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## AddMNistData\n", "In this notebook we will use a number of libraries including ibm_boto3 in order to download the mnist dataset and upload it to IBM Cloud Object Storage. We will do so using two different methods, one to train a model and store its results using Keras and another for Watson Studio's Neural Network Modeler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### I - Cloud Object Storage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install wget" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install various libraries, ibm_boto3 allows Python developers to manage Cloud Object Storage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import wget\n", "import keras\n", "import numpy\n", "import pickle\n", "import warnings\n", "import ibm_boto3\n", "from keras.datasets import mnist \n", "from ibm_botocore.client import Config\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the endpoint we will use. You can find this information in \"Endpoint\" section of your Cloud Object Storage dashboard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cos_credentials = {\n", " \"****PLACE YOUR COS CREDENTIALS HERE\"\n", "}\n", "api_key = cos_credentials['apikey']\n", "service_instance_id = cos_credentials['resource_instance_id']\n", "auth_endpoint = 'https://iam.bluemix.net/oidc/token'\n", "service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create Boto resource by providing type, endpoint_url and credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cos = ibm_boto3.resource('s3',\n", " ibm_api_key_id=api_key,\n", " ibm_service_instance_id=service_instance_id,\n", " ibm_auth_endpoint=auth_endpoint,\n", " config=Config(signature_version='oauth'),\n", " endpoint_url=service_endpoint)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### II Downloading MNIST data and upload it to COS buckets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the buckets we will use to store training data and training results." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Bucket name has to be globally unique - we will use a timestamp to do so" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timestamp = str(time.time())\n", "buckets = ['mnist-data-' + timestamp, 'mnist-results-' + timestamp]\n", "for bucket in buckets:\n", " if not cos.Bucket(bucket) in cos.buckets.all():\n", " print('Creating bucket \"{}\"...'.format(bucket))\n", " try:\n", " cos.create_bucket(Bucket=bucket)\n", " except ibm_boto3.exceptions.ibm_botocore.client.ClientError as e:\n", " print('Error: {}.'.format(e.response['Error']['Message']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mnist data set will now be uploaded to an Object Storage bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "link = 'https://s3.amazonaws.com/img-datasets/mnist.npz'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_dir = 'MNIST_KERAS_DATA'\n", "if not os.path.isdir(data_dir):\n", " os.mkdir(data_dir)\n", "\n", "if not os.path.isfile(os.path.join(data_dir, os.path.join(link.split('/')[-1]))):\n", " wget.download(link, out=data_dir) \n", " \n", "!ls MNIST_KERAS_DATA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upload the data files to created buckets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bucket_name = buckets[0]\n", "bucket_obj = cos.Bucket(bucket_name)\n", "\n", "for filename in os.listdir(data_dir):\n", " with open(os.path.join(data_dir, filename), 'rb') as data: \n", " bucket_obj.upload_file(os.path.join(data_dir, filename), filename)\n", " print('{} is uploaded.'.format(filename))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III Download MNIST as python pickle for Neural Network Modeler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(X,y), (X_test,y_test) = mnist.load_data()\n", "X_train, X_valid, y_train, y_valid = train_test_split(X,y, test_size=.166, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create pickle objects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open('mnist-nnm-train.pkl', 'wb') as f:\n", " pickle.dump((X_train, y_train), f, protocol=pickle.HIGHEST_PROTOCOL)\n", "\n", "with open('mnist-nnm-valid.pkl', 'wb') as f:\n", " pickle.dump((X_valid, y_valid), f, protocol=pickle.HIGHEST_PROTOCOL)\n", "\n", "with open('mnist-nnm-test.pkl', 'wb') as f:\n", " pickle.dump((X_test, y_test), f, protocol=pickle.HIGHEST_PROTOCOL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create bucket for use by Neural Network Modeler and place the pickled objects in the bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cos.create_bucket(Bucket='mnist-nnm-' + timestamp)\n", "cos.Bucket('mnist-nnm-' + timestamp).upload_file('mnist-nnm-train.pkl', 'mnist-nnm-train.pkl')\n", "cos.Bucket('mnist-nnm-' + timestamp).upload_file('mnist-nnm-valid.pkl', 'mnist-nnm-valid.pkl')\n", "cos.Bucket('mnist-nnm-' + timestamp).upload_file('mnist-nnm-test.pkl', 'mnist-nnm-test.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upon finishing there should be 4 total objects, one in the first bucket and 3 in the second" ] }, { "cell_type": "code", "execution_count": null, 
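, { "cell_type": "markdown", "metadata": {}, "source": [ "As an optional sanity check (a minimal sketch using only the standard pickle module and the splits created above), we can reload one of the pickle files and confirm that the arrays round-trip with the expected shapes before inspecting the buckets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: reload the training pickle and confirm the shapes of the split.\n", "# With test_size=0.166, the training split should hold roughly 50,000 of the 60,000 MNIST training images.\n", "with open('mnist-nnm-train.pkl', 'rb') as f:\n", "    X_check, y_check = pickle.load(f)\n", "\n", "print('Training images:', X_check.shape)  # expected approximately (50040, 28, 28)\n", "print('Training labels:', y_check.shape)  # expected approximately (50040,)" ] }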
"metadata": {}, "outputs": [], "source": [ "for obj in cos.Bucket('mnist-data-' + timestamp).objects.all():\n", " print('Object key: {}, size: {:5.1f}kB'.format(obj.key, obj.size/1024))\n", "\n", "print()\n", "\n", "for obj in cos.Bucket('mnist-nnm-' + timestamp).objects.all():\n", " print('Object key: {}, size: {:5.1f}kB'.format(obj.key, obj.size/1024))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }