{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structured data prediction using Cloud AI Platform" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates:\n", "\n", "1. Exploring a BigQuery dataset using JupyterLab\n", "2. Creating datasets for machine learning using Dataflow\n", "3. Creating a model using feature columns and the Keras API\n", "4. Training on Cloud AI Platform\n", "5. Deploying the model\n", "6. Predicting with the model\n", "\n", "Before starting, upgrade the packages that this notebook requires." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install tensorflow==2.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now restart the kernel by selecting \"Kernel\" -> \"Restart Kernel\" from the menu bar** so that the newly installed packages are picked up.\n", "\n", "After restarting the kernel, resume execution from the next cell." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Change these values to try this notebook out in your own project\n", "BUCKET = 'babyweight-keras-ml'\n", "PROJECT = 'babyweight-keras'\n", "REGION = 'us-central1'\n", "NOTEBOOK_DIR = '/home/jupyter/training-data-analyst/blogs/babyweight_keras'" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ['BUCKET'] = BUCKET\n", "os.environ['PROJECT'] = PROJECT\n", "os.environ['REGION'] = REGION\n", "os.environ['NOTEBOOK_DIR'] = NOTEBOOK_DIR" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Updated property [core/project].\n", "Updated property [compute/region].\n" ] } ], "source": [ "%%bash\n", "gcloud config set project $PROJECT\n", "gcloud config set compute/region $REGION" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Creating gs://babyweight-keras-ml/...\n" ] } ], "source": [ "%%bash\n", "if ! gsutil ls | grep -q gs://${BUCKET}/; then\n", " gsutil mb -l ${REGION} gs://${BUCKET}\n", "fi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Data Analysis and Preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring data\n", "\n", "The dataset contains natality data (records of births in the US). The goal is to predict a baby's weight from several factors about the pregnancy and the baby's mother. Later, we will split the data into training and evaluation datasets using a hash of the year-month, so that all births from a given month land in the same split." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Select the label (weight_pounds), four input features, and a hash of\n", "# year-month that is used later to split training and evaluation data.\n", "query = \"\"\"\n", "SELECT\n", " weight_pounds,\n", " is_male,\n", " mother_age,\n", " plurality,\n", " gestation_weeks,\n", " FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING))) AS hashmonth\n", "FROM\n", " publicdata.samples.natality\n", "WHERE year > 2000\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | weight_pounds | is_male | mother_age | plurality | gestation_weeks | hashmonth |\n",
"|---|---|---|---|---|---|---|\n",
"| 0 | 7.063611 | True | 32 | 1 | 37.0 | 7108882242435606404 |\n",
"| 1 | 4.687028 | True | 30 | 3 | 33.0 | -7170969733900686954 |\n",
"| 2 | 7.561856 | True | 20 | 1 | 39.0 | 6392072535155213407 |\n",
"| 3 | 7.561856 | True | 31 | 1 | 37.0 | -2126480030009879160 |\n",
"| 4 | 7.312733 | True | 32 | 1 | 40.0 | 3408502330831153141 |\n", "