{ "cells": [ { "cell_type": "markdown", "id": "0767a21b", "metadata": {}, "source": [ "# ⭐ Scaling Machine Learning in Three Week course \n", "# - Week 3:\n", "## Deployment\n", "\n", "**Prerequisite**\n", "Run the notebooks `week-3.0-data-prep-for-training` and `week-3.0-evaluate-and-automate-pipelines.ipynb` before starting this one.\n", "\n", "\n", "In this exercise, you will use:\n", " * deployments in a batch setting\n", "\n", "\n", "\n", "\n", "This exercise is part of the [Scaling Machine Learning with Spark book](https://learning.oreilly.com/library/view/scaling-machine-learning/9781098106812/)\n", "available on the O'Reilly platform or on [Amazon](https://amzn.to/3WgHQvd).\n" ] }, { "cell_type": "code", "execution_count": 17, "id": "c686250b", "metadata": {}, "outputs": [], "source": [ "import mlflow\n", "import mlflow.spark\n", "from pyspark.sql.types import ArrayType, StringType\n", "from pyspark.sql.functions import col, struct\n", "from pyspark.ml.regression import LinearRegression, LinearRegressionModel\n", "from pyspark.sql import SparkSession \n" ] }, { "cell_type": "code", "execution_count": 18, "id": "e02a2c88", "metadata": {}, "outputs": [], "source": [ "spark = SparkSession.builder \\\n", " .master('local[*]') \\\n", " .appName(\"deployment\") \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "id": "f53c3595", "metadata": {}, "source": [ "### ✅ **Task 1 :** Move the model from the model folder to best model\n", "\n", "Now that we have a model that gives us good results, it's time to move it to the next phase."
] }, { "cell_type": "code", "execution_count": 19, "id": "32939824", "metadata": {}, "outputs": [], "source": [ "model_path = \"../models/linearRegression_model\"" ] }, { "cell_type": "code", "execution_count": 20, "id": "84b485b5", "metadata": {}, "outputs": [], "source": [ "restored_mllib_model = LinearRegressionModel.load(model_path)\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "44068a22", "metadata": {}, "outputs": [], "source": [ "restored_mllib_model.save(\"../models/best_model\")" ] }, { "cell_type": "markdown", "id": "724fd2d4", "metadata": {}, "source": [ "### ✅ **Task 2 :** Use the model for prediction in production\n", "\n", "Imagine that `best_model` has been deployed to production.\n", "That means a new app is going to load the model and leverage it with Spark.\n", "So now there is a production DataFrame.\n", "\n", "Write the functionality to load the model and use it to predict on the production DataFrame in a batch setting." ] }, { "cell_type": "code", "execution_count": 22, "id": "5c53c794", "metadata": {}, "outputs": [], "source": [ "# your code goes here\n", "# ..." ] }, { "cell_type": "markdown", "id": "92035982", "metadata": {}, "source": [ "How is it different from what you have done so far? \n", "\n", "Share your response in the chat!" ] }, { "cell_type": "code", "execution_count": null, "id": "e4572fde", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 5 }