{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0767a21b",
   "metadata": {},
   "source": [
    "# ⭐ Scaling Machine Learning in Three Weeks course \n",
    "# - Week 3:\n",
    "##  Deployment\n",
    "\n",
    "**Prerequisite**\n",
    "Run the notebooks `week-3.0-data-prep-for-training` and `week-3.0-evaluate-and-automate-pipelines.ipynb` before this one.\n",
    "\n",
    "\n",
    "In this exercise, you will use:\n",
    " * deployments in a batch setting\n",
    "\n",
    "This exercise is part of the [Scaling Machine Learning with Spark book](https://learning.oreilly.com/library/view/scaling-machine-learning/9781098106812/)\n",
    "available on the O'Reilly platform or on [Amazon](https://amzn.to/3WgHQvd).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c686250b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlflow\n",
    "import mlflow.spark\n",
    "from pyspark.sql.types import ArrayType, StringType\n",
    "from pyspark.sql.functions import col, struct\n",
    "from pyspark.ml.regression import LinearRegression, LinearRegressionModel\n",
    "from pyspark.sql import SparkSession \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e02a2c88",
   "metadata": {},
   "outputs": [],
   "source": [
    "spark = SparkSession.builder \\\n",
    "    .master('local[*]') \\\n",
    "    .appName(\"deployment\") \\\n",
    "    .getOrCreate()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f53c3595",
   "metadata": {},
   "source": [
    " ### ✅ **Task 1 :**  Move the model from the model folder to the best model folder\n",
    " \n",
    " Now that we have a model that gives us good results, it's time to move it to the next phase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "32939824",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_path =  \"../models/linearRegression_model\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "84b485b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "restored_mllib_model = LinearRegressionModel.load(model_path)\n"
   ]
  },
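  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "44068a22",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Overwrite any earlier copy so this cell can be re-run without a path-exists error.\n",
    "restored_mllib_model.write().overwrite().save(\"../models/best_model\")"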
   ]
  },
  {
   "cell_type": "markdown",
   "id": "724fd2d4",
   "metadata": {},
   "source": [
    "### ✅ **Task 2 :**  Use the model for prediction in production\n",
    "\n",
    "Imagine that `best_model` has been deployed to production.\n",
    "That means a new app is going to load the model and use it with Spark.\n",
    "So now there is a production dataframe to score.\n",
    "\n",
    "Write the functionality to load the model and use it to predict on the production dataframe in a batch setting."
   ]
  },
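  {
   "cell_type": "markdown",
   "id": "b7d41c9e",
   "metadata": {},
   "source": [
    "A minimal sketch of one possible shape for the batch-prediction task, not the official solution. It assumes the production data is stored as Parquet and already contains the feature vector column the model was trained on. The function name `predict_batch` and the parameters `input_path` and `output_path` are hypothetical.\n",
    "\n",
    "```python\n",
    "from pyspark.ml.regression import LinearRegressionModel\n",
    "from pyspark.sql import DataFrame, SparkSession\n",
    "\n",
    "\n",
    "def predict_batch(spark: SparkSession, model_path: str,\n",
    "                  input_path: str, output_path: str) -> DataFrame:\n",
    "    # Load the saved model from disk (for example, the copy promoted in Task 1).\n",
    "    model = LinearRegressionModel.load(model_path)\n",
    "    # Assumption: production data is Parquet and has the model's features column.\n",
    "    production_df = spark.read.parquet(input_path)\n",
    "    # transform() scores all rows in one batch job and appends the prediction column.\n",
    "    predictions = model.transform(production_df)\n",
    "    # Persist the scored rows so downstream jobs can read them.\n",
    "    predictions.write.mode(\"overwrite\").parquet(output_path)\n",
    "    return predictions\n",
    "```\n",
    "\n",
    "A call could look like `predict_batch(spark, \"../models/best_model\", input_path, output_path)`, where both paths point to locations you choose."
   ]
  },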
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "5c53c794",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code goes here\n",
    "# ..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92035982",
   "metadata": {},
   "source": [
    "How is it different from what you have done so far? \n",
    "\n",
    "Share your response in the chat!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4572fde",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}