{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## MLflow 5-minute Tracking Quickstart" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates using a local MLflow Tracking Server to log, register, and then load a model as a generic Python Function (pyfunc) to perform inference on a Pandas DataFrame.\n", "\n", "Throughout this notebook, we'll be using the MLflow fluent API to perform all interactions with the MLflow Tracking Server." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn import datasets\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import train_test_split\n", "\n", "import mlflow\n", "from mlflow.models import infer_signature" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Set the MLflow Tracking URI\n", "\n", "Depending on where you are running this notebook, the way you connect to the MLflow Tracking Server may vary.\n", "\n", "For this example, we're using a locally running tracking server, but other options are available. The easiest is the free managed service within [Databricks Community Edition](https://community.cloud.databricks.com/).\n", "\n", "Please see [the guide to running notebooks here](https://www.mlflow.org/docs/latest/getting-started/running-notebooks/index.html) for more information on setting the tracking server URI and configuring access to either managed or self-managed MLflow tracking servers."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# NOTE: review the links mentioned above for guidance on connecting to a managed tracking server, such as the free Databricks Community Edition\n", "\n", "mlflow.set_tracking_uri(uri=\"http://127.0.0.1:8080\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Load training data and train a simple model\n", "\n", "For our quickstart, we're going to use the familiar iris dataset that is included in scikit-learn. After splitting the data, we train a simple logistic regression classifier on the training data and calculate its accuracy on our holdout test data.\n", "\n", "Note that the only MLflow-related step in this portion is defining a `params` dictionary to supply our model's hyperparameters; this makes logging these settings easier when we're ready to log our model and its associated metadata." ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Load the Iris dataset\n", "X, y = datasets.load_iris(return_X_y=True)\n", "\n", "# Split the data into training and test sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "# Define the model hyperparameters\n", "params = {\"solver\": \"lbfgs\", \"max_iter\": 1000, \"multi_class\": \"auto\", \"random_state\": 8888}\n", "\n", "# Train the model\n", "lr = LogisticRegression(**params)\n", "lr.fit(X_train, y_train)\n", "\n", "# Predict on the test set\n", "y_pred = lr.predict(X_test)\n", "\n", "# Calculate accuracy as the evaluation metric\n", "accuracy = accuracy_score(y_test, y_pred)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Define an MLflow Experiment\n", "\n", "To keep the distinct runs of a particular project or idea together, we can define an Experiment that groups each iteration (run).\n", "Defining a unique name that is relevant to what we're working on helps with organization and reduces the amount of searching needed to find our runs later on." ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mlflow.set_experiment(\"MLflow Quickstart\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Log the model, hyperparameters, and metrics to MLflow\n", "\n", "To record our model, the hyperparameters used when fitting it, and the metrics from validating the fitted model on holdout data, we initiate a run context, as shown below. Within the scope of that context, any fluent API that we call (such as `mlflow.log_params()` or `mlflow.sklearn.log_model()`) will be logged to the same run." ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/benjamin.wilson/miniconda3/envs/mlflow-dev-env/lib/python3.8/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.\n", " warnings.warn(\"Setuptools is replacing distutils.\")\n", "Registered model 'tracking-quickstart' already exists. Creating a new version of this model...\n", "2023/11/07 12:17:01 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. 
Model name: tracking-quickstart, version 3\n", "Created version '3' of model 'tracking-quickstart'.\n" ] } ], "source": [ "# Start an MLflow run\n", "with mlflow.start_run():\n", "    # Log the hyperparameters\n", "    mlflow.log_params(params)\n", "\n", "    # Log the accuracy metric\n", "    mlflow.log_metric(\"accuracy\", accuracy)\n", "\n", "    # Set a tag that we can use to remind ourselves what this run was for\n", "    mlflow.set_tag(\"Training Info\", \"Basic LR model for iris data\")\n", "\n", "    # Infer the model signature\n", "    signature = infer_signature(X_train, lr.predict(X_train))\n", "\n", "    # Log the model\n", "    model_info = mlflow.sklearn.log_model(\n", "        sk_model=lr,\n", "        artifact_path=\"iris_model\",\n", "        signature=signature,\n", "        input_example=X_train,\n", "        registered_model_name=\"tracking-quickstart\",\n", "    )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Load our saved model as a Python Function\n", "\n", "Although we can load our model back in its native scikit-learn format with `mlflow.sklearn.load_model()`, below we load it as a generic Python Function, which is how this model would be loaded for online model serving. We can still use the `pyfunc` representation for batch use cases, though, as is shown below." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "62a7ca65428d42baa6f4528e1e7824e8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading artifacts: 0%| | 0/6 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load the model back as a generic Python Function (pyfunc) model\n", "loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
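<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>sepal length (cm)</th><th>sepal width (cm)</th><th>petal length (cm)</th><th>petal width (cm)</th><th>actual_class</th><th>predicted_class</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>6.1</td><td>2.8</td><td>4.7</td><td>1.2</td><td>1</td><td>1</td></tr>\n",
"<tr><th>1</th><td>5.7</td><td>3.8</td><td>1.7</td><td>0.3</td><td>0</td><td>0</td></tr>\n",
"<tr><th>2</th><td>7.7</td><td>2.6</td><td>6.9</td><td>2.3</td><td>2</td><td>2</td></tr>\n",
"<tr><th>3</th><td>6.0</td><td>2.9</td><td>4.5</td><td>1.5</td><td>1</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>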
\n", "" ], "text/plain": [ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \n", "0 6.1 2.8 4.7 1.2 \\\n", "1 5.7 3.8 1.7 0.3 \n", "2 7.7 2.6 6.9 2.3 \n", "3 6.0 2.9 4.5 1.5 \n", "\n", " actual_class predicted_class \n", "0 1 1 \n", "1 0 0 \n", "2 2 2 \n", "3 1 1 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions = loaded_model.predict(X_test)\n", "\n", "iris_feature_names = datasets.load_iris().feature_names\n", "\n", "# Convert X_test validation feature data to a Pandas DataFrame\n", "result = pd.DataFrame(X_test, columns=iris_feature_names)\n", "\n", "# Add the actual classes to the DataFrame\n", "result[\"actual_class\"] = y_test\n", "\n", "# Add the model predictions to the DataFrame\n", "result[\"predicted_class\"] = predictions\n", "\n", "result[:4]" ] } ], "metadata": { "kernelspec": { "display_name": "mlflow-dev-env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }