{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "#skip\n", "! [ -e /content ] && pip install -Uqq fastai # upgrade fastai on colab" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#all_slow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from fastai.basics import *\n", "from fastai.learner import Callback" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "from nbdev.showdoc import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#default_exp callback.azureml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# AzureML Callback\n", "\n", "Track fastai experiments with the azure machine learning plattform.\n", "\n", "## Prerequisites\n", "\n", "Install the azureml SDK:\n", "\n", "```python\n", "pip install azureml-core\n", "```\n", "\n", "## How to use it?\n", "\n", "Import and use `AzureMLCallback` during model fitting.\n", "\n", "If you are submitting your training run with azureml SDK [ScriptRunConfig](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets), the callback will automatically detect the run and log metrics. For example:\n", "\n", "```python\n", "from fastai.callback.azureml import AzureMLCallback\n", "learn.fit_one_cycle(epoch, lr, cbs=AzureMLCallback())\n", "```\n", "\n", "If you are running an experiment manually and just want to have interactive logging of the run, use azureml's `Experiment.start_logging` to create the interactive `run`, and pass that into `AzureMLCallback`. For example:\n", "\n", "```python\n", "from azureml.core import Experiment\n", "experiment = Experiment(workspace=ws, name='experiment_name')\n", "run = experiment.start_logging(outputs=None, snapshot_directory=None)\n", "\n", "from fastai.callback.azureml import AzureMLCallback\n", "learn.fit_one_cycle(epoch, lr, cbs=AzureMLCallback(run))\n", "```\n", "\n", "If you are running an experiment on your local machine (i.e. not using `ScriptRunConfig` and not passing an `run` into the callback), it will recognize that there is no AzureML run to log to, and produce no logging output.\n", "\n", "If you are using [AzureML pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines), the `AzureMLCallback` will by default also send the same logging output to the parent run, so that metrics can easily be plotted. If you do not want this (e.g. because you have multiple training steps in a pipeline), you can disable it by setting `log_to_parent`to `False`.\n", "\n", "To save the model weights, use the usual fastai methods and save the model to the `outputs` folder, which is a \"special\" (for Azure) folder that is automatically tracked in AzureML.\n", "\n", "As it stands, note that if you pass the callback into your `Learner` directly, e.g.:\n", "```python\n", "learn = Learner(dls, model, cbs=AzureMLCallback())\n", "```\n", "…some `Learner` methods (e.g. `learn.show_results()`) might add unwanted logging into your azureml experiment runs. Adding further checks into the callback should help eliminate this – another PR needed." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from azureml.core.run import Run\n", "from azureml.exceptions import RunEnvironmentException\n", "import warnings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "class AzureMLCallback(Callback):\n", "    \"\"\"\n", "    Log losses, metrics, and a model architecture summary to AzureML.\n", "\n", "    If no `azurerun` is passed and the code is not running on AzureML, the callback is disabled.\n", "    A custom AzureML `Run` object can be passed as `azurerun`.\n", "    If `log_to_parent` is True, will also log to the parent run, if one exists (e.g. in AzureML pipelines).\n", "    \"\"\"\n", "    order = Recorder.order+1\n", "\n", "    def __init__(self, azurerun=None, log_to_parent=True):\n", "        if azurerun:\n", "            self.azurerun = azurerun\n", "        else:\n", "            try:\n", "                self.azurerun = Run.get_context(allow_offline=False)\n", "            except RunEnvironmentException:\n", "                # running locally\n", "                self.azurerun = None\n", "                warnings.warn(\"Not running on AzureML and no azurerun passed; AzureMLCallback will be disabled.\")\n", "        self.log_to_parent = log_to_parent\n", "\n", "    def before_fit(self):\n", "        self._log(\"n_epoch\", self.learn.n_epoch)\n", "        self._log(\"model_class\", str(type(self.learn.model)))\n", "\n", "        try:\n", "            summary_file = Path(\"outputs\") / 'model_summary.txt'\n", "            summary_file.parent.mkdir(parents=True, exist_ok=True)  # outputs/ may not exist on a local run\n", "            with summary_file.open(\"w\") as f:\n", "                f.write(repr(self.learn.model))\n", "        except Exception:\n", "            print('Did not log model summary. Check that your model is a PyTorch model.')\n", "\n", "    def after_batch(self):\n", "        # log loss and opt.hypers\n", "        if self.learn.training:\n", "            self._log('batch__loss', self.learn.loss.item())\n", "            self._log('batch__train_iter', self.learn.train_iter)\n", "            for h in self.learn.opt.hypers:\n", "                for k, v in h.items():\n", "                    self._log(f'batch__opt.hypers.{k}', v)\n", "\n", "    def after_epoch(self):\n", "        # log metrics\n", "        for n, v in zip(self.learn.recorder.metric_names, self.learn.recorder.log):\n", "            if n not in ['epoch', 'time']:\n", "                self._log(f'epoch__{n}', v)\n", "            elif n == 'time':\n", "                # convert the elapsed-time string ('MM:SS' or 'HH:MM:SS') to seconds before logging\n", "                parts = [int(p) for p in str(v).split(':')]\n", "                elapsed = sum(p * 60**i for i, p in enumerate(reversed(parts)))\n", "                self._log(f'epoch__{n}', elapsed)\n", "\n", "    def _log(self, metric, value):\n", "        if self.azurerun is not None:\n", "            self.azurerun.log(metric, value)\n", "            if self.log_to_parent and self.azurerun.parent is not None:\n", "                self.azurerun.parent.log(metric, value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }