{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submitting Batch Jobs to Modzy with the Python SDK\n",
    "In this notebook, we will use the Modzy Python SDK to submit a batch of data to the Sentiment Analysis model for inference. We will generate the batch from Kaggle's [Amazon Reviews dataset](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews?resource=download).\n",
    "\n",
    "For more detailed usage documentation for our Python SDK, visit our **[GitHub page](https://github.com/modzy/sdk-python)**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Environment Setup\n",
    "\n",
    "Create a virtual environment (venv, conda, or another tool you prefer) with Python 3.6 or newer.\n",
    "\n",
    "Use pip to install the following package in your environment:\n",
    "\n",
    "* modzy-sdk>=0.10.0\n",
    "\n",
    "Then install Jupyter Notebook in your preferred environment using the appropriate **[install instructions](https://jupyter.org/install)**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Import Modzy SDK and Initialize Client\n",
    "\n",
    "Insert your instance URL and personal API key to establish a connection to the Modzy API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import Libraries\n",
    "from modzy import ApiClient, error\n",
    "from pprint import pprint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Initialize Modzy API Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The URL of your Modzy instance, used for authentication\n",
    "# Example: \"https://test.modzy.url\"\n",
    "API_URL = \"https://<your.modzy.url>\"\n",
    "# The API key used for authentication; paste in your personal API access key below\n",
    "API_KEY = \"<your.api.key>\"\n",
    "\n",
    "# Set up the API client\n",
    "client = ApiClient(base_url=API_URL, api_key=API_KEY)"
   ]
  },
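  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is a minimal illustration and not part of the Modzy SDK: the variable names `MODZY_URL` and `MODZY_API_KEY` are hypothetical, and the placeholder defaults are the same ones used in the cell above.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "# MODZY_URL and MODZY_API_KEY are hypothetical names chosen for this sketch\n",
    "API_URL = os.environ.get('MODZY_URL', 'https://<your.modzy.url>')\n",
    "API_KEY = os.environ.get('MODZY_API_KEY', '<your.api.key>')\n",
    "\n",
    "# Report whether a real key was found without printing the key itself\n",
    "key_is_set = API_KEY != '<your.api.key>'\n",
    "print(API_URL, key_is_set)\n",
    "```\n",
    "\n",
    "The resulting values can be passed to `ApiClient(base_url=API_URL, api_key=API_KEY)` exactly as in the cell above."
   ]
  },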
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Discover Available Models\n",
    "In this notebook, we will run inference with the Sentiment Analysis model. The cell below looks the model up by name and prints its metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'author': 'Open Source',\n",
      " 'description': 'This model gives sentiment scores showing the polarity and '\n",
      "                'strength of the emotions in text.',\n",
      " 'features': [{'description': 'This model has a built-in explainability '\n",
      "                              'feature. Click '\n",
      "                              '[here](https://arxiv.org/abs/1602.04938) to '\n",
      "                              'read more about model explainability.',\n",
      "               'identifier': 'built-in-explainability',\n",
      "               'name': 'Explainable'}],\n",
      " 'images': [{'caption': 'Sentiment Analysis',\n",
      "             'relationType': 'background',\n",
      "             'url': '/modzy-images/ed542963de/image_background.png'},\n",
      "            {'caption': 'Sentiment Analysis',\n",
      "             'relationType': 'card',\n",
      "             'url': '/modzy-images/ed542963de/image_card.png'},\n",
      "            {'caption': 'Sentiment Analysis',\n",
      "             'relationType': 'thumbnail',\n",
      "             'url': '/modzy-images/ed542963de/image_thumbnail.png'},\n",
      "            {'caption': 'Open Source',\n",
      "             'relationType': 'logo',\n",
      "             'url': '/modzy-images/companies/open-source/company-image.jpg'}],\n",
      " 'isActive': True,\n",
      " 'isCommercial': False,\n",
      " 'isRecommended': True,\n",
      " 'lastActiveDateTime': '2022-05-24T02:11:45.229+00:00',\n",
      " 'latestActiveVersion': '1.0.1',\n",
      " 'latestVersion': '1.0.27',\n",
      " 'modelId': 'ed542963de',\n",
      " 'name': 'Sentiment Analysis',\n",
      " 'permalink': 'ed542963de-open-source-sentiment-analysis',\n",
      " 'snapshotImages': [],\n",
      " 'tags': [{'dataType': 'Input Type',\n",
      "           'identifier': 'text',\n",
      "           'isCategorical': True,\n",
      "           'name': 'Text'},\n",
      "          {'dataType': 'Task',\n",
      "           'identifier': 'label_or_classify',\n",
      "           'isCategorical': True,\n",
      "           'name': 'Label or Classify'},\n",
      "          {'dataType': 'Tags',\n",
      "           'identifier': 'sentiment_analysis',\n",
      "           'isCategorical': False,\n",
      "           'name': 'Sentiment Analysis'},\n",
      "          {'dataType': 'Tags',\n",
      "           'identifier': 'text_analytics',\n",
      "           'isCategorical': False,\n",
      "           'name': 'Text Analytics'},\n",
      "          {'dataType': 'Subject',\n",
      "           'identifier': 'language_and_text',\n",
      "           'isCategorical': True,\n",
      "           'name': 'Language and Text'}],\n",
      " 'versions': ['0.0.28', '1.0.1', '1.0.27', '0.0.27'],\n",
      " 'visibility': ApiObject({\n",
      "  \"scope\": \"ALL\"\n",
      "})}\n"
     ]
    }
   ],
   "source": [
    "# Query model by name\n",
    "auto_model_info = client.models.get_by_name(\"Sentiment Analysis\")\n",
    "pprint(auto_model_info)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define variables for inference\n",
    "MODEL_ID = auto_model_info[\"modelId\"]\n",
    "MODEL_VERSION = auto_model_info[\"latestActiveVersion\"]\n",
    "# Read the expected input filename from the model's sample input\n",
    "sample_input = client.models.get_version_input_sample(MODEL_ID, MODEL_VERSION)\n",
    "INPUT_FILENAME = list(sample_input[\"input\"][\"sources\"][\"0001\"].keys())[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Create Batch of Data\n",
    "After reading in the 1000 samples in the test subset of the Amazon reviews dataset (`test.ft.txt`), we keep the first 500 as the batch to submit for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
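    "from itertools import islice\n",
    "\n",
    "# Read only the first 500 lines instead of loading the whole file\n",
    "with open(\"test.ft.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    text = list(islice(f, 500))\n",
    "\n",
    "# Clean reviews before feeding them to the model: each line looks like\n",
    "# \"__label__<n> <review text>\", so drop the label prefix and the newline\n",
    "text_cleaned = [t.split(\"__label__\")[-1][2:].replace(\"\\n\", \"\") for t in text]"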
   ]
  },
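  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the cleaning step concrete, here is a minimal sketch run on two made-up lines in the `__label__<n> <review text>` layout that the code above assumes. The sample lines and the `parse_line` helper are hypothetical and exist only for illustration; unlike the cell above, the helper also keeps the label.\n",
    "\n",
    "```python\n",
    "# Hypothetical sample lines in the assumed '__label__<n> <review text>' layout\n",
    "sample_lines = [\n",
    "    '__label__2 Great CD: I love this album.\\n',\n",
    "    '__label__1 Disappointing: Stopped working after a week.\\n',\n",
    "]\n",
    "\n",
    "\n",
    "def parse_line(line):\n",
    "    # Split at the first space: the label is on the left, the review on the right\n",
    "    label, _, review = line.partition(' ')\n",
    "    return label.replace('__label__', ''), review.rstrip('\\n')\n",
    "\n",
    "\n",
    "parsed = [parse_line(line) for line in sample_lines]\n",
    "print(parsed[0])\n",
    "```\n",
    "\n",
    "Keeping the label alongside the text makes it possible to compare the dataset's own rating with the model's prediction later."
   ]
  },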
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Map each input name to a {filename: text} dictionary, the structure passed to submit_text below\n",
    "sources = {f\"review_{i}\": {INPUT_FILENAME: review} for i, review in enumerate(text_cleaned)}"
   ]
  },
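  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the dictionary built above nests one level per input: the outer key names the input and the inner key is the filename the model expects. The sketch below builds the same structure from two made-up reviews; the review texts and the hard-coded `input.txt` filename are assumptions for illustration.\n",
    "\n",
    "```python\n",
    "# Two made-up reviews standing in for text_cleaned\n",
    "reviews = [\n",
    "    'Great CD: I love this album.',\n",
    "    'Disappointing: Stopped working after a week.',\n",
    "]\n",
    "\n",
    "# Outer key: input name; inner key: filename (assumed here to be 'input.txt')\n",
    "example_sources = {f'review_{i}': {'input.txt': review} for i, review in enumerate(reviews)}\n",
    "\n",
    "print(list(example_sources))\n",
    "print(example_sources['review_1'])\n",
    "```\n",
    "\n",
    "Because every review has its own outer key, each result can later be looked up by that same key."
   ]
  },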
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Submit Batch Inference to Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Helper Function\n",
    "The helper function below submits an inference job to the Modzy platform with the `submit_text` method, waits for the job to complete, and returns the model output. For additional job submission methods, visit our **[docs page](https://docs.modzy.com/docs/python)**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_model_output(model_identifier, model_version, data_sources, explain=False):\n",
    "    \"\"\"\n",
    "    Submit a text inference job and block until its results are available.\n",
    "\n",
    "    Args:\n",
    "        model_identifier: model identifier (string)\n",
    "        model_version: model version (string)\n",
    "        data_sources: dictionary mapping each input name to a {filename: text} dictionary\n",
    "        explain: boolean, defaults to False. If True, the model returns an explainable result\n",
    "\n",
    "    Returns:\n",
    "        The job results returned by the Modzy API\n",
    "    \"\"\"\n",
    "    job = client.jobs.submit_text(model_identifier, model_version, data_sources, explain)\n",
    "    # timeout=None waits indefinitely for the job to finish\n",
    "    result = client.results.block_until_complete(job, timeout=None)\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'classPredictions': [ApiObject({\n",
      "  \"class\": \"neutral\",\n",
      "  \"score\": 0.716\n",
      "}),\n",
      "                      ApiObject({\n",
      "  \"class\": \"positive\",\n",
      "  \"score\": 0.214\n",
      "}),\n",
      "                      ApiObject({\n",
      "  \"class\": \"negative\",\n",
      "  \"score\": 0.07\n",
      "})]}\n"
     ]
    }
   ],
   "source": [
    "model_results = get_model_output(MODEL_ID, MODEL_VERSION, sources, explain=False)\n",
    "first_review_results = model_results[\"results\"][\"review_0\"][\"results.json\"][\"data\"][\"result\"]\n",
    "pprint(first_review_results)"
   ]
  }
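  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above prints only the first review. A minimal sketch of tallying the top class across the whole batch follows, assuming every entry under `model_results['results']` has the same shape as `review_0`. The `mock_results` dictionary and the `top_class` helper are hypothetical stand-ins so that the sketch runs without a Modzy connection; the scores for `review_1` are made up.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# Hypothetical stand-in that mirrors the shape of model_results['results']\n",
    "mock_results = {\n",
    "    'review_0': {'results.json': {'data': {'result': {'classPredictions': [\n",
    "        {'class': 'neutral', 'score': 0.716},\n",
    "        {'class': 'positive', 'score': 0.214},\n",
    "        {'class': 'negative', 'score': 0.07},\n",
    "    ]}}}},\n",
    "    'review_1': {'results.json': {'data': {'result': {'classPredictions': [\n",
    "        {'class': 'positive', 'score': 0.9},\n",
    "        {'class': 'neutral', 'score': 0.08},\n",
    "        {'class': 'negative', 'score': 0.02},\n",
    "    ]}}}},\n",
    "}\n",
    "\n",
    "\n",
    "def top_class(entry):\n",
    "    # Pick the class with the highest score for one review\n",
    "    predictions = entry['results.json']['data']['result']['classPredictions']\n",
    "    return max(predictions, key=lambda p: p['score'])['class']\n",
    "\n",
    "\n",
    "tally = Counter(top_class(entry) for entry in mock_results.values())\n",
    "print(tally)\n",
    "```\n",
    "\n",
    "With real output, the same loop could run over `model_results['results'].values()`, provided each entry can be indexed the way `review_0` is in the cell above."
   ]
  }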
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}