{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Submitting Batch Jobs to Modzy with Python SDK \n", "In this notebook, we will use the Modzy Python SDK to submit a batch of data to a the Sentiment Analysis model for inferencing. We will generate a batch of data from Kaggle's [Amazon Reviews dataset](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews?resource=download).\n", "\n", "For more detailed usage documenation for our Python SDK, visit our **[GitHub page](https://github.com/modzy/sdk-python)**. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Environment Set Up\n", "\n", "Create a virtual environment (venv, conda, or other preferred virtual environment) with Python 3.6 or newer. \n", "\n", "Pip install the following packages in your environment. \n", "\n", "* modzy-sdk>=0.10.0\n", "\n", "And install Jupyter Notebooks in your preferred environment using the appropriate **[install instructions](https://jupyter.org/install)**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Import Modzy SDK and Initialize Client\n", "\n", "Insert your instance URL and personal API Key to establish connection to the Modzy API Client" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import Libraries\n", "from modzy import ApiClient, error\n", "from pprint import pprint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Initialize Modzy API Client" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the url we will use for authentication\n", "'''\n", "Example: \"https://test.modzy.url\"\n", "'''\n", "API_URL = \"https://\"\n", "# the api key we will be using for authentication -- make sure to paste in your personal API access key below\n", "API_KEY = \"\"\n", " \n", "# setup our API Client\n", "client = ApiClient(base_url=API_URL, api_key=API_KEY)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Discover Available Models\n", "In this notebook, we will run inference for the Automobile Classification Model" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'author': 'Open Source',\n", " 'description': 'This model gives sentiment scores showing the polarity and '\n", " 'strength of the emotions in text.',\n", " 'features': [{'description': 'This model has a built-in explainability '\n", " 'feature. Click '\n", " '[here](https://arxiv.org/abs/1602.04938) to '\n", " 'read more about model explainability.',\n", " 'identifier': 'built-in-explainability',\n", " 'name': 'Explainable'}],\n", " 'images': [{'caption': 'Sentiment Analysis',\n", " 'relationType': 'background',\n", " 'url': '/modzy-images/ed542963de/image_background.png'},\n", " {'caption': 'Sentiment Analysis',\n", " 'relationType': 'card',\n", " 'url': '/modzy-images/ed542963de/image_card.png'},\n", " {'caption': 'Sentiment Analysis',\n", " 'relationType': 'thumbnail',\n", " 'url': '/modzy-images/ed542963de/image_thumbnail.png'},\n", " {'caption': 'Open Source',\n", " 'relationType': 'logo',\n", " 'url': '/modzy-images/companies/open-source/company-image.jpg'}],\n", " 'isActive': True,\n", " 'isCommercial': False,\n", " 'isRecommended': True,\n", " 'lastActiveDateTime': '2022-05-24T02:11:45.229+00:00',\n", " 'latestActiveVersion': '1.0.1',\n", " 'latestVersion': '1.0.27',\n", " 'modelId': 'ed542963de',\n", " 'name': 'Sentiment Analysis',\n", " 'permalink': 'ed542963de-open-source-sentiment-analysis',\n", " 'snapshotImages': [],\n", " 'tags': [{'dataType': 'Input Type',\n", " 'identifier': 'text',\n", " 'isCategorical': True,\n", " 'name': 'Text'},\n", " {'dataType': 'Task',\n", " 'identifier': 'label_or_classify',\n", " 'isCategorical': True,\n", " 'name': 'Label or Classify'},\n", " {'dataType': 'Tags',\n", " 'identifier': 'sentiment_analysis',\n", " 'isCategorical': False,\n", " 'name': 'Sentiment Analysis'},\n", " {'dataType': 'Tags',\n", " 'identifier': 'text_analytics',\n", " 'isCategorical': False,\n", " 'name': 'Text Analytics'},\n", " {'dataType': 'Subject',\n", " 'identifier': 'language_and_text',\n", " 'isCategorical': True,\n", " 'name': 'Language and Text'}],\n", " 'versions': ['0.0.28', '1.0.1', '1.0.27', '0.0.27'],\n", " 'visibility': ApiObject({\n", " \"scope\": \"ALL\"\n", "})}\n" ] } ], "source": [ "# Query model by name\n", "auto_model_info = client.models.get_by_name(\"Sentiment Analysis\")\n", "pprint(auto_model_info)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Define Variables for Inference\n", "MODEL_ID = auto_model_info[\"modelId\"]\n", "MODEL_VERSION = auto_model_info[\"latestActiveVersion\"]\n", "INPUT_FILENAME = list(client.models.get_version_input_sample(MODEL_ID, MODEL_VERSION)[\"input\"][\"sources\"][\"0001\"].keys())[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Create Batch of Data\n", "After reading in the 1000 samples in the test subset of the Amazon reviews dataset, we will create a batch of 500 to submit for inference" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "with open(\"test.ft.txt\", \"r\", encoding=\"utf-8\") as f:\n", " text = f.readlines()[:500] # extract first 500\n", "\n", "# clean reviews before feeding to model\n", "text_cleaned = [t.split(\"__label__\")[-1][2:].replace(\"\\n\", \"\") for t in text]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "sources = {\"review_{}\".format(i): {\"input.txt\": review} for i, review in enumerate(text_cleaned)}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Submit Batch Inference to Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper Function\n", "Below is a helper function we will use to submit inference jobs to the Modzy platform and return the model output using the `submit_text` method. For additional job submission methods, visit our **[docs page](https://docs.modzy.com/docs/python)**." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "def get_model_output(model_identifier, model_version, data_sources, explain=False):\n", " \"\"\"\n", " Args:\n", " model_identifier: model identifier (string)\n", " model_version: model version (string)\n", " data_sources: dictionary with the appropriate filename --> local file key-value pairs\n", " explain: boolean variable, defaults to False. If true, model will return explainable result\n", " \"\"\"\n", " job = client.jobs.submit_text(model_identifier, model_version, data_sources, explain)\n", " result = client.results.block_until_complete(job, timeout=None) \n", " return result" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'classPredictions': [ApiObject({\n", " \"class\": \"neutral\",\n", " \"score\": 0.716\n", "}),\n", " ApiObject({\n", " \"class\": \"positive\",\n", " \"score\": 0.214\n", "}),\n", " ApiObject({\n", " \"class\": \"negative\",\n", " \"score\": 0.07\n", "})]}\n" ] } ], "source": [ "model_results = get_model_output(MODEL_ID, MODEL_VERSION, sources, explain=False)\n", "first_review_results = model_results[\"results\"][\"review_0\"][\"results.json\"][\"data\"][\"result\"]\n", "pprint(first_review_results)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }