{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GROK Client\n",
    "\n",
    "Use the GROK client to make rest calls to the Azure Search Service to create and run the indexing pipeline. Blob client is used to transfer the images to blob and download the extracted OCR from blob.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example usage:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Create an .env file with the environment variables that includes the names of you index, indexer, skillset, and datasource to create on the search service. Include keys to the blob that contains the documents you want to index, keys to the cognitive service and keys to you computer vision subscription and search service. In order to index more than 20 documents, you must have a computer services subscription. You can find the keys for the services in the Azure Portal. An example of the .env file content is given below:\n",
    "\n",
    "    ```bash\n",
    "\n",
    "    SEARCH_SERVICE_NAME = \"ocr-ner-pipeline\"\n",
    "    SKILLSET_NAME = \"ocrskillset\"\n",
    "    INDEX_NAME = \"ocrindex\"\n",
    "    INDEXER_NAME = \"ocrindexer\"\n",
    "    DATASOURCE_NAME = \"syntheticimages\"\n",
    "    DATASOURCE_CONTAINER_NAME = \"ocrimages\"\n",
    "    PROJECTIONS_CONTAINER_NAME = \"ocrprojection\"\n",
    "\n",
    "    BLOB_NAME = \"syntheticimages\"\n",
    "    BLOB_KEY = \"<YOUR BLOB KEY>\"\n",
    "    SEARCH_SERVICE_KEY = \"<YOUR SEARCH SERVICE KEY>\"\n",
    "    COGNITIVE_SERVICE_KEY = \"<YOUR COGNITIVE SERVICE KEY>\"\n",
    "    ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Source this .env file to load the variables then you can create and use the Grok class , REST client or blob client."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from genalog.ocr.blob_client import GrokBlobClient\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv(\".env\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. First, we need to upload our image files to azure blob. To do this, we use the blob client and call the `upload_images_to_blob` function. This function takes in the local and remote path and an optional parameter to specify whether to use asyncio asynchronous uploads [https://docs.python.org/3/library/asyncio.html]. Asynchronous uploads are faster, however, some setups of python may not support them. In such cases, sychronous uploads can be made using `use_async=False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_path = \"testimages\"\n",
    "remote_path = \"testimages\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "destination_folder_name, upload_task = blob_client.upload_images_to_blob(local_path, remote_path, use_async=True)\n",
    "await upload_task"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. Once files are uploaded, use the rest client to create an indexing pipeline to extract the text from the images on blob. The results are stored as json blobs in a projection blob container where the names of these json blobs are the base64 encoded paths of the source blob images. The name of this projection container is specified in the env file. The `poll_indexer_till_complete` will block and continuosly poll the indexer until it completly processes all docs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grok_rest_client = GrokRestClient.\n",
    "grok_rest_client.create_indexing_pipeline()\n",
    "grok_rest_client.run_indexer()\n",
    "indexer_status = grok_rest_client.poll_indexer_till_complete()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. Once the indexer completes, use the blob client to download the results from the projections blob."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_folder = \"./ocr\"\n",
    "async_download_task = blob_client.get_ocr_json( remote_path, output_folder, use_async=True)\n",
    "await async_download_task"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:tatk] *",
   "language": "python",
   "name": "conda-env-tatk-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}