{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To try this example, Go to Cell -> Run all\n",
    "\n",
    "\n",
    "Report problems with this example on [GitHub Issues](https://github.com/jina-ai/jina/issues/new/choose)\n",
    "\n",
    "\n",
    "Make sure to run this command to install Jina 2.0 for this notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "!pip install jina"
   ]
  },
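  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (an optional extra step, not part of the original demo): the code below relies on the Jina 2.x API (`port_expose`, `parallel`, `DocumentArray.match`), so it is worth printing which version actually got installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print the installed Jina version; this notebook targets the 2.x API\n",
    "import jina\n",
    "print(jina.__version__)"
   ]
  },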
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Minimum Working Example for Jina 2.0\n",
    "\n",
    "This notebook explains code adapted from the [38-Line Get Started](https://github.com/jina-ai/jina#get-started).\n",
    "\n",
    "The demo indices every line of its *own source code*, then searches for the most similar line to `\"request(on=something)\"`. No other library required, no external dataset required. The dataset is the codebase.\n",
    "\n",
    "### Import"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "For this demo, we only need to import `numpy` and `jina`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from jina import Document, DocumentArray, Executor, Flow, requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Character embedding\n",
    "\n",
    "For embedding every line of the code, we want to represent it into a vector using simple character embedding and mean-pooling.\n",
    "\n",
    "The character embedding is a simple identity matrix.\n",
    "\n",
    "To do that we need to write a new `Executor`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "class CharEmbed(Executor):  # a simple character embedding with mean-pooling\n",
    "    offset = 32  # letter `a`\n",
    "    dim = 127 - offset + 1  # last pos reserved for `UNK`\n",
    "    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars\n",
    "\n",
    "    @requests\n",
    "    def foo(self, docs: DocumentArray, **kwargs):\n",
    "        for d in docs:\n",
    "            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]\n",
    "            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # mean-pooling"
   ]
  },
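  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for what `CharEmbed` produces, here is a small standalone sketch of the same one-hot + mean-pooling idea, using plain `numpy` only (the string `'hello jina'` is just an arbitrary example, not part of the demo):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# standalone sketch of the one-hot + mean-pooling embedding used by CharEmbed\n",
    "offset = 32  # ASCII code of the space character, the first printable char\n",
    "dim = 127 - offset + 1  # last position reserved for `UNK`\n",
    "char_embd = np.eye(dim)  # one-hot embedding for all printable ASCII chars\n",
    "\n",
    "text = 'hello jina'\n",
    "idx = [ord(c) - offset if offset <= ord(c) <= 127 else dim - 1 for c in text]\n",
    "emb = char_embd[idx, :].mean(axis=0)  # mean-pooling over the one-hot rows\n",
    "\n",
    "print(emb.shape)         # (96,): one dimension per printable ASCII character\n",
    "print(emb.nonzero()[0])  # shifted codes of the characters occurring in `text`"
   ]
  },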
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing\n",
    "\n",
    "To store & retrieve encoded results, we need an indexer. At index time, it stores `DocumentArray` into memory. At query time, it computes the Euclidean distance between the embeddings of query Documents and all embeddings of the stored Documents.\n",
    "\n",
    "The indexing and searching are represented by `@request('/index')` and `@request('/search')`, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "class Indexer(Executor):\n",
    "    _docs = DocumentArray()  # for storing all documents in memory\n",
    "\n",
    "    @requests(on='/index')\n",
    "    def foo(self, docs: DocumentArray, **kwargs):\n",
    "        self._docs.extend(docs)  # extend stored `docs`\n",
    "\n",
    "    @requests(on='/search')\n",
    "    def bar(self, docs: DocumentArray, **kwargs):\n",
    "         docs.match(self._docs, metric='euclidean', limit=20)"
   ]
  },
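  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`docs.match(...)` hides the actual distance computation. As a rough sketch of what a pairwise Euclidean distance between query and stored embeddings looks like (plain `numpy` on random toy embeddings, not the internals of `match`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# toy sketch: pairwise Euclidean distance between query and stored embeddings\n",
    "q = np.random.rand(2, 96)  # pretend embeddings of 2 query lines\n",
    "s = np.random.rand(5, 96)  # pretend embeddings of 5 stored lines\n",
    "\n",
    "dist = np.linalg.norm(q[:, None, :] - s[None, :, :], axis=-1)  # shape (2, 5)\n",
    "print(dist.shape)\n",
    "print(dist.argsort(axis=1))  # per query: stored lines ordered from nearest to farthest"
   ]
  },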
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Callback function\n",
    "\n",
    "Callback function is invoked when the search is done."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def print_matches(req):  # the callback function invoked when task is done\n",
    "    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches\n",
    "        print(f'[{idx}]{d.scores[\"euclid\"].value:2f}: \"{d.text}\"')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Flow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow, with 2 parallel CharEmbed, tho unnecessary\n",
    "\n",
    "source_code = \"\"\"\n",
    "import numpy as np\n",
    "from jina import Document, DocumentArray, Executor, Flow, requests\n",
    "class CharEmbed(Executor):  # a simple character embedding with mean-pooling\n",
    "    offset = 32  # letter `a`\n",
    "    dim = 127 - offset + 1  # last pos reserved for `UNK`\n",
    "    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars\n",
    "    @requests\n",
    "    def foo(self, docs: DocumentArray, **kwargs):\n",
    "        for d in docs:\n",
    "            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]\n",
    "            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling\n",
    "class Indexer(Executor):\n",
    "    _docs = DocumentArray()  # for storing all documents in memory\n",
    "    @requests(on='/index')\n",
    "    def foo(self, docs: DocumentArray, **kwargs):\n",
    "        self._docs.extend(docs)  # extend stored `docs`\n",
    "    @requests(on='/search')\n",
    "    def bar(self, docs: DocumentArray, **kwargs):\n",
    "        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs\n",
    "        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs\n",
    "        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance\n",
    "        for dist, query in zip(euclidean_dist, docs):  # add & sort match\n",
    "            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]\n",
    "            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values\n",
    "f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow, with 2 parallel CharEmbed, tho unnecessary\n",
    "with f:\n",
    "    f.post('/index', DocumentArray([Document(text=t.strip()) for t in source_code.split('\\n') if t.strip() ])) # index all lines of this notebook's source code\n",
    "    f.post('/search', Document(text='@request(on=something)'), on_done=print_matches)\n",
    "\"\"\"\n",
    "\n",
    "with f:\n",
    "    f.post('/index', DocumentArray([Document(text=t.strip()) for t in source_code.split('\\n') if t.strip() ])) # index all lines of this notebook's source code\n",
    "    f.post('/search', Document(text='@request(on=something)'), on_done=print_matches)\n"
   ]
  },
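  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `source_code` string above also contains an HTTP variant of the Flow (`protocol='http', cors=True`). A minimal sketch of serving the same pipeline as a long-running HTTP service is shown below (assuming `Flow.block()` to keep the gateway alive, as in Jina 2); it is left as text rather than a runnable cell because blocking would break *Run All*:\n",
    "\n",
    "```python\n",
    "f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)\n",
    "with f:\n",
    "    f.post('/index', DocumentArray([Document(text=t.strip()) for t in source_code.split('\\n') if t.strip()]))\n",
    "    f.block()  # keep serving /index and /search over HTTP until interrupted\n",
    "```"
   ]
  },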
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "It finds the lines most similar to \"request(on=something)\" from the server code snippet and prints the following:\n",
    "\n",
    "```\n",
    "[0]0.123462: \"f.post('/search', Document(text='@request(on=something)'), on_done=print_matches)\"\n",
    "[1]0.157459: \"@requests(on='/index')\"\n",
    "[2]0.171835: \"@requests(on='/search')\"\n",
    "```\n",
    "\n",
    "Need help in understanding Jina? Ask a question to friendly Jina community on [Slack](https://slack.jina.ai/) (usual response time: 1hr)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}