{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Demo using ElasticSearch in Python\n",
    "=====================\n",
    "\n",
    "\n",
    "This is a quick demonstration of using ElasticSearch in Python.\n",
    "\n",
    "Some material is taken from [Elasticsearch: The Definitive Guide](https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html), which is a great book!\n",
    "\n",
    "\n",
    "## elasticsearch.yml\n",
    "\n",
    "Some changes you need to make before launching a node:\n",
    "\n",
    "- Change `cluster.name` so the node joins only the cluster you intend; nodes on the same network that share a cluster name can discover each other automatically\n",
    "- Change `node.name` so you can easily tell which node is in trouble\n",
    "\n",
    "Some more options:\n",
    "\n",
    "- Lock the process memory by setting `bootstrap.mlockall` to `true`, which prevents swapping and helps performance\n",
    "- Set `network.host` to `127.0.0.1` so the node only listens on the local machine, for security reasons\n",
    "\n",
    "## pyelasticsearch\n",
    "\n",
    "In this demo we use the `pyelasticsearch` package, a Python wrapper around the ElasticSearch RESTful API.\n",
    "\n",
    "Install it using `pip install pyelasticsearch`.\n",
    "\n",
    "\n",
    "### Set things up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the `ElasticSearch` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pyelasticsearch import ElasticSearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Configure the URL of the ElasticSearch node. The constructor accepts more parameters, but this is all we need for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "es = ElasticSearch('http://localhost:9200')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we check the cluster health:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'active_primary_shards': 0,\n",
       " 'active_shards': 0,\n",
       " 'cluster_name': 'elasticsearch_tai-dev',\n",
       " 'initializing_shards': 0,\n",
       " 'number_of_data_nodes': 1,\n",
       " 'number_of_in_flight_fetch': 0,\n",
       " 'number_of_nodes': 1,\n",
       " 'number_of_pending_tasks': 0,\n",
       " 'relocating_shards': 0,\n",
       " 'status': 'green',\n",
       " 'timed_out': False,\n",
       " 'unassigned_shards': 0}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.health()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All we care about for now is the `'green'` status, which means everything is OK."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Fact**: the `health` method is a wrapper around calling `GET /_cluster/health?pretty` directly through the REST API."
   ]
  },
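  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of what the `health` wrapper does, the same endpoint can be queried with nothing but the Python 3 standard library. This assumes a node is listening on `http://localhost:9200`, the address used for `es` above; the `raw_health` name is only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import json\n",
    "from urllib.request import urlopen  # Python 3 standard library\n",
    "\n",
    "# Assumption: a node is listening on the same address used for `es` above.\n",
    "with urlopen('http://localhost:9200/_cluster/health') as response:\n",
    "    raw_health = json.loads(response.read().decode('utf-8'))\n",
    "\n",
    "raw_health['status']"
   ]
  },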
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CRUD: create-read-update-delete"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can do anything with ElasticSearch, we need to index our documents into it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1',\n",
       " '_index': 'library',\n",
       " '_type': 'books',\n",
       " '_version': 1,\n",
       " 'created': True}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.index('library', # Index name\n",
    "         'books',   # Type name\n",
    "         {\n",
    "            'title': 'A very interesting name',\n",
    "            'name': {\n",
    "                'first': 'Hugh',\n",
    "                'last': 'Jackman'\n",
    "            },\n",
    "            'publish_date': '2015-07-02',\n",
    "            'price': 20,\n",
    "         },\n",
    "         id=1        # Doc ID\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Read"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1',\n",
       " '_index': 'library',\n",
       " '_source': {'name': {'first': 'Hugh', 'last': 'Jackman'},\n",
       "  'price': 20,\n",
       "  'publish_date': '2015-07-02',\n",
       "  'title': 'A very interesting name'},\n",
       " '_type': 'books',\n",
       " '_version': 1,\n",
       " 'found': True}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.get('library', 'books', 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the document does not exist, an error is raised:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:elasticsearch:GET /library/books/123 [status:404 request:0.004s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is an error!\n"
     ]
    }
   ],
   "source": [
    "from pyelasticsearch.exceptions import ElasticHttpNotFoundError\n",
    "\n",
    "try:\n",
    "    es.get('library', 'books', 123)\n",
    "except ElasticHttpNotFoundError:\n",
    "    # A missing document comes back as HTTP 404, which the client raises as an exception.\n",
    "    print(\"This is an error!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ID is optional. If you leave it out, ElasticSearch generates one (an ugly one):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': 'AU5N-j9DhyYCAHYcFB3R',\n",
       " '_index': 'library',\n",
       " '_type': 'books',\n",
       " '_version': 1,\n",
       " 'created': True}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.index('library', # Index name\n",
    "         'books',   # Type name\n",
    "         {\n",
    "            'title': 'Another interesting name',\n",
    "            'name': {\n",
    "                'first': 'Tom',\n",
    "                'last': 'Cruise'\n",
    "            },\n",
    "            'publish_date': '2015-08-02',\n",
    "            'price': 21,\n",
    "         },\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get me that book:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': 'AU5N-j9DhyYCAHYcFB3R',\n",
       " '_index': 'library',\n",
       " '_source': {'name': {'first': 'Tom', 'last': 'Cruise'},\n",
       "  'price': 21,\n",
       "  'publish_date': '2015-08-02',\n",
       "  'title': 'Another interesting name'},\n",
       " '_type': 'books',\n",
       " '_version': 1,\n",
       " 'found': True}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.get('library', 'books', 'AU5N-j9DhyYCAHYcFB3R')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Update"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1', '_index': 'library', '_type': 'books', '_version': 2}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.update('library', # Index name\n",
    "          'books',   # Type name\n",
    "          id = 1,    # Doc ID\n",
    "          doc = {\n",
    "             'title': 'A very interesting name 2',\n",
    "             'name': {\n",
    "                 'first': 'Hugh',\n",
    "                 'last': 'Jackman'\n",
    "             },\n",
    "             'publish_date': '2015-07-03',\n",
    "             'price': 30,\n",
    "          },\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It worked, but the method is kind of ugly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1',\n",
       " '_index': 'library',\n",
       " '_source': {'name': {'first': 'Hugh', 'last': 'Jackman'},\n",
       "  'price': 30,\n",
       "  'publish_date': '2015-07-03',\n",
       "  'title': 'A very interesting name 2'},\n",
       " '_type': 'books',\n",
       " '_version': 2,\n",
       " 'found': True}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.get('library', 'books', 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The method performs a partial update, so only the fields you pass are changed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1', '_index': 'library', '_type': 'books', '_version': 3}"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.update('library', # Index name\n",
    "          'books',   # Type name\n",
    "          id = 1,    # Doc ID\n",
    "          doc = {\n",
    "             'price': 90,\n",
    "          },\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1',\n",
       " '_index': 'library',\n",
       " '_source': {'name': {'first': 'Hugh', 'last': 'Jackman'},\n",
       "  'price': 90,\n",
       "  'publish_date': '2015-07-03',\n",
       "  'title': 'A very interesting name 2'},\n",
       " '_type': 'books',\n",
       " '_version': 3,\n",
       " 'found': True}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.get('library', 'books', 1)"
   ]
  },
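  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, a partial update merges the fields you send into the stored `_source`, recursing into nested objects. The pure-Python sketch below only illustrates that merge rule; `merge_partial` is a hypothetical helper, not part of `pyelasticsearch`, and the real merge happens inside ElasticSearch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def merge_partial(source, doc):\n",
    "    '''Return a copy of `source` with the fields of `doc` merged in.'''\n",
    "    merged = dict(source)\n",
    "    for key, value in doc.items():\n",
    "        if isinstance(value, dict) and isinstance(merged.get(key), dict):\n",
    "            # Nested objects are merged field by field.\n",
    "            merged[key] = merge_partial(merged[key], value)\n",
    "        else:\n",
    "            # Scalars (and arrays) simply replace the old value.\n",
    "            merged[key] = value\n",
    "    return merged\n",
    "\n",
    "# Hypothetical stand-in for the stored document, copied from the output above.\n",
    "stored = {'title': 'A very interesting name 2',\n",
    "          'name': {'first': 'Hugh', 'last': 'Jackman'},\n",
    "          'publish_date': '2015-07-03',\n",
    "          'price': 30}\n",
    "\n",
    "merge_partial(stored, {'price': 90})"
   ]
  },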
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': '1',\n",
       " '_index': 'library',\n",
       " '_type': 'books',\n",
       " '_version': 4,\n",
       " 'found': True}"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.delete('library', 'books', 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:elasticsearch:GET /library/books/1 [status:404 request:0.004s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Not found!\n"
     ]
    }
   ],
   "source": [
    "from pyelasticsearch.exceptions import ElasticHttpNotFoundError\n",
    "\n",
    "try:\n",
    "    es.get('library', 'books', 1)\n",
    "except ElasticHttpNotFoundError:\n",
    "    # The document was deleted above, so the lookup now returns HTTP 404.\n",
    "    print('Not found!')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bulk indexing and Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Bulk index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Input data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "users = [{ \"email\" : \"john@smith.com\", \"name\" : \"John Smith\", \"username\" : \"@john\" }, \n",
    "        { \"email\" : \"mary@jones.com\", \"name\" : \"Mary Jones\", \"username\" : \"@mary\" }]\n",
    "\n",
    "tweet = [{ \"date\" : \"2014-09-13\", \"name\" : \"Mary Jones\", \"tweet\" : \"Elasticsearch means full text search has never been so easy\", \"user_id\" : 2 },\n",
    "        { \"date\" : \"2014-09-14\", \"name\" : \"John Smith\", \"tweet\" : \"@mary it is not just text, it does everything\", \"user_id\" : 1 },\n",
    "        { \"date\" : \"2014-09-15\", \"name\" : \"Mary Jones\", \"tweet\" : \"However did I manage before Elasticsearch?\", \"user_id\" : 2 },\n",
    "        { \"date\" : \"2014-09-16\", \"name\" : \"John Smith\", \"tweet\" : \"The Elasticsearch API is really easy to use\", \"user_id\" : 1 },\n",
    "        { \"date\" : \"2014-09-17\", \"name\" : \"Mary Jones\", \"tweet\" : \"The Query DSL is really powerful and flexible\", \"user_id\" : 2 }]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bulk indexing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'errors': False,\n",
       " 'items': [{'index': {'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'user',\n",
       "    '_version': 1,\n",
       "    'status': 201}},\n",
       "  {'index': {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'user',\n",
       "    '_version': 1,\n",
       "    'status': 201}}],\n",
       " 'took': 886}"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.bulk((es.index_op(user, id=i) for i, user in enumerate(users)),\n",
    "        index='demo',\n",
    "        doc_type='user')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'errors': False,\n",
       " 'items': [{'index': {'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'tweet',\n",
       "    '_version': 1,\n",
       "    'status': 201}},\n",
       "  {'index': {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'tweet',\n",
       "    '_version': 1,\n",
       "    'status': 201}},\n",
       "  {'index': {'_id': '2',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'tweet',\n",
       "    '_version': 1,\n",
       "    'status': 201}},\n",
       "  {'index': {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'tweet',\n",
       "    '_version': 1,\n",
       "    'status': 201}},\n",
       "  {'index': {'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_type': 'tweet',\n",
       "    '_version': 1,\n",
       "    'status': 201}}],\n",
       " 'took': 53}"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.bulk((es.index_op(t, id=i) for i, t in enumerate(tweet)),\n",
    "        index='demo',\n",
    "        doc_type='tweet')"
   ]
  },
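  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under the hood, the bulk API takes newline-delimited JSON: one action line followed by one document line per operation. The sketch below builds such a body by hand just to show the shape; `bulk_body` and the sample documents are hypothetical, and `es.bulk` with `es.index_op` already does the equivalent for us."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "def bulk_body(docs):\n",
    "    '''Build a newline-delimited JSON body that indexes `docs` with IDs 0, 1, 2, ...'''\n",
    "    lines = []\n",
    "    for i, doc in enumerate(docs):\n",
    "        lines.append(json.dumps({'index': {'_id': str(i)}}))  # action line\n",
    "        lines.append(json.dumps(doc))                         # document line\n",
    "    return '\\n'.join(lines) + '\\n'  # the bulk API expects a trailing newline\n",
    "\n",
    "# Hypothetical sample documents, just to show the shape of the body.\n",
    "print(bulk_body([{'name': 'John Smith'}, {'name': 'Mary Jones'}]))"
   ]
  },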
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Search all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-17',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'The Query DSL is really powerful and flexible',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'email': 'john@smith.com',\n",
       "     'name': 'John Smith',\n",
       "     'username': '@john'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-13',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'Elasticsearch means full text search has never been so easy',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'email': 'mary@jones.com',\n",
       "     'name': 'Mary Jones',\n",
       "     'username': '@mary'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-14',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': '@mary it is not just text, it does everything',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '2',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-15',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'However did I manage before Elasticsearch?',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 1.0,\n",
       "  'total': 7},\n",
       " 'timed_out': False,\n",
       " 'took': 6}"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.search({})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Match"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simple match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.625,\n",
       "    '_source': {'email': 'john@smith.com',\n",
       "     'name': 'John Smith',\n",
       "     'username': '@john'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.625,\n",
       "    '_source': {'date': '2014-09-14',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': '@mary it is not just text, it does everything',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.19178301,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.625,\n",
       "  'total': 3},\n",
       " 'timed_out': False,\n",
       " 'took': 282}"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.search('name:john', index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Query API: yeah, we can hide from it for some time, but we can't escape it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "query = {'query':\n",
    "            {'match': {'name': 'john'}}\n",
    "        }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.625,\n",
       "    '_source': {'email': 'john@smith.com',\n",
       "     'name': 'John Smith',\n",
       "     'username': '@john'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.625,\n",
       "    '_source': {'date': '2014-09-14',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': '@mary it is not just text, it does everything',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.19178301,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.625,\n",
       "  'total': 3},\n",
       " 'timed_out': False,\n",
       " 'took': 9}"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.search(query, index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How about two terms? By default `match` combines them with OR, so a document matching either term is returned:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.22097087,\n",
       "    '_source': {'email': 'john@smith.com',\n",
       "     'name': 'John Smith',\n",
       "     'username': '@john'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '0',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.22097087,\n",
       "    '_source': {'date': '2014-09-13',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'Elasticsearch means full text search has never been so easy',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.22097087,\n",
       "    '_source': {'email': 'mary@jones.com',\n",
       "     'name': 'Mary Jones',\n",
       "     'username': '@mary'},\n",
       "    '_type': 'user'},\n",
       "   {'_id': '1',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.22097087,\n",
       "    '_source': {'date': '2014-09-14',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': '@mary it is not just text, it does everything',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.028130025,\n",
       "    '_source': {'date': '2014-09-17',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'The Query DSL is really powerful and flexible',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '2',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.028130025,\n",
       "    '_source': {'date': '2014-09-15',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'However did I manage before Elasticsearch?',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.028130025,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.22097087,\n",
       "  'total': 7},\n",
       " 'timed_out': False,\n",
       " 'took': 179}"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = {'query':\n",
    "            {'match': {'name': 'john mary'}}\n",
    "        }\n",
    "es.search(query, index='demo')"
   ]
  },
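  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And a phrase? `match_phrase` needs the terms to appear next to each other in that order, so nothing matches here:"
   ]
  },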
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [], 'max_score': None, 'total': 0},\n",
       " 'timed_out': False,\n",
       " 'took': 147}"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = {'query':\n",
    "            {'match_phrase': {'name': 'john mary'}}\n",
    "        }\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike `get`, `search` does not raise an error when nothing matches; it simply returns an empty list of hits, which is much less scary."
   ]
  },
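  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since a search with no matches still returns a normal response, the hits can be processed without any special casing. A small sketch, where `sources` is a hypothetical helper and the two dictionaries are trimmed-down stand-ins for the responses shown above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sources(result):\n",
    "    '''Return the `_source` of every hit in a search response.'''\n",
    "    return [hit['_source'] for hit in result['hits']['hits']]\n",
    "\n",
    "# Trimmed-down stand-ins for real responses (assumed shapes, copied from the outputs above).\n",
    "empty = {'hits': {'hits': [], 'max_score': None, 'total': 0}}\n",
    "one_hit = {'hits': {'hits': [{'_id': '0', '_source': {'name': 'John Smith'}}],\n",
    "                    'max_score': 0.625, 'total': 1}}\n",
    "\n",
    "sources(empty), sources(one_hit)"
   ]
  },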
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Boolean combination"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can write boolean combinations with `must`, `must_not` and `should`:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Does John Smith mention \"API\" in his tweet?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.38595587,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.38595587,\n",
       "  'total': 1},\n",
       " 'timed_out': False,\n",
       " 'took': 13}"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"bool\": {\n",
    "            \"must\": [\n",
    "                {\n",
    "                    \"match_phrase\": {\n",
    "                        \"name\": \"john smith\"\n",
    "                    }\n",
    "                },\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": \"API\"\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
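  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`must_not` works the same way but excludes matches. As a sketch (it reuses the `es` client and the `demo` index from above, and was not run here, so no output is shown), this asks for John Smith's tweets that do not mention \"API\". Restricting the search to the `tweet` type is an assumption made so that the user document, which has no `tweet` field, is left out:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "query = {\n",
    "    'query': {\n",
    "        'bool': {\n",
    "            # Every clause under `must` has to match...\n",
    "            'must': [\n",
    "                {'match_phrase': {'name': 'john smith'}}\n",
    "            ],\n",
    "            # ...and no clause under `must_not` may match.\n",
    "            'must_not': [\n",
    "                {'match': {'tweet': 'API'}}\n",
    "            ]\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo', doc_type='tweet')"
   ]
  },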
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can weight the importance of each clause in a combination using the `boost` field.\n",
    "\n",
    "We try it with 'DSL' and 'API':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.04016714,\n",
       "    '_source': {'date': '2014-09-17',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'The Query DSL is really powerful and flexible',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.0029369325,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.04016714,\n",
       "  'total': 2},\n",
       " 'timed_out': False,\n",
       " 'took': 10}"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"bool\": {\n",
    "            \"should\": [\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": {\n",
    "                            \"query\": \"DSL\",\n",
    "                            \"boost\": 5,\n",
    "                        }                        \n",
    "                    }\n",
    "                },\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": \"API\"\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lower the `boost`, and the order changes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.025078464,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.0072710635,\n",
       "    '_source': {'date': '2014-09-17',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'The Query DSL is really powerful and flexible',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.025078464,\n",
       "  'total': 2},\n",
       " 'timed_out': False,\n",
       " 'took': 9}"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"bool\": {\n",
    "            \"should\": [\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": {\n",
    "                            \"query\": \"DSL\",\n",
    "                            \"boost\": 0.5,\n",
    "                        }                        \n",
    "                    }\n",
    "                },\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": {\n",
    "                            \"query\": \"API\"\n",
    "                        }\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Highlight the matched terms in the result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.38595587,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet',\n",
       "    'highlight': {'tweet': ['The Elasticsearch <em>API</em> is really easy to use']}}],\n",
       "  'max_score': 0.38595587,\n",
       "  'total': 1},\n",
       " 'timed_out': False,\n",
       " 'took': 15}"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"bool\": {\n",
    "            \"must\": [\n",
    "                {\n",
    "                    \"match_phrase\": {\n",
    "                        \"name\": \"john smith\"\n",
    "                    }\n",
    "                },\n",
    "                {\n",
    "                    \"match\": {\n",
    "                        \"tweet\": \"API\"\n",
    "                    }\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    },\n",
    "    \"highlight\": {\n",
    "        \"fields\": {\n",
    "            \"tweet\": {}\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
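  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each hit that matched carries a `highlight` entry next to `_source`, keyed by field name. A minimal sketch of pulling those fragments out, assuming a response shaped like the output above; the `response` literal is a trimmed copy of it, and `highlighted_fragments` is a hypothetical helper, not part of `pyelasticsearch`:\n",
    "\n",
    "```python\n",
    "# Trimmed copy of the response shown above. In the notebook this would be\n",
    "# the value returned by es.search(query, index='demo').\n",
    "response = {\n",
    "    'hits': {\n",
    "        'total': 1,\n",
    "        'hits': [\n",
    "            {\n",
    "                '_id': '3',\n",
    "                '_source': {'name': 'John Smith'},\n",
    "                'highlight': {\n",
    "                    'tweet': ['The Elasticsearch <em>API</em> is really easy to use']\n",
    "                },\n",
    "            }\n",
    "        ],\n",
    "    }\n",
    "}\n",
    "\n",
    "\n",
    "def highlighted_fragments(response, field):\n",
    "    # Collect (document id, fragment) pairs for one highlighted field.\n",
    "    pairs = []\n",
    "    for hit in response['hits']['hits']:\n",
    "        # A hit with no highlight for this field contributes nothing.\n",
    "        for fragment in hit.get('highlight', {}).get(field, []):\n",
    "            pairs.append((hit['_id'], fragment))\n",
    "    return pairs\n",
    "\n",
    "\n",
    "for doc_id, fragment in highlighted_fragments(response, 'tweet'):\n",
    "    print(doc_id, fragment)\n",
    "```\n",
    "\n",
    "Only fields listed under `highlight.fields` in the request show up here, so asking this helper for `name` returns an empty list for the query above."
   ]
  },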
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Filter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Find all tweets posted after '2014-09-15':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '4',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-17',\n",
       "     'name': 'Mary Jones',\n",
       "     'tweet': 'The Query DSL is really powerful and flexible',\n",
       "     'user_id': 2},\n",
       "    '_type': 'tweet'},\n",
       "   {'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 1.0,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 1.0,\n",
       "  'total': 2},\n",
       " 'timed_out': False,\n",
       " 'took': 8}"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"filtered\": {\n",
    "            \"filter\": {\n",
    "                \"range\": {\n",
    "                    \"date\": {\n",
    "                        \"gt\": '2014-09-15'\n",
    "                    }\n",
    "                }\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How about just list only John Smith's tweets, after 2014-09-15?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_shards': {'failed': 0, 'successful': 5, 'total': 5},\n",
       " 'hits': {'hits': [{'_id': '3',\n",
       "    '_index': 'demo',\n",
       "    '_score': 0.38356602,\n",
       "    '_source': {'date': '2014-09-16',\n",
       "     'name': 'John Smith',\n",
       "     'tweet': 'The Elasticsearch API is really easy to use',\n",
       "     'user_id': 1},\n",
       "    '_type': 'tweet'}],\n",
       "  'max_score': 0.38356602,\n",
       "  'total': 1},\n",
       " 'timed_out': False,\n",
       " 'took': 12}"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \\\n",
    "{\n",
    "    \"query\": {\n",
    "        \"filtered\": {\n",
    "            \"query\": {\n",
    "                \"match_phrase\": {\n",
    "                    \"name\": \"John Smith\"\n",
    "                }\n",
    "            },\n",
    "            \"filter\": {\n",
    "                \"range\": {\n",
    "                    \"date\": {\n",
    "                        \"gt\": '2014-09-15'\n",
    "                    }\n",
    "                }\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "es.search(query, index='demo')"
   ]
  },
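  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filtered` query used above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer clusters the same request is usually written as a `bool` query with a `filter` clause. A sketch of that rewrite, assuming such a cluster; the outputs in this notebook come from an older version and were not re-run with this form, and `to_bool_query` is a hypothetical helper:\n",
    "\n",
    "```python\n",
    "def to_bool_query(filtered_body):\n",
    "    # Rewrite {'query': {'filtered': {...}}} as a bool query with a filter clause.\n",
    "    filtered = filtered_body['query']['filtered']\n",
    "    bool_clause = {'filter': filtered['filter']}\n",
    "    if 'query' in filtered:\n",
    "        # The scoring part of the old query moves under 'must'.\n",
    "        bool_clause['must'] = filtered['query']\n",
    "    return {'query': {'bool': bool_clause}}\n",
    "\n",
    "\n",
    "old_query = {\n",
    "    'query': {\n",
    "        'filtered': {\n",
    "            'query': {'match_phrase': {'name': 'John Smith'}},\n",
    "            'filter': {'range': {'date': {'gt': '2014-09-15'}}},\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "new_query = to_bool_query(old_query)\n",
    "print(new_query)\n",
    "```\n",
    "\n",
    "The filter part still only includes or excludes documents; any relevance scoring comes from the `must` clause."
   ]
  },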
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Analysis and Analyzer\n",
    "\n",
    "All the fancy things above worked mostly because of Analysis.\n",
    "\n",
    "> Analysis = Tokenization + Token filters\n",
    "\n",
    "> Analyzer = Character filters + Tokenizer + Token filters\n",
    "\n",
    "\n",
    "Analyzers are language-specific, as of July 2015, Vietnamese is not supported, so we won't talk much about it then."
   ]
  },
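  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Still, the three stages are easy to picture with a toy analyzer in plain Python. This only illustrates the pipeline; it is not how ElasticSearch implements its built-in analyzers, and the function names and the stopword list are made up for this sketch:\n",
    "\n",
    "```python\n",
    "import string\n",
    "\n",
    "STOPWORDS = {'the', 'is', 'to', 'and'}  # tiny illustrative list\n",
    "\n",
    "\n",
    "def char_filter(text):\n",
    "    # Character filter: tidy the raw string before it is tokenized.\n",
    "    return text.replace('&', ' and ')\n",
    "\n",
    "\n",
    "def tokenizer(text):\n",
    "    # Tokenizer: split on whitespace and strip surrounding punctuation.\n",
    "    tokens = (word.strip(string.punctuation) for word in text.split())\n",
    "    return [token for token in tokens if token]\n",
    "\n",
    "\n",
    "def token_filters(tokens):\n",
    "    # Token filters: lowercase every token, then drop stopwords.\n",
    "    lowered = [token.lower() for token in tokens]\n",
    "    return [token for token in lowered if token not in STOPWORDS]\n",
    "\n",
    "\n",
    "def analyze(text):\n",
    "    return token_filters(tokenizer(char_filter(text)))\n",
    "\n",
    "\n",
    "print(analyze('The Query DSL is really powerful & flexible!'))\n",
    "# ['query', 'dsl', 'really', 'powerful', 'flexible']\n",
    "```\n",
    "\n",
    "Because indexed text and query text go through the same steps, a search for `john smith` can match a document that stores `John Smith`."
   ]
  },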
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Mapping\n",
    "\n",
    "Mapping is kind of schema in ElasticSearch. It's automatically generated if we don't customize it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'demo': {'mappings': {'tweet': {'properties': {'date': {'format': 'dateOptionalTime',\n",
       "      'type': 'date'},\n",
       "     'name': {'type': 'string'},\n",
       "     'tweet': {'type': 'string'},\n",
       "     'user_id': {'type': 'long'}}}}}}"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.get_mapping('demo', 'tweet')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can add a new field using `put_mapping` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'demo': {'mappings': {'tweet': {'properties': {'date': {'format': 'dateOptionalTime',\n",
       "      'type': 'date'},\n",
       "     'name': {'type': 'string'},\n",
       "     'tweet': {'type': 'string'},\n",
       "     'user_id': {'type': 'long'},\n",
       "     'very_new_field': {'type': 'string'}}}}}}"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "es.put_mapping('demo', 'tweet',\n",
    "               {'tweet':\n",
    "                {'properties':\n",
    "                 {'very_new_field': {'type': 'string'}}}})\n",
    "\n",
    "es.get_mapping('demo', 'tweet')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can't change mapping of an existing field though:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:elasticsearch:PUT /demo/tweet/_mapping [status:400 request:0.068s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    es.put_mapping('demo', 'tweet',\n",
    "                   {'tweet':\n",
    "                    {'properties':\n",
    "                     {'very_new_field': {'type': 'long'}}}})\n",
    "except:\n",
    "    print(\"Error\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So if you must, specific your mapping before indexing to make sure things go in the way you want."
   ]
  }
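  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to specify the mapping up front is to send it when the index is created. A sketch, assuming `pyelasticsearch`'s `create_index(index, settings=...)` passes a `mappings` section through to the server as part of the request body. The index name `demo_v2` is made up, and the call is left commented out because it needs a running node:\n",
    "\n",
    "```python\n",
    "# Field types copied from the auto-generated mapping above, except that\n",
    "# very_new_field is declared as 'long' from the start.\n",
    "mapping = {\n",
    "    'tweet': {\n",
    "        'properties': {\n",
    "            'date': {'type': 'date', 'format': 'dateOptionalTime'},\n",
    "            'name': {'type': 'string'},\n",
    "            'tweet': {'type': 'string'},\n",
    "            'user_id': {'type': 'long'},\n",
    "            'very_new_field': {'type': 'long'},\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "index_body = {'mappings': mapping}\n",
    "\n",
    "# With a node running, the index could then be created like this\n",
    "# ('demo_v2' is a hypothetical name):\n",
    "# es.create_index('demo_v2', settings=index_body)\n",
    "print(index_body['mappings']['tweet']['properties']['very_new_field'])\n",
    "```\n",
    "\n",
    "Documents indexed afterwards are then checked against these declared types instead of having types guessed from the first values seen."
   ]
  }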
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}