{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# KNOWLEDGE GRAPH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. SETUP\n",
    "To prepare your environment, you need to install some packages and enter credentials for the Watson services.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Install the necessary packages\n",
    "You need the latest versions of these packages:\n",
    "Watson Developer Cloud: a client library for Watson services.\n",
    "NLTK: leading platform for building Python programs to work with human language data.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install mammoth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install the Watson Developer Cloud package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install watson-developer-cloud==1.5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Install NLTK:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade nltk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Install IBM Object Storage Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install ibm-cos-sdk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Import packages and libraries\n",
    "Import the packages and libraries that you'll use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import pandas as pd\n",
    "import mammoth\n",
    "import os, re\n",
    "import networkx as nx\n",
    "import io\n",
    "from io import StringIO\n",
    "import matplotlib.pyplot as plt\n",
    "from collections import Iterable \n",
    "from io import BytesIO\n",
    "import zipfile\n",
    "\n",
    "import ibm_boto3\n",
    "from botocore.client import Config\n",
    "\n",
    "import nltk\n",
    "from nltk import word_tokenize,sent_tokenize,ne_chunk\n",
    "from nltk.corpus import stopwords\n",
    "\n",
    "from bs4 import BeautifulSoup\n",
    "from bs4.element import Comment\n",
    "\n",
    "\n",
    "from watson_developer_cloud import NaturalLanguageUnderstandingV1\n",
    "from watson_developer_cloud.natural_language_understanding_v1 \\\n",
    "  import Features, EntitiesOptions, SemanticRolesOptions, RelationsOptions, KeywordsOptions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nltk.download()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configuration\n",
    "Add configurable items of the notebook below\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Add your service credentials from Bluemix for the Watson services\n",
    "You must create a Watson Natural Language Understanding service on Bluemix. Create a service for Natural Language Understanding (NLU). Insert the username and password values for your NLU in the following cell. Do not change the values of the version fields.\n",
    "\n",
    "Run the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "natural_language_understanding = NaturalLanguageUnderstandingV1(\n",
    "  username= '',\n",
    "  password='',\n",
    "  version='2017-02-27')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Add your service credentials for Object Storage\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You must create Object Storage service on Bluemix. To access data in a file in Object Storage, you need the Object Storage authentication credentials. Insert the Object Storage authentication credentials as credentials_1 in the following cell after removing the current contents in the cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Insert the Object Storage authentication credentials as credentials_1 here\n",
    "\n",
    "credentials_1 = {\n",
    "    'IBM_API_KEY_ID': '',\n",
    "    'IAM_SERVICE_ID': '',\n",
    "    'ENDPOINT': '',\n",
    "    'IBM_AUTH_ENDPOINT': '',\n",
    "    'BUCKET': '',\n",
    "}\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Add credentials of config_classification.txt here\n",
    "#Insert the authentication credentials as credentials_2 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Add credentials of config__relations.txt here\n",
    "#Insert the authentication credentials as credentials_3 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Insert archive.zip as StreamingBody object "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Persistence and Storage\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 Configure Object Storage Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cos = ibm_boto3.client('s3',\n",
    "                    ibm_api_key_id=credentials_1['IBM_API_KEY_ID'],\n",
    "                    ibm_service_instance_id=credentials_1['IAM_SERVICE_ID'],\n",
    "                    ibm_auth_endpoint=credentials_1['IBM_AUTH_ENDPOINT'],\n",
    "                    config=Config(signature_version='oauth'),\n",
    "                    endpoint_url=credentials_1['ENDPOINT'])\n",
    "\n",
    "def get_file(filename):\n",
    "    '''Retrieve file from Cloud Object Storage'''\n",
    "    fileobject = cos.get_object(Bucket=credentials_1['BUCKET'], Key=filename)['Body']\n",
    "    return fileobject\n",
    "\n",
    "    \n",
    "def get_docx_file():\n",
    "    '''Retrieve file '''\n",
    "    docx_files=[]\n",
    "    zip_ref = zipfile.ZipFile(BytesIO(streaming_body_5.read()),'r')\n",
    "    paths = zip_ref.namelist()\n",
    "    for path in paths:\n",
    "        file=zip_ref.extract(path)\n",
    "        docx_files.append(file)\n",
    "    return docx_files\n",
    "\n",
    "\n",
    "\n",
    "def load_string(fileobject):\n",
    "    '''Load the file contents into a Python string'''\n",
    "\n",
    "    text = fileobject.read()\n",
    "    return text\n",
    "\n",
    "def load_df(fileobject,sheetname):\n",
    "    '''Load file contents into a Pandas dataframe'''\n",
    "    excelFile = pd.ExcelFile(fileobject)\n",
    "    df = excelFile.parse(sheetname)\n",
    "    return df\n",
    "\n",
    "def put_file(filename, filecontents):\n",
    "    '''Write file to Cloud Object Storage'''\n",
    "    resp = cos.put_object(Bucket=credentials_1['BUCKET'], Key=filename, Body=filecontents)\n",
    "    return resp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Data Preparation "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.1 Global variables and functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Maintain tagged text and plain text map\n",
    "tagTextMap ={}\n",
    "\n",
    "\n",
    "    \n",
    "def HtmlToJson(html_file):\n",
    "    '''\n",
    "    function to convert information in Html tables to Json\n",
    "    '''\n",
    "    html_list = pd.read_html(html_file)\n",
    "    # ''' converting the list of dataframe into one dataframe '''\n",
    "    dataframe = pd.concat(html_list)\n",
    "\n",
    "    dataframe, dataframe.columns, dataframe.columns.name = dataframe.iloc[1:], dataframe.loc[0].astype(str), None\n",
    "    \n",
    "    j = dataframe.to_json(orient='records')\n",
    "    info_json = json.loads(j)\n",
    "    return info_json\n",
    "\n",
    "def getRawTextandHtmlTableAsJson(docx_file):\n",
    "    '''\n",
    "    function to extract text and Html tables from docx to Json\n",
    "    '''\n",
    "    result = mammoth.convert_to_html(docx_file)\n",
    "    html_table = HtmlToJson(result.value)\n",
    "    raw_text = mammoth.extract_raw_text(docx_file)\n",
    "    raw_text = raw_text.value\n",
    "    messages = result.messages \n",
    "    return html_table,raw_text\n",
    "    \n",
    "def generate_NLUJson_HtmlTablesJson():\n",
    "    '''\n",
    "    function that returns augmented results from nlu, html tables, raw texts\n",
    "    '''\n",
    "    html_tables=[]\n",
    "    raw_texts=[]\n",
    "    docx_files=[]\n",
    "    augmented_results_from_nlu=[]\n",
    "    docx_files= get_docx_file()\n",
    "    for docx_file in docx_files:\n",
    "        with open(docx_file,'rb') as doc_file:\n",
    "            html_table,raw_text = getRawTextandHtmlTableAsJson(doc_file)\n",
    "            html_tables.append(html_table)\n",
    "            raw_texts.append(raw_text)\n",
    "    for raw_text in raw_texts:\n",
    "        response_nlu = classify_text(str(raw_text), config_classification_json)\n",
    "        augmented_results_from_nlu.append(response_nlu)\n",
    "\n",
    "    return augmented_results_from_nlu, html_tables, raw_texts\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Watson Text Classification\n",
    "Write the classification related utility functions in a modularalized form.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5.1 Watson NLU Classification "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def analyze_using_NLU(text_content):\n",
    "    '''\n",
    "    Call Watson Natural Language Understanding service to obtain analysis results.\n",
    "    '''\n",
    "    response = natural_language_understanding.analyze(\n",
    "        text= text_content,\n",
    "        features=Features(\n",
    "        entities=EntitiesOptions(),\n",
    "        relations=RelationsOptions(),\n",
    "        keywords= KeywordsOptions())\n",
    "    )\n",
    "    return response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5.2 Augumented Classification\n",
    "Custom classification utlity functions for augumenting the results of Watson NLU API call\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_sentences(text):\n",
    "    \"\"\" Split text into sentences.\n",
    "    \"\"\"\n",
    "    sentence_delimiters = re.compile(u'[\\\\[\\\\]\\n.!?]')\n",
    "    sentences = sentence_delimiters.split(text)\n",
    "    return sentences\n",
    "\n",
    "def split_into_tokens(text):\n",
    "    \"\"\" Split text into tokens.\n",
    "    \"\"\"\n",
    "    tokens = nltk.word_tokenize(text)\n",
    "    return tokens\n",
    "    \n",
    "def POS_tagging(text):\n",
    "    \"\"\" Generate Part of speech tagging of the text.\n",
    "    \"\"\"\n",
    "    POSofText = nltk.tag.pos_tag(text)\n",
    "    return POSofText\n",
    "\n",
    "def keyword_tagging(tag,tagtext,text):\n",
    "    \"\"\" Tag the text matching keywords.\n",
    "    \"\"\"\n",
    "    if (text.lower().find(tagtext.lower()) != -1):\n",
    "        return text[text.lower().find(tagtext.lower()):text.lower().find(tagtext.lower())+len(tagtext)]\n",
    "    else:\n",
    "        return 'UNKNOWN'\n",
    "    \n",
    "def regex_tagging(tag,regex,text):\n",
    "    \"\"\" Tag the text matching REGEX.\n",
    "    \"\"\"    \n",
    "    p = re.compile(regex, re.IGNORECASE)\n",
    "    matchtext = p.findall(text)\n",
    "    regex_list=[]    \n",
    "    if (len(matchtext)>0):\n",
    "        for regword in matchtext:\n",
    "            regex_list.append(regword)\n",
    "    return regex_list\n",
    "\n",
    "def chunk_tagging(tag,chunk,text):\n",
    "    \"\"\" Tag the text using chunking.\n",
    "    \"\"\"\n",
    "    parsed_cp = nltk.RegexpParser(chunk)\n",
    "    pos_cp = parsed_cp.parse(text)\n",
    "    chunk_list=[]\n",
    "    for root in pos_cp:\n",
    "        if isinstance(root, nltk.tree.Tree):               \n",
    "            if root.label() == tag:\n",
    "                chunk_word = ''\n",
    "                for child_root in root:\n",
    "                    chunk_word = chunk_word +' '+ child_root[0]\n",
    "                chunk_list.append(chunk_word)\n",
    "    return chunk_list\n",
    "    \n",
    "def augument_NLUResponse(responsejson,updateType,text,tag):\n",
    "    \"\"\" Update the NLU response JSON with augumented classifications.\n",
    "    \"\"\"\n",
    "    if(updateType == 'keyword'):\n",
    "        if not any(d.get('text', None) == text for d in responsejson['keywords']):\n",
    "            responsejson['keywords'].append({\"text\":text,\"relevance\":0.5})\n",
    "    else:\n",
    "        if not any(d.get('text', None) == text for d in responsejson['entities']):\n",
    "            responsejson['entities'].append({\"type\":tag,\"text\":text,\"relevance\":0.5,\"count\":1})        \n",
    "    \n",
    "\n",
    "def classify_text(text, config):\n",
    "    \"\"\" Perform augumented classification of the text.\n",
    "    \"\"\"\n",
    "    \n",
    "    response = analyze_using_NLU(text)\n",
    "    responsejson = response\n",
    "    \n",
    "    sentenceList = split_sentences(text)\n",
    "    \n",
    "    tokens = split_into_tokens(text)\n",
    "    \n",
    "    postags = POS_tagging(tokens)\n",
    "    \n",
    "    configjson = json.loads(config)\n",
    "    \n",
    "    for stages in configjson['configuration']['classification']['stages']:\n",
    "        for steps in stages['steps']:\n",
    "            if (steps['type'] == 'keywords'):\n",
    "                for keyword in steps['keywords']:\n",
    "                    for word in sentenceList:\n",
    "                        wordtag = keyword_tagging(keyword['tag'],keyword['text'],word)\n",
    "                        if(wordtag != 'UNKNOWN'):\n",
    "                            augument_NLUResponse(responsejson,'entities',wordtag,keyword['tag'])\n",
    "            elif(steps['type'] == 'd_regex'):\n",
    "                for regex in steps['d_regex']:\n",
    "                    for word in sentenceList:\n",
    "                        regextags = regex_tagging(regex['tag'],regex['pattern'],word)\n",
    "                        if (len(regextags)>0):\n",
    "                            for words in regextags:\n",
    "                                augument_NLUResponse(responsejson,'entities',words,regex['tag'])\n",
    "            elif(steps['type'] == 'chunking'):\n",
    "                for chunk in steps['chunk']:\n",
    "                    chunktags = chunk_tagging(chunk['tag'],chunk['pattern'],postags)\n",
    "                    if (len(chunktags)>0):\n",
    "                        for words in chunktags:\n",
    "                            augument_NLUResponse(responsejson,'entities',words,chunk['tag'])\n",
    "            else:\n",
    "                print('UNKNOWN STEP')\n",
    "    \n",
    "    return responsejson\n",
    "\n",
    "def replace_unicode_strings(response):\n",
    "    \"\"\" Convert dict with unicode strings to strings.\n",
    "    \"\"\"\n",
    "    if isinstance(response, dict):\n",
    "        return {replace_unicode_strings(key): replace_unicode_strings(value) for key, value in response.iteritems()}\n",
    "    elif isinstance(response, list):\n",
    "        return [replace_unicode_strings(element) for element in response]\n",
    "    elif isinstance(response, unicode):\n",
    "        return response.encode('utf-8')\n",
    "    else:\n",
    "        return response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5.3 Correferencing and Augmented Relations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def chunk_sentence(text):\n",
    "    \"\"\" Tag the sentence using chunking.\n",
    "    \"\"\"\n",
    "    grammar = \"\"\"\n",
    "      NP: {<DT|JJ|PRP|NN.*>+} # Chunk sequences of DT,JJ,NN\n",
    "          #}<VB*|DT|JJ|RB|PRP><NN.*>+{  # Chink sequences of VB,DT,JJ,NN       \n",
    "      PP: {<IN><NP>}               # Chunk prepositions followed by NP\n",
    "      V: {<V.*>}                   # Verb      \n",
    "      VP: {<VB*><NP|PP|CLAUSE>+}  # Chunk verbs and their arguments\n",
    "      CLAUSE: {<NP><VP>}           # Chunk NP, VP\n",
    "      \"\"\"  \n",
    "    parsed_cp = nltk.RegexpParser(grammar,loop=2)\n",
    "    pos_cp = parsed_cp.parse(text)\n",
    "    return pos_cp\n",
    "    \n",
    "def find_attrs(subtree,phrase):\n",
    "    attrs = ''\n",
    "    if phrase == 'NP':\n",
    "        for nodes in subtree:\n",
    "            if nodes[1] in ['DT','PRP$','POS','JJ','CD','ADJP','QP','NP','NNP']:\n",
    "                attrs = attrs+' '+nodes[0]\n",
    "    return attrs    \n",
    "    \n",
    "def find_subject(t):\n",
    "    for s in t.subtrees(lambda t: t.label() == 'NP'):\n",
    "        return find_attrs(s,'NP')\n",
    "    \n",
    "def resolve_coreference(text, config):\n",
    "    \"\"\" Resolve coreferences in the text for Nouns that are Subjects in a sentence\n",
    "    \"\"\"\n",
    "    sentenceList = split_sentences(text)\n",
    "    referenceSubject = ''\n",
    "    sentenceText = ''\n",
    "    configjson = json.loads(config)\n",
    "    \n",
    "    for sentences in sentenceList:    \n",
    "        tokens = split_into_tokens(sentences)   \n",
    "        postags = POS_tagging(tokens)\n",
    "        sentencetags = chunk_sentence(postags)\n",
    "        subjects = find_subject(sentencetags)\n",
    "        for rules in configjson['configuration']['coreference']['rules']:\n",
    "            if (rules['type'] == 'chunking'):\n",
    "                for tags in rules['chunk']:\n",
    "                    chunktags = chunk_tagging(tags['tag'],tags['pattern'],postags)\n",
    "                    if (len(chunktags)>0):\n",
    "                        for words in chunktags:\n",
    "                            if tags['tag'] == 'PRP':\n",
    "                                if subjects == '':\n",
    "                                    sentenceText = sentenceText+sentences.replace(words,referenceSubject)+'. '\n",
    "                            elif tags['tag'] == 'NAME':\n",
    "                                if words == subjects:\n",
    "                                    referenceSubject = words\n",
    "                                    sentenceText = sentenceText+sentences+'. '\n",
    "                    \n",
    "    return sentenceText\n",
    "\n",
    "def disambiguate_entities(text):\n",
    "    \"\"\" Resolve disambiguity in the text using entities and entity resolution performed using Watson NLU\n",
    "    \"\"\"    \n",
    "    sentenceList = split_sentences(text)\n",
    "    taggedtext = text\n",
    "    response = analyze_using_NLU(text)\n",
    "    responsejson = response\n",
    "    for sentences in sentenceList:\n",
    "        tokens = split_into_tokens(sentences)\n",
    "        postags = POS_tagging(tokens)\n",
    "        name_tagged_text = chunk_tagging('NAME','NAME:{<NNP>+}',postags)\n",
    "    for entities in responsejson['entities']:\n",
    "        regexstr = entities['text']+'(?!>)'\n",
    "        regex = re.compile(regexstr, re.IGNORECASE)\n",
    "        tagText = '<'+entities['type']+':'+entities['text']+'>'\n",
    "        taggedtext = re.sub(regexstr,tagText,taggedtext)\n",
    "        tagTextMap[tagText] = entities['text']\n",
    "    \n",
    "    for roles in responsejson['semantic_roles']:\n",
    "        if 'entities' not in roles['subject']:\n",
    "            print('NO ENTITY')\n",
    "        else:\n",
    "            for entity in roles['subject']['entities']:\n",
    "                if 'disambiguation' not in entity:\n",
    "                    print('NO DISAMBIGUATION')\n",
    "                else:\n",
    "                    regexstr = roles['subject']['text']+'(?!>)'\n",
    "                    regex = re.compile(regexstr, re.IGNORECASE)\n",
    "                    tagText = '<'+entity['type']+':'+entity['text']+'>'\n",
    "                    taggedtext = re.sub(regexstr,tagText,taggedtext)\n",
    "                    tagTextMap[tagText] = entity['text']\n",
    "    \n",
    "    return taggedtext\n",
    "\n",
    "def extract_relations(text, config,relations):\n",
    "    \"\"\" Extract entity relationships in a sentence\n",
    "    \"\"\"    \n",
    "    sentenceList = split_sentences(text)\n",
    "    configjson = json.loads(config)\n",
    "     \n",
    "    for sentences in sentenceList:\n",
    "        for rules in configjson['configuration']['relations']['rules']:\n",
    "            if (rules['type'] == 'd_regex'):\n",
    "                for regex in rules['d_regex']:\n",
    "                    regextags = regex_tagging(regex['tag'],regex['pattern'],sentences)\n",
    "                    if (len(regextags)>0):\n",
    "                        for words in regextags:\n",
    "                            relations.append((tagTextMap[words[0]],regex['tag'],tagTextMap[words[2]]))\n",
    "         \n",
    "    return relations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5.4 Knowledge graph utility functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_nodes_dataframe(G):\n",
    "    '''\n",
    "    function to create nodes dataframe\n",
    "    '''\n",
    "    nodes_df = pd.DataFrame(list(G.nodes(data=True)))\n",
    "    nodes_df.columns = ['entity_names','entitity_attributes']\n",
    "    return nodes_df\n",
    "\n",
    "def create_entityNodes(G,results_from_nlu):\n",
    "    '''\n",
    "    function to create entity nodes\n",
    "    '''\n",
    "    for j in range(len(results_from_nlu)):\n",
    "        for i in range(len(results_from_nlu[j]['entities'])):\n",
    "            new_node_name = results_from_nlu[j]['entities'][i]['text']\n",
    "            G.add_node(new_node_name)\n",
    "            for k,v in results_from_nlu[j]['entities'][i].items():\n",
    "                if( k != 'text'):\n",
    "                    G.node[new_node_name][k]=v\n",
    "\n",
    "def filter_and_format_relations(relationships):\n",
    "    '''\n",
    "    function to filter and format relations\n",
    "    '''\n",
    "    req_relations=[]\n",
    "    filter_relations=[]\n",
    "    relations= relationships\n",
    "    for rel in relations:\n",
    "        r= rel['type']\n",
    "        score = rel['score']\n",
    "        entity_name= list()\n",
    "        entity_type= list()\n",
    "        for arg in rel['arguments']:\n",
    "            entity_name.append(arg['entities'][0]['text'])\n",
    "            entity_type.append(arg['entities'][0]['type'])\n",
    "        if((entity_type[0] == 'GeopoliticalEntity' or entity_type[1] == 'GeopoliticalEntity')):\n",
    "            if(any(nodes_df['entity_names']== entity_name[0] ) and any(nodes_df['entity_names']== entity_name[1])):\n",
    "                filter_relations.append(rel)\n",
    "    \n",
    "    \n",
    "    for filter_rel in filter_relations:\n",
    "        r= filter_rel['type']\n",
    "        score = filter_rel['score']\n",
    "        text= list()\n",
    "        for arg in filter_rel['arguments']:\n",
    "            text.append(arg['entities'][0]['text'])\n",
    "\n",
    "        rel_tuple= list()\n",
    "        rel_tuple.append(text[0])\n",
    "        rel_tuple.append(r)\n",
    "        rel_tuple.append(text[1])\n",
    "\n",
    "        rel_tuple= tuple(rel_tuple)\n",
    "\n",
    "        req_relations.append(rel_tuple)\n",
    "\n",
    "        \n",
    "    return req_relations\n",
    "            \n",
    "    \n",
    "def draw_simple_graph(graph):\n",
    "    '''\n",
    "    funtion to draw graph\n",
    "    '''\n",
    "    nodes = []\n",
    "    labels = []\n",
    "    edges = []\n",
    "    # extract nodes from graph\n",
    "    for tuples in graph:\n",
    "        nodes.append(tuples[0])\n",
    "        nodes.append(tuples[2])\n",
    "        \n",
    "    # extract edges from graph\n",
    "    for edgepairs in graph:\n",
    "        edges.append((edgepairs[0],edgepairs[2]))        \n",
    "    # extract edge labels from graph\n",
    "    for edgetuples in graph:\n",
    "        labels.append(edgetuples[1])\n",
    "    # create networkx graph\n",
    "    G=nx.Graph()\n",
    "    # add nodes\n",
    "    for node in nodes:\n",
    "        G.add_node(node)\n",
    "    # add edges\n",
    "    for edge in graph:\n",
    "        G.add_edge(edge[0], edge[2])\n",
    "\n",
    "    # draw graph\n",
    "    pos = nx.shell_layout(G)\n",
    "    #print(pos)\n",
    "    nx.draw(G, pos,with_labels = True)\n",
    "    edge_labels = dict(zip(edges, labels))\n",
    "    nx.draw_networkx_edge_labels(G, pos, edge_labels = edge_labels)\n",
    "\n",
    "    # show graph\n",
    "    plt.show()\n",
    "    \n",
    "    return G, pos, edge_labels   \n",
    "\n",
    "\n",
    "def knowledge_graph(results_from_nlu):\n",
    "    '''\n",
    "    funtion to draw knowledge graph\n",
    "    '''\n",
    "    relationships=[]\n",
    "    for i in range(len(results_from_nlu)):\n",
    "        for j in range(len(results_from_nlu[i]['relations'])):\n",
    "            relationships.append(results_from_nlu[i]['relations'][j])\n",
    "    rel=[]\n",
    "    configjson= json.loads(config_relation_json)\n",
    "    for i in range(len(raw_texts)):\n",
    "        res = extract_relations(raw_texts[i],configjson)\n",
    "        if res:\n",
    "            rel= res\n",
    "    response=filter_and_format_relations(relationships)\n",
    "    response= response + rel\n",
    "    G, pos, edge_labels = draw_simple_graph(response)\n",
    "    return G, pos, edge_labels\n",
    "\n",
    "\n",
    "def disambiguate_entities(text):\n",
    "    '''\n",
    "    funtion to disambiguate entities\n",
    "    '''\n",
    "    keyword=[]\n",
    "    for j in range(len(results_from_nlu)):\n",
    "        for i in range(len(results_from_nlu[j]['keywords'])):\n",
    "            for k,v in results_from_nlu[j]['keywords'][i].items():\n",
    "                if(k=='text'):\n",
    "                    keyword.append(v)\n",
    "                \n",
    "    for word in keyword:\n",
    "        tag= '<Keyword:'\n",
    "        if word in text:\n",
    "            text=re.sub(word,tag+word+'>' ,text)\n",
    "    return text\n",
    "\n",
    "def extract_relations(text,configjson):\n",
    "    '''\n",
    "    funtion to extract relations\n",
    "    '''\n",
    "    relationship=[]\n",
    "    relations= configjson['configuration']['relations']['rules']\n",
    "    text=disambiguate_entities(text)\n",
    "    for rel in relations:\n",
    "        match= re.findall(rel['pattern'],text)\n",
    "        if match:\n",
    "            temp1= re.split('<Keyword:',match[0][0])\n",
    "            temp2= re.split('<Keyword:',match[0][2])\n",
    "            match1= re.split('>',temp1[1])\n",
    "            match2= re.split('>',temp2[1])\n",
    "            tuplerel=[]\n",
    "            tuplerel.append(match1[0])\n",
    "            tuplerel.append(rel['tag'])\n",
    "            tuplerel.append(match2[0])\n",
    "            relationship.append(tuple(tuplerel))\n",
    "    return relationship"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Process"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_classification_json=load_string(get_file(credentials_2['FILE'])).decode(\"utf-8\")\n",
    "\n",
    "config_relation_json=load_string(get_file(credentials_3['FILE'])).decode(\"utf-8\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results_from_nlu, results_from_htmlTable, raw_texts = generate_NLUJson_HtmlTablesJson()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G = nx.MultiDiGraph()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "create_entityNodes(G,results_from_nlu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "nodes_df = create_nodes_dataframe(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G, pos, edge_labels = knowledge_graph(results_from_nlu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.5",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}