{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CWPK \\#62: Network and Graph Analysis\n", "\n", "\n", "Knowledge Graphs Deserve Attention in Their Own Right\n", "--------------------------\n", "\n", "
\n", "\n", "\n", "\n", "
\n", "\n", "We first introduced [NetworkX](https://networkx.github.io/) in installment [**CWPK #56**](https://www.mkbergman.com/2404/cwpk-56-graph-visualization-and-extraction/) of our [*Cooking with Python and KBpedia*](https://www.mkbergman.com/cooking-with-python-and-kbpedia/) series. The purpose of NetworkX in that installment was to stage data for graph visualizations. In today's installment, we look at the other side of the NetworkX coin; that is, as a graph analytics capability. We will also discuss NetworkX in relation to staging data for machine learning.\n", "\n", "The idea of [graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) or [networks](https://en.wikipedia.org/wiki/Network_theory) is at the center of the concept of [knowledge graphs](https://en.wikipedia.org/wiki/Knowledge_graph). Graphs are unique information artifacts that can be analyzed in their own right as well as being foundations for many unique analytical techniques, including for [machine learning](https://en.wikipedia.org/wiki/Machine_learning) and its [deep learning](https://en.wikipedia.org/wiki/Deep_learning) subset. Still, graphs as conceptual and mathematical structures are of relatively recent vintage. For example, the field known as [graph theory](https://en.wikipedia.org/wiki/Graph_theory) is less than 300 years old. I outlined much of the intellectual history of graphs and their role in analysis in a 2012 article, [*The Age of the Graph*](https://www.mkbergman.com/1020/the-age-of-the-graph/). \n", "\n", "Graph or network analysis has three principal aspects. The first aspect is to analyze the nature of the graph itself, with its connections, topologies and paths. The second is to use the structural aspects of the graph representation in order to conduct unique analyses. Some of these analyses relate to community or influence or relatedness. The third aspect is to use various or all aspects of the graph representation of the domain to provide, through dimensionality reduction, tractable and feature-rich methods for analyzing or conducting data science work useful to the domain. We'll briefly cover the first two aspects in this installment. The remaining installments in this **Part VI** relate more to the third aspect of graph and deep graph representation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initial Setup\n", "\n", "We will pick up with our NetworkX work from [**CWPK #56**](https://www.mkbergman.com/2404/cwpk-56-graph-visualization-and-extraction/) to start this installment. (See the concluding sections below if you have not already generated the graph_specs.csv file.)\n", "\n", "Since I have been away from the code for a bit, I first decide to make sure my Python packages are up-to-date by running this standard command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ ">>>conda update --all" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we invoke our standard start-up routine:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cowpoke.__main__ import *\n", "from cowpoke.config import *\n", "from owlready2 import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then want to bring NetworkX into our workspace, along with pandas for data management. The routine we are going to write will read our earlier graph_specs.csv file using pandas. 
We will use this specification to create a NetworkX representation of the KBpedia structure, and then begin reporting on some basic graph stats (which will take a few seconds to run):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkx as nx\n", "import pandas as pd\n", "\n", "df = pd.read_csv('C:/1-PythonProjects/kbpedia/v300/extractions/data/graph_specs.csv')\n", "Graphtype = nx.DiGraph()\n", "G = nx.from_pandas_edgelist(df, edge_attr='weight', create_using=Graphtype)\n", "print('Graph construction complete.')\n", "\n", "# Print the number of nodes in the graph\n", "print('Number of Nodes:', len(G.nodes()))\n", "\n", "print('Edges:', G.edges('Mammal'))\n", "\n", "# Get the subgraph of all nodes around the 'Mammal' node\n", "sub = [ n[1] for n in G.edges('Mammal') ]\n", "# Also need to add the 'Mammal' node itself\n", "sub.append('Mammal')\n", "\n", "# Now create a new graph, which is the subgraph\n", "sg = nx.Graph(G.subgraph(sub))\n", "\n", "# Print the nodes and edges of the new subgraph\n", "print('Subgraph nodes:', sg.nodes())\n", "print('Subgraph edges:', sg.edges())\n", "\n", "# Print basic graph info\n", "print('Basic graph info:', nx.info(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have picked 'Mammal' to generate some subgraphs, and we also call up basic graph info from NetworkX. As a [directed graph](https://en.wikipedia.org/wiki/Directed_graph), KBpedia can be characterized by both 'in degree' and 'out degree'. 'In degree' is the number of edges pointing to a given node (or vertex); 'out degree' is the opposite. The average across all nodes in KBpedia exceeds 1.3. The two averages are necessarily identical, since every subClassOf edge contributes one in-degree to one node and one out-degree to another." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Network Metrics and Operations\n", "\n", "So we see that our KBpedia graph has loaded properly, and now we are ready to do some basic network analysis. Most of the analysis deals with the relations structure of the graph. NetworkX has a very clean interface to common measures and metrics of graphs, as our examples below demonstrate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'**Density**' is the ratio of actual edges in the network to all possible edges in the network, and ranges from 0 to 1. A 'dense' graph is one where the number of edges is close to the maximal number; a 'sparse' graph is the opposite. For an undirected graph the maximal number of edges is nodes X (nodes - 1) / 2; this doubles to nodes X (nodes - 1) for a directed graph, since A → B and B → A are both possible. The density is thus the actual number of connections divided by the potential number. By this measure, KBpedia is quite sparse. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Density:', nx.density(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'**Degree**' is a measure for finding the most important nodes in a graph, since a node's degree is the count of its edges. You can find the degree for an individual node, as here, or rank all nodes by degree, as the sketch after this cell indicates: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Degree:', nx.degree(G,'Mammal'))" ] }, 
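{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick extension (not part of the original run), the degree view can be sorted to rank the most connected nodes overall:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Rank all nodes by total degree (in + out) and show the top 10\n", "top_degree = sorted(G.degree, key=lambda pair: pair[1], reverse=True)\n", "print('Top 10 nodes by degree:', top_degree[:10])" ] }, 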
{ "cell_type": "markdown", "metadata": {}, "source": [ "'**Average clustering**' is the average of the clustering coefficients across all nodes. A node's clustering coefficient is the share of actual links among its neighbors relative to the potential links among them. A [small-world network](https://en.wikipedia.org/wiki/Small-world_network) is one where the distance between random nodes grows in proportion to the natural log of the number of nodes in the graph, even while clustering remains comparatively high." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Average clustering:', nx.average_clustering(G))\n", "\n", "G_node = 'Mammal'\n", "print('Clustering for node', G_node, ':', nx.clustering(G, G_node))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'**Path length**' is calculated as the number of hops needed to traverse between two end nodes in a network. An '**average path length**' measures shortest paths over a graph and then averages them. A small number indicates a shorter, more easily navigated graph on average, but there can be much variance. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Average shortest path length:', nx.average_shortest_path_length(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next three measures throw an error, since KBpedia 'is not strongly connected.' '**Eccentricity**' is the maximum shortest-path distance between a node and all other reachable nodes in a graph, with the '**diameter**' being the maximum eccentricity across all nodes and the '**radius**' being the minimum." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Eccentricity:', nx.eccentricity(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Diameter:', nx.diameter(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Radius:', nx.radius(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithms that follow take longer to calculate or produce long listings. The first such measure we see is '**centrality**'; in NetworkX, degree centrality is the fraction of all other nodes to which a given node connects, with higher connectivity a proxy for importance. **Centrality** can be measured in many different ways; there are multiple options in NetworkX. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate different centrality measures\n", "print('Centrality:', nx.degree_centrality(G))\n", "print('Centrality (eigenvector):', nx.eigenvector_centrality(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('In-degree centrality:', nx.in_degree_centrality(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Out-degree centrality:', nx.out_degree_centrality(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some longer analysis routines (unfortunately, betweenness takes hours to calculate):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate betweenness centrality\n", "print('Betweenness:', nx.betweenness_centrality(G))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because KBpedia is a directed graph, some NetworkX measures are not applicable to it, including:\n", "- nx.is_connected(G)\n", "- nx.connected_components(G)." ] }, 
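{ "cell_type": "markdown", "metadata": {}, "source": [ "Both, though, can still be applied to an undirected copy of the graph. A minimal sketch (an addition to the original run):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Collapse edge direction first; the undirected-only measures then apply\n", "U = G.to_undirected()\n", "print('Connected?', nx.is_connected(U))\n", "print('Number of components:', nx.number_connected_components(U))\n", "print('Largest component size:', len(max(nx.connected_components(U), key=len)))" ] }, 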
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Subgraphs\n", "\n", "We earlier showed code for extracting a subgraph. Here is a generalized version of that function. Replace the 'Bird' reference concept with any other valid RC from KBpedia:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Provide label for current KBpedia reference concept\n", "rc = 'Bird'\n", "# Get the subgraph of all nodes around node\n", "sub = [ n[1] for n in G.edges(rc) ]\n", "# Actually, need to add the 'rc' node too\n", "sub.append(rc)\n", "#\n", "# Now create a new graph, which is the subgraph\n", "sg = nx.Graph(G.subgraph(sub))\n", "#\n", "# Print the nodes of the new subgraph and edges\n", "print('Subgraph nodes:', sg.nodes())\n", "print('Subgraph edges:', sg.edges())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DeepGraphs\n", "There is a notable utility package called [DeepGraphs](https://github.com/deepgraph/deepgraph) (and its [documentation](https://deepgraph.readthedocs.io/en/latest/index.html)) that appears to offer some nice partitioning and quick visualization options. I have not installed or tested it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Full Network Exchange\n", "\n", "So far, we have seen the use of networks in driving visualizations ([**CWPK #56**](https://www.mkbergman.com/2404/cwpk-56-graph-visualization-and-extraction/)) and, per above, as knowledge artifacts with their own unique characteristics and metrics. The next role we need to highlight for networks is as information providers and graph-based representations of structure and features to analytical applications and machine learners.\n", "\n", "NetworkX can [convert to and from other data formats](https://networkx.github.io/documentation/latest/reference/convert.html):\n", " - [NumPy](https://en.wikipedia.org/wiki/NumPy) arrays\n", " - [SciPy](https://en.wikipedia.org/wiki/SciPy) sparse matrices, or\n", " - [pandas](https://en.wikipedia.org/wiki/Pandas_(software)) DataFrames.\n", " \n", "All of these are attractive because [PyTorch](https://en.wikipedia.org/wiki/PyTorch) has direct routines for them.\n", "\n", "NetworkX can also [read and write graphs](https://networkx.github.io/documentation/stable/reference/readwrite/index.html) in multiple formats, some of which include:\n", " - adjacency lists\n", " - multiline adjacency lists\n", " - edge lists\n", " - [pickle](https://en.wikipedia.org/wiki/Serialization#Pickle)\n", " - [JSON](https://en.wikipedia.org/wiki/JSON)\n", " - [YAML](https://en.wikipedia.org/wiki/YAML)\n", " - [GraphML](https://en.wikipedia.org/wiki/GraphML), and\n", " - [GEFX](https://gephi.org/gexf/format/).\n", " \n", "There are also standard NetworkX functions to convert node and edge labels to integers (such as networkx.relabel.convert_node_labels_to_integers), relabel nodes (networkx.relabel.relabel_nodes), set node attributes (networkx.classes.function.set_node_attributes), or make deep copies (networkx.Graph.to_directed).\n", "\n", "There are also certain packages that integrate well with NetworkX and PyTorch and related packages such as direct imports or exports to the [Deep Graph Library](https://www.dgl.ai/) (DGL) (see **CWPK #68** and **#69**), or [built-in converters](https://discuss.pytorch.org/t/pytorch-geometric/44994/4) or the [DeepSNAP](https://github.com/snap-stanford/deepsnap) package may provide a direct bridge between NetworkX and [PyTorch 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Certain packages also integrate well with NetworkX and PyTorch: there are direct imports or exports to the [Deep Graph Library](https://www.dgl.ai/) (DGL) (see **CWPK #68** and **#69**), and either [built-in converters](https://discuss.pytorch.org/t/pytorch-geometric/44994/4) or the [DeepSNAP](https://github.com/snap-stanford/deepsnap) package may provide a direct bridge between NetworkX and [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) (PyG) (see **CWPK #68** and **#70**).\n", "\n", "However, these representations do **NOT** include the labeled information or annotations. Knowledge graphs, like KBpedia, have some unique aspects that are not fully captured by an existing package like NetworkX.\n", "\n", "Fortunately, the previous extract-and-build routines at the heart of this *Cooking with Python and KBpedia* series are based around CSV files, the same basis as the pandas package. Via pandas we can capture the ***structure*** of KBpedia, plus its labels and ***annotations***. Further, as we will see in the next installment, we can also capture full ***pages*** for most of these RCs in KBpedia from Wikipedia. This addition will greatly expand our context and feature basis for using KBpedia for machine learning.\n", "\n", "For now, I present below two of these three inputs, extracted directly from the KBpedia knowledge graph." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### KBpedia Structure\n", "The first of two extraction files useful to all further installments in this **Part VI** provides the structure of KBpedia. This structure consists of the hierarchical relations between reference concepts using the subClassOf subsumption relation, plus the assignment of each RC to a typology (SuperType). I first presented this routine in [**CWPK #56**](https://www.mkbergman.com/2404/cwpk-56-graph-visualization-and-extraction/), and it indeed captures the requisite structure of the graph:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### KEY CONFIG SETTINGS (see extract_deck in config.py) ### \n", "# 'kb_src' : 'standard' # Set in master_deck\n", "# 'loop_list' : kko_order_dict.values(), # Note 1 \n", "# 'base' : 'C:/1-PythonProjects/kbpedia/v300/build_ins/mappings/', \n", "# 'ext' : '.csv', \n", "# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/extractions/data/graph_specs.csv',\n", "\n", "def graph_extractor(**extract_deck):\n", "    print('Beginning graph structure extraction . . .')\n", "    loop_list = extract_deck.get('loop_list')\n", "    loop = extract_deck.get('loop')\n", "    class_loop = extract_deck.get('class_loop')\n", "    base = extract_deck.get('base')\n", "    ext = extract_deck.get('ext')\n", "\n", "    # Note 2\n", "    parent_set = ['kko.SocialSystems','kko.Products','kko.Methodeutic','kko.Eukaryotes',\n", "                  'kko.ConceptualSystems','kko.AVInfo','kko.Systems','kko.Places',\n", "                  'kko.OrganicChemistry','kko.MediativeRelations','kko.LivingThings',\n", "                  'kko.Information','kko.CopulativeRelations','kko.Artifacts','kko.Agents',\n", "                  'kko.TimeTypes','kko.Symbolic','kko.SpaceTypes','kko.RepresentationTypes',\n", "                  'kko.RelationTypes','kko.OrganicMatter','kko.NaturalMatter',\n", "                  'kko.AttributeTypes','kko.Predications','kko.Manifestations',\n", "                  'kko.Constituents']\n", "\n", "    if loop != 'class_loop':\n", "        print(\"Needs to be a 'class_loop'; returning program.\")\n", "        return\n", "    header = ['target', 'source', 'weight', 'SuperType']\n", "    out_file = extract_deck.get('out_file')\n", "    cur_list = []\n", "    with open(out_file, mode='w', encoding='utf8', newline='') as output: \n", "        csv_out = csv.writer(output)\n", "        csv_out.writerow(header) \n", "        for value in loop_list:\n", 
"            print(' . . . processing', value)\n", "            s_set = []\n", "            root = eval(value)\n", "            s_set = root.descendants()\n", "            frag = value.replace('kko.','')\n", "            for s_item in s_set:\n", "                child_set = list(s_item.subclasses())\n", "                count = len(list(child_set))\n", "\n", "                # Note 3\n", "                if value not in parent_set:\n", "                    for child_item in child_set:\n", "                        s_rc = str(s_item)\n", "                        child = str(child_item)\n", "                        new_pair = s_rc + child\n", "                        new_pair = str(new_pair)\n", "                        cur_list.append(new_pair)\n", "                        s_rc = s_rc.replace('rc.','')\n", "                        child = child.replace('rc.','')\n", "                        row_out = (s_rc,child,count,frag)\n", "                        csv_out.writerow(row_out)\n", "                elif value in parent_set:\n", "                    for child_item in child_set:\n", "                        s_rc = str(s_item)\n", "                        child = str(child_item)\n", "                        new_pair = s_rc + child\n", "                        new_pair = str(new_pair)\n", "                        if new_pair not in cur_list:\n", "                            cur_list.append(new_pair)\n", "                            s_rc = s_rc.replace('rc.','')\n", "                            child = child.replace('rc.','')\n", "                            row_out = (s_rc,child,count,frag)\n", "                            csv_out.writerow(row_out)\n", "                        elif new_pair in cur_list:\n", "                            continue\n", "    output.close() \n", "    print('Processing is complete . . .')\n", "graph_extractor(**extract_deck)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, again, the parent_set ordering of typology processing at the top of this function. This ordering processes the more distal (leaf) typologies first, and then ignores subsequent processing of identical structural relationships. This means that the graph structure is cleaner and all subsumption relations are \"pushed down\" to their most specific mention.\n", " \n", "You can inspect the actual structure file produced by this routine, which is also the general basis for reading into various machine learners:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('C:/1-PythonProjects/kbpedia/v300/extractions/data/graph_specs.csv')\n", "\n", "df" ] }, 
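{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick profile of this structure file (an addition to the original run), we can count how many edges each typology contributed, using the df just loaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Edges contributed per typology (the 'SuperType' column)\n", "print(df['SuperType'].value_counts())" ] }, 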
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### KBpedia Annotations\n", "And, we also need to bring in the annotation values. The annotation extraction routine was first presented and described in [**CWPK #30**](https://www.mkbergman.com/2365/cwpk-30-extracting-annotations/), and was subsequently generalized and brought into conformance with our configuration routines in [**CWPK #33**](https://www.mkbergman.com/2370/cwpk-33-a-python-package-part-i-the-annotation-extractor/). Note, for example, in the header definition, how we are able to handle either classes or properties. In this instance, plus all subsequent machine learning discussion, we concentrate on the labels and annotations for classes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### KEY CONFIG SETTINGS (see extract_deck in config.py) ### \n", "# 'krb_src' : 'extract' # Set in master_deck\n", "# 'descent_type' : 'descent',\n", "# 'loop' : 'class_loop',\n", "# 'loop_list' : custom_dict.values(), # Single 'Generals' specified \n", "# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/extractions/classes/Generals_annot_out.csv',\n", "# 'render' : 'r_label',\n", "\n", "def annot_extractor(**extract_deck):\n", "    print('Beginning annotation extraction . . .')\n", "    r_default = ''\n", "    r_label = ''\n", "    r_iri = ''\n", "    render = extract_deck.get('render')\n", "    if render == 'r_default':\n", "        set_render_func(default_render_func)\n", "    elif render == 'r_label':\n", "        set_render_func(render_using_label)\n", "    elif render == 'r_iri':\n", "        set_render_func(render_using_iri)\n", "    else:\n", "        print('You have assigned an incorrect render method--execution stopping.')\n", "        return\n", "    loop_list = extract_deck.get('loop_list')\n", "    loop = extract_deck.get('loop')\n", "    out_file = extract_deck.get('out_file')\n", "    class_loop = extract_deck.get('class_loop')\n", "    property_loop = extract_deck.get('property_loop')\n", "    descent_type = extract_deck.get('descent_type')\n", "    \"\"\" These are internal counters used in this module's methods \"\"\"\n", "    p_set = []\n", "    a_ser = []\n", "    x = 0   # counter of unique IDs written\n", "    cur_list = []\n", "    with open(out_file, mode='w', encoding='utf8', newline='') as output:\n", "        csv_out = csv.writer(output)\n", "        if loop == 'class_loop':\n", "            header = ['id', 'prefLabel', 'subClassOf', 'altLabel',\n", "                      'definition', 'editorialNote', 'isDefinedBy', 'superClassOf']\n", "        else:\n", "            header = ['id', 'prefLabel', 'subPropertyOf', 'domain', 'range',\n", "                      'functional', 'altLabel', 'definition', 'editorialNote']\n", "        csv_out.writerow(header)\n", "        for value in loop_list:\n", "            print(' . . . processing', value)\n", "            root = eval(value)\n", "            if descent_type == 'descent':\n", "                p_set = root.descendants()\n", "            elif descent_type == 'single':\n", "                a_set = root\n", "                p_set.append(a_set)\n", "            else:\n", "                print('You have assigned an incorrect descent method--execution stopping.')\n", "                return\n", "            for p_item in p_set:\n", "                if p_item not in cur_list:\n", "                    a_pref = p_item.prefLabel\n", "                    a_pref = str(a_pref)[1:-1].strip('\"\\'')\n", "                    a_sub = p_item.is_a\n", "                    for a_id, a in enumerate(a_sub):\n", "                        a_item = str(a)\n", "                        if a_id > 0:\n", "                            a_item = a_sub + '||' + str(a)\n", "                        a_sub = a_item\n", "                    if loop == 'property_loop':\n", "                        a_item = ''\n", "                        a_dom = p_item.domain\n", "                        for a_id, a in enumerate(a_dom):\n", "                            a_item = str(a)\n", "                            if a_id > 0:\n", "                                a_item = a_dom + '||' + str(a)\n", "                            a_dom = a_item\n", "                        a_dom = a_item\n", "                        a_rng = p_item.range\n", "                        a_rng = str(a_rng)[1:-1]\n", "                        a_func = ''\n", "                    a_item = ''\n", "                    a_alt = p_item.altLabel\n", "                    for a_id, a in enumerate(a_alt):\n", "                        a_item = str(a)\n", "                        if a_id > 0:\n", "                            a_item = a_alt + '||' + str(a)\n", "                        a_alt = a_item\n", "                    a_alt = a_item\n", "                    a_def = p_item.definition\n", "                    a_def = str(a_def)[2:-2]\n", "                    a_note = p_item.editorialNote\n", "                    a_note = str(a_note)[1:-1]\n", "                    if loop == 'class_loop':\n", "                        a_isby = p_item.isDefinedBy\n", "                        a_isby = str(a_isby)[2:-2]\n", "                        a_isby = a_isby + '/'\n", "                        a_item = ''\n", "                        a_super = p_item.superClassOf\n", "                        for a_id, a in enumerate(a_super):\n", "                            a_item = str(a)\n", "                            if a_id > 0:\n", "                                a_item = a_super + '||' + str(a)\n", "                            a_super = a_item\n", "                        a_super = a_item\n", "                    if loop == 'class_loop':\n", "                        row_out = (p_item,a_pref,a_sub,a_alt,a_def,a_note,a_isby,a_super)\n", "                    else:\n", "                        row_out = (p_item,a_pref,a_sub,a_dom,a_rng,a_func,\n", "                                   a_alt,a_def,a_note)\n", "                    csv_out.writerow(row_out)\n", "                    cur_list.append(p_item)\n", "                    x = x + 1\n", "    print('Total unique IDs written to file:', x)\n", "    print('The annotation extraction for the', loop, 'is completed.')" ] }, 
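{ "cell_type": "markdown", "metadata": {}, "source": [ "Unlike graph_extractor above, this listing does not show the invoking call. Assuming the same configuration pattern, it would be run as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run the annotation extraction with the active configuration\n", "annot_extractor(**extract_deck)" ] }, 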
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('C:/1-PythonProjects/kbpedia/v300/extractions/classes/Generals_annot_out.csv')\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will add Wikipedia **pages** as a third source for informing our machine learning tests and experiments in our next installment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Untested Potentials\n", "One area in extended NetworkX capabilities that we do not test here is [community structure](https://en.wikipedia.org/wiki/Community_structure) using the [Louvain Community Detection](https://github.com/taynaud/python-louvain/) package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional Documentation\n", "Here are additional resources on network analysis and NetworkX:\n", "\n", "- [Exploring and Analyzing Network Data with Python](https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python#metrics-available-in-networkx) is a good description of basic measures \n", "- [Analysis of Large-Scale Networks: NetworkX](http://www.jponnela.com/dev/wp-content/uploads/2014/02/networkx.pdf) provides good examples and explanations\n", "- [Graph Algorithms: Graph Analysis and Graph Learning](https://maelfabien.github.io/machinelearning/graph_3/) are nice examples and drawings\n", "- [Network Analysis with Python and NetworkX Cheat Sheet](https://cheatography.com/murenei/cheat-sheets/network-analysis-with-python-and-networkx/)\n", "- [Functions to convert NetworkX graphs to and from other formats](http://home.iitk.ac.in/~mudgal/cs365/hw3/code/networkx-1.9.1/networkx/convert.py)\n", "- [Knowledge Graph Representation with Jointly Structural and Textual Encoding](https://arxiv.org/pdf/1611.08661.pdf) reflects the structure + text nature of knowledge graphs.\n", "\n", "\n", "
\n", " NOTE: This article is part of the Cooking with Python and KBpedia series. See the CWPK listing for other articles in the series. KBpedia has its own Web site. The cowpoke Python code listing covering the series is also available from GitHub.\n", "
\n", "\n", "
\n", "\n", "NOTE: This CWPK \n", "installment is available both as an online interactive\n", "file or as a direct download to use locally. Make sure and pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.
\n", "\n", "
\n", "
I am at best an amateur with Python. There are likely more efficient methods for coding these steps than what I provide. I encourage you to experiment -- which is part of the fun of Python -- and to notify me should you make improvements. \n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 5 }