{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "CWPK \\#61: NLP, Machine Learning and Analysis\n", "=======================================\n", "\n", "A Wealth of Applications Sets the Stage for Pay Offs from KBpedia\n", "--------------------------\n", "\n", "
datasets.load_files
format that may be suitable for transferring many and longer text fields. One intriguing option is to leverage the CSV flat-file orientation of our KG build and extract routines in *cowpoke* for data transfer and transformation.\n",
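"\n",
"As a minimal sketch of that idea, assuming a hypothetical CSV with `id`, `label` and `text` columns (not an actual *cowpoke* export format), each row can be written into the one-subfolder-per-category layout that scikit-learn's `datasets.load_files` reads, where every subfolder name becomes a class label. Only the Python standard library is used; the helper name `csv_to_load_files_layout` and the sample rows are illustrative inventions.\n",
"\n",
"```python\n",
"import csv\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def csv_to_load_files_layout(csv_path, out_dir):\n",
"    # Hypothetical helper: write each CSV row to out_dir/LABEL/ID.txt,\n",
"    # the one-subfolder-per-category layout that load_files reads.\n",
"    out = Path(out_dir)\n",
"    counts = {}\n",
"    with open(csv_path, newline='', encoding='utf-8') as f:\n",
"        for row in csv.DictReader(f):\n",
"            label_dir = out / row['label']\n",
"            label_dir.mkdir(parents=True, exist_ok=True)\n",
"            (label_dir / (row['id'] + '.txt')).write_text(row['text'], encoding='utf-8')\n",
"            counts[row['label']] = counts.get(row['label'], 0) + 1\n",
"    return counts\n",
"\n",
"\n",
"# Made-up rows standing in for a flat-file extract.\n",
"rows = [\n",
"    ['id', 'label', 'text'],\n",
"    ['Mammal', 'Animals', 'A warm-blooded vertebrate animal.'],\n",
"    ['Bird', 'Animals', 'A feathered, egg-laying vertebrate.'],\n",
"    ['Sedan', 'Vehicles', 'A passenger car with a closed body.'],\n",
"]\n",
"work = Path(tempfile.mkdtemp())\n",
"csv_path = work / 'sample.csv'\n",
"with open(csv_path, 'w', newline='', encoding='utf-8') as f:\n",
"    csv.writer(f).writerows(rows)\n",
"\n",
"counts = csv_to_load_files_layout(csv_path, work / 'corpus')\n",
"print(counts)  # {'Animals': 2, 'Vehicles': 1}\n",
"```\n",
"\n",
"If scikit-learn is installed, `sklearn.datasets.load_files(str(work / 'corpus'), encoding='utf-8')` should then return the three texts with `target_names` of `['Animals', 'Vehicles']`.\n",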
"\n",
"I also want to keep an eye on the possible use of [skorch](https://github.com/skorch-dev/skorch) to better integrate with the overall PyTorch environment, or to add perhaps needed and missing functionality or ease of development. There is much to explore with these various packages and environments.\n",
"\n",
"#### DGL-KE\n",
"For our basic, 'vanilla', deep graph analysis package we have chosen the eponymous [Deep Graph Library](https://www.dgl.ai/) for basic graph neural network operations, which may run on CPU or GPU machines or clusters. The better interface relevant to KBpedia is through [DGL-KE](https://github.com/awslabs/dgl-ke), a high performance, reportedly easy-to-use, and scalable package for learning large-scale knowledge graph embeddings that extends DGL. DGL-KE also comes configured with the popular models of TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE.\n",
"\n",
"#### PyTorch Geometric\n",
"[PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) is closely tied to PyTorch, and most impressively has uniform wrappers to about 40 state-of-art graph neural net methods. The idea of \"message passing\" in the approach means that heterogeneous features such as structure and text may be combined and made dynamic in their interactions with one another. Many of these intrigued me on paper, and now it will be exciting to test and have the capability to inspect these new methods as they arise. [DeepSNAP](https://github.com/snap-stanford/deepsnap) may provide a direct bridge between NetworkX and PyTorch Geometric."
]
},
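{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *message passing* idea mentioned above can be illustrated without any framework. The sketch below, in plain Python with a made-up three-node directed graph and two made-up feature values per node, runs one round in which every edge carries its source node's features to its target (message), each node sums what it receives (aggregate), and then averages that sum with its own features (a deliberately simple update). PyTorch Geometric's `MessagePassing` base class generalizes these three steps and lets subclasses define them, often with learnable parameters; none of its actual API is reproduced here.\n",
"\n",
"```python\n",
"def message_passing_round(features, edges):\n",
"    # features: node -> list of floats; edges: (source, target) pairs.\n",
"    dim = len(next(iter(features.values())))\n",
"    inbox = {node: [0.0] * dim for node in features}\n",
"    # Message + aggregate: sum the source features arriving at each target.\n",
"    for src, dst in edges:\n",
"        inbox[dst] = [a + b for a, b in zip(inbox[dst], features[src])]\n",
"    # Update: average each node's own features with its aggregated messages.\n",
"    return {\n",
"        node: [(own + msg) / 2 for own, msg in zip(features[node], inbox[node])]\n",
"        for node in features\n",
"    }\n",
"\n",
"\n",
"features = {'A': [1.0, 0.0], 'B': [0.0, 1.0], 'C': [1.0, 1.0]}\n",
"edges = [('A', 'C'), ('B', 'C'), ('C', 'A')]\n",
"print(message_passing_round(features, edges))\n",
"# {'A': [1.0, 0.5], 'B': [0.0, 0.5], 'C': [1.0, 1.0]}\n",
"```\n",
"\n",
"Repeating the round lets information travel further: in a second round, node A receives C's updated features, which already include a contribution from B."
]
},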
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Possible Future Extensions\n",
"During the research on this **Part VI** I encountered a few leads that are either not ready for prime time or are off scope to the present **CWPK** series. A potentially powerful, but experimental approach that makes sense is to use SPARQL as the request-and-retrieval mechanism against the graph to feed the machine learners. [RDFFrames](https://ui.adsabs.harvard.edu/link_gateway/2020arXiv200203614M/arxiv:2002.03614) provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack; see [GitHub](https://github.com/qcri/rdfframes). Some methods above also use SPARQL. One of the benefits of a SPARQL approach, besides its sheer query and inferencing power, is the ability to keep the knowledge graph intact without data transform pipelines. It is quite available to serve up results in very flexible formats. The relative immaturity of the approach and performance considerations may be difficult challenges to overcome.\n",
"\n",
"I earlier mentioned [KarateClub](https://github.com/benedekrozemberczki/karateclub), a Python framework combining about 40 state-of-the-art unsupervised graph mining algorithms in the areas of node embedding, whole-graph embedding, and community detection. It builds on the packages of NetworkX, [PyGSP](https://pygsp.readthedocs.io/en/stable/), Gensim, NumPy, and SciPy. Unfortunately, the package does not support directed graphs, though plans to do so have been stated. This project is worth monitoring.\n",
"\n",
"A third intriguing area involves the use of [quaternions](https://arxiv.org/pdf/1904.10281v3.pdf) based on [Clifford algebras](https://en.wikipedia.org/wiki/Clifford_algebra) in their machine learning codes. [Charles Peirce](https://en.wikipedia.org/wiki/Charles_Sanders_Peirce), the intellectual guide for the design of KBpedia, was a mathematician of some renown in his own time, and studied and applauded [William Kingdon Clifford](https://en.wikipedia.org/wiki/William_Kingdon_Clifford) and his emerging algebra as a contemporary in the 1870s, shortly before Clifford's untimely death. Peirce scholars have often pointed to this influence in the development of Peirce's own algebras. I am personally interested in probing this approach to learn a bit more of Peirce's manifest interests."
]
},
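{
"cell_type": "markdown",
"metadata": {},
"source": [
"The central operation in that quaternion work is the Hamilton product, which is non-commutative. As a rough sketch, assuming a single quaternion per entity and relation rather than the many-dimensional quaternion embeddings real models learn, the code below implements the product in plain Python plus a score loosely following the linked quaternion embedding paper (QuatE): normalize the relation to a unit quaternion, rotate the head by it, and take the inner product with the tail. The function names and toy values are illustrative only.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def hamilton(p, q):\n",
"    # Hamilton product of p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2),\n",
"    # each read as a + b*i + c*j + d*k with i*i = j*j = k*k = i*j*k = -1.\n",
"    a1, b1, c1, d1 = p\n",
"    a2, b2, c2, d2 = q\n",
"    return (\n",
"        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,\n",
"        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,\n",
"        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,\n",
"        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,\n",
"    )\n",
"\n",
"\n",
"def normalize(q):\n",
"    # Scale to unit length so that multiplying by q rotates without rescaling.\n",
"    n = math.sqrt(sum(x * x for x in q))\n",
"    return tuple(x / n for x in q)\n",
"\n",
"\n",
"def quaternion_score(h, r, t):\n",
"    # QuatE-style sketch: rotate head by the unit relation, then dot with tail.\n",
"    rotated = hamilton(h, normalize(r))\n",
"    return sum(x * y for x, y in zip(rotated, t))\n",
"\n",
"\n",
"i, j = (0, 1, 0, 0), (0, 0, 1, 0)\n",
"print(hamilton(i, j), hamilton(j, i))  # (0, 0, 0, 1) (0, 0, 0, -1): ij = k, ji = -k\n",
"\n",
"h = (1.0, 0.0, 0.0, 0.0)\n",
"r = (0.0, 2.0, 0.0, 0.0)  # normalizes to i\n",
"print(quaternion_score(h, r, (0.0, 1.0, 0.0, 0.0)))  # 1.0\n",
"print(quaternion_score(h, r, (0.0, -1.0, 0.0, 0.0)))  # -1.0\n",
"```\n",
"\n",
"The order sensitivity shown by `ij = k` but `ji = -k` is what lets quaternion rotations distinguish a relation from its inverse."
]
},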
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Organization of This Part's Installments\n",
"These selections and the emphasis on our four areas lead to these anticipated **CWPK** installments over the coming weeks:\n",
"\n",
"- **CWPK #61 - NLP, Machine Learning and Analysis**\n",
"- **CWPK #62 - Network and Graph Analysis**\n",
"- **CWPK #63 - Staging Data Sci Resources and Preprocessing**\n",
"- **CWPK #64 - Embedding, NLP Analysis, and Entity Recognition**\n",
"- **CWPK #65 - scikit-learn Basics and Initial Analyses**\n",
"- **CWPK #66 - scikit-learn Classifiers**\n",
"- **CWPK #67 - Knowledge Graph Embedding Models**\n",
"- **CWPK #68 - Setting Up and Configuring the Deep Graph Learners**\n",
"- **CWPK #69 - DGL-KE Classifiers**\n",
"- **CWPK #70 - State-of-Art PyG 2 Classifiers**\n",
"- **CWPK #71 - A Comparison of Results**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional Documentation\n",
"Here are some general resources:\n",
"\n",
"- [Natural Language Processing Recipes: Best Practices and Examples](https://www.kdnuggets.com/2020/05/natural-language-processing-recipes-best-practices-examples.html) - nice set of NLP notebooks\n",
"- Another set of notebooks https://aihub.cloud.google.com/s?category=notebook\n",
"- [Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence](https://arxiv.org/abs/2002.04803) see Figure 1 for a possible Python architecture diagram; 48 pp and more of an academic overview. has many links to GitHub projects\n",
"- [An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec](https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/) a pretty comprehensive overview from 2017\n",
"- [Python Machine Learning Tutorials](https://realpython.com/tutorials/machine-learning/) - an overview of ML tutorials from a generally good source, [Real Python](https://realpython.com/)\n",
"- [PyTorch vs. TensorFlow – A Detailed Comparison](https://www.tooploox.com/blog/pytorch-vs-tensorflow-a-detailed-comparison) - a balanced and fair comparison of the two frameworks.\n",
"\n",
"#### Network Representational Learning\n",
"- [Machine Learning on Graphs: A Model and Comprehensive Taxonomy](https://arxiv.org/pdf/2005.03675.pdf) - the goal of this survey is to provide a unified view of representation learning methods for graph-structured data, to better understand the different ways to leverage graph structure in deep learning models; see [GitHub GCNN TensorFlow implementation](https://github.com/google/gcnn-survey-paper)\n",
"- [Awesome Graph Classification](https://github.com/benedekrozemberczki/awesome-graph-classification) - a collection of graph classification methods, covering embedding, deep learning, graph kernel and factorization papers\n",
"- [A Comprehensive Comparison of Unsupervised Network Representation Learning Methods](https://arxiv.org/pdf/1903.07902v2.pdf) - a comparison of unsupervised only and not attentive to heterogeneous networks.\n",
"\n",
"#### Knowledge Graph Representational Learning\n",
"- [A Review of Relational Machine Learning for Knowledge Graphs](https://arxiv.org/pdf/1503.00759.pdf)(2015); one of the first to focus on the space\n",
"- [Awesome Graph Representation Learning](https://github.com/ky-zhang/awesome-graph-representation-learning) - a curated list of awesome graph representation learning\n",
"- [A Survey on Knowledge Graphs: Representation, Acquisition and Applications](https://arxiv.org/pdf/2002.00388v2.pdf) from August 2020\n",
"- [Heterogeneous Network Representation Learning: Survey and Benchmark](https://arxiv.org/pdf/2004.00216.pdf) best paper for understanding the challenges of heterogeneous network embeddings\n",
"- [Knowledge Graph Embedding: A Survey of Approaches and Applications](https://mnick.github.io/project/knowledge-graph-embeddings/) is an overview of embedding models of entities and relationships for knowledge base completion\n",
"- [Introduction to Geometric Deep Learning](https://blog.paperspace.com/introduction-to-geometric-deep-learning/) \"GDL also shines in applications where the use of graphs is more common, like knowledge graphs.\"\n",
"- [Geometric Deep Learning Library Comparison](https://blog.paperspace.com/geometric-deep-learning-framework-comparison/) follow-on to the above paper. Compares PyTorch Geometric, Deep Graph Library, Graph Nets\n",
"- [RDF2Vec Light -- A Lightweight Approach for Knowledge Graph Embeddings](https://arxiv.org/abs/2009.07659) allows to train partial, task-specific models withonly a fraction of the computation requirements compared to other embedding ap-proaches, while retaining a high performance on multiple tasks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*.ipynb
file. It may take a bit of time for the interactive option to load.