{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Process Synonyms\n", "\n", "This notebook uses a combination of Python data science libraries and the Google Natural Language API (machine learning) to expand the vocabulary of the chatbot by generating synonyms for topics created in the previous notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "!pip uninstall -y google-cloud-datastore" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "!pip install google-cloud-datastore" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "!pip install inflect" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hit Reset Session > Restart, then resume with the following cells. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "# Only need to do this once...\n", "import nltk\n", "nltk.download('stopwords')\n", "nltk.download('wordnet')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "from nltk.corpus import stopwords\n", "stop = set(stopwords.words('english'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "from google.cloud import datastore" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "datastore_client = datastore.Client()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "client = datastore.Client()\n", "query = client.query(kind='Topic')\n", "results = list(query.fetch())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import inflect\n", "plurals = inflect.engine()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Extract Synonyms with Python\n", "Split the topic into words and use PyDictionary to look up synonyms in a \"thesaurus\" for each word. Store these in Datastore and link them back to the topic. Note this section uses the concept of \"stop words\" to filter out articles and other parts of speech that don't contribute to meaning of the topic." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from __future__ import print_function\n", "from nltk.corpus import wordnet\n", "\n", "KIND = 'Synonym'\n", "\n", "\n", "def put_synonym(name, topic):\n", "    # Store a Synonym entity keyed by `name` that points back to `topic`.\n", "    entity = datastore.Entity(key=datastore_client.key(KIND, name))\n", "    entity['synonym'] = topic\n", "    datastore_client.put(entity)\n", "\n", "\n", "for result in results:\n", "    topic = result.key.name\n", "    for word in topic.split():\n", "        # Skip stop words such as articles and prepositions.\n", "        if word in stop:\n", "            continue\n", "\n", "        synonyms = set()\n", "        for syn in wordnet.synsets(word):\n", "            # Collect lemmas (and their plurals) from noun synsets only.\n", "            if '.n.' in str(syn):\n", "                for l in syn.lemmas():\n", "                    lemma = l.name()\n", "                    if lemma.isalpha():\n", "                        synonyms.add(lemma)\n", "                        synonyms.add(plurals.plural(lemma))\n", "\n", "            # If the word also has an adjective sense, discard its synonyms.\n", "            if '.a.' in str(syn):\n", "                synonyms = set()\n", "                break\n", "\n", "        print(topic, word, synonyms)\n", "\n", "        # The topic name, the word, its synonyms and its plural all map to the topic.\n", "        put_synonym(topic, topic)\n", "        put_synonym(word, topic)\n", "        for dictionary_synonym in synonyms:\n", "            put_synonym(dictionary_synonym, topic)\n", "        put_synonym(plurals.plural(word), topic)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": 
".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 2 }