{ "cells": [ { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 1 Business Problem" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Employers strive to motivate their team members and create a pleasant work environment, with the goal of raising productivity while maintaining strong employee retention. It is no coincidence that Employers compete each year to land on rankings such as \"[Canada's Top 100 Employers](https://www.canadastop100.com/)\" and \"[Great Place To Work](https://www.greatplacetowork.ca).\"\n", "\n",
"To evaluate the quality of an Employer, we can analyze the Employer Reviews written by its former and current Employees. \"[Glassdoor](https://www.glassdoor.ca)\" collects these reviews and gives an inside look at each Employer. By understanding the main topics in its Employer Reviews, an Employer can adjust its work environment, which ultimately improves Employee productivity and retention.\n", "\n",
"However, some Employers have hundreds or even thousands of reviews, and reading through all of them takes a lot of time and resources.\n", "\n",
"**Business Solutions:**\n", "\n",
"To solve this issue, we will extract the main topics from the Employer Reviews of each Employer and then determine the overall consensus.\n", "\n",
"We will use Topic Modeling, an unsupervised learning technique, with two models: the Latent Dirichlet Allocation (LDA) Model and the LDA Mallet (Machine Learning for Language Toolkit) Model.\n", "\n",
"We will also determine the dominant topic of each Employer Review, as well as the most representative reviews for each dominant topic, for an in-depth analysis.\n", "\n",
"**Benefits:**\n", "- Efficiently determine the main topics of Employer Reviews\n", "- Increase Employee productivity/retention by improving work environments based on topics from Employer Reviews\n", "- Conveniently determine the topics of each review\n", "- Extract detailed information by determining the most relevant review for each topic\n", "\n",
"**Robustness:**\n", "\n", "To ensure the model performs well, we will take the following steps:\n", "- Run the LDA Model and the LDA Mallet Model and compare their performance\n", "- Run the LDA Mallet Model over a range of topic counts and keep the number of topics that gives the highest performance\n", "\n",
"Note that the main difference between the two models is the inference method: the LDA Model uses Variational Bayes, which is faster but generally less precise, while the LDA Mallet Model uses Gibbs Sampling.\n", "\n",
"**Assumption:**\n", "- To save computation power and time, we take a sample of 500 reviews for each Employer and assume that this sample is sufficient to capture the topics in the Employer Reviews\n", "- We also assume that the results would carry over if the model were applied to the entire population of Employer Reviews, apart from a few parameter tweaks\n", "\n",
"**Future:**\n", "\n", "This model is Part Two of the \"[Quality Control for Banking using LDA and LDA Mallet](https://nbviewer.jupyter.org/github/mick-zhang/Quality-Control-for-Banking-using-LDA-and-LDA-Mallet/blob/master/Topic%20Bank%20Github.ipynb?flush_cache=true),\" where we showcase information on Employer Reviews with full visualization of the results." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 2 Data Overview" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead>\n",
"<tr><th></th><th>Unnamed: 0</th><th>company</th><th>location</th><th>dates</th><th>job-title</th><th>summary</th><th>pros</th><th>cons</th><th>advice-to-mgmt</th><th>overall-ratings</th><th>work-balance-stars</th><th>culture-values-stars</th><th>carrer-opportunities-stars</th><th>comp-benefit-stars</th><th>senior-mangemnet-stars</th><th>helpful-count</th><th>link</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>1</td><td>google</td><td>none</td><td>Dec 11, 2018</td><td>Current Employee - Anonymous Employee</td><td>Best Company to work for</td><td>People are smart and friendly</td><td>Bureaucracy is slowing things down</td><td>none</td><td>5.0</td><td>4.0</td><td>5.0</td><td>5.0</td><td>4.0</td><td>5.0</td><td>0</td><td>https://www.glassdoor.com/Reviews/Google-Revie...</td></tr>\n",
"<tr><th>1</th><td>2</td><td>google</td><td>Mountain View, CA</td><td>Jun 21, 2013</td><td>Former Employee - Program Manager</td><td>Moving at the speed of light, burn out is inev...</td><td>1) Food, food, food. 15+ cafes on main campus ...</td><td>1) Work/life balance. What balance? All those ...</td><td>1) Don't dismiss emotional intelligence and ad...</td><td>4.0</td><td>2.0</td><td>3.0</td><td>3.0</td><td>5.0</td><td>3.0</td><td>2094</td><td>https://www.glassdoor.com/Reviews/Google-Revie...</td></tr>\n",
"<tr><th>2</th><td>3</td><td>google</td><td>New York, NY</td><td>May 10, 2014</td><td>Current Employee - Software Engineer III</td><td>Great balance between big-company security and...</td><td>* If you're a software engineer, you're among ...</td><td>* It *is* becoming larger, and with it comes g...</td><td>Keep the focus on the user. Everything else wi...</td><td>5.0</td><td>5.0</td><td>4.0</td><td>5.0</td><td>5.0</td><td>4.0</td><td>949</td><td>https://www.glassdoor.com/Reviews/Google-Revie...</td></tr>\n",
"<tr><th>3</th><td>4</td><td>google</td><td>Mountain View, CA</td><td>Feb 8, 2015</td><td>Current Employee - Anonymous Employee</td><td>The best place I've worked and also the most d...</td><td>You can't find a more well-regarded company th...</td><td>I live in SF so the commute can take between 1...</td><td>Keep on NOT micromanaging - that is a huge ben...</td><td>5.0</td><td>2.0</td><td>5.0</td><td>5.0</td><td>4.0</td><td>5.0</td><td>498</td><td>https://www.glassdoor.com/Reviews/Google-Revie...</td></tr>\n",
"<tr><th>4</th><td>5</td><td>google</td><td>Los Angeles, CA</td><td>Jul 19, 2018</td><td>Former Employee - Software Engineer</td><td>Unique, one of a kind dream job</td><td>Google is a world of its own. At every other c...</td><td>If you don't work in MTV (HQ), you will be giv...</td><td>Promote managers into management for their man...</td><td>5.0</td><td>5.0</td><td>5.0</td><td>5.0</td><td>5.0</td><td>5.0</td><td>49</td><td>https://www.glassdoor.com/Reviews/Google-Revie...</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " Unnamed: 0 company location dates \\\n", "0 1 google none Dec 11, 2018 \n", "1 2 google Mountain View, CA Jun 21, 2013 \n", "2 3 google New York, NY May 10, 2014 \n", "3 4 google Mountain View, CA Feb 8, 2015 \n", "4 5 google Los Angeles, CA Jul 19, 2018 \n", "\n", " job-title \\\n", "0 Current Employee - Anonymous Employee \n", "1 Former Employee - Program Manager \n", "2 Current Employee - Software Engineer III \n", "3 Current Employee - Anonymous Employee \n", "4 Former Employee - Software Engineer \n", "\n", " summary \\\n", "0 Best Company to work for \n", "1 Moving at the speed of light, burn out is inev... \n", "2 Great balance between big-company security and... \n", "3 The best place I've worked and also the most d... \n", "4 Unique, one of a kind dream job \n", "\n", " pros \\\n", "0 People are smart and friendly \n", "1 1) Food, food, food. 15+ cafes on main campus ... \n", "2 * If you're a software engineer, you're among ... \n", "3 You can't find a more well-regarded company th... \n", "4 Google is a world of its own. At every other c... \n", "\n", " cons \\\n", "0 Bureaucracy is slowing things down \n", "1 1) Work/life balance. What balance? All those ... \n", "2 * It *is* becoming larger, and with it comes g... \n", "3 I live in SF so the commute can take between 1... \n", "4 If you don't work in MTV (HQ), you will be giv... \n", "\n", " advice-to-mgmt overall-ratings \\\n", "0 none 5.0 \n", "1 1) Don't dismiss emotional intelligence and ad... 4.0 \n", "2 Keep the focus on the user. Everything else wi... 5.0 \n", "3 Keep on NOT micromanaging - that is a huge ben... 5.0 \n", "4 Promote managers into management for their man... 
5.0 \n", "\n", " work-balance-stars culture-values-stars carrer-opportunities-stars \\\n", "0 4.0 5.0 5.0 \n", "1 2.0 3.0 3.0 \n", "2 5.0 4.0 5.0 \n", "3 2.0 5.0 5.0 \n", "4 5.0 5.0 5.0 \n", "\n", " comp-benefit-stars senior-mangemnet-stars helpful-count \\\n", "0 4.0 5.0 0 \n", "1 5.0 3.0 2094 \n", "2 5.0 4.0 949 \n", "3 4.0 5.0 498 \n", "4 5.0 5.0 49 \n", "\n", " link \n", "0 https://www.glassdoor.com/Reviews/Google-Revie... \n", "1 https://www.glassdoor.com/Reviews/Google-Revie... \n", "2 https://www.glassdoor.com/Reviews/Google-Revie... \n", "3 https://www.glassdoor.com/Reviews/Google-Revie... \n", "4 https://www.glassdoor.com/Reviews/Google-Revie... " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "csv_path = \"employee_reviews.csv\"\n", "df = pd.read_csv(csv_path, encoding='latin1')  # latin1 avoids decoding errors when reading this file\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hidden": true, "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "array(['google', 'amazon', 'facebook', 'netflix', 'apple', 'microsoft'],\n", " dtype=object)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['company'].unique()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "After importing the data, we see that the \"summary\" column holds the Employer Reviews for each Employer. This is the column that we are going to use for extracting topics.\n", "\n", "We also see that there are 6 different Employers under the \"company\" column. To keep the analysis focused, we will review only the first company (Google) to capture the results of the Employer Reviews.\n", "\n", "Note: The same steps that we use for the first Employer can be replicated for the other Employers." 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 3 Topics Analysis for Google" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hidden": true, "scrolled": true }, "outputs": [], "source": [ "# Filter to Google only\n", "dfg = df[df['company'] == 'google']\n", "\n", "# Keep only the column needed for topic modeling\n", "dfg = dfg[['summary']]\n", "\n", "# Keep the first 500 reviews as our sample\n", "dfg = dfg.head(500)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we have filtered the dataset to the first Employer, kept only the \"summary\" column that holds the Employer Reviews, and reduced the sample to the first 500 reviews to save computation time and power." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 4 Data Cleaning" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We will use regular expressions to strip unwanted characters from our dataset, and then preview what the data looks like after the cleaning." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Best Company to work for']\n" ] } ], "source": [ "import re\n", "from pprint import pprint\n", "\n", "data = dfg['summary'].values.tolist()  # Convert the column to a list of strings\n", "\n", "# Use a regex to remove every character except letters and spaces\n", "data = [re.sub(r'[^a-zA-Z ]+', '', str(sent)) for sent in data]\n", "\n", "# Preview the first cleaned review\n", "pprint(data[:1])" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 5 Pre-Processing" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "With our data now cleaned, the next step is to pre-process it so that it can be used as input for our LDA model.\n", "\n", "We will perform the following:\n", "- Break each sentence down into a list of words (Tokenization) using Gensim's `simple_preprocess`\n", "- Clean further by converting the text to lowercase and dropping punctuation, which `simple_preprocess` also handles\n", "- Remove stopwords (words that carry little meaning, such as \"to\" and \"the\") using NLTK's `corpus.stopwords`\n", "- Apply Bigram and Trigram models to words that frequently occur together (e.g. warrant_proceeding, there_isnt_enough) using Gensim's `models.phrases.Phraser`\n", "- Reduce words to their root form (e.g. walking to walk, mice to mouse) by Lemmatizing the text with the English model loaded through `spacy.load('en')`" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['good', 'company', 'work']]\n" ] } ], "source": [ "# Tokenize and clean each review with simple_preprocess\n", "import gensim\n", "from gensim.utils import simple_preprocess\n", "\n", "def sent_to_words(sentences):\n", "    for sentence in sentences:\n", "        # deacc=True strips accent marks; tokenization itself drops punctuation\n", "        yield simple_preprocess(str(sentence), deacc=True)\n", "\n", "data_words = list(sent_to_words(data))\n", "\n", "\n", "# Remove stopwords using NLTK's English stopword list\n", "from nltk.corpus import stopwords\n", "stop_words = stopwords.words('english')\n", "stop_words.extend(['from', 'subject', 're', 'edu', 'use'])  # Add additional stop words\n", "\n", "def remove_stopwords(texts):\n", "    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words] for doc in texts]\n", "\n", "data_words_nostops = remove_stopwords(data_words)\n", "\n", "\n", "# Create and apply Bigrams and Trigrams\n", "bigram = gensim.models.Phrases(data_words, min_count=5, threshold=100)  # Higher threshold yields fewer phrases\n", "trigram = gensim.models.Phrases(bigram[data_words], threshold=100)\n", "bigram_mod = gensim.models.phrases.Phraser(bigram)  # Phraser is a lighter, faster object for applying the phrases\n", "trigram_mod = gensim.models.phrases.Phraser(trigram)\n", "\n", "def make_trigram(texts):\n", "    return [trigram_mod[bigram_mod[doc]] for doc in texts]\n", "\n", "data_words_trigrams = make_trigram(data_words_nostops)\n", "\n", "\n", "# Lemmatize the data\n", "import spacy\n", "nlp = spacy.load('en', disable=['parser', 'ner'])  # On spaCy 3+, load 'en_core_web_sm' instead of the 'en' shortcut\n", "\n", "def lemmatization(texts, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV']):\n", "    texts_out = []\n", "    for sent in texts:\n", "        doc = nlp(\" \".join(sent))  # Run the spaCy pipeline on the re-joined tokens\n", "        # lemma_ is the base form of a token; pos_ is its coarse part-of-speech tag\n", "        texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])\n", "    return texts_out\n", "\n", "data_lemmatized = lemmatization(data_words_trigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])\n", "\n", "\n", "# Preview the data\n", "print(data_lemmatized[:1])" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we can see text that has been Tokenized, Cleaned (stopwords removed), and Lemmatized, with bigrams and trigrams applied where applicable." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 6 Prepare Dictionary and Corpus" ] },
{ "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Now that our data has been cleaned and pre-processed, here are the final steps that we need to implement before our data is ready for LDA input:\n", "- Create a dictionary from our pre-processed data using Gensim's `corpora.Dictionary`\n", "- Create a corpus by applying \"term frequency\" (word count) to each pre-processed document using the dictionary's `.doc2bow` method" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[(0, 1), (1, 1), (2, 1)]]\n" ] } ], "source": [ "import gensim.corpora as corpora\n", "id2word = corpora.Dictionary(data_lemmatized)  # Map each unique word to an integer id\n", "texts = data_lemmatized  # Documents to convert\n", "corpus = [id2word.doc2bow(text) for text in texts]  # Bag-of-words: (word id, count) pairs per document\n", "print(corpus[:1])  # Preview the data" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We can see that each document in our corpus is a list of (word index, count) pairs." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "'company'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id2word[0]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We can also look up the actual word behind each index by indexing into our pre-processed data dictionary." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "[[('company', 1), ('good', 1), ('work', 1)]]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[[(id2word[id], freq) for id, freq in cp] for cp in corpus[:1]]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Lastly, a simple list comprehension lets us see every word as an actual word (instead of an index) followed by its count.\n", "\n", "Now that we have created our dictionary and corpus, we can feed the data into our LDA Model." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 7 LDA Model" ] },
{ "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**Latent (hidden) Dirichlet Allocation** is a generative probabilistic model of documents (composites) made up of words (parts). It models each topic (category) as a probability distribution over words, and each document as a probability distribution over topics.\n", "\n", "Essentially, we extract the topics in our documents by looking at the probability of words within each topic, and then at the probability of topics within each document.\n", "\n", "There are two common inference algorithms for LDA. **Variational Bayes** is used by Gensim's **LDA Model**, while **Gibbs Sampling** is used by the **LDA Mallet Model** through Gensim's wrapper package.\n", "\n", "Here is a general overview of Variational Bayes and Gibbs Sampling:\n", "- **Variational Bayes**\n", " - Approximates the true posterior over topic assignments with a simpler distribution and optimizes that approximation deterministically (so some variation is left unexplained)\n", " - Fast but generally less accurate\n", "- **Gibbs Sampling (Markov Chain Monte Carlo)**\n", " - Samples one variable at a time, conditional upon all other variables\n", " - Slow but generally more accurate" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(0,\n", " '0.102*\"google\" + 0.071*\"great\" + 0.036*\"sale\" + 0.029*\"job\" + 0.024*\"pay\" + '\n", " '0.023*\"love\" + 0.022*\"perk\" + 0.022*\"balance\" + 0.020*\"internship\" + '\n", " '0.018*\"business\"'),\n", " (1,\n", " '0.054*\"experience\" + 0.040*\"lead\" + 0.023*\"grad\" + 0.023*\"associate\" + '\n", " '0.020*\"analytical\" + 0.019*\"designer\" + 0.017*\"strategist\" + 0.011*\"brand\" '\n", " '+ 0.010*\"solution\" + 0.010*\"interactive\"'),\n", " (2,\n", " '0.172*\"manager\" + 0.063*\"account\" + 0.044*\"program\" + 0.041*\"senior\" + '\n", " '0.038*\"product\" + 0.025*\"marketing\" + 0.024*\"culture\" + 0.022*\"bad\" + '\n", " '0.020*\"staff\" + 0.017*\"new\"'),\n", " (3,\n", " '0.233*\"engineer\" + 0.206*\"software\" + 0.071*\"review\" + 0.022*\"cloud\" + '\n", " '0.016*\"engineering\" + 0.015*\"developer\" + 0.014*\"pgm\" + 0.013*\"need\" + '\n", " '0.013*\"senior\" + 0.011*\"legal\"'),\n", " (4,\n", " '0.166*\"work\" + 0.143*\"good\" + 0.141*\"place\" + 0.139*\"great\" + '\n", " '0.112*\"company\" + 0.013*\"career\" + 0.012*\"awesome\" + 0.011*\"excellent\" + '\n", " '0.008*\"start\" + 0.008*\"stuff\"'),\n", " (5,\n", " '0.070*\"intern\" + 0.067*\"amazing\" + 
0.055*\"analyst\" + 0.027*\"technical\" + '\n", " '0.025*\"specialist\" + 0.020*\"long\" + 0.017*\"skill\" + 0.017*\"term\" + '\n", " '0.017*\"come\" + 0.013*\"sure\"'),\n", " (6,\n", " '0.123*\"great\" + 0.056*\"people\" + 0.044*\"benefit\" + 0.029*\"culture\" + '\n", " '0.027*\"overall\" + 0.023*\"director\" + 0.023*\"meh\" + 0.021*\"lot\" + '\n", " '0.020*\"many\" + 0.019*\"time\"')]\n" ] } ], "source": [ "# Build LDA Model (here we selected 7 topics)\n", "lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, random_state=100,\n", "                                            update_every=1, chunksize=100, passes=10, alpha='auto',\n", "                                            per_word_topics=True)\n", "pprint(lda_model.print_topics())\n", "doc_lda = lda_model[corpus]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "After building the LDA Model using Gensim, we display the 7 topics in our document along with the top 10 keywords, and their corresponding weights, that make up each topic." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 8 LDA Model Performance" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Perplexity: -5.49701415002346\n", "\n", "Coherence Score: 0.6258108598839495\n" ] } ], "source": [ "# Compute perplexity (log_perplexity returns a per-word likelihood bound in log space)\n", "print('Perplexity: ', lda_model.log_perplexity(corpus))\n", "\n", "# Compute coherence score\n", "from gensim.models import CoherenceModel\n", "coherence_model_lda = CoherenceModel(model=lda_model, texts=data_lemmatized, dictionary=id2word, coherence='c_v')\n", "coherence_lda = coherence_model_lda.get_coherence()\n", "print('Coherence Score: ', coherence_lda)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "To assess the quality of the topics that we extracted, we compute the Perplexity Score and the Coherence Score. The Perplexity score measures how well the LDA Model predicts the sample (the lower the perplexity, the better the model predicts). The Coherence score measures the quality of the topics that were learned (the higher the coherence score, the higher the quality of the learned topics).\n", "\n", "Here we see a **Perplexity score of approximately -5.50** and a **Coherence score of 0.62**. The perplexity value is negative because Gensim's `log_perplexity` returns a per-word likelihood bound in log space rather than the perplexity itself; the perplexity is 2 raised to the negative of this bound.\n", "\n", "Note: We will use the Coherence score moving forward, since we want to optimize the number of topics in our documents." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 8.1 Visualize LDA Model" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "PreparedData(topic_coordinates= x y topics cluster Freq\n", "topic \n", "4 -0.322979 -0.081753 1 1 32.837574\n", "0 -0.039580 0.082281 2 1 13.668549\n", "3 0.146494 -0.263874 3 1 13.345621\n", "6 -0.068636 0.059029 4 1 10.814206\n", "5 0.093559 0.068952 5 1 10.212361\n", "2 0.101812 0.081428 6 1 10.015452\n", "1 0.089331 0.053937 7 1 9.106243, topic_info= Category Freq Term Total loglift logprob\n", "term \n", "25 Default 38.000000 engineer 38.000000 30.0000 30.0000\n", "2 Default 67.000000 work 67.000000 29.0000 29.0000\n", "26 Default 34.000000 software 34.000000 28.0000 28.0000\n", "1 Default 58.000000 good 58.000000 27.0000 27.0000\n", "18 Default 57.000000 place 57.000000 26.0000 26.0000\n", "12 Default 84.000000 great 84.000000 25.0000 25.0000\n", "38 Default 21.000000 manager 21.000000 24.0000 24.0000\n", "0 Default 45.000000 company 45.000000 23.0000 23.0000\n", "72 Default 17.000000 google 17.000000 22.0000 22.0000\n", "34 Default 12.000000 review 12.000000 21.0000 21.0000\n", "23 Default 9.000000 intern 9.000000 20.0000 20.0000\n", "68 Default 9.000000 amazing 9.000000 19.0000 19.0000\n", "37 Default 8.000000 account 8.000000 18.0000 18.0000\n", "103 Default 8.000000 people 8.000000 17.0000 17.0000\n", "35 Default 7.000000 analyst 7.000000 16.0000 16.0000\n", "57 Default 6.000000 experience 6.000000 15.0000 15.0000\n", "80 Default 6.000000 benefit 6.000000 14.0000 14.0000\n", "174 Default 6.000000 program 6.000000 13.0000 13.0000\n", "39 Default 7.000000 senior 7.000000 12.0000 12.0000\n", "100 Default 6.000000 sale 6.000000 11.0000 11.0000\n", "117 Default 5.000000 lead 5.000000 10.0000 10.0000\n", "43 Default 7.000000 culture 7.000000 9.0000 9.0000\n", "120 Default 5.000000 product 5.000000 8.0000 8.0000\n", "20 Default 5.000000 job 5.000000 7.0000 7.0000\n", "154 Default 4.000000 overall 4.000000 6.0000 6.0000\n", "298 Default 3.000000 technical 3.000000 5.0000 5.0000\n", "59 Default 4.000000 pay 4.000000 4.0000 4.0000\n", 
"258 Default 3.000000 specialist 3.000000 3.0000 3.0000\n", "88 Default 4.000000 love 4.000000 2.0000 2.0000\n", "221 Default 3.000000 marketing 3.000000 1.0000 1.0000\n", "... ... ... ... ... ... ...\n", "117 Topic7 4.433551 lead 5.071734 2.2617 -3.2294\n", "141 Topic7 2.609446 grad 3.248193 2.1772 -3.7594\n", "86 Topic7 2.597131 associate 3.235240 2.1765 -3.7642\n", "359 Topic7 2.185940 analytical 2.824709 2.1399 -3.9365\n", "166 Topic7 2.134333 designer 2.772380 2.1347 -3.9604\n", "87 Topic7 1.951527 strategist 2.590575 2.1129 -4.0499\n", "159 Topic7 1.287777 brand 1.925966 1.9937 -4.4656\n", "344 Topic7 1.138010 consultant 1.776058 1.9511 -4.5893\n", "345 Topic7 1.138010 interactive 1.776058 1.9511 -4.5893\n", "346 Topic7 1.138010 solution 1.776058 1.9511 -4.5893\n", "347 Topic7 1.138010 sr 1.776058 1.9511 -4.5893\n", "337 Topic7 1.138001 school 1.776050 1.9511 -4.5893\n", "336 Topic7 1.138001 advanced 1.776050 1.9511 -4.5893\n", "340 Topic7 1.138001 educator 1.776050 1.9511 -4.5893\n", "338 Topic7 1.138001 childhood 1.776050 1.9511 -4.5893\n", "358 Topic7 1.137988 publisher 1.776040 1.9511 -4.5893\n", "381 Topic7 1.137988 vice 1.776040 1.9511 -4.5893\n", "380 Topic7 1.137988 president 1.776040 1.9511 -4.5893\n", "364 Topic7 1.137988 economically 1.776040 1.9511 -4.5893\n", "365 Topic7 1.137988 favorite 1.776040 1.9511 -4.5893\n", "357 Topic7 1.137988 adsense 1.776040 1.9511 -4.5893\n", "348 Topic7 1.137988 fulfillment 1.776040 1.9511 -4.5893\n", "373 Topic7 1.137938 geekland 1.775998 1.9511 -4.5893\n", "331 Topic7 1.137938 collaboration 1.775998 1.9511 -4.5893\n", "350 Topic7 1.137938 pry 1.775998 1.9511 -4.5893\n", "399 Topic7 1.137938 bright 1.775998 1.9511 -4.5893\n", "385 Topic7 1.137938 unethical 1.775998 1.9511 -4.5893\n", "390 Topic7 1.137938 management 1.775998 1.9511 -4.5893\n", "389 Topic7 1.137938 decline 1.775998 1.9511 -4.5893\n", "378 Topic7 1.137938 exact 1.775998 1.9511 -4.5893\n", "\n", "[264 rows x 6 columns], token_table= Topic Freq Term\n", 
"term \n", "37 6 0.951063 account\n", "360 4 0.522357 active\n", "382 2 0.507025 ad\n", "367 2 0.507019 add\n", "357 7 0.563051 adsense\n", "336 7 0.563047 advanced\n", "354 1 0.893768 altering\n", "68 5 0.879538 amazing\n", "201 4 0.373705 ambitious\n", "201 5 0.373705 ambitious\n", "35 5 0.935292 analyst\n", "314 5 0.622273 analytic\n", "359 7 0.708037 analytical\n", "310 5 0.622273 apple\n", "351 4 0.522327 area\n", "275 5 0.622268 artificial\n", "302 3 0.618322 assistant\n", "86 7 0.927288 associate\n", "177 1 0.939230 awesome\n", "49 6 0.885919 bad\n", "8 2 0.923120 balance\n", "61 3 0.379346 be\n", "61 5 0.379346 be\n", "53 5 0.577073 beat\n", "80 4 0.921917 benefit\n", "52 5 0.577070 best\n", "55 1 0.853713 big\n", "233 1 0.644111 boost\n", "159 7 0.519220 brand\n", "281 5 0.622268 break\n", "... ... ... ...\n", "383 1 0.800487 stuff\n", "74 2 0.495295 summer\n", "307 1 0.542390 super\n", "65 5 0.879973 sure\n", "56 1 0.955498 tech\n", "298 5 0.752426 technical\n", "261 2 0.754018 technology\n", "328 5 0.715036 term\n", "234 1 0.644111 thing\n", "136 4 0.934043 time\n", "228 1 0.644107 undeniably\n", "280 5 0.622273 undervalue\n", "385 7 0.563064 unethical\n", "317 3 0.618313 unlock\n", "294 4 0.623177 vendor\n", "381 7 0.563051 vice\n", "278 5 0.622268 winner\n", "376 6 0.536033 wlb\n", "121 1 0.914898 wonderful\n", "2 1 0.990309 work\n", "299 6 0.637516 worker\n", "51 2 0.736486 worklife\n", "379 2 0.507025 workload\n", "167 5 0.538156 workplace\n", "45 1 0.814835 world\n", "384 6 0.535908 worrying\n", "157 2 0.706627 would\n", "245 3 0.879086 write\n", "330 4 0.716377 year\n", "227 6 0.741158 youtube\n", "\n", "[238 rows x 3 columns], R=30, lambda_step=0.01, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, topic_order=[5, 1, 4, 7, 6, 3, 2])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\", category=FutureWarning) # Hides all future warnings\n", "import 
pyLDAvis\n", "import pyLDAvis.gensim  # In pyLDAvis 3.x this module is named pyLDAvis.gensim_models\n", "pyLDAvis.enable_notebook()\n", "vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)\n", "vis" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We are using pyLDAvis to visualize our topics.\n", "\n", "To interpret the pyLDAvis chart:\n", "- Each bubble represents a topic\n", "- The larger the bubble, the more prevalent the topic\n", "- A good topic model has fairly big, non-overlapping bubbles scattered throughout the chart (instead of being clustered in one quadrant)\n", "- Red highlight: the salient keywords that form the selected topic (its most notable keywords)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 9 LDA Mallet Model" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Now that we have completed our Topic Modeling with the \"Variational Bayes\" algorithm from Gensim's LDA, we will explore Mallet's LDA, which uses Gibbs Sampling (Markov Chain Monte Carlo) and is available through Gensim's wrapper package.\n", "\n", "Mallet's LDA Model is slower but generally considered more accurate, since Gibbs Sampling samples one variable at a time, conditional upon all other variables." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(0,\n", " [('work', 0.42162162162162165),\n", " ('benefit', 0.04864864864864865),\n", " ('culture', 0.043243243243243246),\n", " ('engineer', 0.03783783783783784),\n", " ('lot', 0.02702702702702703),\n", " ('environment', 0.02702702702702703),\n", " ('long', 0.021621621621621623),\n", " ('world', 0.016216216216216217),\n", " ('designer', 0.016216216216216217),\n", " ('early', 0.010810810810810811)]),\n", " (1,\n", " [('engineer', 0.2781065088757396),\n", " ('software', 0.2781065088757396),\n", " ('senior', 0.05917159763313609),\n", " ('cloud', 0.029585798816568046),\n", " ('nice', 0.023668639053254437),\n", " ('lead', 0.023668639053254437),\n", " ('staff', 0.01775147928994083),\n", " ('partner', 0.01775147928994083),\n", " ('developer', 0.01775147928994083),\n", " ('big', 0.01775147928994083)]),\n", " (2,\n", " [('great', 0.33076923076923076),\n", " ('sale', 0.06153846153846154),\n", " ('program', 0.046153846153846156),\n", " ('love', 0.046153846153846156),\n", " ('director', 0.038461538461538464),\n", " ('good', 0.03076923076923077),\n", " ('time', 0.023076923076923078),\n", " ('datum', 0.015384615384615385),\n", " ('stuff', 0.015384615384615385),\n", " ('analytical', 0.015384615384615385)]),\n", " (3,\n", " [('manager', 0.18787878787878787),\n", " ('account', 0.06666666666666667),\n", " ('analyst', 0.06666666666666667),\n", " ('people', 0.06060606060606061),\n", " ('product', 0.048484848484848485),\n", " ('perk', 0.04242424242424243),\n", " ('balance', 0.03636363636363636),\n", " ('project', 0.030303030303030304),\n", " ('associate', 0.030303030303030304),\n", " ('meh', 0.024242424242424242)]),\n", " (4,\n", " [('place', 0.24025974025974026),\n", " ('google', 0.11688311688311688),\n", " ('amazing', 0.09090909090909091),\n", " ('review', 0.08441558441558442),\n", " ('job', 
0.045454545454545456),\n", " ('excellent', 0.025974025974025976),\n", " ('specialist', 0.01948051948051948),\n", " ('rough', 0.012987012987012988),\n", " ('bad', 0.012987012987012988),\n", " ('depend', 0.006493506493506494)]),\n", " (5,\n", " [('company', 0.32),\n", " ('great', 0.3142857142857143),\n", " ('career', 0.022857142857142857),\n", " ('pay', 0.022857142857142857),\n", " ('grad', 0.017142857142857144),\n", " ('dream', 0.017142857142857144),\n", " ('perfect', 0.017142857142857144),\n", " ('marketing', 0.011428571428571429),\n", " ('security', 0.011428571428571429),\n", " ('executive', 0.005714285714285714)]),\n", " (6,\n", " [('good', 0.3611111111111111),\n", " ('place', 0.1388888888888889),\n", " ('intern', 0.07222222222222222),\n", " ('experience', 0.05),\n", " ('awesome', 0.027777777777777776),\n", " ('technical', 0.016666666666666666),\n", " ('tech', 0.016666666666666666),\n", " ('ambitious', 0.011111111111111112),\n", " ('fix', 0.011111111111111112),\n", " ('data', 0.011111111111111112)])]\n" ] } ], "source": [ "import os\n", "# Requires gensim < 4.0 (the wrappers module was removed in Gensim 4.0)\n", "from gensim.models.wrappers import LdaMallet\n", "os.environ.update({'MALLET_HOME': r'/Users/Mick/Desktop/mallet/'}) # Set the MALLET_HOME environment variable\n", "mallet_path = '/Users/Mick/Desktop/mallet/bin/mallet' # Update this path to your local Mallet binary\n", "\n", "# Build the LDA Mallet Model\n", "ldamallet = LdaMallet(mallet_path, corpus=corpus, num_topics=7, id2word=id2word) # Here we selected 7 topics again\n", "pprint(ldamallet.show_topics(formatted=False))" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "After building the LDA Mallet Model with Gensim's Wrapper package, we see 7 new topics, each with its top 10 keywords and their corresponding weights." 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 9.1 LDA Mallet Model Performance" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Coherence Score: 0.7737574215817109\n" ] } ], "source": [ "# Compute coherence score\n", "coherence_model_ldamallet = CoherenceModel(model=ldamallet, texts=data_lemmatized, dictionary=id2word, coherence=\"c_v\")\n", "coherence_ldamallet = coherence_model_ldamallet.get_coherence()\n", "print('\\nCoherence Score: ', coherence_ldamallet)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The Coherence Score for our **LDA Mallet Model** is **0.77**, a clear improvement over the 0.62 Coherence Score of the LDA Model above, and consistent with the switch to **Gibbs Sampling**. Since the Coherence Score measures the quality of the topics that were learned, our next step is to raise it further.\n", "\n", "The main lever is the number of topics. With too few topics, distinct themes get merged; with too many, the same theme is split into redundant topics. We will therefore train models over a range of topic counts and keep the one with the highest Coherence Score." 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 10 Finding the Optimal Number of Topics for LDA Mallet Model" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We will use the following function to train our **LDA Mallet Models**:\n", "\n", "    compute_coherence_values\n", "\n", "Note: We will train one model for each topic count from 2 to 38 in steps of 6, i.e. `range(2, 40, 6)`." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hidden": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Compute a list of LDA Mallet Models and corresponding Coherence Values\n", "def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):\n", "    coherence_values = []\n", "    model_list = []\n", "    for num_topics in range(start, limit, step):\n", "        # Use the dictionary argument rather than the global id2word\n", "        model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=dictionary)\n", "        model_list.append(model)\n", "        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')\n", "        coherence_values.append(coherencemodel.get_coherence())\n", "    return model_list, coherence_values\n", "\n", "model_list, coherence_values = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=data_lemmatized,\n", "                                                        start=2, limit=40, step=6)\n", "\n", "# Plot the Coherence Score against the number of topics\n", "limit = 40; start = 2; step = 6\n", "x = range(start, limit, step)\n", "plt.plot(x, coherence_values)\n", "plt.xlabel('Num Topics')\n", "plt.ylabel('Coherence score')\n", "plt.legend(['coherence_values'], loc='best') # A list, so the label is not split into single characters\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Num Topics = 2 has Coherence Value of 0.7254\n", "Num Topics = 8 has Coherence Value of 0.7591\n", "Num Topics = 14 has Coherence Value of 0.782\n", "Num Topics = 20 has Coherence Value of 0.7849\n", "Num Topics = 26 has Coherence Value of 0.7771\n", "Num Topics = 32 has Coherence Value of 0.7673\n", "Num Topics = 38 has Coherence Value of 0.7494\n" ] } ], "source": [ "# Print the coherence scores\n", "for m, cv in zip(x, coherence_values):\n", "    print('Num Topics =', m, ' has Coherence Value of', round(cv, 4))" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "With our models trained and their performance visualized, we can see that the best number of topics among those tried is **20 topics**, with a Coherence Score of **0.78** (0.7849), slightly higher than the 0.77 of our previous 7-topic model. The margin over the 14-topic model (0.782) is small, so this suggests, rather than proves, that roughly 20 dominant topics describe this corpus.\n", "\n", "We will proceed and select our final model using 20 topics." ] },
{ "cell_type": "code", "execution_count": 25, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(0,\n", " '0.282*\"company\" + 0.231*\"people\" + 0.077*\"place\" + 0.051*\"grad\" + '\n", " '0.026*\"challenge\" + 0.026*\"compensation\" + 0.026*\"contract\" + '\n", " '0.026*\"progress\" + 0.026*\"smart\" + 0.026*\"intern\"'),\n", " (1,\n", " '0.367*\"company\" + 0.102*\"cloud\" + 0.041*\"strategist\" + 0.041*\"environment\" '\n", " '+ 0.041*\"employee\" + 0.020*\"pry\" + 0.020*\"usa\" + 0.020*\"break\" + '\n", " '0.020*\"atmosphere\" + 0.020*\"term\"'),\n", " (2,\n", " '0.164*\"review\" + 0.164*\"amazing\" + 0.145*\"senior\" + 0.109*\"job\" + '\n", " '0.109*\"balance\" + 0.055*\"great\" + 0.036*\"big\" + 0.018*\"compensation\" + '\n", " '0.018*\"java\" + 0.018*\"ep\"'),\n", " (3,\n", " '0.333*\"good\" + 0.083*\"review\" + 0.083*\"pay\" + 0.062*\"environment\" + '\n", " '0.042*\"senior\" + 0.042*\"outstanding\" + 0.021*\"depend\" + 0.021*\"point\" + '\n", " '0.021*\"hype\" + 0.021*\"worrying\"'),\n", " (4,\n", " '0.314*\"great\" + 0.137*\"program\" + 0.078*\"perk\" + 0.059*\"time\" + '\n", " '0.059*\"staff\" + 0.039*\"analytical\" + 0.020*\"technician\" + 0.020*\"average\" + '\n", " '0.020*\"coworker\" + 0.020*\"datum\"'),\n", " (5,\n", " '0.530*\"place\" + 0.045*\"perfect\" + 0.045*\"technical\" + 0.030*\"life\" + '\n", " '0.015*\"overpay\" + 0.015*\"iii\" + 0.015*\"learn\" + 0.015*\"vice\" + '\n", " '0.015*\"leader\" + 0.015*\"write\"'),\n", " (6,\n", " '0.431*\"company\" + 0.308*\"good\" + 
0.077*\"product\" + 0.015*\"team\" + '\n", " '0.015*\"ad\" + 0.015*\"depend\" + 0.015*\"effect\" + 0.015*\"consultant\" + '\n", " '0.015*\"starting\" + 0.015*\"altering\"'),\n", " (7,\n", " '0.473*\"great\" + 0.145*\"benefit\" + 0.073*\"excellent\" + 0.055*\"product\" + '\n", " '0.055*\"lot\" + 0.018*\"class\" + 0.018*\"customer\" + 0.018*\"real\" + '\n", " '0.018*\"promo\" + 0.018*\"run\"'),\n", " (8,\n", " '0.579*\"good\" + 0.053*\"analyst\" + 0.053*\"business\" + 0.035*\"pgm\" + '\n", " '0.035*\"year\" + 0.018*\"educator\" + 0.018*\"workplace\" + 0.018*\"accountant\" + '\n", " '0.018*\"bad\" + 0.018*\"unclear\"'),\n", " (9,\n", " '0.316*\"google\" + 0.088*\"director\" + 0.035*\"phenomenal\" + 0.035*\"early\" + '\n", " '0.035*\"fun\" + 0.018*\"fulfillment\" + 0.018*\"workload\" + 0.018*\"exact\" + '\n", " '0.018*\"hire\" + 0.018*\"eager\"'),\n", " (10,\n", " '0.437*\"manager\" + 0.155*\"account\" + 0.085*\"project\" + 0.042*\"tech\" + '\n", " '0.042*\"awesome\" + 0.028*\"start\" + 0.014*\"unethical\" + 0.014*\"make\" + '\n", " '0.014*\"googlex\" + 0.014*\"goal\"'),\n", " (11,\n", " '0.435*\"software\" + 0.065*\"dream\" + 0.065*\"developer\" + 0.043*\"data\" + '\n", " '0.043*\"meh\" + 0.043*\"rough\" + 0.022*\"expectation\" + 0.022*\"unique\" + '\n", " '0.022*\"large\" + 0.022*\"senior\"'),\n", " (12,\n", " '0.338*\"work\" + 0.118*\"culture\" + 0.074*\"associate\" + 0.074*\"amazing\" + '\n", " '0.059*\"experience\" + 0.015*\"realistic\" + 0.015*\"listen\" + 0.015*\"glassdoor\" '\n", " '+ 0.015*\"undeniably\" + 0.015*\"workplace\"'),\n", " (13,\n", " '0.348*\"work\" + 0.188*\"intern\" + 0.116*\"analyst\" + 0.058*\"internship\" + '\n", " '0.058*\"love\" + 0.058*\"engineering\" + 0.014*\"demand\" + 0.014*\"assignment\" + '\n", " '0.014*\"learn\" + 0.014*\"gr\"'),\n", " (14,\n", " '0.585*\"work\" + 0.057*\"ambitious\" + 0.038*\"wonderful\" + 0.038*\"awesome\" + '\n", " '0.019*\"add\" + 0.019*\"wlb\" + 0.019*\"surpass\" + 0.019*\"enjoy\" + '\n", " '0.019*\"working\" + 
0.019*\"brand\"'),\n", " (15,\n", " '0.609*\"engineer\" + 0.078*\"lead\" + 0.062*\"experience\" + 0.047*\"bad\" + '\n", " '0.016*\"stuff\" + 0.016*\"teammate\" + 0.016*\"datacenter\" + 0.016*\"challang\" + '\n", " '0.016*\"market\" + 0.016*\"practicum\"'),\n", " (16,\n", " '0.479*\"great\" + 0.083*\"career\" + 0.062*\"marketing\" + 0.042*\"lot\" + '\n", " '0.021*\"brand\" + 0.021*\"worker\" + 0.021*\"employee\" + 0.021*\"org\" + '\n", " '0.021*\"job\" + 0.021*\"meh\"'),\n", " (17,\n", " '0.429*\"software\" + 0.238*\"engineer\" + 0.048*\"designer\" + 0.032*\"fix\" + '\n", " '0.016*\"competitive\" + 0.016*\"hrbp\" + 0.016*\"challenge\" + 0.016*\"boost\" + '\n", " '0.016*\"reward\" + 0.016*\"founder\"'),\n", " (18,\n", " '0.441*\"great\" + 0.118*\"sale\" + 0.044*\"long\" + 0.015*\"geekland\" + '\n", " '0.015*\"quality\" + 0.015*\"production\" + 0.015*\"shiny\" + 0.015*\"colleague\" + '\n", " '0.015*\"stuff\" + 0.015*\"adsense\"'),\n", " (19,\n", " '0.364*\"place\" + 0.045*\"partner\" + 0.045*\"specialist\" + 0.045*\"perk\" + '\n", " '0.030*\"love\" + 0.030*\"worklife\" + 0.030*\"fun\" + 0.015*\"issue\" + '\n", " '0.015*\"store\" + 0.015*\"hardware\"')]\n" ] } ], "source": [ "# Select the model with the highest coherence value and print the topics\n", "optimal_model = model_list[3] # Index 3 is the 20-topic model (topic counts 2, 8, 14, 20, ...)\n", "model_topics = optimal_model.show_topics(formatted=False)\n", "pprint(optimal_model.print_topics(num_words=10)) # Set the num_words parameter to show 10 words per topic" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Using our **Optimal LDA Mallet Model**, built with Gensim's Wrapper package, we displayed the 20 topics in our corpus, each with its top 10 keywords and their corresponding weights." 
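, "\n", "\n", "The index `3` used to pick the final model can also be derived from the scores instead of being read off the plot. The sketch below does this with the rounded values printed in Section 10; the variable names are assumptions made for this example, and in the notebook itself the live `coherence_values` list would take the place of the hard-coded numbers.\n", "\n", "
```python
# Topic counts tried and the rounded Coherence Values printed in Section 10
num_topics_tried = list(range(2, 40, 6))
printed_coherence = [0.7254, 0.7591, 0.782, 0.7849, 0.7771, 0.7673, 0.7494]

# Index of the highest Coherence Score (the first one wins on ties)
best_index = max(range(len(printed_coherence)), key=printed_coherence.__getitem__)
best_num_topics = num_topics_tried[best_index]

# Gap to the runner-up, to see how decisive the winner is
ranked = sorted(zip(printed_coherence, num_topics_tried), reverse=True)
margin = ranked[0][0] - ranked[1][0]

print('best index:', best_index)
print('best number of topics:', best_num_topics)
print('margin over runner-up:', round(margin, 4))
```
", "\n", "With `best_index` in hand, the selection becomes `optimal_model = model_list[best_index]`, which keeps working if the range of topic counts is changed later."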
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 10.1 Visualize the Optimal LDA Mallet Model" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hidden": true, "scrolled": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAAJeCAYAAACplPMoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3XecE2Xix/Hv0EURAQVUlAVRWLpS7F0UWHtv2MvvPPX0bCuiFBE5PetZ7s522LFhWxABC6CIFIEFls6iKEUEAaXvzu+PbLKTZDKZJJNkNvt5v168Npl55nmeBJj95skzzximaQoAAABAQI1sdwAAAADwEwIyAAAAYEFABgAAACwIyAAAAIAFARkAAACwICADAAAAFgRk+JZhGKaLP6VpavttwzAWeFRXfcMwnjQMY7VhGFsMw/jGMIyjvKgbANzIhfOpYRg1DcN42DCMcYZhrK/o88Ve9BGIVCvbHQAcHBnxfJSk2ZIGWbZtT1PbAyTt7lFdr0k6QdKdkn6S9DdJ4wzD6Gma5jyP2gAAJ7lwPq0t6a+SfpA0WtJlHtQJ2DK4UQiqiorRjcmmaV6e7b64ZRjG4ZK+k3SpaZpvVWyrI2mhpGmmaV6Yzf4BqJ6q4vlUkgzDqGGaZrlhGB0lFUu6xDTNt7PdL+QeplggZxiGcbVhGMWGYWw3DONXwzBeMQyjaUSZ1YZhvGgYxk2GYSwzDGObYRjTDMM4NqJc1FeChmE0MAzjnxXHbTcMY5VhGO8ahtHEoVtnStoq6f3gBtM0d0h6R1KBYRg1U37hAOAxn55PZZpmuXevEoiNgIycYBjGrZJeljRL0tkKfKV3pqQvDcPYLaL4aZL+IukeSZdWbBtrGEYrh/rrSfpS0v9JelFSgaRbJW2WtKdD1zpIWlwRiq3mSaovKS/eawOATPLx+RTIGOYgo8qrmLIwUNJY0zT7WbYvlTROUj9J/7UcsrekHqZprq4o96WkFZL6S7o+RjPXSOom6TTTND+3bH83TvcaS9pgs329Zf/SOHUAQEb4/HwKZAwjyMgFHRUImq9bN5qmOV7SGknHR5SfFDyZV5TbIGmsoi9isTpV0oqIkzkA5BrOp4AIyMgNjSt+rrLZt9qyP2iNTbk1kvZ3aKOJpJWJd00bJDWy2R7s03qbfQCQLX4+nwIZQ0BGLgiGzOY2+5orOoQ2synXTNLPDm2sk/MJP5Z5kg6u+NrSqr2kLZJKk6gTANLFz+dTIGMIyMgFcxU4aYctGG8YxskKnKi/iih/rGEYzS3lGilwockUhzY+l5RnGEavBPv2saTdJJ1jaa+OpAskjTZNsyzB+gAgnfx8PgUyhov0UOWZprnDMIzBkp4yDOMVSSMlHSjpIUnzFTGXToHRi3GGYQyRVCbpXgX+Lzzk0Mwrkq6V9L5hGMMkTZPUUFIfScNM01weo2/fGYbxkaRnK67+XinpFkn7ShqSzOsFgHTx8/lUkgzDOFGBKRoHVGw63DCMXZLKTNMcldCLBRwQkJETTNN82jCMPyTdrsBSQ5skFUm62zTNrRHFx0qaKekRSfspsNj8aaZpljrUv80wjJMkDZZ0kwJfNa6TNEnSxjjdu0zSw5L+ocASRj9IOtU0zeJEXiMAZILPz6cPSzrc8vy2ij/bJdVz8/oAN7iTHqoVwzBWS/rUNM3rst0XAKjKOJ8ilzEHGQ
AAALAgIAMAAAAWTLEAAAAALBhBBgAAACwIyAAAAICF35d5Y/4HgFxnZKgdzqcAcp1n51NGkAEAAAALAjIAAABgQUAGAPjToIaBPwCQYX6fg4wckFdYFHpsGNLyhwuy2BsASNCghtKgeHdABpBLCMjIKJbdBuCaH0IpI9hAtcQUCwAAAMDC73fS83Xn4F5wmkXpcKZXABGyt8xbcOrAE+2ljT9H7Isxehs85rel0r8Oi3+M3QisU9222x1GkhOp36mNAWulWnUDj58+VFq/LHYd8doAkC2enU8JyACQXdkNyKHHFYHvvauluR+Eb4s8ZuDv0uC9pCs+llofL+3aJg1tFl0+WH+nC6TzXgzfdv+vUs06zj0Olo0XqK37nY6xe73LJ0ojzkisjWzxU18Af2IdZACAR6yB6/xXpAbNncsP3itwTOvjA89r1YsObdv/CPws/KkyHFvbenCfFPscIywGn0eOFE8YHPjZriD8mFbHVZ3AWVX6CeQAAjIAINwdCwM/Y01HOKEwfh0P7x/4WW/P6H3ZCHqTHg/8vPjN9LcVXJ4ucpk6u2XrnukeY5qIzfHx2rM+j9wPICGsYpFm1iXOrF679nAde/DeSdW5fVe52g4YE7Zt34a7acq9J7k6/ohhE7R60zZJ0XOC7fqb6LzhWK9Zkvbeo66mDzglofoivf7dCg34cG7Ytv5983XDca0Tqsfaz8jXOGnxOvV7aWrYtvO7tdA/L+iSYG+BHHTCvd7U80hr6e44c33j8Vv4e+vi8FFsp8fx9gcfO7Gtd6N9GwBcIyCniVNIlBQWvibfc5JaNNotbp3/+mKJHvt8oe2+VRu3pnQhnFN//XKBnVMfh40u0bDRJZJS62e5aar1vaNt9703Y6Xem7Ey6+8DkDO2/JbtHngv1vSU2+c5H5dKmI11oSLhGEgaATkN4oXjSG7CcSJ15hUWJRTi3NadaL1eSufrDzrukS/14/otCR8HIEk9rk29Dr8FwNOfDATTRnnhfVu3SGrYwl0dgzZKz/SQjrnd3euzK1M6Wep5vbv2AEQhIHvMGuSOP2Qfjbimp6uyTk55/Ouw57HCn7U+tyHRaZqBJD3zxRL90zJqffd7c/TI+Z0d63QzbSMRkcd7+fqtguH4huNaq3/f/Kj9BU9PUtGtxyZUJ1CltUltOlRMIy8P/Cx4PD31Z5tRQ/rb7PBtr50THWT371b5OHKqxG3F0l4HJt+H/xUwigykgIv00sgpHEuBoOcmxC1Z+0fYMU71WV3zv2lx645X780ntQl7/s70n1zX6YXrRkwPe57I60+EdRqJXTiWRDhG9RH8yv7y91Oow2EObcknydcbNPD32PUn2h8vDWooDdxQ+fztyyrbf/4oaemXlX25/ovAz+EHRIfYYDiOuuDOZtWOhaOlx9rZlyMcA0lhBNnnrKOi++0VfyrGQfvsoaW/BgL1FwvWumojXrAsHV6Q8ihwssaXrAk9fv26w+OWt/Y10VFk5hajWrr0HfvQ2OSg1Ov+2xzpqc6J38zDDcMIjNSa5fb1931U6nlD+Lb6TQLznhPpT+3dpJ1b4wfVWPa2DDL85Vv7MoURAw+XjnRXd1DbvoE/ADzDCHIV8m1h/FUqJtxxfAZ6kh3HtElu1Q8ADt6/1n5U8paZqdfdqGV03Qce4d2o5sAN9nX95dvocCwFVsywK9/5otht3Lc6sf6GpjU0lBZ/Lp0y2P2xQesWW5Z5YwQ4IU7fEKT67cHOrdJzR6RWB6oMRpDTyC+rP1RV//hsQcp1mGZgoCmew1s3SbktoEravjnw020QS2qVhTSHvETrT/drSPX1HnVL4A+cvX6udPkH7sun+vdSO/63uMgdjCB7zC4M5xUWKa+wSAffN8bmCPeC9cT7kyue/2ppynW0utfd+zHyBkYFACBj7G5sYr3hid3+yU9IgxsFHq+dLy2ZEPi5dn543d/+K7oOu+eTn7Rv7/Xz7MujWiEgp0Hp8AJ91//kqO07y8pzLs
QCQFoEw8lFr2e3H8gsp4sLj7k9MOdckpq2r/wZfBzkdvR9/ED79oIXp57zH3f1ICcxxSJNmu9ZLzSabBeImX4BABZ2o3WGIeWfkfm+wF+ad5S+fTrw+NQH099ecO731vXpbwu+RUDOgGAI/sdnC6KmDSSy0gJhGoCn/HQBmJ/6An9ZPVf6v28y05b1wsjP7pWOuCkz7cJ3CMgZdE/vdrqndzuNLl6lm96ovEJ84ZrNatusQRZ75k9PXtRVt42clVIdS4ex9BEA+JLbOb4PrI+Yk2yZFhGaQxznA5Z1rvHeB0s3Tw8/PlRfjDoj50TzgS7nEZCzoG+nfXVGl/30yexfJEmnPTGR0WEbZx+6f8oBuWYNF0tYAAAyyy5gxgqdNWrG3me3RGG8x8n0g0Bc7XCRXpb865JDEz6mul/c9/p3K+KWKTfNDPQEAADkMgJyltw3aq6rcpEjy1OW/paO7vjWtPtOCT0e8GH896z1vaNDj2cPPDUtfQIAALmNgOyh9g98przCIr0z/SfHchMX/ao3plaOhsabXjHwjA6hx5e88F3ckeRcWkpunwZ1w547va7IfQ13q52WPgEAchRrH6MCc5DT4O735uju9+Z4Vt/VR+dp8Cfzwrb5MQCf/ew3mvXT745l1v2xPWbfY31QKB1eEHaMm9fOnG4AAJAsRpCzLJEl3l679vA098a/Egm8hGMAQMI+K8x2D+AjjCB7aP6Q3pKkJ8cv0pPjF8csd163Fnrsgi4J13/swXurdHiBdpWbatN/tG2ZvCa766u7Tki47qrA6cYr1v0AACTsu+ez3QP4iGH6+6p/X3cOADyQqbUIOZ9WJU5zYd0uOeZUR+sTpCs+ir3/4RbS9s3J9SFWuzGXWrNZ4ziRMtZ1i18/V1oywX3b/z1e+sXlcqIs9VYVeHY+ZYoFAAB+Eu9CsXj7f3gtfhmncDyooXM4djrOMdin+QK4QQ3tw7FT227DMaodplgAAOAnZz8ndb3Mfp/1jnCxRjQ/utlSPsFRTzejuXZ2bXM+zk2/vTBgrVQrfPUjx7bt7pbHSDHECDIAAP4SKxxL0r0/Ox+bbMCVpLUlyR87tFng51+nxuhXhkJnZDjOZNvIKQRkoJrx4xKBAFyqu4e7csmEwueOSPyYSPu0i18mXVMtCMLwEAEZAAA/Ky8LzCt+96rAxXPplq6g2f+X9NQLpAFzkIFqhuXwgCqgKt3R7cth7srV2T29/QA8REAGEPLRrF90x7uz9PKVPXTcIfskXc8F/56i1Zu2adLdJ3rYO6CaiDeP2G/heffkzxWAXzHFAqgGdpWbenXKCvV+apLyCov03oyVUWXyCov0t7d/UL8jWuqKl7+PmqtsN3c5r7AobPvwMQuUV1ik5ev+VOPd60TtBxBHMPzWqld15tT2vN5duTnvpLcfgIcYQQaqgVo1DF1xZEtdcWRLV3ciHHhGB9ty7e7/TAse7B3zuMI+7VTYJ/wiHQIykIQBa+y3z/sw/W2naym2D1wGacAHGEEGENOqjZVrm5YOL9C2nWVZ7A0AvXulu3LJTMPwIhS7aTeynft/Tb1dL5XtyHYP4AMEZACSKqdLuJ0WkVdYpP322i1s25cL1yZcDwAbdkEz0fAZq/yCImnJ+PjtLxydQN8s7f73BHfHBNWsE6fuDM+5fpA51WCKBQBVToOwTpeIFW7zCotC5b4tPCls39WvTItaJYOQDCRg0MbwO7857U+2Dkm6/P34x751Sfz+2h37yw+x2401St3yKGnFtxVlknzdqXLz3qPaYAQZgCTpssNbxi0TDL+m6a7Ov739QypdAqqnWEEskYA2aKN0zO32+5p3lNqc4nysU1vx9hk20SJenVePsd/f4ZzMBlNCMCoYptvfdNnh684BVVFeYZH+eUEXnd+t8oYD/524TMNGl0SVnXLvydq3Yb2o4yX79ZQjR4sNIxCmWXvZkZGhdjifAsh1np1PCchANWMXkJFVBGQA8IZn51OmWADVUM
sm9bPdBQAAfIuADOS4J8YtCj1+cvxiSVKPvMbZ6g4AAL7HFIsK1rmTfpsv6bQKgN/6Cv+J/PfDvxnfYYoFAHjDs/Mpy7wBOY5ADABAYgjIVQDrygIAAGQOc5ABAAAACwIyAAAAYEFABgAAACwIyHHMWLFB+fd/pr++MTOleq54+Xu1vne0rhsx3aOepe7TOb+o29BxOu6RL/Xt0t9Sru/H9VvUZfDnOu2JiSnX9fOGrTrxn1+p48CxGj5mQdL1DP5knlrdW2R7lzgAAAA7LPNWIXKZt1SXVuvz1CSVrNoUt1wyKwyksiSdmwv8Xriiu3q1bxa3jmDb8ep028fnvlyiR8YujFsuXn3DRpfovxOXxdz//l+OUreWjVz1CcgAlnkDAG9wJ710ihf43IRMN+HYbV1ecdvW9a+6H+V2U6fbMm7CcTyt7i1yDMeSdN7z3+qu9+ak3BYAAMhNLPPmIHKk8pznvtEPP/4uKRDo3IyMxipjDY3HPvKlJt19Ygo9dSc4Mj7krI664siWUftPe2KiFq7ZHOpfvNcXbyTbbSCPLPfo+Z11QfcDosrd/OZMjZ23JmY9U5b+JusXIk59enf6T3r0/M6u+gcAAKoXRpBjsAtXo246Ouz59l3ljsc7BUzrvp/Wb0mih8kpHV5gG44laeztxyVdp53Rtx4beux2Dnfp8ALbcCxJz1x6mBY/1CfmsZe88F3cPlm3s540AACwQ0C24TbYth0wJhPdyahE5zQ7lW+/356hx0XFq2zLeHWL749n/+JJPQAAAARk5IRb3/rBddlrj2mVxp4AAICqjjnIGTatdL0mLV6nrxf9mu2uhPy8Yau+WrRWExet00Qf9StZiUydGPzJPA08o0MaewMAAKoaAnIa+XGO69adZcq//7Nsd8M3lv76Z7a7AAAAfIaAnCZ24fjw1k30xIVdtN9eu8Usk+k+SdIrV/fQiW2bxi0HAABQHRCQ08CrC8+8FBl6/dKvdMjl1wYAANKPi/Q89vzXS0OPn7vssCz2JDYCJAAAQGwEZI/9Y8yC0OO+nfbNYk+Sw/QKAABQ3RGQbTiFxJe/WR563Kt9s6Tb2LxtV9LH5hKvbtzBDUAAAIBXCMgx7Co3bbcP+WR+6PELV3SP2u82qHUaNDaF3qWHH4JlvD6s2bTNk3o2bt3pi9cLAAD8h4v0YmjTf7Tj/kVDY9/y2CqvsEi92jfTqe2b6Z73i1VuVgbvYed2Uv8Pih2PX7z2D/37q6V6f+bKmPUH9e20r248rrW6HLBXVLl3/+9IXfDvKWHHPH5hV325cK0+ibgLXaaDY2Sb8dp3uo10IvUAAADYYQQ5QunwgrgXsc0Y0Et1asV+6yKPHzd/je56b05YOC4dXqBLex4Ytz+9Hv86ZjiONLp4lc569hvbfT3yGquGYYRt+/s7s6LCcbZ41Xbp8AIV9mnnSV0AAKB6MkzTfiqBT2S9c9ZRyLG3H6e2zRokfXyzPetpav+TPetbsv76xkwVFa8KPffjqhaRo79XHZWnQWcmfse7yHoGndlBVx2Vl0rXAK8Z8Yt4IuvnUwBIM8/OpwRkAMguAjIAeMOz8ylTLAAAAAALAjIAAABgQUAGAAAALAjIAAAAgAUBGQAAALAgIAMAAAAWBGQAAADAgoAMAAAAWBCQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAC+c/az3+jJ8Yuy3Q0AQDVlmKaZ7T448XXnAHgvr7Ao7Hnp8IIs9SRjjAy1w/kUQK7z7HzKCDIAAABgQUAGAAAALAjIAHzFOqWiGkyvAAD4EHOQkZK8wiI9dE5HXXZ4y2x3BaiqmIMMAN5gDjIAAACQDgRkJO3bpb9luwsAAACeY4pFGkUuVxUpcn7lrnJTbfqPDtue6JJXySyRZdfPeMe9P2Ol7nh3dsz9o289Vu332zNu21XBZ3NX6/9enxG2bfh5nXVxjwNSqjf4vsd6r4P7v77rRLVsUj
+huhev/UO9Hv86bNsJbffR/67umURPc0f/D4r15vc/hp4/fG4nXdLzwJTqvH3kLI364efQ87+e2EZ3ndY2kSqYYgEA3vDsfEpAToNdZaba3Dc6bjmngDzqh591+8hZcY8Jcgrjn95yjDru3zChYySpf9983XBc64SOCarqAdnt65QS+9BSOrwgZt3BeuLtd9NOPN1aNtL7fznKVdlE6o2Uap/33qOupg84JaE2I99vN+1YyybTTiJs2iEgA4A3PDuf1vKqIlQKhuPIX4TBX6hLhvVVrRqx/w6/X75et4+c5foXdrDePerW0tzBp0XtO/1fk23rKh1eoKOGf6FvC08K2/7x7F9061s/aNjokqiAbDe6nWsX6SUafPIKizR38Gnao278/06L1/7huP/d6T85tpPMByQ7M1ZscKwvl7h9b9y+H39u36UOA8em2i0AgI8xgpwGTl+dO+0LjiDH2p9Me+v/3KHDHhznaZ2RZXItIA8fs0D//npp6Lnde9Cm/2jtKg//5+k2vHbcv6E+veUY232RdUV+k+Dm7yNWudkrf9dZz3zjqs/JsvZh97q1NC/iA1uidaQ6gmz12AVddF63FnHLejGNKcGpTowgA4A3WMUCAfGCbOPd62SyOzmhsE87lQ4vCP2xs2RY36TDZTAcS/Z/b9Zt5xy6v+t64/W5S4u9Yn6r4YXIupIJx+lSOrwgKhwHtyei9b3hU6diHV8dRuYBIJcxxSJHeBF0Sn/7Ux/N+kUTF/3qQY+qHzdf0c+8v1eGehNbzRqGysq9HUxM5uLQTInXF6d54ZHKLd+4jb71WNd9qC7TWQAgVxCQ02j7rnLVrVU5SO9m2kKmeTmCiPjijehfc3SrtPdh6bC+nv69L/v1z7Dnfvr3nYzl6/5Uq713j1su3kWot/c6RE+MW+RVtwAAGURAToPgiFTbAWOi9r12bXqW2UomlARDkt3yXwRnd7oesJdm/fS7Z/U9cEb7qG171a+t37fs9KwNr5302Fehx34Lx09e1DXhY96Y+qMGFOSn3HYDFxdtAgD8iTN4mvktMAQFA3D9OjWr/dq4flevVk1J/gzI1g9SbkZdM+3sBOZwB23ZscuTtod8Ot+TegAAmUdAToNMTqUYfGYHDfx4XtJzHOcP6Z1yHzZv8yZQ+FFVHEnPVJ8j2/nyzhMy0q5fJPJ/zq8flAEA9ljFIo227ChLextXHpUXeuz0NfyOXeW2209+7OuobR0TXON1+JgFCZWvCvIKi6pcOM5kn/18UV46Rb7OVvfav99V7d8OACAcI8hpEJyD3P6BzxzLeN1e1yGfxywz7b5TtE+DulHHLP31j8DPYX31UFGJXv5muSTp3MP21wczf45VXVgdUiAQ9GzVWAtWb9amrTurdGCyCzfzBp+m3W3mlPolCNn1483rj9BRBzVxXd6tpb+G3+ykKv9dJ2PKvSfpyIe/kCSZZuK3lAcA+B8jyGngJnx4Hazi/RK2huOgZQ/3DT0+qP/oUDguHV6gxy90d3GTtd3vl6/Xpq3+nCvrlt3IaOnwAttw7Bex+hwrHKfK+q1DdQx/+zbcjakVAJDj/Ptbv4pK5A50kWrVMFL6hZrosTUM5/aqWwiInCpSFV9Xuvts/bdr/YBV3UTeRbF7y0aatfJ3XdrzQA05q2OWegUA8AoBGahgvb10LnswydUVrOHYMAIfsKqr4C3hz+iyn/51yaFZ7g0AwGtMsQCqsJUbtiZ8zEuTlyd8TOS3Hssfrnqj616xvheEY3huUMNs9wCACMhpU7JqU9S2HbvKQ79c99ytdqa7BI/54QK9Fo12S6j8n9sTX5Kvul+UB4Q8yfQZoLpgioXHZj1wqroO+Vx9nprkWG7OwFMz1CO4ZV2VQ5Lm/7Ip5u2Ez33u20x1KyFtB4zRwqF9Yu7vkOASfhIX5TnJKyxSrRqGvr/vlLi3EUcO+P2nbPcAQIYQkD22V/3aKh1eoAWrN6v3kxPD9i1/uEDVeNpmldP36c
CHHGsoPPLhCVq1cVvoeWSozrbtFd9SWPv8xtQVum/U3NDzE9ruo68W/uqqvsjXluhrdROoZ/30u57/aqnGzlsdtW/dH9tDbV7c4wDdePxBWb9jX+Tf+a5yU4c9OM7xmBPbNtUrV/dId9cAAB4hIKdJu+YNGG2rguwCb6xQ6Je/30T6fEizBvrf1T19EeoT7cPb037S29MqR/D88v678eXCtUnf7RI+YJ0XHPZ4Y+xyktT1Uuns553LuKnHrgyAtCIgAxHcjAr7Legk2uduLRtpxooN6e5WzrF7jx+/sKsaWq4p2L6rTJOXrNObU3/MZNeQTsFwOqhh7KBqt29Qw/CAHAy+8cKum8AMIK0M0zTjl8oeX3cOQPVhDccXdD9Aj57fOeHjJNsPV5maeMX5NFXJBGTrNqfjE6kHQCyenU8ZQQaABLkNx5L/5qkjDRZ9FvhpN9K7bpG09yGZ7Q+AlBGQASCOm96Yme0uwM8O6R34ySgvkDNYBxkA4hhdvCrpYzdu3elhT1DlbViR7R4AcIGAnEZ5hUUZ/2o12CZf6QLemXl/r6SP7TL489Djq47K86A3yKohjQM/F0WsKd7m5MAUi2H7SWvmSY+0ip5ycedi6anOge2r5kivnsUFeIBPEZDThIAK5I7Im4C4+f/9xYK1UeUGndnB034hwwZtlMrLAqH2zQvD913+QWD/jj+l54+StqyPnnKxR9PKbf85Vlr2FdMyAJ9iDjIAuHDDca3134nLQs/TcdMUVAGJLtGWaBm7fYRoIOMYQU6T0uEFate8gb6884RsdwWAB/r3zdf3952S1LGEYwCoWlgHOcdYR7X4pQykj4c3k2EdZADwBusgA0A28QEUAHIXUywAAAAAC0aQU+TmQp3BZ3bQlQks7+RUZzKjVrHqYwQMAAAgGiPIPvLbHzviBu68wiLd8tYPrut0qi+vsEjvTv/JdV0AAADVARfppVEwnLodQY53gZ11/8z7e0WtzRpZJlZdO8vKdfB9YxzbApAxXKQHAN7w7HzKCLJPuFl9wrr9sAfHuarXrq7aNWuEbeemJgAAAJUIyD7Tp2Nzx/3WYHvRf79Ld3cA/bljV0oforj1OQCgqiEg+4A1PDx/eTfXx01d9pvj/nhTJ3q1b+a6LQAAgOqCgFyNvXBF99DjpyYszmJPkMtKhxcwzx0AUKUQkCFJemLcomx3AQAAwBdYBxmAa5FziRc/1Ee1a0Z/znazmoqbY9weCwCAlxhBhiSp/X57ZrsL8LlggLVOmbAuF2gVLHPN0a2SrlsKTAMiHAMAMo2AXI39uWNX6PHoW4/NYk9QVVjDavCx08jvA2e0j1unNRxH1n39q9OT6icAAKkgIPvAvX3z01JvrNG9oA4PjE1Lu8hNSx7qm+0uAACQEQRkH7jxuNYBPpbtAAAgAElEQVShx/HWi211b/wbigTtLCtPrWOARa2a0TcoYvoDACAXEZB9aOGazTH3JXpn8KFFJbbbrUG8Ti3+GSB77KZq2E27AAAgU0hGPmENAqc9MVF5hUVa/+eO0LbIu5G5DQ4vTloWVpdpRo9SLxraJ5WuoxrbuHWnJ/VYQzLhGMiiQQ3TU+8H16enXiBNWObNR0qHF4SF18MeHBezXKL1pVoX8OEPP+vsQ/cP29Zl8Oee1J1XWKSBZ3TQ1UfneVIfgBhWz5X+fXTg8RlPSd2uqtz32b2Bn9aQPGij5XGM7cF9gzZWlrluvNSiR+DxqjnSnHcCf2IdD/gMAdlnnFYGWP5wgYzoaaC2zu/WIlTfzrLyqAv27jy1rW4+qU1qnUW1ctvIWVEB2UuEYyDNRpwhLZ9YGU7fuyY8IPd+WPruOfvw+tm9ldtXTqsMxFZ22yRp384V+wnFqDoIyD6V7Miu3XG1a9ZgpBieyCss0jFt9tauclPfLftNkv2/uetfna5x89dEHVuzhqEbjmute3q3s63bDv92AY+UTpZq1q58fv7L7o/t/XDl4+DIcCQCMHIIc5AzoI
bbYV/AIt6KJplWOrxAF/c4QJOXrHMMx5KiwnFQWbmp579aGrYt+DpvPqmNFg7to4VD+2ji3SdG7QeQooEbpLKdFSO9Scw1Dh6XrnnKgI8wgpwB/Y5sme0uwGf6jyrWsHM6xS332dzV6t2xeQZ6FNvudWqFgvDw8zpr+Hmd4x7jdtR39srfbcsf2Lh+1Jx8AB4IjvLu2BJ7SoTtcQ1jz0cGchAjyGnCL3Y4eXPqj3HLlA4vyHo4Treznvkm210Aqqc69bPdA8DXCMgp+OX3rbbbCceV/jFmgToP/jw0Uhjp9e9WqOuQ2CshvDx5uUpWbQo9v/A/U2zLTShZqx4PjXfsS8+HxuupCYsdy/R76Xud//y3Mfe/P2OlWt87Wu9M/ylmmRMe/cox+Hn172PB6s06ZIDz3RIlqcdD4/WvL5Z40qbXgiPHdnd1DL5PVx2Vl8kuAbnLOkXCaQQ4VplUplgEV7hgigaqCMNM9M4TmeXrzrkJOtX1AqNDBozRjl3hd/KbMaCXmuxRJ/Q8+P6dd1gLvT9zpaTwVTymDzhFT4xbrDemrtDjF3bV39+ZFTo2crWPfRvW0+9bdmrrzjINPrODrrSEqmCZcw7dX6N++Dns+OD+JcP6qk3/0SrotK+KildFlbHWc8WRLfXqlBW29UhS1wP20vxVm0Kv32llksjXI0m/b9mpO9+drfEla2L++wnW1e/IlnrNoS+SdM3RrfTyN8ttX5MfdB86Xuv+2G6774HT2+uaY1pluEcZl6mLFHx9PgUAD3h2PiUgp8Ap8JzfrYX+eUGXDPbGX/IKixzDWF5hkT7869HqesBeYdum3XeK9mlQN+xmEZE3jrDWbddO5LbI5/1emqpJi9dFBVe7gBncNvDjeRrxbWnCATNW/yLbc3usU39r1TC0ZFhfxzJzB5+mPepy6YHPEJABwBuenU/5TZkCP47G+cHNb850Vc4ajoN6PDQ+9L7+5fiDJEkntWuqLxasTaovdh9iXrv28Kjt+fvuGfa8e8tGmr5iQ+j54DM7aMS3pXGDf6ZE9uGjm4+OmtZh18+nxi/WfQX5ae0bAABVHQEZnvt0zqqkjuvTsbnGzF0det5y79296lLc6TDXuvgaPziaHWsE+NAh47Rhyw67Q9OuS4voDxsAACA5BGR47qqj8vS/b0sTPs4ajr3m1ahvsJ43v/8xajR5w5YdMadTpJubVTEAAIA7rGIBzw06s4Orcqc+MTFqW6pBtvS3Pz2tL5ZLex4Yt0w6w3Fk3f1HFevYg/dJW3sAAFQnjCAjLe48ta1tQAwG1sjpCqmIrCMyFO9Vv7ZjX5Jpw025Jy7qqttHzooqY/faY61AYVcm1nv32rU9XfURAAA4YxULVGl+uWgOSAGrWACoXhK5i2NiPDufMsUCAAAAsMi5gNxx4FhNWvyrY5kOA8fqj+27MtQjAACAHBa8Q+KqOYGf80YFtr96ZuU+szz2sZH1+EDOzEEOzse8/IiW6vfS95Ki53Xu06Cuft28XZcd3lIdB46NKgMAAIAkhG4nXvGzwznSFR9b9ttMq7Bui/U4S3JiBLn/qGJJgbA79OyOMW/t++vm7SodXqCHzulIMM4R/D0CSJuXeklvXBB7/wfXS8+mcHHszzOkZ3pIDzWXiv6efD1OXu4tDT9QmvKMN/WZpvTSqdKQJtLr5wWee6V0svRUF+mJDtJP33tXb9ALJ0nD9pW+f8H7uq1eOzvwdzrx0fS2U1XYjQo36+AQmP0xgpwTF+nlFRbp3MP21+MXdg1tW7lhq475xxdxb0ksEbIAZBUX6WVb2Fe8G6WRl0sln9iUq/hlPuN/0id/i70/FrNcGtzIXZ/OekY6tF/8csG+195Num91+DbH4xIYnXMbWBo0l+5Y6L5eSXrjfGnxuMSOCXLzGuL13aghDdzgXCZWfdb2vX7PqxrryLHdT2uZ4OP7VkvPHSH9bXb0/uRxq+lIH8z8WR/M/Dnb3QAAVHV24ViSVk6XWnS3D8
eS8y/4REfFPro58MdtYNi5NbF23ISRWB8UYtm8OrGQk86RwiGNpfKy+OXM8tSDWXUPx27YzT+uvZu0ZV34th+nSAceGfi31KB5ZvoWQ86MILdsUl9f33WiYxlGkAH4ECPI2WYXcKwjXbbHVOwfc4809d/R2+O1U29PqfCn8P3PHSmtnW/fjps6rawjypK07Cvp1bNsjk+w/sjyU56VxvYP39ZriHR0jA8RQY/nS5t+ce7Hx7dIM19139egR1pLW34L32b3VX6kREekgyOksep4uIW0fbP7uuEFz86nORGQF6/9Q70e/9ox6NqFYdbQBeADBORsixcCE9kfLyAnMy3AbehOprybgOx1nyPLO5WNHAlOtC8DNwSmUbgp66b+eB+YkG1MsbA6uOkekqIvyuvVvpleuKJ76PlNJ7ZJ6+1/AQDVnGlKRozf0W5D1EkDpC+GJtd+3IBnM+qZSn3J1uvWA+sT7G9EWadwLEnH3C5NfiLxfoW1STjORTkRkCV30yTuPq2t7j6tbQZ6AwCokpIJO/0+DKxcIEmj75QKHkutD8fdlXxATtSOP6U6u2emraB1izLTjpu/y1MGpRaQCcc5KyeWeQMAIG3aWQZgDu4Vvf8gy/UvP05Jf39iSSasTUoxzCdj2Vfpqfe751OvwydLjCH7CMgAADipZwlNvYY4ly3bkd6+eG3r+sy32fOG9NT7WWF66kW1lDNTLAAASLum7b2pZ/gB0rZN3tSVKROGZGfEOVnpHg0+aUB660dWVZuAzGoVAICsmjBYmvR4tnuRuHQHTbfrR9fbM739SNTeh2S7B0ijahOQAQDImlghs0596d5fole+8MNc2KHNpF3b7Pf932SpeafwbYmujhG21FvF476PSlvWS189HH1M5LrRQBoRkAEASKfI4FiztnT/OvuyfhIZjr1escFuabjRd8Uum0z9QJIIyAAAZFJVCMeJ3kAjWft1lX6ZZb+v25XSGU+np10gDgIygLgOHzZBT1/cVYe3bpLtrgBVW1Uc1Tzn3/HLJMPt3fTcOv0J6dPbU68HEMu8AYgjr7BIazZt00X//U6HD5uQ7e4Auc8P84+tulwSv0zJJ+nvRzzdr8l2D5BDCMgAXFuzKcYFOwBy17Kv45cZeXny9adrNQi/fdBAlUJABuBa7ZqcMoCUxAttfgx1r57pvD/VPq9b5N3rjpyqkdDKGj5875E1zEEG4Kh0eIHyCou05261NWfgqdnuDlD12a3767dwNmCtNLRp5XO7Pg/eSzJN79p08x64mascawk5STr9Scksk75+RPpjTeJ9RLVBQAYQFzfaAVJgt5yZUxi0K59ptepGb4vX5yFNpPJd7ttI5nU63VTETd2f3pZYe6i2+L4UAIB0cxPqThlcWa7/qvT2x41BG6VWxzmXqbN7ZZ8f+C2Buhsm/yHA7XGDNkp193BXtm2fqrnCCNLGML38esR7vu4cAHjAiF/EE5xP4Q/JLu/m9bJwyEWenU+ZYgE4eP7rpfrHmAWh57edcohuO+XgpOr6x5gFev7rpaHnvdo30wtXdE+5j1XV1a9M05cL14aev3J1D53YtqnDEQCqPEIuqghGkFGt5RUWhR5b59lat8fidl6uF3VF1pHonOBYrzNeO5G8mIucyvthPfaFK7qrV/tmrtpwU18W51kzgozqI5WATLhGfJ6dT5mDDERwE+CWPxw/TD0xbpGruoJtXvny9zH3pxLeJi/xx21t8wqLEno/7Fjfh+tfnW5bZsnaPxLvHAAAFkyxACyswazx7nU08/5etvuNOJ9R/9yxS09NWBy2zS7kWtv7etGvKjdN1YhXuaTeT07UZ7fFuXimwuUvTnVVzk0fk3XaExPjtvX7lp3qOuTzsHaT+WBwyuMubmoQYdHQPgkfAyCDtiRwASDgAUaQgQqRX7dHhuPgdjehrcMDY0OPrzu2dcxjIre3vnd0zDrvP7196PGC1Zvj9sFNe5mycE1lf2O9h3vVrx21PZVwPn
9Ib8f91rrr1OJUCGSc69UoGkqPtE5vX4AI/FYAIqQaIiND3YCC/ITae3Pqj7blrj2mVcJ9aXNf7MCdKYmGXK9CfP06NUOPjxg2wZM6AaTI7gYp8f7EqwNIAwIyYOH1CGsy9fUfVeyqnJvguaus8rqsb+45KeG+eC2Z96PnQ+PDno//+/EJ17F607aEjwGQJskG3KbtCcfIGAIy4AOt99ndVblEAmZZefiiBfs32i2hPvnF2s3bw563aVq58H/+/Z+F7Yv80FCzRqYWiACQkEEb3YfdYNmbpqS3T4AFF+kBFVxcGxfXojXJzQ2+9aSDddvIWal3wOKg/lVvekWitu4sc9y/dFjfjCxdByBJjAjDpwjIQIVBZ3RIuY5TI1ZrSHdAdLvSg19CYLrfj6CRNxzhuP/C/zASBQCIjSkWQIV9GtTNdhdccRN2r35lWgZ64l+Ht24Stc0azr9fvj6T3QEAVDGMIANpNDkDF8Z9+MPPOvvQ/cO2WW/hvEdd//w39+r9mDfktLCl9CTp/16f4UndAAD45zcnkANaNqmvFb9tCT1vkaYL40bddLTOee4bSdJtI2dFBWSruYNPS0sfkuHV+7F7ncpT15YdZapfp6Y+m7vatuytJx+spyNu2gIAgBOmWAAe+vquEzPSzqEH7hVzX6bm+bpxUrumaW+j/QOfOe7/e69DYu7zy9xsAIC/EJCBHNB2wBjb7cse7pvhnoR7+aoeGW/TKfSOiTHKDACAFQEZSKN0juZag+D2XeW2ZWp4sXadh0549Kustv+X12fonvfnZLUPAAD/IyADHsvmnN9LX/gua23Hslf92qHHpb/96Vm9k+6unM7y3bLfXB83ctpPnvUBAJCbCMiAxyJXjcgrLNKnc1a5OjaVEed5v2zSt0srg6Jf5tfOeuDUsOduX+MT4xY5lj2gcf3Q44v/6/zBYPnD/ngvAABVA6tYAGlQOrwgLNzd/OZM3fxmetspeHqSJ3VOWvyrnv9qaVjYDgq2ZRjStce01o3HtXa1fnTk+5GuqScf33yM7Xa7mSZ++QABwIe2rJceaRW+7fD/k/r8I/B4UMPKuwA+00Nat6jy+WvnSEu/qHw+qGF4PXZ3D4wsY1fOqZ5gf4JlIh+7rQchBGQgTSJDYSYls5pGIn01TenFScv04qRloW3xAmcm3o/OLWx+yQBAouo3Dg+Oiz6T3ryoMiBbrVsU/twuHAefl+8KD9fBMvt2lm60DHJ8fEt4nYMaSvt2kW6suFvr0GbR9Ty4T2Uwtgbm/xxbWbdd25HbIIkpFkBalQ4vcD1S2bNV46RGNR+7oEvUtpZN6tuUzL5E3o9ZD5zKKC8Afzikt/P+LhdLM1+132cNnzUqxiVnvx1eZu2C8Odn/iu6nmA4lqQBa6L33/9r5ePu1wR+HtBTWlVxYfIb5zv3DWEM0zSz3Qcnvu4cAHggU0uNcD4FErFqTmD01SoYKD+8SZr1hmXE1uanZD91IrKuyHKxpkTYbd99H+muJdFt3j5PathCevNCadHY8CkX8fpStXl2PmWKBQAAgJVdKLUGzLOfCwTkNfMCUx8c63IRPq3hdlBDqUZN6YH18Y8zYkwEiLXdbX/AFAsAAIAoboLk80eFT30o35V6m4M2SuVl7vpx56LY+yLt3y35flVDBGQAAAAnTtMTrIY0ib9qRKStv4c/H9rMvtwz3SsfP9TcXX+srv8i8PO5I8O3L/488bqqAaZYAAAAWBU8Fj0vOF7QtStjXVUicnvQP1ra1xWvnmSmSrjpDyRxkR4AZBsX6QGANzw7nzLFAgAAALAgIAMAAAAWBGQAAADAgoAMAAAAWBCQAQAAUHXYrcThMQIyAAAAqo4MLEvHOsgAAADIHOvob7sC6eI3peL3pPevrdhfEYDNcmlwo8Djkx+Qjr0jY10kIAMAACAzBjWsDMCDGgbCsSR99Nfw7YM2SkaN8G0EZAAAAFQLr/SVBqyx3ze0qbRre2b7I+YgAwAAIFMGbZ
T+kVf5WJJ6XBujbENpwNqs3AqbgAwAAIDM2bohEH7njQo873he5bzk+R9LXS/NXt8qGKZpZrsPTnzdOQDwgJGhdjifAsg+6xxk73l2PmUEGQAAAJn3aJts9yAmRpABILsYQQZQvTzbU6rbQLpugtc1e3Y+JSADQHYRkAHAG0yxAAAAANLB7+sgZ2pkBQByHedTAHCJEWQAAADAgoAMAAAAWBCQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAAAAMCCgAwAAABYEJABAAAACwIyAAAAYEFABgAAACwIyAAAAIAFARkAAACwICADAAAAFgRkAAAAwIKADAAAAFgQkAEAAAALAjIAAABgQUAGAAAALAjIAAAAgAUBGQAAALAgIAMAAAAWBGQAAADAgoAMAAAAWBCQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAAAAMCCgAwAAABYEJABAAAACwIyAAAAYEFABgAAACwIyAAAAIAFARkAAACwICADAAAAFgRkAAAAwIKADAAAAFgQkAEAAAALAjIAAABgQUAGAAAALAjIAAAAgAUBGQAAALAgIAMAAAAWBGQAAADAgoAMAAAAWBCQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAAAAMCCgAwAAABYEJABAAAACwIyAAAAYEFABgAAACwIyAAAAIAFARkAAACwICADAAAAFgRkAAAAwIKADAAAAFgQkAEAAAALAjIAAABgQUAGAAAALAjIAAAAgAUBGQAAALAgIAMAAAAWBGQAAADAgoAMAAAAWBCQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAAAAMCCgAwAAABYEJABAAAACwIyAAAAYEFABgAAACwIyAAAAIAFARm+ZRiG6eJPaZraftswjAUe1HOEYRgvGYax0DCMLYZhrDAMY4RhGAd60U8AcCNHzqdtDMP4xDCMHw3D2GYYxq+GYXxhGEYvL/oJWNXKdgcAB0dGPB8labakQZZt29PU9gBJu3tQz2WSDpb0uKQFkg6QNFDSdMMwOpumudqDNgAgnlw4nzaQtFrSSEkrJe0l6S+SxhqGcbppmqM9aAOQJBmmaWa7D4ArFaMbk03TvDzbfXHLMIx9TNP8NWLbIZIWSrrPNM1h2ekZgOqsKp5P7RiGUUfST5ImmqZ5Qbb7g9zBFAvkDMMwrjYMo9gwjO0VX729YhhG04gyqw3DeNEwjJsMw1hW8TXdNMMwjo0oF/WVoGEYDQzD+GfFcdsNw1hlGMa7hmE0idWnyHBcsW2RpE2S9k/tFQNAevjxfGrHNM0dkjZL2pXsawXsEJCREwzDuFXSy5JmSTpbga/0zpT0pWEYu0UUP02Br+XukXRpxbaxhmG0cqi/nqQvJf2fpBclFUi6VYET854J9rVrxTEliRwHAJng9/OpYRg1DMOoZRjGvoZhDJXUQtKz7l8hEB9zkFHlVXzFNlDSWNM0+1m2L5U0TlI/Sf+1HLK3pB7B+b+GYXwpaYWk/pKuj9HMNZK6STrNNM3PLdvfTaKvz0v6RdKIRI4FgHSrIufTpyX9teLxJknnm6Y52eWxgCuMICMXdJTUWNLr1o2maY6XtEbS8RHlJ1kvjjNNc4OksYq+iMXqVEkrIk7mCTEMw1DgF8thki4zTXNzsnUBQJpUhfPpI5J6KDCq/YWkdw3DODXJugBbBGTkgsYVP1fZ7Ftt2R+0xqbcGjnPCW6iwFXTqXhcgdGXfqZpfpViXQCQDr4/n5qm+aNpmtNN0/xE0rkKTAV5JNn6ADsEZOSC9RU/m9vsa27ZH9TMplwzST87tLFOKVxUZxjGg5Juk3SjaZrvJFsPAKSZ78+nVmZgKa4Zktp4UR8QREBGLpirwEn7YutGwzBOVuBE/VVE+WMNw2huKddIgQtNpji08bmkvGQWpD
cM4y4FLnK50zTNFxM9HgAyyNfn00iGYdSSdLSkpanWBVhxkR6qPNM0dxiGMVjSU4ZhvKLAIvIHSnpI0nxFzKVTYPRinGEYQySVSbpXgf8LDzk084qkayW9bxjGMEnTJDWU1EfSMNM0l9sdZBjGlQp89feRpG8MwzjCsvt30zRTvrsUAHjF5+fThyXVUyB8r5G0nwIXAnaWdH7irxaIjYCMnGCa5tOGYfwh6XYFlhraJKlI0t2maW6NKD5W0kwFgut+kooVuJq61KH+bYZhnCRpsKSbFPiqcZ2kSZI2OnStT8XPsyr+RPajd9wXBwAZ5OPz6XRJt0i6XIHl4FZJ+kHSUaZpTk3wZQKOuJMeqhXDMFZL+tQ0zeuy3RcAqMo4nyKXMQcZAAAAsCAgAwAAABZMsQAAAAAsGEEGAAAALAjIAAAAgIXfl3lj/geAXGdkqB3OpwBynWfnU0aQAQAAAAsCMgAAAGBBQAYAAAAsCMgAAACABQEZAAAAsCAgAwAAABYE5Gqs04hOKe0HAADIRQRkAAAAwIKAnGMWrl+oLq920avzX7XdP2rxKJ314Vkxj7/6s6v11Myn0tU9AAAA3/P7nfSQgE4jOqlDkw6afcVs9Xijhx6d9qiKrywO2y9JxVcW206f6DSik27ofINuOfQWplcAAIBqixHkHFJ8ZbHePv1tSdK0y6bFLGP9GRQMxLcceovtfgAAgOqCEeQc89r81/TItEeitp8+6vS4xxKKAQAACMg5pdOITvqh3w/q175f6HnQuq3rstUtIC7r9J9MtxmUbNte1QMA8A+mWOSINVvWSJJq1bD/zPPdpd/FrePIN4/0tE+AX1kDefBPslI9HgDgPwTkHNGsfrOw57Eusnvmh2ckSYe9dljY9uIri/XHzj/iHg8AAJDrCMg5JLg6RacRnVR8ZbF6tewVtf8/c/6jTiM6aWa/mXGPBwAAqI4M0zSz3Qcnvu4cUNVEfviJnPs7cuFIDf1uqG2ZoLcK3lLHvTva1htZduTpI9W+SfuwbY/PeFyvzH3Ftn+xVldxKmNt/5i3j9HG7RtD28886Ew9dMxDMeuKVW+nEZ009bKpql+rvm1/YvUh1r44jEQPSBLnUwC5zrPzKRfpAQgZ+t3QsOedRnRSvVr1QssGrt+2XsePPF4TLpigpvWbRpWdeNFENarXKPT8ok8vigqMr8x9RR337qi3Ct4KlbNjFzidvuH4fMXn2rh9Y8yA6vTBAAAAK6ZYANVUpxGdNKLPiKjtwdAYDJHWNbUb12ssSTr53ZNt6wyGY2s9kW1KCoXjWOVi7XMqe8dXdxB4AQCeICAD1cylRZeGHh/W9DCHklINI/oU4WaE1mp72fYEehcQbzrEdZ9fF7Wt/+H9E24HAAA7TLEAqpEDGhyg4nXRQbbziM6ac+WcqO1397jb8z7UNGq6LhsrKP/yxy9R285qc1bSfQIAwIqADFQjo88dbXtji1hB9OW5L+uy/Ms87UOZWea6LFMmAADZwBQLoBp64NsHorbZheS1W9ZGbes8onNa+mTlh2D89oK3s90FAECWEJCBamjU4lG2IdS6Lfh45trwNbNNmWrRoEVS7c64fIYkaWf5ztC2L3/6MmZ5u9C+9PelSbWdqCdmPBG3LwCA3MQUCwAxzbh8hrq93i1q+5hzxyRVX52adSRF38nxmP2P0eSfJ4dts964JlK6R5jt2j6+xfH6euXXYeU+X/G57vjqjrBt1mP8MBIOAEgcNwoBgOziRiEA4A3PzqdMsQAAAAAsCMgAAACABQEZAAAAsCAgAwAAABYEZAAAAMCCgAwAAABYEJABAAAACwIyAAAAYEFABgAAACwIyAAAAIAFARkAAACwqJXtDiB3dBrRKex58ZXFoe1Djx6qs9qcFVY2uN/63FqH9Xjrvro162r65dND5a7+7GpNX1P5fOplU1W/Vv3Q853lO3XYa4fZ9k2Sur/eXdvLtoeeH9jgQBWdW5TAKwcAALmEgAxPRA
beNVvWpFxHrH2dRnTSxu0b1bBuQ0nSvnvsq+Le0WE76LDXDgt7vnnH5rC6t5dtj9kuACB91r/2mhr365ftbgBRmGIBz2zdtTX0uFn9Zgkf7xRSrfumXjZVx7x9TOj5sGOGJdROgzoNEu4bAMB7ax5K7PwNZAojyPBErOkR6WCdPiFJLxa/qKdmPuWqby33bKlPz/k0bP/0y6eH9g85aojOOfgcj3sMAP5V0i4/7HmLZ59Rg5NPDu1rcMrJavHMM6Hn+QtKYh570JjRqtOqVcz9+w4dqr3OP09bZ89W6UUXR5Wx1g1kEwEZnrFOgXCaLuG1p2Y+FTWfOZG+1a1ZN2z/A98+wAESHJ8AACAASURBVJQLANVKZOgNPs9fUBIKsJFhN/K4yGO3lSywLSNJu3XpEqqbUAw/YooFPGcXLgd8M8Cz+q0B2HpxnRvxgi/BGEB1Yg2/wT+RrCE5kTC7/Jxz1G4u51RUTYwgwxN2o7ZBwSkOTmUSrT8YZOvWrBu3/ch9N3a+MfT41i9u1Zc/fWlbNwBUB7sdeqjy3nozqWNL2uWrTqtWOmjMaNv9Rq3MxIxkpmm4OSbyA4PTiHnQztWrteSEE2PWy5SSqsEwTTPbfXDi684hMzI5XQPIAiND7XA+RZSdK1dqySm9HINaMASWtMtXq1GjVC+/XWi7DEP5JfOjyob2yzkEejHFIpU6zLIyLejQ0TEgJ1t3vGOzMb2kGkxp8ex8yhQLAACqqdotWkiSSi+8KLTNOsK58NDKNeTzF5Ro+TmVFzHvO3SoZBlkizXa+ttLL4W2LTnp5Kg+lLTvkGz3gbRhigUAANVY/oIS/XL3PbYjvuVbt4Y9t15Yt9f552nb/HkqaZcvo1atsLnKYeXz22vto/+MqttaXzJznEs6dLR9nD9vbmCbzYobrqdf2NQdrFeSts6erVX979P2pUs9G5H99ZlntGefPvpz0mStGT5cktTooou0YeTIqFH5ZoWFWjN8uGrUq6e2s34I21cnL08qL9eOH3+sPC7Oe4VoTLEAgOxiigWQgljB101ATnWKhdP+RI/99ZlntO6ZZ6NW94j12M2+GnvsobbTp7nuUw7w7HzKCDIAAIAPNL37rpj7nJbbs9Pg5JO1ecIET/pVHRGQAQAAfKBG/fqO++secohaf/xRhnpTvXGRHgAAgM/lLyjR9kWLst2NaoOADAAActLCbt0luZ+WUBUEX8um0WOSel3BY8xt2zztV67hIj0AyC4u0gPS5NcnntTGTz9Vmwnjs90VTy08rJua3HiD9r7xxviFI2wYOVJr//mY2n4/VTIydfrJGM9eEAEZALKLgAwA3uBGIQAAAEA6EJABAAAAC5Z5AwAAyKJ4F9vl+M09fIkRZAAAYnC7SkAurZIAgIAMAEDKGOEDcgtTLAAAiCM4QhwZhK0jx5H7dq5erfI//1Tdgw6Ke3yrDz5QvfaMQldXdh+w+FYiuwjIcK3TiE6hx8VXFmexJwCQOSXt8kMBxvpYUth2O8sKTg8r51RvrHIAMo8pFgAAOIgaGV65MqXjJakkv73qH364YxkA2cMIMgAACSj7/XfVbtEitUpMU1umTuVrdMCnCMgAAGRBu9mzZNStm+1uALDBFAsAADLskCnfakGXrtnuRsJ+vv3vKmnfQb/cc4/nda+85VaVtO+g1YOHeFrvzlWrtOTkU7ToiCO1deZMT+te0a+fFnTuorWPPeZpvelUvm2blp7WW4t69tSGN97Mdnd8yzBNM9t9cOLrzlU3XKQHpIWRoXY4nyYh8qK8knb5avXeu6rXsWPoeaRg+Z2rV2vJCSfGnF/sdKyfJDsNxM1rcVN3Mu/Jn999px+vutqxTKsPR6leu3YJ153O9yNWO178u4jb7xo1lD9/XsrtZJln51MCMlwjIANpQUCGb6UyR9op1P328sta+8ijntQVKdE+u627bNMmLep5ePyCKbYT5FVATtf74VOenU+ZgwwAAKJEBis3a/W6CVfb5s
+PCsd5b7+l3bpWTjkpveRSbf3hh7B2kh2Rdlq7OpG6I8NxvPejdosWajN+XNx60ymd70euIyBbuB0htZa7qsNVuqP7HSnVJ0mHvXaYdpbvjNqezEhtvHat+1NtK5G+pKsNAIC3Vlx2edjzWIEpf0FJwiOUy889L27deW8F5sZa644X3Jafd37ceq3brXUv7Vugg0YXxazb7QcB6/uR6HKA6ZbI+0FI5iK9MNbw9vev/u7qmP/N+19KbXYa0UmdRnSyDcfW/V6IV5dX7cSqj3AMAFXDlhkzQo+bJXBBXrywnOiIc+T+LVOnxiy7bV7l/NkGp57qWK8kHfzN5NDjHcuWxS0fq09Oti9d6rqs16zv9YEvvxy3fJMbrk9nd6ocAnIM41bYfy0y4JsBnrWRSCDtNKKT3lrwVsJtLP19acJteYFwDAC5ofHVVznub3TxxUnV6zpo1qwZerjiSvu+RI7Wtnj6qbjV1mrSJOx5OlaiCN5JMdt2P+rIuGWa/t3dwGB1wRSLBH205CNX5X7+4+fQYzfTHN45/R3lN4n+5G0tN2zqMF3S7hK3XZUknf3R2WHPYwXVTiM6eRZiCcdA1bWgQ0e1K54j1YgeP1l8/AnarUMHtXju2ah9f3w9UT//7W9qO+uHqH2StO7Z5/T7qFGOczKX9DpVe517jvb+y1+SfwHIuLrt2qa1/vx5c+OOTC85pVfK7fz2wotqeof9lMmqxosb0FT3aRaMICcpXujr/X7vmPsen/F4VF124diunVRGgp36TDgGqreSdvkqaZevdvPmqqR9B9sLdw7++is1HzLYdt/uRx2ptrN+UEm7fC076+yo/XtdfJHajB+nknb5Ktu4MWxf8GebcZ9r/SuvpOkVIl1WDxrsqtzS3n3S3JOAPU480XXZ/JL5aeyJVHPPBmmtH+lDQPZAoqH1lbmVvwD2qL1H3PKRIXPwFHcnI6c60oFwDFRtwdEiu1Gj4LZae+8tKXCzASujdu1Que0LF4a2l7TLV9tp34e+zs5fUKJFhx8Rdqx1pOqQ77/34qXAQ4mMRrabMzvmvh2lpR70Rlr/2muO+w94/jn3lRmJrwqWyPvhl3/PwQ/A8f6gEgE5glOoKzPLPG9vyqVTXJWzBun3Fr2XUBuEYwBONo0dK0lxf1Fa95X/8Yfr+hf26OlYd9226f2KHomL/JC0qch+hYfIv0+jTp209Slow1tvp72NSK3ee9dVOUJm7mAOsoPIebldXw2/LejZbc7Wh0s+dKyjhhH+GaT7692T6suUS6dk/EI7twjHQNVWo359SbEvmgr+0rdbDiq43emmBvHmMdaoVzexDiMj6rZpo+1LlkiSfr7jTv18x52O5TM1X7Vmg/jfvHoteOfEoHTdARD+QUBOwYNHPxg3IM++Ivzrpu1l29PZpYwjHANV3x7HHhu3jNMv+3gX8yzo1Dlw4R+qlNaffqKts+eo9KKL4pbNZBhsetfdjvvNsjIZlpUvvJK/oERL+/TVjuXLXZX1i0aXXKLmAx/IdjeqHKZYJMEuBBYtq/z6ya8jvV4jHAO5paRdvpb26Ws7FaKkXb7Kt2xRSbt81WzYMGxf7X33jTmXMX9BicydO1XSLl/Lzz6HuY5VzG5dOsfc1/Tvtyt/QYnrMLi/R8uo1e/ezXH/go7ufwdvnjAhobYPGjM65r4D/v3vhN6PTNnwVuJLxIIRZM8UTipUQeuCbHcjY+w+BHi5VByAzHL6pW7dF1mubNMm7Vy1Kmx7Sbt8LSs4Xa2LPk2obviP09SZRO1Z0Fc/Z2IZNdN0XXTlX29OqOrg+7H3jTdqn9tvS+hYVC2MINtI5PbM1V1NI/xrLN4noHpZ1PPwqODUoNcp2rV+fZZ6BK94GY7trLzZXTjN1HzfdrNnue6H38MxHzxTR0COY/Ty2F+nSNLkiyfH3HdggwO97o6vFF9ZrFlXzPJ0rWYAVUvwAj3rn83jxuuQKd9mu2vwoaZ33xV6vHl8YtMbJPfBz0
2ojlqBo27uXizKtKbEEZDjuGdi+D3ozzzozLDnDeuGz8WzKjo3elmc7s2SW8XCbyJDMSEZqL6C8y6tf5BbStrla0GHjipL8ZuBJtdcE1VvvHbdivx353TsTzfeGPY80Q90wRviRK4H7ieJvB9WS0/rTaAWc5AT9tAxD8Xc99cJf9WzJ0ffgtXqld6vJBUeu73ufFGCH8zsN1OHvXZY6Pl1n1+nF099MYs9gh88Ou1RvTr/1bBt1g9Uwf8PwW1PznxSLxW/FFXm8vzLdU/Pe6KOs6vTyu7/m13Zrbu2qucbPUPPa9eorZn9ZkbVVXxlcVifrfXHm57VoUkHvX165tdwBZIRuXyfWVamRUcd7XjMHscfpwP+85+E6g0+bnjWmdr7ppv0y513aWtx9P8lNx+8mt8/QKsfHBpVd4tnnlGN3erpx2uvsz2uZqNGceuO7Pf2hQu1sOuhjsccNPYz1WnZMm7dKi/Xby+9rN9e+K/KNm2O2h1st/a+zdX42mvV+PLL49dp02eCr3uMIHto4sqJ6jwi9hW/dtyW31G2I/T4kEaHJNRGptSuUVuN6zUOPZ+6amoWewO/eHX+q+rWrJuKrywOBUinD4kvFb8kKXp6k104jldnZLk3C960bfPQ1w5Vzzd66tCmh6r4ymKNPH2kdpbvjHsx6snvnqz/b+/Ow5wqDz2O/zILi8iwFxzKIqAzAQKKuI1cKyhCtYIoiqg4CFR6rbe2tnVFe+XSSm/Fx4pX60aND4psgi1WAUVBBEHKFjADIgjIOiKLbOMwk/vHmHCSnKxzkkyS7+d5eJxzzrtlHDI/3rznPW0atTENxqXvlsrhdGhQ50Fylbo067pZ2nhgI5+uIKMdXbwk7jXDh9/+h74cMDDucCxJzW67TZ0XzA86//U995iGY1t+fkI/8Yg0G+tbmtS1m/ZPmmQajo0q9+zVvgl/jOnpd3yiEx8CcgjGX3jjl4+Pup5Hke+enXDZ6X/dRlN+6D+H+h3PHjQ76vEk2+Jhi/2OCQNwlbr06sBX/Y4Drwcad8m4oOVNXoEzzsavI80WO1qa77RyqvqU+rbrq9d+WjPT3bVFV1+5Y5XHQra3//h+LRy60He8YPsC39er96/W5H6TfZ86FTUvYpcXpI1kbMcXTXDLLWgcc8Cr1759VHXsZe6o9+dO9+0JY/kentm3L6FaLLGIyszN4R8xGfgxqyQVnlkYsvzgLoM17pNxvuNw26MFthu4BrouCvx+sP0bzOz4bkfIG1mHFQ3ThE8nmF6TpN/2jn6rqEg/f3uP7ZUkPdPvGdPrl7xxiWn9/Jx8VVZX+p2bsWmGru5wte/4inZXmLb5hvsN3Wq/NdLQgZQwe1JirPUiPTwm1rbjYVXbxtdVcO01Ue/nHM33I5lBlNAbGwJyDGIJefNvDP6IJ7CtwBAZjXBroOsSQjKMavNJwjVvXRN0btKqSZq0KvIvKe/Pobf/xy59TDede5Nfmf6z+sc1rrycvKCAXFVdJUnqN6OfpNCve6p7KgEZaSGWUBW43jUT7H/qKb/jWB52Ynd/Lre9q9VDQpIQkFPIbOY5lP4d+uupK56KXLAOISTD+/+/qHmRZl03K+i8UfmJcvWb0U8tGrbwO7/zu50aXjzc71wsP0fG5Rfjl4/X+OXjE/5zmGPL8esbSBf7Jk5M9RDqlAMvvuT7uvnI0tgq22wWjwbJxBrkFDPeZGSm8MxCuUpdaReOveYMnuN3vPXw1hSNBKlkDMeheGddP7r5I9+5Ee+OkCQ9fPHDfmXjmZEOdUNfIkLs+ze9b3mbQDJ8+6oz1UOosxpfFdunTaF2zEB6YAY5jHhmqZLRVzLbqm27XZp2YRYNfn63+HdB58w+TYnlExYrhPqEozY/v3xqgnTTfsor2jFqdFx1jcsrcpuEfkZAutp+++0xLTk59sknvq+LN/A+kG4IyAASzuF0qE/bPlq6K/STJ6MVuK448Fpgv6HaiLbNZIwTqCsalZT4Hbu7dZ
d944aI9QLXHp+74tO4+t/z8CM69NZbfudSeXNZ8QaXyrqf/jv8zf89p5a/vDtivaCn9OURt9IN/8cAJIyr1KW9x/aq/6z+WrprqVaPWK38nPxaB9Fol0oEzkIP7jLYb5vFwLKXTbtMR74/ErK9eMb5wroX9OzaZyVJHZt01D+v/2et2wWSpqrKL+zlNGgQ8elxtQm0h956Sz+67zdqcdddcbdhpcBgWz55ssonTz59vUEDeRL4/UDq2DyeyPvwplCdHhwAWCBZd/Lwfoqobb60RFUHD8Zcr7ZhMNrt4ZIt3t056uJryXCWvZ9ykx4AAPBz7vJlspe5g5ZcmGnYs6fsZe5ah8Hq48drVT+RvK/PVq9exLItx4615PuB1GIGGQBSixlkZLVws7PGkOkutqvwiT+pyZAhQfVzGjdW0WcrfcfnrlyhzRddHLY9ZCTL3k9ZgwwAAFLGG1qrjx/Xpl4XWBJiN190sTrOnKGGjtP3IGTaQ0yQWCyxAAAAGccYjr2+GjYsBSNBOiIgAwCArFCxdVuqh4A0QUAGAAAADAjIALKS9yEeZnsyJ/MJfgCAuoeADCAruUpdyrHxFgikk90PPZzqISBL8NsBAACkjYovvpAkHf9sFTtTIGEIyAAyypUzr5T7QM02UYFLKGJZOuFwOnTJG5eEXIYBILm8279tvW6Q3MV2bR8xgn2NkTDsgwwgo3xw0wdyOB1ylbpq1Y6xvjck17ZNAKHlnHFGxMBrdj3wXKg2CNOIBTPIADLamhFrtPvobklSri03rjYIxgCQXZhBBpDR8nLyNGD2AEkEXQBAdJhBBpCRth3eZlkgXrB9gSXtAADSAwEZaSvam6e4wSr7dG3RVYPmDqpVG8afm99+9FtmnwEgi7DEAkDGmf6z6X4BN8eWo2pPte848B9N3mNjCHaVunznFw5dmMjhAgDqGJvH40n1GMKp04NDapmFGiAN2ZLUD++nADKdZe+nLLFAwoTaf3bA7AEqP1Hudy3SI3/DlfHaeGCj3/Wyb8vC1unh7OHX7pQNU/yur92/1u86++ECAJAdWGKBpBjVfZSeW/uc7j7vbu0+ulutGraSpKC9Zc32mo1m/9mNBzbqlnm3+JUrbl7s9zF5II88QX2P6j7Kdzzi3RG+6z1f6ylJWnfHumheLgAASGPMICPhHl/+uH5zwW/0/Lrn/c47nA71bdc3Yv1w4ficZudIUlA4jtf68vWm59fdsc5vDSsAAMhcBGQk3KzNs0Jee6bfM0Hn5m6ZG3XbpV1LLX3C2ZHvj5iedzgdyrHx1wUAgGzAb3wkzIMXPRjXrOulhZdGXXbcJ+PCLqOoDW+7DqdD+Tn5LK8AACBLsAYZCXOb/Ta/4Nr6jNaSpHdueEeSdPFZF5vO/nrLxWLxsMWWziRL0a19BgAAmYcZZCTcjOtmSJLev+l9OZwOtW/cXpL08tUvS6oJonctvEsOp0P1c+vH1UfzBs2Vl5PnF8idG526fPrlvj5K3y3Vhzs/jKld4+4V3hv1AABAZmMfZMCEw+lQ56adNXfwXL9zzCgjAdgHGQCswT7IQKIZw7HXl4e+TMFIAGSDXffdJ3exXe5ie8Sy0ZYDEB/WIAMhOJwOPXrJozpx6oSeXPWkJKlz084pHhUAAEg0AjJgInApRWm30hSNBADMFQwcmOohJJS72C57mTvVw0CWIiADAJBmMj04snwEqcYaZAAAAMCAGWQAAJLom+eeV/kzp58iamvQQMVr16h+5y5h65nNqkYzk+yt5y1b5ughT2Vl1O1s6XelKnfvjrlfSTr60Ufa+Yv/DDrf5tFxanbbbUHnv3W+pn1PPBE09mj7/vbvr2rfn/8c01gDvz+7H3pYh+fMiakNZB4CMgAASWIW+DwnT/rW25ZPnpz0/uMp7y62q8mgQSr83z+bXo/U197/mRAUkGu7rCLcWKXY/jEBsA8yAKQW+yBnia+G3aIT62oeWR8Y1gKDWayzntGUDdl2VZWUmxuyXu
f576lehw4x9W/s06zMpt4XqmjVZxHHHO3Mbbj+vNdymzXTucuXha0bS5+ok9gHGQCAdBIqHIc6lwim/ZiE4x0j7/R9HRiOJantpJqtL81mXCOFY0lhw3FthPveVh08GFd9ZCcCMgAAWSCW8Hfs00/D1im49lpL+6sNbyBv84fHIpat3LMn5LXmI0ZYNiakP9YgAwAAU+m0JrfZ8OEhr+U2baqqQ4e0pW+/kMG99SMPJ2poSEPMIAMAkGCnyssjljnzip8kYSTZqdV/3ZPqISDNMIMMAECC5bVsGbGM52RFEkYSm0xZk3ty0+ZUDwFphhlkAAASzRb55nrvul9Y79CMGZIkWx7zgogOARkAAJja/cADcdc17oSRSPk//rGk6NZLF29wJXo4yBAEZAAAksht75rqIUTkXVpx+O1/xFzXuxtEsmbEu7y/MOz177dtS8o4kFkIyAAAJIFvPa/H4zfb6S62y11sV5tHH42qnWOffOL7+vsdOywdo5F33bR3fBVbvpQk7Xn0Ud+5yl27guoZd4Pwlju6ZIkkad+EP/rORcNdbJeqq33H3y1837Rc0do1vvKbel/oV//Ln14jKXPWUyM5eJIeAKQWT9LLMmbh8JylHyuvZcuQT5CLNlCahcBYn0pnVP3dd9p04UUhr3dZ9IHyCwtNr+0YNVrHlgU/uc4r2qcFxlKvNnUI0BnBsvdTAjIApBYBGQCswaOmAQAAgEQgIAMAAAAGBGQAAADAgIAMAAAAGBCQAQAAAAMCMgAAAGBAQAYAAAAMCMgAAABxiPQAl30TJ0b9kJdEjgOxIyADAAAkQOsHH0z1EBAnAjIAAMhIxpnVUF97j81mYQPrRJqpjXUmN1ybO0aPCXs9mvEgfnmpHgAAAECiNejWzfS8u9gue5nb9/U5nyxVXosWvuuekye1Y/QYX5lQjO1EI7DfvFatdM7HS3zXz/rjBOW3aWPadmBdWI8ZZAAAkNEOzZ6ts2fPMp05NgZPe5lbX1zWx69M2Xnnq8PrU8O2H0847jz/Pb9+T5WX+5XxhuNAJz/37yeWfhE9ZpABACF5AwW/hOO3Y8zPdWzpUt8x38vk2/PIODW98ca46jbs0SPsdXexXR3feD3mdut16BD2elnP8+SpqAg6v+2GG6Tc3Jj7Q2wIyAAAJFD7l1/yfc3H4cllL3PrwIsvBZ1vdsst0Tdii9xHrDPIkbiL7bJ/vlHKyfEd+6mqsqwvmGOJBQAAyFj7n3rK7/jru3+pNv/9B9/xwWnTat2HNyRHK7+wMHL5HPOIxicQycEMMgBkIXexXUWfrVRO48a+X9TNb79drcc9ErpO125SdbWk8L+kfe3dOVKtH3jA9Jq9zK1tQ2/SyQ0bom6vy8IFym/XLuiaMZwYb1zq8tGHpus4vWXrnX22Or/7L79r+ydN0oGXXpa9zK3yp/+qb/72t7Djq9yzV1v69vUdE17qpk7z5kkKDrLe472Pj/c7F4/AmWSzXTPOWfqx8lq2VJdFHwTtQhHYr9U7ZiA2No/Hk+oxhFOnBwcAFojwAa5l/N5Pw/1yDbxb3sjWoIE8J08GlQssa8vNleeHj4HDtZfXooVOHTgQsb1I48tr1Uqnystly8+Xp7LStOxXNw/TifXrJUm5BY1VdeS7oDLegOyVf1YbVe7ZK0mq37mzOr0zz3x8Npv0w+/TSGGfEA0kjGXvp8wgA0AWMwulx5YtV6OSSyOWM4a94ytXRlUu2vb2PPZY1O11/tc7qtepk774j8t1qrzcbwav6sgR5RYUSJJOrF8f9fiaDBmiwif+5Feu4ssvw76GcO0BSC+sQQYA+NkxalTMdbbfUWrpGA7NmGl6PjC4S1K9Tp0kSV0WfRB07dspf5cklXXrbtpe64cfMj1vDMeSam6YApA1mEEGAFjGynWRuc2bB51rP2VKyD5s+flB5zxVp374b1XtxhfihinWgQKZiYAMALCMlUsLqg4etKwtL6u34gpsk8AMZAaWWKQRh9
OhY5XHUj0MAEgOk5vIyxzhH9oQSn5hYW1H42dzyWWS2LUCyFQE5DTiKnWpUX6jVA8DQIaLJ/R561SfOJHQMRh3qIiFd33yV8OGxT0mo/w2rS1pB0DdxBKLJOk9tbcqqiqUn5Ov1SNW+847nA65Sl16d9u7un/J/ZJqgrCRw+nwfT3hsgka3GWw3/UPd36oXy36VVBdh9OhtXesVa7N/5GU3j4BwMolAbbcXG06v1fQ+YY9e6jj9OlxtWk2vtruU2tFm2e/9VbItgIFlgm39y2AuoF9kJPA4XRo1nWzVNS8SM6NTj256klfQPWG3xf6v6CSwhLfsVmAdTgdQQE5sHxg+DU7btOojRYOXWjxqwQQp5Tug2wvc+vE6tXa8fO7VPTvVZZ0dGL9em0ffqua3TEi6EEh8dg5dqy+3/m1Ov/rHQtGVzMLvfmSS3Vmnz5q+9ena9XWgZdf0YEXX9S5K1dYMjYAtWLZ+ylLLBJs/PKap/MUNS+SJJV2M98KqaSwRJJ5MI7EWKdP2z5+M85mCMcAjBr26mVZOJakhj16qHjjBkvCsSS1e+EFy8KxVLPbRdG/V9U6HEtSizGjCcdABiIgJ9jMzTV7eTqcDt8fSfr94t/7ytx/4f1xtV1RVRF07vmrnvc77lDQIWJgBgAAwGmsQU6CMY4xurfXvSGvF9QriKvd+rn1I5aZN2SeLyCz9hgAACAyZpAT7Jl+z+hl18tJ66+HM74tkABkF3uZmxvEACAEAnKC9W3XV1LN7O3mg5s11T3V8iUP3vaGvzNcHnmCZoldpS6WWQAAAESJJRZJ4A2s/Wb0U/eW3f0CrNmSh1iWQXjLlkwr0ZM/edJ3s1+4sgAAAAiNgJxEi25eVOs22hW0Mz2/bPiyWrcNAAAAlliknV4/Ct6EPxKH06Elw5YkYDQAAACZhxnkOs64djjWJRLeuu/d+J6aNWhm6bgAAAAyFU/SA4DUSsmT9AAgA/EkPQAAACARCMgAAACAAQEZAAAAMCAgAwAAAAYEZAAAAMCAgAwAAAAYEJABAAAAAwIyAAAAYEBABgAAAAwIyAAAAIABARkAAAAwICADAAAABgRkAAAAwICAnGIXTL1ADqcj1cPwqUtjAQAASAUCMgAAAGBAQAYAAAAM8lI9ANQwLm1wlbqCrrlKXaZl+s/qr73H9vqVN9Z3OB1q1bCVyk+Um15fuXelRs8fbc2LAAAAyAAE5DoiMNSGCsmBFg5dGFQu0M1FN+sXPX/hu75oxyL1a99PkjR6/uigvgEAALIZSyzShFk4YyxjjwAABiJJREFUDmXf8X1+x95wLEl5OXm698N7JUnzts6zZnAAAAAZhBnkNLf18FYNnjs46vL5Ofk6VX1KkvTQxw8lalgAACAB3MV2v2N7mTtFI8lszCCnucFzB8tV6vL9icWd3e9M0KgAAEAi2Mvcspe5dfacOakeSkYjIGex+y64L9VDAAAAqHNYYlEHrBmxxu/muDUj1sRU31j36g5Xx1Q3x5bDjXkAAMTh+IoV2l460nccarlDNMsi3MV2dfnoQ+W3aaMyRw95KislSbkFBTp35YqYxuUutqv9lClqVHKp6bW2kyap4NprYmoz29g8Hk+qxxBOnR4cAFjAlqR+eD8FLOQNvT/6/e/UYuRIubt1lxQcfr3luiz6QNXffaetg68PWc5Wv748FRWSpNYPPqh9EydKOTmyf74xqP+T7jJtGzIkZNg260NVVXJ3657J65Ytez9lBhkAACAOxqBpL3PLXWyXu9gedD6wTuCMspenosKvfPORpXGPy6wPb4hHZKxBBgAAiEGogBuLyl27LBhJeGbjzODZY0sxg4y05F03HevOHVbVryt9AABSJ9qgbEWgjlXbvz6tXff+2nd8dPGSpI8hnRGQgSiEepIhACB7RZqN9Zw8qbLzzldOgwYqWnv6BvxkBOaCAQNknKPeOXZswvvMJARkZKVkhF0CNQBkt7Lzzpckv3CcbMY10SyviB5rkI
EISqaVpHoIAIA6xBs0q0+ciLnu7vsfsHo4IXnHuffx8UnrM1Mwg4yEeO3z1/SXz/4SdN5sVtW7fMG4H7Or1KUhbw/RlkNbJNXs7zzpikmmfQXu4xxq5tZsv+dws7wDZw/UrqOnP6AKrD+w40D95Sf+rzHWPkLVO6vRWVowdEHIcqG+j6GuVVRVqPfU3kHnOxR00Lwh8yKODwDgr3DiRG06v1fQ+YY9e6jj9OmS/He2sEpgW8bjUDPEB6dNU8dpb1g2hmzADDISwhuOm9ZvqtmDZvvOh3ooyVUzr1Kftn18xyv2rNCWQ1s087qZkqQF24PDorG91695PWIfrlKXxvaMfg2WMRxHy1Xq0rCiYVGX/+bEN77xXv7jy/Vi/xclSXuO7bHsAS5zt8z1heMxjjFy/tTpu7b9yHZL+gCAbNPk+sGyl7nVccZ02XJz1fzOkTXHP4RjL3uZW2f0vkD1Onb0PSbaXuZWftu2QeWiWQJhbCPwj5li13pJUsPzz4/zlWYnHhSCpDKb5Qw8F3i8bPcyjV04NmydSH3EUyaeslaN43DFYfV5s48KzyzU/BvnR9VmqGvsplHn8aAQAAkT8qEhmcmy91NmkFEnvH392yGvlRRm5hrgV1yvSDIPrk3qN5Ek7T66O6ljAgBkniwJx5ZiDTLqhE5NOqV6CEn39Oqnk9KPd303M8kAkB381inn5qZuIGmMgIyEsGr9LKxhvAnS+9/1petlS9qn+wCAZGHGuPYIyLCcN4B1a9FNb/7sTdNrSD7vzPE9H9yjxV8vVg9nD7/zAACgBmuQkTCB4Rh1w7NXPksoBgAgDAIykqbaU53qIdQpgbt2JJujJbP5AACYISAjKao91er5Wk/L2w0Ml97jT2/91LI+ft3r16Z9WenyNy/3Ow51Q12OLcd0LOHGFnJf6G+YRQYAwAz7ICMhQj1RLtQ+yJH2ODYrs+6OdaahO8eWo3V3rItqTEY3nHODHi95POrXM2fwHHVp2sV3vPvobg2YPSBsH+H2L46mbKTygd+ncOXD9YGkYh9kALCGZe+nBGQASC0CMgBYgweFAAAAAIlAQAYAAAAMCMgAAACAAQEZAAAAMCAgAwAAAAYEZAAAAMCAgAwAAAAYEJABAAAAAwIyAAAAYEBABgAAAAwIyAAAAIBBXqoHEIFlz9QGgCzH+ykARIkZZAAAAMCAgAwAAAAYEJABAAAAAwIyAAAAYEBABgAAAAwIyAAAAIABARkAAAAwICADAAAABgRkAAAAwICADAAAABgQkAEAAAADAjIAAABgQEAGAAAADAjIAAAAgAEBGQAAADAgIAMAAAAGBGQAAADAgIAMAAAAGBCQAQAAAAMCMgAAAGBAQAYAAAAMCMgAAACAAQEZAAAAMPh/R7CV548EkyUAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Wordcloud of Top N words in each topic\n", "from matplotlib import pyplot as plt\n", "from wordcloud import WordCloud, STOPWORDS\n", "import matplotlib.colors as mcolors\n", "cols = [color for name, color in mcolors.TABLEAU_COLORS.items()]\n", "cloud = WordCloud(stopwords=stop_words,\n", " background_color='white',\n", " width=2500,\n", " height=1800,\n", " max_words=10,\n", " colormap='tab10',\n", " color_func=lambda *args, **kwargs: cols[i],\n", " prefer_horizontal=1.0)\n", "topics = optimal_model.show_topics(formatted=False)\n", "fig, axes = plt.subplots(2, 2, figsize=(10,10), sharex=True, sharey=True)\n", "for i, ax in enumerate(axes.flatten()):\n", " fig.add_subplot(ax)\n", " topic_words = dict(topics[i][1])\n", " cloud.generate_from_frequencies(topic_words, max_font_size=300)\n", " plt.gca().imshow(cloud)\n", " plt.gca().set_title('Topic ' + str(i), fontdict=dict(size=16))\n", " plt.gca().axis('off')\n", "plt.subplots_adjust(wspace=0, hspace=0)\n", "plt.axis('off')\n", "plt.margins(x=0, y=0)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we also visualized the first 4 topics in our document along with the top 10 keywords. 
Each keyword's weight is shown by the size of its text.\n", "\n", "Based on the visualization, we see the following topics:\n", "- Topic 0: Employer Quality\n", "- Topic 1: Management Quality\n", "- Topic 2: Employer Perception\n", "- Topic 3: Employee Happiness" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 11 Analysis" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Now that our **Optimal Model** is constructed, we will apply it to determine the following:\n", "- The dominant topic for each document\n", "- The most relevant document for each of the 20 dominant topics\n", "- The distribution of documents across the 20 dominant topics" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 11.1 Finding topics for each document" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n", "      <th>Document_No</th>\n", "      <th>Dominant_Topic</th>\n", "      <th>Topic_Perc_Contrib</th>\n", "      <th>Keywords</th>\n", "      <th>Document</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0</td>\n", "      <td>6.0</td>\n", "      <td>0.0577</td>\n", "      <td>company, good, product, team, ad, depend, effe...</td>\n", "      <td>Best Company to work for</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1</td>\n", "      <td>4.0</td>\n", "      <td>0.0798</td>\n", "      <td>great, program, perk, time, staff, analytical,...</td>\n", "      <td>Moving at the speed of light burn out is inevi...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2</td>\n", "      <td>10.0</td>\n", "      <td>0.0731</td>\n", "      <td>manager, account, project, tech, awesome, star...</td>\n", "      <td>Great balance between bigcompany security and ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3</td>\n", "      <td>13.0</td>\n", "      <td>0.0648</td>\n", "      <td>work, intern, analyst, internship, love, engin...</td>\n", "      <td>The best place Ive worked and also the most de...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>4</td>\n", "      <td>11.0</td>\n", "      <td>0.0833</td>\n", "      <td>software, dream, developer, data, meh, rough, ...</td>\n", "      <td>Unique one of a kind dream job</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>5</td>\n", "      <td>13.0</td>\n", "      <td>0.0723</td>\n", "      <td>work, intern, analyst, internship, love, engin...</td>\n", "      <td>NICE working in GOOGLE as an INTERN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>6</td>\n", "      <td>15.0</td>\n", "      <td>0.0630</td>\n", "      <td>engineer, lead, experience, bad, stuff, teamma...</td>\n", "      <td>Software engineer</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>7</td>\n", "      <td>0.0</td>\n", "      <td>0.0628</td>\n", "      <td>company, people, place, grad, challenge, compe...</td>\n", "      <td>great place to work and progress</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>8</th>\n", "      <td>8</td>\n", "      <td>14.0</td>\n", "      <td>0.0660</td>\n", "      <td>work, ambitious, wonderful, awesome, add, wlb,...</td>\n", "      <td>Google Surpasses Realistic Expectations</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>9</th>\n", "      <td>9</td>\n", "      <td>15.0</td>\n", "      <td>0.0588</td>\n", "      <td>engineer, lead, experience, bad, stuff, teamma...</td>\n", "      <td>Execellent for engineers</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Document_No Dominant_Topic Topic_Perc_Contrib \\\n", "0 0 6.0 0.0577 \n", "1 1 4.0 0.0798 \n", "2 2 10.0 0.0731 \n", "3 3 13.0 0.0648 \n", "4 4 11.0 0.0833 \n", "5 5 13.0 0.0723 \n", "6 6 15.0 0.0630 \n", "7 7 0.0 0.0628 \n", "8 8 14.0 0.0660 \n", "9 9 15.0 0.0588 \n", "\n", " Keywords \\\n", "0 company, good, product, team, ad, depend, effe... \n", "1 great, program, perk, time, staff, analytical,... \n", "2 manager, account, project, tech, awesome, star... \n", "3 work, intern, analyst, internship, love, engin... \n", "4 software, dream, developer, data, meh, rough, ... \n", "5 work, intern, analyst, internship, love, engin... \n", "6 engineer, lead, experience, bad, stuff, teamma... \n", "7 company, people, place, grad, challenge, compe... \n", "8 work, ambitious, wonderful, awesome, add, wlb,... \n", "9 engineer, lead, experience, bad, stuff, teamma... \n", "\n", " Document \n", "0 Best Company to work for \n", "1 Moving at the speed of light burn out is inevi... \n", "2 Great balance between bigcompany security and ... \n", "3 The best place Ive worked and also the most de... 
\n", "4 Unique one of a kind dream job \n", "5 NICE working in GOOGLE as an INTERN \n", "6 Software engineer \n", "7 great place to work and progress \n", "8 Google Surpasses Realistic Expectations \n", "9 Execellent for engineers " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def format_topics_sentences(ldamodel=optimal_model, corpus=corpus, texts=data):\n", "    sent_topics_df = pd.DataFrame()\n", "    # Get dominant topic in each document\n", "    for i, row in enumerate(ldamodel[corpus]):\n", "        # Sort the (topic, proportion) pairs so the dominant topic comes first\n", "        row = sorted(row, key=lambda x: (x[1]), reverse=True)\n", "        # Get the dominant topic, its contribution and its keywords for each document\n", "        for j, (topic_num, prop_topic) in enumerate(row):\n", "            if j == 0:\n", "                wp = ldamodel.show_topic(topic_num)\n", "                topic_keywords = \", \".join([word for word, prop in wp])\n", "                # Note: DataFrame.append was removed in pandas 2.0; this cell assumes an older pandas\n", "                sent_topics_df = sent_topics_df.append(pd.Series([int(topic_num), round(prop_topic,4),\n", "                                                                  topic_keywords]), ignore_index=True)\n", "            else:\n", "                break\n", "    sent_topics_df.columns = ['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'] # Name the columns\n", "    # Add the original text as the last column of the output\n", "    contents = pd.Series(texts)\n", "    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)\n", "    return sent_topics_df\n", "df_topic_sents_keywords = format_topics_sentences(ldamodel=optimal_model, corpus=corpus, texts=data)\n", "df_dominant_topic = df_topic_sents_keywords.reset_index()\n", "df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Document']\n", "df_dominant_topic.head(10)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we see the first 10 documents with their corresponding dominant topics attached." 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 11.2 Finding documents for each topic" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n", "      <th>Topic_Num</th>\n", "      <th>Topic_Perc_Contrib</th>\n", "      <th>Keywords</th>\n", "      <th>Document</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0.0</td>\n", "      <td>0.0804</td>\n", "      <td>company, people, place, grad, challenge, compe...</td>\n", "      <td>Company full of people running around caring o...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1.0</td>\n", "      <td>0.0833</td>\n", "      <td>company, cloud, strategist, environment, emplo...</td>\n", "      <td>I broke down crying on the datacenter floor</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2.0</td>\n", "      <td>0.0717</td>\n", "      <td>review, amazing, senior, job, balance, great, ...</td>\n", "      <td>Amazing place to develop technical skills</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3.0</td>\n", "      <td>0.0744</td>\n", "      <td>good, review, pay, environment, senior, outsta...</td>\n", "      <td>Good pay and work</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>4.0</td>\n", "      <td>0.0807</td>\n", "      <td>great, program, perk, time, staff, analytical,...</td>\n", "      <td>Average with a hint of arrogance</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>5.0</td>\n", "      <td>0.0778</td>\n", "      <td>place, perfect, technical, life, overpay, iii,...</td>\n", "      <td>Not perfect but still the best place in the wo...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>6.0</td>\n", "      <td>0.0702</td>\n", "      <td>company, good, product, team, ad, depend, effe...</td>\n", "      <td>Best Company in the world</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>7.0</td>\n", "      <td>0.0874</td>\n", "      <td>great, benefit, excellent, product, lot, class...</td>\n", "      <td>Great benefits but large enough to get lost in</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>8</th>\n", "      <td>8.0</td>\n", "      <td>0.0713</td>\n", "      <td>good, analyst, business, pgm, year, educator, ...</td>\n", "      <td>Good company with good benefits lots of red ta...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>9</th>\n", "      <td>9.0</td>\n", "      <td>0.0828</td>\n", "      <td>google, director, phenomenal, early, fun, fulf...</td>\n", "      <td>Early Childhood Educator</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>10</th>\n", "      <td>10.0</td>\n", "      <td>0.0865</td>\n", "      <td>manager, account, project, tech, awesome, star...</td>\n", "      <td>Project Manager</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>11</th>\n", "      <td>11.0</td>\n", "      <td>0.0833</td>\n", "      <td>software, dream, developer, data, meh, rough, ...</td>\n", "      <td>Unique one of a kind dream job</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>12</th>\n", "      <td>12.0</td>\n", "      <td>0.0759</td>\n", "      <td>work, culture, associate, amazing, experience,...</td>\n", "      <td>Massage Therapist</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>13</th>\n", "      <td>13.0</td>\n", "      <td>0.0849</td>\n", "      <td>work, intern, analyst, internship, love, engin...</td>\n", "      <td>Software Engineering Intern</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>14</th>\n", "      <td>14.0</td>\n", "      <td>0.0723</td>\n", "      <td>work, ambitious, wonderful, awesome, add, wlb,...</td>\n", "      <td>wonderful place to work</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>15</th>\n", "      <td>15.0</td>\n", "      <td>0.0702</td>\n", "      <td>engineer, lead, experience, bad, stuff, teamma...</td>\n", "      <td>Engineering Practicum Internship</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>16</th>\n", "      <td>16.0</td>\n", "      <td>0.0751</td>\n", "      <td>great, career, marketing, lot, brand, worker, ...</td>\n", "      <td>Google is great recruiting org not so much</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>17</th>\n", "      <td>17.0</td>\n", "      <td>0.0798</td>\n", "      <td>software, engineer, designer, fix, competitive...</td>\n", "      <td>Sr Interactive Designer Sr Solution Consultant</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>18</th>\n", "      <td>18.0</td>\n", "      <td>0.0844</td>\n", "      <td>great, sale, long, geekland, quality, producti...</td>\n", "      <td>Adsense Publisher</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>19</th>\n", "      <td>19.0</td>\n", "      <td>0.0765</td>\n", "      <td>place, partner, specialist, perk, love, workli...</td>\n", "      <td>Love working at Google in Boulder CO</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Topic_Num Topic_Perc_Contrib \\\n", "0 0.0 0.0804 \n", "1 1.0 0.0833 \n", "2 2.0 0.0717 \n", "3 3.0 0.0744 \n", "4 4.0 0.0807 \n", "5 5.0 0.0778 \n", "6 6.0 0.0702 \n", "7 7.0 0.0874 \n", "8 8.0 0.0713 \n", "9 9.0 0.0828 \n", "10 10.0 0.0865 \n", "11 11.0 0.0833 \n", "12 12.0 0.0759 \n", "13 13.0 0.0849 \n", "14 14.0 0.0723 \n", "15 15.0 0.0702 \n", "16 16.0 0.0751 \n", "17 17.0 0.0798 \n", "18 18.0 0.0844 \n", "19 19.0 0.0765 \n", "\n", " Keywords \\\n", "0 company, people, place, grad, challenge, compe... \n", "1 company, cloud, strategist, environment, emplo... \n", "2 review, amazing, senior, job, balance, great, ... \n", "3 good, review, pay, environment, senior, outsta... \n", "4 great, program, perk, time, staff, analytical,... \n", "5 place, perfect, technical, life, overpay, iii,... \n", "6 company, good, product, team, ad, depend, effe... \n", "7 great, benefit, excellent, product, lot, class... \n", "8 good, analyst, business, pgm, year, educator, ... \n", "9 google, director, phenomenal, early, fun, fulf... \n", "10 manager, account, project, tech, awesome, star... \n", "11 software, dream, developer, data, meh, rough, ... \n", "12 work, culture, associate, amazing, experience,... \n", "13 work, intern, analyst, internship, love, engin... \n", "14 work, ambitious, wonderful, awesome, add, wlb,... \n", "15 engineer, lead, experience, bad, stuff, teamma... \n", "16 great, career, marketing, lot, brand, worker, ... \n", "17 software, engineer, designer, fix, competitive... \n", "18 great, sale, long, geekland, quality, producti... \n", "19 place, partner, specialist, perk, love, workli... \n", "\n", " Document \n", "0 Company full of people running around caring o... \n", "1 I broke down crying on the datacenter floor \n", "2 Amazing place to develop technical skills \n", "3 Good pay and work \n", "4 Average with a hint of arrogance \n", "5 Not perfect but still the best place in the wo... 
\n", "6 Best Company in the world \n", "7 Great benefits but large enough to get lost in \n", "8 Good company with good benefits lots of red ta... \n", "9 Early Childhood Educator \n", "10 Project Manager \n", "11 Unique one of a kind dream job \n", "12 Massage Therapist \n", "13 Software Engineering Intern \n", "14 wonderful place to work \n", "15 Engineering Practicum Internship \n", "16 Google is great recruiting org not so much \n", "17 Sr Interactive Designer Sr Solution Consultant \n", "18 Adsense Publisher \n", "19 Love working at Google in Boulder CO " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the most representative document for each of the 20 dominant topics\n", "sent_topics_sorteddf_mallet = pd.DataFrame()\n", "sent_topics_outdf_grpd = df_topic_sents_keywords.groupby('Dominant_Topic')\n", "for i, grp in sent_topics_outdf_grpd:\n", "    # Keep only the document with the highest contribution in each topic group\n", "    sent_topics_sorteddf_mallet = pd.concat([sent_topics_sorteddf_mallet,\n", "                                             grp.sort_values(['Perc_Contribution'], ascending=False).head(1)], axis=0)\n", "sent_topics_sorteddf_mallet.reset_index(drop=True, inplace=True)\n", "sent_topics_sorteddf_mallet.columns = ['Topic_Num', \"Topic_Perc_Contrib\", \"Keywords\", \"Document\"]\n", "sent_topics_sorteddf_mallet" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we see the most relevant document for each of the 20 dominant topics." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "## 11.3 Document distribution across Topics" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
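<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n", "      <th>Dominant Topic</th>\n", "      <th>Num_Document</th>\n", "      <th>Perc_Document</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0.0</td>\n", "      <td>35</td>\n", "      <td>0.070</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1.0</td>\n", "      <td>32</td>\n", "      <td>0.064</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2.0</td>\n", "      <td>20</td>\n", "      <td>0.040</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3.0</td>\n", "      <td>24</td>\n", "      <td>0.048</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>4.0</td>\n", "      <td>27</td>\n", "      <td>0.054</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>5.0</td>\n", "      <td>35</td>\n", "      <td>0.070</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>6.0</td>\n", "      <td>19</td>\n", "      <td>0.038</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>7.0</td>\n", "      <td>17</td>\n", "      <td>0.034</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>8</th>\n", "      <td>8.0</td>\n", "      <td>22</td>\n", "      <td>0.044</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>9</th>\n", "      <td>9.0</td>\n", "      <td>33</td>\n", "      <td>0.066</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>10</th>\n", "      <td>10.0</td>\n", "      <td>43</td>\n", "      <td>0.086</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>11</th>\n", "      <td>11.0</td>\n", "      <td>18</td>\n", "      <td>0.036</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>12</th>\n", "      <td>12.0</td>\n", "      <td>21</td>\n", "      <td>0.042</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>13</th>\n", "      <td>13.0</td>\n", "      <td>26</td>\n", "      <td>0.052</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>14</th>\n", "      <td>14.0</td>\n", "      <td>15</td>\n", "      <td>0.030</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>15</th>\n", "      <td>15.0</td>\n", "      <td>24</td>\n", "      <td>0.048</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>16</th>\n", "      <td>16.0</td>\n", "      <td>17</td>\n", "      <td>0.034</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>17</th>\n", "      <td>17.0</td>\n", "      <td>33</td>\n", "      <td>0.066</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>18</th>\n", "      <td>18.0</td>\n", "      <td>19</td>\n", "      <td>0.038</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>19</th>\n", "      <td>19.0</td>\n", "      <td>20</td>\n", "      <td>0.040</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>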
" ], "text/plain": [ " Dominant Topic Num_Document Perc_Document\n", "0 0.0 35 0.070\n", "1 1.0 32 0.064\n", "2 2.0 20 0.040\n", "3 3.0 24 0.048\n", "4 4.0 27 0.054\n", "5 5.0 35 0.070\n", "6 6.0 19 0.038\n", "7 7.0 17 0.034\n", "8 8.0 22 0.044\n", "9 9.0 33 0.066\n", "10 10.0 43 0.086\n", "11 11.0 18 0.036\n", "12 12.0 21 0.042\n", "13 13.0 26 0.052\n", "14 14.0 15 0.030\n", "15 15.0 24 0.048\n", "16 16.0 17 0.034\n", "17 17.0 33 0.066\n", "18 18.0 19 0.038\n", "19 19.0 20 0.040" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of Documents for Each Topic\n", "topic_counts = df_topic_sents_keywords['Dominant_Topic'].value_counts()\n", "topic_contribution = round(topic_counts/topic_counts.sum(), 4)\n", "topic_num_keywords = {'Topic_Num': pd.Series([0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,\n", " 11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0])}\n", "topic_num_keywords = pd.DataFrame(topic_num_keywords)\n", "df_dominant_topics = pd.concat([topic_num_keywords, topic_counts, topic_contribution], axis=1)\n", "df_dominant_topics.reset_index(drop=True, inplace=True)\n", "df_dominant_topics.columns = ['Dominant Topic', 'Num_Document', 'Perc_Document']\n", "df_dominant_topics" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Here we see the number of documents and the percentage of overall documents that contributes to each of the 20 dominant topics." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# 12 Answering the Questions" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Based on our modeling above, we were able to use a very accurate model from Gibb's Sampling, and further optimize the model by finding the optimal number of dominant topics without redundancy.\n", "\n", "As a result, we are now able to see the 20 dominant topics that were extracted from our dataset. 
Furthermore, we can see the dominant topic for each of the 500 documents and determine the most relevant document for each dominant topic.\n", "\n", "With the in-depth analysis of the individual topics and documents above, Employers can use this approach to learn the main topics in their Employer Reviews and make appropriate adjustments to improve their work environment, which can ultimately improve Employee productivity/retention." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "325px", "left": "1082.66px", "top": "110px", "width": "197.344px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }