{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"# 1 Business Problem"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Employers strive to motivate their team members and create a pleasant work environment, with the goal of increasing productivity while maintaining strong Employee retention. It is no coincidence that Employers compete each year to land on top-100 rankings such as \"[Canada's Top 100 Employers](https://www.canadastop100.com/)\" and \"[Great Place To Work](https://www.greatplacetowork.ca).\"\n",
"\n",
"To evaluate the quality of each Employer, we need to analyze the Employer Reviews written by both former and current Employees. \"[Glassdoor](https://www.glassdoor.ca)\" was created for this purpose and gives an inside look at each Employer. By understanding the main topics in their Employer Reviews, Employers can adjust their work environment, which ultimately improves Employee productivity and retention.\n",
"\n",
"However, some Employers have hundreds or thousands of reviews, and reading all of them takes considerable time and resources.\n",
"\n",
"**Business Solutions:**\n",
"\n",
"To solve this issue, we will extract the main topics from the Employer Reviews of each Employer and then determine the overall consensus.\n",
"\n",
"We will use Topic Modeling, an unsupervised learning technique, with two models: the Latent Dirichlet Allocation (LDA) Model and the LDA Mallet (Machine Learning for Language Toolkit) Model.\n",
"\n",
"We will also determine the dominant topic of each Employer Review, as well as the most relevant Employer Review for each dominant topic, for an in-depth analysis.\n",
"\n",
"**Benefits:**\n",
"- Efficiently determine the main topics of Employer Reviews\n",
"- Increase Employee productivity/retention by improving work environments based on topics from Employer Reviews\n",
"- Conveniently determine the topics of each review\n",
"- Extract detailed information by determining the most relevant review for each topic \n",
"\n",
"**Robustness:**\n",
"\n",
"To ensure the model performs well, we will take the following steps:\n",
"- Run the LDA Model and the LDA Mallet Model to compare their performance\n",
"- Run the LDA Mallet Model with different numbers of topics and keep the model with the highest performance\n",
"\n",
"Note that the main difference between the two is that the LDA Model uses the Variational Bayes method, which is faster but less precise, while the LDA Mallet Model uses Gibbs Sampling.\n",
"\n",
"**Assumption:**\n",
"- To save computation power and time, we have taken a sample of 500 reviews for each Employer, and we assume that this sample is sufficient to capture the topics in the Employer Reviews\n",
"- We also assume that the results of this model would apply in the same way if the model were run on the entire population of Employer Reviews, apart from a few parameter tweaks\n",
"\n",
"**Future:**\n",
"\n",
"This model is Part Two of \"[Quality Control for Banking using LDA and LDA Mallet](https://nbviewer.jupyter.org/github/mick-zhang/Quality-Control-for-Banking-using-LDA-and-LDA-Mallet/blob/master/Topic%20Bank%20Github.ipynb?flush_cache=true),\" and here we showcase the information in Employer Reviews with full visualization of the results."
]
},
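The dominant-topic step described under Business Solutions can be sketched in plain Python. This is a minimal illustration, assuming each review already has a topic distribution from a fitted topic model: the dominant topic is the one with the highest probability, and counting dominant topics gives each topic's share of the reviews. The function names and the toy distributions are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Return (topic_id, probability) for the most probable topic.

    topic_probs is a list of (topic_id, probability) pairs for one review.
    """
    return max(topic_probs, key=lambda pair: pair[1])


def topic_distribution(dominant_topics):
    """Count reviews per dominant topic and their share of all reviews."""
    counts = Counter(dominant_topics)
    total = len(dominant_topics)
    return {
        topic: (count, round(count / total, 3))
        for topic, count in sorted(counts.items())
    }


# Hypothetical per-review topic distributions (illustration only).
reviews = [
    [(0, 0.10), (1, 0.70), (2, 0.20)],
    [(0, 0.55), (1, 0.25), (2, 0.20)],
    [(0, 0.15), (1, 0.60), (2, 0.25)],
    [(0, 0.20), (1, 0.30), (2, 0.50)],
]

dominant = [dominant_topic(probs) for probs in reviews]
distribution = topic_distribution([topic for topic, _ in dominant])

for topic, (num_docs, share) in distribution.items():
    print(f"topic {topic}: {num_docs} reviews ({share:.3f})")
```

Sorting the reviews of one dominant topic by probability and taking the first one would give a candidate for the most relevant review of that topic.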
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"# 2 Data Overview"
]
},
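The Assumption in Section 1 relies on a sample of 500 reviews per Employer. One way such a sample could be drawn is sketched below in plain Python. This is only an illustration under stated assumptions: the helper `sample_per_employer`, the fixed seed, and the toy records are hypothetical, and only the `company` and `summary` field names come from the dataset.

```python
import random
from collections import defaultdict


def sample_per_employer(reviews, sample_size=500, seed=0):
    """Return up to sample_size randomly chosen reviews for each employer.

    reviews is a list of dicts with at least a "company" key.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_company = defaultdict(list)
    for review in reviews:
        by_company[review["company"]].append(review)

    sampled = {}
    for company, items in by_company.items():
        # Keep every review when an employer has fewer than sample_size.
        k = min(sample_size, len(items))
        sampled[company] = rng.sample(items, k)
    return sampled


# Hypothetical toy data: employer names and summaries are placeholders.
toy_reviews = (
    [{"company": "employer_a", "summary": f"review {i}"} for i in range(1200)]
    + [{"company": "employer_b", "summary": f"review {i}"} for i in range(300)]
)

sampled = sample_per_employer(toy_reviews, sample_size=500)
for company, items in sampled.items():
    print(company, len(items))
```

Sampling without replacement within each Employer keeps the per-Employer sample size fixed, so Employers with many reviews do not dominate the topics.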
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"hidden": true,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"| | Unnamed: 0 | company | location | dates | job-title | summary | pros | cons | advice-to-mgmt | overall-ratings | work-balance-stars | culture-values-stars | carrer-opportunities-stars | comp-benefit-stars | senior-mangemnet-stars | helpful-count | link |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 1 | | none | Dec 11, 2018 | Current Employee - Anonymous Employee | Best Company to work for | People are smart and friendly | Bureaucracy is slowing things down | none | 5.0 | 4.0 | 5.0 | 5.0 | 4.0 | 5.0 | 0 | https://www.glassdoor.com/Reviews/Google-Revie... |\n",
"| 1 | 2 | | Mountain View, CA | Jun 21, 2013 | Former Employee - Program Manager | Moving at the speed of light, burn out is inev... | 1) Food, food, food. 15+ cafes on main campus ... | 1) Work/life balance. What balance? All those ... | 1) Don't dismiss emotional intelligence and ad... | 4.0 | 2.0 | 3.0 | 3.0 | 5.0 | 3.0 | 2094 | https://www.glassdoor.com/Reviews/Google-Revie... |\n",
"| 2 | 3 | | New York, NY | May 10, 2014 | Current Employee - Software Engineer III | Great balance between big-company security and... | * If you're a software engineer, you're among ... | * It *is* becoming larger, and with it comes g... | Keep the focus on the user. Everything else wi... | 5.0 | 5.0 | 4.0 | 5.0 | 5.0 | 4.0 | 949 | https://www.glassdoor.com/Reviews/Google-Revie... |\n",
"| 3 | 4 | | Mountain View, CA | Feb 8, 2015 | Current Employee - Anonymous Employee | The best place I've worked and also the most d... | You can't find a more well-regarded company th... | I live in SF so the commute can take between 1... | Keep on NOT micromanaging - that is a huge ben... | 5.0 | 2.0 | 5.0 | 5.0 | 4.0 | 5.0 | 498 | https://www.glassdoor.com/Reviews/Google-Revie... |\n",
"| 4 | 5 | | Los Angeles, CA | Jul 19, 2018 | Former Employee - Software Engineer | Unique, one of a kind dream job | Google is a world of its own. At every other c... | If you don't work in MTV (HQ), you will be giv... | Promote managers into management for their man... | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 49 | https://www.glassdoor.com/Reviews/Google-Revie... |\n",
"
\n",
"| | Document_No | Dominant_Topic | Topic_Perc_Contrib | Keywords | Document |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 0 | 6.0 | 0.0577 | company, good, product, team, ad, depend, effe... | Best Company to work for |\n",
"| 1 | 1 | 4.0 | 0.0798 | great, program, perk, time, staff, analytical,... | Moving at the speed of light burn out is inevi... |\n",
"| 2 | 2 | 10.0 | 0.0731 | manager, account, project, tech, awesome, star... | Great balance between bigcompany security and ... |\n",
"| 3 | 3 | 13.0 | 0.0648 | work, intern, analyst, internship, love, engin... | The best place Ive worked and also the most de... |\n",
"| 4 | 4 | 11.0 | 0.0833 | software, dream, developer, data, meh, rough, ... | Unique one of a kind dream job |\n",
"| 5 | 5 | 13.0 | 0.0723 | work, intern, analyst, internship, love, engin... | NICE working in GOOGLE as an INTERN |\n",
"| 6 | 6 | 15.0 | 0.0630 | engineer, lead, experience, bad, stuff, teamma... | Software engineer |\n",
"| 7 | 7 | 0.0 | 0.0628 | company, people, place, grad, challenge, compe... | great place to work and progress |\n",
"| 8 | 8 | 14.0 | 0.0660 | work, ambitious, wonderful, awesome, add, wlb,... | Google Surpasses Realistic Expectations |\n",
"| 9 | 9 | 15.0 | 0.0588 | engineer, lead, experience, bad, stuff, teamma... | Execellent for engineers |\n",
"
\n",
"| | Topic_Num | Topic_Perc_Contrib | Keywords | Document |\n",
"|---|---|---|---|---|\n",
"| 0 | 0.0 | 0.0804 | company, people, place, grad, challenge, compe... | Company full of people running around caring o... |\n",
"| 1 | 1.0 | 0.0833 | company, cloud, strategist, environment, emplo... | I broke down crying on the datacenter floor |\n",
"| 2 | 2.0 | 0.0717 | review, amazing, senior, job, balance, great, ... | Amazing place to develop technical skills |\n",
"| 3 | 3.0 | 0.0744 | good, review, pay, environment, senior, outsta... | Good pay and work |\n",
"| 4 | 4.0 | 0.0807 | great, program, perk, time, staff, analytical,... | Average with a hint of arrogance |\n",
"| 5 | 5.0 | 0.0778 | place, perfect, technical, life, overpay, iii,... | Not perfect but still the best place in the wo... |\n",
"| 6 | 6.0 | 0.0702 | company, good, product, team, ad, depend, effe... | Best Company in the world |\n",
"| 7 | 7.0 | 0.0874 | great, benefit, excellent, product, lot, class... | Great benefits but large enough to get lost in |\n",
"| 8 | 8.0 | 0.0713 | good, analyst, business, pgm, year, educator, ... | Good company with good benefits lots of red ta... |\n",
"| 9 | 9.0 | 0.0828 | google, director, phenomenal, early, fun, fulf... | Early Childhood Educator |\n",
"| 10 | 10.0 | 0.0865 | manager, account, project, tech, awesome, star... | Project Manager |\n",
"| 11 | 11.0 | 0.0833 | software, dream, developer, data, meh, rough, ... | Unique one of a kind dream job |\n",
"| 12 | 12.0 | 0.0759 | work, culture, associate, amazing, experience,... | Massage Therapist |\n",
"| 13 | 13.0 | 0.0849 | work, intern, analyst, internship, love, engin... | Software Engineering Intern |\n",
"| 14 | 14.0 | 0.0723 | work, ambitious, wonderful, awesome, add, wlb,... | wonderful place to work |\n",
"| 15 | 15.0 | 0.0702 | engineer, lead, experience, bad, stuff, teamma... | Engineering Practicum Internship |\n",
"| 16 | 16.0 | 0.0751 | great, career, marketing, lot, brand, worker, ... | Google is great recruiting org not so much |\n",
"| 17 | 17.0 | 0.0798 | software, engineer, designer, fix, competitive... | Sr Interactive Designer Sr Solution Consultant |\n",
"| 18 | 18.0 | 0.0844 | great, sale, long, geekland, quality, producti... | Adsense Publisher |\n",
"| 19 | 19.0 | 0.0765 | place, partner, specialist, perk, love, workli... | Love working at Google in Boulder CO |\n",
"
\n",
"| | Dominant Topic | Num_Document | Perc_Document |\n",
"|---|---|---|---|\n",
"| 0 | 0.0 | 35 | 0.070 |\n",
"| 1 | 1.0 | 32 | 0.064 |\n",
"| 2 | 2.0 | 20 | 0.040 |\n",
"| 3 | 3.0 | 24 | 0.048 |\n",
"| 4 | 4.0 | 27 | 0.054 |\n",
"| 5 | 5.0 | 35 | 0.070 |\n",
"| 6 | 6.0 | 19 | 0.038 |\n",
"| 7 | 7.0 | 17 | 0.034 |\n",
"| 8 | 8.0 | 22 | 0.044 |\n",
"| 9 | 9.0 | 33 | 0.066 |\n",
"| 10 | 10.0 | 43 | 0.086 |\n",
"| 11 | 11.0 | 18 | 0.036 |\n",
"| 12 | 12.0 | 21 | 0.042 |\n",
"| 13 | 13.0 | 26 | 0.052 |\n",
"| 14 | 14.0 | 15 | 0.030 |\n",
"| 15 | 15.0 | 24 | 0.048 |\n",
"| 16 | 16.0 | 17 | 0.034 |\n",
"| 17 | 17.0 | 33 | 0.066 |\n",
"| 18 | 18.0 | 19 | 0.038 |\n",
"| 19 | 19.0 | 20 | 0.040 |\n",
"