{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Climate2015
1.1  Students' engagement grouped by goals from entry survey
1.1.1  What are your main reasons for taking this course? (Choose all that apply)
1.2  From exit survey: grouped by goals in the entry survey
1.2.1  Are you likely to...: (Choose all that apply)
1.2.2  Were your goals for taking the course met?
1.2.3  Which of the following components of the course were you very satisfied with? (Choose all that apply)
2  Multiple courses
2.1  Exit survey feedback grouped by goals from entry survey
2.1.1  Are you likely to....
" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "from __future__ import division\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from collections import OrderedDict\n", "\n", "import plotly\n", "import plotly.offline as py\n", "import plotly.graph_objs as go\n", "import plotly.tools as tls\n", "py.init_notebook_mode()\n", "\n", "course_id = 'UBCx__Climate101x__3T2015'" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hideOutput": false, "hidePrompt": false }, "source": [ "## Climate2015\n", "### Students' engagement grouped by goals from entry survey\n", "** Entry survey**\n", "\n", "#### What are your main reasons for taking this course? (Choose all that apply)\n", "1. Develop my understanding of an area related to my current studies or job.\n", "2. Learn something interesting, challenging, or fun.\n", "3. Earn credentials.\n", "4. Decide whether to pursue education or a career in this topic area.\n", "5. Share an experience with friends who are taking this course.\n", "6. Please or impress other people.\n", "7. Learn more about MOOCs.\n", "8. Deep theoretical understanding of the topic.\n", "9. Gain practical knowledge and useful skills.\n", "10. Practice and improve my English.\n", "11. Other or not sure." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def query_entry(course_id = course_id):\n", " \"\"\"\n", " Query entry survey;\n", " Example course_id: 'UBCx__Marketing1x__3T2015'\n", " \"\"\"\n", " query = \"Select * From [%s.entry_survey_mapped]\" % course_id\n", " survey = pd.io.gbq.read_gbq(query, project_id='ubcxdata', verbose=False)\n", " survey = survey.drop_duplicates('user_id', keep='last').ix[:, 11:]\n", " survey.columns = survey.columns.str.replace(\"s_\", \"\")\n", " return survey\n", "\n", "def query_exit(course_id = course_id):\n", " \"\"\"\n", " Query exit survey;\n", " Example course_id: 'UBCx__Marketing1x__3T2015'\n", " \"\"\" \n", " query = \"Select * From [%s.exit_survey_mapped]\" % course_id\n", " survey = pd.io.gbq.read_gbq(query, project_id='ubcxdata', verbose=False)\n", " survey = survey.drop_duplicates('user_id', keep='last').ix[:, 11:]\n", " survey.columns = survey.columns.str.replace(\"s_\", \"\")\n", " return survey" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def query_pc(course_id = course_id): \n", " \"\"\"\n", " Given course_id(e.g. 'UBCx__Marketing1x__3T2015'), query and calculate ndays_act, sum_dt, nforum_posts, \n", " nvideos_watched, nproblems_attempted for sampled students,\n", " also return total_videos and total_problems.\n", " \"\"\"\n", " query = \"\"\"\n", " Select pc.user_id As user_id, is_active, certified, pc.course_id As course_id, pc.mode As mode, pc.grade As grade, \n", " pc.ndays_act As ndays_act, pc.sum_dt As sum_dt, pc.nforum_posts As nforum_posts,\n", " v.videos_watched As nvideos_watched, p.problems_attempted As nproblems_attempted\n", " From [{0}.person_course] pc\n", " Left Join\n", " (SELECT username, exact_count_distinct(video_id) As videos_watched \n", " FROM [{0}.video_stats_day]\n", " Group By username) v\n", " on pc.username = v.username\n", " Left Join \n", " (Select user_id, exact_count_distinct(item.answer_id) As problems_attempted\n", " From [{0}.problem_analysis]\n", " Group By user_id) p\n", " On pc.user_id = p.user_id\n", " Where pc.sum_dt > 0\"\"\".format(course_id)\n", " df = pd.io.gbq.read_gbq(query, project_id='ubcxdata', verbose=False).fillna(0)\n", " \n", " return df" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [], "source": [ "exit = query_exit()\n", "entry = query_entry()\n", "pc = query_pc()\n", "surveys = pd.merge(entry, exit, on='user_id', how='inner')\n", "# pc = query_pc()\n", "merged = entry.merge(pc, on='user_id')" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "def compute_pcstats(merged, selections, start):\n", " \"\"\"\n", " Compute students' behavior: median events (nplay_video, nproblem_check, nforum_posts)\n", " meidan ndays_act, median grade, pct_passing grouped by goals;\n", " merged: entry survey merged with person_course\n", " selections: a list of all the goals from entry survey\n", " start: index of the first column for this question in the entry survey\n", " \"\"\"\n", " df = []\n", " rm = []\n", " counts = []\n", " pct_passing = []\n", " selections2 = selections[:]\n", " merged_count = len(merged)*0.05\n", " for i in range(len(selections2)):\n", " group_count = merged.ix[:, i+start].notnull().sum()\n", "\n", " if group_count > merged_count:\n", " counts.append(group_count)\n", " pct_passing.append(merged.ix[merged.ix[:, i+start].notnull(),'certified'].sum() / group_count)\n", " df.append(pd.DataFrame(data={'Selection': selections[i],\n", " 'grade': merged.ix[merged.ix[:, i+start].notnull(), 'grade'],\n", " 'nforum_posts': merged.ix[merged.ix[:, i+start].notnull(), 'nforum_posts'],\n", " 'ndays_act': merged.ix[merged.ix[:, i+start].notnull(), 'ndays_act'], \n", " 'nproblems_attempted': merged.ix[merged.ix[:, i+start].notnull(), 'nproblems_attempted'],\n", " 'nvideos_watched': merged.ix[merged.ix[:, i+start].notnull(), 'nvideos_watched']\n", " }))\n", " else:\n", " rm.append(selections2[i]) \n", " result = pd.concat(df)\n", " #result.fillna(0, inplace=True)\n", " for s in rm:\n", " selections2.remove(s)\n", " median = result.groupby('Selection').median().ix[selections2, :] \n", " return median, counts, pct_passing " ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "selections = ['Relevant understanding', \n", " 'Something interesting',\n", " 'Verified certificate',\n", " 'Pursue further',\n", " 'Learn with friends',\n", " 'Impress people',\n", " 'More about MOOCs',\n", " 'Substantial understanding',\n", " 'Practical knowledge',\n", " 'Improve English',\n", " 'Others']\n", "merged1 = merged[merged.is_active==1]\n", "median, counts, pct_passing = compute_pcstats(merged1, selections, 10)\n", "# median" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "def plot_pcstats(median, counts, pct_passing, title=None):\n", " \"\"\"\n", " Plot students' behavior: median events (nplay_video, nproblem_check, nforum_posts)\n", " meidan ndays_act, median grade, pct_passing grouped by goals;\n", " counts: number of students for each goal\n", " \"\"\"\n", " trace1 = go.Bar(x=median.index, y=median.nvideos_watched, name='# videos watched', showlegend=True)\n", " trace2 = go.Bar(x=median.index, y=median.nproblems_attempted, name='# problems attempted', showlegend=True)\n", " trace3 = go.Bar(x=median.index, y=median.nforum_posts, name='# forum posts', showlegend=True)\n", " trace4 = go.Bar(x=median.index, y=median.ndays_act, name='days active', showlegend=False)\n", " trace5 = go.Bar(x=median.index, y=median.grade, name='grade', showlegend=False)\n", " trace6 = go.Bar(x=median.index, y=pct_passing, name='pct passing', showlegend=False)\n", " trace7 = go.Bar(x=median.index, y=counts, name='# of students', showlegend=False)\n", "\n", " fig = tls.make_subplots(rows=5, cols=1, print_grid=False,\n", " subplot_titles=('median events', 'median days active', \n", " 'median grade', 'pct passing', '# of students'))\n", "\n", " fig.append_trace(trace1, 1, 1)\n", " fig.append_trace(trace2, 1, 1)\n", " fig.append_trace(trace3, 1, 1)\n", " fig.append_trace(trace4, 2, 1)\n", " fig.append_trace(trace5, 3, 1)\n", " fig.append_trace(trace6, 4, 1)\n", " fig.append_trace(trace7, 5, 1)\n", " \n", " fig['layout']['xaxis1'].update(tickfont=dict(size=8))\n", " fig['layout']['xaxis2'].update(tickfont=dict(size=8))\n", " fig['layout']['xaxis3'].update(tickfont=dict(size=8))\n", " fig['layout']['xaxis4'].update(tickfont=dict(size=8))\n", " fig['layout']['xaxis5'].update(tickfont=dict(size=8))\n", " \n", " fig['layout']['yaxis1'].update(showgrid=False)\n", " fig['layout']['yaxis2'].update(showgrid=False)\n", " fig['layout']['yaxis3'].update(showgrid=False)\n", " fig['layout']['yaxis4'].update(showgrid=False)\n", " fig['layout']['yaxis5'].update(showgrid=False)\n", "\n", " fig['layout'].update(barmode='stack', title=title, font=dict(size=12),\n", " height=920, width=600, margin=go.Margin(t=80, b=120))\n", " py.iplot(fig) " ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_pcstats(median, counts, pct_passing, title=\"Students' engagement by goals\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### From exit survey: grouped by goals in the entry survey\n", "** Exit survey**\n", "\n", "#### Are you likely to...: (Choose all that apply) \n", "1. Recommend this course to a friend\n", "2. Take a more advance course on this topic\n", "3. Revisit course components in future\n", "4. Seek to increase my involvement in this topic (work opportunities, books, etc.)\n", "5. Change my habits regarding climate change" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false, "hideCode": false, "hideOutput": false, "hidePrompt": true }, "outputs": [], "source": [ "def compute_heatmap(surveys, selections, matrix, start, col_names):\n", " \"\"\"\n", " Compute stats for questions 'Are you likely...' and 'How satisfied ...';\n", " surveys: entry survey merged with exit survey\n", " selections: a list of all the goals from entry survey\n", " start: index of the first column for question about goals in the merged survey\n", " matrix: for question Are you likely...' or 'How satisfied ...', \n", " a dictionary with column names as keys and an empty list as values\n", " e.g. matrix = {'Q2_2_1': [], 'Q2_2_2': [], 'Q2_2_3': [], 'Q2_2_4': [], 'Q2_2_5': []}\n", " col_names: corresponding names for the columns\n", " e.g. ['Recommend', 'Advanced', 'Revisit', 'Involvement', 'Habbits']\n", " \"\"\"\n", " rm = []\n", " selections2 = selections[:]\n", " survey_count = len(surveys)*0.05\n", "\n", " for i in range(len(selections2)):\n", " group_count = surveys.ix[:, i+start].notnull().sum()\n", " if group_count > survey_count:\n", " for k, v in matrix.iteritems():\n", " v.append(surveys.ix[surveys.ix[:, i+start].notnull(), k].notnull().sum() / group_count)\n", " else:\n", " rm.append(selections2[i])\n", " for k, v in matrix.iteritems():\n", " v.append(surveys.ix[:, k].notnull().sum() / len(surveys))\n", " for s in rm:\n", " selections2.remove(s)\n", " selections2 = selections2 + ['Overall']\n", " matrix = pd.DataFrame(matrix)\n", " matrix.index = selections2\n", " matrix.columns = col_names\n", " return matrix" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "matrix = {'Q2_2_1': [], 'Q2_2_2': [], 'Q2_2_3': [], 'Q2_2_4': [], 'Q2_2_5': []}\n", "col_names = ['Recommend', 'Advanced', 'Revisit', 'Involvement', 'Habbits']\n", "\n", "likely = compute_heatmap(surveys, selections, matrix, 10, col_names)\n", "#likely" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "def heatmap(df, title=None, width=700):\n", " \"\"\"Plot heatmap given a dataframe\"\"\"\n", " data = [\n", " go.Heatmap(\n", " z=df.values,\n", " x=df.columns,\n", " y=df.index,\n", " colorscale=[[0.0, 'rgb(224,243,248)'], [1.0, 'rgb(43,130,189)']]\n", " )\n", " ]\n", "\n", " layout = go.Layout(\n", " yaxis=dict(autorange='reversed'), \n", " width=width, height=500,\n", " margin=go.Margin(l=100, b=120),\n", " title=title\n", " )\n", "\n", " fig = go.Figure(data=data, layout=layout)\n", " py.iplot(fig)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "heatmap(likely.T, title='Exit survey feedback grouped by goals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Were your goals for taking the course met? \n", "- Yes and more, the course exceeded my expectations.\n", "- Yes, my goals were met.\n", "- My goals were somewhat met." ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "def cal_goalsmet(surveys, selections, start, col_name):\n", " \"\"\"\n", " Calculate stats for goalsmet question;\n", " surveys: entry survey merged with exit survey\n", " selections: a list of all the goals from entry survey\n", " start: index of the first column for question about goals in the merged survey\n", " col_name: name of goalsmet column, e.g. 'Q2_1'\n", " \"\"\"\n", " \n", " df = []\n", " rm = []\n", " selections2 = selections[:]\n", " survey_count = len(surveys)*0.05\n", " for i in range(len(selections)):\n", " group_count = surveys.ix[:, i+start].notnull().sum()\n", " if group_count > survey_count:\n", " df.append(pd.DataFrame(data={'Selection': selections[i],\n", " 'goalsmet': surveys.ix[surveys.ix[:, i+start].notnull(), col_name]}))\n", " else:\n", " rm.append(selections2[i])\n", " for s in rm:\n", " selections2.remove(s)\n", " goalsmet = pd.concat(df)\n", " goalsmet = goalsmet.groupby('Selection').goalsmet.value_counts(normalize=True).unstack('Selection').fillna(0).T\n", " col_order = ['Yes and more, the course exceeded my expectations.',\n", " 'Yes, my goals were met.', 'My goals were somewhat met.']\n", " goalsmet = goalsmet.ix[selections2, col_order]\n", " goalsmet.loc['Overall'] = surveys[col_name].value_counts(normalize=True)[col_order]\n", " \n", " return goalsmet" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def stacked_bar(df, names, barmode, title=None, width=600):\n", " \"\"\"\n", " Plot stacked or overlay bar graph given the dataframe\n", " names: names for all the traces\n", " \"\"\"\n", " data = []\n", " colors = ['rgb(77, 175, 74)', 'rgb(255, 127, 0)', 'rgb(55, 126, 184)', 'rgb(228, 26, 28)']\n", " for i in range(0, df.shape[1]):\n", " data.append(go.Bar(x=df.index, y=df.ix[:, i], marker=dict(color=colors[i]), name=names[i]))\n", " \n", " layout = go.Layout(\n", " width=width,\n", " margin=go.Margin(b=150),\n", " barmode=barmode,\n", " title=title)\n", " \n", " fig = go.Figure(data=data, layout=layout)\n", " py.iplot(fig)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "names = ['Exceeded expection', 'Goals met', 'Goals somewhat met']\n", "goals_met = cal_goalsmet(surveys, selections, 10, 'Q2_1')\n", "stacked_bar(goals_met, names, 'stack', title='Percent goals met grouped by goals', width=800)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Which of the following components of the course were you very satisfied with? (Choose all that apply) \n", "- Challenge level\n", "- Workload\n", "- Pace\n", "- Depth\n", "- Instructor's communication\n", "- Instructor's responsiveness\n", "- Instructor's knowledge of the subject matter" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "satisfy = {'Q4_1_1': [], 'Q4_1_2': [], 'Q4_1_3': [], 'Q4_1_4': [], 'Q4_1_5': [], 'Q4_1_6': [], 'Q4_1_7': []}\n", "col_names = ['challenge', 'workload', 'pace', 'depth', 'communication', 'responsiveness', 'knowledge']\n", "satisfy = compute_heatmap(surveys, selections, satisfy, 10, col_names)\n", "#satisfy" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "heatmap(satisfy.T, title='Percent satisfaction of course components grouped by goals')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiple courses\n", "### Exit survey feedback grouped by goals from entry survey\n", "#### Are you likely to....\n", "\n", "Recommend this course to a friend" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [], "source": [ "entry_marketing = query_entry('UBCx__Marketing1x__3T2015')\n", "exit_marketing = query_exit('UBCx__Marketing1x__3T2015')\n", "surveys_marketing = pd.merge(entry_marketing, exit_marketing, on='user_id', how='inner')\n", "\n", "exit_climate = query_exit('UBCx__Climate101x__3T2015')\n", "entry_climate = query_entry('UBCx__Climate101x__3T2015')\n", "surveys = pd.merge(entry_climate, exit_climate, on='user_id', how='inner')\n", "\n", "exit_climate2 = query_exit('UBCx__Climate1x__1T2016')\n", "entry_climate2 = query_entry('UBCx__Climate1x__1T2016')\n", "surveys2 = pd.merge(entry_climate2, exit_climate2, on='user_id', how='inner')" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "# different survey choinces in Marketing\n", "selections_m = ['Practical knowledge',\n", " 'Verified certificate',\n", " 'Something interesting',\n", " 'More about MOOCs',\n", " 'Substantial understanding',\n", " 'Improve English',\n", " 'Others']\n", "matrix_m = {'Q1_1': [], 'Q1_2_y': [], 'Q1_3_y': [], 'Q1_4': [], 'Q1_5': [], 'Q1_6': [], 'Q1_7': [], 'Q1_8_y': []}\n", "colnames_m = ['Recommend', 'Advanced', 'Revist', 'Instructor', 'Involvement', 'UBC', 'Habbits', 'edX']\n", "likely_marketing = compute_heatmap(surveys_marketing, selections_m, matrix_m, 0, colnames_m )" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "selections = ['Relevant understanding', \n", " 'Something interesting',\n", " 'Verified certificate',\n", " 'Pursue further',\n", " 'Learn with friends',\n", " 'Impress people',\n", " 'More about MOOCs',\n", " 'Substantial understanding',\n", " 'Practical knowledge',\n", " 'Improve English',\n", " 'Others']\n", "matrix = {'Q2_2_1': [], 'Q2_2_2': [], 'Q2_2_3': [], 'Q2_2_4': [], 'Q2_2_5': []}\n", "col_names = ['Recommend', 'Advanced', 'Revisit', 'Involvement', 'Habbits']\n", "\n", "likely = compute_heatmap(surveys, selections, matrix, 10, col_names)" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "matrix = {'Q2_2_1': [], 'Q2_2_2': [], 'Q2_2_3': [], 'Q2_2_4': [], 'Q2_2_5': []}\n", "likely2 = compute_heatmap(surveys2, selections, matrix, 10, col_names)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [], "source": [ "recommend = pd.concat([likely.loc[likely_marketing.index].Recommend, likely2.loc[likely_marketing.index].Recommend,\n", " likely_marketing.Recommend], axis=1)\n", "recommend.columns = ['Climate2015', 'Climate2016', 'Marketing2015']\n", "# recommend" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false, "hideCode": false, "hidePrompt": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "heatmap(recommend.T, title='Percent recommendation grouped by goals', width=600)" ] } ], "metadata": { "celltoolbar": "Hide code", "hide_input": false, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" }, "toc": { "toc_cell": true, "toc_number_sections": true, "toc_threshold": 6, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 0 }