{ "metadata": { "name": "", "signature": "sha256:9ee31b1ba23d6574c7a962640eb79c5527b3a92f67cc238fadd74a426f142714" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Student Performance\n", "\n", "### A data science experiment using data from the KDD Cup 2010 Educational Data Mining Challenge\n", "\n", "The aim of this IPython Notebook is to show how Python can be used to build predictive models for data science problems in education.\n", "\n", "**This notebook is still heavily under construction**" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# Get the data: Algebra 2005-2006\n", "train_filepath = 'data/algebra0506/algebra_2005_2006_train.txt'\n", "test_filepath = 'data/algebra0506/algebra_2005_2006_test.txt'\n", "traindata = pd.read_table(train_filepath)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "More information about the data format can be found on the [challenge website](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Inspect some of the training data\n", "traindata.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|  | Row | Anon Student Id | Problem Hierarchy | Problem Name | Problem View | Step Name | Step Start Time | First Transaction Time | Correct Transaction Time | Step End Time | Step Duration (sec) | Correct Step Duration (sec) | Error Step Duration (sec) | Correct First Attempt | Incorrects | Hints | Corrects | KC(Default) | Opportunity(Default) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0BrbPbwCMz | Unit ES_04, Section ES_04-1 | EG4-FIXED | 1 | 3(x+2) = 15 | 2005-09-09 12:24:35.0 | 2005-09-09 12:24:49.0 | 2005-09-09 12:25:15.0 | 2005-09-09 12:25:15.0 | 40 | NaN | 40 | 0 | 2 | 3 | 1 | [SkillRule: Eliminate Parens; {CLT nested; CLT... | 1 |
| 1 | 2 | 0BrbPbwCMz | Unit ES_04, Section ES_04-1 | EG4-FIXED | 1 | x+2 = 5 | 2005-09-09 12:25:15.0 | 2005-09-09 12:25:31.0 | 2005-09-09 12:25:31.0 | 2005-09-09 12:25:31.0 | 16 | 16 | NaN | 1 | 0 | 0 | 1 | [SkillRule: Remove constant; {ax+b=c, positive... | 1~~1 |
| 2 | 3 | 0BrbPbwCMz | Unit ES_04, Section ES_04-1 | EG40 | 1 | 2-8y = -4 | 2005-09-09 12:25:36.0 | 2005-09-09 12:25:43.0 | 2005-09-09 12:26:12.0 | 2005-09-09 12:26:12.0 | 36 | NaN | 36 | 0 | 2 | 3 | 1 | [SkillRule: Remove constant; {ax+b=c, positive... | 2 |
| 3 | 4 | 0BrbPbwCMz | Unit ES_04, Section ES_04-1 | EG40 | 1 | -8y = -6 | 2005-09-09 12:26:12.0 | 2005-09-09 12:26:34.0 | 2005-09-09 12:26:34.0 | 2005-09-09 12:26:34.0 | 22 | 22 | NaN | 1 | 0 | 0 | 1 | [SkillRule: Remove coefficient; {ax+b=c, divid... | 1~~1 |
| 4 | 5 | 0BrbPbwCMz | Unit ES_04, Section ES_04-1 | EG40 | 2 | -7y-5 = -4 | 2005-09-09 12:26:38.0 | 2005-09-09 12:28:36.0 | 2005-09-09 12:28:36.0 | 2005-09-09 12:28:36.0 | 118 | 118 | NaN | 1 | 0 | 0 | 1 | [SkillRule: Remove constant; {ax+b=c, positive... | 3~~1 |

5 rows × 19 columns
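The `KC(Default)` and `Opportunity(Default)` columns above sometimes pack several values into one cell, e.g. `1~~1` in row 1. The sketch below shows one way to split such cells into Python lists. It assumes that `~~` is the only separator and that empty cells load as NaN; that is our reading of the challenge's data-format page and is not verified here. The frame `sample`, the helper `split_multi`, the skill names and the new column names are hypothetical placeholders chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mimicking the KC/Opportunity columns shown above;
# the skill names are made-up placeholders, not real challenge data.
sample = pd.DataFrame({
    'KC(Default)': ['SkillA~~SkillB', np.nan],
    'Opportunity(Default)': ['1~~1', np.nan],
})


def split_multi(cell):
    # Assumption: multiple values in one cell are joined by '~~'.
    # Missing cells (NaN) become empty lists so every row holds a list.
    if pd.isna(cell):
        return []
    return str(cell).split('~~')


# One list of knowledge-component names per step.
sample['KC list'] = sample['KC(Default)'].apply(split_multi)

# Opportunity counts are converted via float first, in case pandas
# read some values as floats (e.g. '1.0') rather than plain strings.
sample['Opportunity list'] = sample['Opportunity(Default)'].apply(
    lambda cell: [int(float(v)) for v in split_multi(cell)]
)

print(sample[['KC list', 'Opportunity list']])
```

On the real data the same helper could be applied to `traindata['KC(Default)']` and `traindata['Opportunity(Default)']`. Keeping one list per row preserves the pairing between each knowledge component and its opportunity count.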
\n", "