{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Data Science Project 1: NYC Dept. of Education Data on NYS Math Exam results\n", "\n", "Data source: https://nycopendata.socrata.com/data?cat=education \n", "Data description: \n", "NYS Math Exam results for NYC, Grades 3-8, from 2006 to 2011. \n", "Proficiency Levels 1 and 2: below the level for that grade \n", "Proficiency Level 3: appropriate for that grade \n", "Proficiency Level 4: above the level appropriate for that grade \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###What am I exploring? \n", "- Exam proficiency levels across all of NYC \n", "- Exam proficiency levels within boroughs \n", "- Exam proficiency levels across all of NYC by gender \n", "- Changes in exam proficiency levels in NYC from 2006 to 2011 \n", "\n", "**------------------------------------------------------------------------------------------------**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##1. Acquiring Data" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd \n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Manual data fixes made before reading the file here: \n", "Removed the percentage columns (this I know how to do in Python) \n", "Added the Level 1 and Level 2 columns together to create a new column (this I don't know how to do in Python) " ] }, { "cell_type": "code", "collapsed": false, "input": [ "BoroMath = pd.read_csv(\"MathTest_Boro2.csv\")\n", "# The CSV is saved locally in the same folder as this notebook" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**------------------------------------------------------------------------------------------------**\n", "\n", "##2. Cleaning Data " ] }, { "cell_type": "code", "collapsed": false, "input": [ "BoroMath = BoroMath.rename(columns={'Level1&2': 'BelowAverage', 'Level3':'Average', 'Level4':'AboveAverage'})\n", "# Rename the proficiency-level columns for readability\n", "\n", "BoroMath.head(3)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|   | Borough | Grade | Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage | Level3&4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRONX | 3 | 2006 | Female | 7984 | 664 | 2546 | 4232 | 1206 | 5438 |
| 1 | BRONX | 3 | 2006 | Male | 8461 | 663 | 2773 | 4386 | 1302 | 5688 |
| 2 | BRONX | 3 | 2007 | Female | 7803 | 675 | 1780 | 4410 | 1613 | 6023 |

3 rows × 10 columns

|   | Borough | Grade | Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|---|---|---|
| 72 | BRONX | All Grades | 2006 | Female | 48244 | 644 | 26437 | 18403 | 3404 |
| 73 | BRONX | All Grades | 2006 | Male | 49616 | 642 | 27282 | 18682 | 3652 |
| 74 | BRONX | All Grades | 2007 | Female | 47011 | 654 | 21301 | 21034 | 4676 |
| 75 | BRONX | All Grades | 2007 | Male | 50061 | 651 | 23866 | 21195 | 5000 |
| 76 | BRONX | All Grades | 2008 | Female | 45661 | 662 | 15303 | 25117 | 5241 |
| 77 | BRONX | All Grades | 2008 | Male | 48700 | 659 | 18005 | 25219 | 5476 |
| 78 | BRONX | All Grades | 2009 | Female | 45423 | 671 | 10599 | 27585 | 7239 |
| 79 | BRONX | All Grades | 2009 | Male | 48521 | 668 | 13256 | 27851 | 7414 |
| 80 | BRONX | All Grades | 2010 | Female | 45466 | 670 | 26240 | 13502 | 5724 |
| 81 | BRONX | All Grades | 2010 | Male | 48687 | 668 | 28920 | 13910 | 5857 |
| 82 | BRONX | All Grades | 2011 | Female | 45598 | 672 | 24794 | 16042 | 4762 |
| 83 | BRONX | All Grades | 2011 | Male | 48423 | 669 | 27535 | 15843 | 5045 |

12 rows × 9 columns

|   | Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|
| count | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 |
| mean | 2008.500000 | 42922.066667 | 672.683333 | 15044.266667 | 18976.133333 | 8901.666667 |
| std | 1.722237 | 20169.577332 | 11.342872 | 9136.840989 | 9767.739463 | 5341.662627 |
| min | 2006.000000 | 12431.000000 | 642.000000 | 1639.000000 | 4452.000000 | 2493.000000 |
| 25% | 2007.000000 | 26797.000000 | 665.750000 | 6785.250000 | 9237.000000 | 4414.000000 |
| 50% | 2008.500000 | 48333.500000 | 673.500000 | 13360.000000 | 19191.500000 | 6629.000000 |
| 75% | 2010.000000 | 60109.250000 | 682.000000 | 22884.500000 | 27596.750000 | 13857.250000 |
| max | 2011.000000 | 70730.000000 | 688.000000 | 32470.000000 | 36905.000000 | 19667.000000 |

8 rows × 6 columns

| Year | Category | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|---|
| 2006 | Female | 216807 | 3289 | 90995 | 93221 | 32591 |
| 2006 | Male | 223781 | 3282 | 96436 | 93633 | 33712 |
| 2007 | Female | 211962 | 3335 | 71104 | 99512 | 41346 |
| 2007 | Male | 222919 | 3322 | 80398 | 100880 | 41641 |
| 2008 | Female | 206949 | 3370 | 49665 | 111877 | 45407 |
| 2008 | Male | 217858 | 3357 | 59549 | 112195 | 46114 |
| 2009 | Female | 206320 | 3408 | 34282 | 116928 | 55110 |
| 2009 | Male | 217072 | 3394 | 42827 | 119581 | 54664 |
| 2010 | Female | 207155 | 3404 | 92805 | 66925 | 47425 |
| 2010 | Male | 218583 | 3393 | 103001 | 68683 | 46899 |
| 2011 | Female | 207503 | 3409 | 85770 | 77753 | 43980 |
| 2011 | Male | 218415 | 3398 | 95824 | 77380 | 45211 |

12 rows × 5 columns

| Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|
| 2006 | 216807 | 3289 | 90995 | 93221 | 32591 |
| 2007 | 211962 | 3335 | 71104 | 99512 | 41346 |
| 2008 | 206949 | 3370 | 49665 | 111877 | 45407 |
| 2009 | 206320 | 3408 | 34282 | 116928 | 55110 |
| 2010 | 207155 | 3404 | 92805 | 66925 | 47425 |
| 2011 | 207503 | 3409 | 85770 | 77753 | 43980 |

6 rows × 5 columns

| Year | Number Tested | Mean Scale Score | BelowAverage | Average | AboveAverage |
|---|---|---|---|---|---|
| 2006 | 223781 | 3282 | 96436 | 93633 | 33712 |
| 2007 | 222919 | 3322 | 80398 | 100880 | 41641 |
| 2008 | 217858 | 3357 | 59549 | 112195 | 46114 |
| 2009 | 217072 | 3394 | 42827 | 119581 | 54664 |
| 2010 | 218583 | 3393 | 103001 | 68683 | 46899 |
| 2011 | 218415 | 3398 | 95824 | 77380 | 45211 |

6 rows × 5 columns
\n", "