{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Review - 2,000xp\n", "\n", "## Preliminaries: 1,000xp\n", "Here we create a list of our friends:\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "friends = ['Ann', 'Ben', 'Clara', 'David', 'Kelly', 'Rachel']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you write a loop that prints each of our friends, one per line? Something like this:\n", "\n", "    Ann\n", "    Ben\n", "    Clara\n", "    David\n", "    Kelly\n", "    Rachel" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we have a list of numbers:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "numbers = [11, 22, 37, 51, 17]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to print the sum of the numbers, but the following code has an error. Can you fix it?"
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "17\n" ] } ], "source": [ "total = 0\n", "for num in numbers:\n", "    total = total + num\n", "print(num)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a list of grades:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "grades = [89, 91, 79, 85, 97, 91]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you print each grade, one per line, like this:\n", "\n", "    89\n", "    91\n", "    79\n", "    85\n", "    97\n", "    91\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to print only the grades over 90:\n", "\n", "    91\n", "    97\n", "    91\n", "\n", "Can you change your code to do so?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We represent a sentence as a list of words." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sentence = ['the', 'man', 'saw', 'the', 'girl', 'on', 'the', 'hill', 'with', 'the', 'telescope']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you write code that will compute and print the number of occurrences of the word *the* in the sentence?"
] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have another sentence:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "sentence2 = ['the', 'man', 'saw', 'a', 'girl', 'on', 'the', 'hill', 'with', 'the', 'telescope', 'on', 'a', 'sunny', 'day']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to count the occurrences of *the* and *a* and print something like:\n", "\n", "    the: 3\n", "    a: 2\n", "\n", "Can you write the code?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a list of our friends and their home states:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "friends = [['Ann', 'North Dakota'], ['Ben', 'Virginia'], ['Clara', 'Connecticut'],\n", "           ['David', 'Minnesota'], ['Kelly', 'Kansas'], ['Rachel', 'New Mexico']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also set a variable for the person we want to search for:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "searchPerson = 'Kelly'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to write code that finds the home state of the person named in `searchPerson` and prints it. For example, if the value of `searchPerson` is `Kelly`, the code should print `Kansas`. If `searchPerson` is `Rachel`, it should print `New Mexico`. Test your code by changing the value of `searchPerson` above."
] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a list of friends and their ages:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "friends = [['Ann', 21], ['Ben', 17], ['Clara', 24],\n", "           ['David', 27], ['Kelly', 18], ['Rachel', 20],\n", "           ['Jennifer', 15], ['Sam', 19], ['Vivic', 16]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to print the names of the friends who are between the ages of 18 and 21 inclusive (meaning that someone who is exactly 18 or exactly 21 should also be printed)." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dataframes\n", "\n", "Let's load in a small dataset of women athletes:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Name</th>\n", "      <th>Sport</th>\n", "      <th>Height</th>\n", "      <th>Weight</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>Asuka Teramoto</td>\n", "      <td>Gymnastics</td>\n", "      <td>54</td>\n", "      <td>66</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Brittainey Raven</td>\n", "      <td>Basketball</td>\n", "      <td>72</td>\n", "      <td>162</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>Chen Nan</td>\n", "      <td>Basketball</td>\n", "      <td>78</td>\n", "      <td>204</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>Gabby Douglas</td>\n", "      <td>Gymnastics</td>\n", "      <td>49</td>\n", "      <td>90</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>Helalia Johannes</td>\n", "      <td>Track</td>\n", "      <td>65</td>\n", "      <td>99</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n",
"</div>" ], "text/plain": [ "               Name       Sport  Height  Weight\n", "0    Asuka Teramoto  Gymnastics      54      66\n", "1  Brittainey Raven  Basketball      72     162\n", "2          Chen Nan  Basketball      78     204\n", "3     Gabby Douglas  Gymnastics      49      90\n", "4  Helalia Johannes       Track      65      99" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "athletes = pd.read_csv('https://raw.githubusercontent.com/zacharski/machine-learning/master/data/athletes.csv')\n", "athletes.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The athletes compete professionally in gymnastics, basketball (WNBA), or track (running marathons).\n", "\n", "Remember when you worked on the Titanic problem and had to come up with a rule to decide whether a person survived or not? Your code looked something like this:\n", "\n", "\n", "    def predict(passenger):\n", "        # Your code here\n", "        # You should return 1 if you think the person survived and 0 if you think they didn't\n", "        # For example, if you think females outside third class and children under 5 survived, you would write:\n", "        if passenger['Sex'] == 'female' and passenger['Pclass'] != 3:\n", "            return 1\n", "        elif passenger['Age'] < 5:\n", "            return 1\n", "        else:\n", "            return 0\n", "\n", "I would like you to write similar code, but this time to predict whether someone is a gymnast, a basketball player, or runs track. (Your code should return one of the strings `Gymnastics`, `Basketball`, or `Track`.) \n", "You should only use the attributes Height and Weight. 
Here is the code to alter:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def predict(athlete):\n", "    # Your code here.\n", "    # To get the height of the person, use athlete['Height']\n", "    # For now we always predict 'Basketball' (you need to change this)\n", "\n", "    return 'Basketball'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run our code and see what our accuracy is:\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy 33.333\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "testdatafile = 'https://raw.githubusercontent.com/zacharski/machine-learning/master/data/athletes-gbv.csv'\n", "testdata = pd.read_csv(testdatafile)\n", "\n", "# Collect one prediction per athlete (row) in the test set\n", "predictions = []\n", "for athlete_index, athlete in testdata.iterrows():\n", "    predictions.append(predict(athlete))\n", "\n", "# Fraction of predictions that match the true sport\n", "score = accuracy_score(testdata['Sport'], predictions)\n", "print('Accuracy %5.3f' % (score * 100))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default code always predicts `Basketball`, and our test set has 25 players for each of the three sports, so our accuracy is 33.333%. Can you do better?\n", "\n", "### XP\n", "\n", "% accuracy | xp\n", ":---: | :---:\n", "< 50 | 250\n", "50 to < 80 | 500 + (accuracy - 50) x 16.5\n", ">= 80 | 1000 + (accuracy - 80) x 50" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }