{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arrays ##\n", "\n", "Arrays are a data structure that holds a sequence of values of the same type. For example, a squence of all numbers, or a squence of all strings, etc. \n", "\n", "We can use use the `make_array` function from the `datascience` package to create what are called `ndarray` that are array implemented by the `NumPy` package. One can perform a range of operations on these arrays in a very efficient manner. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ranges ##\n", "\n", "Range functions allow 
one to create arrays of ordered sequences of numbers. We can use the `np.arange()` function to create NumPy ndarrays. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tables ##\n", "\n", "Tables store structured data. We can use the `datascience` package to create `Table` objects on which we can perform data manipulation operations (the `Table` object is a simplified version of a Pandas DataFrame). \n", "\n", "Some methods we can call on `Table` objects are:\n", "- `tb.show(k)`: show the first k rows of the table\n", "- `tb.select('col1', 'col2')`: select `col1` and `col2` from the table\n", "- `tb.drop('col')`: remove `col` from the table\n", "- `tb.sort('col')`: sort the rows in the table based on the values in `col`\n", "- `tb.where('col', value)`: reduce the table to rows where `col` is equal to `value` \n", "\n", "Except for `show`, which only displays rows, each of these methods returns a new `Table` and leaves the original table unchanged. \n", "\n", "\n", "Let's look at data on ice cream cones that is described in the class textbook. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Flavor | Color | Price
strawberry | pink | 3.55
chocolate | light brown | 4.75
chocolate | dark brown | 5.25
strawberry | pink | 5.25
chocolate | dark brown | 5.25
bubblegum | pink | 4.75
" ], "text/plain": [ "Flavor | Color | Price\n", "strawberry | pink | 3.55\n", "chocolate | light brown | 4.75\n", "chocolate | dark brown | 5.25\n", "strawberry | pink | 5.25\n", "chocolate | dark brown | 5.25\n", "bubblegum | pink | 4.75" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the ice cream data. Each row represents one ice cream cone.\n", "cones = Table.read_table('cones.csv')\n", "cones" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Show the first 2 rows of the data\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# select only the Flavor column\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# the original cones Table is not modified\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# select the Flavor and Price columns\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# remove the Color column\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# sort by price\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# sort by price highest to loweset\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# select only the chocolate cones\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# We can combine mulitple method called. Let's drop the color and then sort by price\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: NBA Salaries ##\n", "\n", "Let's look basketball (NBA) salaries from the 2015-2016 season. 
The data is originally from https://www.statcrunch.com/app/index.php?dataid=1843341\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYER | POSITION | TEAM | SALARY
Paul Millsap | PF | Atlanta Hawks | 18.6717
Al Horford | C | Atlanta Hawks | 12
Tiago Splitter | C | Atlanta Hawks | 9.75625
Jeff Teague | PG | Atlanta Hawks | 8
Kyle Korver | SG | Atlanta Hawks | 5.74648
Thabo Sefolosha | SF | Atlanta Hawks | 4
Mike Scott | PF | Atlanta Hawks | 3.33333
Kent Bazemore | SF | Atlanta Hawks | 2
Dennis Schroder | PG | Atlanta Hawks | 1.7634
Tim Hardaway Jr. | SG | Atlanta Hawks | 1.30452
\n", "

... (407 rows omitted)

" ], "text/plain": [ "PLAYER | POSITION | TEAM | SALARY\n", "Paul Millsap | PF | Atlanta Hawks | 18.6717\n", "Al Horford | C | Atlanta Hawks | 12\n", "Tiago Splitter | C | Atlanta Hawks | 9.75625\n", "Jeff Teague | PG | Atlanta Hawks | 8\n", "Kyle Korver | SG | Atlanta Hawks | 5.74648\n", "Thabo Sefolosha | SF | Atlanta Hawks | 4\n", "Mike Scott | PF | Atlanta Hawks | 3.33333\n", "Kent Bazemore | SF | Atlanta Hawks | 2\n", "Dennis Schroder | PG | Atlanta Hawks | 1.7634\n", "Tim Hardaway Jr. | SG | Atlanta Hawks | 1.30452\n", "... (407 rows omitted)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NBA players, 2015-2016 season\n", "nba = Table.read_table('nba_salaries.csv').relabeled(3, 'SALARY')\n", "\n", "nba" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Let's get Stephen Curry's data\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Let's get data from the New York Knicks\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Columns of Tables are Arrays ##\n", "\n", "We can extract columns from a `Table` as either:\n", "\n", "- A new `Table` with fewer columns using `tb.select()`\n", "- An `ndarray` using `tb.column()` " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# extract a column from a Tables as a Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# extract a column from a Tables as an ndarray\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a Table from Scratch ##\n", "\n", "We can also create tables from scratch using the `Tables()` method and then adding columns to the table using 
the `tb.with_column(\"col_name\", ndarray)` method. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }