{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arrays ##\n", "\n", "An array is a data structure that holds a sequence of values of the same type: for example, a sequence of numbers or a sequence of strings. \n", "\n", "We can use the `make_array` function from the `datascience` package to create an `ndarray`, the array type implemented by the `NumPy` package. A range of operations can be performed on these arrays very efficiently. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ranges ##\n", "\n", "Range functions allow 
one to create arrays of ordered sequences of numbers. We can use the `np.arange()` function to create NumPy ndarrays. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tables ##\n", "\n", "Tables store structured data. We can use the `datascience` package to create `Table` objects that we can perform data manipulation operations on (the `Table` object is a simplified version of a Pandas DataFrame). \n", "\n", "Some methods we can call on `Table` objects are:\n", "- `tb.show(k)`: show the first k rows of the table\n", "- `tb.select('col1', 'col2')`: select `col1` and `col2` from the table\n", "- `tb.drop('col')`: remove `col` from the table\n", "- `tb.sort('col')`: sort the rows in the table based on the values in `col`\n", "- `tb.where('col', value)`: reduce the table to rows where `col` is equal to `value` \n", "\n", "Apart from `show`, which only displays rows, these methods return a new `Table` object and leave the original table unchanged. \n", "\n", "\n", "Let's look at data on ice cream cones that is described in the class textbook. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Flavor | Color | Price | \n", "
---|---|---|
strawberry | pink | 3.55 | \n", "
chocolate | light brown | 4.75 | \n", "
chocolate | dark brown | 5.25 | \n", "
strawberry | pink | 5.25 | \n", "
chocolate | dark brown | 5.25 | \n", "
bubblegum | pink | 4.75 | \n", "
PLAYER | POSITION | TEAM | SALARY | \n", "
---|---|---|---|
Paul Millsap | PF | Atlanta Hawks | 18.6717 | \n", "
Al Horford | C | Atlanta Hawks | 12 | \n", "
Tiago Splitter | C | Atlanta Hawks | 9.75625 | \n", "
Jeff Teague | PG | Atlanta Hawks | 8 | \n", "
Kyle Korver | SG | Atlanta Hawks | 5.74648 | \n", "
Thabo Sefolosha | SF | Atlanta Hawks | 4 | \n", "
Mike Scott | PF | Atlanta Hawks | 3.33333 | \n", "
Kent Bazemore | SF | Atlanta Hawks | 2 | \n", "
Dennis Schroder | PG | Atlanta Hawks | 1.7634 | \n", "
Tim Hardaway Jr. | SG | Atlanta Hawks | 1.30452 | \n", "
... (407 rows omitted)
" ], "text/plain": [ "PLAYER | POSITION | TEAM | SALARY\n", "Paul Millsap | PF | Atlanta Hawks | 18.6717\n", "Al Horford | C | Atlanta Hawks | 12\n", "Tiago Splitter | C | Atlanta Hawks | 9.75625\n", "Jeff Teague | PG | Atlanta Hawks | 8\n", "Kyle Korver | SG | Atlanta Hawks | 5.74648\n", "Thabo Sefolosha | SF | Atlanta Hawks | 4\n", "Mike Scott | PF | Atlanta Hawks | 3.33333\n", "Kent Bazemore | SF | Atlanta Hawks | 2\n", "Dennis Schroder | PG | Atlanta Hawks | 1.7634\n", "Tim Hardaway Jr. | SG | Atlanta Hawks | 1.30452\n", "... (407 rows omitted)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NBA players, 2015-2016 season\n", "nba = Table.read_table('nba_salaries.csv').relabeled(3, 'SALARY')\n", "\n", "nba" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Let's get Stephen Curry's data\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Let's get data from the New York Knicks\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Columns of Tables are Arrays ##\n", "\n", "We can extract columns from a `Table` as either:\n", "\n", "- A new `Table` with fewer columns using `tb.select()`\n", "- An `ndarray` using `tb.column()` " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# extract a column from a Tables as a Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# extract a column from a Tables as an ndarray\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a Table from Scratch ##\n", "\n", "We can also create tables from scratch using the `Tables()` method and then adding columns to the table using 
the `tb.with_column(\"col_name\", ndarray)` method. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }