{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to Data Science\n", "## Part 0. - Prelude to Data Science\n", "\n", "### Table of contents\n", "\n", "- NumPy\n", "- SciPy\n", "- Other useful techniques\n", "\n", "---\n", "\n", "## NumPy\n", "\n", "\n", "\n", "
\n", "\n", "NumPy is the fundamental package for scientific computing with Python. It contains, among other things:\n", "\n", "- a powerful N-dimensional array object\n", "- sophisticated (broadcasting) functions\n", "- tools for integrating C/C++ and Fortran code\n", "- useful linear algebra, Fourier transform, and random number capabilities\n", "\n", "Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The basic NumPy object is the array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array([0, 1, 2, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How much faster is it than a regular Python list?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = list(range(1000))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit [i**2 for i in L]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(1000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit a**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Please follow this tutorial the hard way: type everything yourself (do not copy-paste it) and solve every exercise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your tutorial code comes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SciPy\n", "\n", "\n", "\n", "
\n", "\n", "SciPy is a great project containing several scientific and mathematical modules.\n", "We will use several of them, but most importantly we need to handle sparse data.\n", "\n", "SciPy's sparse module provides the sparse equivalent of the NumPy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.sparse as sp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sp.eye(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sp.eye(10).todense()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Again, please follow this tutorial the hard way: type everything yourself (do not copy-paste it) and solve every exercise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your tutorial code comes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mind-blowing stuff\n", "\n", "\n", "\n", "
\n", "\n", "Let's try to solve basic mathematical problems with simple code!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's fit a curve" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.optimize import curve_fit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(x, a, b, c):\n", "    return a*x**2 + b*x + c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def df(x, a, b, c):\n", "    # derivative of f with respect to x (c drops out); not used for the fit\n", "    return 2*a*x + b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(0, 50, 100)\n", "y = f(x, 0.5, 1.5, 5.5)\n", "y_noisy = y + 0.2 * np.random.normal(size=len(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fit the quadratic model f to the noisy data\n", "params, cov = curve_fit(f, x, y_noisy)\n", "params, cov" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_hat = f(x, *params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(x, y, 'bo', x, y_hat, 'r-')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "error = np.sum(np.abs(y-y_hat))\n", "error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's find the inverse of a matrix!\n", "\n", "Follow the linked tutorial the hard way!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your tutorial code comes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's solve a linear system!\n", "\n", "Follow the linked tutorial the hard way!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your tutorial code comes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Disclaimer:\n", "_Since NumPy and SciPy sound like the names of '90s cartoon characters, we chose Ren & Stimpy as their mascots in this notebook._" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "gamedev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "ca92682351adbbf2ee8deffc828b194a22e094b58ec705bbf3ab67bff10701df" } } }, "nbformat": 4, "nbformat_minor": 1 }