{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Machine Learning Lunch

\n", "
\n", "

Tom Brander


\n", "

June 28, 2017

\n", "\n", "
\n", "\n", "\n", "\n", "\n", "## Many Thanks to [Compose](https://compose.com/) for the space and lunch! \n", " \n", "### http://oswco.com [@dartdog](https://twitter.com/dartdog)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

# The PyData Stack\n", "\n", "(PyData stack diagram image)
\n", "Source: [Jake VanderPlas: State of the Tools](https://www.youtube.com/watch?v=5GlNDD7qbP4)\n", "and [Thomas Wiecki](https://quantopian.github.io/pyfolio/) \n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To the uninitiated the whole pile of Python stuff looks terribly complicated. \n", "To some extent it is. \n", "But there has been a ton of work done to bring order out of the apparent chaos! " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The Libraries (just a starting point)\n", "\n", "+ Python, of course (https://www.python.org/)\n", " - A few years ago there was a change from the the Python 2, series to the Python 3 series\n", " - Now the recomendation is just go with Python 3.6 \n", "+ Pandas (http://pandas.pydata.org/)\n", " - Main data manipulation library, mostly using DataFrames (think Excel on steroids)\n", " - Many IO capabilities GBQ, S-3, Parquet, SQL, CSV, JSON, And web data (stock prices and financial data)\n", " - Built on top of Numpy \n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "+ Numpy (http://www.numpy.org/)\n", " - High performance numerical library particularly array and matrix oriented \n", "+ Matplotlib (https://matplotlib.org/)\n", " - the grand daddy of Python Plotting libraries, many other libraries build on it to simplify and or stylize it \n", "+ Sci-kit Learn (http://scikit-learn.org/stable/)\n", " - A collection of libraries for almost all types of machine learning with consitant API's and supporting libraries " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "+ TensorFlow (https://www.tensorflow.org/)\n", " - Google's open source numerical computing library\n", " - on which they have built and released a large number of machine learning components\n", " - along with a number of supporting components (I/O, encoding, serving etc)\n", "+ Keras (https://keras.io/)(https://www.tensorflow.org/api_docs/python/tf/contrib/keras)\n", " - a simplified interface to many Machine learning libraries, also incorporated into TensorFlow\n", " - supports Theano, Cntk, Pytorch (and more on the way)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "+ StatsModels (http://www.statsmodels.org/stable/index.html)(https://patsy.readthedocs.io/en/latest/)\n", " - Many statistical techniques and the Patsy statistical language (much like R)\n", "+ PyMC3 (https://pymc-devs.github.io/pymc3/index.html)\n", " - Baysian Modeling library (in many ways comparable to Stan but newer)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Jupyter\n", "+ (http://jupyter.org/)\n", "+ What this notebook is done with\n", "+ Has become a common format for \"open data Science\"\n", "+ Has also become a great method for shaing code and documentation throughout the Python community\n", "+ Supports many other languages, or Kernels R, Julia (a newer stats language) Go, Ruby ++ In many cases allows easier interoperabilitty between them" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Anaconda\n", "+ (https://www.continuum.io/anaconda-overview)\n", "+ All of the above,(150+ libraries), (except TensorFlow/Keras) and much more is auto installed for you using the Anaconda distribution including a nice IDE, Spyder\n", "+ As a bonus you get a faster than \"Normal\" version of Python with Intel MKL extensions built in\n", " - Speed-boosted NumPy, SciPy, scikit-learn, and NumExpr\n", " - The 
packaging of MKL with redistributable binaries in Anaconda for easy access to the MKL runtime library.\n", " - Python bindings to the low-level MKL service functions, which allow the number of threads in use to be changed at runtime.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Books\n", "+ Python Data Science Handbook by Jake VanderPlas, freely available on GitHub; an excellent resource: https://github.com/jakevdp/PythonDataScienceHandbook\n", "+ Python Machine Learning by Sebastian Raschka https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning (primarily scikit-learn; highly recommended!)\n", "+ Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron http://shop.oreilly.com/product/0636920052289.do\n", "+ Deep Learning with Python by François Chollet https://www.manning.com/books/deep-learning-with-python " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Five More Tips\n", "+ Jupyter Notebook Gallery, awesome: https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks\n", "+ csvkit (https://csvkit.readthedocs.io/en/1.0.2/)\n", "+ Pandas profiling (https://github.com/JosPolfliet/pandas-profiling)\n", "+ Kaggle (https://www.kaggle.com/)\n", "+ What type of algorithm? (http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Course\n", "+ https://github.com/amueller/scipy-2016-sklearn Videos and notebooks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Machine Learning:\n", "
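In code, the basic supervised-learning recipe behind most of these tools looks roughly like this; a minimal sketch using scikit-learn's bundled iris data (the same data the TPOT example below uses):\n",
"\n",
"```python\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"iris = load_iris()\n",
"X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)\n",
"clf = LogisticRegression()        # pick a model\n",
"clf.fit(X_train, y_train)         # learn from the training split\n",
"print(clf.score(X_test, y_test))  # evaluate on held-out data\n",
"```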
\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Skills\n", "
\n", "
\n", "\n", "
\n", "From: https://opendatascience.com/blog/what-is-data-science-and-what-does-a-data-scientist-do/" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/tom/anaconda3/envs/py36n/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", " \"This module will be removed in 0.20.\", DeprecationWarning)\n", "Optimization Progress: 23%|██▎ | 182/800 [00:23<01:10, 8.77pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 1 - Current best internal CV score: 0.9730848861283643\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Optimization Progress: 35%|███▍ | 278/800 [00:47<01:26, 6.01pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 2 - Current best internal CV score: 0.9821428571428571\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Optimization Progress: 46%|████▋ | 372/800 [00:58<00:25, 16.66pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 3 - Current best internal CV score: 0.9821428571428571\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Optimization Progress: 58%|█████▊ | 464/800 [01:15<01:00, 5.51pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 4 - Current best internal CV score: 0.9821428571428571\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Optimization Progress: 70%|██████▉ | 556/800 [01:37<00:22, 10.80pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 5 - Current best internal CV score: 0.9904761904761905\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Optimization Progress: 80%|████████ | 642/800 [01:48<00:16, 9.72pipeline/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 6 - Current best internal CV score: 0.9904761904761905\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " " ] }, { "name": "stdout", "output_type": "stream", "text": [ "Generation 7 - Current best internal CV score: 0.9904761904761905\n", "\n", "Best pipeline: DecisionTreeClassifier(RBFSampler(XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=DEFAULT, XGBClassifier__min_child_weight=20, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95), RBFSampler__gamma=0.35), DecisionTreeClassifier__criterion=entropy, DecisionTreeClassifier__max_depth=DEFAULT, DecisionTreeClassifier__min_samples_leaf=15, DecisionTreeClassifier__min_samples_split=10)\n", "0.894736842105\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r" ] } ], "source": [ "from tpot import TPOTClassifier\n", "import pandas as pd\n", "from sklearn.datasets import load_iris\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np\n", "\n", "iris = load_iris()\n", "X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),\n", " iris.target.astype(np.float64), train_size=0.75, test_size=0.25)\n", "\n", "tpot = TPOTClassifier(generations=7, population_size=100, verbosity=2, random_state=2)\n", "tpot.fit(X_train, 
y_train)\n", "\n", "print(tpot.score(X_test, y_test))\n", "tpot.export('tpot_iris_pipeline.py')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BernoulliNB(BernoulliNB(input_matrix, BernoulliNB__alpha=10.0, BernoulliNB__fit_prior=DEFAULT), BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=DEFAULT)BernoulliNB(DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=5, DecisionTreeClassifier__min_samples_leaf=20, DecisionTreeClassifier__min_samples_split=10), BernoulliNB__alpha=100.0, BernoulliNB__fit_prior=True)BernoulliNB(DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=7, DecisionTreeClassifier__min_samples_leaf=2, DecisionTreeClassifier__min_samples_split=3), BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=DEFAULT)BernoulliNB(GaussianNB(input_matrix), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=DEFAULT)BernoulliNB(LogisticRegression(input_matrix, LogisticRegression__C=DEFAULT, LogisticRegression__dual=DEFAULT, LogisticRegression__penalty=l1), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=False)BernoulliNB(Normalizer(input_matrix, Normalizer__norm=l2), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=DEFAULT)BernoulliNB(Normalizer(input_matrix, Normalizer__norm=max), BernoulliNB__alpha=0.001, BernoulliNB__fit_prior=True)BernoulliNB(RobustScaler(input_matrix), BernoulliNB__alpha=1.0, BernoulliNB__fit_prior=False)BernoulliNB(RobustScaler(input_matrix), BernoulliNB__alpha=100.0, BernoulliNB__fit_prior=DEFAULT)BernoulliNB(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesClassifier__criterion=DEFAULT, SelectFromModel__ExtraTreesClassifier__max_features=DEFAULT, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.2), BernoulliNB__alpha=1.0, BernoulliNB__fit_prior=False)...XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95)XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=DEFAULT)XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=3, XGBClassifier__min_child_weight=18, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=0.7)XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=5, XGBClassifier__min_child_weight=17, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.25)XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=1, XGBClassifier__min_child_weight=19, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.8)XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=1, XGBClassifier__min_child_weight=6, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=1.0)XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95)XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=6, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95)XGBClassifier(input_matrix, XGBClassifier__learning_rate=DEFAULT, XGBClassifier__max_depth=5, 
XGBClassifier__min_child_weight=17, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.25)XGBClassifier(input_matrix, XGBClassifier__learning_rate=DEFAULT, XGBClassifier__max_depth=DEFAULT, XGBClassifier__min_child_weight=19, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=0.45)
02.0000002.0000002.000002.0000002.0000002.0000002.0000002.0000002.0000002.000000...1.0000001.0000001.0000001.000001.0000001.0000001.000001.000001.000001.000000
10.3665110.7056420.936180.7056420.7056420.3665110.3665110.7774330.7056420.366511...0.9448760.9448760.3581780.339130.4331780.9448760.936180.936180.339130.366511
\n", "

2 rows × 626 columns

\n", "
" ], "text/plain": [ " BernoulliNB(BernoulliNB(input_matrix, BernoulliNB__alpha=10.0, BernoulliNB__fit_prior=DEFAULT), BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=DEFAULT) \\\n", "0 2.000000 \n", "1 0.366511 \n", "\n", " BernoulliNB(DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=5, DecisionTreeClassifier__min_samples_leaf=20, DecisionTreeClassifier__min_samples_split=10), BernoulliNB__alpha=100.0, BernoulliNB__fit_prior=True) \\\n", "0 2.000000 \n", "1 0.705642 \n", "\n", " BernoulliNB(DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=7, DecisionTreeClassifier__min_samples_leaf=2, DecisionTreeClassifier__min_samples_split=3), BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=DEFAULT) \\\n", "0 2.00000 \n", "1 0.93618 \n", "\n", " BernoulliNB(GaussianNB(input_matrix), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=DEFAULT) \\\n", "0 2.000000 \n", "1 0.705642 \n", "\n", " BernoulliNB(LogisticRegression(input_matrix, LogisticRegression__C=DEFAULT, LogisticRegression__dual=DEFAULT, LogisticRegression__penalty=l1), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=False) \\\n", "0 2.000000 \n", "1 0.705642 \n", "\n", " BernoulliNB(Normalizer(input_matrix, Normalizer__norm=l2), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=DEFAULT) \\\n", "0 2.000000 \n", "1 0.366511 \n", "\n", " BernoulliNB(Normalizer(input_matrix, Normalizer__norm=max), BernoulliNB__alpha=0.001, BernoulliNB__fit_prior=True) \\\n", "0 2.000000 \n", "1 0.366511 \n", "\n", " BernoulliNB(RobustScaler(input_matrix), BernoulliNB__alpha=1.0, BernoulliNB__fit_prior=False) \\\n", "0 2.000000 \n", "1 0.777433 \n", "\n", " BernoulliNB(RobustScaler(input_matrix), BernoulliNB__alpha=100.0, BernoulliNB__fit_prior=DEFAULT) \\\n", "0 2.000000 \n", "1 0.705642 \n", "\n", " BernoulliNB(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesClassifier__criterion=DEFAULT, SelectFromModel__ExtraTreesClassifier__max_features=DEFAULT, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.2), BernoulliNB__alpha=1.0, BernoulliNB__fit_prior=False) \\\n", "0 2.000000 \n", "1 0.366511 \n", "\n", " ... \\\n", "0 ... \n", "1 ... 
\n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95) \\\n", "0 1.000000 \n", "1 0.944876 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=DEFAULT) \\\n", "0 1.000000 \n", "1 0.944876 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=3, XGBClassifier__min_child_weight=18, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=0.7) \\\n", "0 1.000000 \n", "1 0.358178 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=0.5, XGBClassifier__max_depth=5, XGBClassifier__min_child_weight=17, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.25) \\\n", "0 1.00000 \n", "1 0.33913 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=1, XGBClassifier__min_child_weight=19, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.8) \\\n", "0 1.000000 \n", "1 0.433178 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=1, XGBClassifier__min_child_weight=6, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=1.0) \\\n", "0 1.000000 \n", "1 0.944876 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=4, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95) \\\n", "0 1.00000 \n", "1 0.93618 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=1.0, XGBClassifier__max_depth=2, XGBClassifier__min_child_weight=6, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.95) \\\n", "0 1.00000 \n", "1 0.93618 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=DEFAULT, XGBClassifier__max_depth=5, XGBClassifier__min_child_weight=17, XGBClassifier__n_estimators=DEFAULT, XGBClassifier__nthread=1, XGBClassifier__subsample=0.25) \\\n", "0 1.00000 \n", "1 0.33913 \n", "\n", " XGBClassifier(input_matrix, XGBClassifier__learning_rate=DEFAULT, XGBClassifier__max_depth=DEFAULT, XGBClassifier__min_child_weight=19, XGBClassifier__n_estimators=100, XGBClassifier__nthread=1, XGBClassifier__subsample=0.45) \n", "0 1.000000 \n", "1 0.366511 \n", "\n", "[2 rows x 626 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc=pd.DataFrame(tpot.evaluated_individuals_)\n", "proc.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Other links\n", "+ Tpot http://rhiever.github.io/tpot/\n", "+ Some new Nvidia developments https://devblogs.nvidia.com/parallelforall/goai-open-gpu-accelerated-data-analytics/\n", "+ State of the art Medical example https://fluforecaster.herokuapp.com/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "+Initial data explore http://localhost:8889/notebooks/Documents/InfluenceH/Working_copies/Cond_fcast_wkg/ccsProfileInitialanalyis.ipynb \n", "+current model\n", 
"http://localhost:8889/notebooks/Documents/InfluenceH/Working_copies/Cond_fcast_wkg/WIPNNModelonehottarget2.ipynb#\n", "+ RE forecast https://docs.google.com/spreadsheets/d/1HJxK82QYeYO13hQGaAg3hw4a4j8s0lSGdfk4BBjzh38/edit#gid=3\n", "+ RE survey http://d1ambw9zjiu0uw.cloudfront.net/custom_reports3/21.pdf?1491572271\n", "+ stock example http://localhost:8888/notebooks/Documents/pyfolio_wkng/examples/single_stock_example.ipynb# BUT! see issues https://github.com/quantopian/empyrical/issues/52\n", "+ Old CCS propensity http://localhost:8888/notebooks/Documents/InfluenceH/Working_copies/CCS_wking/MultiCCS_PropModel_Div.ipynb\n", "+ beginning CCS NN http://localhost:8888/notebooks/Documents/InfluenceH/influence/USF_elu_downsample_1.1.ipynb\n", "+ Zillow initial explore http://localhost:8888/notebooks/Documents/Zillow_w/Notebooks/frkagnotebook.ipynb\n", "+ Zillow Bayes initial look http://localhost:8888/notebooks/Documents/Zillow_w/Notebooks/zillow_bayes.ipynb From: http://willwolf.io/2017/06/15/random-effects-neural-networks/\n", "+ Zillow initial Profile http://localhost:8888/notebooks/Documents/Zillow_w/Notebooks/ProfileInitialanalyis.ipynb\n", "+ Zillow R initial profile https://www.kaggle.com/philippsp/exploratory-analysis-zillow\n", "+ Stock trading https://github.com/Kacawi/datacamp-community and https://medium.com/datacamp/python-for-finance-algorithmic-trading-60fdfb9bb20d\n", "+ Large collection of NN/NLP resourcs https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook on Jupyter hub http://nbviewer.jupyter.org/github/dartdog/ML-lunch/blob/master/ML_resources.ipynb \n", "R vs Python (2 pages ) http://www.kdnuggets.com/2017/06/ecosystem-data-science-machine-learning-software.html" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%load_ext watermark" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false, "slideshow": { "slide_type": "notes" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tom Brander \n", "last updated: Wed Jun 28 2017 09:41:56 CDT\n", "\n", "CPython 3.6.1\n", "IPython 5.3.0\n", "\n", "pandas 0.20.2\n", "numpy 1.12.1\n", "scipy 0.19.0\n", "sklearn 0.18.1\n", "tpot 0.8.3\n", "tensorflow 1.2.0\n", "\n", "compiler : GCC 4.8.2 20140120 (Red Hat 4.8.2-15)\n", "system : Linux\n", "release : 4.4.0-81-generic\n", "machine : x86_64\n", "processor : x86_64\n", "CPU cores : 8\n", "interpreter: 64bit\n", "Git hash : 1116fa037187d38cec52144d0bebde5ae0a0e484\n" ] } ], "source": [ "%watermark -a \"Tom Brander\" -u -n -t -z -v -m -p pandas,numpy,scipy,sklearn,tpot,tensorflow -g" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tue Jun 27 07:47:13 2017 \r\n", "+-----------------------------------------------------------------------------+\r\n", "| NVIDIA-SMI 375.66 Driver Version: 375.66 |\r\n", "|-------------------------------+----------------------+----------------------+\r\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\r\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
|\r\n", "|===============================+======================+======================|\r\n", "| 0 GeForce GTX 1070 Off | 0000:01:00.0 On | N/A |\r\n", "| N/A 45C P8 10W / N/A | 623MiB / 8105MiB | 0% Default |\r\n", "+-------------------------------+----------------------+----------------------+\r\n", " \r\n", "+-----------------------------------------------------------------------------+\r\n", "| Processes: GPU Memory |\r\n", "| GPU PID Type Process name Usage |\r\n", "|=============================================================================|\r\n", "| 0 1098 G /usr/lib/xorg/Xorg 282MiB |\r\n", "| 0 2345 G compiz 65MiB |\r\n", "| 0 2767 G ...anced GL_KHR_blend_equation_advanced_cohe 222MiB |\r\n", "| 0 10292 G ...s-passed-by-fd --v8-snapshot-passed-by-fd 50MiB |\r\n", "+-----------------------------------------------------------------------------+\r\n" ] } ], "source": [ "!nvidia-smi" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nvcc: NVIDIA (R) Cuda compiler driver\r\n", "Copyright (c) 2005-2016 NVIDIA Corporation\r\n", "Built on Wed_May__4_21:01:56_CDT_2016\r\n", "Cuda compilation tools, release 8.0, V8.0.26\r\n" ] } ], "source": [ "!nvcc --version" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon May 1 15:29:16 PDT 2017\r\n", "GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) \r\n" ] } ], "source": [ "!cat /proc/driver/nvidia/version" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Best basic book Mainly SciKit Learn https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning Very useful for all Python ML stuff and algorithims and what to use when and where.. \n", "\n", "This still early release but the best for Keras (written by the guy who also conceived and wrote the library itself) https://www.manning.com/books/deep-learning-with-python \n", "\n", "Best book for TensorFlow http://shop.oreilly.com/product/0636920052289.do also conversely uses SciKit learn, as a method to explain some of the concepts in TF.. Highly recommended.. \n", "\n", "Most accessible code can be found with Jupyter examples so you want to get that set up on your machine http://jupyter.org/ \n", "\n", "Easiest way to get everything you need and keep up to date is Anaconda https://www.continuum.io/downloads Includes Jupyter mentioned above as well as Spyder a Python IDE (Win Linux And Mac) Don't even think of doing another way.. (you will thank me!) \n", "\n", "up front data exploration CSV kit https://csvkit.readthedocs.io/en/1.0.2/ specifically csvstat (lots more there though) including some god transition and report stuff.. 
\n", "\n", "I like https://github.com/JosPolfliet/pandas-profiling \n", "\n", "Oh yes now a days just start with python 3.6 not 2.7 " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [conda env:py36n]", "language": "python", "name": "conda-env-py36n-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" }, "livereveal": { "scroll": true, "theme": "serif", "transition": "zoom" } }, "nbformat": 4, "nbformat_minor": 2 }