{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video #2](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2))\n", "\n", "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", "\n", "**Note:** Since the video recording, the official name of the \"IPython Notebook\" was changed to \"Jupyter Notebook\". However, the functionality is the same." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Agenda\n", "\n", "- What are the benefits and drawbacks of scikit-learn?\n", "- How do I install scikit-learn?\n", "- How do I use the Jupyter Notebook?\n", "- What are some good resources for learning Python?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![scikit-learn algorithm map](images/02_sklearn_algorithms.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benefits and drawbacks of scikit-learn\n", "\n", "### Benefits:\n", "\n", "- **Consistent interface** to Machine Learning models\n", "- Provides many **tuning parameters** but with **sensible defaults**\n", "- Exceptional **documentation**\n", "- Rich set of functionality for **companion tasks**\n", "- **Active community** for development and support\n", "\n", "### Potential drawbacks:\n", "\n", "- Harder (than R) to **get started with Machine Learning**\n", "- Less emphasis (than R) on **model interpretability**\n", "\n", "### Further reading:\n", "\n", "- Ben Lorica: [Six reasons why I recommend scikit-learn](https://www.oreilly.com/content/six-reasons-why-i-recommend-scikit-learn/)\n", "- scikit-learn authors: [API design for machine learning software](https://arxiv.org/pdf/1309.0238v1.pdf)\n", "- Data School: [Should you teach Python or R for data science?](https://www.dataschool.io/python-or-r-for-data-science/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![scikit-learn logo](images/02_sklearn_logo.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing scikit-learn\n", "\n", "**Option 1:** [Install scikit-learn library](https://scikit-learn.org/stable/install.html) and dependencies (NumPy and SciPy)\n", "\n", "**Option 2:** [Install Anaconda distribution](https://www.anaconda.com/products/individual) of Python, which includes:\n", "\n", "- Hundreds of useful packages (including scikit-learn)\n", "- IPython and Jupyter Notebook\n", "- conda package manager\n", "- Spyder IDE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Jupyter logo](images/02_jupyter_logo.svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Jupyter Notebook\n", "\n", "### Components:\n", "\n", "- **IPython interpreter:** enhanced version of the standard Python interpreter\n", "- **Browser-based notebook interface:** weave together code, formatted text, and plots\n", "\n", "### Installation:\n", "\n", "- **Option 1:** [Install the Jupyter notebook](https://jupyter.readthedocs.io/en/latest/install.html) (includes IPython)\n", "- **Option 2:** Included with the Anaconda distribution\n", "\n", "### Launching the Notebook:\n", "\n", "- Type **jupyter notebook** at the command line to open the dashboard\n", "- Don't close the command line window while the Notebook is running\n", "\n", "### Keyboard shortcuts:\n", "\n", "**Command mode** (gray border)\n", "\n", "- Create new cells above (**a**) or below (**b**) the current cell\n", "- Navigate using the **up arrow** and **down arrow**\n", "- Convert the cell type to Markdown (**m**) or code (**y**)\n", "- See keyboard shortcuts using **h**\n", "- Switch to Edit mode using **Enter**\n", "\n", "**Edit mode** (green border)\n", "\n", "- **Ctrl+Enter** to run a cell\n", "- Switch to Command mode using **Esc**\n", "\n", "### IPython, Jupyter, and Markdown resources:\n", "\n", "- [nbviewer](https://nbviewer.jupyter.org/): view notebooks online as static documents\n", "- [IPython documentation](https://ipython.readthedocs.io/en/stable/)\n", "- [Jupyter Notebook quickstart](https://jupyter.readthedocs.io/en/latest/content-quickstart.html)\n", "- [GitHub's Mastering Markdown](https://guides.github.com/features/mastering-markdown/): short guide with lots of examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources for learning Python\n", "\n", "- [Codecademy's Python course](https://www.codecademy.com/learn/learn-python): browser-based, tons of exercises\n", "- [DataQuest](https://www.dataquest.io/): browser-based, teaches Python in the context of data science\n", "- [Google's Python class](https://developers.google.com/edu/python/): slightly more advanced, includes videos and downloadable exercises (with solutions)\n", "- [Python for Everybody](https://www.py4e.com/): beginner-oriented book, includes slides and videos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments or Questions?\n", "\n", "- Email: \n", "- Website: https://www.dataschool.io\n", "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", "\n", "© 2021 [Data School](https://www.dataschool.io). All rights reserved." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 1 }