{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# What is Machine Learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))\n", "\n", "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Machine Learning](images/01_robot.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Agenda\n", "\n", "- What is Machine Learning?\n", "- What are the two main categories of Machine Learning?\n", "- What are some examples of Machine Learning?\n", "- How does Machine Learning \"work\"?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is Machine Learning?\n", "\n", "One definition: \"Machine Learning is the semi-automated extraction of knowledge from data\"\n", "\n", "- **Knowledge from data**: Starts with a question that might be answerable using data\n", "- **Automated extraction**: A computer provides the insight\n", "- **Semi-automated**: Requires many smart decisions by a human" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are the two main categories of Machine Learning?\n", "\n", "**Supervised learning**: Making predictions using data\n", " \n", "- Example: Is a given email \"spam\" or \"ham\"?\n", "- There is an outcome we are trying to predict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Spam filter](images/01_spam_filter.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Unsupervised learning**: Extracting structure from data\n", "\n", "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n", "- There is no \"right answer\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Clustering](images/01_clustering.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How does Machine Learning \"work\"?\n", "\n", "High-level steps of supervised learning:\n", "\n", "1. First, train a **Machine Learning model** using **labeled data**\n", "\n", " - \"Labeled data\" has been labeled with the outcome\n", " - \"Machine Learning model\" learns the relationship between the attributes of the data and its outcome\n", "\n", "2. Then, make **predictions** on **new data** for which the label is unknown" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Supervised learning](images/01_supervised_learning.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The primary goal of supervised learning is to build a model that \"generalizes\": It accurately predicts the **future** rather than the **past**!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Questions about Machine Learning\n", "\n", "- How do I choose **which attributes** of my data to include in the model?\n", "- How do I choose **which model** to use?\n", "- How do I **optimize** this model for best performance?\n", "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n", "- Can I **estimate** how well my model is likely to perform on unseen data?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources\n", "\n", "- Book: [An Introduction to Statistical Learning](https://www.statlearning.com/) (section 2.1, 14 pages)\n", "- Video: [Learning Paradigms](https://www.youtube.com/watch?v=mbyG85GZ0PI&t=2162s) (13 minutes, starting at 36:02)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments or Questions?\n", "\n", "- Email: \n", "- Website: https://www.dataschool.io\n", "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", "\n", "© 2021 [Data School](https://www.dataschool.io). All rights reserved." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 1 }