{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "s1", "content", "l1" ] }, "source": [ "# Machine Learning\n", "\n", "## Introduction to Machine Learning\n", "\n", "### Machine Learning\n", "\n", "It is important for Data Scientists to understand the basics of Machine Learning in order to understand what is happening under the hood. One of the first questions students ask is \"Why is Machine Learning used synonymously with Data Science?\". We shall try to answer this question in this brief course on Machine Learning, abbreviated as ML. \n", "\n", "### Definition\n", "\n", "Machine Learning can be described as a sub-area of AI that involves learning or intelligence by computers. \n", "\n", "### Artificial Intelligence, Pattern Recognition, Statistical Learning\n", "\n", "These are other terms that are commonly used by Data Scientists. There are important differences between them that are worth knowing as we progress in understanding Data Science. \n", "\n", "### Artificial Intelligence (AI)\n", "\n", "The field of AI research was founded at a conference on the campus of Dartmouth College in the summer of 1956. Those who attended would become the leaders of AI research for decades. Many of them predicted that a machine as intelligent as a human being would exist in no more than a generation, and they were given millions of dollars to make this vision come true [1]. The term AI is now used to describe algorithms or mathematical techniques that can imitate intelligence in activities such as learning, inference, prediction and decision making. \n", "\n", "### Pattern Recognition\n", "\n", "Pattern Recognition is an area of AI which involves the inference of patterns in complex data sets.\n", "\n", "### Statistical Learning\n", "\n", "Statistical Learning combines statistical analysis of the data set and of the results, as well as theories of statistics, with the intelligence aspect of Pattern Recognition. 
\n", "\n", "## Classes of Problems in ML\n", "\n", "\n", "\n", "The types of problems in ML can be categorized into Supervised Learning, Unsupervised Learning, Reinforcement Learning and Recommendation. \n", "\n", "### Supervised Learning\n", "\n", "Supervised Learning refers to a class of learning in which data are available together with the known outputs for certain scenarios. The ML algorithms learn from the known data sets and their results. The ML algorithms are termed supervised because we can evaluate how good they are by their ability to produce output similar to what is already known. There are two main categories of Supervised Learning problems:\n", "\n", "- Prediction\n", "- Classification\n", "\n", "Prediction - ML algorithms of this type make predictions, such as predictions of weather, stocks, etc.\n", "\n", "Classification - Image Classification and Character Recognition are typical of the problems that fall into this category. \n", "\n", "### Unsupervised Learning\n", "\n", "The category of problems that involve extracting meaningful information from the data, such as clustering, is called Unsupervised Learning. This is because no target is involved in the operation of the ML algorithms. \n", "\n", "\n", "\n", "### Recommendation \n", "\n", "Recommendations of movies and shopping lists are examples of ML algorithms that fall into this category. \n", "\n", "### Reinforcement Learning\n", "\n", "This class of algorithms solves decision-making problems by taking various actions in a scenario while maximizing a reward. Robotics is, to a large extent, a field that uses Reinforcement Learning.\n", "\n", "
\n", "## Exercise:\n", "\n", "Given the data set of college majors with information as to who secured a job and who didn't, what is this class of ML problem?\n", "\n", "
College Majors

| Major | Grade | Internship | Sports | Job at Graduation |
|---|---|---|---|---|
| Engineering | A | No | No | No |
| Arts | B | Yes | Yes | Yes |
| Mathematics | B | B | No | Yes |
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "tags": [ "s1", "ce", "l1" ] }, "outputs": [], "source": [ "supervised = 'Supervised Learning, Classification'\n", "unsupervised = 'UnSupervised Learning'\n", "recommendation = 'Recommendation'\n", "reinforcement = 'Reinforcement Learning'\n", "# Assign the following variable with the variable above representing the class of ML problem.\n", "ml_class = supervised\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s1", "l1", "hint" ] }, "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "tags": [ "s1", "l1", "ans" ] }, "outputs": [], "source": [ "ml_class = supervised" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "s1", "hid", "l1" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "continue\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "\n", "try:\n", " ref_assert_var = False\n", " if ml_class=='Supervised Learning, Classification':\n", " ref_assert_var = True\n", " else:\n", " ref_assert_var = False\n", " \n", " #ref_assert_var = False\n", " \n", " #import Levenshtein\n", " #ml_class_ = 'Supervised Learning, Classification'\n", " #ratio = Levenshtein.ratio(ml_class_, ml_class)\n", " #if ratio < 0.79:\n", " # ref_assert_var = False\n", " #else:\n", " # ref_assert_var = True\n", " \n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "else:\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l2", "content", "s2" ] }, "source": [ "\n", "\n", "


\n", "## Supervised Learning\n", "\n", "### What is Supervised Learning?\n", "\n", "In this lesson, we shall try to understand supervised learning and define it in mathematical language. Suppose we have data consisting of points (x, y) that are generated by a process that looks like this when we plot it:\n", "\n", "\n", "\n", "As humans, we can infer that the points follow a straight line. How do we explain the process mathematically? We can do so by solving for the straight line equation:\n", "
\n",
    "y = mx + c\n",
    "
\n", "Let us solve this equation by assuming that one of the points, (2, 11), lies on the graph:\n", "
\n",
    "11 = 2m + c                   ...(1)\n",
    "
\n", "Solving for c,\n", "
\n",
    "c = 11 - 2m                    \n",
    "
\n", "We need another point to solve this equation since there are 2 unknowns, m and c. \n", "\n", "Let us now consider another point, say (7, 26):\n", "
\n",
    "26 = 7m + c                   ...(2)\n",
    "
\n", "With the equations (1) and (2), we can solve for m and c. Subtracting (1) from (2) gives 15 = 5m, so m = 3, and substituting into c = 11 - 2m gives c = 5.\n", "Hence, we can write the equation for the line as:\n", "
\n",
    "y = 3x + 5\n",
    "
\n", "We can also verify for the known points in x = {0, 1, 2, ..., 10} that the y generated is indeed correct. For example, for the point x = 13 on the x-axis,\n", "
\n",
    "y = 3*13 + 5 = 44 which is correct.\n",
    "
\n", "Once we know the equation of the line, we can predict the value of y for any point on the x-axis. We have now learnt the process that generated the points. Given these points, a mathematical technique that discovers the line equation is termed learning, because with the known parameters, m and c, we can predict future values of y for any x outside of the known data set. \n", "\n", "\n", "\n", "\n", "### Terminology in Machine Learning\n", "\n", "The terminology of ML is important to Data Science, as these terms are used frequently. \n", "\n", "- In the above example, the known data set of x and the corresponding y for each point in x is called Ground Truth or Training Data.\n", "- The set of points of y is called the Target Vector.\n", "- The function y = mx + c is called the Model, and m and c are known as the parameters of the model. \n", "- The process of solving the equations is called Learning or Training the Model.\n", "- Computing the value of y for an x is called Prediction.\n", "- A set of values of x for which y needs to be predicted, and which lie outside of the known values used in Training, is termed Test Data.\n", "\n", "
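The terms above can be mapped onto a few lines of code. The following is a minimal sketch, assuming NumPy is available; the names x_train, y_train, x_test and y_pred are illustrative and not part of the lesson. It treats the two points used in the derivation, (2, 11) and (7, 26), as the training data, solves equations (1) and (2) as a small linear system to learn m and c, and then predicts y for test values of x.

```python
import numpy as np

# Training data (ground truth): the two known points from the lesson.
x_train = np.array([2.0, 7.0])
y_train = np.array([11.0, 26.0])   # target vector

# Learning (training the model): solve equations (1) and (2),
#   2m + c = 11
#   7m + c = 26
# written as A @ [m, c] = y_train.
A = np.column_stack([x_train, np.ones_like(x_train)])
m, c = np.linalg.solve(A, y_train)
print('parameters:', m, c)

# Test data: x values outside the known training points.
x_test = np.array([13.0, 20.0])

# Prediction: evaluate the model y = m*x + c on the test data.
y_pred = m * x_test + c
print('predictions:', y_pred)
```

With only two points and two unknowns the system has a single exact solution; with more (or noisy) points a least-squares fit would be the usual substitute, which is outside what this lesson derives.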
\n", "## Exercise:\n", "\n", "Given a line y = 5*x + 3,\n", "\n", "compute predictions for x = {1, 5, 10, 12} and assign them to the variable y." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true, "tags": [ "l2", "ce", "s2" ] }, "outputs": [], "source": [ "# Compute y for x, define x below.\n", "x = 'array'\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l2", "s2", "hint" ] }, "source": [ "

Use numpy arrays

" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "l2", "s2", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 8 28 53 63]\n" ] } ], "source": [ "import numpy as np\n", "x = np.array([1, 5, 10, 12])\n", "y = 5*x + 3\n", "print(y)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "l2", "hid", "s2" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "continue\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "\n", "try:\n", " ref_assert_var = False\n", " import numpy as np\n", " \n", " x_ = np.array([1, 5, 10, 12])\n", " y_ = 5*x_ + 3\n", " \n", " ref_assert_var = False\n", " \n", " if np.all(x == x_):\n", " ref_assert_var = True\n", " \n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "else:\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "\n", "assert ref_tmp_var" ] } ], "metadata": { "executed_sections": [], "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" }, "rf_version": 1 }, "nbformat": 4, "nbformat_minor": 2 }