{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#5 Machine Learning Basics" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "* [손고리즘] middle learning - 파이썬을 이용한 기계학습 알고리즘 기초 / 딥러닝 파트 5장 [2]\n", "* 김무성" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.1 Learning Algorithms\n", "* 5.2 Example: Linear Regression\n", "* 5.3 Generalization, Capacity, Overfitting and Under-fitting\n", "* 5.4 Hyperparameters and Validation Sets\n", "* 5.5 Estimators, Bias and Variance\n", "* 5.6 Maximum Likelihood Estimation\n", "* 5.7 Bayesian Statistics\n", "* 5.8 Supervised Learning Algorithms\n", "* 5.9 Unsupervised Learning Algorithms\n", "* 5.10 Weakly Supervised Learning\n", "* 5.11 Building a Machine Learning Algorithm\n", "* 5.12 The Curse of Dimensionality and Statistical Lim-itations of Local Generalization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<font color=\"red\">Deep learning is a specific kind of machine learning. In order to understand deeplearning well, one must have a solid understanding of the basic principles of ma-chine learning</font>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.1 Learning Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.1.1 The Task, T\n", "* 5.1.2 The Performance Measure, P\n", "* 5.1.3 The Experience, E" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A machine learning algorithm is an algorithm that is able to learn from data. Butwhat do we mean by learning?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* A popular definition of learning in the context ofcomputer programs is \n", " - “A computer program is said to learn \n", " - from experience E \n", " - with respect to some class of tasks T \n", " - and performance measure P , \n", " - if its performance at tasks in T , as measured by P , improves with experience E”" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1.1 The Task, T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Classification\n", "* Classification with missing inputs\n", "* Regression \n", "* Transcription\n", "* Translation\n", "* Structured output\n", "* Anomaly detection\n", "* Synthesis and sampling\n", "* Imputation of missing values\n", "* Denoising\n", "* Density or probability function estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap1.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap2.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://ipython-books.github.io/images/ml.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Classification with missing inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Classification becomes more challenging if the computer program is not guaranteed that every measurement in its input vector will always be provided. \n", "* In order to solve the classification task, the learning algorithm only has to define a single function mappingfrom a vector input to a categorical output. \n", "* When some of the inputsmay be missing, rather than providing a single classification function, the learning algorithm must learn a set of functions." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap3.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://ipython-books.github.io/images/ml.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.bindichen.co.uk/uploads/regression%20ng.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transcription" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In this type of task, the machine learning system is asked toobserve a relatively unstructured representation of some kind of data andtranscribe it into discrete, textual form." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://playingwithcode.files.wordpress.com/2013/03/calcpad_screenshot_2.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Translation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In a translation task, the input already consists of a sequenceof symbols in some language, and the computer program must convert thisinto a sequence of symbols in another language." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://amas.ie/wp-content/uploads/google-translate-2.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Structured output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Structured output tasks involve any task where the output is a vector con-taining important relationships between the different elements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://nlp.stanford.edu/projects/fig1.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Anomaly detection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In this type of task, the computer program sifts througha set of events or objects, and flags some of them as being unusual or atypi-cal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.merchantrms.com/images/how-it-works.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Synthesis and sampling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In this type of task, the machine learning algorithmis asked to generate new examples that are similar to those in the trainingdata." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://i867.photobucket.com/albums/ab236/littlebigbucket/SteppingError.png\" width=600 >" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/SpeechSynthesisProgrammingGuide/art/speechsynth_speechprocess.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imputation of missing values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The algorithm must provide a prediction of the values of themissing entries." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### a new example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap4.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### missing value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap5.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.adapticon.com/images/impByIntpl.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Denoising" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### a corrupted example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap6.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### a clean example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap7.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The learner must predict the clean example x from its corrupted version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap8.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* or more generally predict the conditional probability distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap9.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://people.tuebingen.mpg.de/burger/neural_denoising/images/denoising.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Density or probability function estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* P_model(x) can be interpreted as a probability density function (if x is continuous) or a probability function (if x is discrete) on the space that the examples were drawn from." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap12.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap11.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If we have performed density estimation to obtain a probability distribution p(x), we can use that distribution to solve the missing value imputation task." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap13.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In practice, density estimation does not always allow us to solve all of these related tasks, because in many cases the required operations on p(x) are computationally intractable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap14.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://sebastianraschka.com/Images/2014_parzen_density_estimation/parzen_goal.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.intechopen.com/source/html/42874/media/image4.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1.2 The Performance Measure, P" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* accuracy\n", "* error rate\n", " - 0-1 loss\n", "* probability \n", "* test set & training set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to evaluate the abilities of a machine learning algorithm, we must design a quantitative measure of its performance. Usually this performance measure P is specific to the task T being carried out by the system." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Accuracy is just the proportion of examples for which the model produces the correct output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.welaptega.com/wp-content/uploads/2014/09/testing-accuracy.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### error rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We can also obtain equivalent information by measuring the error rate, the proportion of examples for which the model produces an incorrect output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 0-1 loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We often refer to the error rate as the expected 0-1 loss. The 0-1 loss on a particular example is 0 if it is correctly classified and 1 if it is not." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://images.cnitblog.com/blog/61573/201411/081829455185940.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### probability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* For tasks such as density estimation, we can measure the probability the model assigns to some examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### test set & training set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We therefore evaluate these performance measures using a test set of data that is separate from the data used for training the machine learning system." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning_files/Image%20[1].png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1.3 The Experience, E" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Unsupervised learning algorithms\n", "* Supervised learning algorithms\n", "* Reinforcement learning algorithms\n", "* Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Most of the learning algorithms in this book can be understood \n", "\n", "* as being allowed to experience an entire <font color=\"red\">dataset</font>. \n", "* A dataset is a \n", " - collection of many objects called <font color=\"red\">examples</font>, \n", " - with each example containing many <font color=\"red\">features</font> \n", " - that have been objectively measured. 
\n", " - Sometimes we will also call examples <font color=\"red\">data points</font>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unsupervised learning algorithms " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://images.slideplayer.biz.tr/7/1917098/slides/slide_2.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* unsupervised" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap15.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap16.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* supervised" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap17.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap18.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Supervised learning algorithms " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* label or target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://wiki.cs.princeton.edu/images/2/2a/SupervisedLearning.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unsupervised learning and supervised learning are not formally defined terms.\n", "\n", "* The lines between them are often blurred. \n", "* Many machine learning technologies can be used to perform both tasks. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For example, the chain rule of probability states that for a vector " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap19.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### the joint distribution can be decomposed as" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap20.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* This decomposition means that we can solve the ostensibly unsupervised problemof modeling p(x) by splitting it into n supervised learning problems." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://image.slidesharecdn.com/basicprobabilitystatistic-140531034844-phpapp02/95/basic-probability-statistics-4-638.jpg?cb=1401508165\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://image.slidesharecdn.com/lecturev2-150724073404-lva1-app6892/95/text-mining-from-bayes-rule-to-dependency-parsing-20-638.jpg?cb=1437726491\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### we can solve the supervised learning problem of learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap22.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### by using traditional unsupervised learning technologies to learn the joint distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap23.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### and inferring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap24.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reinforcement learning algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Some machine learning algorithms do not just experience a fixed dataset.\n", "* For example, reinforcement learning algorithms interact with an environment, so there is a feedback loop between the learning system and its experiences. \n", "* Such algorithms are beyond the scope of this book." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* features\n", "* design matrix\n", "* heterogeneous data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Most machine learning algorithms simply experience a dataset. A dataset can be described in many ways. In all cases, a dataset is a collection of examples. Each example is a collection of observations called features collected from a different time or place." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://image.slidesharecdn.com/t-235p-211yui-v2-140617144512-phpapp01/95/hivemail-scalable-machine-learning-library-for-apache-hive-21-638.jpg?cb=1403016416\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### design matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* One common way of describing a dataset is with a design matrix. \n", "* A design matrix is a matrix containing a different example in each row. \n", "* Each column of the matrix corresponds to a different feature." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://i.stack.imgur.com/VZtEr.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### design matrix example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* For instance, the Iris dataset contains 150 examples with four features for each example. 
\n", "* This means we can representthe dataset with a design matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap25.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap26.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* is the sepal lengthof plant i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap27.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* is the sepal width of plant i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### heterogeneous data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Different sections of this book describe how to handle differenttypes of heterogeneous data.\n", "* In cases like these, rather than describing the datasetas a matrix with m rows, we will describe it as a set containing m elements, e.g." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap28.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* This notation does not imply that any two example vectors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap29.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap30.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* have the same size." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Often when working with a dataset containing a design matrix of feature observations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap31.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* we alsoprovide a vector of labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap32.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "with" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap33.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* providing the label for example i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.2 Example: Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/400px-Linear_regression.svg.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://ordination.okstate.edu/plane.jpg\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap34.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap35.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* linear regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap36.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap37.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* ith feature" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap38.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* ith weight" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "<img src=\"figures/cap41.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### performance measure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### test set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* design matrix of input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap43.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* regression target vector" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap44.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* predictionso of model on the test set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap45.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* mean squared error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://image.slidesharecdn.com/forecastingtechniques-091206232900-phpapp02/95/forecasting-techniques-22-728.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap46.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Intuitively, one can see that this error measure decreases to 0 when" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap47.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We can also see that" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap48.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "so the error increases whenever the Euclidean distance between the predictionsand the targets increases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* To make a machine learning algorithm, we need to design an algorithm thatwill improve the weights " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap39.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "in a way that reduces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap49.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "when the algorithmis allowed to gain experience by observing a training set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap50.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* To minimize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap51.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we can simply solve for where its gradient is 0:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.mathsmutt.co.uk/files/dcten_files/image002.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.tlchm.bris.ac.uk/~paulmay/misc/1s/img00108.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap52.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.1.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.1.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* It’s worth noting that the term linear regression is often used to refer to aslightly more sophisticated model with one additional parameter—an intercept term" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "<img src=\"figures/cap53.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In this model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap54.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "so the mapping from parameters to predictions is still a linear function but themapping from features to predictions is now an affine function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Linear regression is of course an extremely simple and limited learning al-gorithm, but it provides an example of how a learning algorithm can work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.3 Generalization, Capacity, Overfitting and Underfitting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.3.1 The No Free Lunch Theorem\n", "* 5.3.2 Regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The central challenge in machine learning is that we must perform well on <font color=\"red\">new,previously unseen inputs</font>—not just those on which our model was trained. ==> <font color=\"red\">generalization</font>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* generalization error\n", "* training error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* training set\n", "* test set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/Traintest.svg/700px-Traintest.svg.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In our linear regression example, we trained the model by minimizing thetraining error," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap55.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* but we actually care about the test error," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap56.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* <font color=\"red\">How can we affect performance on the test set when we only get to observethe training set?</font> \n", " - The field of <font color=\"red\">statistical learning theory</font> provides some answers. \n", " - If the training and the test set are <font color=\"red\">collected arbitrarily</font>, \n", " - there is indeed little we can do. \n", " - If we are allowed to make <font color=\"red\">some assumptions</font> about <font color=\"blue\">how the training and test set are collected</font>, then we can make some progress" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* some assumptions\n", " - <font color=\"red\">i.i.d. assumptions</font>.\n", " - independent\n", " - identically distributed\n", " - <font color=\"red\">data generating distribution</font>, or data generating process " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://artint.info/figures/ch07/plate.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The factors determining how well a machinelearning algorithm will perform are its ability to:\n", " 1. Make the training error small.\n", " 2. Make the gap between training and test error small." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* These two factors correspond to the two central challenges in machine learning:\n", " - underfitting and \n", " - overfitting." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://sanghyukchun.github.io/images/post/59-1.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* capacity\n", " - We <font color=\"red\">can control</font> whether a model is more likely to overfit or underfit <font color=\"red\">by altering its capacity</font>. \n", " - Informally, <font color=\"blue\">a model’s capacity is its ability to fit a wide variety of functions</font>. \n", " - Models with low capacity may struggle to fit the training set. \n", " - Models with high capacity can overfit, i.e., memorize properties of the training set that do not serve them well on the test set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* hypothesis space\n", " - One way to <font color=\"red\">control the capacity</font> of a learning algorithm is <font color=\"red\">by choosing its hypothesis space</font>, the set of functions that the learning algorithm is allowed to choose as being the solution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.svms.org/srm/Sewell2006.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* A polynomial of degree one gives us the linear regression model with which we are already familiar, with prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap57.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* By introducing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap58.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "as another feature provided to the linear regression model, we can learn a model that is quadratic as a function of" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap60.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap59.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Note that this is still a linear function of the parameters, so we can still use the normal equations to train the model in closed form (see the sketch below).\n", "* We can continue to add more powers of x as additional features, for example to obtain a polynomial of degree 9:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap61.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Machine learning algorithms will generally perform best <font color=\"red\">when their capacity is appropriate in regard to the true complexity of the task</font> they need to perform and the amount of training data they are provided with." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.2.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Occam’s razor " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://muslimsi.com/wp-content/uploads/2014/12/quote-occam-s-razor-no-more-things-should-be-presumed-to-exist-than-are-absolutely-necessary-i-e-the-william-of-occam-372636.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We must remember that while simpler functions are more likely to generalize (to have a small gap between training and test error), <font color=\"red\">we must still choose a sufficiently complex hypothesis</font> to achieve low training error."
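] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The sketch below (on assumed synthetic data) builds the polynomial features explicitly and solves the normal equations in closed form; raising the degree raises capacity and drives the training error down." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Capacity via polynomial degree: expand x into powers, then solve the\n", "# normal equations w = (X^T X)^{-1} X^T y in closed form.\n", "import numpy as np\n", "\n", "rng = np.random.RandomState(1)\n", "x = rng.uniform(-1, 1, 20)\n", "y = np.sin(3 * x) + 0.1 * rng.randn(20)\n", "\n", "def train_mse(degree):\n", "    X = np.vander(x, degree + 1)                     # columns: x^d, ..., x, 1\n", "    w = np.linalg.pinv(X.T.dot(X)).dot(X.T.dot(y))   # normal equations\n", "    return np.mean((X.dot(w) - y) ** 2)\n", "\n", "for d in (1, 3, 9):\n", "    print(d, train_mse(d))   # training error falls as the degree grows"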
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.3.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* non-parametric model\n", " - To reach the most extreme case of arbitrarily high capacity, we introduce the concept of non-parametric models. So far, we have seen only parametric models, such as linear regression.\n", " - Parametric models learn a function described by a parameter vector whose size is finite and fixed before any data is observed.\n", " - Non-parametric models have no such limitation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://img.docstoccdn.com/thumb/orig/113799484.png\" width=400 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* k-NN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.statsoft.com/portals/0/Support/KNNOverViewImageA.jpg\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://scikit-learn.sourceforge.net/0.8/_images/plot_neighbors_11.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* nearest neighbor regression (k-NN regression)\n", " - Unlike linear regression, which has a fixed-length vector of weights, the nearest neighbor regression model simply stores the X and y from the training set.\n", " - When asked to classify a test point x, the model looks up the nearest entry in the training set and returns the associated regression target." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap62.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap63.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap64.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.saedsayad.com/images/KNN_similarity.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://scikit-learn.sourceforge.net/0.6/_images/plot_neighbors_regression.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Bayes error\n", " - The error incurred by an oracle making predictions from the true distribution p(x, y) is called the Bayes error.\n", " - See also - http://newsight.tistory.com/127" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://ranabasheer.files.wordpress.com/2011/04/bayes_error.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://www.projectrhea.org/rhea/images/thumb/1/1d/Upper_bounds_Bayes_error_Pic1.jpg/700px-Upper_bounds_Bayes_error_Pic1.jpg\" width=600 /> " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.4.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* representational capacity & effective capacity\n", " - It’s worth mentioning that capacity is not just determined by which model we use. \n", " - The model specifies which family of functions the learning algorithm can choose from when varying the parameters in order to reduce a training objective.\n", " - This is called the <font color=\"red\">representational capacity</font> of the model. \n", " - In many cases, finding the best function within this family is a very difficult optimization problem. \n", " - In practice, the learning algorithm does not actually find the best function, <font color=\"red\">just one that significantly reduces the training error</font>. 
\n", " - These additional restrictions mean that the <font color=\"red\">model’s effective capacity may be less than its representational capacity</font>." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.3.1 The No Free Lunch Theorem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The no freelunch theorem for machine learning (Wolpert, 1996) states that, <font color=\"red\">averaged overall possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points</font>. In other words,in some sense, <font color=\"blue\">no machine learning algorithm is universally any better than anyother</font>. The most sophisticated algorithm we can conceive of has the same averageperformance (over all possible tasks) as merely predicting that every point belongsto the same class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Fortunately, these results hold only when we average over all possible datagenerating distributions. <font color=\"red\">If we make assumptions about the kinds of probability distributions we encounter in real-world applications</font>, then <font color=\"blue\">we can design learning algorithms that perform well on these distributions</font>." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.3.2 Regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://yosinski.com/mlss12/media/slides/MLSS-2012-Fukumizu-Kernel-Methods-for-Statistical-Learning_050.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### preference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The no free lunch theorem implies that we must design our machine learning algorithms to perform well on a specific task. \n", "* We do so by building a set of preferences into the learning algorithm. \n", "* When these preferences are aligned with the learning problems we ask the algorithm to solve, it performs better." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### hypotheis space of solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* So far, the only method of modifying a learning algorithm we have discussedis to increase or decrease the model’s capacity by adding or removing functionsfrom the hypothesis space of solutions the learning algorithm is able to choose." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The behavior of our algorithm is strongly affected not just by how large wemake the set of functions allowed in its hypothesis space, but by the specificidentity of those functions.\n", " - linear functions\n", " - The learning algorithm we have studied so far, linearregression, has a hypothesis space consisting of the set of linear functions of itsinput.\n", " - nonlinear functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap70.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### weight decay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.alglib.net/dataanalysis/i/art0_trainreg.gif\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* For example, we can modify the training criterion for linear regression to include weight decay. 
\n", "* To perform linear regression with weight decay, we minimizenot only the mean squared error on the training set, but instead a criterion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap72.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "that expresses a preference for the weights to have smaller squared" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap73.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "norm." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specifically," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap74.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap75.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "is a value chosen ahead of time that controls the strength of our preferencefor smaller weights." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap76.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we impose no preference, " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and larger" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap77.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "forces the weights to become smaller." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Minimizing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap78.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "results in a choice of weights thatmake a tradeoff between fitting the training data and being small. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.5.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### regularization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In our weight decay example, we expressed our preference for linear functions defined with smaller weights explicitly, via an extra term in the criterion we minimize. \n", "* There are many other ways of expressing preferences for different solutions, both implicitly and explicitly. \n", "* Together, these different approaches are known as regularization. \n", "* Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.4 Hyperparameters and Validation Sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.4.1 Cross-Validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### hyperparameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Most machine learning algorithms have <font color=\"red\">several settings</font> that we can use to <font color=\"red\">control the behavior of the learning algorithm</font>. \n", "* These <font color=\"red\">settings are called hyperparameters</font>.\n", "* The <font color=\"red\">values of hyperparameters are not adapted by the learning algorithm itself</font>(though we can design a nested learning procedure where one learning algorithmlearns the best hyperparameters for another learning algorithm).\n", "* In the polynomial regression example we saw in Fig. 
5.2, \n", " - there is a single hyperparameter: the <font color=\"red\">degree of the polynomial</font>, which <font color=\"blue\">acts as a capacity hyperparameter</font>. \n", " - The <font color=\"red\">λ value</font> used to control the strength of weight decay is another example of a hyperparameter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.2.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://d1vn86fw4xmcz1.cloudfront.net/content/royptb/365/1555/3247/F2.large.jpg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://cdn2.hubspot.net/hub/426799/hubfs/images/Screen_Shot_2015-05-27_at_8.40.18_AM.png?t=1440617420708&width=400\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### validation set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* More frequently, <font color=\"red\">we do not learn the hyperparameter</font> because it is not appropriate to learn that hyperparameter <font color=\"red\">on the training set</font>. \n", "* If learned on the training set, such hyperparameters would always choose the maximum possible model capacity, resulting in <font color=\"red\">overfitting</font> (refer to Figure 5.3).\n", "* To solve this problem, <font color=\"red\">we need a validation set of examples</font> that the training algorithm <font color=\"red\">does not observe</font>.\n", "* Earlier we discussed how a held-out <font color=\"red\">test set</font>, composed of examples coming from the same distribution as the training set, can be used <font color=\"red\">to estimate</font> the generalization error of a learner, <font color=\"red\">after the learning process has completed</font>. \n", "* It is important that the <font color=\"red\">test examples are not used in any way to make choices about the model</font>, including its hyperparameters. \n", "* For this reason, <font color=\"red\">no example from the test set can be used in the validation set</font>.\n", "* Therefore, we always <font color=\"red\">construct the validation set from the training data</font>.\n", "* Specifically, we split the training data into two disjoint subsets. \n", "* Typically, one uses about 80% of the data for training and 20% for validation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://2.bp.blogspot.com/-Fxbj18t3qUk/TnJVmn6mu4I/AAAAAAAAAFE/JT2rx_RngcM/s1600/TrainingSplitting.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://images.cppblog.com/cppblog_com/sosi/WindowsLiveWriter/Crossvalidation_127F5/image_thumb.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.4.1 Cross-Validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* One issue with the idea of splitting the dataset into train/test or train/validation/test subsets is that only a small fraction of examples are used to evaluate generalization. \n", "* train/test\n", " - These procedures are based on the idea of repeating the training / testing computation on different randomly chosen subsets or splits of the original dataset.\n", "* train/validation/test\n", " - If <font color=\"red\">model selection or hyperparameter optimization</font> is required, things get more computationally expensive: \n", " - one can recurse the k-fold cross-validation idea, inside the training set. 
\n", " - So we can have an <font color=\"red\">outer loop</font> that estimates test error and provides a “training set” for a <font color=\"red\">hyperparameter-free learner</font>, calling it k times to“train”. \n", " - That hyperparameter-free learner can then split its received training set by k-fold cross-validation into internal training/validation subsets (for example,splitting into k − 1 subsets is convenient, to reuse the same test blocks as the outer loop), \n", " - call a <font color=\"red\">hyperparameter-specific learner</font> for each choice of hyperparameter value on each of the training partition of this <font color=\"red\">inner loop</font>, \n", " - and compute the validation error by averaging across the k −1 validation sets \n", " - the errors made by the k −1 hyperparameter-specific learners trained on each of the internal training subsets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"http://www.intechopen.com/source/html/39037/media/image4.jpeg\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png\" width=600 />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.5 Estimators, Bias and Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.5.1 Point Estimation\n", "* 5.5.2 Bias\n", "* 5.5.3 Variance\n", "* 5.5.4 Trading off Bias and Variance and the Mean Squared Error\n", "* 5.5.5 Consistency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5.1 Point Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap79.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap80.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap81.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.2.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap82.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Function Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap83.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap85.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap86.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap87.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap88.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap89.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5.2 Bias" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Example: Bernoulli Distribution \n", "* Example: Gaussian Distribution Estimator of the Mean \n", "* Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.3.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap90.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap91.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap92.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap93.png\" />" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "<img src=\"figures/cap94.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Bernoulli Distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap95.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap96.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap97.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap98.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap99.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Gaussian Distribution Estimator of the Mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap100.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap101.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap102.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap103.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.4.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap104.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap105.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.5.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap106.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap107.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap108.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.6.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap109.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5.3 Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Example: Bernoulli Distribution\n", "* Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.7.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.8.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Bernoulli Distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap110.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap111.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap112.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap113.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap114.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.9.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/eq5.10.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img 
src=\"figures/cap115.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap116.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap117.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap118.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap119.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.5.4 Trading off Bias and Variance and the Mean Squared Error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap120.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/fig5.6.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Gaussian Distribution Estimators of the Variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap121.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.5.5 Consistency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap122.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap123.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap124.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap125.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap126.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap127.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/cap128.png\" />" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.6 Maximum Likelihood Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.6.1 Conditional Log-Likelihood and Mean Squared Error\n", "* 5.6.2 Properties of Maximum Likelihood" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.6.1 Conditional Log-Likelihood and Mean Squared Error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.6.2 Properties of Maximum Likelihood" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.7 Bayesian Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.7.1 Maximum A Posteriori (MAP) Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.7.1 Maximum A Posteriori (MAP) Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.8 Supervised Learning Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.8.1 Probabilistic Supervised Learning\n", "* 5.8.2 Support Vector Machines\n", "* 5.8.3 Other Simple Supervised Learning Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.8.1 Probabilistic Supervised Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.8.2 Support Vector Machines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.8.3 Other Simple Supervised Learning Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.9 Unsupervised Learning Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.9.1 Principal Components Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.9.1 Principal Components Analysis" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.10 Weakly Supervised Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.11 Building a Machine Learning Algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5.12 The Curse of Dimensionality and Statistical Limitations of Local Generalization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 5.12.1 The Curse of Dimensionality\n", "* 5.12.2 Smoothness and Local Constancy A Priori Preference\n", "* 5.12.3 Manifold Learning and the Curse of Dimensionality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.12.1 The Curse of Dimensionality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.12.2 Smoothness and Local Constancy A Priori Preference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.12.3 Manifold Learning and the Curse of Dimensionality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "* [1] Deep learning main textbook - http://www.iro.umontreal.ca/~bengioy/dlbook/\n", "* [2] 5 Machine Learning Basics - http://www.iro.umontreal.ca/~bengioy/dlbook/ml.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }