# The landscape of machine learning methods
Author: Adam Rivers

The landscape of ML was discussed briefly in the module Framing ML problems. In this section we will go over a few ways that ML methods are classified, give the major classes for each, and place common methods you may come across into context.


## Categories by application

This is how we initially broke down ML models. Since we covered this in Lesson 2, we will not go over it further here.

![image.png](attachment:image.png)


## Categories by learning type

One way to group learning methods is by learning type. These types include supervised learning, unsupervised learning, semi-supervised learning and active learning.

### Supervised learning

Supervised learning is what we have focused on in this course. In this method, data are divided into training and testing sets. Labeled training data are used to train the model, and the held-out testing data are used to evaluate it. Supervised learning is the most common type of machine learning. Both classification and regression methods fall into this category.

Examples:

* Regression
 * lasso regression
 * neural networks
 * tree methods
 
* Classification
 * Support vector machines
 * neural networks
 * tree methods
 * logistic regression
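
The train/test workflow described above can be sketched in a few lines. This is a minimal illustration, not one of the methods listed: it assumes a made-up one-feature dataset and uses a nearest-centroid classifier as a stand-in for a real model. The function names are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the labeled rows and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """Training: compute the mean feature value for each label."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def predict(centroids, x):
    """Prediction: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Toy labeled data (made up): one feature, two well separated classes
data = [(x / 10, "low") for x in range(0, 20)] + [(x / 10, "high") for x in range(80, 100)]

train, test = train_test_split(data)
centroids = fit_centroids(train)

# Evaluate only on the held-out testing set
accuracy = sum(predict(centroids, x) == label for x, label in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The key point is that `accuracy` is computed on rows the model never saw during training.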
 

### Unsupervised learning 


Unsupervised learning attempts to learn the structure of data without labels. This problem is different from supervised learning because there is no one correct answer. This also means that the metrics applied to supervised learning are not relevant. Other metrics such as "lift" or "compactness" are sometimes applied. Unsupervised learning attempts to cluster data, reduce the dimension of data, or find associations, as in association rule mining.


* Clustering
 * Hierarchical clustering
 * K-means clustering
 
* Dimensionality reduction
 * PCA
 * Principal coordinates analysis (PCoA) and non-metric multidimensional scaling (NMDS)
 * t-distributed Stochastic Neighbor Embedding (t-SNE)
 * Autoencoders
 
* Association rule mining
 * Apriori
 * Eclat
 * Repeated Incremental Pruning to Produce Error Reduction (RIPPER), a rule-learning method that, unlike the two above, is trained on labeled data
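
As a rough sketch of clustering without labels, the code below runs k-means on a made-up one-dimensional dataset and reports a compactness score. It assumes "compactness" means the within-cluster sum of squared distances, which is one common definition. The starting centers and function names are chosen only for illustration.

```python
def kmeans_1d(points, centers, n_iter=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

def compactness(centers, clusters):
    """Within-cluster sum of squared distances; smaller means tighter clusters."""
    return sum((p - center) ** 2
               for center, cluster in zip(centers, clusters)
               for p in cluster)

# Unlabeled toy data (made up) with two visible groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
score = compactness(centers, clusters)
print(centers, clusters, score)
```

Note that nothing here checks the clusters against a correct answer. The score only describes how tight the groups are.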
 
### Semi-supervised learning
 
Semi-supervised learning learns from a small amount of labeled training data together with a large amount of unlabeled training data. Why would we want to do this? Labeling data, often manually, is expensive. We may have lots of images we can scrape from the internet, but turning them into training data requires a human to identify what's in them (CAPTCHA anyone?). If we assume that similar images identified by clustering have the same label, we now have a much larger dataset to learn from.
 
 Examples:
 
 * Self training
 * Gaussian mixture models
 * Generative adversarial networks (GANs)
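
Self training, the first example above, can be sketched as follows. This is a simplified illustration under stated assumptions: one made-up feature, a nearest-centroid classifier as the base model, and a hypothetical `margin` parameter that decides when a prediction is confident enough to become a pseudo-label.

```python
def centroids_of(labeled):
    """Mean feature value for each label in the current labeled set."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and retrain on the larger set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = centroids_of(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            dists = sorted((abs(x - c), label) for label, c in centroids.items())
            # Confident when the nearest centroid beats the runner-up by `margin`
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled += confident
        unlabeled = remaining
    return labeled, unlabeled

# Two hand-labeled points and seven unlabeled ones (all made up)
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5, 5.0]
labeled, unlabeled = self_train(labeled, unlabeled)
print(len(labeled), unlabeled)
```

The ambiguous point halfway between the two groups is left unlabeled rather than guessed, which is the usual safeguard against reinforcing early mistakes.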
 
 
### Active learning
 
In active learning we start with a large training set which is mostly unlabeled and train with the labeled data. Next we find the unlabeled points whose labels would most improve the classifier, for example those points closest to a decision boundary. The classifier asks you to get labels for those points, then adds them to the training set to improve the classifier incrementally. This is often useful when getting the training data is expensive.
 
For example, suppose we are constructing a classifier to predict sanitizer resistance in Salmonella from its genome sequence. We have 200,000 genomes and isolates but no sanitizer resistance data. Measuring sanitizer resistance takes about 1 week per 200 strains, so screening all strains in our lab would take 1,000 weeks, or almost 20 years! We start building the classifier by selecting 200 of the most distantly related strains and screening them. Then we run the classifier at the end of the week to select the strains for the next week. After several rounds, performance improvements plateau and we stop screening. Hopefully this only takes a month or two.
 
Active learning often bolts onto existing methods but includes special decision functions to select the next round of samples to label. 

Examples:

* Exp3 bandit
* Graph density
* K-center greedy

For examples, see the [Google active learning repo](https://github.com/google/active-learning).
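
The selection loop can also be sketched directly. The code below is a toy version of uncertainty sampling (querying the points closest to the decision boundary), not one of the decision functions listed above. The `oracle` function is a hypothetical stand-in for the expensive lab measurement, and the data and threshold are made up.

```python
def fit_boundary(labeled):
    """Decision boundary = midpoint between the two class means (1D feature)."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2

def select_queries(pool, boundary, batch_size):
    """Uncertainty sampling: pick the unlabeled points closest to the boundary."""
    return sorted(pool, key=lambda x: abs(x - boundary))[:batch_size]

def oracle(x):
    """Hypothetical ground truth, standing in for a costly measurement."""
    return int(x > 6.0)

labeled = [(0.0, 0), (10.0, 1)]          # two points labeled up front
pool = [i * 0.5 for i in range(1, 20)]   # unlabeled candidates 0.5 .. 9.5

for round_number in range(3):
    boundary = fit_boundary(labeled)
    queries = select_queries(pool, boundary, batch_size=2)
    labeled += [(x, oracle(x)) for x in queries]   # "run the experiment"
    pool = [x for x in pool if x not in queries]
    print(f"round {round_number}: boundary={boundary:.2f}, queried {queries}")

final_boundary = fit_boundary(labeled)
print(f"final boundary: {final_boundary:.2f}")
```

Only six of the nineteen candidates are ever measured, and the queries concentrate around the region where the classifier is least certain.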

## Categories of learning by method

(with ideas from [This site](https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/))

The other major way of grouping methods is by how they operate. Terms like neural network, random forest or logistic regression refer to types of learning methods.

### Regression algorithms

Regression attempts to learn the relationship between a response variable and one or more predictor variables by iteratively adjusting the model parameters to reduce a cost or loss function.

Examples include:

* Ordinary Least Squares Regression (OLS)
* Other Linear Regressions
* Logistic Regression
* Locally Estimated Scatterplot Smoothing (LOESS)
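
To make "iteratively adjusting the model parameters to reduce a loss function" concrete, here is a minimal sketch that fits a straight line by gradient descent on the mean squared error. The data, learning rate and step count are made up for illustration. In practice OLS is usually solved in closed form rather than iteratively.

```python
def fit_line(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction errors under the current parameters
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to each parameter
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step downhill
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Toy data generated from y = 2x + 1 with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```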

### Neighbor or instance based methods

Instance or neighbor based methods use the label of the most similar stored example to classify something, or return a list of the most similar items.

Examples include

* PageRank: a web search ranking algorithm that scores pages by the links between them and is used to order the pages returned for your query
* BLAST: a method to return the DNA sequences most similar to your query. When you assign taxonomy based on the best hit you are using nearest neighbor classification.
* K-nearest neighbors
* Support vector machines
* Self-organizing maps (e.g. ESOM)
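
A minimal k-nearest neighbors classifier shows the idea: there is no training step beyond storing the examples, and a query takes the majority label of its closest stored neighbors. The points and labels below are made up, and Euclidean distance is an assumption (BLAST, for instance, uses sequence similarity instead).

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Stored examples: (features, label)
train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]

print(knn_predict(train, (1.5, 1.5)))
print(knn_predict(train, (6.5, 6.5)))
```

With `k=1` this reduces to best-hit assignment, the same logic as assigning taxonomy from the top BLAST hit.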

### Regularization methods

These are not really learning methods in their own right but rather a way of improving methods by penalizing model complexity to find a better point in the bias-variance tradeoff.

Examples are:

* L1 (LASSO regression)
* L2 (Ridge regression)
* ElasticNet (a combination of L1 and L2)
* Least-Angle Regression 
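
The difference between the L1 (LASSO) and L2 (ridge) penalties is easiest to see with a single predictor and no intercept, where both have closed-form solutions. The sketch below assumes the losses `sum((y - b*x)**2) + lam * b**2` for ridge and `sum((y - b*x)**2) + lam * abs(b)` for LASSO. The data and penalty strengths are made up.

```python
import math

def ols(sxy, sxx):
    """Unpenalized least-squares coefficient."""
    return sxy / sxx

def ridge(sxy, sxx, lam):
    """L2 penalty: shrinks the coefficient toward zero but never exactly to zero."""
    return sxy / (sxx + lam)

def lasso(sxy, sxx, lam):
    """L1 penalty: soft-thresholding, which can set the coefficient exactly to zero."""
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx

# Toy centered data with a weak positive relationship
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.0, -0.4, 0.1, 0.5, 0.9]
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

for lam in (0.0, 2.0, 10.0):
    print(lam, ols(sxy, sxx), ridge(sxy, sxx, lam), lasso(sxy, sxx, lam))
```

With a large enough penalty the LASSO coefficient becomes exactly zero, which is why L1 regularization is also used for feature selection, while ridge only shrinks it.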

### Decision tree methods

In decision tree methods a tree (a directed acyclic graph) is created where each internal node represents a feature used in the model. For each node a decision value is set which determines which branch is followed to the next node (feature), until a leaf gives the prediction.

Decision trees are good at handling nonlinear data and smaller training sets, and they are easy to train.

Examples include:

* Iterative Dichotomiser 3 (ID3)
* Classification and Regression Trees (CART)

Often multiple decision trees are used together in ensemble models like Random Forests.
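
Choosing the feature and decision value for a single node can be sketched as an exhaustive search. This simplified version picks the split with the fewest misclassified rows. That criterion is an assumption made for brevity: ID3 uses information gain and CART typically uses Gini impurity. The data are made up.

```python
def best_split(rows):
    """Return (feature_index, threshold, errors) for the best single split.

    rows is a list of (features, label) pairs. Each side of a split predicts
    its majority label, and errors counts the rows that disagree with it.
    """
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        values = sorted({features[j] for features, _ in rows})
        # Candidate thresholds lie halfway between neighboring observed values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for features, label in rows if features[j] <= threshold]
            right = [label for features, label in rows if features[j] > threshold]
            errors = sum(len(side) - max(side.count(c) for c in set(side))
                         for side in (left, right))
            if best is None or errors < best[2]:
                best = (j, threshold, errors)
    return best

# Feature 0 separates the classes cleanly, feature 1 does not
rows = [
    ((1.0, 5.0), "no"), ((2.0, 1.0), "no"), ((3.0, 4.0), "no"),
    ((6.0, 2.0), "yes"), ((7.0, 5.5), "yes"), ((8.0, 3.0), "yes"),
]
print(best_split(rows))
```

A full tree applies the same search recursively to the rows on each side of the chosen split.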

### Ensemble methods

These are methods that aggregate many weak classifiers to create one good classifier. For example, you might make one classifier for each feature that guesses the class based on that one value. Each one by itself has poor accuracy, but when they are combined the accuracy is good.


Examples include:

* Boosted trees
* Bootstrapped Aggregation (Bagging)
* Gradient Boosted Regression Trees (GBRT)
* Random Forest
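
A small simulation illustrates why combining weak classifiers helps. It rests on a strong assumption that real ensembles only approximate: every weak classifier is right 60% of the time, independently of the others. The voter count, accuracy and seed are arbitrary.

```python
import random

def simulate(n_voters=101, p_correct=0.6, n_cases=1000, seed=0):
    """Compare one weak classifier with a majority vote over many of them."""
    rng = random.Random(seed)
    single_hits = 0
    ensemble_hits = 0
    for _ in range(n_cases):
        # True means that weak classifier got this case right
        votes = [rng.random() < p_correct for _ in range(n_voters)]
        single_hits += votes[0]
        ensemble_hits += sum(votes) > n_voters / 2
    return single_hits / n_cases, ensemble_hits / n_cases

single_acc, ensemble_acc = simulate()
print(f"single weak classifier: {single_acc:.2f}")
print(f"majority vote of 101:   {ensemble_acc:.2f}")
```

Bagging and random forests try to get close to this independence by training each tree on a different bootstrap sample, and in random forests on a random subset of features.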


### Bayesian methods

This is a broad label and many methods employ some aspect of Bayesian methods. The key distinguishing feature is they apply a Bayesian framework. This means they:

1. Explicitly model the uncertainty in a prediction
2. Assume that parameters follow a distribution rather than taking a point value
3. Explicitly state a prior distribution and update it with data to calculate a posterior distribution

Examples include:

* Naive Bayes
* Averaged One-Dependence Estimators
* Bayesian networks
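
The prior-to-posterior update in the list above can be shown with the simplest conjugate case, a Beta prior on a proportion updated with counts of successes and failures. This is a generic illustration rather than any of the methods listed, and the counts are made up.

```python
def update_beta(prior, successes, failures):
    """Beta(a, b) prior plus binomial data gives a Beta(a + s, b + f) posterior."""
    a, b = prior
    return (a + successes, b + failures)

def beta_mean(params):
    """Expected value of the parameter under a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)

def beta_variance(params):
    """Variance of a Beta(a, b) distribution, a measure of remaining uncertainty."""
    a, b = params
    return a * b / ((a + b) ** 2 * (a + b + 1))

prior = (1, 1)  # uniform prior: every proportion equally plausible
posterior = update_beta(prior, successes=7, failures=3)  # made-up observations

print("prior:    ", beta_mean(prior), beta_variance(prior))
print("posterior:", beta_mean(posterior), beta_variance(posterior))
```

The parameter is described by a whole distribution before and after seeing data, and the variance shrinks as evidence accumulates.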

### Neural networks

Neural networks are a class of learning algorithms that pass input values through multiple functions represented by neurons. Two kinds of parameters in each neuron, weights and biases, are iteratively modified to minimize loss on the training data. The gradients used for this modification are typically computed with a method called backpropagation.

Examples include:

* The Perceptron (the first neural network method, from the 1950s)
* Multilayer perceptron
* Autoencoders
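
A single neuron is enough to show weights and a bias being adjusted iteratively. The sketch below trains a perceptron on the logical AND function using the classic perceptron update rule. Note that this rule is simpler than backpropagation, which is needed once there are hidden layers. The learning rate and epoch count are arbitrary choices.

```python
def predict(weights, bias, inputs):
    """Fire (1) when the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, learning_rate=0.1, n_epochs=20):
    """Nudge the weights and bias after every misclassified sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(n_epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Truth table for logical AND, which is linearly separable
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)

for inputs, target in and_samples:
    print(inputs, predict(weights, bias, inputs), target)
```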


### Deep neural networks

Deep neural networks are really just neural networks with multiple layers. They have been around since the 1980s but initially performed poorly because they were hard to train and required lots of data. Advances in methods, computing and data converged about 10 years ago to allow the development of new configurations of neural networks that performed extremely well on specific tasks. And the hype was born.

Examples include:

* Feed forward neural networks (simple multilayer neural networks)
* Convolutional neural networks (networks which take advantage of spatial structure in data such as pictures)
* LSTM (long short-term memory networks, which operate on sequential data)
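
To show what an extra layer buys, here is a sketch of the forward pass through a tiny feed forward network with one hidden layer. The weights are set by hand rather than learned, purely for illustration, so that the network computes XOR, a function that a single perceptron cannot represent.

```python
def relu(z):
    """Rectified linear activation: pass positive values, clip negatives to zero."""
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: a weighted sum plus bias for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (not learned) parameters for a 2-2-1 network
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def forward(x):
    """Pass the input through the hidden layer, then the output layer."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))
```

Deeper networks stack more of these layers, and training finds the weights automatically instead of setting them by hand.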







