# Machine Learning Overview

### Preliminaries
- Goal
  - Top-level overview of machine learning
- Materials
  - Mandatory  
    - this notebook
  - Optional
    - Study Bishop pp. 1-4


### What is Machine Learning?
- Machine Learning relates to **building models from data and using these models in applications**.

- **Problem**: Suppose we want to develop an algorithm for a complex process about which we have little knowledge (so hand-programming is not possible).
  

  - **Solution**: Get the computer to develop the algorithm by itself by showing it examples of the behavior that we want.
  

  - Practically, we choose a library of models, and write a program that picks a model and tunes it to fit the data.
  

- Different scientific communities know this field, with slight variations in emphasis, under different names, such as machine learning, statistical inference, system identification, data mining, source coding, data compression, and data science.
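
The "pick a model from a library and tune it to the data" recipe can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed method: the two-model library (`fit_constant`, `fit_line`), the toy data, and the use of a held-out validation set to pick the winner are all hypothetical choices made for the example.

```python
from statistics import mean

def fit_constant(xs, ys):
    """Tune the constant model y = c by least squares (c is the mean of y)."""
    c = mean(ys)
    return lambda x: c

def fit_line(xs, ys):
    """Tune the linear model y = a*x + b by least squares (closed form)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared prediction error of a tuned model on a data set."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical model library: name -> tuning procedure.
MODEL_LIBRARY = {"constant": fit_constant, "line": fit_line}

def select_model(train, valid):
    """Tune every model on the training set; pick the one that predicts
    the held-out validation set best."""
    (x_train, y_train), (x_valid, y_valid) = train, valid
    errors = {}
    for name, fit in MODEL_LIBRARY.items():
        model = fit(x_train, y_train)
        errors[name] = mse(model, x_valid, y_valid)
    best = min(errors, key=errors.get)
    return best, errors

# Toy data that roughly follows y = 2x + 1 (made up for this sketch).
train = ([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
valid = ([4.0, 5.0], [9.1, 10.9])

best_name, validation_errors = select_model(train, valid)
print(best_name, validation_errors)
```

Scoring on held-out data rather than on the training data is one simple way to compare models of different flexibility; other selection criteria are possible.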

### Machine Learning and the Scientific Inquiry Loop

<p style="text-align:center;"><img src="./figures/scientific-inquiry-loop.png" width="600px"></p>

- Machine learning technology uses the scientific inquiry loop to develop models and use these models in applications.

### Machine Learning is Difficult

- Modeling (Learning) Problems
  - Is there any regularity in the data anyway?
  - What is our prior knowledge, and how do we express it mathematically?
  - How to pick the model library?
  - How to tune the models to the data?
  - How to measure the generalization performance?
  

- Quality of Observed Data
  - Not enough data
  - Too much data?
  - Available data may be messy (measurement noise, missing data points, outliers)
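
As a small illustration of why messy data matters, the sketch below shows how a single outlier and a missing value affect two simple summaries. The sensor readings are invented for the example, and dropping missing points is only one of several possible ways to handle them.

```python
import math
from statistics import mean, median

# Hypothetical sensor readings: one missing value (NaN) and one gross outlier.
readings = [2.1, 1.9, float("nan"), 2.0, 2.2, 50.0, 1.8]

# Drop the missing points before computing any statistics.
observed = [r for r in readings if not math.isnan(r)]

mean_estimate = mean(observed)      # pulled far away from 2 by the outlier
median_estimate = median(observed)  # barely affected by a single outlier

print(mean_estimate, median_estimate)
```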

### A Machine Learning Taxonomy

<p style="text-align:center;"><img src="./figures/ml-taxonomy.png" width="600px"></p>

- **Supervised Learning**: Given examples of inputs and corresponding desired outputs, predict outputs on future inputs.
  - Examples: classification, regression, time series prediction
  

- **Unsupervised Learning** (a.k.a. **density estimation**): Given only inputs, automatically discover representations, features, structure, etc.
  - Examples: clustering, outlier detection, compression
  

- **Trial Design** (a.k.a. experimental design, active learning): Learn to take actions that optimize some performance criterion about the expected future.
  - Examples: playing games like chess, self-driving cars, robotics. 
  - Two major approaches include **reinforcement learning** and **active inference**
    - **Reinforcement Learning**: Given an observed sequence of input signals and (occasionally observed) rewards for those inputs, _learn_ to select actions that maximize *expected* future rewards.
    - **Active Inference**: Given an observed sequence of input signals and a prior probability distribution about future observations, _learn_ to select actions that minimize *expected* prediction errors (i.e., minimize the difference between actual and predicted sensations).

- Other problems, such as **preference learning** and **learning to rank**, can often be (re-)formulated as special cases of a supervised, unsupervised, or trial design problem.

### Supervised Learning

- Given observations of desired input-output behavior $D=\{(x_1,y_1),\dots,(x_N,y_N)\}$ (with $x_i$ inputs and $y_i$ outputs), the goal is to estimate the conditional distribution $p(y|x)$ (i.e., how does $y$ depend on $x$?).

##### Classification 

<p style="text-align:center;"><img src="./figures/Bishop-Figure4.5b.png" width="300px"></p>

- The target variable $y$ is a _discrete-valued_ vector representing class labels.
- The special case $y \in \{\text{true},\text{false}\}$ is called **detection**. 
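
One way to make "estimate $p(y|x)$ for a discrete $y$" concrete is a small generative classifier: fit a prior and a one-dimensional Gaussian per class, then apply Bayes rule. This is a minimal sketch under assumptions chosen for illustration (scalar inputs, Gaussian class-conditional densities, invented data), and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_classifier(xs, labels):
    """Estimate a class prior and a 1-D Gaussian per class from labeled data."""
    params = {}
    for c in set(labels):
        xc = [x for x, y in zip(xs, labels) if y == c]
        params[c] = {"prior": len(xc) / len(xs),
                     "mu": mean(xc),
                     "var": pvariance(xc)}
    return params

def gaussian_pdf(x, mu, var):
    """Density of a Gaussian with mean mu and variance var, evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posterior(x, params):
    """Return p(y = c | x) for every class c via Bayes rule."""
    joint = {c: p["prior"] * gaussian_pdf(x, p["mu"], p["var"])
             for c, p in params.items()}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}

# Invented training data: class "a" lies near x = 1, class "b" near x = 3.
xs = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = ["a"] * 4 + ["b"] * 4

params = fit_gaussian_classifier(xs, labels)
p_at_1 = posterior(1.0, params)
p_at_3 = posterior(3.0, params)
print(p_at_1, p_at_3)
```

With only two classes, thresholding the posterior of one class gives a detector in the sense described above.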

##### Regression 

<p style="text-align:center;"><img src="./figures/Bishop-Figure1.2.png" width="300px"></p>

- Same problem statement as classification but now the target variable is a _real-valued_ vector.
- Regression is sometimes called **curve fitting**.
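
For a real-valued target, the same conditional distribution $p(y|x)$ can be sketched with a linear-Gaussian model: a least-squares line gives the predictive mean, and the average squared residual gives an estimate of the noise variance. The model form, the toy data, and the function names are assumptions made for this example.

```python
from statistics import mean

def fit_linear_gaussian(xs, ys):
    """Fit y = a*x + b + noise, with zero-mean Gaussian noise of variance noise_var."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    noise_var = mean(r ** 2 for r in residuals)  # maximum-likelihood estimate
    return a, b, noise_var

def predict(x, a, b, noise_var):
    """Return mean and variance of the predictive distribution p(y | x)."""
    return a * x + b, noise_var

# Invented data that roughly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 3.1, 5.0, 7.2, 8.8]

a, b, noise_var = fit_linear_gaussian(xs, ys)
pred_mean, pred_var = predict(5.0, a, b, noise_var)
print(a, b, noise_var, pred_mean)
```

This plug-in prediction ignores the uncertainty about `a` and `b` themselves, so it should be read as a simplification rather than a full treatment.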

### Unsupervised Learning

Given data $D=\{x_1,\ldots,x_N\}$, model the (unconditional) probability distribution $p(x)$ (a.k.a. **density estimation**). The two primary applications are **clustering** and **compression** (a.k.a. dimensionality reduction).  

##### Clustering

<p style="text-align:center;"><img src="./figures/fig-Zoubin-clustering-example.png" width="500px"></p>

- Group data into clusters such that all data points in a cluster have similar properties.
- Clustering can be interpreted as "unsupervised classification".
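
A minimal sketch of clustering, using k-means (Lloyd's algorithm) as a simple stand-in technique: it alternates between assigning each point to its nearest center and moving each center to the mean of its members. The data, the initial centers, and the fixed iteration count are assumptions for illustration; k-means does not itself produce the density model $p(x)$ mentioned above.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def assign_points(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)),
                key=lambda k: squared_distance(p, centers[k]))
            for p in points]

def kmeans(points, initial_centers, n_iter=10):
    """Run n_iter rounds of Lloyd's algorithm from the given initial centers."""
    centers = list(initial_centers)
    for _ in range(n_iter):
        labels = assign_points(points, centers)
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = tuple(sum(coord) / len(members)
                                   for coord in zip(*members))
    return centers, assign_points(points, centers)

# Invented 2-D data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]

centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
print(centers, labels)
```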

##### Compression / dimensionality reduction

<p style="text-align:center;"><img src="./figures/fig-compression-example.png" width="500px"></p>

- The output of the coder is much smaller than the original, but if the coded signal is further processed by a decoder, the result is very close (or exactly equal) to the original.
- Usually, the compressed image comprises continuous-valued variables. In that case, compression can be interpreted as "unsupervised regression".
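
The coder/decoder idea can be sketched with a two-dimensional principal component analysis: each 2-D point is coded as one continuous number (its coordinate along the main direction of variation) and decoded back to an approximate 2-D point. The choice of PCA, the invented data, and the function names are assumptions for illustration; the principal direction uses the closed-form angle for a 2x2 covariance matrix.

```python
import math

def fit_pca_2d(points):
    """Return the data mean and the unit vector along the first principal axis."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mx) ** 2 for p in points) / n
    var_y = sum((p[1] - my) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return (mx, my), (math.cos(theta), math.sin(theta))

def encode(point, mean, direction):
    """Coder: compress a 2-D point to a single continuous-valued code."""
    return ((point[0] - mean[0]) * direction[0]
            + (point[1] - mean[1]) * direction[1])

def decode(code, mean, direction):
    """Decoder: map a code back to an approximate 2-D point."""
    return (mean[0] + code * direction[0], mean[1] + code * direction[1])

# Invented data lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

mean, direction = fit_pca_2d(points)
codes = [encode(p, mean, direction) for p in points]
reconstructed = [decode(z, mean, direction) for z in codes]
max_error = max(math.hypot(p[0] - r[0], p[1] - r[1])
                for p, r in zip(points, reconstructed))
print(codes, max_error)
```

Here the code is half the size of the input, and the reconstruction is close to, but not exactly equal to, the original because the points do not lie perfectly on a line.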

### Trial Design and Decision-making

- Given the state of the world (obtained from sensory data), the agent must _learn_ to produce actions (like making a movement or making a decision) that optimize some performance criterion about the expected future.

<p style="text-align:center;"><img src="./figures/RL-example.png" width="600px"></p>

- In contrast to supervised and unsupervised learning, the agent can affect its own data set by taking actions; e.g., a robot can change its input video stream by turning its camera.

- In this course, we focus on the active inference approach to trial design, see the [Intelligent Agent lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Intelligent-Agents-and-Active-Inference.ipynb) for details. 
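
To illustrate how an agent's actions determine the data it gets to see, here is a minimal sketch of the reinforcement-learning flavor of trial design: an epsilon-greedy agent on a two-armed bandit. This is not the active inference approach the course focuses on, and the reward probabilities, `epsilon`, step count, and seed are hypothetical values chosen for the example.

```python
import random

def run_bandit(reward_probs, n_steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly pick the action with the highest
    estimated reward, but explore a random action with probability epsilon."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was taken
    values = [0.0] * len(reward_probs)  # running estimate of expected reward
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))                 # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment returns a reward only for the chosen action,
        # so the agent's choices shape its own data set.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return counts, values

counts, values = run_bandit([0.2, 0.8])
print(counts, values)
```

The `counts` list makes the point of this section visible: the data the agent collects per action is a consequence of its own decisions, not a fixed data set handed to it.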


### <a id="some-ml-apps">Some Machine Learning Applications</a>

- computer speech recognition, speaker recognition
- face recognition, iris identification
- printed and handwritten text parsing
- financial prediction, outlier detection (credit-card fraud)
- user preference modeling (Amazon); modeling of human perception
- modeling of the web (Google)
- machine translation
- medical expert systems for disease diagnosis (e.g., mammogram)
- strategic games (chess, Go, backgammon), self-driving cars

- In summary, **any 'knowledge-poor' but 'data-rich' problem**
 
