<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta property="og:title" content="Chapter 7. Learning (II): SVM & Ensemble Learning | Data Analytics: A Small Data Approach" /> <meta property="og:type" content="book" /> <meta property="og:description" content="This book is suitable for an introductory course of data analytics to help students understand some main statistical learning models, such as linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, etc. Data science practice is a process that should be told as a story, rather than a one-time implementation of one single model. This process is a main focus of this book, with many course materials about exploratory data analysis, residual analysis, and flowcharts to develop and validate models and data pipelines." /> <meta name="author" content="Shuai Huang & Houtao Deng" /> <meta name="date" content="2022-01-02" /> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script> <meta name="description" content="This book is suitable for an introductory course of data analytics to help students understand some main statistical learning models, such as linear regression, logistic regression, tree models and random forests, ensemble learning, sparse learning, principal component analysis, kernel methods including the support vector machine and kernel regression, etc. Data science practice is a process that should be told as a story, rather than a one-time implementation of one single model. This process is a main focus of this book, with many course materials about exploratory data analysis, residual analysis, and flowcharts to develop and validate models and data pipelines."> <title>Chapter 7. 
Learning (II): SVM & Ensemble Learning | Data Analytics: A Small Data Approach</title> <script src="libs/header-attrs-2.11/header-attrs.js"></script> <link href="libs/tufte-css-2015.12.29/tufte.css" rel="stylesheet" /> <link href="libs/tufte-css-2015.12.29/envisioned.css" rel="stylesheet" /> <!-- Global site tag (gtag.js) - Google Analytics --> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-194836795-1"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'UA-194836795-1'); </script> <script src="https://use.typekit.net/ajy6rnl.js"></script> <script>try{Typekit.load({ async: true });}catch(e){}</script> <!-- <link rel="stylesheet" href="css/normalize.css"> --> <!-- <link rel="stylesheet" href="css/envisioned.css"/> --> <link rel="stylesheet" href="css/tablesaw-stackonly.css"/> <link rel="stylesheet" href="css/nudge.css"/> <link rel="stylesheet" href="css/sourcesans.css"/> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre > code.sourceCode { white-space: pre; position: relative; } pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode > span { color: inherit; text-decoration: inherit; } div.sourceCode { margin: 1em 0; } pre.sourceCode { margin: 0; } @media screen { div.sourceCode { overflow: auto; } } @media print { pre > code.sourceCode { white-space: pre-wrap; } pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } pre.numberSource code > span { position: relative; left: -4em; counter-increment: source-line; } pre.numberSource code > span > a:first-child::before { content: counter(source-line); position: relative; left: -1em; text-align: right; vertical-align: baseline; border: none; display: inline-block; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; color: #aaaaaa; } pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } div.sourceCode { } @media screen { pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } } code span.al { color: #ff0000; font-weight: bold; } /* Alert */ code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ code span.at { color: #7d9029; } /* Attribute */ code span.bn { color: #40a070; } /* BaseN */ code span.bu { } /* BuiltIn */ code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */ code span.ch { color: #4070a0; } /* Char */ code span.cn { color: #880000; } /* Constant */ code span.co { color: #60a0b0; font-style: italic; } /* Comment */ code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ code span.do { color: #ba2121; font-style: italic; } /* Documentation */ code span.dt { color: #902000; } /* DataType */ code span.dv { color: #40a070; } /* DecVal */ code span.er { color: #ff0000; font-weight: bold; } /* Error */ code span.ex { } /* Extension */ code span.fl { color: #40a070; } /* Float */ code span.fu { color: #06287e; } /* Function */ code span.im { } /* Import */ code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ code span.kw { color: #007020;
font-weight: bold; } /* Keyword */ code span.op { color: #666666; } /* Operator */ code span.ot { color: #007020; } /* Other */ code span.pp { color: #bc7a00; } /* Preprocessor */ code span.sc { color: #4070a0; } /* SpecialChar */ code span.ss { color: #bb6688; } /* SpecialString */ code span.st { color: #4070a0; } /* String */ code span.va { color: #19177c; } /* Variable */ code span.vs { color: #4070a0; } /* VerbatimString */ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ </style> </head> <body> <!--bookdown:toc:start--> <nav class="pushy pushy-left" id="TOC"> <ul> <li><a href="#cover">Cover</a></li> <li><a href="#preface">Preface</a></li> <li><a href="#acknowledgments">Acknowledgments</a></li> <li><a href="#chapter-1.-introduction">Chapter 1. Introduction</a></li> <li><a href="#chapter-2.-abstraction-regression-tree-models">Chapter 2. Abstraction: Regression & Tree Models</a></li> <li><a href="#chapter-3.-recognition-logistic-regression-ranking">Chapter 3. Recognition: Logistic Regression & Ranking</a></li> <li><a href="#chapter-4.-resonance-bootstrap-random-forests">Chapter 4. Resonance: Bootstrap & Random Forests</a></li> <li><a href="#chapter-5.-learning-i-cross-validation-oob">Chapter 5. Learning (I): Cross-validation & OOB</a></li> <li><a href="#chapter-6.-diagnosis-residuals-heterogeneity">Chapter 6. Diagnosis: Residuals & Heterogeneity</a></li> <li><a href="#chapter-7.-learning-ii-svm-ensemble-learning">Chapter 7. Learning (II): SVM & Ensemble Learning</a></li> <li><a href="#chapter-8.-scalability-lasso-pca">Chapter 8. Scalability: LASSO & PCA</a></li> <li><a href="#chapter-9.-pragmatism-experience-experimental">Chapter 9. Pragmatism: Experience & Experimental</a></li> <li><a href="#chapter-10.-synthesis-architecture-pipeline">Chapter 10. Synthesis: Architecture & Pipeline</a></li> <li><a href="#conclusion">Conclusion</a></li> <li><a href="#appendix-a-brief-review-of-background-knowledge">Appendix: A Brief Review of Background Knowledge</a></li> </ul> </nav> <!--bookdown:toc:end--> <div class="menu-btn"><h3>☰ Menu</h3></div> <div class="site-overlay"></div> <div class="row"> <div class="col-sm-12"> <nav class="pushy pushy-left" id="TOC"> <ul> <li><a href="index.html#cover">Cover</a></li> <li><a href="preface.html#preface">Preface</a></li> <li><a href="acknowledgments.html#acknowledgments">Acknowledgments</a></li> <li><a href="chapter-1.-introduction.html#chapter-1.-introduction">Chapter 1. Introduction</a></li> <li><a href="chapter-2.-abstraction-regression-tree-models.html#chapter-2.-abstraction-regression-tree-models">Chapter 2. Abstraction: Regression & Tree Models</a></li> <li><a href="chapter-3.-recognition-logistic-regression-ranking.html#chapter-3.-recognition-logistic-regression-ranking">Chapter 3. Recognition: Logistic Regression & Ranking</a></li> <li><a href="chapter-4.-resonance-bootstrap-random-forests.html#chapter-4.-resonance-bootstrap-random-forests">Chapter 4. Resonance: Bootstrap & Random Forests</a></li> <li><a href="chapter-5.-learning-i-cross-validation-oob.html#chapter-5.-learning-i-cross-validation-oob">Chapter 5. Learning (I): Cross-validation & OOB</a></li> <li><a href="chapter-6.-diagnosis-residuals-heterogeneity.html#chapter-6.-diagnosis-residuals-heterogeneity">Chapter 6. Diagnosis: Residuals & Heterogeneity</a></li> <li><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#chapter-7.-learning-ii-svm-ensemble-learning">Chapter 7. 
Learning (II): SVM & Ensemble Learning</a></li> <li><a href="chapter-8.-scalability-lasso-pca.html#chapter-8.-scalability-lasso-pca">Chapter 8. Scalability: LASSO & PCA</a></li> <li><a href="chapter-9.-pragmatism-experience-experimental.html#chapter-9.-pragmatism-experience-experimental">Chapter 9. Pragmatism: Experience & Experimental</a></li> <li><a href="chapter-10.-synthesis-architecture-pipeline.html#chapter-10.-synthesis-architecture-pipeline">Chapter 10. Synthesis: Architecture & Pipeline</a></li> <li><a href="conclusion.html#conclusion">Conclusion</a></li> <li><a href="appendix-a-brief-review-of-background-knowledge.html#appendix-a-brief-review-of-background-knowledge">Appendix: A Brief Review of Background Knowledge</a></li> </ul> </nav> </div> </div> <div class="row"> <div class="col-sm-12"> <div id="chapter-7.-learning-ii-svm-ensemble-learning" class="section level1 unnumbered"> <h1>Chapter 7. Learning (II): SVM & Ensemble Learning</h1> <div id="overview-5" class="section level2 unnumbered"> <h2>Overview</h2> <p>Chapter 7 revisits <em>learning</em> from a perspective that is different from <strong>Chapter 5</strong>. In <strong>Chapter 5</strong> we introduced the concept of overfitting and the use of cross-validation as a safeguard mechanism that helps us build models that don’t overfit the data. The focus there was on fair evaluation of the performance of a <em>specific model</em>. <strong>Chapter 7</strong>, taking a process-oriented view of the issue of overfitting, focuses on the performance of a <em>learning algorithm</em><label for="tufte-sn-167" class="margin-toggle sidenote-number">167</label><input type="checkbox" id="tufte-sn-167" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">167</span> Algorithms are computational procedures that learn models from data. They are processes.</span>. This chapter introduces two methods that aim to build a safeguard mechanism into the learning algorithms themselves. The two methods are the <strong>Support Vector Machine</strong> (<strong>SVM</strong>) and <strong>Ensemble Learning</strong><label for="tufte-sn-168" class="margin-toggle sidenote-number">168</label><input type="checkbox" id="tufte-sn-168" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">168</span> The random forest model is a typical example of ensemble learning.</span>. While any model could overfit a dataset, these methods aim to reduce the risk of overfitting based on their unique modeling principles.</p> <p>In short, <strong>Chapter 5</strong> introduced evaluative methods that concern <em>whether a model has learned from the data</em>. It is about quality assessment. <strong>Chapter 7</strong> introduces learning methods that concern <em>how to learn better from the data</em>. It is about quality improvement.</p> </div> <div id="support-vector-machine" class="section level2 unnumbered"> <h2>Support vector machine</h2> <div id="rationale-and-formulation-10" class="section level3 unnumbered"> <h3>Rationale and formulation</h3> <p>A learning algorithm has an <em>objective function</em> and sometimes a set of <em>constraints</em>. The objective function corresponds to a quality of the learned model that could help it succeed on the unseen testing data. Eqs.
<a href="chapter-2.-abstraction-regression-tree-models.html#eq:2-multiLR-LS-matrix">(16)</a>, <a href="chapter-3.-recognition-logistic-regression-ranking.html#eq:3-likelihood">(28)</a>, and <a href="chapter-6.-diagnosis-residuals-heterogeneity.html#eq:6-complete-loglike2">(40)</a>, are examples of objective functions. They are developed based on the <em>likelihood principle</em>. Besides the likelihood principle, researchers have been studying what else quality a model should have and what objective function we should optimize to enhance this quality of the model. The constraints, on the other hand, guard the bottom line: the learned model needs to at least perform well on the training data so it is possible to perform well on future unseen data<label for="tufte-sn-169" class="margin-toggle sidenote-number">169</label><input type="checkbox" id="tufte-sn-169" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">169</span> The testing data, while unseen, is assumed to be statistically the same as the training data. This is a basic assumption in machine learning.</span>.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-models"></span> <img src="graphics/7_models.png" alt="Which model (i.e., here, which line) should we use as our classification model to separate the two classes of data points?" width="100%" /> <!-- <p class="caption marginnote">-->Figure 117: Which model (i.e., here, which line) should we use as our classification model to separate the two classes of data points?<!--</p>--> <!--</div>--></span> </p> <p></p> <p>Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-models">117</a> shows an example of a binary classification problem. The constraints here are obvious: the models should correctly classify the data points. And the <span class="math inline">\(3\)</span> models all perform well, while we hesitate to say that the <span class="math inline">\(3\)</span> models are equally good. Common sense tells us that Model <span class="math inline">\(3\)</span> is the least favorable. Unlike the other two, Model <span class="math inline">\(3\)</span> is close to a few data points. This makes Model <span class="math inline">\(3\)</span> bear a risk of misclassification on future unseen data: the locations of the existing data points provide a suggestion about where future unseen data may locate; but this is a suggestion, not a hard boundary.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-margins"></span> <img src="graphics/7_margins.png" alt="The model that has a larger margin is better---the basic idea of SVM" width="100%" /> <!-- <p class="caption marginnote">-->Figure 118: The model that has a larger margin is better—the basic idea of SVM<!--</p>--> <!--</div>--></span> </p> <p></p> <p>In other words, the line of Model <span class="math inline">\(3\)</span> is too close to the data points and therefore lacks a safe <strong>margin</strong>. The concept of margin is shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-margins">118</a>. To reduce risk, we should have the margin as large as possible. 
The other two models have larger margins, and Model <span class="math inline">\(2\)</span> is the best because it has the largest margin.</p> <p>In summary, while all the models shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-models">117</a> meet the <em>constraints</em> (i.e., perform well on the training data points), this is just the bottom line for a model to be good, and they are ranked differently based on an <em>objective function</em> that maximizes the margin of the model. This is the <strong>maximum margin</strong> principle invented in SVM.</p> </div> <div id="theory-and-method-7" class="section level3 unnumbered"> <h3>Theory and method</h3> <p><em>Derivation of the SVM formulation.</em></p> <p>Consider a binary classification problem as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-margins">118</a>. For now, we consider the situation where all data points can be correctly classified by a line, which is clearly the case in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-models">117</a>. This is called <strong>the linearly separable case</strong>. Denote the data points as <span class="math inline">\(\left\{\left(x_{n}, y_{n}\right), n=1,2, \dots, N\right\}\)</span>. Here, the outcome variable <span class="math inline">\(y\)</span> is denoted as <span class="math inline">\(y_n \in \{1,-1\}\)</span>, i.e., <span class="math inline">\(y=1\)</span> denotes the circle points; <span class="math inline">\(y=-1\)</span> denotes the square points.</p> <p>The mathematical model to represent a line is <span class="math inline">\(\boldsymbol{w}^{T} \boldsymbol{x}+b = 0\)</span>. Based on this form, we can segment the space into <span class="math inline">\(5\)</span> regions, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-5regions">119</a>. By looking at the value of <span class="math inline">\(\boldsymbol{w}^{T} \boldsymbol{x}+b\)</span>, we know which region the data point <span class="math inline">\(\boldsymbol{x}\)</span> falls into. In other words, Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-5regions">119</a> tells us a <em>classification rule</em></p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-5regions"></span> <img src="graphics/7_model_values.png" alt="The $5$ regions" width="100%" /> <!-- <p class="caption marginnote">-->Figure 119: The <span class="math inline">\(5\)</span> regions<!--</p>--> <!--</div>--></span> </p> <p></p> <p><span class="math display" id="eq:7-DM-SVM">\[\begin{equation} \small \begin{aligned} \text { If } \boldsymbol{w}^{T} \boldsymbol{x}+b>0, \text { then } y=1; \\ \text { Otherwise, } y=-1. \end{aligned} \tag{57} \end{equation}\]</span></p> <p>Note that</p> <p><span class="math display" id="eq:7-5regions">\[\begin{equation} \small \begin{gathered} \text{For data points on the margin: } \left|\boldsymbol{w}^{T} \boldsymbol{x}+b\right|=1; \\ \text {For data points beyond the margin: } \left|\boldsymbol{w}^{T} \boldsymbol{x}+b\right|>1. \end{gathered} \tag{58} \end{equation}\]</span></p>
<p>These two equations in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-5regions">(58)</a> provide the <em>constraints</em> for the SVM formulation, i.e., the bottom line for a model to be good. The two equations can be succinctly rewritten as one</p> <p><span class="math display">\[\begin{equation*} \small y\left(\boldsymbol{w}^{T} \boldsymbol{x}+b\right) \geq 1. \end{equation*}\]</span></p> <p>Thus, a draft version of the SVM formulation is</p> <p><span class="math display" id="eq:7-SVM-draft">\[\begin{equation} \small \begin{gathered} \text{\textit{Objective function}: Maximize Margin}, \\ \text { \textit{Subject to}: } y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right) \geq 1 \text { for } n=1,2, \ldots, N. \end{gathered} \tag{59} \end{equation}\]</span></p> <p>The <em>objective function</em> is to maximize the <em>margin</em> of the model. Note that a model is characterized by its parameters <span class="math inline">\(\boldsymbol{w}\)</span> and <span class="math inline">\(b\)</span>. And the goal of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-draft">(59)</a> is to find the model—and therefore, the parameters—that maximizes the margin. In order to carry out this idea, we need the margin to be a concrete mathematical entity that <em>could be</em> characterized by the parameters <span class="math inline">\(\boldsymbol{w}\)</span> and <span class="math inline">\(b\)</span><label for="tufte-sn-170" class="margin-toggle sidenote-number">170</label><input type="checkbox" id="tufte-sn-170" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">170</span> Not all good ideas can be readily materialized in concrete mathematical forms. There is no guaranteed mathematical reality; if there is one, it is always hard-earned.</span>.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-margin-w"></span> <img src="graphics/7_margin_w.png" alt="Illustration of the margin as a function of $\boldsymbol{w}$" width="100%" /> <!-- <p class="caption marginnote">-->Figure 120: Illustration of the margin as a function of <span class="math inline">\(\boldsymbol{w}\)</span><!--</p>--> <!--</div>--></span> </p> <p></p> <p>We refer readers to the <strong>Remarks</strong> section to see details of how the margin is derived as a function of <span class="math inline">\(\boldsymbol{w}\)</span>. Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-margin-w">120</a> shows the result: the margin of the model is <span class="math inline">\(\frac{2}{\|\boldsymbol{w}\|}\)</span>. Here, <span class="math inline">\(\|\boldsymbol{w}\|^{2} = \boldsymbol{w}^{T} \boldsymbol{w}\)</span>. Note that <em>maximizing the margin of a model</em> is equivalent to <em>minimizing <span class="math inline">\(\|\boldsymbol{w}\|\)</span></em>. This gives us the objective function of the SVM model<label for="tufte-sn-171" class="margin-toggle sidenote-number">171</label><input type="checkbox" id="tufte-sn-171" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">171</span> Note that here we use <span class="math inline">\(\|\boldsymbol{w}\|^{2}\)</span> instead of <span class="math inline">\(\|\boldsymbol{w}\|\)</span>.
This formulation is easier to solve.</span></p> <p><span class="math display" id="eq:7-SVMobj">\[\begin{equation} \small \text {Maximize Margin} = \min _{\boldsymbol{w}} \frac{1}{2}\|\boldsymbol{w}\|^{2}. \tag{60} \end{equation}\]</span></p> <p>Thus, the final SVM formulation is</p> <p><span class="math display" id="eq:7-SVM">\[\begin{equation} \small \begin{gathered} \min _{\boldsymbol{w}} \frac{1}{2}\|\boldsymbol{w}\|^{2}, \\ \text { Subject to: } y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right) \geq 1 \text { for } n=1,2, \ldots, N. \end{gathered} \tag{61} \end{equation}\]</span></p> <p><em>Optimization solution.</em></p> <p>Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM">(61)</a> is called the <strong>primal formulation</strong> of SVM. To solve it, it is often converted into its dual form, the <strong>dual formulation</strong> of SVM. This could be done by the method of <strong>Lagrange multiplier</strong> that introduces a dummy variable, <span class="math inline">\(\alpha_{n}\)</span>, for each constraint, i.e., <span class="math inline">\(y_{n}\left(\boldsymbol{w}^{T}\boldsymbol{x}_{n}+b\right)\geq 1\)</span>, such that we could move the constraints into the objective function. By definition, <span class="math inline">\(\alpha_{n} \geq 0\)</span>.</p> <p><span class="math display">\[\begin{equation*} \small L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}-\sum_{n=1}^{N} \alpha_{n}\left[y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right)-1\right]. \end{equation*}\]</span></p> <p>This could be rewritten as</p> <p><span class="math display" id="eq:7-SVM-lag">\[\begin{equation} \small L(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \underbrace{\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}}_{(1)} - \underbrace{\sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{w}^{T} \boldsymbol{x}_{n}}_{(2)}-\underbrace{b \sum_{n=1}^{N} \alpha_{n} y_{n}}_{(3)}+\underbrace{\sum_{n=1}^{N} \alpha_{n}}_{(4)}. \tag{62} \end{equation}\]</span></p> <p>Then we use the First Derivative Test again: differentiating <span class="math inline">\(L(\boldsymbol{w}, b, \boldsymbol{\alpha})\)</span> with respect to <span class="math inline">\(\boldsymbol{w} \text { and } b\)</span>, and setting them to <span class="math inline">\(0\)</span> yields the following solutions</p> <p><span class="math display" id="eq:7-SVM-w">\[\begin{equation} \small \boldsymbol{w}=\sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{x}_{n}; \tag{63} \end{equation}\]</span></p> <p><span class="math display" id="eq:7-SVM-alpha">\[\begin{equation} \small \sum_{n=1}^{N} \alpha_{n} y_{n}=0. \tag{64} \end{equation}\]</span></p> <p>Using the conclusion in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w">(63)</a>, part (1) of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-lag">(62)</a> could be rewritten as</p> <p><span class="math display">\[\begin{equation*} \small \frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}=\frac{1}{2} \boldsymbol{w}^{T} \sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{x}_{n}=\frac{1}{2} \sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{w}^{T} \boldsymbol{x}_{n}. \end{equation*}\]</span></p> <p>It has the same form as part (2) of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-lag">(62)</a>. The two could be merged together into <span class="math inline">\(-\frac{1}{2} \sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{w}^{T} \boldsymbol{x}_{n}\)</span>. 
Note that<label for="tufte-sn-172" class="margin-toggle sidenote-number">172</label><input type="checkbox" id="tufte-sn-172" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">172</span> I.e., use the conclusion in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w">(63)</a> again.</span></p> <p><span class="math display">\[\begin{equation*} \small \frac{1}{2} \sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{w}^{T} \boldsymbol{x}_{n}=\frac{1}{2} \sum_{n=1}^{N} \alpha_{n} y_{n}\left(\sum_{n=1}^{N} \alpha_{n} y_{n} \boldsymbol{x}_{n}\right)^{T} \boldsymbol{x}_{n}=\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_{n} \alpha_{m} y_{n} y_{m} \boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}. \end{equation*}\]</span></p> <p>Part (3) of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-lag">(62)</a>, according to the conclusion in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-alpha">(64)</a>, is <span class="math inline">\(0\)</span>.</p> <p>Based on these results, we can rewrite <span class="math inline">\(L(\boldsymbol{w}, b, \boldsymbol{\alpha})\)</span> as</p> <p><span class="math display">\[\begin{equation*} \small L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\sum_{n=1}^{N} \alpha_{n}-\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_{n} \alpha_{m} y_{n} y_{m} \boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}. \end{equation*}\]</span></p> <p>This is the objective function of the dual formulation of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM">(61)</a>. The decision variables are the Lagrange multipliers, the <span class="math inline">\(\boldsymbol{\alpha}\)</span>. By definition the Lagrange multipliers should be non-negative, and we have the constraint of the Lagrange multipliers described in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-alpha">(64)</a>. All together, the <strong>dual formulation</strong> of the SVM model is</p> <p><span class="math display" id="eq:7-SVM-dual">\[\begin{equation} \small \begin{gathered} \max _{\boldsymbol{\alpha}} \sum_{n=1}^{N} \alpha_{n}-\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_{n} \alpha_{m} y_{n} y_{m} \boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}, \\ \text { Subject to: } \alpha_{n} \geq 0 \text { for } n=1,2, \dots, N \text {, and } \sum_{n=1}^{N} \alpha_{n} y_{n}=0. \end{gathered} \tag{65} \end{equation}\]</span></p> <p>This is a <em>quadratic programming</em> problem that can be solved using many existing well established algorithms.</p> <p><em>Support vectors.</em></p> <p>The data points that lay on the margins, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-sv">121</a>, are called <strong>support vectors</strong>. These geometrically unique data points are also found to be numerically interesting: in the solution of the dual formulation of SVM as shown in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>, the <span class="math inline">\(\alpha_{n}\)</span>s that correspond to the support vectors are those that are nonzero. In other words, the data points that are not support vectors will have their <span class="math inline">\(\alpha_{n}\)</span>s to be zero in the solution of Eq. 
<a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>.<label for="tufte-sn-173" class="margin-toggle sidenote-number">173</label><input type="checkbox" id="tufte-sn-173" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">173</span> Note that each data point contributes a constraint in the primal formulation of SVM, and therefore, corresponds to a <span class="math inline">\(\alpha_{n}\)</span> in the dual formulation.</span></p> <p>If we revisit Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w">(63)</a>, we can see that only the nonzero <span class="math inline">\(\alpha_n\)</span> contribute to the estimation of <span class="math inline">\(\boldsymbol{w}\)</span>. Indeed, Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-sv">121</a> shows that support vectors are sufficient to geometrically define the margins. And if we know the margins, the decision boundary is determined, i.e., as the central line in the middle of the two margins.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-sv"></span> <img src="graphics/7_sv.png" alt="Support vectors are the data points that lay on the margins. In other words, the support vectors define the margins." width="100%" /> <!-- <p class="caption marginnote">-->Figure 121: Support vectors are the data points that lay on the margins. In other words, the support vectors define the margins.<!--</p>--> <!--</div>--></span> </p> <p></p> <p>The support vectors hold crucial implications for the learned model. Theoretical evidences showed that the number of support vectors is a metric that can indicate the “healthiness” of the model, i.e., the smaller the total number of support vectors, the better the model. It also reveals that the main statistical information of a given dataset the SVM model uses is the support vectors. The number of support vectors is usually much smaller than the number of data points <span class="math inline">\(N\)</span>. Some works have been inspired to accelerate the SVM model training by discarding the data points that are probably not support vectors<label for="tufte-sn-174" class="margin-toggle sidenote-number">174</label><input type="checkbox" id="tufte-sn-174" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">174</span> If we can screen the data points before we solve Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a> by discarding some data points that are not support vectors, the size of the optimization problem in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a> could be reduced.</span>. To understand why the nonzero <span class="math inline">\(\alpha_n\)</span> correspond to the support vectors, interested readers can find the derivation in the <strong>Remarks</strong> section.</p> <p><em>Summary.</em> After solving Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>, we obtain the solutions of <span class="math inline">\(\boldsymbol{\alpha}\)</span>. With that, we estimate the parameter <span class="math inline">\(\boldsymbol{w}\)</span> based on Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w">(63)</a>. 
<p><em>Summary.</em> After solving Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>, we obtain the solutions of <span class="math inline">\(\boldsymbol{\alpha}\)</span>. With that, we estimate the parameter <span class="math inline">\(\boldsymbol{w}\)</span> based on Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w">(63)</a>. To estimate the parameter <span class="math inline">\(b\)</span>, we use any <em>support vector</em>, i.e., say, <span class="math inline">\((\boldsymbol{x}_{n}, y_n)\)</span>, and estimate <span class="math inline">\(b\)</span> by</p> <p><span class="math display">\[\begin{equation*} \small \text{If } y_n = 1, b=1-\boldsymbol{w}^{T} \boldsymbol{x}_{n}; \end{equation*}\]</span></p> <p><span class="math display" id="eq:7-SVM-b">\[\begin{equation} \text{If } y_n = -1, b=-1-\boldsymbol{w}^{T} \boldsymbol{x}_{n}.\tag{66} \end{equation}\]</span></p> <p><em>Extension to nonseparable cases.</em></p> <p>We have assumed so far that the two classes are linearly separable. Since this is not the case in many applications, we revise the SVM formulation—specifically, the constraints of the SVM formulation—by allowing some data points to be within the margins or even on the wrong side of the decision boundary.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-slackvar"></span> <img src="graphics/7_slackvar.png" alt="Behaviors of the slack variables" width="100%" /> <!-- <p class="caption marginnote">-->Figure 122: Behaviors of the slack variables<!--</p>--> <!--</div>--></span> </p> <p></p> <p>Note that the original constraint structure in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM">(61)</a> is derived based on the linearly separable case shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-models">117</a>. For the nonseparable case, Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-slackvar">122</a> shows three scenarios: the <em>Type A</em> data points fall within the margins but still on the right side of their class, the <em>Type B</em> data points fall on the wrong side of their class, and the <em>Type C</em> data points fall on the right side of their class and also beyond or on the margin.</p> <p>The <em>Type A</em> data points and the <em>Type B</em> data points are both <em>compromised</em>, and we introduce a <strong>slack variable</strong> to describe the <em>degree</em> of compromise for both types of data points.</p> <p>For instance, consider the circle points that belong to the class <span class="math inline">\(y_n=1\)</span>. We have<label for="tufte-sn-175" class="margin-toggle sidenote-number">175</label><input type="checkbox" id="tufte-sn-175" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">175</span> Readers may revisit Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-5regions">119</a> to understand Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVMcons-a">(67)</a>.</span></p> <p><span class="math display" id="eq:7-SVMcons-a">\[\begin{equation} \small \begin{gathered} \text {Data points (Type A): } \boldsymbol{w}^{T} \boldsymbol{x}_{n}+b \in (0,1); \\ \text {Data points (Type B): } \boldsymbol{w}^{T} \boldsymbol{x}_{n}+b < 0. \end{gathered} \tag{67} \end{equation}\]</span></p> <p>Then we define a slack variable <span class="math inline">\(\xi_{n}\)</span> for any data point <span class="math inline">\(n\)</span> of Type A or B</p> <p><span class="math display">\[\begin{equation*} \small \text {The slack variable $\xi_{n}$}: \xi_{n} = 1 - \left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right).
\end{equation*}\]</span></p> <p>And we define <span class="math inline">\(\xi_{n}\)</span> for any data point of Type C to be <span class="math inline">\(0\)</span> since there is no compromise.</p> <p>Altogether, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-slackvar">122</a>, we have</p> <p><span class="math display" id="eq:7-SVMcons">\[\begin{equation} \small \begin{gathered} \text {Data points (Type A): } \xi_{n} \in (0,1]; \\ \text {Data points (Type B): } \xi_{n} > 1; \\ \text {Data points (Type C): } \xi_{n}=0. \end{gathered} \tag{68} \end{equation}\]</span></p> <p>Similarly, for the square points that belong to the class <span class="math inline">\(y_n = -1\)</span>, we define a slack variable <span class="math inline">\(\xi_{n}\)</span> for each data point <span class="math inline">\(n\)</span></p> <p><span class="math display">\[\begin{equation*} \small \text {The slack variable $\xi_{n}$}: \xi_{n} = 1 + \left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right). \end{equation*}\]</span></p> <p>The same results in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVMcons">(68)</a> can be derived.</p> <p>As the slack variable <span class="math inline">\(\xi_{n}\)</span> describes the <em>degree</em> of compromise for the data point <span class="math inline">\(\boldsymbol{x}_{n}\)</span>, an optimal SVM model should also minimize the total amount of compromise. Based on this additional learning principle, we revise the objective function in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM">(61)</a> and get</p> <p><span class="math display" id="eq:7-SVMobj2">\[\begin{equation} \small \underbrace{\min _{\boldsymbol{w}} \frac{1}{2}\|\boldsymbol{w}\|^{2}}_{\text{\textit{Maximize Margin}}} + \underbrace{C \sum_{n=1}^{N} \xi_{n}.}_{\text{\textit{Minimize Slacks}}} \tag{69} \end{equation}\]</span></p> <p>Here, <span class="math inline">\(C\)</span> is a user-specified parameter to control the balance between the two objectives: <em>maximum margin</em> and <em>minimum sum of slacks</em>.</p> <p>Then we revise the constraints<label for="tufte-sn-176" class="margin-toggle sidenote-number">176</label><input type="checkbox" id="tufte-sn-176" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">176</span> I.e., use the results in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-5regions">119</a> and Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-slackvar">122</a>.</span> to be</p> <p><span class="math display" id="eq:7-SVMcons2">\[\begin{equation*} \small y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right) \geq 1-\xi_{n} \text {, for } n=1,2, \dots, N. \tag{70} \end{equation*}\]</span></p> <p>Putting the revised objective function and constraints together, the formulation of the SVM model for the nonseparable case becomes</p> <p><span class="math display" id="eq:7-SVM2">\[\begin{equation} \begin{gathered} \min _{\boldsymbol{w}} \frac{1}{2}\|\boldsymbol{w}\|^{2}+C \sum_{n=1}^{N} \xi_{n}, \\ \text { Subject to: } y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right) \geq 1-\xi_{n}, \\ \xi_{n} \geq 0, \text { for } n=1,2, \ldots, N. \end{gathered} \tag{71} \end{equation}\]</span></p>
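<p>The effect of <span class="math inline">\(C\)</span> can be seen empirically. The sketch below fits soft-margin SVMs with a few values of <span class="math inline">\(C\)</span> on simulated overlapping classes; it is a minimal sketch, assuming the <code>kernlab</code> package, and the simulated data are our own illustrative assumption.</p> <p></p> <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A sketch of the trade-off in Eq. (71): a small C tolerates slack
# (wider margin, more support vectors); a large C penalizes slack.
# The overlapping two-class data are simulated for illustration.
library(kernlab)
set.seed(1)
x <- rbind(matrix(rnorm(80, mean =  1), ncol = 2),
           matrix(rnorm(80, mean = -1), ncol = 2))
y <- factor(rep(c(1, -1), each = 40))
for (C in c(0.1, 1, 100)) {
  fit <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = C)
  cat("C =", C, "| support vectors:", nSV(fit),
      "| training error:", error(fit), "\n")
}</code></pre></div> <p></p> <p>In practice, <span class="math inline">\(C\)</span> is treated as a tuning parameter and is usually selected by cross-validation, as discussed in <strong>Chapter 5</strong>.</p>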
<a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a> could be derived, which is skipped here<label for="tufte-sn-177" class="margin-toggle sidenote-number">177</label><input type="checkbox" id="tufte-sn-177" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">177</span> Interested readers could read this book for a comprehensive and deep understanding of SVM: Scholkopf, B. and Smola, A.J., <em>Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond.</em> MIT Press, 2001.</span>.</p> <p><em>Extension to nonlinear SVM.</em></p> <p>Sometimes, the decision boundary could not be characterized as linear models, i.e., see Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-7">123</a> (a).</p> <p></p> <div class="figure" style="text-align: center"><span style="display:block;" id="fig:f7-7"></span> <p class="caption marginnote shownote"> Figure 123: (a) A nonseparable dataset; (b) with the right transformation, (a) becomes linearly separable </p> <img src="graphics/7_7.png" alt="(a) A nonseparable dataset; (b) with the right transformation, (a) becomes linearly separable" width="80%" /> </div> <p></p> <p>A common strategy to create a nonlinear model is to conduct <em>transformation</em> of the original variables. For Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-7">123</a> (a), we conduct a transformation from the original two-dimensional coordinate system <span class="math inline">\(\boldsymbol{x}\)</span> to a new coordinate system <span class="math inline">\(\boldsymbol{z}\)</span> that is three-dimensional</p> <p><span class="math display" id="eq:7-SVM-xtoz">\[\begin{equation} \small z_{1}=x_{1}^{2}, z_{2}=\sqrt{2} x_{1} x_{2}, z_{3}=x_{2}^{2}. \tag{72} \end{equation}\]</span></p> <p>In the new coordinate system, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-7">123</a> (b), the data points of the two classes become linearly separable.</p> <!-- % ^[This is an approach we often use in linear regression models as well to capture nonlinearity in the data. It is needed to create *explicit* transformation that asks us to write up how the *transformed features* $\boldsymbol{z}$ could be represented by the original features $\boldsymbol{x}$.] --> <p>The transformation employed in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-xtoz">(72)</a> is <em>explicit</em>, which may not be suitable for applications where we don’t know what is a good transformation<label for="tufte-sn-178" class="margin-toggle sidenote-number">178</label><input type="checkbox" id="tufte-sn-178" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">178</span> Try a ten-dimensional <span class="math inline">\(\boldsymbol{x}\)</span> and see how troublesome it is to define an explicit transformation to enable linear separability of the classes.</span>. Thus, transformation that could be automatically identified by the learning algorithm is needed, even if the transformation is <em>implicit</em>. A remarkable thing about SVM is that its formulation allows automatic transformation.</p> <p>Let’s revisit the dual formulation of SVM for the linearly separable case, as shown in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>. 
Assume that the transformation has been performed and now we build the SVM model based on the transformed features, <span class="math inline">\(\boldsymbol{z}\)</span>. The dual formulation of SVM on the transformed variables is</p> <p><span class="math display" id="eq:7-SVM2-dual">\[\begin{equation} \small \begin{gathered} \max _{\boldsymbol{\alpha}} \sum_{n=1}^{N} \alpha_{n}-\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_{n} \alpha_{m} y_{n} y_{m} \boldsymbol{z}_{n}^{T} \boldsymbol{z}_{m}, \\ \text { Subject to: } \alpha_{n} \geq 0 \text { for } n=1,2, \dots, N, \\ \sum_{n=1}^{N} \alpha_{n} y_{n}=0. \end{gathered} \tag{73} \end{equation}\]</span></p> <p>It can be seen that the dual formulation of SVM doesn’t directly concern <span class="math inline">\(\boldsymbol{z}_{n}\)</span>; only the inner product <span class="math inline">\(\boldsymbol{z}_{n}^{T} \boldsymbol{z}_{m}\)</span> is needed. As <span class="math inline">\(\boldsymbol{z}\)</span> is essentially a function of <span class="math inline">\(\boldsymbol{x}\)</span>, i.e., denote it as <span class="math inline">\(\boldsymbol{z}=\phi(\boldsymbol{x})\)</span>, <span class="math inline">\(\boldsymbol{z}_{n}^{T} \boldsymbol{z}_{m}\)</span> is essentially a function of <span class="math inline">\(\boldsymbol{x}_{n} \text { and } \boldsymbol{x}_{m}\)</span>. We can write it as <span class="math inline">\(\boldsymbol{z}_{n}^{T} \boldsymbol{z}_{m}=K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)\)</span>. This is called the <strong>kernel function</strong>.</p> <p>A kernel function is a function that entails a transformation <span class="math inline">\(\boldsymbol{z}=\phi(\boldsymbol{x})\)</span> such that <span class="math inline">\(K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)\)</span> is an inner product: <span class="math inline">\(K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)=\phi(\boldsymbol{x}_{n})^{T} \phi(\boldsymbol{x}_{m})\)</span>. In other words, we no longer seek an explicit form of <span class="math inline">\(\phi(\boldsymbol{x}_{n})\)</span>; rather, we seek kernel functions that entail such transformations<label for="tufte-sn-179" class="margin-toggle sidenote-number">179</label><input type="checkbox" id="tufte-sn-179" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">179</span> If a kernel function is proven to entail a transformation function <span class="math inline">\(\phi(\boldsymbol{x})\)</span>—even if it is only proven <em>in theory</em> and never really made explicit in practice—it is as good as an explicit transformation, because only the inner product of <span class="math inline">\(\boldsymbol{z}_{n}^{T} \boldsymbol{z}_{m}\)</span> is needed in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM2-dual">(73)</a>.</span>.</p>
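<p>As a quick numerical check of this equivalence, the sketch below uses the explicit transformation in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-xtoz">(72)</a> and verifies that the inner product in the transformed space can be computed in the original space as <span class="math inline">\(K\left(\boldsymbol{u}, \boldsymbol{v}\right)=\left(\boldsymbol{u}^{T} \boldsymbol{v}\right)^{2}\)</span>; the two test vectors are arbitrary choices of ours.</p> <p></p> <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Check that phi(u)' phi(v) = (u' v)^2, where phi() is the explicit
# transformation in Eq. (72); the test vectors are arbitrary
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)
u <- c(0.5, -1)
v <- c(2, 3)
sum(phi(u) * phi(v)) # inner product in the transformed z space: 4
sum(u * v)^2         # the same value computed in the original x space: 4</code></pre></div> <p></p> <p>The second line never forms <code>phi()</code> at all, which is why an implicit transformation suffices in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM2-dual">(73)</a>.</p>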
<p>Many kernel functions have been developed. For example, the <strong>Gaussian radial basis kernel function</strong> is a popular choice</p> <p><span class="math display">\[\begin{equation*} \small K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)=e^{-\gamma\left\|\boldsymbol{x}_{n}-\boldsymbol{x}_{m}\right\|^{2}}, \end{equation*}\]</span></p> <p>where the transformation <span class="math inline">\(\boldsymbol{z}=\phi(\boldsymbol{x})\)</span> is implicit and can be proven to be infinite-dimensional<label for="tufte-sn-180" class="margin-toggle sidenote-number">180</label><input type="checkbox" id="tufte-sn-180" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">180</span> This means it is very flexible and can represent any smooth function.</span>.</p> <p>The polynomial kernel function is defined as</p> <p><span class="math display">\[\begin{equation*} \small K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)=\left(\boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}+1\right)^{q}. \end{equation*}\]</span></p> <p>The linear kernel function<label for="tufte-sn-181" class="margin-toggle sidenote-number">181</label><input type="checkbox" id="tufte-sn-181" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">181</span> For the linear kernel function, the transformation is trivial, i.e., <span class="math inline">\(\phi(\boldsymbol{x}) = \boldsymbol{x}\)</span>.</span> is defined as</p> <p><span class="math display">\[\begin{equation*} \small K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)=\boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}. \end{equation*}\]</span></p> <p>With a given kernel function, the dual formulation of SVM is</p> <p><span class="math display" id="eq:7-SVM-dual2">\[\begin{equation} \small \begin{gathered} \max _{\boldsymbol{\alpha}} \sum_{n=1}^{N} \alpha_{n}-\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_{n} \alpha_{m} y_{n} y_{m} K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right), \\ \text { Subject to: } \alpha_{n} \geq 0 \text { for } n=1,2, \dots, N, \\ \sum_{n=1}^{N} \alpha_{n} y_{n}=0. \end{gathered} \tag{74} \end{equation}\]</span></p> <p>After solving Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual2">(74)</a>, <em>in theory</em> we could obtain the estimation of the parameter <span class="math inline">\(\boldsymbol{w}\)</span> based on Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w2">(75)</a></p> <p><span class="math display" id="eq:7-SVM-w2">\[\begin{equation} \small \boldsymbol{w}=\sum_{n=1}^{N} \alpha_{n} y_{n} \phi(\boldsymbol{x_{n}}). \tag{75} \end{equation}\]</span></p> <p>However, for kernel functions whose explicit transformation function <span class="math inline">\(\phi(\boldsymbol{x})\)</span> is unknown, it is no longer possible to write the parameter <span class="math inline">\(\boldsymbol{w}\)</span> in the same way as in linear SVM models. This won’t prevent us from using the learned SVM model for prediction. For a data point, denoted as <span class="math inline">\(\boldsymbol{x}_{*}\)</span>, we can use the learned SVM model to predict its class<label for="tufte-sn-182" class="margin-toggle sidenote-number">182</label><input type="checkbox" id="tufte-sn-182" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">182</span> I.e., combining Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-w2">(75)</a> and Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-DM-SVM">(57)</a>, we can derive Eq.
<a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-DM-SVM-kernel">(76)</a>.</span></p> <p><span class="math display" id="eq:7-DM-SVM-kernel">\[\begin{equation} \small \begin{gathered} \text { If } \sum_{n=1}^{N} \alpha_{n} y_{n} K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{*}\right)+b>0, \text { then } y_{*}=1; \\ \text { Otherwise, } y_{*}=-1. \end{gathered} \tag{76} \end{equation}\]</span></p> <p>Again, the specific form of <span class="math inline">\(\phi(\boldsymbol{x})\)</span> is not needed since only the kernel function is used.</p> <p><em>A small-data example.</em></p> <p>Consider a dataset with <span class="math inline">\(4\)</span> data points</p> <p><span class="math display">\[\begin{equation*} \small \begin{array}{l}{\boldsymbol{x}_{1}=(-1,-1)^{T}, y_{1}=-1}; \\ {\boldsymbol{x}_{2}=(-1,+1)^{T}, y_{2}=+1}; \\ {\boldsymbol{x}_{3}=(+1,-1)^{T}, y_{3}=+1} ;\\ {\boldsymbol{x}_{4}=(+1,+1)^{T}, y_{4}=-1.}\end{array} \end{equation*}\]</span></p> <p>The dataset is visualized in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-8">124</a>. The R code to draw Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-8">124</a> is shown below.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-8"></span> <img src="graphics/7_8.png" alt="A linearly inseparable dataset" width="100%" /> <!-- <p class="caption marginnote">-->Figure 124: A linearly inseparable dataset<!--</p>--> <!--</div>--></span> </p> <p></p> <p></p> <div class="sourceCode" id="cb145"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb145-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-1" aria-hidden="true" tabindex="-1"></a><span class="co"># For the toy problem</span></span> <span id="cb145-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-2" aria-hidden="true" tabindex="-1"></a>x <span class="ot">=</span> <span class="fu">matrix</span>(<span class="fu">c</span>(<span class="sc">-</span><span class="dv">1</span>,<span class="sc">-</span><span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="sc">-</span><span class="dv">1</span>,<span class="dv">1</span>,<span class="sc">-</span><span class="dv">1</span>,<span class="dv">1</span>), <span class="at">nrow =</span> <span class="dv">4</span>, <span class="at">ncol =</span> <span class="dv">2</span>)</span> <span id="cb145-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-3" aria-hidden="true" tabindex="-1"></a>y <span class="ot">=</span> <span class="fu">c</span>(<span class="sc">-</span><span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="sc">-</span><span class="dv">1</span>)</span> <span id="cb145-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-4" aria-hidden="true" tabindex="-1"></a>linear.train <span class="ot"><-</span> <span class="fu">data.frame</span>(x,y)</span> <span id="cb145-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-5" aria-hidden="true" tabindex="-1"></a></span> <span id="cb145-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Visualize the distribution of data points of two classes</span></span> <span id="cb145-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-7" aria-hidden="true" tabindex="-1"></a><span class="fu">require</span>( <span class="st">'ggplot2'</span> 
)</span> <span id="cb145-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-8" aria-hidden="true" tabindex="-1"></a>p <span class="ot"><-</span> <span class="fu">qplot</span>( <span class="at">data=</span>linear.train, X1, X2, </span> <span id="cb145-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-9" aria-hidden="true" tabindex="-1"></a> <span class="at">colour=</span><span class="fu">factor</span>(y),<span class="at">xlim =</span> <span class="fu">c</span>(<span class="sc">-</span><span class="fl">1.5</span>,<span class="fl">1.5</span>), </span> <span id="cb145-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-10" aria-hidden="true" tabindex="-1"></a> <span class="at">ylim =</span> <span class="fu">c</span>(<span class="sc">-</span><span class="fl">1.5</span>,<span class="fl">1.5</span>))</span> <span id="cb145-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-11" aria-hidden="true" tabindex="-1"></a>p <span class="ot"><-</span> p <span class="sc">+</span> <span class="fu">labs</span>(<span class="at">title =</span> <span class="st">"Scatterplot of data points of two classes"</span>)</span> <span id="cb145-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb145-12" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(p)</span></code></pre></div> <p></p> <p>It is a <em>nonlinear</em> case. We use a nonlinear kernel function to build the SVM model.</p> <p>Consider the polynomial kernel function with <code>df=2</code></p> <p><span class="math display" id="eq:7-polykernel2">\[\begin{equation} \small K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right)=\left(\boldsymbol{x}_{n}^{T} \boldsymbol{x}_{m}+1\right)^{2}, \tag{77} \end{equation}\]</span></p> <p>which corresponds to the transformation</p> <p><span class="math display" id="eq:7-polykernel2-tran">\[\begin{equation} \small \phi\left(\boldsymbol{x}_{n}\right)=\left[1, \sqrt{2} x_{n, 1}, \sqrt{2} x_{n, 2}, \sqrt{2} x_{n, 1} x_{n, 2}, x_{n, 1}^{2}, x_{n, 2}^{2}\right]^{T}. \tag{78} \end{equation}\]</span></p> <p>Based on Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-dual">(65)</a>, a specific formulation of the SVM model of this dataset is</p> <p><span class="math display" id="eq:7-SVM-4points">\[\begin{equation} \small \begin{gathered} \max _{\boldsymbol{\alpha}} \sum_{n=1}^{4} \alpha_{n}-\frac{1}{2} \sum_{n=1}^{4} \sum_{m=1}^{4} \alpha_{n} \alpha_{m} y_{n} y_{m} K\left(\boldsymbol{x}_{n}, \boldsymbol{x}_{m}\right), \\ \text { Subject to: } \alpha_{n} \geq 0 \text { for } n=1,2, \dots, 4, \\ \text { and } \sum_{n=1}^{4} \alpha_{n} y_{n}=0. \end{gathered} \tag{79} \end{equation}\]</span></p> <p>We calculate the kernel matrix as<label for="tufte-sn-183" class="margin-toggle sidenote-number">183</label><input type="checkbox" id="tufte-sn-183" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">183</span> E.g., using Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-polykernel2">(77)</a>, <span class="math inline">\(K\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2}\right) = \left(\boldsymbol{x}_{1}^{T} \boldsymbol{x}_{2}+1\right)^{2} = 3^2 = 9\)</span>. Readers can try other instances.</span></p> <p><span class="math display">\[\begin{equation*} \small \boldsymbol{K}=\left[\begin{array}{cccc}{9} & {1} & {1} & {1} \\ {1} & {9} & {1} & {1} \\ {1} & {1} & {9} & {1} \\ {1} & {1} & {1} & {9}\end{array}\right]. 
\end{equation*}\]</span></p> <p>We solve the quadratic programming problem<label for="tufte-sn-184" class="margin-toggle sidenote-number">184</label><input type="checkbox" id="tufte-sn-184" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">184</span> I.e., use the R package <code>quadprog</code>.</span> in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-4points">(79)</a> and get</p> <p><span class="math display" id="eq:7-alpha">\[\begin{equation} \small \alpha_{1}=\alpha_{2}=\alpha_{3}=\alpha_{4}=0.125. \tag{80} \end{equation}\]</span></p> <p>In this particular case, since we can write up the transformation explicitly<label for="tufte-sn-185" class="margin-toggle sidenote-number">185</label><input type="checkbox" id="tufte-sn-185" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">185</span> I.e., as shown in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-polykernel2-tran">(78)</a>.</span>, we can write up <span class="math inline">\(\boldsymbol{w}\)</span> explicitly as well<label for="tufte-sn-186" class="margin-toggle sidenote-number">186</label><input type="checkbox" id="tufte-sn-186" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">186</span> It should be written as <span class="math inline">\(\widehat{\boldsymbol{w}}\)</span>, since it is an estimator of <span class="math inline">\(\boldsymbol{w}\)</span>. Here for simplicity we skip this.</span></p> <p><span class="math display">\[\begin{equation*} \small \boldsymbol{w}=\sum_{n=1}^{4} \alpha_{n} y_{n} \phi\left(\boldsymbol{x}_{n}\right)=[0,0,0,-1 / \sqrt{2}, 0,0]^{T}. \end{equation*}\]</span></p> <p>For any given data point <span class="math inline">\(\boldsymbol{x}_{*}\)</span>, the explicit decision function is</p> <p><span class="math display">\[\begin{equation*} \small f\left(\boldsymbol{x}_{*}\right)=\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{*}\right)=-x_{*, 1} x_{*, 2}, \end{equation*}\]</span></p> <p>which is positive exactly when the two coordinates of <span class="math inline">\(\boldsymbol{x}_{*}\)</span> have opposite signs, e.g., <span class="math inline">\(f\left(\boldsymbol{x}_{2}\right)=-(-1)(1)=1>0\)</span>, matching the labels. This is the decision boundary for a typical <strong>XOR</strong> problem<label for="tufte-sn-187" class="margin-toggle sidenote-number">187</label><input type="checkbox" id="tufte-sn-187" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">187</span> Also known as <em>exclusive or</em> or <em>exclusive disjunction</em>, the XOR problem is a logical operation that outputs <em>true</em> only when inputs differ (e.g., one is <em>true</em>, the other is <em>false</em>).</span>.</p>
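<p>Before calling a dedicated SVM package, we can verify Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-alpha">(80)</a> by handing Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-SVM-4points">(79)</a> to a generic quadratic programming solver. Below is a minimal sketch using <code>solve.QP()</code> from the <code>quadprog</code> package mentioned in the margin note; the tiny ridge added to the quadratic term is our own device to keep it numerically positive definite, as <code>solve.QP()</code> requires.</p> <p></p> <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Solve the dual problem in Eq. (79) with quadprog, which minimizes
# (1/2) a' D a - d' a subject to t(A) %*% a >= b0, where the first
# meq constraints are treated as equalities
library(quadprog)
K <- matrix(1, 4, 4); diag(K) <- 9   # the kernel matrix from Eq. (77)
y <- c(-1, 1, 1, -1)
D <- (y %*% t(y)) * K                # D[n,m] = y_n * y_m * K(x_n, x_m)
D <- D + 1e-8 * diag(4)              # tiny ridge for positive definiteness
A <- cbind(y, diag(4))               # sum(a*y) = 0, then a_n >= 0
alpha <- solve.QP(D, rep(1, 4), A, rep(0, 5), meq = 1)$solution
alpha                                # 0.125 0.125 0.125 0.125
# Predict with Eq. (76); b = 0 here by Eq. (66) with any support vector
x <- matrix(c(-1,-1, -1,1, 1,-1, 1,1), ncol = 2, byrow = TRUE)
f <- sapply(1:4, function(i) sum(alpha * y * (x %*% x[i, ] + 1)^2))
sign(f)                              # recovers the labels: -1 1 1 -1</code></pre></div> <p></p>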
<p>In this particular case, since we can write out the transformation explicitly<label for="tufte-sn-185" class="margin-toggle sidenote-number">185</label><input type="checkbox" id="tufte-sn-185" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">185</span> I.e., as shown in Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-polykernel2-tran">(78)</a>.</span>, we can write out <span class="math inline">\(\boldsymbol{w}\)</span> explicitly as well<label for="tufte-sn-186" class="margin-toggle sidenote-number">186</label><input type="checkbox" id="tufte-sn-186" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">186</span> It should be written as <span class="math inline">\(\widehat{\boldsymbol{w}}\)</span>, since it is an estimator of <span class="math inline">\(\boldsymbol{w}\)</span>. Here for simplicity we skip this.</span></p> <p><span class="math display">\[\begin{equation*} \small \boldsymbol{w}=\sum_{n=1}^{4} \alpha_{n} y_{n} \phi\left(\boldsymbol{x}_{n}\right)=[0,0,0,1 / \sqrt{2}, 0,0]^{T}. \end{equation*}\]</span></p> <p>For any given data point <span class="math inline">\(\boldsymbol{x}_{*}\)</span>, the explicit decision function is</p> <p><span class="math display">\[\begin{equation*} \small f\left(\boldsymbol{x}_{*}\right)=\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{*}\right)=x_{*, 1} x_{*, 2}. \end{equation*}\]</span></p> <p>This is the decision boundary for a typical <strong>XOR</strong> problem<label for="tufte-sn-187" class="margin-toggle sidenote-number">187</label><input type="checkbox" id="tufte-sn-187" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">187</span> Also known as <em>exclusive or</em> or <em>exclusive disjunction</em>, the XOR problem is a logical operation that outputs <em>true</em> only when the inputs differ (e.g., one is <em>true</em>, the other is <em>false</em>).</span>.</p> <p>We then use R to build an SVM model on this dataset<label for="tufte-sn-188" class="margin-toggle sidenote-number">188</label><input type="checkbox" id="tufte-sn-188" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">188</span> We use the R package <code>kernlab</code>; more details are shown in the section <strong>R Lab</strong>.</span>. The R code is shown below.</p> <p></p> <div class="sourceCode" id="cb146"><pre class="sourceCode r"><code class="sourceCode r"># Train an SVM model for the nonlinear (XOR) case:
# apply the explicit transformation of the degree-2 polynomial
# kernel, then fit a linear SVM on the transformed features
x <- cbind(1, poly(x, degree = 2, raw = TRUE))
coefs = c(1,sqrt(2),1,sqrt(2),sqrt(2),1)
x <- x * t(matrix(rep(coefs,4),nrow=6,ncol=4))
linear.train <- data.frame(x,y)
require( 'kernlab' )
linear.svm <- ksvm(y ~ ., data=linear.train, 
                   type='C-svc', kernel='vanilladot', C=10, scale=c())
</code></pre></div> <p></p> <p>The function <code>alpha()</code> returns the values of <span class="math inline">\(\alpha_{n} \text { for } n=1,2, \dots, 4\)</span>. Our results as shown in Eq.
<a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-alpha">(80)</a> are consistent with the results obtained by using R.<label for="tufte-sn-189" class="margin-toggle sidenote-number">189</label><input type="checkbox" id="tufte-sn-189" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">189</span> If your answer is different, check if the <code>alpha()</code> function in the <code>kernlab</code>() package scales the vector <span class="math inline">\(\alpha\)</span>, i.e., to make the sum as <span class="math inline">\(1\)</span>.</span></p> <p></p> <div class="sourceCode" id="cb147"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb147-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb147-1" aria-hidden="true" tabindex="-1"></a><span class="fu">alpha</span>(linear.svm) <span class="co">#scaled alpha vector</span></span> <span id="cb147-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb147-2" aria-hidden="true" tabindex="-1"></a><span class="do">## [[1]]</span></span> <span id="cb147-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb147-3" aria-hidden="true" tabindex="-1"></a><span class="do">## [1] 0.125 0.125 0.125 0.125</span></span></code></pre></div> <p></p> <div style="page-break-after: always;"></div> </div> <div id="r-lab-9" class="section level3 unnumbered"> <h3>R Lab</h3> <p><em>The 7-Step R Pipeline.</em> <strong>Step 1</strong> and <strong>Step 2</strong> get data into R and make appropriate preprocessing.</p> <p></p> <div class="sourceCode" id="cb148"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb148-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 1 -> Read data into R workstation</span></span> <span id="cb148-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-2" aria-hidden="true" tabindex="-1"></a></span> <span id="cb148-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(RCurl)</span> <span id="cb148-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-4" aria-hidden="true" tabindex="-1"></a>url <span class="ot"><-</span> <span class="fu">paste0</span>(<span class="st">"https://raw.githubusercontent.com"</span>,</span> <span id="cb148-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-5" aria-hidden="true" tabindex="-1"></a> <span class="st">"/analyticsbook/book/main/data/AD.csv"</span>)</span> <span id="cb148-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-6" aria-hidden="true" tabindex="-1"></a>data <span class="ot"><-</span> <span class="fu">read.csv</span>(<span class="at">text=</span><span class="fu">getURL</span>(url))</span> <span id="cb148-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-7" aria-hidden="true" tabindex="-1"></a></span> <span id="cb148-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 2 -> Data preprocessing</span></span> <span id="cb148-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Create X matrix (predictors) and Y vector (outcome variable)</span></span> <span id="cb148-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-10" aria-hidden="true" tabindex="-1"></a>X <span class="ot"><-</span> data[,<span 
class="dv">2</span><span class="sc">:</span><span class="dv">16</span>]</span> <span id="cb148-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-11" aria-hidden="true" tabindex="-1"></a>Y <span class="ot"><-</span> data<span class="sc">$</span>DX_bl</span> <span id="cb148-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-12" aria-hidden="true" tabindex="-1"></a></span> <span id="cb148-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-13" aria-hidden="true" tabindex="-1"></a>Y <span class="ot"><-</span> <span class="fu">paste0</span>(<span class="st">"c"</span>, Y) </span> <span id="cb148-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-14" aria-hidden="true" tabindex="-1"></a>Y <span class="ot"><-</span> <span class="fu">as.factor</span>(Y) </span> <span id="cb148-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-15" aria-hidden="true" tabindex="-1"></a></span> <span id="cb148-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-16" aria-hidden="true" tabindex="-1"></a>data <span class="ot"><-</span> <span class="fu">data.frame</span>(X,Y)</span> <span id="cb148-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-17" aria-hidden="true" tabindex="-1"></a><span class="fu">names</span>(data)[<span class="dv">16</span>] <span class="ot">=</span> <span class="fu">c</span>(<span class="st">"DX_bl"</span>)</span> <span id="cb148-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-18" aria-hidden="true" tabindex="-1"></a></span> <span id="cb148-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-19" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a training data (half the original data size)</span></span> <span id="cb148-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-20" aria-hidden="true" tabindex="-1"></a>train.ix <span class="ot"><-</span> <span class="fu">sample</span>(<span class="fu">nrow</span>(data),<span class="fu">floor</span>( <span class="fu">nrow</span>(data)<span class="sc">/</span><span class="dv">2</span>) )</span> <span id="cb148-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-21" aria-hidden="true" tabindex="-1"></a>data.train <span class="ot"><-</span> data[train.ix,]</span> <span id="cb148-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a testing data (half the original data size)</span></span> <span id="cb148-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb148-23" aria-hidden="true" tabindex="-1"></a>data.test <span class="ot"><-</span> data[<span class="sc">-</span>train.ix,]</span></code></pre></div> <p></p> <p><strong>Step 3</strong> puts together a list of candidate models.</p> <p></p> <div class="sourceCode" id="cb149"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb149-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 3 -> gather a list of candidate models</span></span> <span id="cb149-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-2" aria-hidden="true" tabindex="-1"></a><span class="co"># SVM: often to compare models with different kernels, </span></span> <span id="cb149-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-3" aria-hidden="true" tabindex="-1"></a><span class="co"># different values of C, different set 
of variables</span></span> <span id="cb149-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-4" aria-hidden="true" tabindex="-1"></a></span> <span id="cb149-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Use different set of variables</span></span> <span id="cb149-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-6" aria-hidden="true" tabindex="-1"></a></span> <span id="cb149-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-7" aria-hidden="true" tabindex="-1"></a>model1 <span class="ot"><-</span> <span class="fu">as.formula</span>(DX_bl <span class="sc">~</span> .)</span> <span id="cb149-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-8" aria-hidden="true" tabindex="-1"></a>model2 <span class="ot"><-</span> <span class="fu">as.formula</span>(DX_bl <span class="sc">~</span> AGE <span class="sc">+</span> PTEDUCAT <span class="sc">+</span> FDG </span> <span id="cb149-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-9" aria-hidden="true" tabindex="-1"></a> <span class="sc">+</span> AV45 <span class="sc">+</span> HippoNV <span class="sc">+</span> rs3865444)</span> <span id="cb149-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-10" aria-hidden="true" tabindex="-1"></a>model3 <span class="ot"><-</span> <span class="fu">as.formula</span>(DX_bl <span class="sc">~</span> AGE <span class="sc">+</span> PTEDUCAT)</span> <span id="cb149-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb149-11" aria-hidden="true" tabindex="-1"></a>model4 <span class="ot"><-</span> <span class="fu">as.formula</span>(DX_bl <span class="sc">~</span> FDG <span class="sc">+</span> AV45 <span class="sc">+</span> HippoNV)</span></code></pre></div> <p></p> <p><strong>Step 4</strong> uses <span class="math inline">\(10\)</span>-fold cross-validation to evaluate the performance of the candidate models. Below we show how it works for one model. 
For other models, the same script can be used with slight modification, as shown in the sketch after this code block.</p> <p></p> <div class="sourceCode" id="cb150"><pre class="sourceCode r"><code class="sourceCode r"># Step 4 -> Use 10-fold cross-validation to evaluate the models

n_folds = 10 
# number of folds
N <- dim(data.train)[1] 
folds_i <- sample(rep(1:n_folds, length.out = N)) 

# evaluate the first model
cv_err <- NULL 
# cv_err records the prediction error for each fold
for (k in 1:n_folds) {
  test_i <- which(folds_i == k) 
  # In each iteration, use one fold of data as the testing data
  data.test.cv <- data.train[test_i, ] 
  # The remaining 9 folds' data form our training data
  data.train.cv <- data.train[-test_i, ] 
  require( 'kernlab' )
  linear.svm <- ksvm(model1, data=data.train.cv, 
                     type='C-svc', kernel='vanilladot', C=10) 
  # Fit the linear SVM model with the training data
  y_hat <- predict(linear.svm, data.test.cv) 
  # Predict on the testing data using the trained model
  true_y <- data.test.cv$DX_bl 
  # get the error rate
  cv_err[k] <- length(which(y_hat != true_y))/length(y_hat) 
}
mean(cv_err)

# evaluate the second model ...
# evaluate the third model ...
# ...
</code></pre></div> <p></p>
tabindex="-1"></a><span class="do">## [1] 0.1278462</span></span> <span id="cb151-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb151-3" aria-hidden="true" tabindex="-1"></a><span class="do">## [1] 0.4069231</span></span> <span id="cb151-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb151-4" aria-hidden="true" tabindex="-1"></a><span class="do">## [1] 0.1316923</span></span></code></pre></div> <p></p> <p>The second model is the best.</p> <p><strong>Step 5</strong> uses the training data to fit a final model, through the <code>ksvm()</code> function in the package <code>kernlab</code>.</p> <p></p> <div class="sourceCode" id="cb152"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb152-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb152-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 5 -> After model selection, </span></span> <span id="cb152-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb152-2" aria-hidden="true" tabindex="-1"></a><span class="co"># use ksvm() function to build your final model</span></span> <span id="cb152-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb152-3" aria-hidden="true" tabindex="-1"></a>linear.svm <span class="ot"><-</span> <span class="fu">ksvm</span>(model2, <span class="at">data=</span>data.train,</span> <span id="cb152-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb152-4" aria-hidden="true" tabindex="-1"></a> <span class="at">type=</span><span class="st">'C-svc'</span>, <span class="at">kernel=</span><span class="st">'vanilladot'</span>, <span class="at">C=</span><span class="dv">10</span>) </span></code></pre></div> <p></p> <p><strong>Step 6</strong> uses the fitted final model for prediction on the testing data.</p> <p></p> <div class="sourceCode" id="cb153"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb153-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb153-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 6 -> Predict using your SVM model</span></span> <span id="cb153-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb153-2" aria-hidden="true" tabindex="-1"></a>y_hat <span class="ot"><-</span> <span class="fu">predict</span>(linear.svm, data.test) </span></code></pre></div> <p></p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-BS-ROC"></span> <img src="graphics/7_BS_ROC.png" alt="The ROC curve of the final SVM model" width="100%" /> <!-- <p class="caption marginnote">-->Figure 125: The ROC curve of the final SVM model<!--</p>--> <!--</div>--></span> </p> <p></p> <p><strong>Step 7</strong> evaluates the performance of the model.</p> <p></p> <div class="sourceCode" id="cb154"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb154-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 7 -> Evaluate the prediction performance of the SVM model</span></span> <span id="cb154-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-2" aria-hidden="true" tabindex="-1"></a></span> <span id="cb154-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-3" aria-hidden="true" tabindex="-1"></a><span class="co"># (1) The confusion matrix</span></span> <span id="cb154-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-4" aria-hidden="true" tabindex="-1"></a></span> <span id="cb154-5"><a 
href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-5" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(caret) </span> <span id="cb154-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-6" aria-hidden="true" tabindex="-1"></a><span class="fu">confusionMatrix</span>(y_hat, data.test<span class="sc">$</span>DX_bl)</span> <span id="cb154-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-7" aria-hidden="true" tabindex="-1"></a></span> <span id="cb154-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-8" aria-hidden="true" tabindex="-1"></a><span class="co"># (2) ROC curve </span></span> <span id="cb154-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-9" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(pROC) </span> <span id="cb154-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-10" aria-hidden="true" tabindex="-1"></a>y_hat <span class="ot"><-</span> <span class="fu">predict</span>(linear.svm, data.test, <span class="at">type =</span> <span class="st">'decision'</span>) </span> <span id="cb154-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-11" aria-hidden="true" tabindex="-1"></a><span class="fu">plot</span>(<span class="fu">roc</span>(data.test<span class="sc">$</span>DX_bl, y_hat),</span> <span id="cb154-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb154-12" aria-hidden="true" tabindex="-1"></a> <span class="at">col=</span><span class="st">"blue"</span>, <span class="at">main=</span><span class="st">"ROC Curve"</span>)</span></code></pre></div> <p></p> <p>Results are shown below. And the ROC curve is shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-BS-ROC">125</a>.</p> <p></p> <div class="sourceCode" id="cb155"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb155-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Confusion Matrix and Statistics</span></span> <span id="cb155-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-2" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-3" aria-hidden="true" tabindex="-1"></a><span class="do">## Reference</span></span> <span id="cb155-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-4" aria-hidden="true" tabindex="-1"></a><span class="do">## Prediction c0 c1</span></span> <span id="cb155-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-5" aria-hidden="true" tabindex="-1"></a><span class="do">## c0 131 27</span></span> <span id="cb155-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-6" aria-hidden="true" tabindex="-1"></a><span class="do">## c1 11 90</span></span> <span id="cb155-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-7" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-8" aria-hidden="true" tabindex="-1"></a><span class="do">## Accuracy : 0.8533 </span></span> <span id="cb155-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-9" aria-hidden="true" tabindex="-1"></a><span class="do">## 95% CI : (0.8042, 0.894)</span></span> <span id="cb155-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-10" 
aria-hidden="true" tabindex="-1"></a><span class="do">## No Information Rate : 0.5483 </span></span> <span id="cb155-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-11" aria-hidden="true" tabindex="-1"></a><span class="do">## P-Value [Acc > NIR] : < 2e-16 </span></span> <span id="cb155-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-12" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-13" aria-hidden="true" tabindex="-1"></a><span class="do">## Kappa : 0.7002 </span></span> <span id="cb155-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-14" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-15" aria-hidden="true" tabindex="-1"></a><span class="do">## Mcnemar's Test P-Value : 0.01496 </span></span> <span id="cb155-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-16" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-17" aria-hidden="true" tabindex="-1"></a><span class="do">## Sensitivity : 0.9225 </span></span> <span id="cb155-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-18" aria-hidden="true" tabindex="-1"></a><span class="do">## Specificity : 0.7692 </span></span> <span id="cb155-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-19" aria-hidden="true" tabindex="-1"></a><span class="do">## Pos Pred Value : 0.8291 </span></span> <span id="cb155-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-20" aria-hidden="true" tabindex="-1"></a><span class="do">## Neg Pred Value : 0.8911 </span></span> <span id="cb155-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-21" aria-hidden="true" tabindex="-1"></a><span class="do">## Prevalence : 0.5483 </span></span> <span id="cb155-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-22" aria-hidden="true" tabindex="-1"></a><span class="do">## Detection Rate : 0.5058 </span></span> <span id="cb155-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-23" aria-hidden="true" tabindex="-1"></a><span class="do">## Detection Prevalence : 0.6100 </span></span> <span id="cb155-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-24" aria-hidden="true" tabindex="-1"></a><span class="do">## Balanced Accuracy : 0.8459 </span></span> <span id="cb155-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-25" aria-hidden="true" tabindex="-1"></a><span class="do">## </span></span> <span id="cb155-26"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb155-26" aria-hidden="true" tabindex="-1"></a><span class="do">## 'Positive' Class : c0</span></span></code></pre></div> <p></p> <p><em>Beyond the 7-Step R Pipeline.</em></p> <p>In the 7-step pipeline, we create a list of candidate models by different selections of predictors. There are other parameters, such as the kernel function, the value of <span class="math inline">\(C\)</span>, that should be concerned in model selection. The R package <code>caret</code> can automate the process of cross-validation and facilitate the optimization of multiple parameters simultaneously. 
Below is an example.</p> <p></p> <div class="sourceCode" id="cb156"><pre class="sourceCode r"><code class="sourceCode r">library(RCurl)
library(caret)   # for trainControl() and train()
url <- paste0("https://raw.githubusercontent.com",
              "/analyticsbook/book/main/data/AD.csv")
AD <- read.csv(text=getURL(url))
str(AD)
# Train and tune the SVM
n = dim(AD)[1]
n.train <- floor(0.8 * n)
idx.train <- sample(n, n.train)
AD[which(AD[,1]==0),1] = rep("Normal",length(which(AD[,1]==0)))
AD[which(AD[,1]==1),1] = rep("Diseased",length(which(AD[,1]==1)))
AD.train <- AD[idx.train,c(1:16)]
AD.test <- AD[-idx.train,c(1:16)]
trainX <- AD.train[,c(2:16)]
trainy <- AD.train[,1]

## Setup for cross-validation:
# 10-fold cross-validation, repeated once (repeats = 1)
# Use AUC (the ROC metric) to pick the best model

ctrl <- trainControl(method="repeatedcv",
                     repeats=1,
                     summaryFunction=twoClassSummary,
                     classProbs=TRUE)

# Use expand.grid() to specify the search space 
grid <- expand.grid(sigma = c(0.002, 0.005, 0.01, 0.012, 0.015),
                    C = c(0.3,0.4,0.5,0.6))

# method: radial basis kernel
# tuneGrid: the explicit search grid above
#   (it overrides tuneLength, which would otherwise set the
#    number of candidate values of the cost parameter)
# preProc: center and scale the data
svm.tune <- train(x = trainX, y = trainy, 
                  method = "svmRadial", tuneLength = 9,
                  preProc = c("center","scale"), metric="ROC",
                  tuneGrid = grid,
                  trControl=ctrl)

svm.tune
</code></pre></div> <p></p> <p>Then we can obtain the following results.</p> <p></p> <div class="sourceCode" id="cb157"><pre class="sourceCode r"><code class="sourceCode r">## Support Vector Machines with Radial Basis Function Kernel 
## 
## 413 samples
##  15 predictor
##   2 classes: 'Diseased', 'Normal' 
## 
## Pre-processing: centered (15), scaled (15) 
## Resampling: Cross-Validated (10 fold, repeated 1 times) 
## Summary of sample sizes: 371, 372, 372, 371, 372, 372, ... 
## Resampling results across tuning parameters:
## 
##   sigma  C    ROC        Sens       Spec     
##   0.002  0.3  0.8929523  0.9121053  0.5932900
##   0.002  0.4  0.8927130  0.8757895  0.6619048
##   0.002  0.5  0.8956402  0.8452632  0.7627706
##   0.002  0.6  0.8953759  0.8192105  0.7991342
##   0.005  0.3  0.8965129  0.8036842  0.8036797
##   0.005  0.4  0.8996565  0.7989474  0.8357143
##   0.005  0.5  0.9020830  0.7936842  0.8448052
##   0.005  0.6  0.9032422  0.7836842  0.8450216
##   0.010  0.3  0.9030514  0.7889474  0.8541126
##   0.010  0.4  0.9058248  0.7886842  0.8495671
##   0.010  0.5  0.9060999  0.8044737  0.8541126
##   0.010  0.6  0.9077848  0.8094737  0.8450216
##   0.012  0.3  0.9032308  0.7781579  0.8538961
##   0.012  0.4  0.9049043  0.7989474  0.8538961
##   0.012  0.5  0.9063505  0.8094737  0.8495671
##   0.012  0.6  0.9104511  0.8042105  0.8586580
##   0.015  0.3  0.9060412  0.7886842  0.8493506
##   0.015  0.4  0.9068165  0.8094737  0.8495671
##   0.015  0.5  0.9109051  0.8042105  0.8541126
##   0.015  0.6  0.9118615  0.8042105  0.8632035
## 
## ROC was used to select the optimal model using the largest 
## value. The final values used for the model were 
## sigma = 0.015 and C = 0.6.
</code></pre></div> <p></p>
</div> </div> <div id="ensemble-learning" class="section level2 unnumbered"> <h2>Ensemble learning</h2> <div id="rationale-and-formulation-11" class="section level3 unnumbered"> <h3>Rationale and formulation</h3> <p><strong>Ensemble learning</strong> is another example of how we design better learning algorithms. The random forest model is a particular case of <strong>ensemble models</strong>. An ensemble model consists of <span class="math inline">\(K\)</span> <em>base models</em>, denoted as <span class="math inline">\(h_{1}, h_{2}, \ldots, h_{K}\)</span>. Algorithms for creating ensemble models differ from each other in the types of base models, the way diversity is created among the base models, etc.</p> <p>We have seen that the random forest model uses the Bootstrap to create many datasets and builds a set of decision tree models. Some other ensemble learning methods, such as the <strong>AdaBoost</strong> model, also use the decision tree as the base model. The two differ in how they build a <em>diverse</em> set of base models.</p>
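<p>To make the combination step concrete: for a binary outcome coded as <span class="math inline">\(y \in\{-1,1\}\)</span>, an ensemble typically predicts by a (weighted) vote of its base models,</p> <p><span class="math display">\[\begin{equation*} \small H(\boldsymbol{x})=\operatorname{sign}\left(\sum_{k=1}^{K} w_{k} h_{k}(\boldsymbol{x})\right), \end{equation*}\]</span></p> <p>where the weights <span class="math inline">\(w_{k}\)</span> are uniform for the random forest model, while AdaBoost derives them from the base models' error rates (see the <strong>Remarks</strong> section for the exact form).</p>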
<p>The framework of AdaBoost is illustrated in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-AdaBoost">126</a>. AdaBoost employs a sequential process to build its base models: it uses the original dataset (where the weights of the data points are equal) to build a decision tree; it then uses the decision tree to predict on the dataset, obtains the errors, and updates the weights of the data points<label for="tufte-sn-190" class="margin-toggle sidenote-number">190</label><input type="checkbox" id="tufte-sn-190" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">190</span> I.e., data points that are wrongly classified gain higher weights.</span>; then it builds another decision tree on the same dataset with the new weights, obtains the errors, and updates the weights of the data points again. The process continues until a given number of decision trees have been built. This sequential design provides adaptability: later models focus more on the <em>hard</em> data points on which previous base models struggled to achieve good prediction performance. A sketch of the weight-update idea is shown after Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-AdaBoost">126</a>, and interested readers may find a formal presentation of the AdaBoost algorithm in the <strong>Remarks</strong> section.</p> <p></p> <div class="figure fullwidth"><span style="display:block;" id="fig:f7-AdaBoost"></span> <img src="graphics/adaboost.png" alt="A general framework of AdaBoost" width="80%" /> <p class="caption marginnote shownote"> Figure 126: A general framework of AdaBoost </p> </div> <p></p>
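<p>To make the reweighting scheme concrete, below is a minimal sketch of the sequential loop. This is our own illustration (the R Lab later uses the <code>gbm</code> package instead); the names <code>ada.sketch</code>, <code>dat</code>, and <code>y</code> are hypothetical, and the sketch omits the safeguards (e.g., for a zero weighted error) that a real implementation needs.</p> <p></p> <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch of AdaBoost's reweighting loop (illustration only)
# Assumes a data frame `dat` with a two-class factor outcome `y`
library(rpart)
ada.sketch <- function(dat, n.trees = 50) {
  w <- rep(1/nrow(dat), nrow(dat))   # start with equal weights
  trees <- list()
  alphas <- numeric(n.trees)
  for (k in 1:n.trees) {
    # fit a weighted base model (a decision stump here)
    fit <- rpart(y ~ ., data = dat, weights = w,
                 control = rpart.control(maxdepth = 1))
    pred <- predict(fit, dat, type = "class")
    err <- sum(w * (pred != dat$y)) / sum(w)  # weighted error rate
    alphas[k] <- 0.5 * log((1 - err) / err)   # base model's vote weight
    # wrongly classified points gain weight; correct ones lose weight
    w <- w * exp(alphas[k] * ifelse(pred != dat$y, 1, -1))
    w <- w / sum(w)                           # renormalize
    trees[[k]] <- fit
  }
  list(trees = trees, alphas = alphas)
}
</code></pre></div>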
<p>Ensemble learning is flexible: any model can serve as a base model, and there are a variety of ways to resample or perturb a dataset to create a diverse set of base models. Like SVM, ensemble learning is another approach with a built-in mechanism to reduce the risk of overfitting. Here, we discuss this built-in mechanism using the framework proposed by Dietterich<label for="tufte-sn-191" class="margin-toggle sidenote-number">191</label><input type="checkbox" id="tufte-sn-191" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">191</span> Dietterich, T.G., <em>Ensemble methods in machine learning</em>, Multiple Classifier Systems, Springer, 2000.</span>, where three perspectives (statistical, computational, and representational) were used to explain why ensemble methods could lead to robust performance. Each perspective is described in detail below.</p> <p></p> <p> <span class="marginnote shownote"> <span style="display:block;" id="fig:f7-23"></span> <img src="graphics/7_EL_stat.png" alt="Ensemble learning approximates the true model with a combination of good models (statistical perspective)" width="100%" /> Figure 127: Ensemble learning approximates the true model with a combination of good models (statistical perspective) </span> </p> <p></p> <p><em>Statistical perspective.</em> The statistical reason is illustrated in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-23">127</a>. <span class="math inline">\(\mathcal{H}\)</span> is the model space where a learning algorithm searches for the best model guided by the training data. A model corresponds to a <em>point</em> in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-23">127</a>; e.g., the point labelled <span class="math inline">\(f\)</span> is the true model. When the data is limited and multiple models perform equally well, the problem is a statistical one, and we need to make an optimal decision despite the uncertainty. This is illustrated by the inner circle in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-23">127</a>. By building an ensemble of multiple base models, e.g., <span class="math inline">\(h_{1}, h_{2}, \text { and } h_{3}\)</span> in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-23">127</a>, the average of the models is a good approximation to the true model <span class="math inline">\(f\)</span>. This combined solution, compared with approaches that identify only one best model, has less variance and therefore could be more robust.</p>
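<p>A back-of-the-envelope calculation illustrates the variance reduction, under the idealized assumption that the base models' predictions are unbiased with variance <span class="math inline">\(\sigma^{2}\)</span> and independent (which resampling only approximates in practice):</p> <p><span class="math display">\[\begin{equation*} \small \operatorname{Var}\left(\frac{1}{K} \sum_{k=1}^{K} h_{k}(\boldsymbol{x})\right)=\frac{\sigma^{2}}{K}. \end{equation*}\]</span></p> <p>Correlation among the base models weakens this reduction, which is one reason the random forest model also randomizes the selection of variables, as discussed later in this section.</p>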
<p></p> <p> <span class="marginnote shownote"> <span style="display:block;" id="fig:f7-24"></span> <img src="graphics/7_EL_comp.png" alt="Ensemble learning provides a robust coverage of the true model (computational perspective)" width="100%" /> Figure 128: Ensemble learning provides a robust coverage of the true model (computational perspective) </span> </p> <p></p> <p><em>Computational perspective.</em> A computational perspective is shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-24">128</a>. This perspective concerns the way we build base models. Greedy approaches such as the recursive splitting procedure are often used to solve the optimization problems in training machine learning models, and they are optimal only in a <em>local</em> sense<label for="tufte-sn-192" class="margin-toggle sidenote-number">192</label><input type="checkbox" id="tufte-sn-192" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">192</span> E.g., to grow a decision tree, at each node the split is chosen according to the maximum information gain <em>at this particular node</em>. Growing a decision tree model requires a sequence of splits. Optimizing all the splits <em>simultaneously</em> would lead to a <em>globally</em> optimal solution, but this is an <em>NP-hard</em> problem that remains unsolved. Optimizing each split separately is more practical, though the locally optimal choices may create suboptimal situations for further splitting of descendant nodes.</span>. As a remedy, ensemble learning initializes the (greedy and heuristic) learning algorithm from multiple locations in <span class="math inline">\(\mathcal{H}\)</span>; as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-24">128</a>, three models are identified by the same algorithm starting from different initial points. Exploring multiple trajectories helps us find a robust coverage of the true model <span class="math inline">\(f\)</span>.</p> <p></p> <p> <span class="marginnote shownote"> <span style="display:block;" id="fig:f7-25"></span> <img src="graphics/7_EL_rep.png" alt="Ensemble learning approximates the true model with a combination of good models (representational perspective)" width="100%" /> Figure 129: Ensemble learning approximates the true model with a combination of good models (representational perspective) </span> </p> <p></p> <p><em>Representational perspective.</em> Due to the size of the dataset or the limitations of a model, sometimes the model space <span class="math inline">\(\mathcal{H}\)</span> does not cover the true model; i.e., in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-25">129</a> the true model lies outside the region of <span class="math inline">\(\mathcal{H}\)</span>. This is not uncommon in real-world problems: for example, linear models cannot learn nonlinear patterns, and decision trees have difficulty learning linear patterns. Using multiple base models may provide an approximation of the true model that is outside <span class="math inline">\(\mathcal{H}\)</span>, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-25">129</a>.</p> </div> <div id="analysis-of-the-decision-tree-random-forests-and-adaboost" class="section level3 unnumbered"> <h3>Analysis of the decision tree, random forests, and AdaBoost</h3> <p>The three models can be analyzed using the three perspectives; results are summarized in Table <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#tab:t8-threemodels">31</a>, and in-depth discussions are provided in the following.</p> <p><em>Single decision tree.</em> A single decision tree lacks the capability to overcome overfitting from any of the three perspectives. From the statistical perspective, a decision tree algorithm constructs each node using the maximum information gain <em>at that particular node only</em>; thus, random errors in the data may mislead subsequent splits. On the other hand, when the training dataset is limited, many models may perform equally well, since there are not enough data to distinguish these models. This results in a large <em>inner circle</em>, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-23">127</a>.
With the true model <span class="math inline">\(f\)</span> hidden in a large area of <span class="math inline">\(\mathcal{H}\)</span>, and with the learning algorithm sensitive to random noise in the data (an issue from the computational perspective), the learning algorithm may end up with a model far away from the true model <span class="math inline">\(f\)</span>.</p> <p></p> <p><span class="marginnote shownote"><span id="tab:t8-threemodels">Table 31: </span>Analysis of the decision tree (DT), random forests (RF), and AdaBoost using the three perspectives</span></p> <table> <thead> <tr class="header"> <th align="left"><em>Perspectives</em></th> <th align="left">DT</th> <th align="left">RF</th> <th align="left">AdaBoost</th> </tr> </thead> <tbody> <tr class="odd"> <td align="left">Statistical</td> <td align="left">No</td> <td align="left">Yes</td> <td align="left">No</td> </tr> <tr class="even"> <td align="left">Computational</td> <td align="left">No</td> <td align="left">Yes</td> <td align="left">Yes</td> </tr> <tr class="odd"> <td align="left">Representational</td> <td align="left">No</td> <td align="left">No</td> <td align="left">Yes</td> </tr> </tbody> </table> <p></p> <p>From the representational perspective, the decision tree model also has limitations; in <strong>Chapter 2</strong> we showed that it has difficulty modeling linear patterns in the data.</p> <p></p> <div class="figure fullwidth"><span style="display:block;" id="fig:f7-RF-analysis"></span> <img src="graphics/7_EL_rf.png" alt="Analysis of the random forest in terms of the statistical (left), computational (middle), and representational (right) perspectives" width="80%" /> <p class="caption marginnote shownote"> Figure 130: Analysis of the random forest in terms of the statistical (left), computational (middle), and representational (right) perspectives </p> </div> <p></p> <p><em>Random forests.</em> From the statistical perspective, the random forest model is a good ensemble learning model. As shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-RF-analysis">130</a> (left), the random forest model grows its base models so as to construct the <em>circle</em> of dotted line; models located in this circle have reasonably good accuracy. While these models may not be the best models with the greatest accuracy, they do provide a good coverage/approximation of the true model.</p> <p>Note that if we could directly build a model close to <span class="math inline">\(f\)</span>, or build many best models located in the circle of dotted line, that would be ideal. However, both tasks are challenging. Compared with these ideal goals, the random forest model is more pragmatic. It cleverly uses <em>simple</em><label for="tufte-sn-193" class="margin-toggle sidenote-number">193</label><input type="checkbox" id="tufte-sn-193" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">193</span> As we have seen, <em>simple</em> is a complex word.</span> techniques of <em>randomness</em>, i.e., the Bootstrap and the random selection of variables, that are robust, effective, and easy to implement. It grows a set of models that are not the best, but good models.
<p>The random forest model can also address the computational issue. As shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-RF-analysis">130</a> (middle), while the solid-line circle (i.e., the one that represents the space of the best models) is computationally difficult to reach, averaging multiple models could provide a good approximation.</p> <p>It seems that the random forest model does not actively solve the representational issue. If the true model <span class="math inline">\(f\)</span> lies outside <span class="math inline">\(\mathcal{H}\)</span>, as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-RF-analysis">130</a> (right), averaging multiple models won’t necessarily approximate the true model.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-29"></span> <img src="graphics/7_EL_adaBoost.png" alt="Analysis of the AdaBoost in terms of the representational perspective" width="100%" /> <!-- <p class="caption marginnote">-->Figure 131: Analysis of the AdaBoost in terms of the representational perspective<!--</p>--> <!--</div>--></span> </p> <p></p> <p><em>AdaBoost.</em> Similar to the random forest, AdaBoost solves the computational issue by generating many base models. The difference is that AdaBoost actively solves the representational issue, i.e., it tries to do better on the <em>hard</em> data points where the previous base models fail to predict correctly. For each base model in AdaBoost, the training dataset is not resampled by the Bootstrap, but weighted based on the error rates of the previous base models, i.e., data points that are difficult for the previous models to predict correctly are given more weight in the training dataset for the subsequent base model. Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-29">131</a> shows that this sequential learning process helps AdaBoost identify more models around the true model, and put more weight on the models that are closer to the true model.</p> <p>But AdaBoost is not as good as the random forest in terms of addressing the statistical issue. As AdaBoost aggressively solves the representational issue and allows its base models to be impacted by some <em>hard</em> data points<label for="tufte-sn-195" class="margin-toggle sidenote-number">195</label><input type="checkbox" id="tufte-sn-195" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">195</span> This is a common root cause for a model to overfit the training data: the model tries <em>too hard</em> on particular training data points.</span>, it is more likely to overfit, and may be less stable than the random forest models that place more emphasis on addressing the statistical issue.</p>
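<p>To see the reweighting mechanics in isolation, here is a minimal base-R sketch of one round of the weight update (a toy illustration with made-up labels; the full algorithm is spelled out in the Remarks section of this chapter):</p> <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># One round of AdaBoost's reweighting, with labels in {-1, +1}.
set.seed(3)
y <- sample(c(-1, 1), 10, replace = TRUE)     # true labels
h <- y; h[1:3] <- -h[1:3]                     # a weak model wrong on 3 points
w <- rep(1/10, 10)                            # current weights
eps <- sum(w * (h != y))                      # weighted error rate
alpha <- 0.5 * log((1 - eps)/eps)             # the model's vote weight
w <- w * exp(ifelse(h == y, -alpha, alpha))   # up-weight the mistakes
w <- w / sum(w)                               # normalize (the Z factor)
print(round(w, 3))  # the 3 hard points now carry more weight</code></pre></div>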
</div> <div id="r-lab-10" class="section level3 unnumbered"> <h3>R Lab</h3> <p>We use the AD dataset to study the decision tree (<code>rpart</code> package), random forests (<code>randomForest</code> package), and AdaBoost (<code>gbm</code> package).</p> <p>First, we evaluate the overall performance of the three models. Results are shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-30">132</a>, produced by the following R code.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-30"></span> <img src="graphics/7_30.png" alt="Boxplots of the classification error rates for single decision tree, random forest, and AdaBoost" width="100%" /> <!-- <p class="caption marginnote">-->Figure 132: Boxplots of the classification error rates for single decision tree, random forest, and AdaBoost<!--</p>--> <!--</div>--></span> </p> <p></p> <p></p> <div class="sourceCode" id="cb158"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb158-0"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-0" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(ggplot2) <span class="co"># needed below for theme_set() and ggplot()</span></span> <span id="cb158-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-1" aria-hidden="true" tabindex="-1"></a><span class="fu">theme_set</span>(<span class="fu">theme_gray</span>(<span class="at">base_size =</span> <span class="dv">15</span>))</span> <span id="cb158-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(randomForest)</span> <span id="cb158-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(gbm)</span> <span id="cb158-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(rpart)</span> <span id="cb158-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-5" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span> <span id="cb158-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-6" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(RCurl)</span> <span id="cb158-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-7" aria-hidden="true" tabindex="-1"></a>url <span class="ot"><-</span> <span class="fu">paste0</span>(<span class="st">"https://raw.githubusercontent.com"</span>,</span> <span id="cb158-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-8" aria-hidden="true" tabindex="-1"></a> <span class="st">"/analyticsbook/book/main/data/AD.csv"</span>)</span> <span id="cb158-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-9" aria-hidden="true" tabindex="-1"></a>data <span class="ot"><-</span> <span class="fu">read.csv</span>(<span class="at">text=</span><span class="fu">getURL</span>(url))</span> <span id="cb158-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-10" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-11" aria-hidden="true" tabindex="-1"></a>rm_indx <span class="ot"><-</span> <span class="fu">which</span>(<span class="fu">colnames</span>(data) <span class="sc">%in%</span> <span class="fu">c</span>(<span class="st">"ID"</span>, <span class="st">"TOTAL13"</span>,</span> <span id="cb158-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"MMSCORE"</span>))</span> <span id="cb158-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-13" aria-hidden="true" tabindex="-1"></a>data <span class="ot"><-</span> data[, <span class="sc">-</span>rm_indx]</span> <span id="cb158-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-14" aria-hidden="true" tabindex="-1"></a>data<span
class="sc">$</span>DX_bl <span class="ot"><-</span> <span class="fu">as.factor</span>(data<span class="sc">$</span>DX_bl)</span> <span id="cb158-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-15" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-16" aria-hidden="true" tabindex="-1"></a><span class="fu">set.seed</span>(<span class="dv">1</span>)</span> <span id="cb158-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-17" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-18" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb158-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-19" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (K <span class="cf">in</span> <span class="fu">c</span>(<span class="fl">0.2</span>, <span class="fl">0.3</span>, <span class="fl">0.4</span>, <span class="fl">0.5</span>, <span class="fl">0.6</span>, <span class="fl">0.7</span>)) {</span> <span id="cb158-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-20" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-21" aria-hidden="true" tabindex="-1"></a>testing.indices <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb158-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-22" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="dv">50</span>) {</span> <span id="cb158-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-23" aria-hidden="true" tabindex="-1"></a>testing.indices <span class="ot"><-</span> <span class="fu">rbind</span>(testing.indices, <span class="fu">sample</span>(<span class="fu">nrow</span>(data),</span> <span id="cb158-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">floor</span>((<span class="dv">1</span> <span class="sc">-</span> K) <span class="sc">*</span> <span class="fu">nrow</span>(data))))</span> <span id="cb158-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-25" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb158-26"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-26" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-27"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-27" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">nrow</span>(testing.indices)) {</span> <span id="cb158-28"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-28" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-29"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-29" aria-hidden="true" tabindex="-1"></a> testing.ix <span class="ot"><-</span> testing.indices[i, ]</span> <span id="cb158-30"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-30" aria-hidden="true" tabindex="-1"></a> target.testing <span class="ot"><-</span> data<span class="sc">$</span>DX_bl[testing.ix]</span> <span id="cb158-31"><a 
href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-31" aria-hidden="true" tabindex="-1"></a> </span> <span id="cb158-32"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-32" aria-hidden="true" tabindex="-1"></a> tree <span class="ot"><-</span> <span class="fu">rpart</span>(DX_bl <span class="sc">~</span> ., data[<span class="sc">-</span>testing.ix, ])</span> <span id="cb158-33"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-33" aria-hidden="true" tabindex="-1"></a> pred <span class="ot"><-</span> <span class="fu">predict</span>(tree, data[testing.ix, ], <span class="at">type =</span> <span class="st">"class"</span>)</span> <span id="cb158-34"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-34" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb158-35"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-35" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb158-36"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-36" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"tree"</span>, K, error))</span> <span id="cb158-37"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-37" aria-hidden="true" tabindex="-1"></a> </span> <span id="cb158-38"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-38" aria-hidden="true" tabindex="-1"></a> rf <span class="ot"><-</span> <span class="fu">randomForest</span>(DX_bl <span class="sc">~</span> ., data[<span class="sc">-</span>testing.ix, ])</span> <span id="cb158-39"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-39" aria-hidden="true" tabindex="-1"></a> pred <span class="ot"><-</span> <span class="fu">predict</span>(rf, data[testing.ix, ])</span> <span id="cb158-40"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-40" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span> </span> <span id="cb158-41"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-41" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb158-42"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-42" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"RF"</span>, K, error))</span> <span id="cb158-43"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-43" aria-hidden="true" tabindex="-1"></a> </span> <span id="cb158-44"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-44" aria-hidden="true" tabindex="-1"></a> data1 <span class="ot"><-</span> data</span> <span id="cb158-45"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-45" aria-hidden="true" tabindex="-1"></a> data1<span class="sc">$</span>DX_bl <span class="ot"><-</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(data1<span class="sc">$</span>DX_bl))</span> <span id="cb158-46"><a 
href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-46" aria-hidden="true" tabindex="-1"></a> boost <span class="ot"><-</span> <span class="fu">gbm</span>(DX_bl <span class="sc">~</span> ., <span class="at">data =</span> data1[<span class="sc">-</span>testing.ix, ],</span> <span id="cb158-47"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-47" aria-hidden="true" tabindex="-1"></a> <span class="at">dist =</span> <span class="st">"adaboost"</span>,<span class="at">interaction.depth =</span> <span class="dv">6</span>,</span> <span id="cb158-48"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-48" aria-hidden="true" tabindex="-1"></a> <span class="at">n.tree =</span> <span class="dv">2000</span>) <span class="co">#cv.folds = 5, </span></span> <span id="cb158-49"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-49" aria-hidden="true" tabindex="-1"></a> <span class="co"># best.iter <- gbm.perf(boost,method='cv')</span></span> <span id="cb158-50"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-50" aria-hidden="true" tabindex="-1"></a> pred <span class="ot"><-</span> <span class="fu">predict</span>(boost, data1[testing.ix, ], <span class="at">n.tree =</span> <span class="dv">2000</span>,</span> <span id="cb158-51"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-51" aria-hidden="true" tabindex="-1"></a> <span class="at">type =</span> <span class="st">"response"</span>) <span class="co"># best.iter n.tree = 400, </span></span> <span id="cb158-52"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-52" aria-hidden="true" tabindex="-1"></a> pred[pred <span class="sc">></span> <span class="fl">0.5</span>] <span class="ot"><-</span> <span class="dv">1</span></span> <span id="cb158-53"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-53" aria-hidden="true" tabindex="-1"></a> pred[pred <span class="sc"><=</span> <span class="fl">0.5</span>] <span class="ot"><-</span> <span class="dv">0</span></span> <span id="cb158-54"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-54" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb158-55"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-55" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb158-56"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-56" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"AdaBoost"</span>, K, error))</span> <span id="cb158-57"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-57" aria-hidden="true" tabindex="-1"></a> }</span> <span id="cb158-58"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-58" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb158-59"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-59" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">as.data.frame</span>(err.mat)</span> <span id="cb158-60"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-60" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(err.mat) <span class="ot"><-</span> <span class="fu">c</span>(<span 
class="st">"method"</span>, <span class="st">"training_percent"</span>, <span class="st">"error"</span>)</span> <span id="cb158-61"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-61" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> err.mat <span class="sc">%>%</span> <span class="fu">mutate</span>(<span class="at">training_percent =</span></span> <span id="cb158-62"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-62" aria-hidden="true" tabindex="-1"></a> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(training_percent)), <span class="at">error =</span></span> <span id="cb158-63"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-63" aria-hidden="true" tabindex="-1"></a> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(error)))</span> <span id="cb158-64"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-64" aria-hidden="true" tabindex="-1"></a></span> <span id="cb158-65"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-65" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span> <span class="fu">geom_boxplot</span>(<span class="at">data =</span> err.mat <span class="sc">%>%</span></span> <span id="cb158-66"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-66" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">training_percent =</span> <span class="fu">as.factor</span>(training_percent)), </span> <span id="cb158-67"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-67" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">y =</span> error, <span class="at">x =</span> training_percent,</span> <span id="cb158-68"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb158-68" aria-hidden="true" tabindex="-1"></a> <span class="at">color =</span> method)) <span class="sc">+</span> <span class="fu">geom_point</span>(<span class="at">size =</span> <span class="dv">3</span>)</span></code></pre></div> <p></p> <p>Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-30">132</a> shows that the decision tree is less accurate than the other two ensemble methods. The random forest has lower error rates than AdaBoost in general. As the training data size increases, the gap between random forest and AdaBoost decreases. This may indicate that when the training data size is small, the random forest is more stable due to its advantage of addressing the statistical issue. Overall, all models become better as the percentage of the training data increases.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-32"></span> <img src="graphics/7_32.png" alt=" Boxplots of the classification error rates for AdaBoost with a different number of trees" width="100%" /> <!-- <p class="caption marginnote">-->Figure 133: Boxplots of the classification error rates for AdaBoost with a different number of trees<!--</p>--> <!--</div>--></span> </p> <p></p> <p>We adjust the number of trees in AdaBoost and show the results in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-32">133</a>. It can be seen that the error rates first go down as the number of trees increases to <span class="math inline">\(400\)</span>. Then the error rates increase, and decrease again. 
The unstable relationship between the error rate and the number of trees indicates that AdaBoost is impacted by some particularities of the dataset and seems less robust than the random forest.</p> <p></p> <div class="sourceCode" id="cb159"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb159-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-1" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb159-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-2" aria-hidden="true" tabindex="-1"></a><span class="fu">set.seed</span>(<span class="dv">1</span>)</span> <span id="cb159-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-3" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">nrow</span>(testing.indices)) {</span> <span id="cb159-3a"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-3a" aria-hidden="true" tabindex="-1"></a> testing.ix <span class="ot"><-</span> testing.indices[i, ] <span class="co"># the i-th random split</span></span> <span id="cb159-3b"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-3b" aria-hidden="true" tabindex="-1"></a> target.testing <span class="ot"><-</span> data<span class="sc">$</span>DX_bl[testing.ix]</span> <span id="cb159-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-4" aria-hidden="true" tabindex="-1"></a> data1 <span class="ot"><-</span> data</span> <span id="cb159-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-5" aria-hidden="true" tabindex="-1"></a> data1<span class="sc">$</span>DX_bl <span class="ot"><-</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(data1<span class="sc">$</span>DX_bl))</span> <span id="cb159-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-6" aria-hidden="true" tabindex="-1"></a> ntree.v <span class="ot"><-</span> <span class="fu">c</span>(<span class="dv">200</span>, <span class="dv">300</span>, <span class="dv">400</span>, <span class="dv">500</span>, <span class="dv">600</span>, <span class="dv">800</span>, <span class="dv">1000</span>, <span class="dv">1200</span>,</span> <span id="cb159-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">1400</span>, <span class="dv">1600</span>, <span class="dv">1800</span>, <span class="dv">2000</span>)</span> <span id="cb159-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (j <span class="cf">in</span> ntree.v) {</span> <span id="cb159-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-9" aria-hidden="true" tabindex="-1"></a> boost <span class="ot"><-</span> <span class="fu">gbm</span>(DX_bl <span class="sc">~</span> ., <span class="at">data =</span> data1[<span class="sc">-</span>testing.ix, ],</span> <span id="cb159-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-10" aria-hidden="true" tabindex="-1"></a> <span class="at">dist =</span> <span class="st">"adaboost"</span>, <span class="at">interaction.depth =</span> <span class="dv">6</span>,</span> <span id="cb159-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-11" aria-hidden="true" tabindex="-1"></a> <span class="at">n.tree =</span> j)</span> <span id="cb159-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-12" aria-hidden="true" tabindex="-1"></a> <span class="co"># best.iter <- gbm.perf(boost,method='cv')</span></span> <span id="cb159-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-13" aria-hidden="true" tabindex="-1"></a> pred <span class="ot"><-</span> <span class="fu">predict</span>(boost, data1[testing.ix, ], <span class="at">n.tree =</span> j,</span>
id="cb159-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-14" aria-hidden="true" tabindex="-1"></a> <span class="at">type =</span> <span class="st">"response"</span>)</span> <span id="cb159-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-15" aria-hidden="true" tabindex="-1"></a> pred[pred <span class="sc">></span> <span class="fl">0.5</span>] <span class="ot"><-</span> <span class="dv">1</span></span> <span id="cb159-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-16" aria-hidden="true" tabindex="-1"></a> pred[pred <span class="sc"><=</span> <span class="fl">0.5</span>] <span class="ot"><-</span> <span class="dv">0</span></span> <span id="cb159-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-17" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb159-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-18" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb159-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-19" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"AdaBoost"</span>, j, error))</span> <span id="cb159-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-20" aria-hidden="true" tabindex="-1"></a> }</span> <span id="cb159-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-21" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb159-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-22" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">as.data.frame</span>(err.mat)</span> <span id="cb159-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-23" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(err.mat) <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"method"</span>, <span class="st">"num_trees"</span>, <span class="st">"error"</span>)</span> <span id="cb159-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-24" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> err.mat <span class="sc">%>%</span></span> <span id="cb159-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">num_trees =</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(num_trees)), </span> <span id="cb159-26"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-26" aria-hidden="true" tabindex="-1"></a> <span class="at">error =</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(error)))</span> <span id="cb159-27"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-27" aria-hidden="true" tabindex="-1"></a></span> <span id="cb159-28"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-28" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span> <span class="fu">geom_boxplot</span>(<span class="at">data =</span> err.mat <span class="sc">%>%</span> </span> <span id="cb159-29"><a 
href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-29" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">num_trees =</span> <span class="fu">as.factor</span>(num_trees)), </span> <span id="cb159-30"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-30" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">y =</span> error, <span class="at">x =</span> num_trees, <span class="at">color =</span> method)) <span class="sc">+</span></span> <span id="cb159-31"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb159-31" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">size =</span> <span class="dv">3</span>)</span></code></pre></div> <p></p> <p>We repeat the experiment on random forest and show the result in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-33">134</a>. Similar to AdaBoost, when the number of trees is small, the random forest has higher error rates. Then, the error rates decrease as more trees are added. And the error rates become stable when more trees are added. The random forest handles the statistical issue better than the AdaBoost.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-33"></span> <img src="graphics/7_33.png" alt=" Boxplots of the classification error rates for random forests with a different number of trees" width="100%" /> <!-- <p class="caption marginnote">-->Figure 134: Boxplots of the classification error rates for random forests with a different number of trees<!--</p>--> <!--</div>--></span> </p> <p></p> <p></p> <div class="sourceCode" id="cb160"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb160-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-1" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb160-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-2" aria-hidden="true" tabindex="-1"></a><span class="fu">set.seed</span>(<span class="dv">1</span>)</span> <span id="cb160-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-3" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">nrow</span>(testing.indices)) {</span> <span id="cb160-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-4" aria-hidden="true" tabindex="-1"></a>testing.ix <span class="ot"><-</span> testing.indices[i, ]</span> <span id="cb160-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-5" aria-hidden="true" tabindex="-1"></a>target.testing <span class="ot"><-</span> data<span class="sc">$</span>DX_bl[testing.ix]</span> <span id="cb160-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-6" aria-hidden="true" tabindex="-1"></a></span> <span id="cb160-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-7" aria-hidden="true" tabindex="-1"></a>ntree.v <span class="ot"><-</span> <span class="fu">c</span>(<span class="dv">5</span>, <span class="dv">10</span>, <span class="dv">50</span>, <span class="dv">100</span>, <span class="dv">200</span>, <span class="dv">400</span>, <span class="dv">600</span>, <span class="dv">800</span>, <span class="dv">1000</span>)</span> <span id="cb160-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-8" 
aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (j <span class="cf">in</span> ntree.v) {</span> <span id="cb160-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-9" aria-hidden="true" tabindex="-1"></a>rf <span class="ot"><-</span> <span class="fu">randomForest</span>(DX_bl <span class="sc">~</span> ., data[<span class="sc">-</span>testing.ix, ], <span class="at">ntree =</span> j)</span> <span id="cb160-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-10" aria-hidden="true" tabindex="-1"></a>pred <span class="ot"><-</span> <span class="fu">predict</span>(rf, data[testing.ix, ])</span> <span id="cb160-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-11" aria-hidden="true" tabindex="-1"></a>error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb160-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-12" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb160-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-13" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"RF"</span>, j, error))</span> <span id="cb160-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-14" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb160-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-15" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb160-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-16" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">as.data.frame</span>(err.mat)</span> <span id="cb160-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-17" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(err.mat) <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"method"</span>, <span class="st">"num_trees"</span>, <span class="st">"error"</span>)</span> <span id="cb160-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-18" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> err.mat <span class="sc">%>%</span> <span class="fu">mutate</span>(<span class="at">num_trees =</span></span> <span id="cb160-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(num_trees)), </span> <span id="cb160-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-20" aria-hidden="true" tabindex="-1"></a><span class="at">error =</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(error)))</span> <span id="cb160-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-21" aria-hidden="true" tabindex="-1"></a></span> <span id="cb160-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-22" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span> <span class="fu">geom_boxplot</span>(<span class="at">data =</span> </span> <span id="cb160-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-23" aria-hidden="true" tabindex="-1"></a> err.mat <span class="sc">%>%</span> 
<span class="fu">mutate</span>(<span class="at">num_trees =</span> <span class="fu">as.factor</span>(num_trees)), </span> <span id="cb160-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">y =</span> error, <span class="at">x =</span> num_trees, <span class="at">color =</span> method)) <span class="sc">+</span> </span> <span id="cb160-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb160-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">size =</span> <span class="dv">3</span>)</span></code></pre></div> <p></p> <p>Building on the result shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-33">134</a>, we pursue a further study of the behavior of random forest. Recall that, in random forest, there are two approaches to increase diversity, one is to Bootstrap samples for each tree, while another is to conduct random feature selection for splitting each node.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-36"></span> <img src="graphics/7_36.png" alt=" Boxplots of the classification error rates for random forest with a different sample sizes" width="100%" /> <!-- <p class="caption marginnote">-->Figure 135: Boxplots of the classification error rates for random forest with a different sample sizes<!--</p>--> <!--</div>--></span> </p> <p></p> <p>First, we investigate the effectiveness of the use of Bootstrap. We change the sampling strategy from <em>sampling with replacement</em> to <em>sampling without replacement</em> and change the sampling size<label for="tufte-sn-196" class="margin-toggle sidenote-number">196</label><input type="checkbox" id="tufte-sn-196" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">196</span> The sampling size is the sample size of the Bootstrapped dataset.</span> from <span class="math inline">\(10\%\)</span> to <span class="math inline">\(100\%\)</span>. The number of features tested at each node is kept at the default value, i.e., <span class="math inline">\(\sqrt{p}\)</span>, where <span class="math inline">\(p\)</span> is the number of features. 
Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-36">135</a> shows that the increased sample size has an impact on the error rates.</p> <p></p> <div class="sourceCode" id="cb161"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb161-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-1" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb161-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-2" aria-hidden="true" tabindex="-1"></a><span class="fu">set.seed</span>(<span class="dv">1</span>)</span> <span id="cb161-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-3" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">nrow</span>(testing.indices)) {</span> <span id="cb161-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-4" aria-hidden="true" tabindex="-1"></a> testing.ix <span class="ot"><-</span> testing.indices[i, ]</span> <span id="cb161-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-5" aria-hidden="true" tabindex="-1"></a> target.testing <span class="ot"><-</span> data<span class="sc">$</span>DX_bl[testing.ix]</span> <span id="cb161-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-6" aria-hidden="true" tabindex="-1"></a> </span> <span id="cb161-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-7" aria-hidden="true" tabindex="-1"></a> sample.size.v <span class="ot"><-</span> <span class="fu">seq</span>(<span class="fl">0.1</span>, <span class="dv">1</span>, <span class="at">by =</span> <span class="fl">0.1</span>)</span> <span id="cb161-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (j <span class="cf">in</span> sample.size.v) {</span> <span id="cb161-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-9" aria-hidden="true" tabindex="-1"></a> sample.size <span class="ot"><-</span> <span class="fu">floor</span>(<span class="fu">nrow</span>(data[<span class="sc">-</span>testing.ix, ]) <span class="sc">*</span> j)</span> <span id="cb161-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-10" aria-hidden="true" tabindex="-1"></a> rf <span class="ot"><-</span> <span class="fu">randomForest</span>(DX_bl <span class="sc">~</span> ., data[<span class="sc">-</span>testing.ix, ],</span> <span id="cb161-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-11" aria-hidden="true" tabindex="-1"></a> <span class="at">sampsize =</span> sample.size, </span> <span id="cb161-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-12" aria-hidden="true" tabindex="-1"></a> <span class="at">replace =</span> <span class="cn">FALSE</span>)</span> <span id="cb161-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-13" aria-hidden="true" tabindex="-1"></a> pred <span class="ot"><-</span> <span class="fu">predict</span>(rf, data[testing.ix, ])</span> <span id="cb161-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-14" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb161-15"><a 
href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-15" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb161-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-16" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"RF"</span>, j, error))</span> <span id="cb161-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-17" aria-hidden="true" tabindex="-1"></a> }</span> <span id="cb161-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-18" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb161-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-19" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">as.data.frame</span>(err.mat)</span> <span id="cb161-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-20" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(err.mat) <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"method"</span>, <span class="st">"sample_size"</span>, <span class="st">"error"</span>)</span> <span id="cb161-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-21" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> err.mat <span class="sc">%>%</span> <span class="fu">mutate</span>(<span class="at">sample_size =</span></span> <span id="cb161-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-22" aria-hidden="true" tabindex="-1"></a> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(sample_size)), </span> <span id="cb161-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-23" aria-hidden="true" tabindex="-1"></a> <span class="at">error =</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(error)))</span> <span id="cb161-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-24" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span> <span class="fu">geom_boxplot</span>(<span class="at">data =</span> err.mat <span class="sc">%>%</span> </span> <span id="cb161-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">mutate</span>(<span class="at">sample_size =</span> <span class="fu">as.factor</span>(sample_size)), </span> <span id="cb161-26"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-26" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">y =</span> error, <span class="at">x =</span> sample_size,<span class="at">color =</span> method)) <span class="sc">+</span> </span> <span id="cb161-27"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb161-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>(<span class="at">size =</span> <span class="dv">3</span>)</span></code></pre></div> <p></p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-35"></span> <img src="graphics/7_35.png" alt=" Boxplots of the classification error rates for random forest with a different number of features" width="100%" /> <!-- <p class="caption marginnote">-->Figure 136: Boxplots of the classification error rates for random forest with a different 
number of features<!--</p>--> <!--</div>--></span> </p> <p></p> <p>We then investigate the effectiveness of the random selection of features for node splitting. We fix the sampling size to be the same as the size of the original dataset, and change the number of features to be selected. Results are shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-35">136</a>. When the number of features reaches <span class="math inline">\(11\)</span>, the error rate starts to increase. This is probably because the trees lose diversity, i.e., the more features that are used, the less randomness is introduced into the trees.</p> <p></p> <div class="sourceCode" id="cb162"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb162-1"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-1" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="cn">NULL</span></span> <span id="cb162-2"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-2" aria-hidden="true" tabindex="-1"></a><span class="fu">set.seed</span>(<span class="dv">1</span>)</span> <span id="cb162-3"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-3" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> (i <span class="cf">in</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">nrow</span>(testing.indices)) {</span> <span id="cb162-4"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-4" aria-hidden="true" tabindex="-1"></a> testing.ix <span class="ot"><-</span> testing.indices[i, ]</span> <span id="cb162-5"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-5" aria-hidden="true" tabindex="-1"></a> target.testing <span class="ot"><-</span> data<span class="sc">$</span>DX_bl[testing.ix]</span> <span id="cb162-6"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-6" aria-hidden="true" tabindex="-1"></a> </span> <span id="cb162-7"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-7" aria-hidden="true" tabindex="-1"></a> num.fea.v <span class="ot"><-</span> <span class="dv">1</span><span class="sc">:</span>(<span class="fu">ncol</span>(data) <span class="sc">-</span> <span class="dv">1</span>)</span> <span id="cb162-8"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (j <span class="cf">in</span> num.fea.v) {</span> <span id="cb162-9"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-9" aria-hidden="true" tabindex="-1"></a> sample.size <span class="ot"><-</span> <span class="fu">nrow</span>(data[<span class="sc">-</span>testing.ix, ])</span> <span id="cb162-10"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-10" aria-hidden="true" tabindex="-1"></a> rf <span class="ot"><-</span> <span class="fu">randomForest</span>(DX_bl <span class="sc">~</span> ., data[<span class="sc">-</span>testing.ix, ],</span> <span id="cb162-11"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-11" aria-hidden="true" tabindex="-1"></a> <span class="at">mtry =</span> j, <span class="at">sampsize =</span> sample.size, </span> <span id="cb162-12"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-12" aria-hidden="true" tabindex="-1"></a> <span class="at">replace =</span> <span class="cn">FALSE</span>)</span> <span id="cb162-13"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-13" aria-hidden="true" tabindex="-1"></a> 
pred <span class="ot"><-</span> <span class="fu">predict</span>(rf, data[testing.ix, ])</span> <span id="cb162-14"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-14" aria-hidden="true" tabindex="-1"></a> error <span class="ot"><-</span> <span class="fu">length</span>(<span class="fu">which</span>(<span class="fu">as.character</span>(pred) <span class="sc">!=</span></span> <span id="cb162-15"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-15" aria-hidden="true" tabindex="-1"></a> target.testing))<span class="sc">/</span><span class="fu">length</span>(target.testing)</span> <span id="cb162-16"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-16" aria-hidden="true" tabindex="-1"></a> err.mat <span class="ot"><-</span> <span class="fu">rbind</span>(err.mat, <span class="fu">c</span>(<span class="st">"RF"</span>, j, error))</span> <span id="cb162-17"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-17" aria-hidden="true" tabindex="-1"></a> }</span> <span id="cb162-18"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-18" aria-hidden="true" tabindex="-1"></a>}</span> <span id="cb162-19"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-19" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> <span class="fu">as.data.frame</span>(err.mat)</span> <span id="cb162-20"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-20" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(err.mat) <span class="ot"><-</span> <span class="fu">c</span>(<span class="st">"method"</span>, <span class="st">"num_fea"</span>, <span class="st">"error"</span>)</span> <span id="cb162-21"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-21" aria-hidden="true" tabindex="-1"></a>err.mat <span class="ot"><-</span> err.mat <span class="sc">%>%</span> <span class="fu">mutate</span>(num_fea</span> <span id="cb162-22"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-22" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(num_fea)),</span> <span id="cb162-23"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-23" aria-hidden="true" tabindex="-1"></a> <span class="at">error =</span> <span class="fu">as.numeric</span>(<span class="fu">as.character</span>(error)))</span> <span id="cb162-24"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-24" aria-hidden="true" tabindex="-1"></a></span> <span id="cb162-25"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-25" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span> <span class="fu">geom_boxplot</span>(<span class="at">data =</span></span> <span id="cb162-26"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-26" aria-hidden="true" tabindex="-1"></a> err.mat <span class="sc">%>%</span> <span class="fu">mutate</span>(<span class="at">num_fea =</span> <span class="fu">as.factor</span>(num_fea)), </span> <span id="cb162-27"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">y =</span> error, <span class="at">x =</span> num_fea, <span class="at">color =</span> method)) <span class="sc">+</span></span> <span id="cb162-28"><a href="chapter-7.-learning-ii-svm-ensemble-learning.html#cb162-28" aria-hidden="true" tabindex="-1"></a> <span 
class="fu">geom_point</span>(<span class="at">size =</span> <span class="dv">3</span>)</span></code></pre></div> <p></p> </div> </div> <div id="remarks-5" class="section level2 unnumbered"> <h2>Remarks</h2> <div id="is-svm-a-more-complex-model" class="section level3 unnumbered"> <h3>Is SVM a more complex model?</h3> <p>In the preface of his seminar book<label for="tufte-sn-197" class="margin-toggle sidenote-number">197</label><input type="checkbox" id="tufte-sn-197" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">197</span> Vapnik, V., <em>The Nature of Statistical Learning Theory</em>, Springer, 2000.</span>, Vladimir Vapnik wrote that “<em>…during the last few years at different computer science conferences, I heard reiteration of the following claim: ‘Complex theories do not work, simple algorithms do’…this is not true…Nothing is more practical than a good theory…</em>.” He created the concept of <em>VC dimension</em> to specifically characterize his concept of the complexity of a model.</p> <p>A model is often <em>perceived</em> to be complex. The SVM model looks more complex than the linear regression model. It asks us to characterize the margin using model parameters, write the optimization formulation, learn the trick of kernel function, and understand the support vectors and the slack variables for the nonseparable case. But, don’t forget that the reason for a model to look simple is probably only because this model may presuppose stronger conditions, too strong that we forget they are assumptions.</p> <p>It is fair to say that a model is more complex if it provides more capacity to represent the statistical phenomena in the training data. In other words, a more complex model is more flexible to respond to subtle patterns in the data by adjusting itself. In this sense, SVM with kernel functions is a complex model since it can model nonlinearity in the data. But on the other hand, comparing the SVM model with other linear models as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-20">137</a>, it is hard to tell that the SVM model is simpler, but it is clear that it is more stubborn; because of its pursuit of maximum margin, it ends up with one model only. If you are looking for an example of an idea that is radical and conservative, flexible and disciplined, this is it.</p> <p></p> <div class="figure" style="text-align: center"><span style="display:block;" id="fig:f7-20"></span> <p class="caption marginnote shownote"> Figure 137: (Left) some other linear models; (b) the SVM model </p> <img src="graphics/7_20.png" alt="(Left) some other linear models; (b) the SVM model" width="80%" /> </div> <p></p> </div> <div id="is-svm-a-neural-network-model" class="section level3 unnumbered"> <h3>Is SVM a neural network model?</h3> <p>Another interesting fact about SVM is that, when it was developed, it was named “support vector network”<label for="tufte-sn-198" class="margin-toggle sidenote-number">198</label><input type="checkbox" id="tufte-sn-198" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">198</span> Cortes, C. and Vapnik, V., <em>Support-vector networks,</em> Machine Learning, Volume 20, Issue 3, Pages 273–297, 1995.</span>. In other words, it has a connection with the artificial neural network that will be discussed in <strong>Chapter 10</strong>. This is revealed in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-21">138</a>. 
</div> <div id="derivation-of-the-margin" class="section level3 unnumbered"> <h3>Derivation of the margin</h3> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-margin-proj"></span> <img src="graphics/7_margin_proj.png" alt="Illustration of how to derive the margin" width="100%" /> <!-- <p class="caption marginnote">-->Figure 139: Illustration of how to derive the margin<!--</p>--> <!--</div>--></span> </p> <p></p> <p>Consider any two points on the two margins, e.g., the <span class="math inline">\(\boldsymbol{x}_A\)</span> and <span class="math inline">\(\boldsymbol{x}_B\)</span> in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-margin-proj">139</a>. The <em>margin width</em> is equal to the projection of the vector <span class="math inline">\(\overrightarrow{A B} = \boldsymbol{x}_B - \boldsymbol{x}_A\)</span> on the direction <span class="math inline">\(\boldsymbol{w}\)</span>, which is</p> <p><span class="math display" id="eq:7-marginpre">\[\begin{equation} \small \text{margin } = \frac{ (\boldsymbol{x}_B - \boldsymbol{x}_A) \cdot \vec{\boldsymbol{w}}}{\|\boldsymbol{w}\|}. \tag{81} \end{equation}\]</span></p> <p>It is known that</p> <p><span class="math display">\[\begin{equation*} \small \boldsymbol{w}^{T} \boldsymbol{x}_B + b =1, \end{equation*}\]</span></p> <p>and</p> <p><span class="math display">\[\begin{equation*} \small \boldsymbol{w}^{T} \boldsymbol{x}_A + b = -1. \end{equation*}\]</span></p> <p>Thus, Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-marginpre">(81)</a> is rewritten as</p> <p><span class="math display" id="eq:7-margin">\[\begin{equation} \small \text{margin } = \frac{2}{\|\boldsymbol{w}\|}. \tag{82} \end{equation}\]</span></p> </div> <div id="why-the-nonzero-alpha_n-are-the-support-vectors" class="section level3 unnumbered"> <h3>Why the nonzero <span class="math inline">\(\alpha_n\)</span> are the support vectors</h3> <p>Theoretically, to understand why the nonzero <span class="math inline">\(\alpha_n\)</span> are the support vectors, we can use the <em>Karush–Kuhn–Tucker (KKT) conditions</em><label for="tufte-sn-199" class="margin-toggle sidenote-number">199</label><input type="checkbox" id="tufte-sn-199" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">199</span> Bertsekas, D., <em>Nonlinear Programming: 3rd Edition</em>, Athena Scientific, 2016.</span>. Based on <em>complementary slackness</em>, one of the KKT conditions, the following equations must hold</p> <p><span class="math display">\[\begin{equation*} \small \alpha_{n}\left[y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right)-1\right]=0 \text {, for } n=1,2, \dots, N. 
\end{equation*}\]</span></p>
<p>Thus, for any data point <span class="math inline">\(\boldsymbol{x}_n\)</span>, one of the two cases must hold: either</p>
<p><span class="math display">\[\begin{equation*} \small \alpha_{n} = 0 \text {, and } y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right)-1 \neq 0; \end{equation*}\]</span></p>
<p>or</p>
<p><span class="math display">\[\begin{equation*} \small \alpha_{n} \neq 0 \text {, and } y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right)-1 = 0. \end{equation*}\]</span></p>
<p>Revisiting Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-5regions">(58)</a> or Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-5regions">119</a>, we know that only the support vectors have <span class="math inline">\(\alpha_{n} \neq 0\)</span> and <span class="math inline">\(y_{n}\left(\boldsymbol{w}^{T} \boldsymbol{x}_{n}+b\right)-1 = 0\)</span>.</p>
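<p>This can also be checked numerically. Below is a minimal sketch using the <code>kernlab</code> package; the toy two-class data here is an assumption made for illustration, and the accessors <code>alpha()</code>, <code>alphaindex()</code>, <code>coef()</code>, and <code>xmatrix()</code> are from <code>kernlab</code>. After fitting a linear SVM, only the support vectors carry nonzero <span class="math inline">\(\alpha_n\)</span>, and the <span class="math inline">\(\hat{\boldsymbol{w}}\)</span> recovered from them reproduces the margin of Eq. <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#eq:7-margin">(82)</a>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">require( 'kernlab' )
# toy separable data: two Gaussian clouds (for illustration only)
set.seed(1)
df &lt;- data.frame(x1 = c(rnorm(20, 2), rnorm(20, -2)),
                 x2 = c(rnorm(20, 2), rnorm(20, -2)),
                 y  = factor(c(rep(1, 20), rep(-1, 20))))
linear.svm &lt;- ksvm(y ~ ., data = df, type = 'C-svc',
                   kernel = 'vanilladot', C = 100, scale = c())
alphaindex(linear.svm)  # indices of the support vectors: alpha_n != 0
alpha(linear.svm)       # their (nonzero) alpha values
# recover w from the support vectors, then the margin width 2/||w||
w.hat &lt;- colSums(coef(linear.svm)[[1]] * xmatrix(linear.svm)[[1]])
2 / sqrt(sum(w.hat^2))</code></pre></div>
<p></p>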
</div>
<div id="adaboost-algorithm" class="section level3 unnumbered">
<h3>AdaBoost algorithm</h3>
<p>The specifics of the AdaBoost algorithm shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-AdaBoost">126</a> are described below.</p>
<ul>
<li><p>Input: <span class="math inline">\(N\)</span> data points, <span class="math inline">\(\left(\boldsymbol{x}_{1}, y_{1}\right),\left(\boldsymbol{x}_{2}, y_{2}\right), \ldots,\left(\boldsymbol{x}_{N}, y_{N}\right)\)</span>.</p></li>
<li><p>Initialization: Initialize equal weights for all data points <span class="math display">\[\begin{equation*} \small \boldsymbol{w}_{0}=\left(\frac{1}{N}, \ldots, \frac{1}{N}\right). \end{equation*}\]</span></p></li>
<li><p>At iteration <span class="math inline">\(t\)</span>:</p></li>
</ul>
<ul>
<li><p>Step 1: Build model <span class="math inline">\(h_t\)</span> on the dataset with weights <span class="math inline">\(\boldsymbol{w}_{t-1}\)</span>.</p></li>
<li><p>Step 2: Calculate the weighted error of <span class="math inline">\(h_t\)</span> <span class="math display">\[\begin{equation*} \small \epsilon_{t}=\sum_{n=1}^{N} w_{t-1, n} \mathbb{1}\left\{h_{t}\left(\boldsymbol{x}_{n}\right) \neq y_{n}\right\}, \end{equation*}\]</span> where <span class="math inline">\(\mathbb{1}\{\cdot\}\)</span> is the indicator function.</p></li>
<li><p>Step 3: Update the weights of the data points <span class="math display">\[\begin{equation*} \small w_{t, n}=\frac{w_{t-1, n}}{Z_{t}} \times \left\{\begin{array}{c}{e^{-\alpha_{t}} \text { if } h_{t}\left(\boldsymbol{x}_{n}\right)=y_{n}} \\ {e^{\alpha_{t}} \text { if } h_{t}\left(\boldsymbol{x}_{n}\right) \neq y_{n}}.\end{array} \right. \end{equation*}\]</span> Here, <span class="math inline">\(Z_{t}\)</span> is a normalization factor so that <span class="math inline">\(\sum_{n=1}^{N} w_{t, n}=1\)</span>, and <span class="math display">\[\begin{equation*} \small \alpha_{t}=\frac{1}{2} \ln \left(\frac{1-\epsilon_{t}}{\epsilon_{t}}\right). \end{equation*}\]</span></p></li>
</ul>
<ul>
<li><p>Iterations: Repeat Step 1 to Step 3 for <span class="math inline">\(T\)</span> times to get <span class="math inline">\(h_1, h_2, \ldots, h_T\)</span>.</p></li>
<li><p>Output: <span class="math display">\[\begin{equation*} \small H(\boldsymbol{x})=\operatorname{sign}\left(\sum_{t=1}^{T} \alpha_{t} h_{t}(\boldsymbol{x})\right). \end{equation*}\]</span></p></li>
</ul>
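<p>To make these steps concrete, below is a minimal R sketch of the algorithm. It is an illustration rather than the book’s R pipeline: the base models are <code>rpart</code> decision stumps, and the function names (<code>adaboost.sketch</code>, <code>adaboost.predict</code>) and the label coding as <span class="math inline">\(-1/+1\)</span> are assumptions made here.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">require( 'rpart' )
# a minimal AdaBoost sketch (for illustration): x is a data frame of
# predictors, y a vector of labels coded as -1/+1, T the number of rounds
adaboost.sketch &lt;- function(x, y, T = 20) {
  N &lt;- length(y)
  w &lt;- rep(1/N, N)                     # initialization: equal weights
  df &lt;- data.frame(x, y = factor(y))
  models &lt;- list(); alphas &lt;- numeric(T)
  for (t in 1:T) {
    # Step 1: build a decision stump on the weighted dataset
    h &lt;- rpart(y ~ ., data = df, weights = w,
               control = rpart.control(maxdepth = 1))
    pred &lt;- ifelse(predict(h, df, type = 'class') == '1', 1, -1)
    # Step 2: weighted error (assumed to stay strictly between 0 and 1)
    eps &lt;- sum(w * (pred != y))
    alpha &lt;- 0.5 * log((1 - eps)/eps)
    # Step 3: re-weight the data points and normalize (division by Z_t)
    w &lt;- w * exp(-alpha * y * pred)    # e^{-alpha} if correct, e^{alpha} if not
    w &lt;- w / sum(w)
    models[[t]] &lt;- h; alphas[t] &lt;- alpha
  }
  list(models = models, alphas = alphas)
}
# Output: the sign of the weighted sum of the base models
adaboost.predict &lt;- function(fit, x.new) {
  scores &lt;- sapply(seq_along(fit$models), function(t) {
    fit$alphas[t] * ifelse(predict(fit$models[[t]], data.frame(x.new),
                                   type = 'class') == '1', 1, -1) })
  sign(rowSums(matrix(scores, ncol = length(fit$models))))
}</code></pre></div>
<p></p>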
<p>When all the base models are trained, the aggregated prediction on a data instance <span class="math inline">\(\boldsymbol{x}\)</span> is a weighted sum of the base models</p>
<p><span class="math display">\[\begin{equation*} \small h(\boldsymbol{x})=\sum_{i} \gamma_{i} h_{i}(\boldsymbol{x}), \end{equation*}\]</span></p>
<p>where the weight <span class="math inline">\(\gamma_{i}\)</span> reflects the accuracy of <span class="math inline">\(h_{i}(\boldsymbol{x})\)</span> on the training dataset; in AdaBoost, <span class="math inline">\(\gamma_{i}\)</span> is the <span class="math inline">\(\alpha_{i}\)</span> derived above.</p>
</div>
</div>
<div id="exercises-5" class="section level2 unnumbered">
<h2>Exercises</h2>
<p></p>
<p>
<span class="marginnote shownote"> <span style="display:block;" id="fig:f7-hw-sv"></span> <img src="graphics/7_hw_sv.png" alt="How many support vectors are needed?" width="100%" /> Figure 140: How many support vectors are needed?</span>
</p>
<p></p>
<p>1. To build a linear SVM on the data shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-hw-sv">140</a>, how many support vectors are needed (use visual inspection)?</p>
<p>2. Let’s consider the dataset in Table <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#tab:t7-hw-svm">32</a>. Please (a) draw scatterplots and identify the support vectors if you’d like to build a linear SVM classifier; (b) manually derive the alpha values (i.e., the <span class="math inline">\(\alpha_i\)</span>) for the support vectors and the offset parameter <span class="math inline">\(b\)</span>; (c) derive the weight vector (i.e., the <span class="math inline">\(\hat{\boldsymbol{w}}\)</span>) of the SVM model; and (d) predict on the new dataset and fill in the column of <span class="math inline">\(y\)</span> in Table <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#tab:t7-hw-svm-test">33</a>.</p>
<p></p>
<p><span class="marginnote shownote"><span id="tab:t7-hw-svm">Table 32: </span>Dataset for building an SVM model in Q2</span></p>
<table> <thead> <tr class="header"> <th align="left">ID</th> <th align="left"><span class="math inline">\(x_1\)</span></th> <th align="left"><span class="math inline">\(x_2\)</span></th> <th align="left"><span class="math inline">\(x_3\)</span></th> <th align="left"><span class="math inline">\(y\)</span></th> </tr> </thead> <tbody> <tr class="odd"> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(4\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> </tr> <tr class="even"> <td align="left"><span class="math inline">\(2\)</span></td> <td align="left"><span class="math inline">\(4\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> </tr> <tr class="odd"> <td align="left"><span class="math inline">\(3\)</span></td> <td align="left"><span class="math inline">\(8\)</span></td> <td align="left"><span class="math inline">\(2\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> </tr> <tr class="even"> <td align="left"><span class="math
inline">\(4\)</span></td> <td align="left"><span class="math inline">\(-2.5\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> </tr> <tr class="odd"> <td align="left"><span class="math inline">\(5\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> </tr> <tr class="even"> <td align="left"><span class="math inline">\(6\)</span></td> <td align="left"><span class="math inline">\(-0.3\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> </tr> <tr class="odd"> <td align="left"><span class="math inline">\(7\)</span></td> <td align="left"><span class="math inline">\(2.5\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> </tr> <tr class="even"> <td align="left"><span class="math inline">\(8\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(0\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> </tr> </tbody> </table> <p></p> <p></p> <p><!-- <caption>--><span class="marginnote shownote"><span id="tab:t7-hw-svm-test">Table 33: </span>Test data points for the SVM model in Q2</span><!--</caption>--></p> <table> <thead> <tr class="header"> <th align="left">ID</th> <th align="left"><span class="math inline">\(x_1\)</span></th> <th align="left"><span class="math inline">\(x_2\)</span></th> <th align="left"><span class="math inline">\(x_3\)</span></th> <th align="left"><span class="math inline">\(y\)</span></th> </tr> </thead> <tbody> <tr class="odd"> <td align="left"><span class="math inline">\(9\)</span></td> <td align="left"><span class="math inline">\(5.4\)</span></td> <td align="left"><span class="math inline">\(1.2\)</span></td> <td align="left"><span class="math inline">\(2\)</span></td> <td align="left"></td> </tr> <tr class="even"> <td align="left"><span class="math inline">\(10\)</span></td> <td align="left"><span class="math inline">\(1.5\)</span></td> <td align="left"><span class="math inline">\(-2\)</span></td> <td align="left"><span class="math inline">\(3\)</span></td> <td align="left"></td> </tr> <tr class="odd"> <td align="left"><span class="math inline">\(11\)</span></td> <td align="left"><span class="math inline">\(-3.4\)</span></td> <td align="left"><span class="math inline">\(1\)</span></td> <td align="left"><span class="math inline">\(-2\)</span></td> <td align="left"></td> </tr> <tr class="even"> <td align="left"><span class="math inline">\(12\)</span></td> <td align="left"><span class="math inline">\(-2.2\)</span></td> <td align="left"><span class="math inline">\(-1\)</span></td> <td align="left"><span class="math inline">\(-4\)</span></td> <td align="left"></td> </tr> </tbody> </table> <p></p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-hw-2class"></span> <img src="graphics/7_hw_2class.png" alt="A dataset with two classes" 
width="80%" /> <!-- <p class="caption marginnote">-->Figure 141: A dataset with two classes<!--</p>--> <!--</div>--></span> </p> <p> 3. Follow up on the dataset used in Q2. Use the R pipeline for SVM on this data. Compare the alpha values (i.e., the <span class="math inline">\(\alpha_i\)</span>), the offset parameter <span class="math inline">\(b\)</span>, and the weight vector (i.e., the <span class="math inline">\(\hat{\boldsymbol{w}}\)</span>) from R and the result by your manual calculation in Q2.</p> <p></p> <p> <span class="marginnote shownote"> <!-- <div class="figure">--><span style="display:block;" id="fig:f7-svm-visual"></span> <img src="graphics/7_svm_visual.png" alt="Visualization of the decision boundary of an SVM model with Gaussian kernel" width="80%" /> <!-- <p class="caption marginnote">-->Figure 142: Visualization of the decision boundary of an SVM model with Gaussian kernel<!--</p>--> <!--</div>--></span> </p> <p> 4. Modify the R pipeline for Bootstrap and incorporate the <code>glm</code> package to write your own version of ensemble learning that ensembles a set of logistic regression models. Test it using the same data that has been used in the R lab for logistic regression models.</p> <p>5. Use the dataset <code>PimaIndiansDiabetes2</code> in the <code>mlbench</code> R package, run the R SVM pipeline on it, and summarize your findings.</p> <p>6. Use R to generate a dataset with two classes as shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-hw-2class">141</a>. Then, run SVM model with a properly selected kernel function on this dataset.</p> <p>7. Follow up on the dataset generated in Q6. Try visualizing the decision boundaries by different kernel functions such as linear, Laplace, Gaussian, and polynomial kernel functions. Below is one example using Gaussian kernel with its bandiwidth parameter <span class="math inline">\(\gamma = 0.2\)</span>.<label for="tufte-sn-200" class="margin-toggle sidenote-number">200</label><input type="checkbox" id="tufte-sn-200" class="margin-toggle"><span class="sidenote"><span class="sidenote-number">200</span> In the following R code, the bandiwidth parameter is specified as <code>sigma=0.2</code>.</span> Result is shown in Figure <a href="chapter-7.-learning-ii-svm-ensemble-learning.html#fig:f7-svm-visual">142</a>. 
<p>Please follow this example and visualize the linear, Laplace, Gaussian, and polynomial kernel functions with different parameter values.</p>
</div>
</div>
<p style="text-align: center;">
<a href="chapter-6.-diagnosis-residuals-heterogeneity.html"><button class="btn btn-default">Previous</button></a> <a href="chapter-8.-scalability-lasso-pca.html"><button class="btn btn-default">Next</button></a>
</p>
</div>
</div>
<script src="js/jquery.js"></script>
<script src="js/tablesaw-stackonly.js"></script>
<script src="js/nudge.min.js"></script>
<script>
// add bootstrap table styles to pandoc tables
$(document).ready(function () {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});
</script>
</body>
</html>