{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Week 4: Supervised Learning\n", "\n", "The end of week four marked the half way point of the structured curriculum portion of the program. The entire cohort is starting to get the drill down. Learn about a topic in the morning, implement it by hand to see the nuts a bolts, and then finish up with the sklearn version to validate our results. \n", "\n", "Supervised Learning is all about classifying data to a category, or most often, to a 0 or a 1. For example, if you have data about high school students and you know if they got into college or not, you can use a model to predict whether a current high school student will get into a college in the future.\n", "\n", "Its great to see topics that once were mystical, magic, and only for the technical minds, come into light. Random Forest for example, I could remember attending data talks at the beginning of the year and have no idea what they were talking about. Decision trees? Forget about it. However, honestly in just these four weeks we have broke it down piece by piece and are being taught a full picture. I couldn't whiteboard a proof for you about all of the equations, but I can tell you how and why you need to tune various parameters to get the result that you want. When it comes to business application, you need to know when you're overfitting and how to adjust each model to minimize error on your test set.\n", "\n", "I say this every week, but I'm going to have to say it again. It's awesome to learn about technical models, and then apply those same applications to everyday life. For example, decision trees. When ever you come up on a choice, or are trying to classify a group/experience/product...anything really, you naturally think to yourself, \"What matters most if I wanted to divide this group up?\" Once you find out what feature you think is most important, intuition tells me, \"ok now within that feature...where should I set my threshold to divide the set up?\"\n", "\n", "For a concrete example, imagine I gave you a list of 10,000 people along with their height, weight, gpa, iq, and eye color. Then I told you, tell me which feature and what threshold to split on if I wanted to find out which one's were basketball players and which ones were not. I'm guessing most of you would pick height as the most important feature. Then we have to pick a threshold of where to split the group. I'm going to guess (very general) that anyone under 6ft isn't going to make an awesome basketball player. So split the group there and you'll have yourself two groups that are a lot more defined to answer your question. Do this over and over again to classify your whole group!\n", "\n", "Next, on to week 5, Natural Language Processing and Web Work. \n", "\n", "Topics of the week:\n", "1. kNN\n", "2. Decision Trees\n", "3. Entropy/Information Gain/Gini Impurity\n", "4. Random Forest\n", "5. Bagging/Boosting/Testing with Out Of Bag observations\n", "6. Maximum Margin/Support Vector Classifier/SVM/Tuning with Kernals\n", "7. Gradient Boosting/AdaBoosting\n", "8. 
Profit Curves\n", "\n", "Check out [week three here](https://github.com/gkamradt/Zipfian-GSchool/blob/master/Week_3_Regression.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }