{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Programming Exercise 7:\n", "# K-means Clustering and Principal Component Analysis\n", "\n", "## Introduction\n", "\n", "In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis (PCA) to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.\n", "\n", "All the information you need for solving this assignment is in this notebook, and all the code you write goes in this notebook. The assignment can be submitted to the Coursera grader directly from this notebook (code and instructions are included below).\n", "\n", "Before we begin with the exercises, we need to import all libraries required for this programming exercise. Throughout the course, we will be using [`numpy`](http://www.numpy.org/) for all array and matrix operations, [`matplotlib`](https://matplotlib.org/) for plotting, and [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation functions and tools. You can find instructions on how to install the required libraries in the README file in the [GitHub repository](https://github.com/dibgerge/ml-coursera-python-assignments)."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# used for manipulating directory paths\n", "import os\n", "\n", "# Scientific and vector computation for Python\n", "import numpy as np\n", "\n", "# Regular expressions\n", "import re\n", "\n", "# Plotting library\n", "from matplotlib import pyplot\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import matplotlib as mpl\n", "\n", "from IPython.display import HTML, display, clear_output\n", "\n", "try:\n", " pyplot.rcParams[\"animation.html\"] = \"jshtml\"\n", "except ValueError:\n", " pyplot.rcParams[\"animation.html\"] = \"html5\"\n", "\n", "# Optimization module in scipy\n", "from scipy import optimize\n", "\n", "# used to load data files in MATLAB .mat format\n", "from scipy.io import loadmat\n", "\n", "# library written for this exercise, providing functions for assignment submission and other utilities\n", "import utils\n", "\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "# define the submission/grader object for this exercise\n", "grader = utils.Grader()\n", "\n", "# tells matplotlib to embed plots within the notebook\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submission and Grading\n", "\n", "\n", "After completing each part of the assignment, be sure to submit your solutions to the grader.
The following is a breakdown of how each part of this exercise is scored.\n", "\n", "\n", "| Section | Part | Submitted Function | Points |\n", "| :- |:- |:- | :-: |\n", "| 1 | [Find Closest Centroids](#section1) | [`findClosestCentroids`](#findClosestCentroids) | 30 |\n", "| 2 | [Compute Centroid Means](#section2) | [`computeCentroids`](#computeCentroids) | 30 |\n", "| 3 | [PCA](#section3) | [`pca`](#pca) | 20 |\n", "| 4 | [Project Data](#section4) | [`projectData`](#projectData) | 10 |\n", "| 5 | [Recover Data](#section5) | [`recoverData`](#recoverData) | 10 |\n", "| | Total Points | | 100 |\n", "\n", "\n", "You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.\n", "\n", "
\n", " | \n", " |