{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-input", "output-scoll" ] }, "outputs": [], "source": [ "#Install the necessary dependencies\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython python_utils" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Managing data\n", "\n", "## Build a regression model using Scikit-learn: prepare and visualize data\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-regression/data-visualization.png\n", "---\n", "name: 'Data visualization infographic'\n", "width: 90%\n", "---\n", "Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)\n", ":::\n", "\n", "## Introduction\n", "\n", "Now that you are set up with the tools you need to start tackling machine learning model building with Scikit-learn, you are ready to start asking questions of your data. 
As you work with data and apply ML solutions, it's important to understand how to ask the right questions to properly unlock the potential of your dataset.\n", "\n", "In this section, you will learn:\n", "\n", "- How to prepare your data for model building.\n", "- How to use Matplotlib for data visualization.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ":::{seealso}\n", "Click to watch: Preparing and Visualizing data video.\n", ":::" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-input", "output-scoll" ] }, "outputs": [ { "data": { "text/html": [ "\n", "
| | City Name | Type | Package | Variety | Sub Variety | Grade | Date | Low Price | High Price | Mostly Low | ... | Unit of Sale | Quality | Condition | Appearance | Storage | Crop | Repack | Trans Mode | Unnamed: 24 | Unnamed: 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BALTIMORE | NaN | 24 inch bins | NaN | NaN | NaN | 4/29/17 | 270.0 | 280.0 | 270.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | E | NaN | NaN | NaN |
| 1 | BALTIMORE | NaN | 24 inch bins | NaN | NaN | NaN | 5/6/17 | 270.0 | 280.0 | 270.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | E | NaN | NaN | NaN |
| 2 | BALTIMORE | NaN | 24 inch bins | HOWDEN TYPE | NaN | NaN | 9/24/16 | 160.0 | 160.0 | 160.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | N | NaN | NaN | NaN |
| 3 | BALTIMORE | NaN | 24 inch bins | HOWDEN TYPE | NaN | NaN | 9/24/16 | 160.0 | 160.0 | 160.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | N | NaN | NaN | NaN |
| 4 | BALTIMORE | NaN | 24 inch bins | HOWDEN TYPE | NaN | NaN | 11/5/16 | 90.0 | 100.0 | 90.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | N | NaN | NaN | NaN |

5 rows × 26 columns
\n", "