{ "cells": [ { "cell_type": "markdown", "id": "57d33512", "metadata": {}, "source": [ "# Lesson 11 activity solution\n", "\n", "**Note**: we're not done yet! These plots fulfill the requirements, but in many cases we can do a lot better!\n", "\n", "## Problem statement:\n", "Analyze the housing dataset using various types of plots to gain insights into the data.\n", "\n", "Housing sales dataset available here: [housing_data.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/housing_data.csv)\n", "\n", "## Steps to perform:\n", "1. Create a line plot to visualize the trend of house prices over the years.\n", "2. Use a scatter plot to visualize the relationship between two numerical variables, such as `LotArea` and `SalePrice`.\n", "3. Create a bar chart to show the count of houses in each `Neighborhood`.\n", "4. Use a box plot to visualize the distribution of `SalePrice` in each `Neighborhood`.\n", "5. Create a pie chart to visualize the proportion of houses that fall into each `MSZoning` category.\n", "6. Use a 3D scatter plot to visualize `LotArea`, `OverallQual`, and `SalePrice` together." ] }, { "cell_type": "code", "execution_count": 41, "id": "85728eb5", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.ticker import FuncFormatter\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "id": "05c072d6", "metadata": {}, "source": [ "## Load the dataset" ] }, { "cell_type": "code", "execution_count": 42, "id": "3d78949f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Unnamed: 0 | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SC60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | No | No | No | 0 | Feb | 2008 | WD | Normal | 208500 |
| 1 | 1 | SC20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | No | No | No | 0 | May | 2007 | WD | Normal | 181500 |
| 2 | 2 | SC60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | No | No | No | 0 | Sep | 2008 | WD | Normal | 223500 |
| 3 | 3 | SC70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | No | No | No | 0 | Feb | 2006 | WD | Abnorml | 140000 |
| 4 | 4 | SC60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | No | No | No | 0 | Dec | 2008 | WD | Normal | 250000 |
5 rows × 81 columns
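Steps 1 to 4 of the problem statement can be sketched as small helper functions that each draw on one Matplotlib `Axes`. This is a minimal sketch, not the lesson's own solution code: the function names are hypothetical, it assumes the data is already in a DataFrame `df` with `YrSold`, `SalePrice`, `LotArea`, and `Neighborhood` columns, and it assumes step 1's "over the years" means year of sale (`YrSold`).

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_price_trend(df: pd.DataFrame, ax: plt.Axes) -> pd.Series:
    """Step 1: line plot of the median SalePrice per year of sale (assumes YrSold)."""
    # Aggregate first: a line drawn through every raw row would zig-zag
    # between houses sold in the same year.
    trend = df.groupby("YrSold")["SalePrice"].median()
    ax.plot(trend.index, trend.values, marker="o")
    ax.set(xlabel="Year sold", ylabel="Median sale price", title="Sale price trend")
    return trend


def plot_area_vs_price(df: pd.DataFrame, ax: plt.Axes) -> None:
    """Step 2: scatter plot of LotArea against SalePrice."""
    # Small, semi-transparent markers keep overlapping points readable.
    ax.scatter(df["LotArea"], df["SalePrice"], s=8, alpha=0.4)
    ax.set(xlabel="Lot area", ylabel="Sale price", title="Lot area vs. sale price")


def plot_neighborhood_counts(df: pd.DataFrame, ax: plt.Axes) -> pd.Series:
    """Step 3: bar chart of the number of houses in each Neighborhood."""
    # value_counts() returns counts sorted from largest to smallest.
    counts = df["Neighborhood"].value_counts()
    ax.bar(list(counts.index), counts.values)
    ax.tick_params(axis="x", labelrotation=90)
    ax.set(xlabel="Neighborhood", ylabel="Houses", title="Houses per neighborhood")
    return counts


def plot_price_by_neighborhood(df: pd.DataFrame, ax: plt.Axes) -> list[str]:
    """Step 4: box plot of SalePrice per Neighborhood, ordered by median price."""
    order = df.groupby("Neighborhood")["SalePrice"].median().sort_values().index.tolist()
    # One array of prices per neighborhood, in the order computed above.
    groups = [df.loc[df["Neighborhood"] == n, "SalePrice"].to_numpy() for n in order]
    ax.boxplot(groups)
    # boxplot places boxes at x = 1..n; label those positions.
    ax.set_xticks(range(1, len(order) + 1))
    ax.set_xticklabels(order, rotation=90)
    ax.set(xlabel="Neighborhood", ylabel="Sale price", title="Sale price by neighborhood")
    return order
```

With the full dataset, loading `df` with `pd.read_csv` from the `housing_data.csv` link above and creating `fig, axes = plt.subplots(2, 2, figsize=(14, 10))` lets each helper fill one panel. Ordering the box plot by median makes the price ranking across neighborhoods readable at a glance.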
| MSZoning | Houses | Percentage |
|---|---|---|
| RL | 1151 | 78.8% |
| RM | 218 | 14.9% |
| FV | 65 | 4.5% |
| RH | 16 | 1.1% |
| C (all) | 10 | 0.7% |
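A zoning table like the one above is the input for step 5. The sketch below shows one way such a table and its pie chart could be produced, plus the step 6 3D scatter. It is a minimal sketch, not the lesson's own code: the helper names are hypothetical, and it assumes a DataFrame `df` with `MSZoning`, `LotArea`, `OverallQual`, and `SalePrice` columns.

```python
import matplotlib.pyplot as plt
import pandas as pd


def zoning_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count houses per MSZoning category and format each share as a percentage."""
    counts = df["MSZoning"].value_counts()
    table = pd.DataFrame({"Houses": counts})
    # Shares rounded to one decimal place, e.g. "78.8%".
    table["Percentage"] = (counts / counts.sum() * 100).map(lambda p: f"{p:.1f}%")
    return table


def plot_zoning_pie(table: pd.DataFrame, ax: plt.Axes) -> None:
    """Step 5: pie chart of the share of houses in each MSZoning category."""
    ax.pie(table["Houses"], labels=list(table.index), autopct="%1.1f%%", startangle=90)
    ax.set_title("Houses by MSZoning category")


def plot_3d(df: pd.DataFrame, fig: plt.Figure) -> plt.Axes:
    """Step 6: 3D scatter of LotArea, OverallQual and SalePrice."""
    # projection="3d" is available without extra imports in Matplotlib >= 3.2.
    ax = fig.add_subplot(projection="3d")
    ax.scatter(df["LotArea"], df["OverallQual"], df["SalePrice"], s=8, alpha=0.4)
    ax.set_xlabel("Lot area")
    ax.set_ylabel("Overall quality")
    ax.set_zlabel("Sale price")
    return ax
```

Because `RH` and `C (all)` together hold under 2% of houses, their slices and labels are likely to crowd each other in the pie; merging them into a single "Other" category before plotting is one option.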