{ "cells": [ { "cell_type": "markdown", "id": "1a58fe99", "metadata": {}, "source": [ "# Lesson 11 activity: data visualization\n", "\n", "In this activity, you'll use matplotlib, seaborn, and plotly to explore a housing sales dataset through a series of plots.\n", "\n", "## Problem statement:\n", "Analyze the housing dataset using various types of plots to gain insights into the data.\n", "\n", "The housing sales dataset is available here: [housing_data.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/housing_data.csv)\n", "\n", "## Steps to complete:\n", "1. Create a line plot to visualize the trend of house prices over the years.\n", "2. Use a scatter plot to visualize the relationship between `LotArea` and `SalePrice`.\n", "3. Create a bar chart to show the count of houses in each `Neighborhood`.\n", "4. Use a box plot to visualize the distribution of `SalePrice` in each `Neighborhood`.\n", "5. Create a pie chart to visualize the proportion of houses in each `MSZoning` category.\n", "6. Use a 3D scatter plot to visualize `LotArea`, `OverallQual`, and `SalePrice` together." ] }, { "cell_type": "code", "execution_count": null, "id": "f4533d40", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.ticker import FuncFormatter\n", "import numpy as np\n", "import pandas as pd\n", "import plotly.express as px\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "id": "81325909", "metadata": {}, "source": [ "## Load the dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "78d3abbd", "metadata": {}, "outputs": [], "source": [ "url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/housing_data.csv'\n", "df = pd.read_csv(url)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "e2411c29", "metadata": {}, "outputs": [], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "1ea17f5f", "metadata": {}, "source": [ "## 1. 
House price trends over time\n", "\n", "Create a line plot showing how the mean sale price changed each year. Include error bars showing the standard deviation.\n", "\n", "**Hints:**\n", "- Convert `YrSold` to string so it's treated as a categorical axis\n", "- Use `groupby` to calculate mean and standard deviation per year\n", "- `plt.errorbar()` can draw the error bars; `plt.plot()` draws the line\n", "- Use `FuncFormatter` to scale the y-axis to thousands" ] }, { "cell_type": "code", "execution_count": null, "id": "b470ec57", "metadata": {}, "outputs": [], "source": [ "# Convert sale year to string so it's treated as a categorical variable\n", "df['YrSold'] = df['YrSold'].astype(str)\n", "\n", "# Group by year and calculate mean and standard deviation of sale price\n", "# Your code here\n", "\n", "# Create a line plot with error bars\n", "# Your code here\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "8d1196f5", "metadata": {}, "source": [ "## 2. LotArea vs SalePrice\n", "\n", "Create a scatter plot showing the relationship between lot area and sale price.\n", "\n", "**Hints:**\n", "- Use `plt.scatter()` or `sns.scatterplot()`\n", "- The data has outliers; experiment with axis limits or a log scale to make the relationship clearer\n", "- Use `FuncFormatter` to scale both axes to thousands" ] }, { "cell_type": "code", "execution_count": null, "id": "7b6aca64", "metadata": {}, "outputs": [], "source": [ "# Create a scatter plot of LotArea vs SalePrice\n", "# Your code here\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5ea498ab", "metadata": {}, "source": [ "## 3. 
House count by neighborhood\n", "\n", "Create a bar chart showing how many houses are in each neighborhood.\n", "\n", "**Hints:**\n", "- Use `value_counts()` to count houses per neighborhood\n", "- A horizontal bar chart (`plt.barh()`, or `sns.barplot()` with the neighborhood on the y-axis) works well here since neighborhood names are long\n", "- Sort the bars by count to make the chart easier to read" ] }, { "cell_type": "code", "execution_count": null, "id": "9c9311cf", "metadata": {}, "outputs": [], "source": [ "# Count houses by neighborhood\n", "# Your code here\n", "\n", "# Create a bar chart\n", "# Your code here\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "e93603d0", "metadata": {}, "source": [ "## 4. SalePrice distribution by neighborhood\n", "\n", "Create a box plot showing the distribution of sale prices in each neighborhood.\n", "\n", "**Hints:**\n", "- Use `sns.boxplot()` with `x='Neighborhood'` and `y='SalePrice'`\n", "- Order neighborhoods by median sale price using `groupby` + `sort_values`, then pass the resulting list to the `order` argument of `sns.boxplot()`\n", "- Rotate x-axis labels so they don't overlap (`plt.xticks(rotation=45, ha='right')`)\n", "- Use `FuncFormatter` to scale the y-axis to thousands" ] }, { "cell_type": "code", "execution_count": null, "id": "a1c461e8", "metadata": {}, "outputs": [], "source": [ "# Get list of neighborhoods ordered by median sale price\n", "# Your code here\n", "\n", "# Create a box plot\n", "# Your code here\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "81456ac8", "metadata": {}, "source": [ "## 5. 
Proportion of houses by MSZoning\n", "\n", "Create a pie chart showing what proportion of houses fall into each zoning category.\n", "\n", "**Hints:**\n", "- Use `value_counts()` to count houses per zoning category\n", "- Use `plt.pie()` with a legend showing category labels and percentages\n", "- Calculate percentages as `counts / counts.sum() * 100`" ] }, { "cell_type": "code", "execution_count": null, "id": "fa0f44d5", "metadata": {}, "outputs": [], "source": [ "# Count houses by MSZoning\n", "# Your code here\n", "\n", "# Create a pie chart\n", "# Your code here\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "2d620c7b", "metadata": {}, "source": [ "## 6. 3D scatter plot: LotArea, OverallQual, and SalePrice\n", "\n", "Create a 3D scatter plot with `LotArea` on the x-axis, `OverallQual` on the y-axis, and `SalePrice` on the z-axis. Color the points by sale price.\n", "\n", "**Hints:**\n", "- For matplotlib: use `plt.axes(projection='3d')` then `ax.scatter()`\n", "- For plotly: use `px.scatter_3d()` for an interactive version\n", "- Use `FuncFormatter` to scale the lot area and sale price axes to thousands\n", "- Use `cmap='viridis'` and add a colorbar" ] }, { "cell_type": "code", "execution_count": null, "id": "b5b51b69", "metadata": {}, "outputs": [], "source": [ "# Create a 3D scatter plot\n", "# Your code here\n", "\n", "plt.show()" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }