{ "cells": [ { "source": [ "![NYC Skyline](nyc.jpg)\n", "\n", "Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like `.csv`, `.tsv`, and `.xlsx`.\n", "\n", "Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. \n", "Three files containing data on 2019 Airbnb listings are available to you:\n", "\n", "**data/airbnb_price.csv**\n", "This is a CSV file containing data on Airbnb listing prices and locations.\n", "- **`listing_id`**: unique identifier of listing\n", "- **`price`**: nightly listing price in USD\n", "- **`nbhood_full`**: name of borough and neighborhood where listing is located\n", "\n", "**data/airbnb_room_type.xlsx**\n", "This is an Excel file containing data on Airbnb listing descriptions and room types.\n", "- **`listing_id`**: unique identifier of listing\n", "- **`description`**: listing description\n", "- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments\n", "\n", "**data/airbnb_last_review.tsv**\n", "This is a TSV file containing data on Airbnb host names and review dates.\n", "- **`listing_id`**: unique identifier of listing\n", "- **`host_name`**: name of listing host\n", "- **`last_review`**: date when the listing was last reviewed" ], "metadata": { "id": "bA5ajAmk7XH6" }, "id": "c747d469-ee97-4b95-880f-feefd5456042", "cell_type": "markdown" }, { "source": [ "# We've loaded your first package for you! You can add as many cells as you need.\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Begin coding here ...\n", "price = pd.read_csv(\"datasets/airbnb_price.csv\")\n", "room = pd.read_excel(\"datasets/airbnb_room_type.xlsx\")\n", "review = pd.read_csv(\"datasets/airbnb_last_review.tsv\", sep='\\t')" ], "metadata": { "executionTime": 1935, "lastSuccessfullyExecutedCode": "# We've loaded your first package for you! You can add as many cells as you need.\nimport numpy as np\nimport pandas as pd\n\n# Begin coding here ...\nprice = pd.read_csv(\"data/airbnb_price.csv\")\nroom = pd.read_excel(\"data/airbnb_room_type.xlsx\")\nreview = pd.read_csv(\"data/airbnb_last_review.tsv\", sep='\\t')", "executionCancelledAt": null, "lastExecutedAt": 1705538073571, "lastScheduledRunId": null, "collapsed": false }, "id": "1fdc5784-479e-46cb-b05b-68439dcc94a7", "cell_type": "code", "execution_count": 23, "outputs": [] }, { "source": [ "price.head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 62, "lastExecutedAt": 1705538073634, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "price.head()", "outputsMetadata": { "0": { "height": 198, "type": "dataFrame" } } }, "cell_type": "code", "id": "d2174228-0d23-4350-8dd0-071ac512ed04", "outputs": [ { "output_type": "execute_result", "data": { "application/com.datacamp.data-table.v2+json": { "table": { "schema": { "fields": [ { "name": "index", "type": "integer" }, { "name": "listing_id", "type": "integer" }, { "name": "price", "type": "string" }, { "name": "nbhood_full", "type": "string" } ], "primaryKey": [ "index" ], "pandas_version": "1.4.0" }, "data": { "index": [ 0, 1, 2, 3, 4 ], "listing_id": [ 2595, 3831, 5099, 5178, 5238 ], "price": [ "225 dollars", "89 dollars", "200 dollars", "79 dollars", "150 dollars" ], "nbhood_full": [ "Manhattan, Midtown", "Brooklyn, Clinton Hill", "Manhattan, Murray Hill", "Manhattan, Hell's Kitchen", "Manhattan, Chinatown" ] } }, "total_rows": 5, "truncation_type": null }, "text/plain": " listing_id price nbhood_full\n0 2595 225 dollars Manhattan, Midtown\n1 3831 89 dollars Brooklyn, Clinton Hill\n2 5099 200 dollars Manhattan, Murray Hill\n3 5178 79 dollars Manhattan, Hell's Kitchen\n4 5238 150 dollars Manhattan, Chinatown", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
listing_idpricenbhood_full
02595225 dollarsManhattan, Midtown
1383189 dollarsBrooklyn, Clinton Hill
25099200 dollarsManhattan, Murray Hill
3517879 dollarsManhattan, Hell's Kitchen
45238150 dollarsManhattan, Chinatown
\n
" }, "metadata": {}, "execution_count": 24 } ], "execution_count": 24 }, { "source": [ "room.head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 60, "lastExecutedAt": 1705538073694, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "room.head()", "outputsMetadata": { "0": { "height": 198, "type": "dataFrame" } } }, "cell_type": "code", "id": "cb7b2cea-c995-4bcc-a61d-86266b3c4f7b", "outputs": [ { "output_type": "execute_result", "data": { "application/com.datacamp.data-table.v2+json": { "table": { "schema": { "fields": [ { "name": "index", "type": "integer" }, { "name": "listing_id", "type": "integer" }, { "name": "description", "type": "string" }, { "name": "room_type", "type": "string" } ], "primaryKey": [ "index" ], "pandas_version": "1.4.0" }, "data": { "index": [ 0, 1, 2, 3, 4 ], "listing_id": [ 2595, 3831, 5099, 5178, 5238 ], "description": [ "Skylit Midtown Castle", "Cozy Entire Floor of Brownstone", "Large Cozy 1 BR Apartment In Midtown East", "Large Furnished Room Near B'way", "Cute & Cozy Lower East Side 1 bdrm" ], "room_type": [ "Entire home/apt", "Entire home/apt", "Entire home/apt", "private room", "Entire home/apt" ] } }, "total_rows": 5, "truncation_type": null }, "text/plain": " listing_id description room_type\n0 2595 Skylit Midtown Castle Entire home/apt\n1 3831 Cozy Entire Floor of Brownstone Entire home/apt\n2 5099 Large Cozy 1 BR Apartment In Midtown East Entire home/apt\n3 5178 Large Furnished Room Near B'way private room\n4 5238 Cute & Cozy Lower East Side 1 bdrm Entire home/apt", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
listing_iddescriptionroom_type
02595Skylit Midtown CastleEntire home/apt
13831Cozy Entire Floor of BrownstoneEntire home/apt
25099Large Cozy 1 BR Apartment In Midtown EastEntire home/apt
35178Large Furnished Room Near B'wayprivate room
45238Cute & Cozy Lower East Side 1 bdrmEntire home/apt
\n
" }, "metadata": {}, "execution_count": 25 } ], "execution_count": 25 }, { "source": [ "review.head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 56, "lastExecutedAt": 1705538073750, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "review.head()", "outputsMetadata": { "0": { "height": 198, "type": "dataFrame" } } }, "cell_type": "code", "id": "062bc9b7-1e7d-4dcf-b779-dc4359a73f64", "outputs": [ { "output_type": "execute_result", "data": { "application/com.datacamp.data-table.v2+json": { "table": { "schema": { "fields": [ { "name": "index", "type": "integer" }, { "name": "listing_id", "type": "integer" }, { "name": "host_name", "type": "string" }, { "name": "last_review", "type": "string" } ], "primaryKey": [ "index" ], "pandas_version": "1.4.0" }, "data": { "index": [ 0, 1, 2, 3, 4 ], "listing_id": [ 2595, 3831, 5099, 5178, 5238 ], "host_name": [ "Jennifer", "LisaRoxanne", "Chris", "Shunichi", "Ben" ], "last_review": [ "May 21 2019", "July 05 2019", "June 22 2019", "June 24 2019", "June 09 2019" ] } }, "total_rows": 5, "truncation_type": null }, "text/plain": " listing_id host_name last_review\n0 2595 Jennifer May 21 2019\n1 3831 LisaRoxanne July 05 2019\n2 5099 Chris June 22 2019\n3 5178 Shunichi June 24 2019\n4 5238 Ben June 09 2019", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
listing_idhost_namelast_review
02595JenniferMay 21 2019
13831LisaRoxanneJuly 05 2019
25099ChrisJune 22 2019
35178ShunichiJune 24 2019
45238BenJune 09 2019
\n
" }, "metadata": {}, "execution_count": 26 } ], "execution_count": 26 }, { "source": [ "1. What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names." ], "metadata": {}, "cell_type": "markdown", "id": "436ba6fb-a61c-42ad-a3fb-8006f82a9dd7" }, { "source": [ "review[\"last_review\"] = pd.to_datetime(review[\"last_review\"], infer_datetime_format=True, errors='coerce')\n", "review[\"last_review\"].head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 52, "lastExecutedAt": 1705538073802, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "review[\"last_review\"] = pd.to_datetime(review[\"last_review\"], infer_datetime_format=True, errors='coerce')\nreview[\"last_review\"].head()" }, "cell_type": "code", "id": "956ee5da-b06a-4a0b-8628-5edf88ceafad", "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "0 2019-05-21\n1 2019-07-05\n2 2019-06-22\n3 2019-06-24\n4 2019-06-09\nName: last_review, dtype: datetime64[ns]" }, "metadata": {}, "execution_count": 27 } ], "execution_count": 27 }, { "source": [ "first_review_date = review[\"last_review\"].min()\n", "last_review_date = review[\"last_review\"].max()\n", "display(first_review_date, last_review_date)" ], "metadata": { "executionCancelledAt": null, "executionTime": 52, "lastExecutedAt": 1705538073854, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "first_review_date = review[\"last_review\"].min()\nlast_review_date = review[\"last_review\"].max()\ndisplay(first_review_date, last_review_date)", "outputsMetadata": { "0": { "height": 257, "type": "stream" } } }, "cell_type": "code", "id": "625df053-4311-4504-b1b0-03a3b7573ba9", "outputs": [ { "output_type": "display_data", "data": { "text/plain": "Timestamp('2019-01-01 00:00:00')" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "Timestamp('2019-07-09 00:00:00')" }, "metadata": {} } ], "execution_count": 28 }, { "source": [ "2. How many of the listings are private rooms? Save this into any variable." ], "metadata": {}, "cell_type": "markdown", "id": "3903e04d-a808-40ad-b5b2-abae0fee3be0" }, { "source": [ "room[\"room_type\"] = room[\"room_type\"].str.lower()\n", "room[\"room_type\"].value_counts()" ], "metadata": { "executionCancelledAt": null, "executionTime": 64, "lastExecutedAt": 1705538073918, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "room[\"room_type\"] = room[\"room_type\"].str.lower()\nroom[\"room_type\"].value_counts()" }, "cell_type": "code", "id": "8c025a04-57e6-489d-8b95-d8c663adfc17", "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "entire home/apt 13266\nprivate room 11356\nshared room 587\nName: room_type, dtype: int64" }, "metadata": {}, "execution_count": 29 } ], "execution_count": 29 }, { "source": [ "private_rooms = room[\"room_type\"].value_counts()[\"private room\"]\n", "display(private_rooms)" ], "metadata": { "executionCancelledAt": null, "executionTime": 51, "lastExecutedAt": 1705538073969, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "private_rooms = room[\"room_type\"].value_counts()[\"private room\"]\ndisplay(private_rooms)" }, "cell_type": "code", "id": "e116a011-4ee2-4bee-b7f8-fb6fe6e5c1c7", "outputs": [ { "output_type": "display_data", "data": { "text/plain": "11356" }, "metadata": {} } ], "execution_count": 30 }, { "source": [ "3. What is the average listing price? Round to the nearest penny and save into a variable." ], "metadata": {}, "cell_type": "markdown", "id": "ee3b9f65-2fac-435d-8cbc-04018fc41938" }, { "source": [ "price[\"price\"].head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 53, "lastExecutedAt": 1705538074022, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "price[\"price\"].head()" }, "cell_type": "code", "id": "68af15ee-6cba-49d0-b2d3-c311b55d00ae", "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "0 225 dollars\n1 89 dollars\n2 200 dollars\n3 79 dollars\n4 150 dollars\nName: price, dtype: object" }, "metadata": {}, "execution_count": 31 } ], "execution_count": 31 }, { "source": [ "price[\"price\"] = price[\"price\"].str.replace(\" dollars\", \"\")\n", "price[\"price\"].head()" ], "metadata": { "executionCancelledAt": null, "executionTime": 60, "lastExecutedAt": 1705538074082, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "price[\"price\"] = price[\"price\"].str.replace(\" dollars\", \"\")\nprice[\"price\"].head()" }, "cell_type": "code", "id": "64f4f301-8bca-47e6-bc07-8f6eb7b17dc9", "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "0 225\n1 89\n2 200\n3 79\n4 150\nName: price, dtype: object" }, "metadata": {}, "execution_count": 32 } ], "execution_count": 32 }, { "source": [ "price[\"price\"] = price[\"price\"].astype(\"float64\")\n", "average_price = round(price[\"price\"].mean(), 2)\n", "display(average_price)" ], "metadata": { "executionCancelledAt": null, "executionTime": 52, "lastExecutedAt": 1705538074134, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "price[\"price\"] = price[\"price\"].astype(\"float64\")\naverage_price = round(price[\"price\"].mean(), 2)\ndisplay(average_price)", "outputsMetadata": { "0": { "height": 177, "type": "stream" } } }, "cell_type": "code", "id": "85383eb9-134b-4b23-ba88-31c45393d1f3", "outputs": [ { "output_type": "display_data", "data": { "text/plain": "141.78" }, "metadata": {} } ], "execution_count": 33 }, { "source": [ "4. Combine the new variables into one DataFrame called review_dates with four columns in the following order: first_reviewed, last_reviewed, nb_private_rooms, and avg_price. The DataFrame should only contain one row of values." ], "metadata": {}, "cell_type": "markdown", "id": "a3a191be-fa5a-43c7-88b8-d01389d22882" }, { "source": [ "results = {\n", " \"first_reviewed\": [first_review_date],\n", " \"last_reviewed\": [last_review_date],\n", " \"nb_private_rooms\": [private_rooms],\n", " \"avg_price\": [average_price]\n", "}\n", "review_dates = pd.DataFrame(results)\n", "display(review_dates)" ], "metadata": { "executionCancelledAt": null, "executionTime": 56, "lastExecutedAt": 1705538074190, "lastScheduledRunId": null, "lastSuccessfullyExecutedCode": "results = {\n \"first_reviewed\": [first_review_date],\n \"last_reviewed\": [last_review_date],\n \"nb_private_rooms\": [private_rooms],\n \"avg_price\": [average_price]\n}\nreview_dates = pd.DataFrame(results)\ndisplay(review_dates)", "outputsMetadata": { "0": { "height": 98, "type": "dataFrame" } } }, "cell_type": "code", "id": "c016b4c3-4591-4132-9135-c49e659f7de2", "outputs": [ { "output_type": "display_data", "data": { "application/com.datacamp.data-table.v2+json": { "table": { "schema": { "fields": [ { "name": "index", "type": "integer" }, { "name": "first_reviewed", "type": "datetime" }, { "name": "last_reviewed", "type": "datetime" }, { "name": "nb_private_rooms", "type": "integer" }, { "name": "avg_price", "type": "number" } ], "primaryKey": [ "index" ], "pandas_version": "1.4.0" }, "data": { "index": [ 0 ], "first_reviewed": [ "2019-01-01T00:00:00.000" ], "last_reviewed": [ "2019-07-09T00:00:00.000" ], "nb_private_rooms": [ 11356 ], "avg_price": [ 141.78 ] } }, "total_rows": 1, "truncation_type": null }, "text/plain": " first_reviewed last_reviewed nb_private_rooms avg_price\n0 2019-01-01 2019-07-09 11356 141.78", "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
first_reviewedlast_reviewednb_private_roomsavg_price
02019-01-012019-07-0911356141.78
\n
" }, "metadata": {} } ], "execution_count": 34 } ], "metadata": { "colab": { "name": "Welcome to DataCamp Workspaces.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }