{
"cells": [
{
"source": [
"![NYC Skyline](nyc.jpg)\n",
"\n",
"Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like `.csv`, `.tsv`, and `.xlsx`.\n",
"\n",
"Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. \n",
"Three files containing data on 2019 Airbnb listings are available to you:\n",
"\n",
"**data/airbnb_price.csv**\n",
"This is a CSV file containing data on Airbnb listing prices and locations.\n",
"- **`listing_id`**: unique identifier of listing\n",
"- **`price`**: nightly listing price in USD\n",
"- **`nbhood_full`**: name of borough and neighborhood where listing is located\n",
"\n",
"**data/airbnb_room_type.xlsx**\n",
"This is an Excel file containing data on Airbnb listing descriptions and room types.\n",
"- **`listing_id`**: unique identifier of listing\n",
"- **`description`**: listing description\n",
"- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments\n",
"\n",
"**data/airbnb_last_review.tsv**\n",
"This is a TSV file containing data on Airbnb host names and review dates.\n",
"- **`listing_id`**: unique identifier of listing\n",
"- **`host_name`**: name of listing host\n",
"- **`last_review`**: date when the listing was last reviewed"
],
"metadata": {
"id": "bA5ajAmk7XH6"
},
"id": "c747d469-ee97-4b95-880f-feefd5456042",
"cell_type": "markdown"
},
{
"source": [
"# We've loaded your first package for you! You can add as many cells as you need.\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Begin coding here ...\n",
"price = pd.read_csv(\"datasets/airbnb_price.csv\")\n",
"room = pd.read_excel(\"datasets/airbnb_room_type.xlsx\")\n",
"review = pd.read_csv(\"datasets/airbnb_last_review.tsv\", sep='\\t')"
],
"metadata": {
"executionTime": 1935,
"lastSuccessfullyExecutedCode": "# We've loaded your first package for you! You can add as many cells as you need.\nimport numpy as np\nimport pandas as pd\n\n# Begin coding here ...\nprice = pd.read_csv(\"data/airbnb_price.csv\")\nroom = pd.read_excel(\"data/airbnb_room_type.xlsx\")\nreview = pd.read_csv(\"data/airbnb_last_review.tsv\", sep='\\t')",
"executionCancelledAt": null,
"lastExecutedAt": 1705538073571,
"lastScheduledRunId": null,
"collapsed": false
},
"id": "1fdc5784-479e-46cb-b05b-68439dcc94a7",
"cell_type": "code",
"execution_count": 23,
"outputs": []
},
{
"source": [
"price.head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 62,
"lastExecutedAt": 1705538073634,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "price.head()",
"outputsMetadata": {
"0": {
"height": 198,
"type": "dataFrame"
}
}
},
"cell_type": "code",
"id": "d2174228-0d23-4350-8dd0-071ac512ed04",
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/com.datacamp.data-table.v2+json": {
"table": {
"schema": {
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "listing_id",
"type": "integer"
},
{
"name": "price",
"type": "string"
},
{
"name": "nbhood_full",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": {
"index": [
0,
1,
2,
3,
4
],
"listing_id": [
2595,
3831,
5099,
5178,
5238
],
"price": [
"225 dollars",
"89 dollars",
"200 dollars",
"79 dollars",
"150 dollars"
],
"nbhood_full": [
"Manhattan, Midtown",
"Brooklyn, Clinton Hill",
"Manhattan, Murray Hill",
"Manhattan, Hell's Kitchen",
"Manhattan, Chinatown"
]
}
},
"total_rows": 5,
"truncation_type": null
},
"text/plain": " listing_id price nbhood_full\n0 2595 225 dollars Manhattan, Midtown\n1 3831 89 dollars Brooklyn, Clinton Hill\n2 5099 200 dollars Manhattan, Murray Hill\n3 5178 79 dollars Manhattan, Hell's Kitchen\n4 5238 150 dollars Manhattan, Chinatown",
"text/html": "
\n\n
\n \n \n | \n listing_id | \n price | \n nbhood_full | \n
\n \n \n \n 0 | \n 2595 | \n 225 dollars | \n Manhattan, Midtown | \n
\n \n 1 | \n 3831 | \n 89 dollars | \n Brooklyn, Clinton Hill | \n
\n \n 2 | \n 5099 | \n 200 dollars | \n Manhattan, Murray Hill | \n
\n \n 3 | \n 5178 | \n 79 dollars | \n Manhattan, Hell's Kitchen | \n
\n \n 4 | \n 5238 | \n 150 dollars | \n Manhattan, Chinatown | \n
\n \n
\n
"
},
"metadata": {},
"execution_count": 24
}
],
"execution_count": 24
},
{
"source": [
"room.head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 60,
"lastExecutedAt": 1705538073694,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "room.head()",
"outputsMetadata": {
"0": {
"height": 198,
"type": "dataFrame"
}
}
},
"cell_type": "code",
"id": "cb7b2cea-c995-4bcc-a61d-86266b3c4f7b",
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/com.datacamp.data-table.v2+json": {
"table": {
"schema": {
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "listing_id",
"type": "integer"
},
{
"name": "description",
"type": "string"
},
{
"name": "room_type",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": {
"index": [
0,
1,
2,
3,
4
],
"listing_id": [
2595,
3831,
5099,
5178,
5238
],
"description": [
"Skylit Midtown Castle",
"Cozy Entire Floor of Brownstone",
"Large Cozy 1 BR Apartment In Midtown East",
"Large Furnished Room Near B'way",
"Cute & Cozy Lower East Side 1 bdrm"
],
"room_type": [
"Entire home/apt",
"Entire home/apt",
"Entire home/apt",
"private room",
"Entire home/apt"
]
}
},
"total_rows": 5,
"truncation_type": null
},
"text/plain": " listing_id description room_type\n0 2595 Skylit Midtown Castle Entire home/apt\n1 3831 Cozy Entire Floor of Brownstone Entire home/apt\n2 5099 Large Cozy 1 BR Apartment In Midtown East Entire home/apt\n3 5178 Large Furnished Room Near B'way private room\n4 5238 Cute & Cozy Lower East Side 1 bdrm Entire home/apt",
"text/html": "\n\n
\n \n \n | \n listing_id | \n description | \n room_type | \n
\n \n \n \n 0 | \n 2595 | \n Skylit Midtown Castle | \n Entire home/apt | \n
\n \n 1 | \n 3831 | \n Cozy Entire Floor of Brownstone | \n Entire home/apt | \n
\n \n 2 | \n 5099 | \n Large Cozy 1 BR Apartment In Midtown East | \n Entire home/apt | \n
\n \n 3 | \n 5178 | \n Large Furnished Room Near B'way | \n private room | \n
\n \n 4 | \n 5238 | \n Cute & Cozy Lower East Side 1 bdrm | \n Entire home/apt | \n
\n \n
\n
"
},
"metadata": {},
"execution_count": 25
}
],
"execution_count": 25
},
{
"source": [
"review.head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 56,
"lastExecutedAt": 1705538073750,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "review.head()",
"outputsMetadata": {
"0": {
"height": 198,
"type": "dataFrame"
}
}
},
"cell_type": "code",
"id": "062bc9b7-1e7d-4dcf-b779-dc4359a73f64",
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/com.datacamp.data-table.v2+json": {
"table": {
"schema": {
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "listing_id",
"type": "integer"
},
{
"name": "host_name",
"type": "string"
},
{
"name": "last_review",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": {
"index": [
0,
1,
2,
3,
4
],
"listing_id": [
2595,
3831,
5099,
5178,
5238
],
"host_name": [
"Jennifer",
"LisaRoxanne",
"Chris",
"Shunichi",
"Ben"
],
"last_review": [
"May 21 2019",
"July 05 2019",
"June 22 2019",
"June 24 2019",
"June 09 2019"
]
}
},
"total_rows": 5,
"truncation_type": null
},
"text/plain": " listing_id host_name last_review\n0 2595 Jennifer May 21 2019\n1 3831 LisaRoxanne July 05 2019\n2 5099 Chris June 22 2019\n3 5178 Shunichi June 24 2019\n4 5238 Ben June 09 2019",
"text/html": "\n\n
\n \n \n | \n listing_id | \n host_name | \n last_review | \n
\n \n \n \n 0 | \n 2595 | \n Jennifer | \n May 21 2019 | \n
\n \n 1 | \n 3831 | \n LisaRoxanne | \n July 05 2019 | \n
\n \n 2 | \n 5099 | \n Chris | \n June 22 2019 | \n
\n \n 3 | \n 5178 | \n Shunichi | \n June 24 2019 | \n
\n \n 4 | \n 5238 | \n Ben | \n June 09 2019 | \n
\n \n
\n
"
},
"metadata": {},
"execution_count": 26
}
],
"execution_count": 26
},
{
"source": [
"1. What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names."
],
"metadata": {},
"cell_type": "markdown",
"id": "436ba6fb-a61c-42ad-a3fb-8006f82a9dd7"
},
{
"source": [
"review[\"last_review\"] = pd.to_datetime(review[\"last_review\"], infer_datetime_format=True, errors='coerce')\n",
"review[\"last_review\"].head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 52,
"lastExecutedAt": 1705538073802,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "review[\"last_review\"] = pd.to_datetime(review[\"last_review\"], infer_datetime_format=True, errors='coerce')\nreview[\"last_review\"].head()"
},
"cell_type": "code",
"id": "956ee5da-b06a-4a0b-8628-5edf88ceafad",
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 2019-05-21\n1 2019-07-05\n2 2019-06-22\n3 2019-06-24\n4 2019-06-09\nName: last_review, dtype: datetime64[ns]"
},
"metadata": {},
"execution_count": 27
}
],
"execution_count": 27
},
{
"source": [
"first_review_date = review[\"last_review\"].min()\n",
"last_review_date = review[\"last_review\"].max()\n",
"display(first_review_date, last_review_date)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 52,
"lastExecutedAt": 1705538073854,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "first_review_date = review[\"last_review\"].min()\nlast_review_date = review[\"last_review\"].max()\ndisplay(first_review_date, last_review_date)",
"outputsMetadata": {
"0": {
"height": 257,
"type": "stream"
}
}
},
"cell_type": "code",
"id": "625df053-4311-4504-b1b0-03a3b7573ba9",
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "Timestamp('2019-01-01 00:00:00')"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": "Timestamp('2019-07-09 00:00:00')"
},
"metadata": {}
}
],
"execution_count": 28
},
{
"source": [
"2. How many of the listings are private rooms? Save this into any variable."
],
"metadata": {},
"cell_type": "markdown",
"id": "3903e04d-a808-40ad-b5b2-abae0fee3be0"
},
{
"source": [
"room[\"room_type\"] = room[\"room_type\"].str.lower()\n",
"room[\"room_type\"].value_counts()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 64,
"lastExecutedAt": 1705538073918,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "room[\"room_type\"] = room[\"room_type\"].str.lower()\nroom[\"room_type\"].value_counts()"
},
"cell_type": "code",
"id": "8c025a04-57e6-489d-8b95-d8c663adfc17",
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "entire home/apt 13266\nprivate room 11356\nshared room 587\nName: room_type, dtype: int64"
},
"metadata": {},
"execution_count": 29
}
],
"execution_count": 29
},
{
"source": [
"private_rooms = room[\"room_type\"].value_counts()[\"private room\"]\n",
"display(private_rooms)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 51,
"lastExecutedAt": 1705538073969,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "private_rooms = room[\"room_type\"].value_counts()[\"private room\"]\ndisplay(private_rooms)"
},
"cell_type": "code",
"id": "e116a011-4ee2-4bee-b7f8-fb6fe6e5c1c7",
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "11356"
},
"metadata": {}
}
],
"execution_count": 30
},
{
"source": [
"3. What is the average listing price? Round to the nearest penny and save into a variable."
],
"metadata": {},
"cell_type": "markdown",
"id": "ee3b9f65-2fac-435d-8cbc-04018fc41938"
},
{
"source": [
"price[\"price\"].head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 53,
"lastExecutedAt": 1705538074022,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "price[\"price\"].head()"
},
"cell_type": "code",
"id": "68af15ee-6cba-49d0-b2d3-c311b55d00ae",
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 225 dollars\n1 89 dollars\n2 200 dollars\n3 79 dollars\n4 150 dollars\nName: price, dtype: object"
},
"metadata": {},
"execution_count": 31
}
],
"execution_count": 31
},
{
"source": [
"price[\"price\"] = price[\"price\"].str.replace(\" dollars\", \"\")\n",
"price[\"price\"].head()"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 60,
"lastExecutedAt": 1705538074082,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "price[\"price\"] = price[\"price\"].str.replace(\" dollars\", \"\")\nprice[\"price\"].head()"
},
"cell_type": "code",
"id": "64f4f301-8bca-47e6-bc07-8f6eb7b17dc9",
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "0 225\n1 89\n2 200\n3 79\n4 150\nName: price, dtype: object"
},
"metadata": {},
"execution_count": 32
}
],
"execution_count": 32
},
{
"source": [
"price[\"price\"] = price[\"price\"].astype(\"float64\")\n",
"average_price = round(price[\"price\"].mean(), 2)\n",
"display(average_price)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 52,
"lastExecutedAt": 1705538074134,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "price[\"price\"] = price[\"price\"].astype(\"float64\")\naverage_price = round(price[\"price\"].mean(), 2)\ndisplay(average_price)",
"outputsMetadata": {
"0": {
"height": 177,
"type": "stream"
}
}
},
"cell_type": "code",
"id": "85383eb9-134b-4b23-ba88-31c45393d1f3",
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "141.78"
},
"metadata": {}
}
],
"execution_count": 33
},
{
"source": [
"4. Combine the new variables into one DataFrame called review_dates with four columns in the following order: first_reviewed, last_reviewed, nb_private_rooms, and avg_price. The DataFrame should only contain one row of values."
],
"metadata": {},
"cell_type": "markdown",
"id": "a3a191be-fa5a-43c7-88b8-d01389d22882"
},
{
"source": [
"results = {\n",
" \"first_reviewed\": [first_review_date],\n",
" \"last_reviewed\": [last_review_date],\n",
" \"nb_private_rooms\": [private_rooms],\n",
" \"avg_price\": [average_price]\n",
"}\n",
"review_dates = pd.DataFrame(results)\n",
"display(review_dates)"
],
"metadata": {
"executionCancelledAt": null,
"executionTime": 56,
"lastExecutedAt": 1705538074190,
"lastScheduledRunId": null,
"lastSuccessfullyExecutedCode": "results = {\n \"first_reviewed\": [first_review_date],\n \"last_reviewed\": [last_review_date],\n \"nb_private_rooms\": [private_rooms],\n \"avg_price\": [average_price]\n}\nreview_dates = pd.DataFrame(results)\ndisplay(review_dates)",
"outputsMetadata": {
"0": {
"height": 98,
"type": "dataFrame"
}
}
},
"cell_type": "code",
"id": "c016b4c3-4591-4132-9135-c49e659f7de2",
"outputs": [
{
"output_type": "display_data",
"data": {
"application/com.datacamp.data-table.v2+json": {
"table": {
"schema": {
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "first_reviewed",
"type": "datetime"
},
{
"name": "last_reviewed",
"type": "datetime"
},
{
"name": "nb_private_rooms",
"type": "integer"
},
{
"name": "avg_price",
"type": "number"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": {
"index": [
0
],
"first_reviewed": [
"2019-01-01T00:00:00.000"
],
"last_reviewed": [
"2019-07-09T00:00:00.000"
],
"nb_private_rooms": [
11356
],
"avg_price": [
141.78
]
}
},
"total_rows": 1,
"truncation_type": null
},
"text/plain": " first_reviewed last_reviewed nb_private_rooms avg_price\n0 2019-01-01 2019-07-09 11356 141.78",
"text/html": "\n\n
\n \n \n | \n first_reviewed | \n last_reviewed | \n nb_private_rooms | \n avg_price | \n
\n \n \n \n 0 | \n 2019-01-01 | \n 2019-07-09 | \n 11356 | \n 141.78 | \n
\n \n
\n
"
},
"metadata": {}
}
],
"execution_count": 34
}
],
"metadata": {
"colab": {
"name": "Welcome to DataCamp Workspaces.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}