{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating our Database\n", "\n", "This notebook walks through how I put together our tables for the MySQL portion of the course." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pymysql\n", "\n", "# Connect to the database\n", "connection = pymysql.connect(host='',\n", " cursorclass=pymysql.cursors.DictCursor)\n", "\n", "cursor = connection.cursor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dropping Existing Tables\n", "Generally a good first step for setting up a database is to remove everything that is already there. This avoids any potential conflicts on creations/inserts.\n", "\n", "Therefore we first **DROP** all of the tables we'll be using if they exist." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Drop tables\"\"\"\n", "tables = ['friends', 'favorites', 'retweets', 'statuses', 'users']\n", "for table in tables:\n", " print(table)\n", " drop_query = f\"\"\"DROP TABLE IF EXISTS {table};\"\"\"\n", " cursor.execute(drop_query)\n", "connection.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Our Tables\n", "Once we've cleared things out, we can start creating the tables we'll be using.\n", "\n", "We create a table by calling **CREATE TABLE *table_name*(*colls*)**. One thing to keep in mind is the use of **PRIMARY KEY(*coll*)**, this simply identifies the column that will be used as the primary key." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Create users table to track twitter users\"\"\"\n", "make_user_table = \"\"\"CREATE TABLE users(\n", " created_date DATE,\n", " description VARCHAR(255),\n", " favorites_count INT,\n", " friends_count INT,\n", " user_id VARCHAR(255),\n", " listed_count INT,\n", " location VARCHAR(255),\n", " name VARCHAR(255),\n", " screen_name VARCHAR(255),\n", " statuses_count INT,\n", " url VARCHAR(255),\n", " verified BOOL,\n", " PRIMARY KEY(user_id));\"\"\"\n", "\n", "cursor.execute(make_user_table)\n", "connection.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Foreign Keys\n", "All subsequent table creation queries will look relatively the same, the key difference is the **FOREIGN KEY(*coll*)** statement. This statement defines a relationship to another table (which is a hard rule - no insertions that don't meet constraints).\n", "\n", "Notes:\n", " - **REFERENCES *table_name(coll in table)**: This details where the foreign relationship can be found\n", " - **ON DELETE CASCADE**: This isn't required, but states that if the foreign key is removed, also remove this record" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Drop and create friends table to track twitter friends\"\"\"\n", "make_friends_table = \"\"\"CREATE TABLE friends(\n", " user_id VARCHAR(255),\n", " user_screen_name VARCHAR(255),\n", " friend_id VARCHAR(255),\n", " friend_screen_name VARCHAR(255),\n", " PRIMARY KEY (user_id, friend_id),\n", " FOREIGN KEY (user_id)\n", " REFERENCES users(user_id)\n", " ON DELETE CASCADE,\n", " FOREIGN KEY (friend_id)\n", " REFERENCES users(user_id)\n", " ON DELETE CASCADE);\"\"\"\n", "\n", "cursor.execute(make_friends_table)\n", "connection.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can simply create the remaining tables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Drop and create friends table to track twitter friends\"\"\"\n", "make_status_table = \"\"\"CREATE TABLE statuses(\n", " created_date DATE,\n", " favorites_count INT,\n", " status_id VARCHAR(255),\n", " lang VARCHAR(255),\n", " retweet_count INT,\n", " source VARCHAR(255),\n", " text TEXT,\n", " truncated BOOL,\n", " user_id VARCHAR(255),\n", " PRIMARY KEY (status_id),\n", " FOREIGN KEY (user_id)\n", " REFERENCES users(user_id)\n", " ON DELETE CASCADE);\"\"\"\n", "\n", "cursor.execute(make_status_table)\n", "connection.commit()\n", "\n", "\n", "\"\"\"Drop and create favorites table to track favorited tweets\"\"\"\n", "make_favs_table = \"\"\"CREATE TABLE favorites(\n", " status_id VARCHAR(255),\n", " user_id VARCHAR(255),\n", " user_screen_name VARCHAR(255),\n", " PRIMARY KEY (status_id, user_id),\n", " FOREIGN KEY (user_id)\n", " REFERENCES users(user_id)\n", " ON DELETE CASCADE,\n", " FOREIGN KEY (status_id)\n", " REFERENCES statuses(status_id)\n", " ON DELETE CASCADE);\"\"\"\n", "\n", "cursor.execute(make_favs_table)\n", "connection.commit()\n", "\n", "\n", "\"\"\"Drop and create favorites table to track favorited tweets\"\"\"\n", "make_retweets_table = \"\"\"CREATE TABLE retweets(\n", " created_date DATE,\n", " status_id VARCHAR(255),\n", " retweeted_status VARCHAR(255),\n", " lang VARCHAR(255),\n", " retweet_count INT,\n", " source VARCHAR(255),\n", " text TEXT,\n", " user_id VARCHAR(255),\n", " PRIMARY KEY (status_id),\n", " FOREIGN KEY (user_id)\n", " REFERENCES users(user_id)\n", " ON DELETE CASCADE,\n", " FOREIGN KEY (retweeted_status)\n", " REFERENCES statuses(status_id)\n", " ON DELETE CASCADE);\"\"\"\n", "\n", "cursor.execute(make_retweets_table)\n", "connection.commit()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }