{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab -- Publicly Accessible Datasets | Submission Template" ] }, { "cell_type": "markdown", "metadata": { "id": "k245HSnZv0nX" }, "source": [ "Template for the deliverable for https://classes.daveeargle.com/security-analytics-assignments/labs/lab-publicly-accessible-datasets.html\n", "\n", "Deliverable:\n", "1. Make a copy of this file and fill it in.\n", "2. Submit to Canvas *either* of the following:\n", " * A link to a publicly accessible copy of your completed .ipynb notebook\n", " * A file upload of your completed .ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset\n", "\n", "Use the [OpenML phish_url dataset](https://www.openml.org/d/4534) for all tasks in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hv0QKSZOwyb3" }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## OpenML" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 456 }, "id": "arAai3wUvrKi", "outputId": "70ed6a90-79c2-4923-9ddf-4f121d49c0b8" }, "outputs": [], "source": [ "def do_openml():\n", " # Use URL hacking to get a direct download URL from the OpenML page for the dataset.\n", " \n", " # For this lab, do _not_ use sklearn's fetch_openml(). 
\n", " # The generalization described in the lab doc does _not_ work consistently.\n", " # But I'm leaving it as-is because the purpose of this is to teach you URL hacking.\n", " df = pd.read_csv()\n", " return df\n", "do_openml()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GitHub" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 456 }, "id": "3N8Lqhrgv9UO", "outputId": "2fd013ce-3b1d-4cc4-cb33-e1f38bb805f7" }, "outputs": [], "source": [ "def do_github():\n", " # Upload the dataset to a GitHub repository, and use a direct download link for it below.\n", " \n", " df = pd.read_csv()\n", " return df\n", "do_github()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cloud Storage (Google Drive or Dropbox)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 456 }, "id": "B-wR4F9Mv_Km", "outputId": "ab7b3ca8-e948-47d0-a990-f108167a4e1b" }, "outputs": [], "source": [ "def do_cloud_storage():\n", " # Upload the phish_url dataset to one of the following cloud storage providers:\n", " # * Google Drive\n", " # * Dropbox\n", " #\n", " # Use URL hacking to get a direct download link for it, and use it below.\n", " df = pd.read_csv()\n", " return df\n", "do_cloud_storage()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GCP Cloud Storage" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 456 }, "id": "JxOnFT0rwFfn", "outputId": "f700e39b-5347-4197-ec69-98651cb0b48e" }, "outputs": [], "source": [ "def do_gcp():\n", " # Create a public GCP bucket, and upload the dataset to it. 
\n", " # Use a direct download link to load the file below.\n", " df = pd.read_csv()\n", " return df\n", "do_gcp()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AWS S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jWqlxOCR4wEy" }, "outputs": [], "source": [ "def do_aws():\n", " # Create a public AWS S3 bucket, and upload the dataset to it. \n", " # Use a direct download link to load the file below.\n", " df = pd.read_csv()\n", " return df\n", "do_aws()" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Lab -- Publicly Accessible Datasets | submission template", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }