{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create Junction Tables for Assessments\n",
    "\n",
    "The purpose of this notebook is to create junction tables between \"checks\" and the various pieces of information the checks are performed on for the purposes of transit data assessments.\n",
    "\n",
    "Assessment checks are performed at the following levels:\n",
    "\n",
    "- Transit Provider (i.e. does the provider list GTFS on its website)\n",
    "- GTFS Dataset (i.e. does it list appropriate contact info in `feed_info.txt`)\n",
    "- Service representation in a GTFS Dataset (i.e. are the San Francisco cable cars appropriately represented in the SFMTA GTFS Dataset?)\n",
    "\n",
    "### Output\n",
    "In order to make sure that each relevant record is assessed with each relevant \"check\", a cartesian join is made between the records and the checks to form a \"junction table\", which allows us to add attributes about the junction of the record and the check (i.e. a \"grade\" or score).\n",
    "\n",
    "This notebook will create three delimited files (sep=\"|\") corresponding to each of these levels, which can be appended to the records in the following Airtable tables:\n",
    "\n",
    " - `data_cross.csv` --> GTFS Dataset Check Data  \n",
    " - `services_cross.csv` --> gtfs-dataset Check Data  \n",
    " - `provider_cross.csv` --> Provider Check Data\n",
    "\n",
    "### Requirements \n",
    "\n",
    "This notebook makes use of the `pyairtable` library to access the California Transit Airtable Database.  You will need read access to this database with an API key stored in a .env file in the base directory of this notebook with the following format:\n",
    "\n",
    "```sh\n",
    "AIRTABLE_API_KEY='your_api_key_info`\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 - Read data from AirTable API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import dotenv_values\n",
    "from pyairtable import Table\n",
    "\n",
    "API_KEY = dotenv_values(\".env\")['AIRTABLE_API_KEY']\n",
    "BASE_ID = 'appjPsudTDcbLUWM5'\n",
    "\n",
    "GTFS_SERVICES_TABLE = 'Assessed GTFS-Services'\n",
    "ASSESSED_SERVICES_TABLE = 'Assessed Services'\n",
    "ASSESSED_PROVIDERS_TABLE = 'Assessed Transit Providers'\n",
    "CHECKS_TABLE = 'gtfs checks'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {},
   "outputs": [],
   "source": [
    "def airtable_to_df(table_id:str,base_id:str=BASE_ID,api_key:str=API_KEY)->pd.DataFrame:\n",
    "    records = Table(api_key, base_id, table_id)\n",
    "    airtable_rows = [] \n",
    "    airtable_index = []\n",
    "    for record in records.all():\n",
    "        airtable_rows.append(record[\"fields\"])\n",
    "        airtable_index.append(record[\"id\"])\n",
    "    return pd.DataFrame(airtable_rows, index=airtable_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "metadata": {},
   "outputs": [],
   "source": [
    "gtfs_services_df = airtable_to_df(GTFS_SERVICES_TABLE)\n",
    "gtfs_checks_df = airtable_to_df(CHECKS_TABLE)\n",
    "services_df = airtable_to_df(ASSESSED_SERVICES_TABLE)\n",
    "gtfs_providers_df = airtable_to_df(ASSESSED_PROVIDERS_TABLE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Services</th>\n",
       "      <th>GTFS Dataset</th>\n",
       "      <th>Dataset Type</th>\n",
       "      <th>Category</th>\n",
       "      <th>Service Type (from Services)</th>\n",
       "      <th>Provider</th>\n",
       "      <th>Operator</th>\n",
       "      <th>Dataset Producers (from GTFS Dataset)</th>\n",
       "      <th>Dataset Publisher (from GTFS Dataset)</th>\n",
       "      <th>...</th>\n",
       "      <th>Flex Status</th>\n",
       "      <th>agency_id</th>\n",
       "      <th>route_id</th>\n",
       "      <th>Fares v2 Status</th>\n",
       "      <th>ITP Activities (from GTFS Dataset)</th>\n",
       "      <th>ITP Schedule TODO (from GTFS Dataset)</th>\n",
       "      <th>network_id</th>\n",
       "      <th>Fares Notes (from GTFS Dataset)</th>\n",
       "      <th>Schedule Comments (from GTFS Dataset)</th>\n",
       "      <th>gtfs check data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>rec05rHUgCVFqsXgN</th>\n",
       "      <td>Tulare Intermodal Express – Tulare Alerts</td>\n",
       "      <td>Tulare Intermodal Express</td>\n",
       "      <td>Tulare Alerts</td>\n",
       "      <td>[GTFS Alerts]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Tulare County Regional Transit Agency</td>\n",
       "      <td>Tulare County Regional Transit Agency</td>\n",
       "      <td>GMV Syncromatics Inc</td>\n",
       "      <td>GMV Syncromatics Inc</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec09fVK7ls8fl5sp</th>\n",
       "      <td>Monterey-Salinas Transit – Monterey Salinas Tr...</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey Salinas TripUpdates</td>\n",
       "      <td>[GTFS TripUpdates]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[ADA paratransit]</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>[Needed - Existing GTFS]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0WcjVurdKkHx7S</th>\n",
       "      <td>Healdsburg Shuttle – Bay Area 511 TripUpdates</td>\n",
       "      <td>Healdsburg Shuttle</td>\n",
       "      <td>Bay Area 511 TripUpdates</td>\n",
       "      <td>[GTFS TripUpdates]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Sonoma County</td>\n",
       "      <td>Sonoma County</td>\n",
       "      <td>Metropolitan Transportation Commission</td>\n",
       "      <td>Metropolitan Transportation Commission</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>SO</td>\n",
       "      <td>SO:67</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0XvgWtnmoUl8Lz</th>\n",
       "      <td>Glenn Ride – Glenn Schedule</td>\n",
       "      <td>Glenn Ride</td>\n",
       "      <td>Glenn Schedule</td>\n",
       "      <td>[GTFS Schedule]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[deviated fixed-route, fixed-route, ADA paratr...</td>\n",
       "      <td>Glenn County</td>\n",
       "      <td>Paratransit Services Inc.</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>...</td>\n",
       "      <td>[In Progress - Trillium]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[Vendor published]</td>\n",
       "      <td>[GTFS Into Google, GTFS Schedule Creation]</td>\n",
       "      <td>[Train transit provider to own]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0YWNONFHEO0pUl</th>\n",
       "      <td>Valley Express – VCTC Alerts</td>\n",
       "      <td>Valley Express</td>\n",
       "      <td>VCTC Alerts</td>\n",
       "      <td>[GTFS Alerts]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Ventura County Transportation Commission</td>\n",
       "      <td>Ventura County Transportation Commission</td>\n",
       "      <td>GMV Syncromatics Inc</td>\n",
       "      <td>GMV Syncromatics Inc</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>149</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczSG6bO6il6QMLZ</th>\n",
       "      <td>Healdsburg Shuttle – Bay Area 511 Schedule</td>\n",
       "      <td>Healdsburg Shuttle</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "      <td>[GTFS Schedule]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Sonoma County</td>\n",
       "      <td>Sonoma County</td>\n",
       "      <td>Metropolitan Transportation Commission</td>\n",
       "      <td>Metropolitan Transportation Commission</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>SO</td>\n",
       "      <td>SO:67</td>\n",
       "      <td>[Vendor published]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>MTC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczZKKwKsVQuU0dF</th>\n",
       "      <td>Placer County Transit – Placer Schedule</td>\n",
       "      <td>Placer County Transit</td>\n",
       "      <td>Placer Schedule</td>\n",
       "      <td>[GTFS Schedule]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Placer County</td>\n",
       "      <td>Placer County</td>\n",
       "      <td>Placer County, Trillium Inc.</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[Vendor published]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Published by Trillium</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczf9FxlvLopZybb</th>\n",
       "      <td>Taft Area Transit – Taft Schedule</td>\n",
       "      <td>Taft Area Transit</td>\n",
       "      <td>Taft Schedule</td>\n",
       "      <td>[GTFS Schedule]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>City of Taft</td>\n",
       "      <td>City of Taft</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[Needs GTFS Schedule]</td>\n",
       "      <td>[GTFS Schedule Creation]</td>\n",
       "      <td>[Get into google, Ask Kern to own]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recziP7dvY7apKbPB</th>\n",
       "      <td>Kern Transit – Kern Schedule</td>\n",
       "      <td>Kern Transit</td>\n",
       "      <td>Kern Schedule</td>\n",
       "      <td>[GTFS Schedule]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Kern County</td>\n",
       "      <td>Kern County</td>\n",
       "      <td>Kern County, Trillium Inc.</td>\n",
       "      <td>Trillium Inc.</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>194</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[Vendor published]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Published by Trillium</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczmEIm3KnSr2jF3</th>\n",
       "      <td>Monterey-Salinas Transit – Monterey Salinas Ve...</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey Salinas VehiclePositions</td>\n",
       "      <td>[GTFS VehiclePositions]</td>\n",
       "      <td>primary</td>\n",
       "      <td>[ADA paratransit]</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>[Needed - Existing GTFS]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>507 rows × 28 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                Name  \\\n",
       "rec05rHUgCVFqsXgN          Tulare Intermodal Express – Tulare Alerts   \n",
       "rec09fVK7ls8fl5sp  Monterey-Salinas Transit – Monterey Salinas Tr...   \n",
       "rec0WcjVurdKkHx7S      Healdsburg Shuttle – Bay Area 511 TripUpdates   \n",
       "rec0XvgWtnmoUl8Lz                        Glenn Ride – Glenn Schedule   \n",
       "rec0YWNONFHEO0pUl                       Valley Express – VCTC Alerts   \n",
       "...                                                              ...   \n",
       "reczSG6bO6il6QMLZ         Healdsburg Shuttle – Bay Area 511 Schedule   \n",
       "reczZKKwKsVQuU0dF            Placer County Transit – Placer Schedule   \n",
       "reczf9FxlvLopZybb                  Taft Area Transit – Taft Schedule   \n",
       "recziP7dvY7apKbPB                       Kern Transit – Kern Schedule   \n",
       "reczmEIm3KnSr2jF3  Monterey-Salinas Transit – Monterey Salinas Ve...   \n",
       "\n",
       "                                    Services  \\\n",
       "rec05rHUgCVFqsXgN  Tulare Intermodal Express   \n",
       "rec09fVK7ls8fl5sp   Monterey-Salinas Transit   \n",
       "rec0WcjVurdKkHx7S         Healdsburg Shuttle   \n",
       "rec0XvgWtnmoUl8Lz                 Glenn Ride   \n",
       "rec0YWNONFHEO0pUl             Valley Express   \n",
       "...                                      ...   \n",
       "reczSG6bO6il6QMLZ         Healdsburg Shuttle   \n",
       "reczZKKwKsVQuU0dF      Placer County Transit   \n",
       "reczf9FxlvLopZybb          Taft Area Transit   \n",
       "recziP7dvY7apKbPB               Kern Transit   \n",
       "reczmEIm3KnSr2jF3   Monterey-Salinas Transit   \n",
       "\n",
       "                                        GTFS Dataset             Dataset Type  \\\n",
       "rec05rHUgCVFqsXgN                      Tulare Alerts            [GTFS Alerts]   \n",
       "rec09fVK7ls8fl5sp       Monterey Salinas TripUpdates       [GTFS TripUpdates]   \n",
       "rec0WcjVurdKkHx7S           Bay Area 511 TripUpdates       [GTFS TripUpdates]   \n",
       "rec0XvgWtnmoUl8Lz                     Glenn Schedule          [GTFS Schedule]   \n",
       "rec0YWNONFHEO0pUl                        VCTC Alerts            [GTFS Alerts]   \n",
       "...                                              ...                      ...   \n",
       "reczSG6bO6il6QMLZ              Bay Area 511 Schedule          [GTFS Schedule]   \n",
       "reczZKKwKsVQuU0dF                    Placer Schedule          [GTFS Schedule]   \n",
       "reczf9FxlvLopZybb                      Taft Schedule          [GTFS Schedule]   \n",
       "recziP7dvY7apKbPB                      Kern Schedule          [GTFS Schedule]   \n",
       "reczmEIm3KnSr2jF3  Monterey Salinas VehiclePositions  [GTFS VehiclePositions]   \n",
       "\n",
       "                  Category                       Service Type (from Services)  \\\n",
       "rec05rHUgCVFqsXgN  primary                                      [fixed-route]   \n",
       "rec09fVK7ls8fl5sp  primary                                  [ADA paratransit]   \n",
       "rec0WcjVurdKkHx7S  primary                                      [fixed-route]   \n",
       "rec0XvgWtnmoUl8Lz  primary  [deviated fixed-route, fixed-route, ADA paratr...   \n",
       "rec0YWNONFHEO0pUl  primary                                      [fixed-route]   \n",
       "...                    ...                                                ...   \n",
       "reczSG6bO6il6QMLZ  primary                                      [fixed-route]   \n",
       "reczZKKwKsVQuU0dF  primary                                      [fixed-route]   \n",
       "reczf9FxlvLopZybb  primary                                      [fixed-route]   \n",
       "recziP7dvY7apKbPB  primary                                      [fixed-route]   \n",
       "reczmEIm3KnSr2jF3  primary                                  [ADA paratransit]   \n",
       "\n",
       "                                                   Provider  \\\n",
       "rec05rHUgCVFqsXgN     Tulare County Regional Transit Agency   \n",
       "rec09fVK7ls8fl5sp                  Monterey-Salinas Transit   \n",
       "rec0WcjVurdKkHx7S                             Sonoma County   \n",
       "rec0XvgWtnmoUl8Lz                              Glenn County   \n",
       "rec0YWNONFHEO0pUl  Ventura County Transportation Commission   \n",
       "...                                                     ...   \n",
       "reczSG6bO6il6QMLZ                             Sonoma County   \n",
       "reczZKKwKsVQuU0dF                             Placer County   \n",
       "reczf9FxlvLopZybb                              City of Taft   \n",
       "recziP7dvY7apKbPB                               Kern County   \n",
       "reczmEIm3KnSr2jF3                  Monterey-Salinas Transit   \n",
       "\n",
       "                                                   Operator  \\\n",
       "rec05rHUgCVFqsXgN     Tulare County Regional Transit Agency   \n",
       "rec09fVK7ls8fl5sp                  Monterey-Salinas Transit   \n",
       "rec0WcjVurdKkHx7S                             Sonoma County   \n",
       "rec0XvgWtnmoUl8Lz                 Paratransit Services Inc.   \n",
       "rec0YWNONFHEO0pUl  Ventura County Transportation Commission   \n",
       "...                                                     ...   \n",
       "reczSG6bO6il6QMLZ                             Sonoma County   \n",
       "reczZKKwKsVQuU0dF                             Placer County   \n",
       "reczf9FxlvLopZybb                              City of Taft   \n",
       "recziP7dvY7apKbPB                               Kern County   \n",
       "reczmEIm3KnSr2jF3                  Monterey-Salinas Transit   \n",
       "\n",
       "                    Dataset Producers (from GTFS Dataset)  \\\n",
       "rec05rHUgCVFqsXgN                    GMV Syncromatics Inc   \n",
       "rec09fVK7ls8fl5sp                Monterey-Salinas Transit   \n",
       "rec0WcjVurdKkHx7S  Metropolitan Transportation Commission   \n",
       "rec0XvgWtnmoUl8Lz                           Trillium Inc.   \n",
       "rec0YWNONFHEO0pUl                    GMV Syncromatics Inc   \n",
       "...                                                   ...   \n",
       "reczSG6bO6il6QMLZ  Metropolitan Transportation Commission   \n",
       "reczZKKwKsVQuU0dF            Placer County, Trillium Inc.   \n",
       "reczf9FxlvLopZybb                           Trillium Inc.   \n",
       "recziP7dvY7apKbPB              Kern County, Trillium Inc.   \n",
       "reczmEIm3KnSr2jF3                Monterey-Salinas Transit   \n",
       "\n",
       "                    Dataset Publisher (from GTFS Dataset)  ...  \\\n",
       "rec05rHUgCVFqsXgN                    GMV Syncromatics Inc  ...   \n",
       "rec09fVK7ls8fl5sp                                     NaN  ...   \n",
       "rec0WcjVurdKkHx7S  Metropolitan Transportation Commission  ...   \n",
       "rec0XvgWtnmoUl8Lz                           Trillium Inc.  ...   \n",
       "rec0YWNONFHEO0pUl                    GMV Syncromatics Inc  ...   \n",
       "...                                                   ...  ...   \n",
       "reczSG6bO6il6QMLZ  Metropolitan Transportation Commission  ...   \n",
       "reczZKKwKsVQuU0dF                           Trillium Inc.  ...   \n",
       "reczf9FxlvLopZybb                           Trillium Inc.  ...   \n",
       "recziP7dvY7apKbPB                           Trillium Inc.  ...   \n",
       "reczmEIm3KnSr2jF3                                     NaN  ...   \n",
       "\n",
       "                                Flex Status  agency_id  route_id  \\\n",
       "rec05rHUgCVFqsXgN                       NaN        NaN       NaN   \n",
       "rec09fVK7ls8fl5sp  [Needed - Existing GTFS]        NaN       NaN   \n",
       "rec0WcjVurdKkHx7S                       NaN         SO     SO:67   \n",
       "rec0XvgWtnmoUl8Lz  [In Progress - Trillium]        NaN       NaN   \n",
       "rec0YWNONFHEO0pUl                       NaN        149       NaN   \n",
       "...                                     ...        ...       ...   \n",
       "reczSG6bO6il6QMLZ                       NaN         SO     SO:67   \n",
       "reczZKKwKsVQuU0dF                       NaN        NaN       NaN   \n",
       "reczf9FxlvLopZybb                       NaN        NaN       NaN   \n",
       "recziP7dvY7apKbPB                       NaN        194       NaN   \n",
       "reczmEIm3KnSr2jF3  [Needed - Existing GTFS]        NaN       NaN   \n",
       "\n",
       "                         Fares v2 Status  \\\n",
       "rec05rHUgCVFqsXgN                    NaN   \n",
       "rec09fVK7ls8fl5sp                    NaN   \n",
       "rec0WcjVurdKkHx7S                    NaN   \n",
       "rec0XvgWtnmoUl8Lz     [Vendor published]   \n",
       "rec0YWNONFHEO0pUl                    NaN   \n",
       "...                                  ...   \n",
       "reczSG6bO6il6QMLZ     [Vendor published]   \n",
       "reczZKKwKsVQuU0dF     [Vendor published]   \n",
       "reczf9FxlvLopZybb  [Needs GTFS Schedule]   \n",
       "recziP7dvY7apKbPB     [Vendor published]   \n",
       "reczmEIm3KnSr2jF3                    NaN   \n",
       "\n",
       "                           ITP Activities (from GTFS Dataset)  \\\n",
       "rec05rHUgCVFqsXgN                                         NaN   \n",
       "rec09fVK7ls8fl5sp                                         NaN   \n",
       "rec0WcjVurdKkHx7S                                         NaN   \n",
       "rec0XvgWtnmoUl8Lz  [GTFS Into Google, GTFS Schedule Creation]   \n",
       "rec0YWNONFHEO0pUl                                         NaN   \n",
       "...                                                       ...   \n",
       "reczSG6bO6il6QMLZ                                         NaN   \n",
       "reczZKKwKsVQuU0dF                                         NaN   \n",
       "reczf9FxlvLopZybb                    [GTFS Schedule Creation]   \n",
       "recziP7dvY7apKbPB                                         NaN   \n",
       "reczmEIm3KnSr2jF3                                         NaN   \n",
       "\n",
       "                  ITP Schedule TODO (from GTFS Dataset) network_id  \\\n",
       "rec05rHUgCVFqsXgN                                   NaN        NaN   \n",
       "rec09fVK7ls8fl5sp                                   NaN        NaN   \n",
       "rec0WcjVurdKkHx7S                                   NaN        NaN   \n",
       "rec0XvgWtnmoUl8Lz       [Train transit provider to own]        NaN   \n",
       "rec0YWNONFHEO0pUl                                   NaN        NaN   \n",
       "...                                                 ...        ...   \n",
       "reczSG6bO6il6QMLZ                                   NaN        NaN   \n",
       "reczZKKwKsVQuU0dF                                   NaN        NaN   \n",
       "reczf9FxlvLopZybb    [Get into google, Ask Kern to own]        NaN   \n",
       "recziP7dvY7apKbPB                                   NaN        NaN   \n",
       "reczmEIm3KnSr2jF3                                   NaN        NaN   \n",
       "\n",
       "                  Fares Notes (from GTFS Dataset)  \\\n",
       "rec05rHUgCVFqsXgN                             NaN   \n",
       "rec09fVK7ls8fl5sp                             NaN   \n",
       "rec0WcjVurdKkHx7S                             NaN   \n",
       "rec0XvgWtnmoUl8Lz                             NaN   \n",
       "rec0YWNONFHEO0pUl                             NaN   \n",
       "...                                           ...   \n",
       "reczSG6bO6il6QMLZ                             MTC   \n",
       "reczZKKwKsVQuU0dF           Published by Trillium   \n",
       "reczf9FxlvLopZybb                             NaN   \n",
       "recziP7dvY7apKbPB           Published by Trillium   \n",
       "reczmEIm3KnSr2jF3                             NaN   \n",
       "\n",
       "                  Schedule Comments (from GTFS Dataset) gtfs check data  \n",
       "rec05rHUgCVFqsXgN                                   NaN             NaN  \n",
       "rec09fVK7ls8fl5sp                                   NaN             NaN  \n",
       "rec0WcjVurdKkHx7S                                   NaN             NaN  \n",
       "rec0XvgWtnmoUl8Lz                                   NaN             NaN  \n",
       "rec0YWNONFHEO0pUl                                   NaN             NaN  \n",
       "...                                                 ...             ...  \n",
       "reczSG6bO6il6QMLZ                                   NaN             NaN  \n",
       "reczZKKwKsVQuU0dF                                   NaN             NaN  \n",
       "reczf9FxlvLopZybb                                   NaN             NaN  \n",
       "recziP7dvY7apKbPB                                   NaN             NaN  \n",
       "reczmEIm3KnSr2jF3                                   NaN             NaN  \n",
       "\n",
       "[507 rows x 28 columns]"
      ]
     },
     "execution_count": 237,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gtfs_services_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Check #</th>\n",
       "      <th>Description</th>\n",
       "      <th>Scope</th>\n",
       "      <th>Source</th>\n",
       "      <th>Source: medium-term</th>\n",
       "      <th>Source: goal</th>\n",
       "      <th>Max Score</th>\n",
       "      <th>Score Type</th>\n",
       "      <th>gtfs-service check data copy</th>\n",
       "      <th>Scoring Criteria</th>\n",
       "      <th>gtfs check data</th>\n",
       "      <th>gtfs-dataset check data copy</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>rec4Ah8sgs8F243XH</th>\n",
       "      <td>16</td>\n",
       "      <td>TripUpdates Published</td>\n",
       "      <td>TripUpdates Dataset</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[auto]</td>\n",
       "      <td>[auto]</td>\n",
       "      <td>5.0</td>\n",
       "      <td>Nominal</td>\n",
       "      <td>[reczJXpgSxkKqzrFg, recAarP7F7LPwtlcn, reczYzu...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec5GIXSm4aAcpuZj</th>\n",
       "      <td>20</td>\n",
       "      <td>Publish to OpenMobilityData</td>\n",
       "      <td>TripUpdates Dataset</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Boolean</td>\n",
       "      <td>[recjjCy0uuLGk6RSq, recXNqjW35Dq3omAX, reclalk...</td>\n",
       "      <td>There or not.\\n\\n</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec5ahg0uEPYRoMEX</th>\n",
       "      <td>26</td>\n",
       "      <td>Publish to TransitLand</td>\n",
       "      <td>VehiclePositions Dataset</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Boolean</td>\n",
       "      <td>[recw95VXxi1w2rmOC, recWbsiIF3zlT2u5d, recuuLV...</td>\n",
       "      <td>There or not.\\n\\n</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec6i4tZoKfMLAWiK</th>\n",
       "      <td>22</td>\n",
       "      <td>Reasonable API Key Process</td>\n",
       "      <td>TripUpdates Dataset</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Nominal</td>\n",
       "      <td>[rec5b8Xd4UbUciyH7, recNp2w2UUdwfEB6j, recskZe...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recBWkZKdzqEIDE46</th>\n",
       "      <td>13</td>\n",
       "      <td>GTFS Grading Scheme Score</td>\n",
       "      <td>Service within GTFS Schedule Dataset</td>\n",
       "      <td>[human]</td>\n",
       "      <td>[gtfs-trained human]</td>\n",
       "      <td>[human]</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Continuous</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3\\* Grading scheme score / Max grading scheme ...</td>\n",
       "      <td>[recuql0JQWFcn7KFs, recxItR7wCdB51vjA, rec4ivy...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Check #                  Description  \\\n",
       "rec4Ah8sgs8F243XH      16        TripUpdates Published   \n",
       "rec5GIXSm4aAcpuZj      20  Publish to OpenMobilityData   \n",
       "rec5ahg0uEPYRoMEX      26       Publish to TransitLand   \n",
       "rec6i4tZoKfMLAWiK      22   Reasonable API Key Process   \n",
       "recBWkZKdzqEIDE46      13    GTFS Grading Scheme Score   \n",
       "\n",
       "                                                  Scope   Source  \\\n",
       "rec4Ah8sgs8F243XH                   TripUpdates Dataset  [human]   \n",
       "rec5GIXSm4aAcpuZj                   TripUpdates Dataset  [human]   \n",
       "rec5ahg0uEPYRoMEX              VehiclePositions Dataset  [human]   \n",
       "rec6i4tZoKfMLAWiK                   TripUpdates Dataset  [human]   \n",
       "recBWkZKdzqEIDE46  Service within GTFS Schedule Dataset  [human]   \n",
       "\n",
       "                    Source: medium-term Source: goal  Max Score  Score Type  \\\n",
       "rec4Ah8sgs8F243XH                [auto]       [auto]        5.0     Nominal   \n",
       "rec5GIXSm4aAcpuZj               [human]      [human]        1.0     Boolean   \n",
       "rec5ahg0uEPYRoMEX               [human]      [human]        1.0     Boolean   \n",
       "rec6i4tZoKfMLAWiK               [human]      [human]        2.0     Nominal   \n",
       "recBWkZKdzqEIDE46  [gtfs-trained human]      [human]        3.0  Continuous   \n",
       "\n",
       "                                        gtfs-service check data copy  \\\n",
       "rec4Ah8sgs8F243XH  [reczJXpgSxkKqzrFg, recAarP7F7LPwtlcn, reczYzu...   \n",
       "rec5GIXSm4aAcpuZj  [recjjCy0uuLGk6RSq, recXNqjW35Dq3omAX, reclalk...   \n",
       "rec5ahg0uEPYRoMEX  [recw95VXxi1w2rmOC, recWbsiIF3zlT2u5d, recuuLV...   \n",
       "rec6i4tZoKfMLAWiK  [rec5b8Xd4UbUciyH7, recNp2w2UUdwfEB6j, recskZe...   \n",
       "recBWkZKdzqEIDE46                                                NaN   \n",
       "\n",
       "                                                    Scoring Criteria  \\\n",
       "rec4Ah8sgs8F243XH                                                NaN   \n",
       "rec5GIXSm4aAcpuZj                                  There or not.\\n\\n   \n",
       "rec5ahg0uEPYRoMEX                                  There or not.\\n\\n   \n",
       "rec6i4tZoKfMLAWiK                                                NaN   \n",
       "recBWkZKdzqEIDE46  3\\* Grading scheme score / Max grading scheme ...   \n",
       "\n",
       "                                                     gtfs check data  \\\n",
       "rec4Ah8sgs8F243XH                                                NaN   \n",
       "rec5GIXSm4aAcpuZj                                                NaN   \n",
       "rec5ahg0uEPYRoMEX                                                NaN   \n",
       "rec6i4tZoKfMLAWiK                                                NaN   \n",
       "recBWkZKdzqEIDE46  [recuql0JQWFcn7KFs, recxItR7wCdB51vjA, rec4ivy...   \n",
       "\n",
       "                  gtfs-dataset check data copy  \n",
       "rec4Ah8sgs8F243XH                          NaN  \n",
       "rec5GIXSm4aAcpuZj                          NaN  \n",
       "rec5ahg0uEPYRoMEX                          NaN  \n",
       "rec6i4tZoKfMLAWiK                          NaN  \n",
       "recBWkZKdzqEIDE46                          NaN  "
      ]
     },
     "execution_count": 238,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gtfs_checks_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 239,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>service_id</th>\n",
       "      <th>Provider</th>\n",
       "      <th>Operator</th>\n",
       "      <th>Currently Operating</th>\n",
       "      <th>Mode</th>\n",
       "      <th>Service Type</th>\n",
       "      <th>Notes</th>\n",
       "      <th>Funding Sources</th>\n",
       "      <th>Funding Category</th>\n",
       "      <th>...</th>\n",
       "      <th>Service Area Type</th>\n",
       "      <th>Service Area Buffer (miles)</th>\n",
       "      <th>Service Availability Category</th>\n",
       "      <th>Paratransit For</th>\n",
       "      <th>organization stack components</th>\n",
       "      <th>ITP Schedule TODO (from GTFS Dataset) (from GTFS Services Association)</th>\n",
       "      <th>ITP Activities (from GTFS Dataset) (from GTFS Services Association)</th>\n",
       "      <th>Schedule Comments (from GTFS Dataset) (from GTFS Services Association)</th>\n",
       "      <th>Season Start</th>\n",
       "      <th>Season End</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>rec0OCKo3fwRLpPIh</th>\n",
       "      <td>Watsonville Circulator</td>\n",
       "      <td>rec6lgJrPslFjSXdk</td>\n",
       "      <td>Santa Cruz Metropolitan Transit District</td>\n",
       "      <td>Santa Cruz Metropolitan Transit District</td>\n",
       "      <td>True</td>\n",
       "      <td>[bus]</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>ETRO’s new Watsonville Circulator Route is des...</td>\n",
       "      <td>Caltrans</td>\n",
       "      <td>[public]</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0ZVZt8PTzF2zuY</th>\n",
       "      <td>Topanga Beach Bus</td>\n",
       "      <td>rec00Di9RibzcrxHS</td>\n",
       "      <td>Los Angeles County</td>\n",
       "      <td>Los Angeles County</td>\n",
       "      <td>True</td>\n",
       "      <td>[bus]</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>Connects Metro Orange Line to Topanga Beach an...</td>\n",
       "      <td>Caltrans</td>\n",
       "      <td>[public]</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0ikYugqTjhLBkg</th>\n",
       "      <td>Imperial Valley Transit</td>\n",
       "      <td>recMR1zH6QMLPSriV</td>\n",
       "      <td>Imperial County Transportation Commission</td>\n",
       "      <td>Imperial County Transportation Commission</td>\n",
       "      <td>True</td>\n",
       "      <td>[bus]</td>\n",
       "      <td>[fixed-route, deviated fixed-route]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Caltrans</td>\n",
       "      <td>[public]</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0n7D4vtGfWZVEc</th>\n",
       "      <td>Cityline</td>\n",
       "      <td>recaTB1mtROhjUHNR</td>\n",
       "      <td>City of West Hollywood</td>\n",
       "      <td>City of West Hollywood</td>\n",
       "      <td>True</td>\n",
       "      <td>[bus]</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>West Hollywood’s own free shuttle\\n\\nCityline ...</td>\n",
       "      <td>Caltrans</td>\n",
       "      <td>[public]</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec14ZBuXpyw4OLY2</th>\n",
       "      <td>Avocado Heights/Bassett/West Valinda Shuttle</td>\n",
       "      <td>recHJivXoZGV1KhZe</td>\n",
       "      <td>Los Angeles County</td>\n",
       "      <td>Los Angeles County</td>\n",
       "      <td>True</td>\n",
       "      <td>[bus]</td>\n",
       "      <td>[fixed-route]</td>\n",
       "      <td>The Avocado Heights/Bassett/West Valinda Shutt...</td>\n",
       "      <td>Caltrans</td>\n",
       "      <td>[public]</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 41 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                           Name  \\\n",
       "rec0OCKo3fwRLpPIh                        Watsonville Circulator   \n",
       "rec0ZVZt8PTzF2zuY                             Topanga Beach Bus   \n",
       "rec0ikYugqTjhLBkg                       Imperial Valley Transit   \n",
       "rec0n7D4vtGfWZVEc                                      Cityline   \n",
       "rec14ZBuXpyw4OLY2  Avocado Heights/Bassett/West Valinda Shuttle   \n",
       "\n",
       "                          service_id  \\\n",
       "rec0OCKo3fwRLpPIh  rec6lgJrPslFjSXdk   \n",
       "rec0ZVZt8PTzF2zuY  rec00Di9RibzcrxHS   \n",
       "rec0ikYugqTjhLBkg  recMR1zH6QMLPSriV   \n",
       "rec0n7D4vtGfWZVEc  recaTB1mtROhjUHNR   \n",
       "rec14ZBuXpyw4OLY2  recHJivXoZGV1KhZe   \n",
       "\n",
       "                                                    Provider  \\\n",
       "rec0OCKo3fwRLpPIh   Santa Cruz Metropolitan Transit District   \n",
       "rec0ZVZt8PTzF2zuY                         Los Angeles County   \n",
       "rec0ikYugqTjhLBkg  Imperial County Transportation Commission   \n",
       "rec0n7D4vtGfWZVEc                     City of West Hollywood   \n",
       "rec14ZBuXpyw4OLY2                         Los Angeles County   \n",
       "\n",
       "                                                    Operator  \\\n",
       "rec0OCKo3fwRLpPIh   Santa Cruz Metropolitan Transit District   \n",
       "rec0ZVZt8PTzF2zuY                         Los Angeles County   \n",
       "rec0ikYugqTjhLBkg  Imperial County Transportation Commission   \n",
       "rec0n7D4vtGfWZVEc                     City of West Hollywood   \n",
       "rec14ZBuXpyw4OLY2                         Los Angeles County   \n",
       "\n",
       "                   Currently Operating   Mode  \\\n",
       "rec0OCKo3fwRLpPIh                 True  [bus]   \n",
       "rec0ZVZt8PTzF2zuY                 True  [bus]   \n",
       "rec0ikYugqTjhLBkg                 True  [bus]   \n",
       "rec0n7D4vtGfWZVEc                 True  [bus]   \n",
       "rec14ZBuXpyw4OLY2                 True  [bus]   \n",
       "\n",
       "                                          Service Type  \\\n",
       "rec0OCKo3fwRLpPIh                        [fixed-route]   \n",
       "rec0ZVZt8PTzF2zuY                        [fixed-route]   \n",
       "rec0ikYugqTjhLBkg  [fixed-route, deviated fixed-route]   \n",
       "rec0n7D4vtGfWZVEc                        [fixed-route]   \n",
       "rec14ZBuXpyw4OLY2                        [fixed-route]   \n",
       "\n",
       "                                                               Notes  \\\n",
       "rec0OCKo3fwRLpPIh  ETRO’s new Watsonville Circulator Route is des...   \n",
       "rec0ZVZt8PTzF2zuY  Connects Metro Orange Line to Topanga Beach an...   \n",
       "rec0ikYugqTjhLBkg                                                NaN   \n",
       "rec0n7D4vtGfWZVEc  West Hollywood’s own free shuttle\\n\\nCityline ...   \n",
       "rec14ZBuXpyw4OLY2  The Avocado Heights/Bassett/West Valinda Shutt...   \n",
       "\n",
       "                  Funding Sources Funding Category  ... Service Area Type  \\\n",
       "rec0OCKo3fwRLpPIh        Caltrans         [public]  ...               NaN   \n",
       "rec0ZVZt8PTzF2zuY        Caltrans         [public]  ...               NaN   \n",
       "rec0ikYugqTjhLBkg        Caltrans         [public]  ...               NaN   \n",
       "rec0n7D4vtGfWZVEc        Caltrans         [public]  ...               NaN   \n",
       "rec14ZBuXpyw4OLY2        Caltrans         [public]  ...               NaN   \n",
       "\n",
       "                  Service Area Buffer (miles) Service Availability Category  \\\n",
       "rec0OCKo3fwRLpPIh                         NaN                           NaN   \n",
       "rec0ZVZt8PTzF2zuY                         NaN                           NaN   \n",
       "rec0ikYugqTjhLBkg                         NaN                           NaN   \n",
       "rec0n7D4vtGfWZVEc                         NaN                           NaN   \n",
       "rec14ZBuXpyw4OLY2                         NaN                           NaN   \n",
       "\n",
       "                   Paratransit For  organization stack components  \\\n",
       "rec0OCKo3fwRLpPIh              NaN                            NaN   \n",
       "rec0ZVZt8PTzF2zuY              NaN                            NaN   \n",
       "rec0ikYugqTjhLBkg              NaN                            NaN   \n",
       "rec0n7D4vtGfWZVEc              NaN                            NaN   \n",
       "rec14ZBuXpyw4OLY2              NaN                            NaN   \n",
       "\n",
       "                  ITP Schedule TODO (from GTFS Dataset) (from GTFS Services Association)  \\\n",
       "rec0OCKo3fwRLpPIh                                                NaN                       \n",
       "rec0ZVZt8PTzF2zuY                                                NaN                       \n",
       "rec0ikYugqTjhLBkg                                                NaN                       \n",
       "rec0n7D4vtGfWZVEc                                                NaN                       \n",
       "rec14ZBuXpyw4OLY2                                                NaN                       \n",
       "\n",
       "                  ITP Activities (from GTFS Dataset) (from GTFS Services Association)  \\\n",
       "rec0OCKo3fwRLpPIh                                                NaN                    \n",
       "rec0ZVZt8PTzF2zuY                                                NaN                    \n",
       "rec0ikYugqTjhLBkg                                                NaN                    \n",
       "rec0n7D4vtGfWZVEc                                                NaN                    \n",
       "rec14ZBuXpyw4OLY2                                                NaN                    \n",
       "\n",
       "                  Schedule Comments (from GTFS Dataset) (from GTFS Services Association)  \\\n",
       "rec0OCKo3fwRLpPIh                                                NaN                       \n",
       "rec0ZVZt8PTzF2zuY                                                NaN                       \n",
       "rec0ikYugqTjhLBkg                                                NaN                       \n",
       "rec0n7D4vtGfWZVEc                                                NaN                       \n",
       "rec14ZBuXpyw4OLY2                                                NaN                       \n",
       "\n",
       "                  Season Start Season End  \n",
       "rec0OCKo3fwRLpPIh          NaN        NaN  \n",
       "rec0ZVZt8PTzF2zuY          NaN        NaN  \n",
       "rec0ikYugqTjhLBkg          NaN        NaN  \n",
       "rec0n7D4vtGfWZVEc          NaN        NaN  \n",
       "rec14ZBuXpyw4OLY2          NaN        NaN  \n",
       "\n",
       "[5 rows x 41 columns]"
      ]
     },
     "execution_count": 239,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "services_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 240,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Alias'</th>\n",
       "      <th>organization_id</th>\n",
       "      <th>ITP ID</th>\n",
       "      <th>Tracking Cat - og</th>\n",
       "      <th>Reporting Cat - og</th>\n",
       "      <th>Assist Cat - og</th>\n",
       "      <th>Caltrans District</th>\n",
       "      <th>MPO/RTPA</th>\n",
       "      <th>Planning Authority</th>\n",
       "      <th>...</th>\n",
       "      <th>Count of Services with Complete Realtime Status</th>\n",
       "      <th>At least one GTFS feed for any service (1=yes)</th>\n",
       "      <th>At least on complete RT set (1=yes)</th>\n",
       "      <th>Complete static GTFS coverage (1=yes)</th>\n",
       "      <th>Complete RT coverage (1=yes)</th>\n",
       "      <th>provider check data</th>\n",
       "      <th>Provider Assessments</th>\n",
       "      <th>Details</th>\n",
       "      <th>Website</th>\n",
       "      <th>Contracts Held</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>rec0ZHctuUYh5wtLS</th>\n",
       "      <td>San Luis Obispo Regional Transit Authority</td>\n",
       "      <td>[SLORTA]</td>\n",
       "      <td>reciakGBN1DP9dK9N</td>\n",
       "      <td>289.0</td>\n",
       "      <td>Active</td>\n",
       "      <td>Core</td>\n",
       "      <td>White Glove</td>\n",
       "      <td>05 - San Luis Obispo</td>\n",
       "      <td>San Luis Obispo Council of Governments</td>\n",
       "      <td>San Luis Obispo Council of Governments</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>[recJpdz8eCeRsUQSj, recK3xMG6XsnQpYwD, recX8ke...</td>\n",
       "      <td>[recIUkhYnikNkX12h]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0dtZWydv7z5afX</th>\n",
       "      <td>City of Taft</td>\n",
       "      <td>[TAT]</td>\n",
       "      <td>recgTxL1xyvgC8e9k</td>\n",
       "      <td>330.0</td>\n",
       "      <td>Active</td>\n",
       "      <td>Core</td>\n",
       "      <td>White Glove</td>\n",
       "      <td>06 - Fresno</td>\n",
       "      <td>Kern Council of Governments</td>\n",
       "      <td>Kern Council of Governments</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[rec9P4eMWAQ2GaF5A, recYKDCjjLJbzcJTM, recLVPL...</td>\n",
       "      <td>[recp38jhkZVctheDK]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0qKtbrBvE1AyRe</th>\n",
       "      <td>City of Cerritos</td>\n",
       "      <td>NaN</td>\n",
       "      <td>recXYM27Lts8XF98x</td>\n",
       "      <td>63.0</td>\n",
       "      <td>Active</td>\n",
       "      <td>Core</td>\n",
       "      <td>White Glove</td>\n",
       "      <td>07 - Los Angeles</td>\n",
       "      <td>Southern California Association of Governments</td>\n",
       "      <td>Southern California Association of Governments</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>[receMU82jZycJkGFe, recwhLFXaVN5DcqQ1, recXbUF...</td>\n",
       "      <td>[recJMNAE1xFtv9B60]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0yf9AiAqnwpaIT</th>\n",
       "      <td>City of Delano</td>\n",
       "      <td>NaN</td>\n",
       "      <td>recROsnN85RlZziSj</td>\n",
       "      <td>91.0</td>\n",
       "      <td>Active</td>\n",
       "      <td>Core</td>\n",
       "      <td>White Glove</td>\n",
       "      <td>06 - Fresno</td>\n",
       "      <td>Kern Council of Governments</td>\n",
       "      <td>Kern Council of Governments</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>[recxPm0wCZJj0jtR0, recyeyK9Brn5UV85M, recHjg7...</td>\n",
       "      <td>[rec90UiwBWjFXvdt9]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec1gD38VBhjtMssT</th>\n",
       "      <td>Imperial County Transportation Commission</td>\n",
       "      <td>NaN</td>\n",
       "      <td>rec38PbjPbEy2Tvdu</td>\n",
       "      <td>138.0</td>\n",
       "      <td>Active</td>\n",
       "      <td>Core</td>\n",
       "      <td>White Glove</td>\n",
       "      <td>11 - San Diego</td>\n",
       "      <td>Southern California Association of Governments</td>\n",
       "      <td>Southern California Association of Governments</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>[recT2TIZ0r3FHdCwQ, recgcfRSpfLffxTDf, recSF6B...</td>\n",
       "      <td>[reczToe4t1ynsYs5T]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 33 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                          Name    Alias'  \\\n",
       "rec0ZHctuUYh5wtLS  San Luis Obispo Regional Transit Authority   [SLORTA]   \n",
       "rec0dtZWydv7z5afX                                 City of Taft     [TAT]   \n",
       "rec0qKtbrBvE1AyRe                             City of Cerritos       NaN   \n",
       "rec0yf9AiAqnwpaIT                               City of Delano       NaN   \n",
       "rec1gD38VBhjtMssT    Imperial County Transportation Commission       NaN   \n",
       "\n",
       "                     organization_id  ITP ID Tracking Cat - og  \\\n",
       "rec0ZHctuUYh5wtLS  reciakGBN1DP9dK9N   289.0            Active   \n",
       "rec0dtZWydv7z5afX  recgTxL1xyvgC8e9k   330.0            Active   \n",
       "rec0qKtbrBvE1AyRe  recXYM27Lts8XF98x    63.0            Active   \n",
       "rec0yf9AiAqnwpaIT  recROsnN85RlZziSj    91.0            Active   \n",
       "rec1gD38VBhjtMssT  rec38PbjPbEy2Tvdu   138.0            Active   \n",
       "\n",
       "                  Reporting Cat - og Assist Cat - og     Caltrans District  \\\n",
       "rec0ZHctuUYh5wtLS               Core     White Glove  05 - San Luis Obispo   \n",
       "rec0dtZWydv7z5afX               Core     White Glove           06 - Fresno   \n",
       "rec0qKtbrBvE1AyRe               Core     White Glove      07 - Los Angeles   \n",
       "rec0yf9AiAqnwpaIT               Core     White Glove           06 - Fresno   \n",
       "rec1gD38VBhjtMssT               Core     White Glove        11 - San Diego   \n",
       "\n",
       "                                                         MPO/RTPA  \\\n",
       "rec0ZHctuUYh5wtLS          San Luis Obispo Council of Governments   \n",
       "rec0dtZWydv7z5afX                     Kern Council of Governments   \n",
       "rec0qKtbrBvE1AyRe  Southern California Association of Governments   \n",
       "rec0yf9AiAqnwpaIT                     Kern Council of Governments   \n",
       "rec1gD38VBhjtMssT  Southern California Association of Governments   \n",
       "\n",
       "                                               Planning Authority  ...  \\\n",
       "rec0ZHctuUYh5wtLS          San Luis Obispo Council of Governments  ...   \n",
       "rec0dtZWydv7z5afX                     Kern Council of Governments  ...   \n",
       "rec0qKtbrBvE1AyRe  Southern California Association of Governments  ...   \n",
       "rec0yf9AiAqnwpaIT                     Kern Council of Governments  ...   \n",
       "rec1gD38VBhjtMssT  Southern California Association of Governments  ...   \n",
       "\n",
       "                  Count of Services with Complete Realtime Status  \\\n",
       "rec0ZHctuUYh5wtLS                                               0   \n",
       "rec0dtZWydv7z5afX                                               0   \n",
       "rec0qKtbrBvE1AyRe                                               0   \n",
       "rec0yf9AiAqnwpaIT                                               0   \n",
       "rec1gD38VBhjtMssT                                               0   \n",
       "\n",
       "                  At least one GTFS feed for any service (1=yes)  \\\n",
       "rec0ZHctuUYh5wtLS                                              1   \n",
       "rec0dtZWydv7z5afX                                              1   \n",
       "rec0qKtbrBvE1AyRe                                              0   \n",
       "rec0yf9AiAqnwpaIT                                              1   \n",
       "rec1gD38VBhjtMssT                                              0   \n",
       "\n",
       "                  At least on complete RT set (1=yes)  \\\n",
       "rec0ZHctuUYh5wtLS                                   0   \n",
       "rec0dtZWydv7z5afX                                   0   \n",
       "rec0qKtbrBvE1AyRe                                   0   \n",
       "rec0yf9AiAqnwpaIT                                   0   \n",
       "rec1gD38VBhjtMssT                                   0   \n",
       "\n",
       "                  Complete static GTFS coverage (1=yes)  \\\n",
       "rec0ZHctuUYh5wtLS                                     0   \n",
       "rec0dtZWydv7z5afX                                     1   \n",
       "rec0qKtbrBvE1AyRe                                     0   \n",
       "rec0yf9AiAqnwpaIT                                     1   \n",
       "rec1gD38VBhjtMssT                                     0   \n",
       "\n",
       "                  Complete RT coverage (1=yes)  \\\n",
       "rec0ZHctuUYh5wtLS                            0   \n",
       "rec0dtZWydv7z5afX                            0   \n",
       "rec0qKtbrBvE1AyRe                            0   \n",
       "rec0yf9AiAqnwpaIT                            0   \n",
       "rec1gD38VBhjtMssT                            0   \n",
       "\n",
       "                                                 provider check data  \\\n",
       "rec0ZHctuUYh5wtLS  [recJpdz8eCeRsUQSj, recK3xMG6XsnQpYwD, recX8ke...   \n",
       "rec0dtZWydv7z5afX  [rec9P4eMWAQ2GaF5A, recYKDCjjLJbzcJTM, recLVPL...   \n",
       "rec0qKtbrBvE1AyRe  [receMU82jZycJkGFe, recwhLFXaVN5DcqQ1, recXbUF...   \n",
       "rec0yf9AiAqnwpaIT  [recxPm0wCZJj0jtR0, recyeyK9Brn5UV85M, recHjg7...   \n",
       "rec1gD38VBhjtMssT  [recT2TIZ0r3FHdCwQ, recgcfRSpfLffxTDf, recSF6B...   \n",
       "\n",
       "                  Provider Assessments Details Website Contracts Held  \n",
       "rec0ZHctuUYh5wtLS  [recIUkhYnikNkX12h]     NaN     NaN            NaN  \n",
       "rec0dtZWydv7z5afX  [recp38jhkZVctheDK]     NaN     NaN            NaN  \n",
       "rec0qKtbrBvE1AyRe  [recJMNAE1xFtv9B60]     NaN     NaN            NaN  \n",
       "rec0yf9AiAqnwpaIT  [rec90UiwBWjFXvdt9]     NaN     NaN            NaN  \n",
       "rec1gD38VBhjtMssT  [reczToe4t1ynsYs5T]     NaN     NaN            NaN  \n",
       "\n",
       "[5 rows x 33 columns]"
      ]
     },
     "execution_count": 240,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gtfs_providers_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2 - Create Junction Tables\n",
    "\n",
    "Airtable doesn't have a good way of auto-generating a junction table (a table associating records from two other tables) based on two other tables and a set of conditions. \n",
    "\n",
    "The following codes createsa \"cartesian product\" (every record to every record) junction table based on exports of two airtable tables and then selecting which association records are relevant (i.e. only checks that apply to GTFS Schedule should be associated with GTFS Schedule).\n",
    "\n",
    "The resulting delimeted file can be pasted into a spreadsheet which can then be based into Airtable. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Services"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 241,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Check #</th>\n",
       "      <th>Services</th>\n",
       "      <th>Dataset Type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1382</th>\n",
       "      <td>32</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Alerts</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>246</th>\n",
       "      <td>13</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>530</th>\n",
       "      <td>10</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>814</th>\n",
       "      <td>8</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1098</th>\n",
       "      <td>7</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2454</th>\n",
       "      <td>9</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2738</th>\n",
       "      <td>14</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3022</th>\n",
       "      <td>11</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2170</th>\n",
       "      <td>18</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "      <td>GTFS TripUpdates</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1886</th>\n",
       "      <td>25</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "      <td>GTFS VehiclePositions</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3124 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Check #                          Services           Dataset Type\n",
       "1382      32                        AC Transit            GTFS Alerts\n",
       "246       13                        AC Transit          GTFS Schedule\n",
       "530       10                        AC Transit          GTFS Schedule\n",
       "814        8                        AC Transit          GTFS Schedule\n",
       "1098       7                        AC Transit          GTFS Schedule\n",
       "...      ...                               ...                    ...\n",
       "2454       9  the Link-Baldwin Hills Parklands          GTFS Schedule\n",
       "2738      14  the Link-Baldwin Hills Parklands          GTFS Schedule\n",
       "3022      11  the Link-Baldwin Hills Parklands          GTFS Schedule\n",
       "2170      18  the Link-Baldwin Hills Parklands       GTFS TripUpdates\n",
       "1886      25  the Link-Baldwin Hills Parklands  GTFS VehiclePositions\n",
       "\n",
       "[3124 rows x 3 columns]"
      ]
     },
     "execution_count": 241,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "services_df = services_df.rename(columns={\"Name\":\"Services\"})\n",
    "\n",
    "service_checks = [\n",
    "    \"Service within GTFS Schedule Dataset\",\n",
    "    \"Service within TripUpdates Dataset\",\n",
    "    \"Service within VehiclePositions Dataset\",\n",
    "    \"Service within Alerts Dataset\"\n",
    "]\n",
    "\n",
    "service_checks_df = gtfs_checks_df.loc[gtfs_checks_df[\"Scope\"].isin(service_checks),[\"Check #\",\"Scope\"]]\n",
    "\n",
    "# cartesian product of service checks and services\n",
    "service_checks_cross_df = service_checks_df.merge(services_df[\"Services\"], how='cross')\n",
    "service_checks_cross_df = service_checks_cross_df.sort_values([\"Services\",\"Scope\"])\n",
    "\n",
    "scope_data_mapping = {\n",
    "    \"Service within GTFS Schedule Dataset\":\"GTFS Schedule\",\n",
    "    \"Service within TripUpdates Dataset\":\"GTFS TripUpdates\",\n",
    "    \"Service within VehiclePositions Dataset\":\"GTFS VehiclePositions\",\n",
    "    \"Service within Alerts Dataset\":\"GTFS Alerts\",\n",
    "}\n",
    "\n",
    "service_checks_cross_df[\"Dataset Type\"] = service_checks_cross_df[\"Scope\"].map(scope_data_mapping)\n",
    "\n",
    "service_checks_cross_df = service_checks_cross_df.drop(columns=[\"Scope\"])\n",
    "\n",
    "service_checks_cross_df[\"Dataset Type\"].value_counts()\n",
    "service_checks_cross_df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 242,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gtfs-service record</th>\n",
       "      <th>Dataset Type</th>\n",
       "      <th>GTFS Dataset</th>\n",
       "      <th>Services</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>rec05rHUgCVFqsXgN</th>\n",
       "      <td>Tulare Intermodal Express – Tulare Alerts</td>\n",
       "      <td>GTFS Alerts</td>\n",
       "      <td>Tulare Alerts</td>\n",
       "      <td>Tulare Intermodal Express</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec09fVK7ls8fl5sp</th>\n",
       "      <td>Monterey-Salinas Transit – Monterey Salinas Tr...</td>\n",
       "      <td>GTFS TripUpdates</td>\n",
       "      <td>Monterey Salinas TripUpdates</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0WcjVurdKkHx7S</th>\n",
       "      <td>Healdsburg Shuttle – Bay Area 511 TripUpdates</td>\n",
       "      <td>GTFS TripUpdates</td>\n",
       "      <td>Bay Area 511 TripUpdates</td>\n",
       "      <td>Healdsburg Shuttle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0XvgWtnmoUl8Lz</th>\n",
       "      <td>Glenn Ride – Glenn Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>Glenn Schedule</td>\n",
       "      <td>Glenn Ride</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rec0YWNONFHEO0pUl</th>\n",
       "      <td>Valley Express – VCTC Alerts</td>\n",
       "      <td>GTFS Alerts</td>\n",
       "      <td>VCTC Alerts</td>\n",
       "      <td>Valley Express</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczSG6bO6il6QMLZ</th>\n",
       "      <td>Healdsburg Shuttle – Bay Area 511 Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "      <td>Healdsburg Shuttle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczZKKwKsVQuU0dF</th>\n",
       "      <td>Placer County Transit – Placer Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>Placer Schedule</td>\n",
       "      <td>Placer County Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczf9FxlvLopZybb</th>\n",
       "      <td>Taft Area Transit – Taft Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>Taft Schedule</td>\n",
       "      <td>Taft Area Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recziP7dvY7apKbPB</th>\n",
       "      <td>Kern Transit – Kern Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>Kern Schedule</td>\n",
       "      <td>Kern Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reczmEIm3KnSr2jF3</th>\n",
       "      <td>Monterey-Salinas Transit – Monterey Salinas Ve...</td>\n",
       "      <td>GTFS VehiclePositions</td>\n",
       "      <td>Monterey Salinas VehiclePositions</td>\n",
       "      <td>Monterey-Salinas Transit</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>543 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                 gtfs-service record  \\\n",
       "rec05rHUgCVFqsXgN          Tulare Intermodal Express – Tulare Alerts   \n",
       "rec09fVK7ls8fl5sp  Monterey-Salinas Transit – Monterey Salinas Tr...   \n",
       "rec0WcjVurdKkHx7S      Healdsburg Shuttle – Bay Area 511 TripUpdates   \n",
       "rec0XvgWtnmoUl8Lz                        Glenn Ride – Glenn Schedule   \n",
       "rec0YWNONFHEO0pUl                       Valley Express – VCTC Alerts   \n",
       "...                                                              ...   \n",
       "reczSG6bO6il6QMLZ         Healdsburg Shuttle – Bay Area 511 Schedule   \n",
       "reczZKKwKsVQuU0dF            Placer County Transit – Placer Schedule   \n",
       "reczf9FxlvLopZybb                  Taft Area Transit – Taft Schedule   \n",
       "recziP7dvY7apKbPB                       Kern Transit – Kern Schedule   \n",
       "reczmEIm3KnSr2jF3  Monterey-Salinas Transit – Monterey Salinas Ve...   \n",
       "\n",
       "                            Dataset Type                       GTFS Dataset  \\\n",
       "rec05rHUgCVFqsXgN            GTFS Alerts                      Tulare Alerts   \n",
       "rec09fVK7ls8fl5sp       GTFS TripUpdates       Monterey Salinas TripUpdates   \n",
       "rec0WcjVurdKkHx7S       GTFS TripUpdates           Bay Area 511 TripUpdates   \n",
       "rec0XvgWtnmoUl8Lz          GTFS Schedule                     Glenn Schedule   \n",
       "rec0YWNONFHEO0pUl            GTFS Alerts                        VCTC Alerts   \n",
       "...                                  ...                                ...   \n",
       "reczSG6bO6il6QMLZ          GTFS Schedule              Bay Area 511 Schedule   \n",
       "reczZKKwKsVQuU0dF          GTFS Schedule                    Placer Schedule   \n",
       "reczf9FxlvLopZybb          GTFS Schedule                      Taft Schedule   \n",
       "recziP7dvY7apKbPB          GTFS Schedule                      Kern Schedule   \n",
       "reczmEIm3KnSr2jF3  GTFS VehiclePositions  Monterey Salinas VehiclePositions   \n",
       "\n",
       "                                    Services  \n",
       "rec05rHUgCVFqsXgN  Tulare Intermodal Express  \n",
       "rec09fVK7ls8fl5sp   Monterey-Salinas Transit  \n",
       "rec0WcjVurdKkHx7S         Healdsburg Shuttle  \n",
       "rec0XvgWtnmoUl8Lz                 Glenn Ride  \n",
       "rec0YWNONFHEO0pUl             Valley Express  \n",
       "...                                      ...  \n",
       "reczSG6bO6il6QMLZ         Healdsburg Shuttle  \n",
       "reczZKKwKsVQuU0dF      Placer County Transit  \n",
       "reczf9FxlvLopZybb          Taft Area Transit  \n",
       "recziP7dvY7apKbPB               Kern Transit  \n",
       "reczmEIm3KnSr2jF3   Monterey-Salinas Transit  \n",
       "\n",
       "[543 rows x 4 columns]"
      ]
     },
     "execution_count": 242,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "services_gtfs_df = gtfs_services_df[[\"Name\",\"Dataset Type\",\"GTFS Dataset\",\"Services\"]].rename(columns={\"Name\":\"gtfs-service record\"})\n",
    "\n",
    "# make services a list if it isn't already\n",
    "services_gtfs_df[\"Services\"]=services_gtfs_df[\"Services\"].apply(lambda x: x.split(\",\"))\n",
    "\n",
    "# if dataset type is a list, then take first value\n",
    "services_gtfs_df[\"Dataset Type\"]=services_gtfs_df[\"Dataset Type\"].apply(lambda x: x[0])\n",
    "\n",
    "all_services_gtfs_df = services_gtfs_df.explode(\"Services\")\n",
    "all_services_gtfs_df[\"Services\"]=all_services_gtfs_df[\"Services\"].apply(lambda x: x.strip())\n",
    "\n",
    "all_services_gtfs_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 252,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Check #</th>\n",
       "      <th>gtfs-service record</th>\n",
       "      <th>Services</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32</td>\n",
       "      <td>NaN</td>\n",
       "      <td>AC Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>13</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>AC Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>AC Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>AC Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>AC Transit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3135</th>\n",
       "      <td>9</td>\n",
       "      <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3136</th>\n",
       "      <td>14</td>\n",
       "      <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137</th>\n",
       "      <td>11</td>\n",
       "      <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3138</th>\n",
       "      <td>18</td>\n",
       "      <td>NaN</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3139</th>\n",
       "      <td>25</td>\n",
       "      <td>NaN</td>\n",
       "      <td>the Link-Baldwin Hills Parklands</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3140 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Check #                                gtfs-service record  \\\n",
       "0         32                                                NaN   \n",
       "1         13                 AC Transit – Bay Area 511 Schedule   \n",
       "2         10                 AC Transit – Bay Area 511 Schedule   \n",
       "3          8                 AC Transit – Bay Area 511 Schedule   \n",
       "4          7                 AC Transit – Bay Area 511 Schedule   \n",
       "...      ...                                                ...   \n",
       "3135       9  the Link-Athens, the Link Florence-Firestone/W...   \n",
       "3136      14  the Link-Athens, the Link Florence-Firestone/W...   \n",
       "3137      11  the Link-Athens, the Link Florence-Firestone/W...   \n",
       "3138      18                                                NaN   \n",
       "3139      25                                                NaN   \n",
       "\n",
       "                              Services  \n",
       "0                           AC Transit  \n",
       "1                           AC Transit  \n",
       "2                           AC Transit  \n",
       "3                           AC Transit  \n",
       "4                           AC Transit  \n",
       "...                                ...  \n",
       "3135  the Link-Baldwin Hills Parklands  \n",
       "3136  the Link-Baldwin Hills Parklands  \n",
       "3137  the Link-Baldwin Hills Parklands  \n",
       "3138  the Link-Baldwin Hills Parklands  \n",
       "3139  the Link-Baldwin Hills Parklands  \n",
       "\n",
       "[3140 rows x 3 columns]"
      ]
     },
     "execution_count": 252,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# attach information about gtfs datasets for each service to evaluate\n",
    "\n",
    "service_checkdata_df = service_checks_cross_df.merge(\n",
    "    all_services_gtfs_df,\n",
    "    on=[\"Services\",\"Dataset Type\"],how=\"left\")\n",
    "\n",
    "\n",
    "keep_cols = [\"Check #\",\"gtfs-service record\",\"Services\"]\n",
    "keep_service_checkdata_df = service_checkdata_df[keep_cols]  #.drop_duplicates()\n",
    "keep_service_checkdata_df.to_csv(\"service_cross.csv\",index=False,sep=\"|\")\n",
    "#service_checkdata_df.loc[service_checkdata_df[\"gtfs-service record\"].isna()]\n",
    "keep_service_checkdata_df\n",
    "#service_checkdata_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 245,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Check #</th>\n",
       "      <th>Services</th>\n",
       "      <th>Dataset Type</th>\n",
       "      <th>gtfs-service record</th>\n",
       "      <th>GTFS Dataset</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Alerts</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>13</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>AC Transit</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>AC Transit – Bay Area 511 Schedule</td>\n",
       "      <td>Bay Area 511 Schedule</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Check #    Services   Dataset Type                 gtfs-service record  \\\n",
       "0      32  AC Transit    GTFS Alerts                                 NaN   \n",
       "1      13  AC Transit  GTFS Schedule  AC Transit – Bay Area 511 Schedule   \n",
       "2      10  AC Transit  GTFS Schedule  AC Transit – Bay Area 511 Schedule   \n",
       "3       8  AC Transit  GTFS Schedule  AC Transit – Bay Area 511 Schedule   \n",
       "4       7  AC Transit  GTFS Schedule  AC Transit – Bay Area 511 Schedule   \n",
       "\n",
       "            GTFS Dataset  \n",
       "0                    NaN  \n",
       "1  Bay Area 511 Schedule  \n",
       "2  Bay Area 511 Schedule  \n",
       "3  Bay Area 511 Schedule  \n",
       "4  Bay Area 511 Schedule  "
      ]
     },
     "execution_count": 245,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "service_checkdata_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_checks_df = pd.read_csv(os.path.join(\"data\",\"gtfs checks-Dataset Scope.csv\"), usecols = [\"Check #\",\"Scope\"])\n",
    "datasets_df = pd.read_csv(os.path.join(\"data\",\"Assessed Feeds.csv\"), usecols = [\"gtfs_dataset_id\",\"Data\"])\n",
    "\n",
    "# cartesian product\n",
    "dataset_cross_df = datasets_df.merge(dataset_checks_df, how='cross')\n",
    "\n",
    "# select applicable checks\n",
    "dataset_cross_df[\"data_match\"]=dataset_cross_df[\"Data\"]+\" Dataset\"\n",
    "dataset_cross_df[\"scope_match\"] =  dataset_cross_df[\"Scope\"]\n",
    "dataset_cross_df.loc[\n",
    "    dataset_cross_df[\"Scope\"] != \"GTFS Schedule Dataset\", \n",
    "    \"scope_match\"\n",
    "]=\"GTFS \" + dataset_cross_df[\"Scope\"]\n",
    "dataset_checkdata_df = dataset_cross_df[dataset_cross_df[\"data_match\"]==dataset_cross_df[\"scope_match\"]]\n",
    "\n",
    "#checks \"GTFS Schedule Dataset\"\n",
    "#dataset \"GTFS Schedule\"\n",
    "\n",
    "dataset_checkdata_df[[\"gtfs_dataset_id\",\"Check #\"]].to_csv(\"data_cross.csv\",index=False,sep=\"|\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gtfs_dataset_id</th>\n",
       "      <th>Data</th>\n",
       "      <th>Check #</th>\n",
       "      <th>Scope</th>\n",
       "      <th>data_match</th>\n",
       "      <th>scope_match</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Get Around Town Express Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>1</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Get Around Town Express Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>2</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Get Around Town Express Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>3</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Get Around Town Express Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>4</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Get Around Town Express Schedule</td>\n",
       "      <td>GTFS Schedule</td>\n",
       "      <td>5</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "      <td>GTFS Schedule Dataset</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    gtfs_dataset_id           Data  Check #  \\\n",
       "0  Get Around Town Express Schedule  GTFS Schedule        1   \n",
       "1  Get Around Town Express Schedule  GTFS Schedule        2   \n",
       "2  Get Around Town Express Schedule  GTFS Schedule        3   \n",
       "3  Get Around Town Express Schedule  GTFS Schedule        4   \n",
       "4  Get Around Town Express Schedule  GTFS Schedule        5   \n",
       "\n",
       "                   Scope             data_match            scope_match  \n",
       "0  GTFS Schedule Dataset  GTFS Schedule Dataset  GTFS Schedule Dataset  \n",
       "1  GTFS Schedule Dataset  GTFS Schedule Dataset  GTFS Schedule Dataset  \n",
       "2  GTFS Schedule Dataset  GTFS Schedule Dataset  GTFS Schedule Dataset  \n",
       "3  GTFS Schedule Dataset  GTFS Schedule Dataset  GTFS Schedule Dataset  \n",
       "4  GTFS Schedule Dataset  GTFS Schedule Dataset  GTFS Schedule Dataset  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_checkdata_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "provider_checks_df = pd.read_csv(os.path.join(\"data\",\"provider_checks.csv\"))\n",
    "providers_df = pd.read_csv(os.path.join(\"data\",\"Assessed Transit Providers-Grid view.csv\"), usecols = [\"Name\"])\n",
    "\n",
    "# cartesian product\n",
    "providers_cross_df = providers_df.merge(provider_checks_df, how='cross')\n",
    "\n",
    "providers_cross_df.to_csv(\"provider_cross.csv\",index=False,sep=\"|\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>checks</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alameda-Contra Costa Transit District</td>\n",
       "      <td>37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alameda-Contra Costa Transit District</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Alameda-Contra Costa Transit District</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Amador Regional Transit System</td>\n",
       "      <td>37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Amador Regional Transit System</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    Name  checks\n",
       "0  Alameda-Contra Costa Transit District      37\n",
       "1  Alameda-Contra Costa Transit District      39\n",
       "2  Alameda-Contra Costa Transit District      40\n",
       "3         Amador Regional Transit System      37\n",
       "4         Amador Regional Transit System      39"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "providers_cross_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
  },
  "kernelspec": {
   "display_name": "Python 3.9.4 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}