{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create Junction Tables for Assessments\n", "\n", "The purpose of this notebook is to create junction tables between \"checks\" and the various pieces of information the checks are performed on for the purposes of transit data assessments.\n", "\n", "Assessment checks are performed at the following levels:\n", "\n", "- Transit Provider (i.e. does the provider list GTFS on its website)\n", "- GTFS Dataset (i.e. does it list appropriate contact info in `feed_info.txt`)\n", "- Service representation in a GTFS Dataset (i.e. are the San Francisco cable cars appropriately represented in the SFMTA GTFS Dataset?)\n", "\n", "### Output\n", "In order to make sure that each relevant record is assessed with each relevant \"check\", a cartesian join is made between the records and the checks to form a \"junction table\", which allows us to add attributes about the junction of the record and the check (i.e. a \"grade\" or score).\n", "\n", "This notebook will create three delimited files (sep=\"|\") corresponding to each of these levels, which can be appended to the records in the following Airtable tables:\n", "\n", " - `data_cross.csv` --> GTFS Dataset Check Data \n", " - `services_cross.csv` --> gtfs-dataset Check Data \n", " - `provider_cross.csv` --> Provider Check Data\n", "\n", "### Requirements \n", "\n", "This notebook makes use of the `pyairtable` library to access the California Transit Airtable Database. You will need read access to this database with an API key stored in a .env file in the base directory of this notebook with the following format:\n", "\n", "```sh\n", "AIRTABLE_API_KEY='your_api_key_info`\n", "```\n", "\n" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 - Read data from AirTable API" ] }, { "cell_type": "code", "execution_count": 168, "metadata": {}, "outputs": [], "source": [ "from dotenv import dotenv_values\n", "from pyairtable import Table\n", "\n", "API_KEY = dotenv_values(\".env\")['AIRTABLE_API_KEY']\n", "BASE_ID = 'appjPsudTDcbLUWM5'\n", "\n", "GTFS_SERVICES_TABLE = 'Assessed GTFS-Services'\n", "ASSESSED_SERVICES_TABLE = 'Assessed Services'\n", "ASSESSED_PROVIDERS_TABLE = 'Assessed Transit Providers'\n", "CHECKS_TABLE = 'gtfs checks'\n" ] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [], "source": [ "def airtable_to_df(table_id:str,base_id:str=BASE_ID,api_key:str=API_KEY)->pd.DataFrame:\n", " records = Table(api_key, base_id, table_id)\n", " airtable_rows = [] \n", " airtable_index = []\n", " for record in records.all():\n", " airtable_rows.append(record[\"fields\"])\n", " airtable_index.append(record[\"id\"])\n", " return pd.DataFrame(airtable_rows, index=airtable_index)" ] }, { "cell_type": "code", "execution_count": 236, "metadata": {}, "outputs": [], "source": [ "gtfs_services_df = airtable_to_df(GTFS_SERVICES_TABLE)\n", "gtfs_checks_df = airtable_to_df(CHECKS_TABLE)\n", "services_df = airtable_to_df(ASSESSED_SERVICES_TABLE)\n", "gtfs_providers_df = airtable_to_df(ASSESSED_PROVIDERS_TABLE)" ] }, { "cell_type": "code", "execution_count": 237, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Services</th>\n", " <th>GTFS Dataset</th>\n", " <th>Dataset Type</th>\n", " <th>Category</th>\n", " <th>Service Type (from Services)</th>\n", " <th>Provider</th>\n", " <th>Operator</th>\n", " <th>Dataset Producers (from GTFS Dataset)</th>\n", " <th>Dataset Publisher (from GTFS Dataset)</th>\n", " <th>...</th>\n", " <th>Flex Status</th>\n", " <th>agency_id</th>\n", " <th>route_id</th>\n", " <th>Fares v2 Status</th>\n", " <th>ITP Activities (from GTFS Dataset)</th>\n", " <th>ITP Schedule TODO (from GTFS Dataset)</th>\n", " <th>network_id</th>\n", " <th>Fares Notes (from GTFS Dataset)</th>\n", " <th>Schedule Comments (from GTFS Dataset)</th>\n", " <th>gtfs check data</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>rec05rHUgCVFqsXgN</th>\n", " <td>Tulare Intermodal Express – Tulare Alerts</td>\n", " <td>Tulare Intermodal Express</td>\n", " <td>Tulare Alerts</td>\n", " <td>[GTFS Alerts]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Tulare County Regional Transit Agency</td>\n", " <td>Tulare County Regional Transit Agency</td>\n", " <td>GMV Syncromatics Inc</td>\n", " <td>GMV Syncromatics Inc</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec09fVK7ls8fl5sp</th>\n", " <td>Monterey-Salinas Transit – Monterey Salinas Tr...</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey Salinas TripUpdates</td>\n", " <td>[GTFS TripUpdates]</td>\n", " <td>primary</td>\n", " <td>[ADA paratransit]</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>[Needed - Existing GTFS]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0WcjVurdKkHx7S</th>\n", " <td>Healdsburg Shuttle – Bay Area 511 TripUpdates</td>\n", " <td>Healdsburg Shuttle</td>\n", " <td>Bay Area 511 TripUpdates</td>\n", " <td>[GTFS TripUpdates]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Sonoma County</td>\n", " <td>Sonoma County</td>\n", " <td>Metropolitan Transportation Commission</td>\n", " <td>Metropolitan Transportation Commission</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>SO</td>\n", " <td>SO:67</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0XvgWtnmoUl8Lz</th>\n", " <td>Glenn Ride – Glenn Schedule</td>\n", " <td>Glenn Ride</td>\n", " <td>Glenn Schedule</td>\n", " <td>[GTFS Schedule]</td>\n", " <td>primary</td>\n", " <td>[deviated fixed-route, fixed-route, ADA paratr...</td>\n", " <td>Glenn County</td>\n", " <td>Paratransit Services Inc.</td>\n", " <td>Trillium Inc.</td>\n", " <td>Trillium Inc.</td>\n", " <td>...</td>\n", " <td>[In Progress - Trillium]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>[Vendor published]</td>\n", " <td>[GTFS Into Google, GTFS Schedule Creation]</td>\n", " <td>[Train transit provider to own]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0YWNONFHEO0pUl</th>\n", " <td>Valley Express – VCTC Alerts</td>\n", " <td>Valley Express</td>\n", " <td>VCTC Alerts</td>\n", " <td>[GTFS Alerts]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Ventura County Transportation Commission</td>\n", " <td>Ventura County Transportation Commission</td>\n", " <td>GMV Syncromatics Inc</td>\n", " <td>GMV Syncromatics Inc</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>149</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>reczSG6bO6il6QMLZ</th>\n", " <td>Healdsburg Shuttle – Bay Area 511 Schedule</td>\n", " <td>Healdsburg Shuttle</td>\n", " <td>Bay Area 511 Schedule</td>\n", " <td>[GTFS Schedule]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Sonoma County</td>\n", " <td>Sonoma County</td>\n", " <td>Metropolitan Transportation Commission</td>\n", " <td>Metropolitan Transportation Commission</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>SO</td>\n", " <td>SO:67</td>\n", " <td>[Vendor published]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>MTC</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>reczZKKwKsVQuU0dF</th>\n", " <td>Placer County Transit – Placer Schedule</td>\n", " <td>Placer County Transit</td>\n", " <td>Placer Schedule</td>\n", " <td>[GTFS Schedule]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Placer County</td>\n", " <td>Placer County</td>\n", " <td>Placer County, Trillium Inc.</td>\n", " <td>Trillium Inc.</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>[Vendor published]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>Published by Trillium</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>reczf9FxlvLopZybb</th>\n", " <td>Taft Area Transit – Taft Schedule</td>\n", " <td>Taft Area Transit</td>\n", " <td>Taft Schedule</td>\n", " <td>[GTFS Schedule]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>City of Taft</td>\n", " <td>City of Taft</td>\n", " <td>Trillium Inc.</td>\n", " <td>Trillium Inc.</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>[Needs GTFS Schedule]</td>\n", " <td>[GTFS Schedule Creation]</td>\n", " <td>[Get into google, Ask Kern to own]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>recziP7dvY7apKbPB</th>\n", " <td>Kern Transit – Kern Schedule</td>\n", " <td>Kern Transit</td>\n", " <td>Kern Schedule</td>\n", " <td>[GTFS Schedule]</td>\n", " <td>primary</td>\n", " <td>[fixed-route]</td>\n", " <td>Kern County</td>\n", " <td>Kern County</td>\n", " <td>Kern County, Trillium Inc.</td>\n", " <td>Trillium Inc.</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>194</td>\n", " <td>NaN</td>\n", " <td>[Vendor published]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>Published by Trillium</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>reczmEIm3KnSr2jF3</th>\n", " <td>Monterey-Salinas Transit – Monterey Salinas Ve...</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey Salinas VehiclePositions</td>\n", " <td>[GTFS VehiclePositions]</td>\n", " <td>primary</td>\n", " <td>[ADA paratransit]</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>Monterey-Salinas Transit</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>[Needed - Existing GTFS]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>507 rows × 28 columns</p>\n", "</div>" ], "text/plain": [ " Name \\\n", "rec05rHUgCVFqsXgN Tulare Intermodal Express – Tulare Alerts \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit – Monterey Salinas Tr... \n", "rec0WcjVurdKkHx7S Healdsburg Shuttle – Bay Area 511 TripUpdates \n", "rec0XvgWtnmoUl8Lz Glenn Ride – Glenn Schedule \n", "rec0YWNONFHEO0pUl Valley Express – VCTC Alerts \n", "... ... \n", "reczSG6bO6il6QMLZ Healdsburg Shuttle – Bay Area 511 Schedule \n", "reczZKKwKsVQuU0dF Placer County Transit – Placer Schedule \n", "reczf9FxlvLopZybb Taft Area Transit – Taft Schedule \n", "recziP7dvY7apKbPB Kern Transit – Kern Schedule \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit – Monterey Salinas Ve... \n", "\n", " Services \\\n", "rec05rHUgCVFqsXgN Tulare Intermodal Express \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit \n", "rec0WcjVurdKkHx7S Healdsburg Shuttle \n", "rec0XvgWtnmoUl8Lz Glenn Ride \n", "rec0YWNONFHEO0pUl Valley Express \n", "... ... \n", "reczSG6bO6il6QMLZ Healdsburg Shuttle \n", "reczZKKwKsVQuU0dF Placer County Transit \n", "reczf9FxlvLopZybb Taft Area Transit \n", "recziP7dvY7apKbPB Kern Transit \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit \n", "\n", " GTFS Dataset Dataset Type \\\n", "rec05rHUgCVFqsXgN Tulare Alerts [GTFS Alerts] \n", "rec09fVK7ls8fl5sp Monterey Salinas TripUpdates [GTFS TripUpdates] \n", "rec0WcjVurdKkHx7S Bay Area 511 TripUpdates [GTFS TripUpdates] \n", "rec0XvgWtnmoUl8Lz Glenn Schedule [GTFS Schedule] \n", "rec0YWNONFHEO0pUl VCTC Alerts [GTFS Alerts] \n", "... ... ... \n", "reczSG6bO6il6QMLZ Bay Area 511 Schedule [GTFS Schedule] \n", "reczZKKwKsVQuU0dF Placer Schedule [GTFS Schedule] \n", "reczf9FxlvLopZybb Taft Schedule [GTFS Schedule] \n", "recziP7dvY7apKbPB Kern Schedule [GTFS Schedule] \n", "reczmEIm3KnSr2jF3 Monterey Salinas VehiclePositions [GTFS VehiclePositions] \n", "\n", " Category Service Type (from Services) \\\n", "rec05rHUgCVFqsXgN primary [fixed-route] \n", "rec09fVK7ls8fl5sp primary [ADA paratransit] \n", "rec0WcjVurdKkHx7S primary [fixed-route] \n", "rec0XvgWtnmoUl8Lz primary [deviated fixed-route, fixed-route, ADA paratr... \n", "rec0YWNONFHEO0pUl primary [fixed-route] \n", "... ... ... \n", "reczSG6bO6il6QMLZ primary [fixed-route] \n", "reczZKKwKsVQuU0dF primary [fixed-route] \n", "reczf9FxlvLopZybb primary [fixed-route] \n", "recziP7dvY7apKbPB primary [fixed-route] \n", "reczmEIm3KnSr2jF3 primary [ADA paratransit] \n", "\n", " Provider \\\n", "rec05rHUgCVFqsXgN Tulare County Regional Transit Agency \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit \n", "rec0WcjVurdKkHx7S Sonoma County \n", "rec0XvgWtnmoUl8Lz Glenn County \n", "rec0YWNONFHEO0pUl Ventura County Transportation Commission \n", "... ... \n", "reczSG6bO6il6QMLZ Sonoma County \n", "reczZKKwKsVQuU0dF Placer County \n", "reczf9FxlvLopZybb City of Taft \n", "recziP7dvY7apKbPB Kern County \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit \n", "\n", " Operator \\\n", "rec05rHUgCVFqsXgN Tulare County Regional Transit Agency \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit \n", "rec0WcjVurdKkHx7S Sonoma County \n", "rec0XvgWtnmoUl8Lz Paratransit Services Inc. \n", "rec0YWNONFHEO0pUl Ventura County Transportation Commission \n", "... ... \n", "reczSG6bO6il6QMLZ Sonoma County \n", "reczZKKwKsVQuU0dF Placer County \n", "reczf9FxlvLopZybb City of Taft \n", "recziP7dvY7apKbPB Kern County \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit \n", "\n", " Dataset Producers (from GTFS Dataset) \\\n", "rec05rHUgCVFqsXgN GMV Syncromatics Inc \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit \n", "rec0WcjVurdKkHx7S Metropolitan Transportation Commission \n", "rec0XvgWtnmoUl8Lz Trillium Inc. \n", "rec0YWNONFHEO0pUl GMV Syncromatics Inc \n", "... ... \n", "reczSG6bO6il6QMLZ Metropolitan Transportation Commission \n", "reczZKKwKsVQuU0dF Placer County, Trillium Inc. \n", "reczf9FxlvLopZybb Trillium Inc. \n", "recziP7dvY7apKbPB Kern County, Trillium Inc. \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit \n", "\n", " Dataset Publisher (from GTFS Dataset) ... \\\n", "rec05rHUgCVFqsXgN GMV Syncromatics Inc ... \n", "rec09fVK7ls8fl5sp NaN ... \n", "rec0WcjVurdKkHx7S Metropolitan Transportation Commission ... \n", "rec0XvgWtnmoUl8Lz Trillium Inc. ... \n", "rec0YWNONFHEO0pUl GMV Syncromatics Inc ... \n", "... ... ... \n", "reczSG6bO6il6QMLZ Metropolitan Transportation Commission ... \n", "reczZKKwKsVQuU0dF Trillium Inc. ... \n", "reczf9FxlvLopZybb Trillium Inc. ... \n", "recziP7dvY7apKbPB Trillium Inc. ... \n", "reczmEIm3KnSr2jF3 NaN ... \n", "\n", " Flex Status agency_id route_id \\\n", "rec05rHUgCVFqsXgN NaN NaN NaN \n", "rec09fVK7ls8fl5sp [Needed - Existing GTFS] NaN NaN \n", "rec0WcjVurdKkHx7S NaN SO SO:67 \n", "rec0XvgWtnmoUl8Lz [In Progress - Trillium] NaN NaN \n", "rec0YWNONFHEO0pUl NaN 149 NaN \n", "... ... ... ... \n", "reczSG6bO6il6QMLZ NaN SO SO:67 \n", "reczZKKwKsVQuU0dF NaN NaN NaN \n", "reczf9FxlvLopZybb NaN NaN NaN \n", "recziP7dvY7apKbPB NaN 194 NaN \n", "reczmEIm3KnSr2jF3 [Needed - Existing GTFS] NaN NaN \n", "\n", " Fares v2 Status \\\n", "rec05rHUgCVFqsXgN NaN \n", "rec09fVK7ls8fl5sp NaN \n", "rec0WcjVurdKkHx7S NaN \n", "rec0XvgWtnmoUl8Lz [Vendor published] \n", "rec0YWNONFHEO0pUl NaN \n", "... ... \n", "reczSG6bO6il6QMLZ [Vendor published] \n", "reczZKKwKsVQuU0dF [Vendor published] \n", "reczf9FxlvLopZybb [Needs GTFS Schedule] \n", "recziP7dvY7apKbPB [Vendor published] \n", "reczmEIm3KnSr2jF3 NaN \n", "\n", " ITP Activities (from GTFS Dataset) \\\n", "rec05rHUgCVFqsXgN NaN \n", "rec09fVK7ls8fl5sp NaN \n", "rec0WcjVurdKkHx7S NaN \n", "rec0XvgWtnmoUl8Lz [GTFS Into Google, GTFS Schedule Creation] \n", "rec0YWNONFHEO0pUl NaN \n", "... ... \n", "reczSG6bO6il6QMLZ NaN \n", "reczZKKwKsVQuU0dF NaN \n", "reczf9FxlvLopZybb [GTFS Schedule Creation] \n", "recziP7dvY7apKbPB NaN \n", "reczmEIm3KnSr2jF3 NaN \n", "\n", " ITP Schedule TODO (from GTFS Dataset) network_id \\\n", "rec05rHUgCVFqsXgN NaN NaN \n", "rec09fVK7ls8fl5sp NaN NaN \n", "rec0WcjVurdKkHx7S NaN NaN \n", "rec0XvgWtnmoUl8Lz [Train transit provider to own] NaN \n", "rec0YWNONFHEO0pUl NaN NaN \n", "... ... ... \n", "reczSG6bO6il6QMLZ NaN NaN \n", "reczZKKwKsVQuU0dF NaN NaN \n", "reczf9FxlvLopZybb [Get into google, Ask Kern to own] NaN \n", "recziP7dvY7apKbPB NaN NaN \n", "reczmEIm3KnSr2jF3 NaN NaN \n", "\n", " Fares Notes (from GTFS Dataset) \\\n", "rec05rHUgCVFqsXgN NaN \n", "rec09fVK7ls8fl5sp NaN \n", "rec0WcjVurdKkHx7S NaN \n", "rec0XvgWtnmoUl8Lz NaN \n", "rec0YWNONFHEO0pUl NaN \n", "... ... \n", "reczSG6bO6il6QMLZ MTC \n", "reczZKKwKsVQuU0dF Published by Trillium \n", "reczf9FxlvLopZybb NaN \n", "recziP7dvY7apKbPB Published by Trillium \n", "reczmEIm3KnSr2jF3 NaN \n", "\n", " Schedule Comments (from GTFS Dataset) gtfs check data \n", "rec05rHUgCVFqsXgN NaN NaN \n", "rec09fVK7ls8fl5sp NaN NaN \n", "rec0WcjVurdKkHx7S NaN NaN \n", "rec0XvgWtnmoUl8Lz NaN NaN \n", "rec0YWNONFHEO0pUl NaN NaN \n", "... ... ... \n", "reczSG6bO6il6QMLZ NaN NaN \n", "reczZKKwKsVQuU0dF NaN NaN \n", "reczf9FxlvLopZybb NaN NaN \n", "recziP7dvY7apKbPB NaN NaN \n", "reczmEIm3KnSr2jF3 NaN NaN \n", "\n", "[507 rows x 28 columns]" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gtfs_services_df" ] }, { "cell_type": "code", "execution_count": 238, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Check #</th>\n", " <th>Description</th>\n", " <th>Scope</th>\n", " <th>Source</th>\n", " <th>Source: medium-term</th>\n", " <th>Source: goal</th>\n", " <th>Max Score</th>\n", " <th>Score Type</th>\n", " <th>gtfs-service check data copy</th>\n", " <th>Scoring Criteria</th>\n", " <th>gtfs check data</th>\n", " <th>gtfs-dataset check data copy</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>rec4Ah8sgs8F243XH</th>\n", " <td>16</td>\n", " <td>TripUpdates Published</td>\n", " <td>TripUpdates Dataset</td>\n", " <td>[human]</td>\n", " <td>[auto]</td>\n", " <td>[auto]</td>\n", " <td>5.0</td>\n", " <td>Nominal</td>\n", " <td>[reczJXpgSxkKqzrFg, recAarP7F7LPwtlcn, reczYzu...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec5GIXSm4aAcpuZj</th>\n", " <td>20</td>\n", " <td>Publish to OpenMobilityData</td>\n", " <td>TripUpdates Dataset</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>1.0</td>\n", " <td>Boolean</td>\n", " <td>[recjjCy0uuLGk6RSq, recXNqjW35Dq3omAX, reclalk...</td>\n", " <td>There or not.\\n\\n</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec5ahg0uEPYRoMEX</th>\n", " <td>26</td>\n", " <td>Publish to TransitLand</td>\n", " <td>VehiclePositions Dataset</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>1.0</td>\n", " <td>Boolean</td>\n", " <td>[recw95VXxi1w2rmOC, recWbsiIF3zlT2u5d, recuuLV...</td>\n", " <td>There or not.\\n\\n</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec6i4tZoKfMLAWiK</th>\n", " <td>22</td>\n", " <td>Reasonable API Key Process</td>\n", " <td>TripUpdates Dataset</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>[human]</td>\n", " <td>2.0</td>\n", " <td>Nominal</td>\n", " <td>[rec5b8Xd4UbUciyH7, recNp2w2UUdwfEB6j, recskZe...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>recBWkZKdzqEIDE46</th>\n", " <td>13</td>\n", " <td>GTFS Grading Scheme Score</td>\n", " <td>Service within GTFS Schedule Dataset</td>\n", " <td>[human]</td>\n", " <td>[gtfs-trained human]</td>\n", " <td>[human]</td>\n", " <td>3.0</td>\n", " <td>Continuous</td>\n", " <td>NaN</td>\n", " <td>3\\* Grading scheme score / Max grading scheme ...</td>\n", " <td>[recuql0JQWFcn7KFs, recxItR7wCdB51vjA, rec4ivy...</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Check # Description \\\n", "rec4Ah8sgs8F243XH 16 TripUpdates Published \n", "rec5GIXSm4aAcpuZj 20 Publish to OpenMobilityData \n", "rec5ahg0uEPYRoMEX 26 Publish to TransitLand \n", "rec6i4tZoKfMLAWiK 22 Reasonable API Key Process \n", "recBWkZKdzqEIDE46 13 GTFS Grading Scheme Score \n", "\n", " Scope Source \\\n", "rec4Ah8sgs8F243XH TripUpdates Dataset [human] \n", "rec5GIXSm4aAcpuZj TripUpdates Dataset [human] \n", "rec5ahg0uEPYRoMEX VehiclePositions Dataset [human] \n", "rec6i4tZoKfMLAWiK TripUpdates Dataset [human] \n", "recBWkZKdzqEIDE46 Service within GTFS Schedule Dataset [human] \n", "\n", " Source: medium-term Source: goal Max Score Score Type \\\n", "rec4Ah8sgs8F243XH [auto] [auto] 5.0 Nominal \n", "rec5GIXSm4aAcpuZj [human] [human] 1.0 Boolean \n", "rec5ahg0uEPYRoMEX [human] [human] 1.0 Boolean \n", "rec6i4tZoKfMLAWiK [human] [human] 2.0 Nominal \n", "recBWkZKdzqEIDE46 [gtfs-trained human] [human] 3.0 Continuous \n", "\n", " gtfs-service check data copy \\\n", "rec4Ah8sgs8F243XH [reczJXpgSxkKqzrFg, recAarP7F7LPwtlcn, reczYzu... \n", "rec5GIXSm4aAcpuZj [recjjCy0uuLGk6RSq, recXNqjW35Dq3omAX, reclalk... \n", "rec5ahg0uEPYRoMEX [recw95VXxi1w2rmOC, recWbsiIF3zlT2u5d, recuuLV... \n", "rec6i4tZoKfMLAWiK [rec5b8Xd4UbUciyH7, recNp2w2UUdwfEB6j, recskZe... \n", "recBWkZKdzqEIDE46 NaN \n", "\n", " Scoring Criteria \\\n", "rec4Ah8sgs8F243XH NaN \n", "rec5GIXSm4aAcpuZj There or not.\\n\\n \n", "rec5ahg0uEPYRoMEX There or not.\\n\\n \n", "rec6i4tZoKfMLAWiK NaN \n", "recBWkZKdzqEIDE46 3\\* Grading scheme score / Max grading scheme ... \n", "\n", " gtfs check data \\\n", "rec4Ah8sgs8F243XH NaN \n", "rec5GIXSm4aAcpuZj NaN \n", "rec5ahg0uEPYRoMEX NaN \n", "rec6i4tZoKfMLAWiK NaN \n", "recBWkZKdzqEIDE46 [recuql0JQWFcn7KFs, recxItR7wCdB51vjA, rec4ivy... \n", "\n", " gtfs-dataset check data copy \n", "rec4Ah8sgs8F243XH NaN \n", "rec5GIXSm4aAcpuZj NaN \n", "rec5ahg0uEPYRoMEX NaN \n", "rec6i4tZoKfMLAWiK NaN \n", "recBWkZKdzqEIDE46 NaN " ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gtfs_checks_df.head()" ] }, { "cell_type": "code", "execution_count": 239, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>service_id</th>\n", " <th>Provider</th>\n", " <th>Operator</th>\n", " <th>Currently Operating</th>\n", " <th>Mode</th>\n", " <th>Service Type</th>\n", " <th>Notes</th>\n", " <th>Funding Sources</th>\n", " <th>Funding Category</th>\n", " <th>...</th>\n", " <th>Service Area Type</th>\n", " <th>Service Area Buffer (miles)</th>\n", " <th>Service Availability Category</th>\n", " <th>Paratransit For</th>\n", " <th>organization stack components</th>\n", " <th>ITP Schedule TODO (from GTFS Dataset) (from GTFS Services Association)</th>\n", " <th>ITP Activities (from GTFS Dataset) (from GTFS Services Association)</th>\n", " <th>Schedule Comments (from GTFS Dataset) (from GTFS Services Association)</th>\n", " <th>Season Start</th>\n", " <th>Season End</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>rec0OCKo3fwRLpPIh</th>\n", " <td>Watsonville Circulator</td>\n", " <td>rec6lgJrPslFjSXdk</td>\n", " <td>Santa Cruz Metropolitan Transit District</td>\n", " <td>Santa Cruz Metropolitan Transit District</td>\n", " <td>True</td>\n", " <td>[bus]</td>\n", " <td>[fixed-route]</td>\n", " <td>ETRO’s new Watsonville Circulator Route is des...</td>\n", " <td>Caltrans</td>\n", " <td>[public]</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0ZVZt8PTzF2zuY</th>\n", " <td>Topanga Beach Bus</td>\n", " <td>rec00Di9RibzcrxHS</td>\n", " <td>Los Angeles County</td>\n", " <td>Los Angeles County</td>\n", " <td>True</td>\n", " <td>[bus]</td>\n", " <td>[fixed-route]</td>\n", " <td>Connects Metro Orange Line to Topanga Beach an...</td>\n", " <td>Caltrans</td>\n", " <td>[public]</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0ikYugqTjhLBkg</th>\n", " <td>Imperial Valley Transit</td>\n", " <td>recMR1zH6QMLPSriV</td>\n", " <td>Imperial County Transportation Commission</td>\n", " <td>Imperial County Transportation Commission</td>\n", " <td>True</td>\n", " <td>[bus]</td>\n", " <td>[fixed-route, deviated fixed-route]</td>\n", " <td>NaN</td>\n", " <td>Caltrans</td>\n", " <td>[public]</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0n7D4vtGfWZVEc</th>\n", " <td>Cityline</td>\n", " <td>recaTB1mtROhjUHNR</td>\n", " <td>City of West Hollywood</td>\n", " <td>City of West Hollywood</td>\n", " <td>True</td>\n", " <td>[bus]</td>\n", " <td>[fixed-route]</td>\n", " <td>West Hollywood’s own free shuttle\\n\\nCityline ...</td>\n", " <td>Caltrans</td>\n", " <td>[public]</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec14ZBuXpyw4OLY2</th>\n", " <td>Avocado Heights/Bassett/West Valinda Shuttle</td>\n", " <td>recHJivXoZGV1KhZe</td>\n", " <td>Los Angeles County</td>\n", " <td>Los Angeles County</td>\n", " <td>True</td>\n", " <td>[bus]</td>\n", " <td>[fixed-route]</td>\n", " <td>The Avocado Heights/Bassett/West Valinda Shutt...</td>\n", " <td>Caltrans</td>\n", " <td>[public]</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 41 columns</p>\n", "</div>" ], "text/plain": [ " Name \\\n", "rec0OCKo3fwRLpPIh Watsonville Circulator \n", "rec0ZVZt8PTzF2zuY Topanga Beach Bus \n", "rec0ikYugqTjhLBkg Imperial Valley Transit \n", "rec0n7D4vtGfWZVEc Cityline \n", "rec14ZBuXpyw4OLY2 Avocado Heights/Bassett/West Valinda Shuttle \n", "\n", " service_id \\\n", "rec0OCKo3fwRLpPIh rec6lgJrPslFjSXdk \n", "rec0ZVZt8PTzF2zuY rec00Di9RibzcrxHS \n", "rec0ikYugqTjhLBkg recMR1zH6QMLPSriV \n", "rec0n7D4vtGfWZVEc recaTB1mtROhjUHNR \n", "rec14ZBuXpyw4OLY2 recHJivXoZGV1KhZe \n", "\n", " Provider \\\n", "rec0OCKo3fwRLpPIh Santa Cruz Metropolitan Transit District \n", "rec0ZVZt8PTzF2zuY Los Angeles County \n", "rec0ikYugqTjhLBkg Imperial County Transportation Commission \n", "rec0n7D4vtGfWZVEc City of West Hollywood \n", "rec14ZBuXpyw4OLY2 Los Angeles County \n", "\n", " Operator \\\n", "rec0OCKo3fwRLpPIh Santa Cruz Metropolitan Transit District \n", "rec0ZVZt8PTzF2zuY Los Angeles County \n", "rec0ikYugqTjhLBkg Imperial County Transportation Commission \n", "rec0n7D4vtGfWZVEc City of West Hollywood \n", "rec14ZBuXpyw4OLY2 Los Angeles County \n", "\n", " Currently Operating Mode \\\n", "rec0OCKo3fwRLpPIh True [bus] \n", "rec0ZVZt8PTzF2zuY True [bus] \n", "rec0ikYugqTjhLBkg True [bus] \n", "rec0n7D4vtGfWZVEc True [bus] \n", "rec14ZBuXpyw4OLY2 True [bus] \n", "\n", " Service Type \\\n", "rec0OCKo3fwRLpPIh [fixed-route] \n", "rec0ZVZt8PTzF2zuY [fixed-route] \n", "rec0ikYugqTjhLBkg [fixed-route, deviated fixed-route] \n", "rec0n7D4vtGfWZVEc [fixed-route] \n", "rec14ZBuXpyw4OLY2 [fixed-route] \n", "\n", " Notes \\\n", "rec0OCKo3fwRLpPIh ETRO’s new Watsonville Circulator Route is des... \n", "rec0ZVZt8PTzF2zuY Connects Metro Orange Line to Topanga Beach an... \n", "rec0ikYugqTjhLBkg NaN \n", "rec0n7D4vtGfWZVEc West Hollywood’s own free shuttle\\n\\nCityline ... \n", "rec14ZBuXpyw4OLY2 The Avocado Heights/Bassett/West Valinda Shutt... \n", "\n", " Funding Sources Funding Category ... Service Area Type \\\n", "rec0OCKo3fwRLpPIh Caltrans [public] ... NaN \n", "rec0ZVZt8PTzF2zuY Caltrans [public] ... NaN \n", "rec0ikYugqTjhLBkg Caltrans [public] ... NaN \n", "rec0n7D4vtGfWZVEc Caltrans [public] ... NaN \n", "rec14ZBuXpyw4OLY2 Caltrans [public] ... NaN \n", "\n", " Service Area Buffer (miles) Service Availability Category \\\n", "rec0OCKo3fwRLpPIh NaN NaN \n", "rec0ZVZt8PTzF2zuY NaN NaN \n", "rec0ikYugqTjhLBkg NaN NaN \n", "rec0n7D4vtGfWZVEc NaN NaN \n", "rec14ZBuXpyw4OLY2 NaN NaN \n", "\n", " Paratransit For organization stack components \\\n", "rec0OCKo3fwRLpPIh NaN NaN \n", "rec0ZVZt8PTzF2zuY NaN NaN \n", "rec0ikYugqTjhLBkg NaN NaN \n", "rec0n7D4vtGfWZVEc NaN NaN \n", "rec14ZBuXpyw4OLY2 NaN NaN \n", "\n", " ITP Schedule TODO (from GTFS Dataset) (from GTFS Services Association) \\\n", "rec0OCKo3fwRLpPIh NaN \n", "rec0ZVZt8PTzF2zuY NaN \n", "rec0ikYugqTjhLBkg NaN \n", "rec0n7D4vtGfWZVEc NaN \n", "rec14ZBuXpyw4OLY2 NaN \n", "\n", " ITP Activities (from GTFS Dataset) (from GTFS Services Association) \\\n", "rec0OCKo3fwRLpPIh NaN \n", "rec0ZVZt8PTzF2zuY NaN \n", "rec0ikYugqTjhLBkg NaN \n", "rec0n7D4vtGfWZVEc NaN \n", "rec14ZBuXpyw4OLY2 NaN \n", "\n", " Schedule Comments (from GTFS Dataset) (from GTFS Services Association) \\\n", "rec0OCKo3fwRLpPIh NaN \n", "rec0ZVZt8PTzF2zuY NaN \n", "rec0ikYugqTjhLBkg NaN \n", "rec0n7D4vtGfWZVEc NaN \n", "rec14ZBuXpyw4OLY2 NaN \n", "\n", " Season Start Season End \n", "rec0OCKo3fwRLpPIh NaN NaN \n", "rec0ZVZt8PTzF2zuY NaN NaN \n", "rec0ikYugqTjhLBkg NaN NaN \n", "rec0n7D4vtGfWZVEc NaN NaN \n", "rec14ZBuXpyw4OLY2 NaN NaN \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 239, "metadata": {}, "output_type": "execute_result" } ], "source": [ "services_df.head()" ] }, { "cell_type": "code", "execution_count": 240, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Alias'</th>\n", " <th>organization_id</th>\n", " <th>ITP ID</th>\n", " <th>Tracking Cat - og</th>\n", " <th>Reporting Cat - og</th>\n", " <th>Assist Cat - og</th>\n", " <th>Caltrans District</th>\n", " <th>MPO/RTPA</th>\n", " <th>Planning Authority</th>\n", " <th>...</th>\n", " <th>Count of Services with Complete Realtime Status</th>\n", " <th>At least one GTFS feed for any service (1=yes)</th>\n", " <th>At least on complete RT set (1=yes)</th>\n", " <th>Complete static GTFS coverage (1=yes)</th>\n", " <th>Complete RT coverage (1=yes)</th>\n", " <th>provider check data</th>\n", " <th>Provider Assessments</th>\n", " <th>Details</th>\n", " <th>Website</th>\n", " <th>Contracts Held</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>rec0ZHctuUYh5wtLS</th>\n", " <td>San Luis Obispo Regional Transit Authority</td>\n", " <td>[SLORTA]</td>\n", " <td>reciakGBN1DP9dK9N</td>\n", " <td>289.0</td>\n", " <td>Active</td>\n", " <td>Core</td>\n", " <td>White Glove</td>\n", " <td>05 - San Luis Obispo</td>\n", " <td>San Luis Obispo Council of Governments</td>\n", " <td>San Luis Obispo Council of Governments</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>[recJpdz8eCeRsUQSj, recK3xMG6XsnQpYwD, recX8ke...</td>\n", " <td>[recIUkhYnikNkX12h]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0dtZWydv7z5afX</th>\n", " <td>City of Taft</td>\n", " <td>[TAT]</td>\n", " <td>recgTxL1xyvgC8e9k</td>\n", " <td>330.0</td>\n", " <td>Active</td>\n", " <td>Core</td>\n", " <td>White Glove</td>\n", " <td>06 - Fresno</td>\n", " <td>Kern Council of Governments</td>\n", " <td>Kern Council of Governments</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>[rec9P4eMWAQ2GaF5A, recYKDCjjLJbzcJTM, recLVPL...</td>\n", " <td>[recp38jhkZVctheDK]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0qKtbrBvE1AyRe</th>\n", " <td>City of Cerritos</td>\n", " <td>NaN</td>\n", " <td>recXYM27Lts8XF98x</td>\n", " <td>63.0</td>\n", " <td>Active</td>\n", " <td>Core</td>\n", " <td>White Glove</td>\n", " <td>07 - Los Angeles</td>\n", " <td>Southern California Association of Governments</td>\n", " <td>Southern California Association of Governments</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>[receMU82jZycJkGFe, recwhLFXaVN5DcqQ1, recXbUF...</td>\n", " <td>[recJMNAE1xFtv9B60]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec0yf9AiAqnwpaIT</th>\n", " <td>City of Delano</td>\n", " <td>NaN</td>\n", " <td>recROsnN85RlZziSj</td>\n", " <td>91.0</td>\n", " <td>Active</td>\n", " <td>Core</td>\n", " <td>White Glove</td>\n", " <td>06 - Fresno</td>\n", " <td>Kern Council of Governments</td>\n", " <td>Kern Council of Governments</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>[recxPm0wCZJj0jtR0, recyeyK9Brn5UV85M, recHjg7...</td>\n", " <td>[rec90UiwBWjFXvdt9]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>rec1gD38VBhjtMssT</th>\n", " <td>Imperial County Transportation Commission</td>\n", " <td>NaN</td>\n", " <td>rec38PbjPbEy2Tvdu</td>\n", " <td>138.0</td>\n", " <td>Active</td>\n", " <td>Core</td>\n", " <td>White Glove</td>\n", " <td>11 - San Diego</td>\n", " <td>Southern California Association of Governments</td>\n", " <td>Southern California Association of Governments</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>[recT2TIZ0r3FHdCwQ, recgcfRSpfLffxTDf, recSF6B...</td>\n", " <td>[reczToe4t1ynsYs5T]</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 33 columns</p>\n", "</div>" ], "text/plain": [ " Name Alias' \\\n", "rec0ZHctuUYh5wtLS San Luis Obispo Regional Transit Authority [SLORTA] \n", "rec0dtZWydv7z5afX City of Taft [TAT] \n", "rec0qKtbrBvE1AyRe City of Cerritos NaN \n", "rec0yf9AiAqnwpaIT City of Delano NaN \n", "rec1gD38VBhjtMssT Imperial County Transportation Commission NaN \n", "\n", " organization_id ITP ID Tracking Cat - og \\\n", "rec0ZHctuUYh5wtLS reciakGBN1DP9dK9N 289.0 Active \n", "rec0dtZWydv7z5afX recgTxL1xyvgC8e9k 330.0 Active \n", "rec0qKtbrBvE1AyRe recXYM27Lts8XF98x 63.0 Active \n", "rec0yf9AiAqnwpaIT recROsnN85RlZziSj 91.0 Active \n", "rec1gD38VBhjtMssT rec38PbjPbEy2Tvdu 138.0 Active \n", "\n", " Reporting Cat - og Assist Cat - og Caltrans District \\\n", "rec0ZHctuUYh5wtLS Core White Glove 05 - San Luis Obispo \n", "rec0dtZWydv7z5afX Core White Glove 06 - Fresno \n", "rec0qKtbrBvE1AyRe Core White Glove 07 - Los Angeles \n", "rec0yf9AiAqnwpaIT Core White Glove 06 - Fresno \n", "rec1gD38VBhjtMssT Core White Glove 11 - San Diego \n", "\n", " MPO/RTPA \\\n", "rec0ZHctuUYh5wtLS San Luis Obispo Council of Governments \n", "rec0dtZWydv7z5afX Kern Council of Governments \n", "rec0qKtbrBvE1AyRe Southern California Association of Governments \n", "rec0yf9AiAqnwpaIT Kern Council of Governments \n", "rec1gD38VBhjtMssT Southern California Association of Governments \n", "\n", " Planning Authority ... \\\n", "rec0ZHctuUYh5wtLS San Luis Obispo Council of Governments ... \n", "rec0dtZWydv7z5afX Kern Council of Governments ... \n", "rec0qKtbrBvE1AyRe Southern California Association of Governments ... \n", "rec0yf9AiAqnwpaIT Kern Council of Governments ... \n", "rec1gD38VBhjtMssT Southern California Association of Governments ... \n", "\n", " Count of Services with Complete Realtime Status \\\n", "rec0ZHctuUYh5wtLS 0 \n", "rec0dtZWydv7z5afX 0 \n", "rec0qKtbrBvE1AyRe 0 \n", "rec0yf9AiAqnwpaIT 0 \n", "rec1gD38VBhjtMssT 0 \n", "\n", " At least one GTFS feed for any service (1=yes) \\\n", "rec0ZHctuUYh5wtLS 1 \n", "rec0dtZWydv7z5afX 1 \n", "rec0qKtbrBvE1AyRe 0 \n", "rec0yf9AiAqnwpaIT 1 \n", "rec1gD38VBhjtMssT 0 \n", "\n", " At least on complete RT set (1=yes) \\\n", "rec0ZHctuUYh5wtLS 0 \n", "rec0dtZWydv7z5afX 0 \n", "rec0qKtbrBvE1AyRe 0 \n", "rec0yf9AiAqnwpaIT 0 \n", "rec1gD38VBhjtMssT 0 \n", "\n", " Complete static GTFS coverage (1=yes) \\\n", "rec0ZHctuUYh5wtLS 0 \n", "rec0dtZWydv7z5afX 1 \n", "rec0qKtbrBvE1AyRe 0 \n", "rec0yf9AiAqnwpaIT 1 \n", "rec1gD38VBhjtMssT 0 \n", "\n", " Complete RT coverage (1=yes) \\\n", "rec0ZHctuUYh5wtLS 0 \n", "rec0dtZWydv7z5afX 0 \n", "rec0qKtbrBvE1AyRe 0 \n", "rec0yf9AiAqnwpaIT 0 \n", "rec1gD38VBhjtMssT 0 \n", "\n", " provider check data \\\n", "rec0ZHctuUYh5wtLS [recJpdz8eCeRsUQSj, recK3xMG6XsnQpYwD, recX8ke... \n", "rec0dtZWydv7z5afX [rec9P4eMWAQ2GaF5A, recYKDCjjLJbzcJTM, recLVPL... \n", "rec0qKtbrBvE1AyRe [receMU82jZycJkGFe, recwhLFXaVN5DcqQ1, recXbUF... \n", "rec0yf9AiAqnwpaIT [recxPm0wCZJj0jtR0, recyeyK9Brn5UV85M, recHjg7... \n", "rec1gD38VBhjtMssT [recT2TIZ0r3FHdCwQ, recgcfRSpfLffxTDf, recSF6B... \n", "\n", " Provider Assessments Details Website Contracts Held \n", "rec0ZHctuUYh5wtLS [recIUkhYnikNkX12h] NaN NaN NaN \n", "rec0dtZWydv7z5afX [recp38jhkZVctheDK] NaN NaN NaN \n", "rec0qKtbrBvE1AyRe [recJMNAE1xFtv9B60] NaN NaN NaN \n", "rec0yf9AiAqnwpaIT [rec90UiwBWjFXvdt9] NaN NaN NaN \n", "rec1gD38VBhjtMssT [reczToe4t1ynsYs5T] NaN NaN NaN \n", "\n", "[5 rows x 33 columns]" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gtfs_providers_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2 - Create Junction Tables\n", "\n", "Airtable doesn't have a good way of auto-generating a junction table (a table associating records from two other tables) based on two other tables and a set of conditions. \n", "\n", "The following codes createsa \"cartesian product\" (every record to every record) junction table based on exports of two airtable tables and then selecting which association records are relevant (i.e. only checks that apply to GTFS Schedule should be associated with GTFS Schedule).\n", "\n", "The resulting delimeted file can be pasted into a spreadsheet which can then be based into Airtable. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Services" ] }, { "cell_type": "code", "execution_count": 241, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Check #</th>\n", " <th>Services</th>\n", " <th>Dataset Type</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1382</th>\n", " <td>32</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Alerts</td>\n", " </tr>\n", " <tr>\n", " <th>246</th>\n", " <td>13</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>530</th>\n", " <td>10</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>814</th>\n", " <td>8</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>1098</th>\n", " <td>7</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>2454</th>\n", " <td>9</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>2738</th>\n", " <td>14</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>3022</th>\n", " <td>11</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " <td>GTFS Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>2170</th>\n", " <td>18</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " <td>GTFS TripUpdates</td>\n", " </tr>\n", " <tr>\n", " <th>1886</th>\n", " <td>25</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " <td>GTFS VehiclePositions</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>3124 rows × 3 columns</p>\n", "</div>" ], "text/plain": [ " Check # Services Dataset Type\n", "1382 32 AC Transit GTFS Alerts\n", "246 13 AC Transit GTFS Schedule\n", "530 10 AC Transit GTFS Schedule\n", "814 8 AC Transit GTFS Schedule\n", "1098 7 AC Transit GTFS Schedule\n", "... ... ... ...\n", "2454 9 the Link-Baldwin Hills Parklands GTFS Schedule\n", "2738 14 the Link-Baldwin Hills Parklands GTFS Schedule\n", "3022 11 the Link-Baldwin Hills Parklands GTFS Schedule\n", "2170 18 the Link-Baldwin Hills Parklands GTFS TripUpdates\n", "1886 25 the Link-Baldwin Hills Parklands GTFS VehiclePositions\n", "\n", "[3124 rows x 3 columns]" ] }, "execution_count": 241, "metadata": {}, "output_type": "execute_result" } ], "source": [ "services_df = services_df.rename(columns={\"Name\":\"Services\"})\n", "\n", "service_checks = [\n", " \"Service within GTFS Schedule Dataset\",\n", " \"Service within TripUpdates Dataset\",\n", " \"Service within VehiclePositions Dataset\",\n", " \"Service within Alerts Dataset\"\n", "]\n", "\n", "service_checks_df = gtfs_checks_df.loc[gtfs_checks_df[\"Scope\"].isin(service_checks),[\"Check #\",\"Scope\"]]\n", "\n", "# cartesian product of service checks and services\n", "service_checks_cross_df = service_checks_df.merge(services_df[\"Services\"], how='cross')\n", "service_checks_cross_df = service_checks_cross_df.sort_values([\"Services\",\"Scope\"])\n", "\n", "scope_data_mapping = {\n", " \"Service within GTFS Schedule Dataset\":\"GTFS Schedule\",\n", " \"Service within TripUpdates Dataset\":\"GTFS TripUpdates\",\n", " \"Service within VehiclePositions Dataset\":\"GTFS VehiclePositions\",\n", " \"Service within Alerts Dataset\":\"GTFS Alerts\",\n", "}\n", "\n", "service_checks_cross_df[\"Dataset Type\"] = service_checks_cross_df[\"Scope\"].map(scope_data_mapping)\n", "\n", "service_checks_cross_df = service_checks_cross_df.drop(columns=[\"Scope\"])\n", "\n", "service_checks_cross_df[\"Dataset Type\"].value_counts()\n", "service_checks_cross_df\n" ] }, { "cell_type": "code", "execution_count": 242, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>gtfs-service record</th>\n", " <th>Dataset Type</th>\n", " <th>GTFS Dataset</th>\n", " <th>Services</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>rec05rHUgCVFqsXgN</th>\n", " <td>Tulare Intermodal Express – Tulare Alerts</td>\n", " <td>GTFS Alerts</td>\n", " <td>Tulare Alerts</td>\n", " <td>Tulare Intermodal Express</td>\n", " </tr>\n", " <tr>\n", " <th>rec09fVK7ls8fl5sp</th>\n", " <td>Monterey-Salinas Transit – Monterey Salinas Tr...</td>\n", " <td>GTFS TripUpdates</td>\n", " <td>Monterey Salinas TripUpdates</td>\n", " <td>Monterey-Salinas Transit</td>\n", " </tr>\n", " <tr>\n", " <th>rec0WcjVurdKkHx7S</th>\n", " <td>Healdsburg Shuttle – Bay Area 511 TripUpdates</td>\n", " <td>GTFS TripUpdates</td>\n", " <td>Bay Area 511 TripUpdates</td>\n", " <td>Healdsburg Shuttle</td>\n", " </tr>\n", " <tr>\n", " <th>rec0XvgWtnmoUl8Lz</th>\n", " <td>Glenn Ride – Glenn Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>Glenn Schedule</td>\n", " <td>Glenn Ride</td>\n", " </tr>\n", " <tr>\n", " <th>rec0YWNONFHEO0pUl</th>\n", " <td>Valley Express – VCTC Alerts</td>\n", " <td>GTFS Alerts</td>\n", " <td>VCTC Alerts</td>\n", " <td>Valley Express</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>reczSG6bO6il6QMLZ</th>\n", " <td>Healdsburg Shuttle – Bay Area 511 Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>Bay Area 511 Schedule</td>\n", " <td>Healdsburg Shuttle</td>\n", " </tr>\n", " <tr>\n", " <th>reczZKKwKsVQuU0dF</th>\n", " <td>Placer County Transit – Placer Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>Placer Schedule</td>\n", " <td>Placer County Transit</td>\n", " </tr>\n", " <tr>\n", " <th>reczf9FxlvLopZybb</th>\n", " <td>Taft Area Transit – Taft Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>Taft Schedule</td>\n", " <td>Taft Area Transit</td>\n", " </tr>\n", " <tr>\n", " <th>recziP7dvY7apKbPB</th>\n", " <td>Kern Transit – Kern Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>Kern Schedule</td>\n", " <td>Kern Transit</td>\n", " </tr>\n", " <tr>\n", " <th>reczmEIm3KnSr2jF3</th>\n", " <td>Monterey-Salinas Transit – Monterey Salinas Ve...</td>\n", " <td>GTFS VehiclePositions</td>\n", " <td>Monterey Salinas VehiclePositions</td>\n", " <td>Monterey-Salinas Transit</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>543 rows × 4 columns</p>\n", "</div>" ], "text/plain": [ " gtfs-service record \\\n", "rec05rHUgCVFqsXgN Tulare Intermodal Express – Tulare Alerts \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit – Monterey Salinas Tr... \n", "rec0WcjVurdKkHx7S Healdsburg Shuttle – Bay Area 511 TripUpdates \n", "rec0XvgWtnmoUl8Lz Glenn Ride – Glenn Schedule \n", "rec0YWNONFHEO0pUl Valley Express – VCTC Alerts \n", "... ... \n", "reczSG6bO6il6QMLZ Healdsburg Shuttle – Bay Area 511 Schedule \n", "reczZKKwKsVQuU0dF Placer County Transit – Placer Schedule \n", "reczf9FxlvLopZybb Taft Area Transit – Taft Schedule \n", "recziP7dvY7apKbPB Kern Transit – Kern Schedule \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit – Monterey Salinas Ve... \n", "\n", " Dataset Type GTFS Dataset \\\n", "rec05rHUgCVFqsXgN GTFS Alerts Tulare Alerts \n", "rec09fVK7ls8fl5sp GTFS TripUpdates Monterey Salinas TripUpdates \n", "rec0WcjVurdKkHx7S GTFS TripUpdates Bay Area 511 TripUpdates \n", "rec0XvgWtnmoUl8Lz GTFS Schedule Glenn Schedule \n", "rec0YWNONFHEO0pUl GTFS Alerts VCTC Alerts \n", "... ... ... \n", "reczSG6bO6il6QMLZ GTFS Schedule Bay Area 511 Schedule \n", "reczZKKwKsVQuU0dF GTFS Schedule Placer Schedule \n", "reczf9FxlvLopZybb GTFS Schedule Taft Schedule \n", "recziP7dvY7apKbPB GTFS Schedule Kern Schedule \n", "reczmEIm3KnSr2jF3 GTFS VehiclePositions Monterey Salinas VehiclePositions \n", "\n", " Services \n", "rec05rHUgCVFqsXgN Tulare Intermodal Express \n", "rec09fVK7ls8fl5sp Monterey-Salinas Transit \n", "rec0WcjVurdKkHx7S Healdsburg Shuttle \n", "rec0XvgWtnmoUl8Lz Glenn Ride \n", "rec0YWNONFHEO0pUl Valley Express \n", "... ... \n", "reczSG6bO6il6QMLZ Healdsburg Shuttle \n", "reczZKKwKsVQuU0dF Placer County Transit \n", "reczf9FxlvLopZybb Taft Area Transit \n", "recziP7dvY7apKbPB Kern Transit \n", "reczmEIm3KnSr2jF3 Monterey-Salinas Transit \n", "\n", "[543 rows x 4 columns]" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "services_gtfs_df = gtfs_services_df[[\"Name\",\"Dataset Type\",\"GTFS Dataset\",\"Services\"]].rename(columns={\"Name\":\"gtfs-service record\"})\n", "\n", "# make services a list if it isn't already\n", "services_gtfs_df[\"Services\"]=services_gtfs_df[\"Services\"].apply(lambda x: x.split(\",\"))\n", "\n", "# if dataset type is a list, then take first value\n", "services_gtfs_df[\"Dataset Type\"]=services_gtfs_df[\"Dataset Type\"].apply(lambda x: x[0])\n", "\n", "all_services_gtfs_df = services_gtfs_df.explode(\"Services\")\n", "all_services_gtfs_df[\"Services\"]=all_services_gtfs_df[\"Services\"].apply(lambda x: x.strip())\n", "\n", "all_services_gtfs_df" ] }, { "cell_type": "code", "execution_count": 252, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Check #</th>\n", " <th>gtfs-service record</th>\n", " <th>Services</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>32</td>\n", " <td>NaN</td>\n", " <td>AC Transit</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>13</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>AC Transit</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>10</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>AC Transit</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>8</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>AC Transit</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>7</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>AC Transit</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>3135</th>\n", " <td>9</td>\n", " <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " </tr>\n", " <tr>\n", " <th>3136</th>\n", " <td>14</td>\n", " <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " </tr>\n", " <tr>\n", " <th>3137</th>\n", " <td>11</td>\n", " <td>the Link-Athens, the Link Florence-Firestone/W...</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " </tr>\n", " <tr>\n", " <th>3138</th>\n", " <td>18</td>\n", " <td>NaN</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " </tr>\n", " <tr>\n", " <th>3139</th>\n", " <td>25</td>\n", " <td>NaN</td>\n", " <td>the Link-Baldwin Hills Parklands</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>3140 rows × 3 columns</p>\n", "</div>" ], "text/plain": [ " Check # gtfs-service record \\\n", "0 32 NaN \n", "1 13 AC Transit – Bay Area 511 Schedule \n", "2 10 AC Transit – Bay Area 511 Schedule \n", "3 8 AC Transit – Bay Area 511 Schedule \n", "4 7 AC Transit – Bay Area 511 Schedule \n", "... ... ... \n", "3135 9 the Link-Athens, the Link Florence-Firestone/W... \n", "3136 14 the Link-Athens, the Link Florence-Firestone/W... \n", "3137 11 the Link-Athens, the Link Florence-Firestone/W... \n", "3138 18 NaN \n", "3139 25 NaN \n", "\n", " Services \n", "0 AC Transit \n", "1 AC Transit \n", "2 AC Transit \n", "3 AC Transit \n", "4 AC Transit \n", "... ... \n", "3135 the Link-Baldwin Hills Parklands \n", "3136 the Link-Baldwin Hills Parklands \n", "3137 the Link-Baldwin Hills Parklands \n", "3138 the Link-Baldwin Hills Parklands \n", "3139 the Link-Baldwin Hills Parklands \n", "\n", "[3140 rows x 3 columns]" ] }, "execution_count": 252, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# attach information about gtfs datasets for each service to evaluate\n", "\n", "service_checkdata_df = service_checks_cross_df.merge(\n", " all_services_gtfs_df,\n", " on=[\"Services\",\"Dataset Type\"],how=\"left\")\n", "\n", "\n", "keep_cols = [\"Check #\",\"gtfs-service record\",\"Services\"]\n", "keep_service_checkdata_df = service_checkdata_df[keep_cols] #.drop_duplicates()\n", "keep_service_checkdata_df.to_csv(\"service_cross.csv\",index=False,sep=\"|\")\n", "#service_checkdata_df.loc[service_checkdata_df[\"gtfs-service record\"].isna()]\n", "keep_service_checkdata_df\n", "#service_checkdata_df" ] }, { "cell_type": "code", "execution_count": 245, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Check #</th>\n", " <th>Services</th>\n", " <th>Dataset Type</th>\n", " <th>gtfs-service record</th>\n", " <th>GTFS Dataset</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>32</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Alerts</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>13</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>Bay Area 511 Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>10</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>Bay Area 511 Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>8</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>Bay Area 511 Schedule</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>7</td>\n", " <td>AC Transit</td>\n", " <td>GTFS Schedule</td>\n", " <td>AC Transit – Bay Area 511 Schedule</td>\n", " <td>Bay Area 511 Schedule</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Check # Services Dataset Type gtfs-service record \\\n", "0 32 AC Transit GTFS Alerts NaN \n", "1 13 AC Transit GTFS Schedule AC Transit – Bay Area 511 Schedule \n", "2 10 AC Transit GTFS Schedule AC Transit – Bay Area 511 Schedule \n", "3 8 AC Transit GTFS Schedule AC Transit – Bay Area 511 Schedule \n", "4 7 AC Transit GTFS Schedule AC Transit – Bay Area 511 Schedule \n", "\n", " GTFS Dataset \n", "0 NaN \n", "1 Bay Area 511 Schedule \n", "2 Bay Area 511 Schedule \n", "3 Bay Area 511 Schedule \n", "4 Bay Area 511 Schedule " ] }, "execution_count": 245, "metadata": {}, "output_type": "execute_result" } ], "source": [ "service_checkdata_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Datasets" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "dataset_checks_df = pd.read_csv(os.path.join(\"data\",\"gtfs checks-Dataset Scope.csv\"), usecols = [\"Check #\",\"Scope\"])\n", "datasets_df = pd.read_csv(os.path.join(\"data\",\"Assessed Feeds.csv\"), usecols = [\"gtfs_dataset_id\",\"Data\"])\n", "\n", "# cartesian product\n", "dataset_cross_df = datasets_df.merge(dataset_checks_df, how='cross')\n", "\n", "# select applicable checks\n", "dataset_cross_df[\"data_match\"]=dataset_cross_df[\"Data\"]+\" Dataset\"\n", "dataset_cross_df[\"scope_match\"] = dataset_cross_df[\"Scope\"]\n", "dataset_cross_df.loc[\n", " dataset_cross_df[\"Scope\"] != \"GTFS Schedule Dataset\", \n", " \"scope_match\"\n", "]=\"GTFS \" + dataset_cross_df[\"Scope\"]\n", "dataset_checkdata_df = dataset_cross_df[dataset_cross_df[\"data_match\"]==dataset_cross_df[\"scope_match\"]]\n", "\n", "#checks \"GTFS Schedule Dataset\"\n", "#dataset \"GTFS Schedule\"\n", "\n", "dataset_checkdata_df[[\"gtfs_dataset_id\",\"Check #\"]].to_csv(\"data_cross.csv\",index=False,sep=\"|\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>gtfs_dataset_id</th>\n", " <th>Data</th>\n", " <th>Check #</th>\n", " <th>Scope</th>\n", " <th>data_match</th>\n", " <th>scope_match</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Get Around Town Express Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>1</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Get Around Town Express Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>2</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Get Around Town Express Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>3</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Get Around Town Express Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>4</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Get Around Town Express Schedule</td>\n", " <td>GTFS Schedule</td>\n", " <td>5</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " <td>GTFS Schedule Dataset</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " gtfs_dataset_id Data Check # \\\n", "0 Get Around Town Express Schedule GTFS Schedule 1 \n", "1 Get Around Town Express Schedule GTFS Schedule 2 \n", "2 Get Around Town Express Schedule GTFS Schedule 3 \n", "3 Get Around Town Express Schedule GTFS Schedule 4 \n", "4 Get Around Town Express Schedule GTFS Schedule 5 \n", "\n", " Scope data_match scope_match \n", "0 GTFS Schedule Dataset GTFS Schedule Dataset GTFS Schedule Dataset \n", "1 GTFS Schedule Dataset GTFS Schedule Dataset GTFS Schedule Dataset \n", "2 GTFS Schedule Dataset GTFS Schedule Dataset GTFS Schedule Dataset \n", "3 GTFS Schedule Dataset GTFS Schedule Dataset GTFS Schedule Dataset \n", "4 GTFS Schedule Dataset GTFS Schedule Dataset GTFS Schedule Dataset " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset_checkdata_df.head()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "provider_checks_df = pd.read_csv(os.path.join(\"data\",\"provider_checks.csv\"))\n", "providers_df = pd.read_csv(os.path.join(\"data\",\"Assessed Transit Providers-Grid view.csv\"), usecols = [\"Name\"])\n", "\n", "# cartesian product\n", "providers_cross_df = providers_df.merge(provider_checks_df, how='cross')\n", "\n", "providers_cross_df.to_csv(\"provider_cross.csv\",index=False,sep=\"|\")\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>checks</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Alameda-Contra Costa Transit District</td>\n", " <td>37</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Alameda-Contra Costa Transit District</td>\n", " <td>39</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Alameda-Contra Costa Transit District</td>\n", " <td>40</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Amador Regional Transit System</td>\n", " <td>37</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Amador Regional Transit System</td>\n", " <td>39</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name checks\n", "0 Alameda-Contra Costa Transit District 37\n", "1 Alameda-Contra Costa Transit District 39\n", "2 Alameda-Contra Costa Transit District 40\n", "3 Amador Regional Transit System 37\n", "4 Amador Regional Transit System 39" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "providers_cross_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" }, "kernelspec": { "display_name": "Python 3.9.4 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }