{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### If you run into any issues or have questions/concerns about the data catalog API, usage patterns, or anything else, please do not hesistate to email them to danf@usc.edu.\n", "\n", "#### Thanks,\n", "#### Dan Feldman" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prerequisites: python 3.6 or later\n", "import requests\n", "import json\n", "import uuid\n", "import pprint\n", "import datetime\n", "pp = pprint.PrettyPrinter(indent=2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is a convenience method to handle api responses. The main portion of the notebook starts in the the next cell\n", "def handle_api_response(response, print_response=False):\n", " parsed_response = response.json()\n", "\n", " if print_response:\n", " pp.pprint({\"API Response\": parsed_response})\n", " \n", " if response.status_code == 200:\n", " return parsed_response\n", " elif response.status_code == 400:\n", " raise Exception(\"Bad request ^\")\n", " elif response.status_code == 403:\n", " msg = \"Please make sure your request headers include X-Api-Key and that you are using correct url\"\n", " raise Exception(msg)\n", " else:\n", " now = datetime.datetime.utcnow().replace(microsecond=0).isoformat()\n", " msg = f\"\"\"\\n\\n\n", " ------------------------------------- BEGIN ERROR MESSAGE -----------------------------------------\n", " It seems our server encountered an error which it doesn't know how to handle yet. \n", " This sometimes happens with unexpected input(s). In order to help us diagnose and resolve the issue, \n", " could you please fill out the following information and email the entire message between ----- to\n", " danf@usc.edu:\n", " 1) URL of notebook (of using the one from https://hub.mybinder.org/...): [*****PLEASE INSERT ONE HERE*****]\n", " 2) Snapshot/picture of the cell that resulted in this error: [*****PLEASE INSERT ONE HERE*****]\n", " \n", " Thank you and we apologize for any inconvenience. We'll get back to you as soon as possible!\n", " \n", " Sincerely, \n", " Dan Feldman\n", " \n", " Automatically generated summary:\n", " - Time of occurrence: {now}\n", " - Request method + url: {response.request.method} - {response.request.url}\n", " - Request headers: {response.request.headers}\n", " - Request body: {response.request.body}\n", " - Response: {parsed_response}\n", "\n", " --------------------------------------- END ERROR MESSAGE ------------------------------------------\n", " \\n\\n\n", " \"\"\"\n", "\n", " raise Exception(msg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Main Portion" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For real interactions with the data catalog, use api.mint-data-catalog.org\n", "url = \"https://sandbox.mint-data-catalog.org\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# When you register datasets or resources, we require you to pass a \"provenance_id\". This a unique id associated\n", "# with your account so that we can keep track of who is adding things to the data catalog. 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# When you register datasets or resources, we require you to pass a \"provenance_id\". This is a unique id associated\n", "# with your account so that we can keep track of who is adding things to the data catalog. For sandboxed interactions\n", "# with the data catalog api, please use this provenance_id:\n", "provenance_id = \"e8287ea4-e6f2-47aa-8bfc-0c22852735c8\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Step 1: Get session token to use the API\n", "resp = requests.get(f\"{url}/get_session_token\").json()\n", "print(resp)\n", "api_key = resp['X-Api-Key']\n", "\n", "request_headers = {\n", "    'Content-Type': \"application/json\",\n", "    'X-Api-Key': api_key\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Our setup\n", "\n", "Recall from the [data catalog primer](https://docs.google.com/document/d/1I3CjYB-GDdFTZO-dHsHB0B5f0iEbzHnr8QHSKN5k3Sc/edit#heading=h.crwrtnf2ch1h) \n", "that a *dataset* is a logical grouping of data about specific *variables* contained in one or more *resources*.\n", "\n", "To make the above statement more concrete, we will interactively go through the process of registering a toy\n", "dataset in the data catalog in order to make it available for others.\n", "\n", "Let's say I have a dataset called _*\"Temperature recorded outside my house\"*_ in which every day I note the\n", "temperature outside my apartment in the morning, afternoon, and evening. I then record those data points in a \n", "csv file *temp_records_YYYY_MM_DD.csv* that looks like (prettified):\n", "\n", "| Time | Temperature |\n", "|---------------------|-------------|\n", "| 2018-01-01T07:34:40 | 23 |\n", "| 2018-01-01T12:15:28 | 32 |\n", "| 2018-01-01T20:56:15 | 26 |\n", "\n", "Note that each file contains data for a single day only. \n", "\n", "In this example, my dataset would be *\"Temperature recorded outside my house\"*, variables would be \n", "*\"Time\"* and *\"Temperature\"*, and each csv file would be a resource associated with the dataset. In addition,\n", "since each file contains both of the variables in our dataset, each resource will be associated with both variables.\n" ] },
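{ "cell_type": "markdown", "metadata": {}, "source": [ "If you would like a local copy of such a file to play with, here is a minimal sketch that writes one day's worth of readings in exactly this layout (nothing about it is required by the API):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "# Write the toy file from the table above: one day of readings,\n", "# one row per observation, with \"Time\" and \"Temperature\" columns\n", "rows = [\n", "    (\"2018-01-01T07:34:40\", 23),\n", "    (\"2018-01-01T12:15:28\", 32),\n", "    (\"2018-01-01T20:56:15\", 26)\n", "]\n", "\n", "with open(\"temp_records_2018_01_01.csv\", \"w\", newline=\"\") as f:\n", "    writer = csv.writer(f)\n", "    writer.writerow([\"Time\", \"Temperature\"])\n", "    writer.writerows(rows)" ] },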
{ "cell_type": "markdown", "metadata": {}, "source": [ "### A case for why we need standard variables\n", "Now, _*I*_ know that what I refer to as \"Temperature\" is actually the air temperature recorded in F, but my CSV\n", "files make no mention of that fact. If you just look at the file without any context, it's unclear what it is that \n", "is being recorded. Temperature of what? In what units? C, F, K?\n", "\n", "In order to disambiguate those variable names, we require each variable in your dataset to be associated\n", "with one or more *standard* variables. What makes a variable name \"standard\" is that it is a part of some ontology,\n", "so that anyone can examine that ontology and see for themselves the semantic meaning of the variable.\n", "Most of our current datasets are mapped to standard names defined by the GSN ontology. \n", "\n", "But you are not forced to map your variables to GSN names. The data catalog allows you to register your own set \n", "of standard variable names. The only requirement for now is that those standard names are associated with an\n", "ontology whose schema is publicly available.\n", "\n", "#### FAQ: What if I don't know anything about ontologies? How do I know that my ontology/standard variable name is correct?\n", "\n", "There is no such thing as a \"correct\" standard variable or ontology - you can think\n", "of it as a formalized naming convention. As such, there is no such thing as a right\n", "or wrong standard name, only a more or less useful one. If you are the only person who is using a\n", "convention (like naming all variables with single letters), it might not be useful to \n", "anyone but you. Sure, it's possible to teach somebody else your convention, but the\n", "less semantic structure your convention uses, the harder it will be for another person\n", "to learn it. The easier it is for someone to pick it up, the faster other people will\n", "adopt it. \n", "\n", "So, what makes a good standard variable name? A good starting point is to take a look \n", "at the variable from the perspective of someone who is seeing just that piece of information\n", "for the first time. How many clarifying questions do you expect the other person to ask you \n", "before they understand the meaning of the data in front of them? Those answers can then become\n", "parts of the standard variable name.\n", "\n", "Now that we know why we need standard variables, this is how we can register new ones, if needed.\n" ] },
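{ "cell_type": "markdown", "metadata": {}, "source": [ "For example (hypothetical names, not drawn from GSN or any other published ontology), more descriptive standard names for our two toy variables might pre-answer those clarifying questions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Purely illustrative: each name answers \"temperature of what, where, in which\n", "# units?\" and \"time of what, in which format?\" before anyone has to ask\n", "illustrative_standard_names = {\n", "    \"Temperature\": \"air__temperature_outside_house__degrees_fahrenheit\",\n", "    \"Time\": \"observation__timestamp__iso8601\"\n", "}\n", "pp.pprint(illustrative_standard_names)" ] },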
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Step 0: Registering Standard Variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# @param[name] standard variable name (aka label)\n", "# @param[ontology] name of the ontology where standard variables are defined\n", "# @param[uri] uri of standard variable name (note that this is full uri, which includes the ontology)\n", "standard_variable_defs = {\n", "    \"standard_variables\": [\n", "        {\n", "            \"name\": \"Time_Standard_Variable\",\n", "            \"ontology\": \"MyOntology\",\n", "            \"uri\": \"http://my_ontology_uri.org/standard_names/time_standard_variable\"\n", "        },\n", "        {\n", "            \"name\": \"Temperature_Standard_Variable\",\n", "            \"ontology\": \"MyOntology\",\n", "            \"uri\": \"http://my_ontology_uri.org/standard_names/temperature_standard_variable\"\n", "        }\n", "    ]\n", "}\n", "\n", "resp = requests.post(f\"{url}/knowledge_graph/register_standard_variables\", \n", "                     headers=request_headers, \n", "                     json=standard_variable_defs)\n", "\n", "\n", "# If the request is successful, it will return 'result': 'success' along with a list of registered standard variables\n", "# and their record_ids. Those record_ids are unique identifiers (UUID) and you will need them down the road to \n", "# register variables\n", "parsed_response = handle_api_response(resp, print_response=True)\n", "records = parsed_response['standard_variables']\n", "\n", "# iterate through the list of returned standard variable objects, pick out\n", "# the ones whose names match ours, and store them in python variables\n", "time_standard_variable = next(record for record in records if record[\"name\"] == \"Time_Standard_Variable\")\n", "temperature_standard_variable = next(record for record in records if record[\"name\"] == \"Temperature_Standard_Variable\")\n", "\n", "## Uncomment below to see the structure of a specific variable:\n", "# pp.pprint({\"Time Standard Variable\": time_standard_variable})\n", "# pp.pprint({\"Temperature Standard Variable\": temperature_standard_variable})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you need to check whether specific standard variables have already been registered in the data catalog, \n", "# you can search by name and the data catalog will return existing records.\n", "nonexistent_name = str(uuid.uuid4())\n", "print(f\"This name does not exist: {nonexistent_name}\")\n", "\n", "search_query = {\n", "    \"name__in\": [\"Time_Standard_Variable\", \"Temperature_Standard_Variable\", nonexistent_name]\n", "}\n", "\n", "resp = requests.post(f\"{url}/knowledge_graph/find_standard_variables\", \n", "                     headers=request_headers,\n", "                     json=search_query)\n", "parsed_response = handle_api_response(resp, print_response=True)\n", "\n", "# Below is how you'd extract standard_variables from the response if you need to reference them (their record_ids)\n", "# later:\n", "# \n", "# existing_standard_variables = parsed_response[\"standard_variables\"]\n", "# print(existing_standard_variables)" ] },
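{ "cell_type": "markdown", "metadata": {}, "source": [ "Putting the two endpoints together, here is a minimal check-then-register sketch (assuming the response shapes shown above) that only registers the standard variables the search does not find:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: register only the standard variables that aren't already in the catalog.\n", "# Relies on the find/register response shapes demonstrated in the two cells above\n", "wanted = standard_variable_defs[\"standard_variables\"]\n", "\n", "resp = requests.post(f\"{url}/knowledge_graph/find_standard_variables\",\n", "                     headers=request_headers,\n", "                     json={\"name__in\": [sv[\"name\"] for sv in wanted]})\n", "existing_names = {sv[\"name\"] for sv in handle_api_response(resp)[\"standard_variables\"]}\n", "\n", "missing = [sv for sv in wanted if sv[\"name\"] not in existing_names]\n", "if missing:\n", "    resp = requests.post(f\"{url}/knowledge_graph/register_standard_variables\",\n", "                         headers=request_headers,\n", "                         json={\"standard_variables\": missing})\n", "    handle_api_response(resp, print_response=True)" ] },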
{ "cell_type": "markdown", "metadata": {}, "source": [ "### After you are satisfied that all relevant standard variables are in the data catalog (usually it's a one-time thing), you can proceed to register datasets, variables, and resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 1: REGISTER DATASETS\n", "\n", "#### FAQ: How do I know what is a dataset and what is a resource?\n", "There isn't a cut-and-dried answer; it will ultimately depend on the way you organize\n", "and think about your data. We define a dataset as a \"logical grouping of variables in a collection\n", "of resources\". As long as your way fits this extremely broad definition, you should be ok. \n", "To illustrate this, let's go back to our toy example of *\"Temperature recorded outside my house\"*,\n", "for which I record data for my time and temperature variables in multiple files. Originally, \n", "each file was a resource under my dataset, which makes sense because all of these data files\n", "describe (semantically) the same concept - temperature recorded outside my house. On the other\n", "hand, I could've made a similarly strong argument that each file is actually a separate\n", "logical entity that provides temperature data recorded outside my house on a *specific* date\n", "and should therefore be semantically differentiated from the temperature recorded on another date.\n", "But in the vast majority of cases, this distinction doesn't really matter because in the end, those that \n", "care about knowing the temperature outside my house on Jan 1st 2018 will be able to find \n", "the link to that file and download it. From the perspective of the end user, they care about the actual\n", "raw data, and not the (arguably somewhat arbitrary) distinction between a resource and a dataset. There will\n", "be an example later showing how to \"tag\" your data with relevant temporal/spatial information so that it becomes\n", "searchable.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_id = \"4e8ade31-7729-4891-a462-2dac66158512\" # This is optional; if not given, it will be auto-generated\n", "\n", "## An example of how to generate a random uuid yourself (will be different every time the method is run)\n", "# print(str(uuid.uuid4()))\n", "# print(str(uuid.uuid4()))\n", "#\n", "## This will generate the same record_id as long as the input string remains the same\n", "#\n", "# input_string = \"some string 34_\"\n", "# print(str(uuid.uuid5(uuid.NAMESPACE_URL, str(input_string))))\n", "# print(str(uuid.uuid5(uuid.NAMESPACE_URL, str(input_string))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A note about ids and record_ids. \n", "Every entity in the data catalog (variables, standard_variables, \n", "datasets, resources) will have a unique id/record_id associated with it. This is what disambiguates,\n", "e.g., two datasets that are named \"MyDataset\". These record_ids are either generated automatically\n", "on our end if no \"record_id\" is provided, or you can generate them yourself using Python's [uuid](https://docs.python.org/3/library/uuid.html)\n", "library (or any other library that generates uuids according to the international standard).\n", "\n", "What this means in practice is that if you remove \"record_id\" from the dictionary below\n", "and rerun this cell 3 times, you will end up registering 3 datasets with identical name, description,\n", "metadata, and provenance_id. This is why, if you register a new dataset (or variable, or resource, etc.), \n", "it's important to note the returned object's record_id if/when you need to reference it later, \n", "rerun the same script in an idempotent manner, or update the record's attributes." ] },
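{ "cell_type": "markdown", "metadata": {}, "source": [ "One way (not required by the API) to get idempotent reruns without hard-coding ids is to derive record_ids deterministically from inputs you control, e.g. with uuid5:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: derive a stable record_id from the provenance_id and dataset name,\n", "# so re-running the notebook reproduces the same id instead of minting a new one.\n", "# The input-string convention here is just an illustration\n", "stable_dataset_id = str(uuid.uuid5(uuid.NAMESPACE_URL,\n", "                                   f\"{provenance_id}:Temperature recorded outside my house\"))\n", "print(stable_dataset_id)" ] },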
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Build datasets definition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset_defs = {\n", "    \"datasets\": [\n", "        {\n", "            \"record_id\": dataset_id, # Remove this line if you want to create a new dataset\n", "            \"provenance_id\": provenance_id,\n", "            \"metadata\": {\n", "                \"any_additional_metadata\": \"content\"\n", "            },\n", "            \"description\": \"Temperature recorded outside my house; collected over last month\",\n", "            \"name\": \"Temperature recorded outside my house\"\n", "        }\n", "    ]\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/register_datasets\", \n", "                     headers=request_headers,\n", "                     json=dataset_defs)\n", "\n", "\n", "parsed_response = handle_api_response(resp, print_response=True)\n", "\n", "datasets = parsed_response[\"datasets\"]\n", "\n", "# Iterate through the list of returned dataset objects and save the one whose name matches ours \n", "# to a Python variable\n", "dataset_record = next(record for record in datasets if record[\"name\"] == \"Temperature recorded outside my house\")\n", "# Extract dataset record_id and store it in a variable\n", "dataset_record_id = dataset_record[\"record_id\"]" ] },
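{ "cell_type": "markdown", "metadata": {}, "source": [ "As the note above suggests, re-posting with the same record_id is how you rerun this script idempotently or update a record's attributes. A sketch (same endpoint as above, with only the description changed; per the note, this should update the existing dataset rather than create a duplicate):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: because record_id is fixed, re-registering should update attributes of\n", "# the existing record instead of creating a new dataset (per the note above)\n", "updated_defs = {\n", "    \"datasets\": [\n", "        {\n", "            \"record_id\": dataset_record_id,\n", "            \"provenance_id\": provenance_id,\n", "            \"name\": \"Temperature recorded outside my house\",\n", "            \"description\": \"Temperature recorded outside my house; description updated\",\n", "            \"metadata\": {\"any_additional_metadata\": \"content\"}\n", "        }\n", "    ]\n", "}\n", "\n", "## Uncomment to apply the update:\n", "# resp = requests.post(f\"{url}/datasets/register_datasets\",\n", "#                      headers=request_headers,\n", "#                      json=updated_defs)\n", "# handle_api_response(resp, print_response=True)" ] },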
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Register variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Again, these ids are optional and will be auto-generated if not given. They are included here in order\n", "# to make requests idempotent (so that new records aren't being generated every time this code block is run)\n", "\n", "time_variable_record_id = '9358af57-192f-4cc3-9bee-837e76819674'\n", "temperature_variable_record_id = 'c22deb3b-ebda-48cb-950a-2f4f00498197'\n", "\n", "variable_defs = {\n", "    \"variables\": [\n", "        {\n", "            \"record_id\": time_variable_record_id, # If you remove this line, record_id will be auto-generated\n", "            \"dataset_id\": dataset_record_id,\n", "            \"name\": \"Time\",\n", "            \"metadata\": {\n", "                \"units\": \"ISO8601_datetime\"\n", "                # Can include any other metadata that you want to associate with the variable\n", "            },\n", "            \"standard_variable_ids\": [\n", "                # Recall that we created the \"time_standard_variable\" python object after\n", "                # registering our standard variables. We just need its unique identifier - \n", "                # record_id - in order to associate it with our \"Time\" variable. Also, note \n", "                # that \"standard_variable_ids\" is an array, so you can associate multiple\n", "                # standard variables with our \"local\" variable (and it does not have\n", "                # to be done all at once). That is how we can semantically link multiple\n", "                # standard names and ontologies later on\n", "                time_standard_variable[\"record_id\"]\n", "            ]\n", "        },\n", "        {\n", "            \"record_id\": temperature_variable_record_id, # If you remove this line, record_id will be auto-generated\n", "            \"dataset_id\": dataset_record_id, # from register_datasets() call\n", "            \"name\": \"Temperature\",\n", "            \"metadata\": {\n", "                \"units\": \"F\"\n", "            },\n", "            \"standard_variable_ids\": [\n", "                temperature_standard_variable[\"record_id\"]\n", "            ]\n", "        }\n", "    ]\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/register_variables\", \n", "                     headers=request_headers,\n", "                     json=variable_defs)\n", "\n", "parsed_response = handle_api_response(resp, print_response=True)\n", "variables = parsed_response[\"variables\"]\n", "\n", "time_variable = next(record for record in variables if record[\"name\"] == \"Time\")\n", "temperature_variable = next(record for record in variables if record[\"name\"] == \"Temperature\")\n", "\n", "## Uncomment below to print individual records\n", "# print(f\"Time Variable: {time_variable}\")\n", "# print(f\"Temperature Variable: {temperature_variable}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Register resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume that I host my dataset files on www.my_domain.com/storage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_storage_url = \"www.my_domain.com/storage\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, assume that I've collected 2 days' worth of data in temp_records_2018_01_01.csv and temp_records_2018_01_02.csv ..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_1_name = \"temp_records_2018_01_01.csv\"\n", "file_2_name = \"temp_records_2018_01_02.csv\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and uploaded them to my remote storage location" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_1_data_url = f\"{data_storage_url}/{file_1_name}\"\n", "file_2_data_url = f\"{data_storage_url}/{file_2_name}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Similar to dataset and variable registrations, we are going to generate unique resource record_ids to \n", "# make these requests repeatable without creating new records. But remember, these will be auto-generated\n", "# if not given\n", "\n", "file_1_record_id = \"dd52e66b-3149-4d46-8f8e-a18e46136e55\"\n", "file_2_record_id = \"25916ccf-d108-4187-b243-2b257ce67fa5\"" ] },
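{ "cell_type": "markdown", "metadata": {}, "source": [ "Optional sanity check before registering (a sketch; it assumes you created the local toy csv with the snippet near the top of this notebook): confirm that a file's header matches the variable names registered in Step 2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "# Read just the header row of the local toy file and compare it against the\n", "# names of the variables we registered (\"Time\", \"Temperature\")\n", "with open(file_1_name, newline=\"\") as f:\n", "    header = next(csv.reader(f))\n", "\n", "expected = [time_variable[\"name\"], temperature_variable[\"name\"]]\n", "print(header == expected, header)" ] },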
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Making my files searchable by time\n", "\n", "If I want my resources to be searchable by time range, I can \"annotate\" each resource with corresponding \n", "temporal coverage. That way, when someone searches for any datasets that contain \"Temperature_Standard_Variable\"\n", "for January 01 2018, my file_1_name will be returned, along with the data url, and the users will be able to \n", "download it easily. Note that temporal coverage must have \"start_time\" and \"end_time\" and must follow the ISO 8601 \n", "datetime format YYYY-MM-DDTHH:mm:ss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_1_temporal_coverage = {\n", "    \"start_time\": \"2018-01-01T00:00:00\",\n", "    \"end_time\": \"2018-01-01T23:59:59\"\n", "}\n", "file_2_temporal_coverage = {\n", "    \"start_time\": \"2018-01-02T00:00:00\",\n", "    \"end_time\": \"2018-01-02T23:59:59\"\n", "}" ] },
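{ "cell_type": "markdown", "metadata": {}, "source": [ "With many daily files, these dicts can be generated rather than typed out. A small sketch (the API only cares about the start_time/end_time strings in the format above):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: build a one-day temporal coverage dict in the required\n", "# ISO 8601 YYYY-MM-DDTHH:mm:ss format\n", "def day_coverage(day):\n", "    return {\n", "        \"start_time\": f\"{day.isoformat()}T00:00:00\",\n", "        \"end_time\": f\"{day.isoformat()}T23:59:59\"\n", "    }\n", "\n", "# Matches file_1_temporal_coverage above\n", "print(day_coverage(datetime.date(2018, 1, 1)))" ] },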
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Making my files spatially searchable \n", "\n", "Let's say that my house is somewhere in LA, defined by the following bounding box \n", "(where x refers to longitude and y refers to latitude) \n", "x_min: -118.4253354 \n", "y_min: 33.9605286 \n", "x_max: -118.4093589 \n", "y_max: 33.9895077 \n", "\n", "We can annotate our resources with spatial coverage. Since all of our resources come from the same location, we can reuse the same values. If you have multiple resources with different locations, you can follow the temporal annotation example above.\n", "\n", "Things to note here are the required \"type\" and \"value\" parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spatial_coverage = {\n", "    \"type\": \"BoundingBox\",\n", "    \"value\": {\n", "        \"xmin\": -118.4253354,\n", "        \"ymin\": 33.9605286,\n", "        \"xmax\": -118.4093589,\n", "        \"ymax\": 33.9895077\n", "    }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Finally, we can build our resource definitions and register them (in bulk)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "resource_defs = {\n", "    \"resources\": [\n", "        {\n", "            \"record_id\": file_1_record_id,\n", "            \"dataset_id\": dataset_record_id,\n", "            \"provenance_id\": provenance_id,\n", "            \"variable_ids\": [\n", "                time_variable[\"record_id\"],\n", "                temperature_variable[\"record_id\"]\n", "            ],\n", "            \"name\": file_1_name,\n", "            \"resource_type\": \"csv\",\n", "            \"data_url\": file_1_data_url,\n", "            \"metadata\": {\n", "                \"spatial_coverage\": spatial_coverage,\n", "                \"temporal_coverage\": file_1_temporal_coverage\n", "            },\n", "            \"layout\": {}\n", "        },\n", "        {\n", "            \"record_id\": file_2_record_id,\n", "            \"dataset_id\": dataset_record_id,\n", "            \"provenance_id\": provenance_id,\n", "            \"variable_ids\": [\n", "                time_variable[\"record_id\"],\n", "                temperature_variable[\"record_id\"]\n", "            ],\n", "            \"name\": file_2_name,\n", "            \"resource_type\": \"csv\",\n", "            \"data_url\": file_2_data_url,\n", "            \"metadata\": {\n", "                \"spatial_coverage\": spatial_coverage,\n", "                \"temporal_coverage\": file_2_temporal_coverage\n", "            },\n", "            \"layout\": {}\n", "        }\n", "    ]\n", "}\n", "\n", "# ... and register them in bulk\n", "resp = requests.post(f\"{url}/datasets/register_resources\", \n", "                     headers=request_headers,\n", "                     json=resource_defs)\n", "\n", "\n", "parsed_response = handle_api_response(resp, print_response=True)\n", "\n", "\n", "resources = parsed_response[\"resources\"]\n", "\n", "resource_1 = next(record for record in resources if record[\"name\"] == file_1_name)\n", "resource_2 = next(record for record in resources if record[\"name\"] == file_2_name)\n", "\n", "## Uncomment below to print individual records \n", "# print(f\"{file_1_name}: {resource_1}\")\n", "# print(f\"{file_2_name}: {resource_2}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Searching for datasets/resources\n", "\n", "After registering datasets/variables/resources, we can now programmatically search for relevant information. Below,\n", "you'll see five examples of searching for data by standard variable name, spatial coverage, temporal coverage, dataset name, and dataset id. Currently, these are the only search filters we support, but we'll be adding more as we get more feature requests.\n", "If you would like to search the data catalog by other keywords, please let me know at danf@usc.edu" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 1) Searching by standard_names\n", "\n", "search_query_1 = {\n", "    \"standard_variable_names__in\": [temperature_standard_variable[\"name\"]]\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/find\", \n", "                     headers=request_headers,\n", "                     json=search_query_1).json()\n", "if resp['result'] == 'success':\n", "    found_resources = resp['resources']\n", "    print(f\"Found {len(found_resources)} resources\")\n", "    print(found_resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 2) Searching by spatial_coverage\n", "\n", "# Bounding box search parameter is a 4-element numeric array (in WGS84 coordinate system) [xmin, ymin, xmax, ymax]\n", "# As a reminder, x is longitude, y is latitude\n", "bounding_box = [\n", "    spatial_coverage[\"value\"][\"xmin\"], \n", "    spatial_coverage[\"value\"][\"ymin\"], \n", "    spatial_coverage[\"value\"][\"xmax\"],\n", "    spatial_coverage[\"value\"][\"ymax\"]\n", "]\n", "\n", "search_query_2 = {\n", "    \"spatial_coverage__within\": bounding_box\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/find\", \n", "                     headers=request_headers,\n", "                     json=search_query_2).json()\n", "if resp['result'] == 'success':\n", "    found_resources = resp['resources']\n", "    print(f\"Found {len(found_resources)} resources\")\n", "    print(found_resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 3) Searching by temporal_coverage and standard_names\n", "\n", "# Temporal search parameters are ISO 8601 datetime strings; start_time__gte and end_time__lte\n", "# bound the temporal coverage of the resources you are looking for\n", "start_time = \"2018-01-01T00:00:00\"\n", "end_time = \"2018-01-21T23:59:59\"\n", "\n", "search_query_3 = {\n", "    \"standard_variable_names__in\": [temperature_standard_variable[\"name\"]],\n", "    \"start_time__gte\": start_time,\n", "    \"end_time__lte\": end_time\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/find\", \n", "                     headers=request_headers,\n", "                     json=search_query_3).json()\n", "\n", "if resp['result'] == 'success':\n", "    found_resources = resp['resources']\n", "    print(f\"Found {len(found_resources)} resources\")\n", "    pp.pprint(found_resources)" ] }, {
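"cell_type": "markdown", "metadata": {}, "source": [ "As noted earlier, each returned resource record carries its data_url, so downloading search results is straightforward. A sketch (commented out because the toy urls in this walkthrough don't point at real files):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: fetch the raw file behind each search hit via its data_url.\n", "# The urls registered in this toy example aren't real, hence commented out\n", "# for record in found_resources:\n", "#     print(f\"Downloading {record['name']} from {record['data_url']}\")\n", "#     content = requests.get(f\"http://{record['data_url']}\").content" ] }, {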
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 4) Searching by dataset_names\n", "\n", "search_query_4 = {\n", " \"dataset_names__in\": [\"Temperature recorded outside my house\"]\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/find\",\n", " headers=request_headers,\n", " json=search_query_4).json()\n", "\n", "if resp['result'] == 'success':\n", " found_resources = resp['resources']\n", " print(f\"Found {len(found_resources)} resources\")\n", " pp.pprint(found_resources)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 5) Searching by dataset ids\n", "\n", "search_query_5 = {\n", " \"dataset_ids__in\": [dataset_id]\n", "}\n", "\n", "resp = requests.post(f\"{url}/datasets/find\",\n", " headers=request_headers,\n", " json=search_query_5).json()\n", "\n", "if resp['result'] == 'success':\n", " found_resources = resp['resources']\n", " print(f\"Found {len(found_resources)} resources\")\n", " pp.pprint(found_resources)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }