{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "47536c27-5f4f-4da7-9649-63ac5fe5bdf8",
   "metadata": {},
   "source": [
    "# Sample queries for datasets\n",
    "\n",
    "The DataCollection class queries for datasets (collections in NASA terminology) and can use a variety of criteria.\n",
    "The basics are the spatio temporal parameters but we can also search based on the data center (or DAAC), the dataset version or cloud hosted data.\n",
    "\n",
    "This notebook provides some examples of how to search for datasets using different parameters.\n",
    "\n",
    "Collection search parameters\n",
    "\n",
    "**dataset origin and location**\n",
    "* archive_center\n",
    "* data_center\n",
    "* daac\n",
    "* provider\n",
    "* cloud_hosted\n",
    "\n",
    "**spatio temporal parameters**\n",
    "* bounding_box\n",
    "* temporal\n",
    "* point\n",
    "* polygon\n",
    "* line\n",
    "\n",
    "**dataset metadata parameters**\n",
    "* concept_id \n",
    "* entry_title\n",
    "* keyword\n",
    "* version\n",
    "* short_name\n",
    "\n",
    "Once the query has been formed with one or more search parameters we can get the results by using either `hits()` or `get()`.\n",
    "\n",
    "* **hits()**: gets the counts for our query, if the search didn't match any result then is 0\n",
    "* **get()**: gets the metadata records for those collections that matched our criteria, we can specify a max i.e. `get(10)`, if we do not specify the default number is 2000\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a31f359-ca14-49d0-97c1-0845070833ed",
   "metadata": {},
   "source": [
    "## Example #1, querying for cloud enabled data from a given data center (DAAC)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8b80820-94ce-468c-b8cc-69d84a2dadfd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from earthaccess import DataCollections\n",
    "\n",
    "# We only need to specify the DAAC and if we're looking for cloud hosted data\n",
    "query = DataCollections().daac(\"LPDAAC\").cloud_hosted(False)\n",
    "# we use hits to get a count for the collections that match our query\n",
    "query.hits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2e674100-f041-4620-bd33-0ec29f0a8fd8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we get the collections' metadata\n",
    "collections = query.get(10)\n",
    "# let's print only the first collection, uncomment the next line\n",
    "# collections[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e84a0423-6045-465a-9c1d-2505f9643d26",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can print a small summary of the dataset, here for the first 10 collections\n",
    "summaries = [collection.summary() for collection in collections]\n",
    "summaries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13ef8aab-4f63-4750-8759-6297d5078307",
   "metadata": {},
   "source": [
    "\n",
    "### Searching using keywords\n",
    "\n",
    "> **Note**: Some DAACs don't have cloud hosted collections yet, some have cloud collections but do not allow direct access"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b0dca41-41f0-4b2b-a7e4-9a7ab2a423bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now let's search using keyword and daac\n",
    "# from earthaccess import DataCollections\n",
    "\n",
    "query = DataCollections().keyword(\"fi*e\").daac(\"LPDAAC\")\n",
    "# we use hits to get a count for the collections that match our query\n",
    "query.hits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6eca99e5-0e46-42de-a507-4cc4c0078187",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now let's search using keyword and daac\n",
    "query = DataCollections().keyword(\"fire\").daac(\"LPDAAC\")\n",
    "# we use hits to get a count for the collections that match our query\n",
    "query.hits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "197dece4-02dc-4622-964c-41ca6a275cae",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's get only the info on the first 10 collections and filter the fields\n",
    "collections = query.get(10)\n",
    "# let's print just the first collection, do you really want to look at all the metadata ?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec85fa1a-c704-4f6a-9226-72dd0c528c52",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can print a small summary of the dataset, here for the first 10 collections again\n",
    "summaries = [collection.summary() for collection in collections]\n",
    "summaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65dd1d56-937f-437e-b8a7-733c3559b4d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = DataCollections().cloud_hosted(True).bounding_box(-25.31, 63.23, -11.95, 66.65)\n",
    "\n",
    "query.hits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5a4235d-545e-40e7-8972-4519912b3a5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = (\n",
    "    DataCollections()\n",
    "    .cloud_hosted(True)\n",
    "    .short_name(\"ECCO_L4_GMSL_TIME_SERIES_MONTHLY_V4R4\")\n",
    ")\n",
    "for c in query.get(40):\n",
    "    print(c.summary(), \"\\n\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}