{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Untitled0.ipynb",
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/ebi-ait/ingest-programmatic-submissions/blob/main/notebooks/create_project/programmatic_submissions.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Create a project"
      ],
      "metadata": {
        "id": "vZmJIGcsUdbs"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "This notebook shows how to generate a submission with the Python tools available in the `hca-ingest` library. Among other utilities for interacting with the Ingest service, this library contains a wrapper for Ingest's API that lets you easily create, update and delete entities.\n",
        "\n",
        "This section focuses on `Projects`."
      ],
      "metadata": {
        "id": "ePq8DXWvNiiJ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Download example project\n",
        "\n",
        "To get the files needed for this guide, we will download the template file for the project metadata."
      ],
      "metadata": {
        "id": "S7auKovJSSl2"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!wget https://raw.githubusercontent.com/ebi-ait/ingest-programmatic-submissions/main/_data/submission_example/project/example_project.json"
      ],
      "metadata": {
        "id": "zp53YQ92SRIX",
        "outputId": "f3ec70b7-e782-4911-8df0-2ad0c6b9e511",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2022-11-09 17:32:55--  https://raw.githubusercontent.com/ebi-ait/ingest-programmatic-submissions/main/_data/submission_example/project/example_project.json\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1232 (1.2K) [text/plain]\n",
            "Saving to: ‘example_project.json’\n",
            "\n",
            "example_project.jso 100%[===================>]   1.20K  --.-KB/s    in 0s      \n",
            "\n",
            "2022-11-09 17:32:55 (34.3 MB/s) - ‘example_project.json’ saved [1232/1232]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Set up libraries and dependencies"
      ],
      "metadata": {
        "id": "qAU_Cz-GpvTr"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Install external libraries"
      ],
      "metadata": {
        "id": "b3ZppyYdqL2R"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install hca-ingest"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KgUOAkScp1WH",
        "outputId": "6a95303d-8b40-4232-d07c-07ef06274019"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting hca-ingest\n",
            "  Downloading hca-ingest-2.6.0.tar.gz (57 kB)\n",
            "Collecting jsonref\n",
            "  Downloading jsonref-1.0.1-py3-none-any.whl (9.5 kB)\n",
            "Requirement already satisfied: openpyxl in /usr/local/lib/python3.7/dist-packages (from hca-ingest) (3.0.10)\n",
            "Collecting polling\n",
            "  Downloading polling-0.3.2.tar.gz (5.2 kB)\n",
            "Requirement already satisfied: PyYAML>=5.3.1 in /usr/local/lib/python3.7/dist-packages (from hca-ingest) (6.0)\n",
            "Requirement already satisfied: requests[security] in /usr/local/lib/python3.7/dist-packages (from hca-ingest) (2.23.0)\n",
            "Collecting xlsxwriter\n",
            "  Downloading XlsxWriter-3.0.3-py3-none-any.whl (149 kB)\n",
            "Collecting mergedeep\n",
            "  Downloading mergedeep-1.3.4-py3-none-any.whl (6.4 kB)\n",
            "Collecting cryptography\n",
            "  Downloading cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl (4.1 MB)\n",
            "Collecting requests-cache\n",
            "  Downloading requests_cache-0.9.7-py3-none-any.whl (48 kB)\n",
            "Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.7/dist-packages (from cryptography->hca-ingest) (1.15.1)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi>=1.12->cryptography->hca-ingest) (2.21)\n",
            "Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.7/dist-packages (from openpyxl->hca-ingest) (1.1.0)\n",
            "Collecting urllib3>=1.25.5\n",
            "  Downloading urllib3-1.26.12-py2.py3-none-any.whl (140 kB)\n",
            "Requirement already satisfied: appdirs>=1.4.4 in /usr/local/lib/python3.7/dist-packages (from requests-cache->hca-ingest) (1.4.4)\n",
            "Collecting cattrs>=22.2\n",
            "  Downloading cattrs-22.2.0-py3-none-any.whl (35 kB)\n",
            "Requirement already satisfied: attrs>=21.2 in /usr/local/lib/python3.7/dist-packages (from requests-cache->hca-ingest) (22.1.0)\n",
            "Collecting url-normalize>=1.4\n",
            "  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)\n",
            "Collecting exceptiongroup\n",
            "  Downloading exceptiongroup-1.0.1-py3-none-any.whl (12 kB)\n",
            "Requirement already satisfied: typing_extensions in /usr/local/lib/python3.7/dist-packages (from cattrs>=22.2->requests-cache->hca-ingest) (4.1.1)\n",
            "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests[security]->hca-ingest) (2.10)\n",
            "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests[security]->hca-ingest) (3.0.4)\n",
            "Collecting urllib3>=1.25.5\n",
            "  Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests[security]->hca-ingest) (2022.9.24)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from url-normalize>=1.4->requests-cache->hca-ingest) (1.15.0)\n",
            "Collecting pyOpenSSL>=0.14\n",
            "  Downloading pyOpenSSL-22.1.0-py3-none-any.whl (57 kB)\n",
            "Building wheels for collected packages: hca-ingest, polling\n",
            "  Building wheel for hca-ingest (setup.py) ... done\n",
            "  Created wheel for hca-ingest: filename=hca_ingest-2.6.0-py3-none-any.whl size=70596 sha256=114859a9f8d95b35d96ab061da4a47a702825e7a9a0015cfdcf7e06ac4300515\n",
            "  Stored in directory: /root/.cache/pip/wheels/c2/8e/0c/4693dcbc7e2b550c87e3b89a41f0b2e4d98924bc7681c85846\n",
            "  Building wheel for polling (setup.py) ... done\n",
            "  Created wheel for polling: filename=polling-0.3.2-py3-none-any.whl size=4129 sha256=3fb1d18a14a77ee5a6864cc4e989d673d80b8421e54621e759012f50d975858a\n",
            "  Stored in directory: /root/.cache/pip/wheels/e5/3f/0c/54a03b715fce3176335c957ae94d7d0b2a918e89b1b195bace\n",
            "Successfully built hca-ingest polling\n",
            "Installing collected packages: urllib3, exceptiongroup, cryptography, url-normalize, pyOpenSSL, cattrs, xlsxwriter, requests-cache, polling, mergedeep, jsonref, hca-ingest\n",
            "  Attempting uninstall: urllib3\n",
            "    Found existing installation: urllib3 1.24.3\n",
            "    Uninstalling urllib3-1.24.3:\n",
            "      Successfully uninstalled urllib3-1.24.3\n",
            "Successfully installed cattrs-22.2.0 cryptography-38.0.3 exceptiongroup-1.0.1 hca-ingest-2.6.0 jsonref-1.0.1 mergedeep-1.3.4 polling-0.3.2 pyOpenSSL-22.1.0 requests-cache-0.9.7 url-normalize-1.4.3 urllib3-1.25.11 xlsxwriter-3.0.3\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Load libraries"
      ],
      "metadata": {
        "id": "IYQmBG-wqO5v"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import requests as rq\n",
        "import json\n",
        "from hca_ingest.api.ingestapi import IngestApi"
      ],
      "metadata": {
        "id": "ViET1F9vqlT7"
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Get a token\n",
        "\n",
        "To get a token, you need to log in to the Ingest UI: https://staging.contribute.data.humancellatlas.org/. This notebook uses the staging environment. You can switch to prod (by deleting the first part of the domain, `staging.`) or to dev (by changing `staging` to `dev`) at any point in the process.\n",
        "\n",
        "If you use any other environment, remember to change the `environment` variable in the next section.\n",
        "\n",
        "The steps to obtain the token are detailed in this guide: [API tokens](https://ebi-ait.github.io/hca-ebi-dev-team/operations_tasks/api_token.html)"
      ],
      "metadata": {
        "id": "6VLW0nNnUjFq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "token = \"Bearer <paste_token_here>\""
      ],
      "metadata": {
        "id": "07iWzHuwUwQN"
      },
      "execution_count": 11,
      "outputs": []
    },
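    {
      "cell_type": "markdown",
      "source": [
        "Before sending any request, it can help to check that the placeholder above was actually replaced. The sketch below is a local sanity check only: `looks_like_bearer_token` is a hypothetical helper (it is not part of `hca-ingest`), and it only inspects the shape of the string without verifying the token against Ingest.\n",
        "\n",
        "```python\n",
        "def looks_like_bearer_token(value):\n",
        "    # Hypothetical helper: checks the shape of the string, not its validity\n",
        "    prefix = 'Bearer '\n",
        "    has_prefix = value.startswith(prefix)\n",
        "    has_payload = len(value) > len(prefix)\n",
        "    is_placeholder = '<paste_token_here>' in value\n",
        "    return has_prefix and has_payload and not is_placeholder\n",
        "\n",
        "\n",
        "print(looks_like_bearer_token('Bearer <paste_token_here>'))\n",
        "```\n",
        "\n",
        "With the placeholder still in place, the call returns `False`; in this notebook it could be called as `looks_like_bearer_token(token)`."
      ],
      "metadata": {}
    },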
    {
      "cell_type": "markdown",
      "source": [
        "### Set up environment and global variables"
      ],
      "metadata": {
        "id": "zuKF-BQzvpQy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Environment-related set-up and global variables used across the notebook\n",
        "accepted_environments = {\n",
        "    'develop': '.dev',\n",
        "    'staging': '.staging',\n",
        "    'production': ''\n",
        "}\n",
        "\n",
        "environment = 'staging'  #staging environment by default\n",
        "\n",
        "# Set up environment value for API's URL\n",
        "try:\n",
        "  env_for_url = accepted_environments[environment]\n",
        "except KeyError:\n",
        "  print(f\"Environment {environment} not recognised. Defaulting to staging\")\n",
        "  env_for_url = accepted_environments['staging']\n",
        "\n",
        "base_url = f'https://api.ingest{env_for_url}.archive.data.humancellatlas.org'\n",
        "\n",
        "# Set up API object\n",
        "api = IngestApi(url=base_url)\n",
        "headers = api.set_token(token=token)\n"
      ],
      "metadata": {
        "id": "LMOfvgiXsX79"
      },
      "execution_count": 12,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Create a project\n",
        "\n",
        "This block of code creates a project within Ingest. It assumes the following:\n",
        "* A JSON entity is available for use as the \"content\"\n",
        "\n",
        "This notebook performs everything in the staging environment. To use another environment (e.g. `production`), update the `environment` variable to any of the keys defined in `accepted_environments`."
      ],
      "metadata": {
        "id": "D4TjzYhdUpqk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Load the project metadata entity\n",
        "with open('example_project.json', 'r') as f:\n",
        "  project_content = json.load(f)\n",
        "\n",
        "ingest_project = api.create_project(submission_url='', content=project_content)\n"
      ],
      "metadata": {
        "id": "KWYY0lYZUtz9"
      },
      "execution_count": 13,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "The returned object is the project as stored by Ingest. It contains the metadata that was submitted in the previous step, plus some extra, important metadata:\n",
        "\n",
        "* uuid: Unique identifier for your project, generated randomly\n",
        "* Management metadata: Project-level fields that describe your experiment, e.g. organs, species used, etc.\n",
        "\n",
        "We're going to print the object and take a look."
      ],
      "metadata": {
        "id": "8z0yee6CZQK0"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "ingest_project"
      ],
      "metadata": {
        "id": "GuDNpJS1ae_v",
        "outputId": "2d5e7efd-404f-446f-b5ef-68abe28b9a6c",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'content': {'describedBy': 'https://schema.staging.data.humancellatlas.org/type/project/17.0.0/project',\n",
              "  'schema_type': 'project',\n",
              "  'project_core': {'project_short_name': 'myCoolLabel',\n",
              "   'project_title': 'Test_project_with_minimum_information',\n",
              "   'project_description': 'This is a test project with minimum information for the programmatic submissions guide'},\n",
              "  'contributors': [{'name': 'Enrique,,Ventura',\n",
              "    'email': 'enrique@ebi.ac.uk',\n",
              "    'institution': 'EMBL-EBI',\n",
              "    'corresponding_contributor': True,\n",
              "    'project_role': {'text': 'data curator',\n",
              "     'ontology': 'EFO:0009737',\n",
              "     'ontology_label': 'data curator'}}],\n",
              "  'publications': [{'authors': ['Lorem IP', 'Sed UP'],\n",
              "    'title': 'A combined approach for single-cell mRNA and intracellular protein expression analysis',\n",
              "    'url': 'https://www.frontiersin.org/articles/10.3389/fcell.2020.00384/full',\n",
              "    'official_hca_publication': False}],\n",
              "  'funders': [{'grant_title': 'a cool grant',\n",
              "    'grant_id': '000000000bp1',\n",
              "    'organization': 'EMBL-EBI'}]},\n",
              " 'submissionDate': '2022-11-09T17:36:29.253947Z',\n",
              " 'updateDate': '2022-11-09T17:36:29.253947Z',\n",
              " 'user': '5ece3464ec0680746267e784',\n",
              " 'lastModifiedUser': '5ece3464ec0680746267e784',\n",
              " 'type': 'Project',\n",
              " 'uuid': {'uuid': 'e838a25f-8f52-4678-9744-6d650ca65374'},\n",
              " 'events': [],\n",
              " 'firstDcpVersion': '2022-11-09T17:36:29.253947Z',\n",
              " 'dcpVersion': '2022-11-09T17:36:29.253947Z',\n",
              " 'contentLastModified': '2022-11-09T17:36:29.246070Z',\n",
              " 'accession': None,\n",
              " 'validationState': 'Draft',\n",
              " 'validationErrors': None,\n",
              " 'graphValidationErrors': None,\n",
              " 'isUpdate': False,\n",
              " 'releaseDate': None,\n",
              " 'accessionDate': None,\n",
              " 'technology': None,\n",
              " 'organ': None,\n",
              " 'cellCount': None,\n",
              " 'dataAccess': None,\n",
              " 'identifyingOrganisms': None,\n",
              " 'primaryWrangler': None,\n",
              " 'secondaryWrangler': None,\n",
              " 'wranglingState': None,\n",
              " 'wranglingPriority': None,\n",
              " 'wranglingNotes': None,\n",
              " 'isInCatalogue': None,\n",
              " 'cataloguedDate': None,\n",
              " 'publicationsInfo': None,\n",
              " 'dcpReleaseNumber': None,\n",
              " 'projectLabels': None,\n",
              " 'hasOpenSubmission': False,\n",
              " '_links': {'self': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786'},\n",
              "  'project': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786',\n",
              "   'title': 'A single project'},\n",
              "  'validating': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/validatingEvent'},\n",
              "  'bundleManifests': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/bundleManifests',\n",
              "   'title': 'Access or create bundle manifests (describing which submitted contents went into which bundle in the datastore)'},\n",
              "  'auditLogs': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/auditLogs'},\n",
              "  'supplementaryFiles': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/supplementaryFiles'},\n",
              "  'submissionEnvelopes': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelopes',\n",
              "   'title': 'Access or create new submission envelopes'},\n",
              "  'submissionEnvelope': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelope',\n",
              "   'title': 'A single submission envelope'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Everything looks correct, so we will store the identifier for our project (called the `uuid`) in case we need to retrieve the project later."
      ],
      "metadata": {
        "id": "kKbnF8IFMvhO"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Store project uuid\n",
        "ingest_project_uuid = ingest_project['uuid']['uuid']"
      ],
      "metadata": {
        "id": "Yk6pGUrONAgI"
      },
      "execution_count": 15,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Understanding the information on the project\n",
        "\n",
        "After printing the resulting `ingest_project`, you have probably noticed that there is much more metadata than what was sent; for most entities, this is just system-generated and you don't need to worry about it.\n",
        "\n",
        "However, for `project` metadata, we load some information regarding statuses and general-level metadata for different purposes (e.g. display in the [project catalogue](https://www.ebi.ac.uk/humancellatlas/project-catalogue/)).\n",
        "\n",
        "This project-level metadata is explained in more detail in the [create a project](https://ebi-ait.github.io/ingest-programmatic-submissions/docs/create_a_project/create_a_project.html) guideline associated with this notebook. For now, we will focus on the metadata that we should fill out:"
      ],
      "metadata": {
        "id": "QPTVXMcZQaua"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "minimum_required_fields = {\n",
        "    'releaseDate': None,          # Date that you want your data to be released. If the data is to be released as soon as possible, or if data has already been released (e.g. in GEO), input today's date in format: YYYY-MM-DDT00:00:00Z (e.g. 2021-11-29T00:00:00Z)\n",
        "    'accessionDate': None,        # Same as above, but for accessioning in public archives.\n",
        "    'technology': None,           # Library preparation technology(ies) used in the experiment, ontologised. More below.\n",
        "    'organ': None,                # Organ(s) used in the experiment, ontologised. More below.\n",
        "    'cellCount': None,            # Estimated number of cells generated by this project.\n",
        "    'dataAccess': None,           # Type of data access, selected from a list of terms. For more detail, refer to readme.\n",
        "    'identifyingOrganisms': None, # Organism(s) used to generate the data: a list with any of Human, Mouse, Other.\n",
        "    'primaryWrangler': None,      # Person that is in charge of the project/submission: associated with a user.\n",
        "    'wranglingState': None,       # Status of the project. For a detailed list of accepted values, refer to readme.\n",
        "    'wranglingPriority': None,    # 1, 2, or 3. 1 is highest priority and 3 is lowest. Refer to readme for more information.\n",
        "    'wranglingNotes': None,       # Extra notes associated with the project; feel free to input your own notes here.\n",
        "    'isInCatalogue': None,        # If the project is to be displayed in the catalogue, True, otherwise False\n",
        "    }"
      ],
      "metadata": {
        "id": "eujjTdlXU7Bt"
      },
      "execution_count": 16,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Adding minimum information\n",
        "Now, we will fill in the fields listed above, to make sure we enter the minimum amount of metadata that the project should contain. We're going to divide the fields into 2 types:\n",
        "* **Ontologised**: Fields that are validated against the [HCA ontology](https://ontology.archive.data.humancellatlas.org/index).\n",
        "* **Other**: Fields that are not ontologised and are validated against other rules.\n",
        "\n",
        "We're going to start with the ontologised fields.\n",
        "\n",
        "#### Ontologised fields\n",
        "\n",
        "These terms are called \"ontologised\" because they are validated against a set of restrictions defined in our validation rules and enforced in the ontologies themselves. For example, `organ` checks that the input term is a child, through `subclassOf` relationships only, of the term `anatomical structure` ([UBERON:0000061](https://ontology.archive.data.humancellatlas.org/ontologies/hcao/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0000061)). Detailed information on the restrictions can be found in the readme file.\n",
        "\n",
        "In this category, we have 2 fields:\n",
        "- organ: A list of the organs used in this experiment; for this notebook, we're going to use the terms \"lung\" ([UBERON:0002048](https://ontology.archive.data.humancellatlas.org/ontologies/hcao/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0002048)) and \"heart\" ([UBERON:0000948](https://ontology.archive.data.humancellatlas.org/ontologies/hcao/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_0000948)).\n",
        "- technology: A list of the library preparation technologies used in this experiment; for this notebook, we're going to use the terms `10x 3' v2` ([EFO:0009899](https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0009899)) and `10x 3' v3` ([EFO:0009922](https://ontology.archive.data.humancellatlas.org/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0009922)). This field also accepts free text in case there is no ontology term for the technology just yet; we are also going to add an entry for this.\n"
      ],
      "metadata": {
        "id": "n_RSGOk5xsbw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Set up organ\n",
        "organ = {\n",
        "    \"ontologies\": [\n",
        "      {\n",
        "        \"text\": \"lung\",                 # Text field, free string that allows the user to introduce a more exact definition of the term if not available in the ontology\n",
        "        \"ontology\": \"UBERON:0002048\",   # Unique identifier for the ontology term, in the form of <ontology>:<ID>\n",
        "        \"ontology_label\": \"lung\"        # Text field, must exactly match the label provided in the ontology, case sensitive.\n",
        "      },\n",
        "      {\n",
        "        \"text\": \"heart\",\n",
        "        \"ontology\": \"UBERON:0000948\",\n",
        "        \"ontology_label\": \"heart\"\n",
        "      }\n",
        "    ]\n",
        "}\n",
        "\n",
        "# Set up technology\n",
        "technology = {\n",
        "    \"ontologies\": [\n",
        "      {\n",
        "        \"text\": \"10x 3' v2\",\n",
        "        \"ontology\": \"EFO:0009899\",\n",
        "        \"ontology_label\": \"10x 3' v2\"\n",
        "      },\n",
        "      {\n",
        "        \"text\": \"10x 3' v3\",\n",
        "        \"ontology\": \"EFO:0009922\",\n",
        "        \"ontology_label\": \"10x 3' v3\"\n",
        "      }\n",
        "    ],\n",
        "    \"others\": [\n",
        "        \"Mysupercoollibrarypreptechnology\"  # Free text field to introduce as many terms as you want that couldn't be found in the ontology\n",
        "    ]\n",
        "}\n",
        "\n",
        "# pass the values to our variable\n",
        "minimum_required_fields['organ'] = organ\n",
        "minimum_required_fields['technology'] = technology"
      ],
      "metadata": {
        "id": "HXo9IcnLKZOe"
      },
      "execution_count": 17,
      "outputs": []
    },
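    {
      "cell_type": "markdown",
      "source": [
        "Each entry under `ontologies` carries the same three keys, and the identifier follows the `<ontology>:<ID>` form. A minimal sketch of a local shape check is shown below. It is an assumption-laden illustration: `check_ontology_term` is a hypothetical helper, the pattern assumes a letters-then-digits identifier such as `UBERON:0002048`, and it does not replace the validation that Ingest performs against the ontology itself.\n",
        "\n",
        "```python\n",
        "import re\n",
        "\n",
        "# Assumed identifier shape: letters, a colon, then digits (e.g. EFO:0009899)\n",
        "CURIE_PATTERN = re.compile('^[A-Za-z]+:[0-9]+$')\n",
        "REQUIRED_KEYS = {'text', 'ontology', 'ontology_label'}\n",
        "\n",
        "\n",
        "def check_ontology_term(term):\n",
        "    # Hypothetical helper: returns a list of problems, empty if none were found\n",
        "    problems = []\n",
        "    missing = REQUIRED_KEYS - set(term)\n",
        "    if missing:\n",
        "        problems.append('missing keys: ' + ', '.join(sorted(missing)))\n",
        "    identifier = term.get('ontology', '')\n",
        "    if not CURIE_PATTERN.match(identifier):\n",
        "        problems.append('unexpected identifier format: ' + repr(identifier))\n",
        "    return problems\n",
        "\n",
        "\n",
        "lung = {'text': 'lung', 'ontology': 'UBERON:0002048', 'ontology_label': 'lung'}\n",
        "print(check_ontology_term(lung))\n",
        "```\n",
        "\n",
        "In this notebook the same check could be run over `organ['ontologies']` and `technology['ontologies']` before sending the update."
      ],
      "metadata": {}
    },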
    {
      "cell_type": "markdown",
      "source": [
        "#### Other fields"
      ],
      "metadata": {
        "id": "PD6jS8yrYPTk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Dates\n",
        "# Dates must use the format YYYY-MM-DDThh:mm:ssZ\n",
        "minimum_required_fields['releaseDate'] = \"2022-08-30T00:00:00Z\"\n",
        "minimum_required_fields['accessionDate'] = \"2022-08-30T00:00:00Z\"\n",
        "\n",
        "# Enum values\n",
        "# The set of accepted values is predetermined and depends on the field.\n",
        "# For the full list of values, please refer to the readme\n",
        "minimum_required_fields['dataAccess'] = {\n",
        "                                          \"type\": \"All fully open\",\n",
        "                                          \"notes\": \"Can be released publicly! :D\"\n",
        "                                        }\n",
        "minimum_required_fields['identifyingOrganisms'] = [\"Human\", \"Mouse\", \"Other\"]\n",
        "minimum_required_fields['wranglingPriority'] = 1 # Very important project!\n",
        "minimum_required_fields['wranglingState'] = \"Eligible\"\n",
        "\n",
        "# Simple values\n",
        "# Set of fields that have a simple value; it may be a free string, an integer or a boolean\n",
        "minimum_required_fields['cellCount'] = 17500\n",
        "minimum_required_fields['primaryWrangler'] = ingest_project['user'] # User ID is required in this field.\n",
        "minimum_required_fields['wranglingNotes'] = \"This is an awesome project and I will finish it soon\"\n",
        "minimum_required_fields['isInCatalogue'] = True # We want the project to be displayed in the project catalogue"
      ],
      "metadata": {
        "id": "FYGzoPepYTox"
      },
      "execution_count": 18,
      "outputs": []
    },
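    {
      "cell_type": "markdown",
      "source": [
        "The date strings above are written by hand. As a sketch, the same `YYYY-MM-DDThh:mm:ssZ` format can be produced from a `datetime` object with the standard library. `to_ingest_date` is a hypothetical helper name, and the sketch assumes the trailing `Z` means UTC, so the value is converted to UTC before formatting.\n",
        "\n",
        "```python\n",
        "from datetime import datetime, timezone\n",
        "\n",
        "\n",
        "def to_ingest_date(moment):\n",
        "    # Hypothetical helper: convert to UTC, then format as YYYY-MM-DDThh:mm:ssZ\n",
        "    return moment.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')\n",
        "\n",
        "\n",
        "release_date = to_ingest_date(datetime(2022, 8, 30, tzinfo=timezone.utc))\n",
        "print(release_date)\n",
        "```\n",
        "\n",
        "This prints `2022-08-30T00:00:00Z`, the same value used for `releaseDate` above."
      ],
      "metadata": {}
    },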
    {
      "cell_type": "markdown",
      "source": [
        "## Updating project with missing information\n",
        "\n",
        "Now that we understand the metadata we are handling, and have filled in the missing fields needed for a minimum-information project, we will update the project with the values we have gathered.\n",
        "\n",
        "Once we have the content to update, the update itself is pretty easy!"
      ],
      "metadata": {
        "id": "sqA9AEZAl_MQ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Retrieve project URL to update\n",
        "ingest_project_url = ingest_project['_links']['self']['href']\n",
        "response = api.patch(url=ingest_project_url, json=minimum_required_fields)\n",
        "\n",
        "updated_ingest_project = response.json()"
      ],
      "metadata": {
        "id": "vV4XC9J2HXw8"
      },
      "execution_count": 19,
      "outputs": []
    },
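    {
      "cell_type": "markdown",
      "source": [
        "Besides reading the returned project by eye, the fields that were sent can be compared with the ones that came back. The sketch below uses a hypothetical helper, `find_mismatches`, and small made-up dictionaries for illustration. A reported difference is not necessarily an error, since the service may normalise some values.\n",
        "\n",
        "```python\n",
        "def find_mismatches(sent, returned):\n",
        "    # Map each differing key to a (sent value, returned value) pair\n",
        "    return {\n",
        "        key: (value, returned.get(key))\n",
        "        for key, value in sent.items()\n",
        "        if returned.get(key) != value\n",
        "    }\n",
        "\n",
        "\n",
        "# Illustrative data only, not a real Ingest response\n",
        "sent = {'cellCount': 17500, 'isInCatalogue': True}\n",
        "returned = {'cellCount': 17500, 'isInCatalogue': None}\n",
        "print(find_mismatches(sent, returned))\n",
        "```\n",
        "\n",
        "In this notebook it could be called as `find_mismatches(minimum_required_fields, updated_ingest_project)`; an empty dictionary means every field that was sent came back unchanged."
      ],
      "metadata": {}
    },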
    {
      "cell_type": "markdown",
      "source": [
        "Let's print the project and check if the changes have made it through!"
      ],
      "metadata": {
        "id": "jTL3XY2aK-Vz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "updated_ingest_project"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Aa1snDYbLCgL",
        "outputId": "28fd0080-20bc-4c30-d057-64cf23685605"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'content': {'describedBy': 'https://schema.staging.data.humancellatlas.org/type/project/17.0.0/project',\n",
              "  'schema_type': 'project',\n",
              "  'project_core': {'project_short_name': 'myCoolLabel',\n",
              "   'project_title': 'Test_project_with_minimum_information',\n",
              "   'project_description': 'This is a test project with minimum information for the programmatic submissions guide'},\n",
              "  'contributors': [{'name': 'Enrique,,Ventura',\n",
              "    'email': 'enrique@ebi.ac.uk',\n",
              "    'institution': 'EMBL-EBI',\n",
              "    'corresponding_contributor': True,\n",
              "    'project_role': {'text': 'data curator',\n",
              "     'ontology': 'EFO:0009737',\n",
              "     'ontology_label': 'data curator'}}],\n",
              "  'publications': [{'authors': ['Lorem IP', 'Sed UP'],\n",
              "    'title': 'A combined approach for single-cell mRNA and intracellular protein expression analysis',\n",
              "    'url': 'https://www.frontiersin.org/articles/10.3389/fcell.2020.00384/full',\n",
              "    'official_hca_publication': False}],\n",
              "  'funders': [{'grant_title': 'a cool grant',\n",
              "    'grant_id': '000000000bp1',\n",
              "    'organization': 'EMBL-EBI'}]},\n",
              " 'submissionDate': '2022-11-09T17:36:29.253Z',\n",
              " 'updateDate': '2022-11-09T17:37:57.227484Z',\n",
              " 'user': '5ece3464ec0680746267e784',\n",
              " 'lastModifiedUser': '5ece3464ec0680746267e784',\n",
              " 'type': 'Project',\n",
              " 'uuid': {'uuid': 'e838a25f-8f52-4678-9744-6d650ca65374'},\n",
              " 'events': [],\n",
              " 'firstDcpVersion': '2022-11-09T17:36:29.253Z',\n",
              " 'dcpVersion': '2022-11-09T17:36:29.253Z',\n",
              " 'contentLastModified': '2022-11-09T17:36:29.246Z',\n",
              " 'accession': None,\n",
              " 'validationState': 'Valid',\n",
              " 'validationErrors': [],\n",
              " 'graphValidationErrors': None,\n",
              " 'isUpdate': False,\n",
              " 'releaseDate': '2022-08-30T00:00:00Z',\n",
              " 'accessionDate': '2022-08-30T00:00:00Z',\n",
              " 'technology': {'ontologies': [{'text': \"10x 3' v2\",\n",
              "    'ontology': 'EFO:0009899',\n",
              "    'ontology_label': \"10x 3' v2\"},\n",
              "   {'text': \"10x 3' v3\",\n",
              "    'ontology': 'EFO:0009922',\n",
              "    'ontology_label': \"10x 3' v3\"}],\n",
              "  'others': ['Mysupercoollibrarypreptechnology']},\n",
              " 'organ': {'ontologies': [{'text': 'lung',\n",
              "    'ontology': 'UBERON:0002048',\n",
              "    'ontology_label': 'lung'},\n",
              "   {'text': 'heart',\n",
              "    'ontology': 'UBERON:0000948',\n",
              "    'ontology_label': 'heart'}]},\n",
              " 'cellCount': 17500,\n",
              " 'dataAccess': {'type': 'All fully open',\n",
              "  'notes': 'Can be released publicly! :D'},\n",
              " 'identifyingOrganisms': ['Human', 'Mouse', 'Other'],\n",
              " 'primaryWrangler': '5ece3464ec0680746267e784',\n",
              " 'secondaryWrangler': None,\n",
              " 'wranglingState': 'Eligible',\n",
              " 'wranglingPriority': 1,\n",
              " 'wranglingNotes': 'This is an awesome project and I will finish it soon',\n",
              " 'isInCatalogue': True,\n",
              " 'cataloguedDate': '2022-11-09T17:37:57.207027Z',\n",
              " 'publicationsInfo': None,\n",
              " 'dcpReleaseNumber': None,\n",
              " 'projectLabels': None,\n",
              " 'hasOpenSubmission': False,\n",
              " '_links': {'self': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786'},\n",
              "  'project': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786',\n",
              "   'title': 'A single project'},\n",
              "  'processing': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/processingEvent'},\n",
              "  'draft': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/draftEvent'},\n",
              "  'bundleManifests': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/bundleManifests',\n",
              "   'title': 'Access or create bundle manifests (describing which submitted contents went into which bundle in the datastore)'},\n",
              "  'auditLogs': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/auditLogs'},\n",
              "  'supplementaryFiles': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/supplementaryFiles'},\n",
              "  'submissionEnvelopes': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelopes',\n",
              "   'title': 'Access or create new submission envelopes'},\n",
              "  'submissionEnvelope': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelope',\n",
              "   'title': 'A single submission envelope'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "And we have our project, updated, with the minimum required metadata!"
      ],
      "metadata": {
        "id": "-xl9qZTuLWiR"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Retrieve a project\n",
        "\n",
        "Once we have created a project with minimum information, we may want to retrieve it to do further work (add more metadata, check its status, etc.). To do this, we are going to use one of the functions available for retrieving a project:\n",
        "- `IngestApi.get_project_by_uuid`: Retrieves a single project by its UUID\n",
        "\n",
        "Other functions are available in case you don't have the UUID at hand; they are listed below:\n",
        "\n",
        "<details>\n",
        "<summary>Functions to search for projects</summary>\n",
        "<ul>\n",
        "<li><code>.get_user_projects</code>: Retrieve all the projects associated with your user (requires the token to be set)</li>\n",
        "<li><code>.get_project_by_id</code>: Retrieve a project with the MongoDB ID provided</li>\n",
        "</ul>\n",
        "</details>\n"
      ],
      "metadata": {
        "id": "Z-vwT165JGDF"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "ingest_project = api.get_project_by_uuid(ingest_project_uuid)"
      ],
      "metadata": {
        "id": "Kgwe6J5OarK0"
      },
      "execution_count": 21,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's ensure we have retrieved our project correctly:"
      ],
      "metadata": {
        "id": "B8po4GbBNWhV"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "ingest_project"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OJA74x10QzR3",
        "outputId": "fd68ce0e-ba0f-44b3-81d0-858d6928916a"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'content': {'describedBy': 'https://schema.staging.data.humancellatlas.org/type/project/17.0.0/project',\n",
              "  'schema_type': 'project',\n",
              "  'project_core': {'project_short_name': 'myCoolLabel',\n",
              "   'project_title': 'Test_project_with_minimum_information',\n",
              "   'project_description': 'This is a test project with minimum information for the programmatic submissions guide'},\n",
              "  'contributors': [{'name': 'Enrique,,Ventura',\n",
              "    'email': 'enrique@ebi.ac.uk',\n",
              "    'institution': 'EMBL-EBI',\n",
              "    'corresponding_contributor': True,\n",
              "    'project_role': {'text': 'data curator',\n",
              "     'ontology': 'EFO:0009737',\n",
              "     'ontology_label': 'data curator'}}],\n",
              "  'publications': [{'authors': ['Lorem IP', 'Sed UP'],\n",
              "    'title': 'A combined approach for single-cell mRNA and intracellular protein expression analysis',\n",
              "    'url': 'https://www.frontiersin.org/articles/10.3389/fcell.2020.00384/full',\n",
              "    'official_hca_publication': False}],\n",
              "  'funders': [{'grant_title': 'a cool grant',\n",
              "    'grant_id': '000000000bp1',\n",
              "    'organization': 'EMBL-EBI'}]},\n",
              " 'submissionDate': '2022-11-09T17:36:29.253Z',\n",
              " 'updateDate': '2022-11-09T17:37:57.227Z',\n",
              " 'user': '5ece3464ec0680746267e784',\n",
              " 'lastModifiedUser': '5ece3464ec0680746267e784',\n",
              " 'type': 'Project',\n",
              " 'uuid': {'uuid': 'e838a25f-8f52-4678-9744-6d650ca65374'},\n",
              " 'events': [],\n",
              " 'firstDcpVersion': '2022-11-09T17:36:29.253Z',\n",
              " 'dcpVersion': '2022-11-09T17:36:29.253Z',\n",
              " 'contentLastModified': '2022-11-09T17:36:29.246Z',\n",
              " 'accession': None,\n",
              " 'validationState': 'Valid',\n",
              " 'validationErrors': [],\n",
              " 'graphValidationErrors': None,\n",
              " 'isUpdate': False,\n",
              " 'releaseDate': '2022-08-30T00:00:00Z',\n",
              " 'accessionDate': '2022-08-30T00:00:00Z',\n",
              " 'technology': {'ontologies': [{'text': \"10x 3' v2\",\n",
              "    'ontology': 'EFO:0009899',\n",
              "    'ontology_label': \"10x 3' v2\"},\n",
              "   {'text': \"10x 3' v3\",\n",
              "    'ontology': 'EFO:0009922',\n",
              "    'ontology_label': \"10x 3' v3\"}],\n",
              "  'others': ['Mysupercoollibrarypreptechnology']},\n",
              " 'organ': {'ontologies': [{'text': 'lung',\n",
              "    'ontology': 'UBERON:0002048',\n",
              "    'ontology_label': 'lung'},\n",
              "   {'text': 'heart',\n",
              "    'ontology': 'UBERON:0000948',\n",
              "    'ontology_label': 'heart'}]},\n",
              " 'cellCount': 17500,\n",
              " 'dataAccess': {'type': 'All fully open',\n",
              "  'notes': 'Can be released publicly! :D'},\n",
              " 'identifyingOrganisms': ['Human', 'Mouse', 'Other'],\n",
              " 'primaryWrangler': '5ece3464ec0680746267e784',\n",
              " 'secondaryWrangler': None,\n",
              " 'wranglingState': 'Eligible',\n",
              " 'wranglingPriority': 1,\n",
              " 'wranglingNotes': 'This is an awesome project and I will finish it soon',\n",
              " 'isInCatalogue': True,\n",
              " 'cataloguedDate': '2022-11-09T17:37:57.207Z',\n",
              " 'publicationsInfo': None,\n",
              " 'dcpReleaseNumber': None,\n",
              " 'projectLabels': None,\n",
              " 'hasOpenSubmission': False,\n",
              " '_links': {'self': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786'},\n",
              "  'project': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786',\n",
              "   'title': 'A single project'},\n",
              "  'processing': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/processingEvent'},\n",
              "  'draft': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/draftEvent'},\n",
              "  'bundleManifests': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/bundleManifests',\n",
              "   'title': 'Access or create bundle manifests (describing which submitted contents went into which bundle in the datastore)'},\n",
              "  'auditLogs': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/auditLogs'},\n",
              "  'supplementaryFiles': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/supplementaryFiles'},\n",
              "  'submissionEnvelopes': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelopes',\n",
              "   'title': 'Access or create new submission envelopes'},\n",
              "  'submissionEnvelope': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelope',\n",
              "   'title': 'A single submission envelope'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    },
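    {
      "cell_type": "markdown",
      "source": [
        "The `_links.self.href` value in the output above ends in `636be51d897ce65cad07c786`, which appears to be the MongoDB ID that `get_project_by_id` expects. As a minimal sketch, assuming the ID is always the last path segment of that link (an assumption drawn from this output, not a documented guarantee), it could be extracted as follows. The helper name `project_id_from_links` is hypothetical and not part of `hca-ingest`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "def project_id_from_links(project):\n",
        "    # Assumption: the self link has the form <api-root>/projects/<id>\n",
        "    href = project['_links']['self']['href']\n",
        "    return href.rstrip('/').rsplit('/', 1)[-1]\n",
        "\n",
        "\n",
        "# Stand-in for the entity returned by the API, reduced to its self link\n",
        "sample_links_project = {\n",
        "    '_links': {\n",
        "        'self': {\n",
        "            'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786'\n",
        "        }\n",
        "    }\n",
        "}\n",
        "print(project_id_from_links(sample_links_project))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },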
    {
      "cell_type": "markdown",
      "source": [
        "### Check status of a project\n",
        "\n",
        "When a project (or any other piece of metadata) is uploaded to Ingest, it gets validated: the `content` is validated against the schema it points to (in the `describedBy` field) and, in the case of a project, the base fields are validated against a separate set of rules.\n",
        "\n",
        "The Ingest service can provide a full report of these validation events, including the status of the entity and the error messages.\n",
        "\n",
        "In this section, we will first retrieve the errors (currently none) of the project we just uploaded. We will then update the project to artificially produce a few errors, retrieve it again and check the errors. For a detailed explanation of each type of error, please refer to the Readme file."
      ],
      "metadata": {
        "id": "kd338q6ZQwjJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Print validation errors\n",
        "validation_errors = ingest_project['validationErrors']\n",
        "print(f\"Validation errors: {validation_errors if validation_errors else None}\")\n",
        "\n",
        "# Print validation status\n",
        "validation_status = ingest_project['validationState']\n",
        "print(f\"Validation status: {validation_status}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "R3aSi8ODQwLc",
        "outputId": "c1c1ff95-9af2-4d86-d8e8-b20a9e724124"
      },
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Validation errors: None\n",
            "Validation status: Valid\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "non_valid_content = ingest_project['content']\n",
        "non_valid_content['estimated_cell_count'] = '17500'             # Cell count should always be an integer\n",
        "non_valid_content['insdc_project_accessions'] =  [              # INSDC project accessions:\n",
        "                                      'GSE7777777',   # SHOULD NOT be a GEO series accession\n",
        "                                      'SRP000000',    # SHOULD follow SRPXXXXXX format\n",
        "                                      '',             # SHOULD NOT be an empty string\n",
        "                                      347289347       # SHOULD NOT be a number\n",
        "                                      ]\n",
        "\n",
        "non_valid_values =  {    \n",
        "                      \"content\" : non_valid_content   # Patching \"content\" field\n",
        "                    } \n",
        "\n",
        "# Patch the non_valid content into the project content\n",
        "response = api.patch(url=ingest_project_url, json=non_valid_values)"
      ],
      "metadata": {
        "id": "qiUH6ow5Q1vb"
      },
      "execution_count": 24,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "response.json()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "goKuDHaUrTqt",
        "outputId": "94325c1a-bf46-4e96-af8c-b5ae9fab4471"
      },
      "execution_count": 25,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'content': {'describedBy': 'https://schema.staging.data.humancellatlas.org/type/project/17.0.0/project',\n",
              "  'schema_type': 'project',\n",
              "  'project_core': {'project_short_name': 'myCoolLabel',\n",
              "   'project_title': 'Test_project_with_minimum_information',\n",
              "   'project_description': 'This is a test project with minimum information for the programmatic submissions guide'},\n",
              "  'contributors': [{'name': 'Enrique,,Ventura',\n",
              "    'email': 'enrique@ebi.ac.uk',\n",
              "    'institution': 'EMBL-EBI',\n",
              "    'corresponding_contributor': True,\n",
              "    'project_role': {'text': 'data curator',\n",
              "     'ontology': 'EFO:0009737',\n",
              "     'ontology_label': 'data curator'}}],\n",
              "  'publications': [{'authors': ['Lorem IP', 'Sed UP'],\n",
              "    'title': 'A combined approach for single-cell mRNA and intracellular protein expression analysis',\n",
              "    'url': 'https://www.frontiersin.org/articles/10.3389/fcell.2020.00384/full',\n",
              "    'official_hca_publication': False}],\n",
              "  'funders': [{'grant_title': 'a cool grant',\n",
              "    'grant_id': '000000000bp1',\n",
              "    'organization': 'EMBL-EBI'}],\n",
              "  'estimated_cell_count': '17500',\n",
              "  'insdc_project_accessions': ['GSE7777777', 'SRP000000', '', 347289347]},\n",
              " 'submissionDate': '2022-11-09T17:36:29.253Z',\n",
              " 'updateDate': '2022-11-09T17:38:35.057769Z',\n",
              " 'user': '5ece3464ec0680746267e784',\n",
              " 'lastModifiedUser': '5ece3464ec0680746267e784',\n",
              " 'type': 'Project',\n",
              " 'uuid': {'uuid': 'e838a25f-8f52-4678-9744-6d650ca65374'},\n",
              " 'events': [],\n",
              " 'firstDcpVersion': '2022-11-09T17:36:29.253Z',\n",
              " 'dcpVersion': '2022-11-09T17:38:35.057205Z',\n",
              " 'contentLastModified': '2022-11-09T17:38:35.057205Z',\n",
              " 'accession': None,\n",
              " 'validationState': 'Valid',\n",
              " 'validationErrors': [],\n",
              " 'graphValidationErrors': None,\n",
              " 'isUpdate': False,\n",
              " 'releaseDate': '2022-08-30T00:00:00Z',\n",
              " 'accessionDate': '2022-08-30T00:00:00Z',\n",
              " 'technology': {'ontologies': [{'text': \"10x 3' v2\",\n",
              "    'ontology': 'EFO:0009899',\n",
              "    'ontology_label': \"10x 3' v2\"},\n",
              "   {'text': \"10x 3' v3\",\n",
              "    'ontology': 'EFO:0009922',\n",
              "    'ontology_label': \"10x 3' v3\"}],\n",
              "  'others': ['Mysupercoollibrarypreptechnology']},\n",
              " 'organ': {'ontologies': [{'text': 'lung',\n",
              "    'ontology': 'UBERON:0002048',\n",
              "    'ontology_label': 'lung'},\n",
              "   {'text': 'heart',\n",
              "    'ontology': 'UBERON:0000948',\n",
              "    'ontology_label': 'heart'}]},\n",
              " 'cellCount': 17500,\n",
              " 'dataAccess': {'type': 'All fully open',\n",
              "  'notes': 'Can be released publicly! :D'},\n",
              " 'identifyingOrganisms': ['Human', 'Mouse', 'Other'],\n",
              " 'primaryWrangler': '5ece3464ec0680746267e784',\n",
              " 'secondaryWrangler': None,\n",
              " 'wranglingState': None,\n",
              " 'wranglingPriority': 1,\n",
              " 'wranglingNotes': 'This is an awesome project and I will finish it soon',\n",
              " 'isInCatalogue': True,\n",
              " 'cataloguedDate': '2022-11-09T17:37:57.207Z',\n",
              " 'publicationsInfo': None,\n",
              " 'dcpReleaseNumber': None,\n",
              " 'projectLabels': None,\n",
              " 'hasOpenSubmission': False,\n",
              " '_links': {'self': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786'},\n",
              "  'project': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786',\n",
              "   'title': 'A single project'},\n",
              "  'processing': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/processingEvent'},\n",
              "  'draft': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/draftEvent'},\n",
              "  'bundleManifests': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/bundleManifests',\n",
              "   'title': 'Access or create bundle manifests (describing which submitted contents went into which bundle in the datastore)'},\n",
              "  'auditLogs': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/auditLogs'},\n",
              "  'supplementaryFiles': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/supplementaryFiles'},\n",
              "  'submissionEnvelopes': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelopes',\n",
              "   'title': 'Access or create new submission envelopes'},\n",
              "  'submissionEnvelope': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/636be51d897ce65cad07c786/submissionEnvelope',\n",
              "   'title': 'A single submission envelope'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 25
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "After patching the project with invalid values, let's repeat the check we did previously."
      ],
      "metadata": {
        "id": "LCmpD1Z4obB6"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "ingest_project = api.get_project_by_uuid(ingest_project_uuid)\n",
        "# Print validation errors\n",
        "validation_errors = ingest_project['validationErrors']\n",
        "print(f\"Validation errors: {validation_errors if validation_errors else None}\")\n",
        "\n",
        "# Print validation status\n",
        "validation_status = ingest_project['validationState']\n",
        "print(f\"Validation status: {validation_status}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "WgEXUMCkogYk",
        "outputId": "cc08dcf0-49c4-436e-969a-37bd33b76f07"
      },
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Validation errors: [{'errorType': 'METADATA_ERROR', 'message': 'should match pattern \"^[D|E|S]RP[0-9]+$\"', 'userFriendlyMessage': 'should match pattern \"^[D|E|S]RP[0-9]+$\" at .insdc_project_accessions[0]', 'absoluteDataPath': '.insdc_project_accessions[0]'}, {'errorType': 'METADATA_ERROR', 'message': 'should match pattern \"^[D|E|S]RP[0-9]+$\"', 'userFriendlyMessage': 'should match pattern \"^[D|E|S]RP[0-9]+$\" at .insdc_project_accessions[2]', 'absoluteDataPath': '.insdc_project_accessions[2]'}, {'errorType': 'METADATA_ERROR', 'message': 'should be string', 'userFriendlyMessage': 'should be string at .insdc_project_accessions[3]', 'absoluteDataPath': '.insdc_project_accessions[3]'}, {'errorType': 'METADATA_ERROR', 'message': 'should be integer', 'userFriendlyMessage': 'should be integer at .estimated_cell_count', 'absoluteDataPath': '.estimated_cell_count'}]\n",
            "Validation status: Invalid\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "As we can see, this time it has returned two things:\n",
        "- A list of errors, each detailing the error type, the message and the path of the offending field.\n",
        "- Validation status `Invalid`, indicating that validation failed.\n",
        "\n",
        "For detailed information on how to interpret the errors, please refer to the `readme.md` file."
      ],
      "metadata": {
        "id": "wZvFmCSavieu"
      }
    },
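    {
      "cell_type": "markdown",
      "source": [
        "Since the printed list is hard to scan, a small helper can group the messages by the field they refer to. This is only a sketch: the function name `group_errors_by_path` is hypothetical, and it assumes every error carries the `message` and `absoluteDataPath` keys seen in the output above. Two of those errors are copied in as sample data so the cell runs without calling the API."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from collections import defaultdict\n",
        "\n",
        "\n",
        "def group_errors_by_path(errors):\n",
        "    # Collect the 'message' of each error under its 'absoluteDataPath'\n",
        "    by_path = defaultdict(list)\n",
        "    for error in errors:\n",
        "        by_path[error['absoluteDataPath']].append(error['message'])\n",
        "    return dict(by_path)\n",
        "\n",
        "\n",
        "# Two of the errors printed above, copied here as sample data\n",
        "sample_errors = [\n",
        "    {'errorType': 'METADATA_ERROR',\n",
        "     'message': 'should be string',\n",
        "     'userFriendlyMessage': 'should be string at .insdc_project_accessions[3]',\n",
        "     'absoluteDataPath': '.insdc_project_accessions[3]'},\n",
        "    {'errorType': 'METADATA_ERROR',\n",
        "     'message': 'should be integer',\n",
        "     'userFriendlyMessage': 'should be integer at .estimated_cell_count',\n",
        "     'absoluteDataPath': '.estimated_cell_count'},\n",
        "]\n",
        "\n",
        "for path, messages in group_errors_by_path(sample_errors).items():\n",
        "    print(path, '->', '; '.join(messages))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },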
    {
      "cell_type": "markdown",
      "source": [
        "## Delete a project\n",
        "\n",
        "Projects in our database can be deleted. While we do not advise deleting projects once they have been published in the data portal (`uuid` identifiers are important for updates), any metadata entity, including projects, can be deleted at any point before the submission is finished (covered later in the notebook)."
      ],
      "metadata": {
        "id": "E8Yy6hkeOWjL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Delete ingest project and check everything went correctly\n",
        "response = api.delete(ingest_project_url)\n",
        "\n",
        "assert response.status_code == 204"
      ],
      "metadata": {
        "id": "UESYW8QMOYFh"
      },
      "execution_count": 27,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "If the status code of the response is 204, the project has been deleted!"
      ],
      "metadata": {
        "id": "St_va6S2T9KK"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Addendum\n",
        "\n",
        "## Updating projects\n",
        "\n",
        "### Deleting a field/Replacing all values"
      ],
      "metadata": {
        "id": "N1e3V7Ld-UCT"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Deleting a field requires a slightly different sort of operation; up until now, we have used `patch` to address field modifications. However, if we want to delete a field or replace all values, we would need to delete the field from the content, and then PUT the whole content of the project entity to the project URL.\n",
        "\n",
        "This operation will completely replace the older entry with the new one; using the old one as a template ensures critical fields (e.g. `uuid`) get preserved over this operation."
      ],
      "metadata": {
        "id": "7D40RV57BZY_"
      }
    },
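    {
      "cell_type": "markdown",
      "source": [
        "A minimal sketch of preparing such a replacement, using the retrieved entity as a template. The helper name `without_content_field` is hypothetical, and the sample entity is a reduced stand-in so the cell runs offline. The final `api.put` call is left commented out because its signature is an assumption here (mirroring how `api.patch` is called earlier); check the `hca-ingest` source before relying on it."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import copy\n",
        "\n",
        "\n",
        "def without_content_field(project, field):\n",
        "    # Deep-copy the entity so the original stays untouched, then drop the field\n",
        "    replacement = copy.deepcopy(project)\n",
        "    replacement['content'].pop(field, None)\n",
        "    return replacement\n",
        "\n",
        "\n",
        "# Reduced stand-in for a retrieved project entity\n",
        "sample_entity = {\n",
        "    'content': {\n",
        "        'project_core': {'project_short_name': 'myCoolLabel'},\n",
        "        'insdc_project_accessions': ['SRP000000'],\n",
        "    },\n",
        "    'uuid': {'uuid': 'e838a25f-8f52-4678-9744-6d650ca65374'},\n",
        "}\n",
        "replacement_entity = without_content_field(sample_entity, 'insdc_project_accessions')\n",
        "print(replacement_entity)\n",
        "\n",
        "# Assumption: api.put takes the project URL and a JSON body\n",
        "# response = api.put(ingest_project_url, json=replacement_entity)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },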
    {
      "cell_type": "markdown",
      "source": [
        "### Adding new fields\n",
        "\n",
        "When adding new fields, it is essential to consider the type of field being added: nested properties and arrays can't simply be modified through a `patch` operation; they need the document to be partially (or entirely) replaced.\n",
        "\n",
        "In this notebook, we are going to add two fields:\n",
        "- A completely new field, available in the schema, `insdc_project_accessions`\n",
        "- A new publication that we want associated with this project, without deleting the existing one."
      ],
      "metadata": {
        "id": "GEqrch7-CyOd"
      }
    },
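    {
      "cell_type": "markdown",
      "source": [
        "Because arrays need to be replaced as a whole, one way to add a publication without losing the existing one is to build the complete new `content` first and send that. The sketch below only covers the offline part. The helper name `with_extra_publication` is hypothetical, the sample content is a reduced stand-in for `ingest_project['content']`, and the new publication holds placeholder values rather than a real reference."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import copy\n",
        "\n",
        "\n",
        "def with_extra_publication(content, publication):\n",
        "    # Deep-copy the content and append to its publications list,\n",
        "    # creating the list if the project has none yet\n",
        "    new_content = copy.deepcopy(content)\n",
        "    new_content.setdefault('publications', []).append(publication)\n",
        "    return new_content\n",
        "\n",
        "\n",
        "# Reduced stand-in for ingest_project['content']\n",
        "sample_content = {\n",
        "    'publications': [\n",
        "        {'title': 'A combined approach for single-cell mRNA and intracellular protein expression analysis'}\n",
        "    ],\n",
        "}\n",
        "# Placeholder values for illustration only\n",
        "new_publication = {\n",
        "    'authors': ['Placeholder A'],\n",
        "    'title': 'Placeholder title',\n",
        "    'official_hca_publication': False,\n",
        "}\n",
        "new_content = with_extra_publication(sample_content, new_publication)\n",
        "print([publication['title'] for publication in new_content['publications']])\n",
        "\n",
        "# The full content could then be sent as in the validation section:\n",
        "# response = api.patch(url=ingest_project_url, json={'content': new_content})"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },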
    {
      "cell_type": "code",
      "source": [
        "# Adding the INSDC project accession\n",
        "response = api.patch(url=ingest_project_url, patch={\"content\": {\"insdc_project_accessions\": [\"SRP000000\"]}})\n",
        "assert response.status_code == 200\n",
        "updated_project = api.get_project_by_uuid(ingest_project_uuid)\n",
        "\n",
        "# Let's print the project and ensure the modification has gone through!\n",
        "updated_project"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tjf_XcJcDSx5",
        "outputId": "807839d3-bebf-4b37-9fe7-37c59b1f28a7"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'content': {'insdc_project_accessions': ['SRP000000']},\n",
              " 'submissionDate': '2022-08-31T14:05:37.625Z',\n",
              " 'updateDate': '2022-08-31T14:07:18.255Z',\n",
              " 'user': '5ece3464ec0680746267e784',\n",
              " 'lastModifiedUser': '5ece3464ec0680746267e784',\n",
              " 'type': 'Project',\n",
              " 'uuid': {'uuid': '019b3b05-903b-4b85-bdae-1e10589ccd06'},\n",
              " 'events': [],\n",
              " 'firstDcpVersion': '2022-08-31T14:05:37.625Z',\n",
              " 'dcpVersion': '2022-08-31T14:07:18.252Z',\n",
              " 'contentLastModified': '2022-08-31T14:07:18.252Z',\n",
              " 'accession': None,\n",
              " 'validationState': 'Draft',\n",
              " 'validationErrors': [],\n",
              " 'graphValidationErrors': None,\n",
              " 'isUpdate': False,\n",
              " 'releaseDate': '2022-08-30T00:00:00Z',\n",
              " 'accessionDate': '2022-08-30T00:00:00Z',\n",
              " 'technology': {'ontologies': [{'text': \"10x 3' v2\",\n",
              "    'ontology': 'EFO:0009899',\n",
              "    'ontology_label': \"10x 3' v2\"},\n",
              "   {'text': \"10x 3' v3\",\n",
              "    'ontology': 'EFO:0009922',\n",
              "    'ontology_label': \"10x 3' v3\"}],\n",
              "  'others': ['Mysupercoollibrarypreptechnology']},\n",
              " 'organ': {'ontologies': [{'text': 'lung',\n",
              "    'ontology': 'UBERON:0002048',\n",
              "    'ontology_label': 'lung'},\n",
              "   {'text': 'heart',\n",
              "    'ontology': 'UBERON:0000948',\n",
              "    'ontology_label': 'heart'}]},\n",
              " 'cellCount': 17500,\n",
              " 'dataAccess': {'type': 'All fully open',\n",
              "  'notes': 'Can be released publicly! :D'},\n",
              " 'identifyingOrganisms': ['Human', 'Mouse', 'Other'],\n",
              " 'primaryWrangler': '5ece3464ec0680746267e784',\n",
              " 'secondaryWrangler': None,\n",
              " 'wranglingState': 'Eligible',\n",
              " 'wranglingPriority': 1,\n",
              " 'wranglingNotes': 'This is an awesome project and I will finish it soon',\n",
              " 'isInCatalogue': True,\n",
              " 'cataloguedDate': '2022-08-31T14:06:06.624Z',\n",
              " 'publicationsInfo': None,\n",
              " 'dcpReleaseNumber': None,\n",
              " 'projectLabels': None,\n",
              " 'hasOpenSubmission': False,\n",
              " '_links': {'self': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69'},\n",
              "  'project': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69',\n",
              "   'title': 'A single project'},\n",
              "  'validating': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/validatingEvent'},\n",
              "  'bundleManifests': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/bundleManifests',\n",
              "   'title': 'Access or create bundle manifests (describing which submitted contents went into which bundle in the datastore)'},\n",
              "  'auditLogs': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/auditLogs'},\n",
              "  'supplementaryFiles': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/supplementaryFiles'},\n",
              "  'submissionEnvelopes': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/submissionEnvelopes',\n",
              "   'title': 'Access or create new submission envelopes'},\n",
              "  'submissionEnvelope': {'href': 'https://api.ingest.staging.archive.data.humancellatlas.org/projects/630f6ab106f1711fccbe4d69/submissionEnvelope',\n",
              "   'title': 'A single submission envelope'}}}"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    }
  ]
}