openapi: 3.0.1
info:
  title: Task Execution Service
  version: 1.0.0
  x-logo:
    url: 'https://www.ga4gh.org/wp-content/themes/ga4gh-theme/gfx/GA-logo-horizontal-tag-RGB.svg'
  description: >
    ## Executive Summary

    The Task Execution Service (TES) API is a standardized schema and API for
    describing and executing batch execution tasks. A task defines a set of
    input files, a set of containers and commands to run, a set of
    output files and some other logging and metadata.


    TES servers accept task documents and execute them asynchronously on
    available compute resources. A TES server could be built on top of
    a traditional HPC queuing system,
    such as Grid Engine, Slurm or cloud style compute systems such as AWS Batch
    or Kubernetes.

    ## Introduction

    This document describes the TES API and provides details on the specific
    endpoints, request formats, and responses. It is intended to provide key
    information for developers of TES-compatible services as well as clients
    that will call these TES services. Use cases include:

      - Deploying existing workflow engines on new infrastructure. Workflow engines
      such as CWL-Tes and Cromwell have extentions for using TES. This will allow
      a system engineer to deploy them onto a new infrastructure using a job scheduling
      system not previously supported by the engine.

      - Developing a custom workflow management system. This API provides a common
      interface to asynchronous batch processing capabilities. A developer can write
      new tools against this interface and expect them to work using a variety of
      backend solutions that all support the same specification.


    ## Standards

    The TES API specification is written in OpenAPI and embodies a RESTful service
    philosophy. It uses JSON in requests and responses and standard
    HTTP/HTTPS for information transport. HTTPS should be used rather than plain HTTP
    except for testing or internal-only purposes.

    ### Authentication and Authorization

    Is is envisaged that most TES API instances will require users to authenticate to use the endpoints.
    However, the decision if authentication is required should be taken by TES API implementers.


    If authentication is required, we recommend that TES implementations use an OAuth2  bearer token, although they can choose other mechanisms if appropriate.


    Checking that a user is authorized to submit TES requests is a responsibility of TES implementations.

    ### CORS

    If TES API implementation is to be used by another website or domain it must implement Cross Origin Resource Sharing (CORS).
    Please refer to https://w3id.org/ga4gh/product-approval-support/cors for more information about GA4GH’s recommendations and how to implement CORS.


servers:
- url: /ga4gh/tes/v1
paths:
  /service-info:
    get:
      tags:
      - TaskService
      summary: GetServiceInfo
      description: |-
        Provides information about the service, this structure is based on the
        standardized GA4GH service info structure. In addition, this endpoint
        will also provide information about customized storage endpoints offered
        by the TES server.
      operationId: GetServiceInfo
      responses:
        200:
          description: ""
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/tesServiceInfo'
  /tasks:
    get:
      tags:
      - TaskService
      summary: ListTasks
      description: |-
        List tasks tracked by the TES server. This includes queued, active and completed tasks.
        How long completed tasks are stored by the system may be dependent on the underlying
        implementation.
      operationId: ListTasks
      parameters:
      - name: name_prefix
        in: query
        description: |-
          OPTIONAL. Filter the list to include tasks where the name matches this prefix.
          If unspecified, no task name filtering is done.
        schema:
          type: string
      - name: page_size
        in: query
        description: |-
          Optional number of tasks to return in one page.
          Must be less than 2048. Defaults to 256.
        schema:
          type: integer
          format: int64
      - name: page_token
        in: query
        description: |-
          OPTIONAL. Page token is used to retrieve the next page of results.
          If unspecified, returns the first page of results. The value can be found
          in the `next_page_token` field of the last returned result of ListTasks
        schema:
          type: string
      - $ref: '#/components/parameters/view'

      responses:
        200:
          description: ""
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/tesListTasksResponse'
    post:
      tags:
      - TaskService
      summary: CreateTask
      description: |-
        Create a new task. The user provides a Task document, which the server
        uses as a basis and adds additional fields.
      operationId: CreateTask
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/tesTask'
        required: true
      responses:
        200:
          description: ""
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/tesCreateTaskResponse'
      x-codegen-request-body-name: body
  /tasks/{id}:
    get:
      tags:
      - TaskService
      summary: GetTask
      description: |-
        Get a single task, based on providing the exact task ID string.
      operationId: GetTask
      parameters:
      - name: id
        in: path
        required: true
        description: ID of task to retrieve.
        schema:
          type: string
      - $ref: '#/components/parameters/view'
      responses:
        200:
          description: ""
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/tesTask'
  /tasks/{id}:cancel:
    post:
      tags:
      - TaskService
      summary: CancelTask
      description: Cancel a task based on providing an exact task ID.
      operationId: CancelTask
      parameters:
      - name: id
        in: path
        description: ID of task to be canceled.
        required: true
        schema:
          type: string
      responses:
        200:
          description: ""
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/tesCancelTaskResponse'
components:
  parameters:
    view:
      name: view
      in: query
      description: |-
        OPTIONAL. Affects the fields included in the returned Task messages.

        `MINIMAL`: Task message will include ONLY the fields:
        - `tesTask.Id`
        - `tesTask.State`

        `BASIC`: Task message will include all fields EXCEPT:
        - `tesTask.ExecutorLog.stdout`
        - `tesTask.ExecutorLog.stderr`
        - `tesInput.content`
        - `tesTaskLog.system_logs`

        `FULL`: Task message includes all fields.
      schema:
        type: string
        default: MINIMAL
        enum:
        - MINIMAL
        - BASIC
        - FULL

  schemas:
    tesCancelTaskResponse:
      type: object
      description: CancelTaskResponse describes a response from the CancelTask endpoint.
    tesCreateTaskResponse:
      required:
      - id
      type: object
      properties:
        id:
          type: string
          description: Task identifier assigned by the server.
      description: |-
        CreateTaskResponse describes a response from the CreateTask endpoint. It
        will include the task ID that can be used to look up the status of the job.
    tesExecutor:
      required:
      - command
      - image
      type: object
      properties:
        image:
          type: string
          example: ubuntu:20.04
          description: |-
            Name of the container image. The string will be passed as the image
            argument to the containerization run command. Examples:
               - `ubuntu`
               - `quay.io/aptible/ubuntu`
               - `gcr.io/my-org/my-image`
               - `myregistryhost:5000/fedora/httpd:version1.0`
        command:
          type: array
          description: |-
            A sequence of program arguments to execute, where the first argument
            is the program to execute (i.e. argv). Example:
            ```
            {
              "command" : ["/bin/md5", "/data/file1"]
            }
            ```
          items:
            type: string
          example: ["/bin/md5", "/data/file1"]
        workdir:
          type: string
          description: |-
            The working directory that the command will be executed in.
            If not defined, the system will default to the directory set by
            the container image.
          example: /data/
        stdin:
          type: string
          description: |-
            Path inside the container to a file which will be piped
            to the executor's stdin. This must be an absolute path. This mechanism
            could be used in conjunction with the input declaration to process
            a data file using a tool that expects STDIN.

            For example, to get the MD5 sum of a file by reading it into the STDIN
            ```
            {
              "command" : ["/bin/md5"],
              "stdin" : "/data/file1"
            }
            ```
          example: "/data/file1"
        stdout:
          type: string
          description: |-
            Path inside the container to a file where the executor's
            stdout will be written to. Must be an absolute path. Example:
            ```
            {
              "stdout" : "/tmp/stdout.log"
            }
            ```
          example: "/tmp/stdout.log"
        stderr:
          type: string
          description: |-
            Path inside the container to a file where the executor's
            stderr will be written to. Must be an absolute path. Example:
            ```
            {
              "stderr" : "/tmp/stderr.log"
            }
            ```
          example: "/tmp/stderr.log"
        env:
          type: object
          additionalProperties:
            type: string
          description: |-
            Enviromental variables to set within the container. Example:
            ```
            {
              "env" : {
                "ENV_CONFIG_PATH" : "/data/config.file",
                "BLASTDB" : "/data/GRC38",
                "HMMERDB" : "/data/hmmer"
              }
            }
            ```
          example:
            "BLASTDB" : "/data/GRC38"
            "HMMERDB" : "/data/hmmer"
      description: Executor describes a command to be executed, and its environment.
    tesExecutorLog:
      required:
      - exit_code
      type: object
      properties:
        start_time:
          type: string
          description: Time the executor started, in RFC 3339 format.
          example: 2020-10-02T10:00:00-05:00
        end_time:
          type: string
          description: Time the executor ended, in RFC 3339 format.
          example: 2020-10-02T11:00:00-05:00
        stdout:
          type: string
          description: |-
            Stdout content.

            This is meant for convenience. No guarantees are made about the content.
            Implementations may chose different approaches: only the head, only the tail,
            a URL reference only, etc.

            In order to capture the full stdout client should set Executor.stdout
            to a container file path, and use Task.outputs to upload that file
            to permanent storage.
        stderr:
          type: string
          description: |-
            Stderr content.

            This is meant for convenience. No guarantees are made about the content.
            Implementations may chose different approaches: only the head, only the tail,
            a URL reference only, etc.

            In order to capture the full stderr client should set Executor.stderr
            to a container file path, and use Task.outputs to upload that file
            to permanent storage.
        exit_code:
          type: integer
          description: Exit code.
          format: int32
      description: ExecutorLog describes logging information related to an Executor.
    tesFileType:
      type: string
      default: FILE
      enum:
      - FILE
      - DIRECTORY
    tesInput:
      required:
      - path
      - type
      type: object
      properties:
        name:
          type: string
        description:
          type: string
        url:
          type: string
          description: |-
            REQUIRED, unless "content" is set.

            URL in long term storage, for example:
             - s3://my-object-store/file1
             - gs://my-bucket/file2
             - file:///path/to/my/file
             - /path/to/my/file
          example: s3://my-object-store/file1
        path:
          type: string
          description: |-
            Path of the file inside the container.
            Must be an absolute path.
          example: /data/file1
        type:
          $ref: '#/components/schemas/tesFileType'
        content:
          type: string
          description: |-
            File content literal.

            Implementations should support a minimum of 128 KiB in this field
            and may define their own maximum.

            UTF-8 encoded

            If content is not empty, "url" must be ignored.
      description: Input describes Task input files.
    tesListTasksResponse:
      required:
      - tasks
      type: object
      properties:
        tasks:
          type: array
          description: |-
            List of tasks. These tasks will be based on the original submitted
            task document, but with other fields, such as the job state and
            logging info, added/changed as the job progresses.
          items:
            $ref: '#/components/schemas/tesTask'
        next_page_token:
          type: string
          description: |-
            Token used to return the next page of results. This value can be used
            in the `page_token` field of the next ListTasks request.
      description: ListTasksResponse describes a response from the ListTasks endpoint.
    tesOutput:
      required:
      - path
      - type
      - url
      type: object
      properties:
        name:
          type: string
          description: User-provided name of output file
        description:
          type: string
          description: Optional users provided description field, can be used for documentation.
        url:
          type: string
          description: |-
            URL for the file to be copied by the TES server after the task is complete.
            For Example:
             - `s3://my-object-store/file1`
             - `gs://my-bucket/file2`
             - `file:///path/to/my/file`
        path:
          type: string
          description: |-
            Path of the file inside the container.
            Must be an absolute path.
        type:
          $ref: '#/components/schemas/tesFileType'
      description: Output describes Task output files.
    tesOutputFileLog:
      required:
      - path
      - size_bytes
      - url
      type: object
      properties:
        url:
          type: string
          description: URL of the file in storage, e.g. s3://bucket/file.txt
        path:
          type: string
          description: Path of the file inside the container. Must be an absolute
            path.
        size_bytes:
          type: string
          description: |-
            Size of the file in bytes. Note, this is currently coded as a string
            because official JSON doesn't support int64 numbers.
          format: int64
          example:
            - "1024"
      description: |-
        OutputFileLog describes a single output file. This describes
        file details after the task has completed successfully,
        for logging purposes.
    tesResources:
      type: object
      properties:
        cpu_cores:
          type: integer
          description: Requested number of CPUs
          format: int64
          example: 4
        preemptible:
          type: boolean
          description: |-
            Define if the task is allowed to run on preemptible compute instances,
            for example, AWS Spot. This option may have no effect when utilized
            on some backends that don't have the concept of preemptible jobs.
          format: boolean
          example: false
        ram_gb:
          type: number
          description: Requested RAM required in gigabytes (GB)
          format: double
          example: 8
        disk_gb:
          type: number
          description: Requested disk size in gigabytes (GB)
          format: double
          example: 40
        zones:
          type: array
          description: |-
            Request that the task be run in these compute zones. How this string
            is utilized will be dependent on the backend system. For example, a
            system based on a cluster queueing system may use this string to define
            priorty queue to which the job is assigned.
          items:
            type: string
          example: us-west-1
      description: Resources describes the resources requested by a task.
    tesServiceType:
      allOf:
      - $ref: 'https://raw.githubusercontent.com/ga4gh-discovery/ga4gh-service-info/v1.0.0/service-info.yaml#/components/schemas/ServiceType'
      - type: object
        required:
        - artifact
        properties:
          artifact:
            type: string
            enum: [tes]
            example: tes
    tesServiceInfo:
      allOf:
      - $ref: 'https://raw.githubusercontent.com/ga4gh-discovery/ga4gh-service-info/v1.0.0/service-info.yaml#/components/schemas/Service'
      - type: object
        properties:
          storage:
            type: array
            description: |-
              Lists some, but not necessarily all, storage locations supported
              by the service.
            items:
              type: string
            example:
               - file:///path/to/local/funnel-storage
               - s3://ohsu-compbio-funnel/storage
          type:
            $ref: '#/components/schemas/tesServiceType'
    tesState:
      type: string
      readOnly: True
      description: |-
        Task state as defined by the server.

         - `UNKNOWN`: The state of the task is unknown. The cause for this status
          message may be dependent on the underlying system. The `UNKNOWN` states
          provides a safe default for messages where this field is missing so
          that a missing field does not accidentally imply that
          the state is QUEUED.
         - `QUEUED`: The task is queued and awaiting resources to begin computing.
         - `INITIALIZING`: The task has been assigned to a worker and is currently preparing to run.
        For example, the worker may be turning on, downloading input files, etc.
         - `RUNNING`: The task is running. Input files are downloaded and the first Executor
        has been started.
         - `PAUSED`: The task is paused. The reasons for this would be tied to
          the specific system running the job. An implementation may have the ability
          to pause a task, but this is not required.
         - `COMPLETE`: The task has completed running. Executors have exited without error
        and output files have been successfully uploaded.
         - `EXECUTOR_ERROR`: The task encountered an error in one of the Executor processes. Generally,
        this means that an Executor exited with a non-zero exit code.
         - `SYSTEM_ERROR`: The task was stopped due to a system error, but not from an Executor,
        for example an upload failed due to network issues, the worker's ran out
        of disk space, etc.
         - `CANCELED`: The task was canceled by the user.
      default: UNKNOWN
      example: COMPLETE
      enum:
      - UNKNOWN
      - QUEUED
      - INITIALIZING
      - RUNNING
      - PAUSED
      - COMPLETE
      - EXECUTOR_ERROR
      - SYSTEM_ERROR
      - CANCELED
    tesTask:
      required:
      - executors
      type: object
      properties:
        id:
          type: string
          description: Task identifier assigned by the server.
          readOnly: true
          example: job-0012345
        state:
          $ref: '#/components/schemas/tesState'
        name:
          type: string
          description: User-provided task name.
        description:
          type: string
          description: |-
            Optional user-provided description of task for documentation purposes.
        inputs:
          type: array
          description: |-
            Input files that will be used by the task. Inputs will be downloaded
            and mounted into the executor container as defined by the task request
            document.
          items:
            $ref: '#/components/schemas/tesInput'
          example:
            - { "url" : "s3://my-object-store/file1", "path" : "/data/file1" }
        outputs:
          type: array
          description: |-
            Output files.
            Outputs will be uploaded from the executor container to long-term storage.
          items:
            $ref: '#/components/schemas/tesOutput'
          example:
            - { "path" : "/data/outfile", "url" : "s3://my-object-store/outfile-1", type: "FILE" }
        resources:
          $ref: '#/components/schemas/tesResources'
        executors:
          type: array
          description: |-
            An array of executors to be run. Each of the executors will run one
            at a time sequentially. Each executor is a different command that
            will be run, and each can utilize a different docker image. But each of
            the executors will see the same mapped inputs and volumes that are declared
            in the parent CreateTask message.

            Execution stops on the first error.
          items:
            $ref: '#/components/schemas/tesExecutor'
        volumes:
          type: array
          example:
            - "/vol/A/"
          description: |-
            Volumes are directories which may be used to share data between
            Executors. Volumes are initialized as empty directories by the
            system when the task starts and are mounted at the same path
            in each Executor.

            For example, given a volume defined at `/vol/A`,
            executor 1 may write a file to `/vol/A/exec1.out.txt`, then
            executor 2 may read from that file.

            (Essentially, this translates to a `docker run -v` flag where
            the container path is the same for each executor).
          items:
            type: string
        tags:
          type: object
          example:
            "WORKFLOW_ID" : "cwl-01234"
            "PROJECT_GROUP" : "alice-lab"

          additionalProperties:
            type: string
          description: |-
            A key-value map of arbitrary tags. These can be used to store meta-data
            and annotations about a task. Example:
            ```
            {
              "tags" : {
                  "WORKFLOW_ID" : "cwl-01234",
                  "PROJECT_GROUP" : "alice-lab"
              }
            }
            ```
        logs:
          type: array
          description: |-
            Task logging information.
            Normally, this will contain only one entry, but in the case where
            a task fails and is retried, an entry will be appended to this list.
          readOnly: true
          items:
            $ref: '#/components/schemas/tesTaskLog'
        creation_time:
          type: string
          description: |-
            Date + time the task was created, in RFC 3339 format.
            This is set by the system, not the client.
          example: 2020-10-02T10:00:00-05:00
          readOnly: true
      description: Task describes an instance of a task.
    tesTaskLog:
      required:
      - logs
      - outputs
      type: object
      properties:
        logs:
          type: array
          description: Logs for each executor
          items:
            $ref: '#/components/schemas/tesExecutorLog'
        metadata:
          type: object
          additionalProperties:
            type: string
          description: Arbitrary logging metadata included by the implementation.
          example:
            host: worker-001
            slurmm_id: 123456
        start_time:
          type: string
          description: When the task started, in RFC 3339 format.
          example: 2020-10-02T10:00:00-05:00
        end_time:
          type: string
          description: When the task ended, in RFC 3339 format.
          example: 2020-10-02T11:00:00-05:00
        outputs:
          type: array
          description: |-
            Information about all output files. Directory outputs are
            flattened into separate items.
          items:
            $ref: '#/components/schemas/tesOutputFileLog'
        system_logs:
          type: array
          description: |-
            System logs are any logs the system decides are relevant,
            which are not tied directly to an Executor process.
            Content is implementation specific: format, size, etc.

            System logs may be collected here to provide convenient access.

            For example, the system may include the name of the host
            where the task is executing, an error message that caused
            a SYSTEM_ERROR state (e.g. disk is full), etc.

            System logs are only included in the FULL task view.
          items:
            type: string
      description: TaskLog describes logging information related to a Task.