# Kubernetes DNS-Based **Multicluster** Service Discovery - [0 - About This Document](#0---about-this-document) - [1 - Schema Version](#1---schema-version) - [2 - Resource Records](#2---resource-records) - [2.1 - Definitions](#21---definitions) - [2.2 - Record for Schema Version](#22---record-for-schema-version) - [2.3 - Records for a Service with ClusterSetIP](#23---records-for-a-service-with-clustersetip) - [2.3.1 - A/AAAA Record](#231---aaaaa-record) - [2.3.2 - SRV Records](#232---srv-records) - [2.3.3 - PTR Record](#233---ptr-record) - [2.3.4 - Records that MUST NOT exist for a Service with ClusterSetIP](#234---records-that-must-not-exist-for-a-service-with-clustersetip) - [2.4 - Records for a Multicluster Headless Service](#24---records-for-a-multicluster-headless-service) - [2.4.1 - A/AAAA Records](#241---aaaaa-records) - [2.4.2 - SRV Records](#242---srv-records) - [2.4.3 - PTR Records](#243---ptr-records) - [2.4.4 - Records that MUST NOT exist for a Multicluster Headless Service](#244---records-that-must-not-exist-for-a-multicluster-headless-service) ## 0 - About This Document This document is a specification for DNS-based Kubernetes service discovery for clusters implementing the [Multicluster Service API](README.md). ## 1 - Schema Version This document describes version 1.0.0 of the schema. ## 2 - Resource Records Any DNS-based service discovery solution for Kubernetes clusters implementing the Multicluster Services API must provide the resource records (RR) described below to be considered compliant with this specification. ### 2.1 - Definitions This proposal is intended as an extension of the [cluster-local Kubernetes DNS specification](https://github.com/kubernetes/dns/blob/master/docs/specification.md), and inherits its definitions from section 2.1 with the addition of the following: hostname = as already defined in the [cluster-local Kubernetes DNS specification](https://github.com/kubernetes/dns/blob/master/docs/specification.md), this refers (in brief): in order of precedence, to a) the value of the endpoint's `hostname` field, or b) a unique, system-assigned identifier for the endpoint. Of importance to highlight is that since the [default hostname of an endpoint is the Pod's `metadata.name` field](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields), this will likely often be the pod name, but not always, and implementations must prefer a directly specified `hostname` value. clusterset = as defined in [KEP-1645: Multi-Cluster Services API](README.md): “A placeholder name for a group of clusters with a high degree of mutual trust and shared ownership that share services amongst themselves. Membership in a clusterset is symmetric and transitive. The set of member clusters are mutually aware, and agree about their collective association. Within a clusterset, [namespace sameness](https://github.com/kubernetes/community/blob/master/sig-multicluster/namespace-sameness-position-statement.md) applies and all namespaces with a given name are considered to be the same namespace.” `` = domain for multi-cluster services in the clusterset, which must be `clusterset.local`; as this may become configurable in the future, this specification refers to it by the placeholder ``, but per the MCS API it currently must be defined to be `clusterset.local`. ClusterSetIP / `` / clusterset IP = as defined in [KEP-1645: Multi-Cluster Services API](README.md): “A non-headless ServiceImport is expected to have an associated IP address, the clusterset IP, which may be accessed from within an importing cluster. This IP may be a single IP used clusterset-wide or assigned on a per-cluster basis, but is expected to be consistent for the life of a ServiceImport from the perspective of the importing cluster. Requests to this IP from within a cluster will route to backends for the aggregated Service.” Cluster ID / `` = the cluster id stored in the `id.k8s.io ClusterProperty` as described in [KEP-2149: ClusterId for ClusterSet identification](../2149-clusterid/README.md). The recommended value is a kube-system namespace uid ( such as `721ab723-13bc-11e5-aec2-42010af0021e`). For ease of KEP readability, this document uses human readable names `cluster-a` and `cluster-b` to represent the cluster IDs of two clusters in a ClusterSet. ### 2.2 - Record for Schema Version Following the existing specification, clusters implementing multicluster DNS will contain an additional `TXT` record responding with the semantic version of the DNS schema used for the multi cluster DNS ``, also known in this specification as ``. - Question Example: - `dns-version.clusterset.local. IN TXT` - Answer Example: - `dns-version.clusterset.local. 28800 IN TXT “1.0.0”` ### 2.3 - Records for a Service with ClusterSetIP #### 2.3.1 - `A`/`AAAA` Record _Note: This section refers to `A` and `AAAA` record requirements. For Service objects that have dual-stack networking enabled, both `A` and `AAAA` records must be created to cover both IPv4 and IPv6 assigned ClusterSetIPs._ Given a ClusterIP type Service named `` in Namespace `` that has been exported via a name-mapped ServiceExport with name ``, and given its endpoints are accessible from a given cluster by the IP address ``, the following records must exist. If the `` is an IPv4 address, an `A` record of the following form must exist. * Record Format: * `..svc.. IN A ` * Question Example * `myservice.test.svc.clusterset.local. IN A` * Answer Example: * `myservice.test.svc.clusterset.local. 4 IN A 10.42.42.42` If the `` is an IPv6 address, an `AAAA` record of the following form must exist. * Record Format: * `..svc.. IN AAAA ` * Question Example: * `myservice.test.svc.clusterset.local. IN AAAA` * Answer Example: * `myservice.test.svc.clusterset.local. 4 IN AAAA 2001:db8::1` #### 2.3.2 - `SRV` Records For each port in an exported Service with name `` and number `` using protocol ``, an `SRV` record of the following form must exist. * Record Format: * `_._...svc.. IN SRV ..svc..` The priority `` and weight `` are numbers as described in [RFC2782](https://tools.ietf.org/html/rfc2782) and whose values are not prescribed by this specification. Unnamed ports do not have an `SRV` record. * Question Example: * `_https._tcp.myservice.test.svc.clusterset.local. IN SRV` * Answer Example: * `_https._tcp.myservice.test.svc.clusterset.local. 30 IN SRV 10 100 443 myservice.test.svc.clusterset.local.` The Additional section of the response may include the Service `A`/`AAAA` record referred to in the `SRV` record. #### 2.3.3 - `PTR` Record `PTR` records are not specified in any way for multicluster DNS and `PTR` records ending with the `` are **NOT** required. (See the DNS section of the [KEP-1645: Multi-Cluster Services API](README.md#no-ptr-records-necessary-for-multicluster-DNS) for more context.) #### 2.3.4 - Records that MUST NOT exist for a Service with ClusterSetIP ClusterSetIP Services **MUST NOT** have a record disambiguating to a single cluster's backends, ex. `...svc.`. This form is reserved for possible future use and as updates to the MCS API standard may define its use in a specific way, implementations must not use or depend on DNS records of this form. (See the DNS section of the [KEP-1645: Multi-Cluster Services API](README.md#not-allowing-cluster-specific-targeting-via-dns) for more context.) ### 2.4 - Records for a Multicluster Headless Service #### 2.4.1 - `A`/`AAAA` Records _Note: This section refers to `A` and `AAAA` record requirements. For Service objects that have dual-stack networking enabled, both `A` and `AAAA` records must be created to cover both IPv4 and IPv6 assigned pod IPs._ Given a headless Service named `` in Namespace `` that has been exported via a name-mapped ServiceExport with name ``, for a subset of _ready_ endpoints accessible across the cluster set with the IPv4 address ``, the following records must exist. The subset of _ready_ endpoints _may_ be all _ready_ endpoints, but the exact subset is implementation dependent due to performance restrictions and response size limit of the DNS server used, as the number of potential endpoints could be quite high depending on the number of backends exported across the ClusterSet. * Record Format: * `..svc.. IN A ` * Question Example * `myservice.test.svc.clusterset.local IN A` * Answer Example: * `myservice.test.svc.clusterset.local 4 IN A 10.42.42.42` * `myservice.test.svc.clusterset.local 4 IN A 10.10.10.10` There must also be an `A` record of the following form for each _ready_ endpoint in the same subset with hostname of ``, member cluster ID of ``, and IPv4 address ``. If there are multiple IPv4 addresses for a given hostname, then there must be one such `A` record returned for each IP. * Record Format: * `....svc.. IN A ` * Question Example: * `my-hostname.cluster-a.myservice.test.svc.clusterset.local. IN A` * Answer Example: * `my-hostname.cluster-a.myservice.test.svc.clusterset.local. 4 IN A 10.3.0.100` There must be an `AAAA` record each for a subset of _ready_ endpoints of the headless Service with IPv6 address `` as shown below. If there are no _ready_ endpoints for the headless Service, the answer should be `NXDOMAIN`. The subset of _ready_ endpoints _may_ be all _ready_ endpoints, but the exact subset is implementation dependent (as mentioned above). If both `A` and `AAAA` records exist, they must program the same subset of IPs. * Record Format: * `..svc.. IN AAAA ` * Question Example: * `headless.test.svc.clusterset.local. IN AAAA` * Answer Example: * `headless.test.svc.clusterset.local. 4 IN AAAA 2001:db8::1` * `headless.test.svc.clusterset.local. 4 IN AAAA 2001:db8::2` * `headless.test.svc.clusterset.local. 4 IN AAAA 2001:db8::3` There must also be an `AAAA` record of the following form for each ready endpoint in the same subset with hostname of ``, member cluster ID of ``, and IPv6 address ``. If there are multiple IPv6 addresses for a given hostname, then there must be one such `A` record returned for each IP. * Record Format: * `....svc.. IN AAAA ` * Question Example: * `my-hostname.cluster-a.test.svc.clusterset.local. IN AAAA` * Answer Example: * `my-hostname.cluster-a.test.svc.clusterset.local. 4 IN AAAA 2001:db8::1` #### 2.4.2 - `SRV` Records For each combination of _ready_ endpoint with _hostname_ of ``, member cluster ID of ``, and port in the Service with name `` and number `` using protocol ``, an `SRV` record of the following form must exist. * Record Format: * `_._...svc.. IN SRV ....svc..` This implies that if there are **N** _ready_ endpoints and the Service defines **M** named ports, there will be **N** X **M** **`SRV`** RRs for the Service. The priority `` and weight `` are numbers as described in [RFC2782](https://tools.ietf.org/html/rfc2782) and whose values are not prescribed by this specification. Unnamed ports do not have an `SRV` record. In the following example, the cluster ID for each answer example is in bold to emphasize that the union of records from all clusters are returned by a SRV record request. * Question Example: * `_https._tcp.headless.test.svc.clusterset.local. IN SRV` * Answer Example: * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-1.`**`cluster-a`**`.headless.test.svc.clusterset.local.` * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-2.`**`cluster-a`**`.headless.test.svc.clusterset.local.` * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-3.`**`cluster-a`**`.headless.test.svc.clusterset.local.` * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-1.`**`cluster-b`**`.headless.test.svc.clusterset.local.` * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-2.`**`cluster-b`**`.headless.test.svc.clusterset.local.` * `_https._tcp.headless.test.svc.clusterset.local. 4 IN SRV 10 100 443 my-pet-3.`**`cluster-b`**`.headless.test.svc.clusterset.local.` The Additional section of the response may include the `A`/`AAAA` records referred to in the `SRV` records. #### 2.4.3 - `PTR` Records `PTR` records are not specified in any way for multicluster DNS and `PTR` records ending with the `` are **NOT** required. (See the DNS section of the [KEP-1645: Multi-Cluster Services API](README.md#no-ptr-records-necessary-for-multicluster-DNS) for more context.) #### 2.4.4 - Records that MUST NOT exist for a Multicluster Headless Service Multicluster Headless Services **MUST NOT** have a record disambiguating to a single cluster's backends, ex. `...svc.`. This form is reserved for possible future use and as updates to the MCS API standard may define its use in a specific way, implementations must not use or depend on DNS records of this form. (See the DNS section of the [KEP-1645: Multi-Cluster Services API](README.md#not-allowing-cluster-specific-targeting-via-dns) for more context.)