# Base Data Format

This document describes the toplevel keys of the data format. See this directory's [README](README.md) for the basic concepts.

## Specification

```JSON
{
  "annotations": {},
  "backend_version": "",
  "bucket_date": "2019-10-10",
  "data_format_version": "0.2.0",
  "extensions": {},
  "id": "bc1ff44a-04e7-45e0-81a6-46bc95c7c6b0",
  "input": "http://example.com/",
  "input_hashes": [],
  "measurement_start_time": "2019-10-10 23:59:23",
  "options": [],
  "probe_asn": "AS13285",
  "probe_network_name": "TalkTalk Communications Limited",
  "probe_cc": "GB",
  "probe_city": "",
  "probe_ip": "127.0.0.1",
  "report_filename": "2019-10-10/20191010T235813Z-GB-AS13285-web_connectivity-20191010T235815Z_AS13285_SCHbEXPZ59vF8wmd6SHGGCaPxYGiEg8tSPwN85fJIFHrG4ZfVP-0.2.0-probe.json",
  "report_id": "20191010T235815Z_AS13285_SCHbEXPZ59vF8wmd6SHGGCaPxYGiEg8tSPwN85fJIFHrG4ZfVP",
  "resolver_asn": "AS15169",
  "resolver_ip": "8.8.8.8",
  "resolver_network_name": "Google LLC",
  "software_name": "ooniprobe-ios",
  "software_version": "2.1.0",
  "test_helpers": {},
  "test_keys": {},
  "test_name": "web_connectivity",
  "test_runtime": 2.2955930233,
  "test_start_time": "2019-10-10 23:58:13",
  "test_version": "0.0.1"
}
```

- `annotations` (`map[string]string`; optional): key-value annotations to the report that provide metadata to this measurement. See below.
- `backend_version` (`string`; optional): version of the backend that has collected this specific measurement. Note that clients are not supposed to emit this field.
- `bucket_date` (`string`): a date like `"2006-01-02"` that indicates when this measurement was processed by the data pipeline. Note that clients are not supposed to emit this field.
- `data_format_version` (`string`): indicates the data format version. See [README.md](README.md) for the current version and for the version history.
- `extensions` (`map[string]int`; optional): SHOULD describe the extensions to the base data format included in the `test_keys` field.
The name of an extension is obtained directly from the file name describing the extension in this directory, with the `df-xxx` prefix and the `.md` suffix removed. A probe SHOULD describe the extensions included by its measurements.
- `id` (`string`; optional): client-generated UUID4 identifying this measurement in the context of a set of measurements (i.e. a report). Consumers of OONI data SHOULD NOT trust this identifier to uniquely identify the measurement. This identifier is only meaningful for measurements that have not been submitted to an OONI collector yet. In fact, OONI collectors SHOULD clear this field to avoid any potential confusion caused by it.
- `input` (`string`; nullable): if this experiment accepts any input, the input that was used to produce this measurement. For example, the Web Connectivity experiment uses URLs as input. Otherwise, this field SHOULD be present and set to `null`.
- `input_hashes` (`[]string`; optional; deprecated): historical field that used to contain the SHA256s of all inputs provided to the experiment. Modern implementations, e.g. Measurement Kit, typically emit an empty list. All modern clients SHOULD NOT emit this field at all.
- `measurement_start_time` (`string`): time when this measurement was started in UTC, using the `"2006-01-02 15:04:05"` format. Note that ooniprobe <= 1.4.0 generates skewed time information.
- `options` (`[]string`; optional): list of options passed on the command line when running this specific experiment. Modern implementations, e.g. Measurement Kit, typically emit an empty list here. This field is useful when users are allowed to heavily customise the experiment; for this reason we record the options being used by our experimental, research-oriented client `miniooni`.
- `probe_asn` (`string`): AS Number of the probe (prefixed by AS, e.g., `"AS1234"`), or `"AS0"` if the user does not want to share their ASN.
- `probe_network_name` (`string`; optional; since `2020-04-22`): The organisation name corresponding to the AS of the probe.
- `probe_cc` (`string`): two letter country code of the probe (e.g., `"IT"`) or `"ZZ"` if the user does not want to share their country code.
- `probe_city` (`string`; optional; deprecated): name of the city where the measurement was run. If the user does not want to share this information, this field was historically set to `null`; modern clients SHOULD NOT emit it.
- `probe_ip` (`string`): IP address of the probe, or `"127.0.0.1"` if the user does not want to share their IP.
- `report_filename` (`string`): name of the file containing the report, i.e., a set of related measurements, in our infrastructure. Note that clients of course are not supposed to emit this field.
- `report_id` (`string`): identifier of a set of related measurements generated by OONI backends when submitting one or more measurements.
- `resolver_asn` (`string`; optional; since `2019-12-29`): like `probe_asn` but for `resolver_ip` rather than for `probe_ip`.
- `resolver_ip` (`string`; optional; since `2019-11-11`): IP of the DNS resolver used by the probe, as determined by the measurement engine.
- `resolver_network_name` (`string`; optional; since `2019-12-29`): like `probe_network_name` but for `resolver_ip` rather than for `probe_ip`.
- `software_name` (`string`): name of the software that has generated this specific measurement (e.g., `"ooniprobe"`).
- `software_version` (`string`): version of the software used to generate this specific measurement (e.g., `"3.0.0"`).
- `test_helpers` (`map[string]any`): map containing information regarding what test helpers have been used for running this measurement. See below for more information regarding this field's format.
- `test_keys` (`object`): object containing specific keys that depend upon the specific network experiment that we're running as well as upon the specific test helpers that are used.
- `test_name` (`string`): name of the experiment in snake case. For example, Web Connectivity SHOULD be indicated as `"web_connectivity"`.
- `test_runtime` (`float`): runtime of this specific measurement in seconds with arbitrary sub-second precision. All modern implementations, i.e. Measurement Kit and `github.com/ooni/probe-engine`, measure this value as the time elapsed from when we start measuring a specific input (or when we start an experiment without input) until the measurement is complete (i.e. all the fields inside `test_keys` have been computed or there has been an error or timeout causing the measurement to be aborted and the error to be recorded inside it). This specifically means that this field does not include the time spent communicating with OONI backends such as the bouncer and the collector, but it includes the communication with any backend that is required to finish off the measurement (e.g. the Web Connectivity test helper). Note that this field's name is misleading and it should have been called `measurement_runtime` instead.
- `test_start_time` (`string`): like `measurement_start_time` except that it indicates the moment when a related set of measurements started rather than the moment when the current measurement started. For example, for the Web Connectivity experiment, this is the moment when we start processing a list of input URLs.

## Annotations

```JSON
{
  "engine_name": "libmeasurement_kit",
  "engine_version": "0.10.4",
  "engine_version_full": "v0.10.4",
  "network_type": "wifi",
  "ooni_run_link_id": "123456",
  "platform": "ios",
  "architecture": "arm64"
}
```

The `annotations` field is defined as `map[string]string`, but consumers of this field SHOULD NOT assume that all measurements use string values.
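Because annotation values are not guaranteed to be strings, a consumer may want to coerce them before use. The following is a minimal sketch in Python. The helper name `normalize_annotations` and the choice to render non-string values as JSON text are assumptions made for illustration and are not part of this specification.

```python
import json


def normalize_annotations(annotations):
    """Return a copy of the annotations map where every value is a string.

    Hypothetical helper: non-string values are rendered as JSON text,
    which is an illustrative choice rather than a rule of the data format.
    """
    normalized = {}
    # Treat a missing or null annotations field as an empty map.
    for key, value in (annotations or {}).items():
        if isinstance(value, str):
            normalized[key] = value
        else:
            # For example, True becomes "true" and 123456 becomes "123456".
            normalized[key] = json.dumps(value)
    return normalized
```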
A client SHOULD always add to the map of annotations:

- `architecture` (`string`): one of `arm`, `arm64`, `386`, `amd64`
- `engine_name` (`string`): the name of the measurement engine
- `engine_version` (`string`): the version of the measurement engine
- `engine_version_full` (`string`): the version of the measurement engine as generated by `git describe --tags`
- `go_version` (`string`): the version of Go we're using
- `network_type` (`string`): one of:
  - `mobile`: when OONI Probe Mobile is using 2G/3G/4G/5G networks.
  - `wifi`: when OONI Probe Mobile is using Wi-Fi networks.
- `ooni_run_link_id` (`string`): the OONI-Run-v2 link ID that caused this measurement to be performed.
- `platform` (`string`): one of:
  - `android`
  - `freebsd`
  - `ios`
  - `lepidopter`
  - `linux`
  - `macos`
  - `windows`
- `vcs_modified` (`string`): `"true"` or `"false"` depending on whether the tree used for building was dirty
- `vcs_revision` (`string`): the revision we're building
- `vcs_time` (`string`): the time of the revision we're building
- `vcs_tool` (`string`): the version control system (VCS) tool we're using, which typically should be `"git"`

## Test Helpers

Historically we have saved into `test_helpers` two different data structures:

```JSON
{"backend": "1.1.1.1:853"}
```

used, e.g., by HTTP Invalid Request Line, and

```JSON
{
  "backend": {
    "address": "https://mia-wcth.ooni.io",
    "type": "https"
  }
}
```

used, e.g., by Web Connectivity. The former is typically used when there can only be a single type of backend. The latter is used when more types are possible.

## Example

In the following example we omitted the content of `test_keys` because it was not relevant for this discussion.
```JSON
{
  "annotations": {
    "platform": "macos"
  },
  "data_format_version": "0.2.0",
  "extensions": {
    "dnst": 0,
    "httpt": 0,
    "tcpconnect": 0
  },
  "input": null,
  "measurement_start_time": "2020-01-10 17:25:19",
  "probe_asn": "AS30722",
  "probe_network_name": "Vodafone Italia S.p.A.",
  "probe_cc": "IT",
  "probe_ip": "127.0.0.1",
  "report_id": "20200110T172519Z_AS30722_5UdG13d6rEfOVCTHEdMjuXGah8vF6dpShA0jditnrHCmH10o1K",
  "resolver_asn": "AS15169",
  "resolver_ip": "172.217.34.2",
  "resolver_network_name": "Google LLC",
  "software_name": "miniooni",
  "software_version": "0.1.0-dev",
  "test_keys": {},
  "test_name": "telegram",
  "test_runtime": 4.426603178,
  "test_start_time": "2020-01-10 17:25:19",
  "test_version": "0.0.4"
}
```
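As an illustration of how a consumer might read a measurement like the one above, here is a minimal sketch in Python. It parses the UTC timestamps, treats `"AS0"` as an unshared ASN, and accepts both historical `test_helpers` shapes described under Test Helpers. The function names `parse_time`, `helper_address`, and `summarize`, and the shape of the returned summary, are hypothetical and not part of this specification.

```python
from datetime import datetime, timezone

# strptime pattern matching timestamps such as "2020-01-10 17:25:19".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time(value):
    """Parse a measurement timestamp, which the data format defines as UTC."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def helper_address(helper):
    """Return the address of a test helper in either historical shape."""
    if isinstance(helper, str):
        # Flat shape, e.g. "1.1.1.1:853".
        return helper
    # Structured shape, e.g. {"address": "...", "type": "https"}.
    return helper["address"]


def summarize(measurement):
    """Extract a few toplevel keys from an already decoded measurement."""
    helpers = measurement.get("test_helpers") or {}
    return {
        "test_name": measurement["test_name"],
        "started": parse_time(measurement["measurement_start_time"]),
        # None when the experiment does not take any input.
        "input": measurement.get("input"),
        # "AS0" means the user chose not to share their ASN.
        "asn_shared": measurement["probe_asn"] != "AS0",
        "helpers": {name: helper_address(h) for name, h in helpers.items()},
    }
```

Passing the result of `json.loads` on the example above to `summarize` would yield a timezone-aware start time and an empty `helpers` map, since that example does not include `test_helpers`.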