{ "cells": [ { "cell_type": "markdown", "id": "ca4dc9b7", "metadata": {}, "source": [ "# Global food trade data\n", "\n", "!!! info \"\"\n", " :octicons-person-16: **[Pablo Rosado](https://ourworldindata.org/team/pablo-rosado)** • :octicons-calendar-16: June 18, 2026 *(last edit)* • [**:octicons-mail-16: Feedback**](mailto:info@ourworldindata.org?subject=Feedback%20on%20technical%20publication%20-%20Analysing%20global%20food%20trade%20data)" ] }, { "cell_type": "markdown", "id": "ee1ed15b", "metadata": {}, "source": [ "\n", "## Introduction\n", "\n", "This is a technical companion to our article [\"_How does food get traded around the world?_\"](https://ourworldindata.org/how-does-food-get-traded-around-the-world). It examines [FAOSTAT's Detailed Trade Matrix (TM)](https://www.fao.org/faostat/en/#data/TM), and explains the methodology and choices behind our visualizations.\n", "\n", "### Content\n", "The rest of this publication is structured as follows:\n", "\n", "- **Definitions**: some clarifications on the terminology around food trade data.\n", "- **Set up**: initial steps to be able to run this file, and some basic processing.\n", "- **Coverage:** how complete the bilateral trade data is, across years, countries, and commodities.\n", "- **Reporting agreement:** why the exporting and importing countries often report different quantities for the same shipment, and how we tackle that.\n", "- **Estimating production and supply:** how we estimate an exporter's production and an importer's domestic supply, by combining two datasets.\n", "- **Combining items:** how we group certain food items that are different variants of the same product (e.g. nuts with and without shell).\n" ] }, { "cell_type": "markdown", "id": "ec1ce8de", "metadata": {}, "source": [ "## Definitions\n", "The TM dataset contains _item flows_.\n", "An item is a food product, either a primary commodity such as wheat or bananas, or a processed product such as meat (fish products are not included in the dataset).\n", "A flow is a trade exchange between two countries, namely an import or an export.\n", "\n", "Each flow is measured as a quantity (in tonnes, or number of animals) and as a monetary value (in dollars).\n", "Here we will analyse quantities of traded food (in tonnes).\n", "\n", "Each entry in the data corresponds to a trade flow reported by a certain country and year, with a given partner country.\n", "For example, in one entry, Brazil reports exporting ~4 million tonnes of soya beans to Argentina in 2023.\n", "\n", "In some edge cases, the reporter and partner countries coincide; we remove those cases." ] }, { "cell_type": "markdown", "id": "d9cdbd74", "metadata": {}, "source": [ "## Set up\n", "\n", "In this section I'll guide you through a few steps you'd need to be able to run the code in this file, and replicate its results.\n", "If you are only interested in reading the outcomes, feel free to skip this section.\n", "\n", "### Required libraries\n", "\n", "Whether you run this code locally or on Google Collab, there are some libraries you may need to install." ] }, { "cell_type": "code", "execution_count": null, "id": "2ad05630", "metadata": { "tags": [ "collapsed" ] }, "outputs": [], "source": [ "# Whether you run this code in your local computer or on Google Colab, you first need to install the required dependencies:\n", "# * owid-catalog: a library that lets you explore and load datasets from OWID.\n", "# * plotly: a library that lets you create interactive visualizations.\n", "%pip install --upgrade owid-catalog plotly > /dev/null 2>&1" ] }, { "cell_type": "markdown", "id": "dc10fa7f", "metadata": {}, "source": [ "Import necessary libraries and define some common variables." ] }, { "cell_type": "code", "execution_count": null, "id": "dd172ee7", "metadata": {}, "outputs": [], "source": [ "import contextlib\n", "import io\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "from owid.catalog import fetch\n", "\n", "\n", "def load_owid_table(catalog_path):\n", " \"\"\"Load a table from OWID's public catalog, so this notebook runs anywhere (e.g. Colab).\n", "\n", " `catalog_path` is \"channel/namespace/version/dataset/table\". Until these datasets are\n", " published to the public catalog, this falls back to the local ETL data directory.\n", " \"\"\"\n", " # The catalog's loading spinner overwrites one terminal line with \"\\r\"; under nbconvert each\n", " # frame is captured as its own output line, so silence stdout while loading.\n", " with contextlib.redirect_stdout(io.StringIO()):\n", " try:\n", " tb = fetch(catalog_path)\n", " return tb.reset_index() if list(tb.index.names) != [None] else tb\n", " except Exception:\n", " from owid.catalog import Dataset\n", " from etl.paths import DATA_DIR\n", "\n", " channel, namespace, version, dataset, table = catalog_path.split(\"/\")\n", " return Dataset(DATA_DIR / channel / namespace / version / dataset).read(table, safe_types=False)\n", "\n", "\n", "tb = load_owid_table(\"garden/faostat/2026-05-07/faostat_tm/faostat_tm\")\n", "tb_scl = load_owid_table(\"garden/faostat/2026-02-25/faostat_scl/faostat_scl\")\n", "tb_food_trade = load_owid_table(\"garden/faostat/2026-05-07/food_trade/food_trade\")\n", "\n", "# A few curated items combine several FAO codes (a primary commodity plus a mechanically-derived\n", "# form whose trade FAO reports separately). The food_trade step sums them; this mirrors the\n", "# multi-code entries in food_trade.items.yaml. Maps display name -> {FAO code: short form label}.\n", "COMBINED_ITEMS = {\n", " \"Beef\": {867: \"bone-in\", 870: \"boneless\"},\n", " \"Pork\": {1035: \"bone-in\", 1038: \"boneless\"},\n", " \"Almonds\": {221: \"in-shell\", 231: \"shelled\"},\n", " \"Cashews\": {217: \"in-shell\", 230: \"shelled\"},\n", " \"Walnuts\": {222: \"in-shell\", 232: \"shelled\"},\n", " \"Hazelnuts\": {225: \"in-shell\", 233: \"shelled\"},\n", " \"Brazil nuts\": {216: \"in-shell\", 229: \"shelled\"},\n", " \"Groundnuts\": {242: \"in-shell\", 243: \"shelled\"},\n", " \"Rice\": {31: \"milled\", 32: \"broken\", 28: \"husked\"},\n", " \"Sugar\": {162: \"raw\", 164: \"refined\"},\n", "}\n", "\n", "# Map each curated FAO code to its display name. Combined items appear in the published table under\n", "# a synthetic id (100000 + first code), so we expand them back to their underlying FAO codes here,\n", "# letting analyses that join on the trade matrix's own codes still cover every curated item.\n", "curated_code_to_display = {}\n", "for code, item in dict(zip(tb_food_trade[\"item_code\"].astype(int), tb_food_trade[\"item\"].astype(str))).items():\n", " if item in COMBINED_ITEMS:\n", " for member in COMBINED_ITEMS[item]:\n", " curated_code_to_display[member] = item\n", " else:\n", " curated_code_to_display[code] = item" ] }, { "cell_type": "markdown", "id": "a89842f5", "metadata": {}, "source": [ "### Basic data preparation\n", "\n", "The TM dataset has ~52 million entries.\n", "\n", "We select only rows of item flow quantities (in tonnes); then the dataset contains ~26 million entries.\n", "\n", "We also drop rows of self-trade, where reporter and partner countries coincide (which are just a ~0.1% of the data).\n", "\n", "Around 12% of rows correspond to trade of zero tonnes (which we keep in the data)." ] }, { "cell_type": "code", "execution_count": null, "id": "d92eaed7", "metadata": {}, "outputs": [], "source": [ "# Total number of entries\n", "# len(tb)\n", "\n", "# Basic data preparation: restrict to quantity reports in tonnes (this drops the\n", "# alternate-unit rows used for live animals — 'An', '1000 An', 'No' — and the\n", "# value-in-USD rows),\n", "tb = tb[(tb[\"unit\"] == \"t\")].reset_index(drop=True)\n", "# len(tb) * 1e-6\n", "\n", "# Drop self-trade rows (~0.1% of the data, where reporter\n", "# and partner are the same country).\n", "tb = tb[tb[\"reporter_country\"].astype(str) != tb[\"partner_country\"].astype(str)].reset_index(drop=True)\n", "# len(tb) * 1e-6\n", "\n", "# Share of rows with zero tonnes reported.\n", "# 100 * len(tb[tb[\"value\"] == 0]) / len(tb)" ] }, { "cell_type": "markdown", "id": "0e28cf08", "metadata": {}, "source": [ "## Coverage\n", "\n", "### List of item flows included\n", "\n", "The dataset has over 500 items.\n", "To avoid clutter, for the final custom visualization we have selected a curated list of items.\n", "However, in this analysis we will consider all items (except when we explicitly mention that we use the curated list).\n", "\n", "### Data provenance flags\n", "FAOSTAT flags each quantity row with its provenance.\n", "\n", "In general, quantity (in tonnes) and value (in dollars) are independently collected in the TM dataset.\n", "\n", "About 98% of rows in the TM dataset are flagged \"Official figure\"; the remaining ~2% are flagged as imputed, estimated, or sourced from an external organization.\n", "\n", "We keep all rows regardless of flag." ] }, { "cell_type": "code", "execution_count": 4, "id": "173b913d", "metadata": { "execution": { "iopub.execute_input": "2026-06-18T15:25:45.499673Z", "iopub.status.busy": "2026-06-18T15:25:45.499601Z", "iopub.status.idle": "2026-06-18T15:25:45.643298Z", "shell.execute_reply": "2026-06-18T15:25:45.642945Z" }, "tags": [ "collapsed" ] }, "outputs": [], "source": [ "# Double-check the flag-share numbers quoted above.\n", "imputed_flags = [\"Value imputed by a receiving agency\", \"Estimated value\", \"Figure from external organization\"]\n", "quantity_rows = tb[tb[\"element\"].isin([\"Import quantity\", \"Export quantity\"])]\n", "# Share of rows flagged \"Official figure\" across all years (claimed: ~98%).\n", "official_share = (quantity_rows[\"flag\"] == \"Official figure\").mean() * 100\n", "# Share of rows with a non-official flag across all years (claimed: ~2%).\n", "non_official_share = quantity_rows[\"flag\"].isin(imputed_flags).mean() * 100\n", "assert 97 < official_share < 99\n", "assert 1 < non_official_share < 3" ] }, { "cell_type": "markdown", "id": "ad0dc7fd", "metadata": {}, "source": [ "### Year coverage\n", "\n", "We would ideally show food trade figures for the most recent year available in the data.\n", "However, the latest year may be affected by incomplete reporting.\n", "\n", "The following chart helps us decide which year to choose.\n", "It shows the number of rows, as well as the number of distinct reporting countries, for each year in the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "ef84887e", "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "