{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "  function code_toggle() {\n",
       "    if (code_shown){\n",
       "      $('div.input').hide('500');\n",
       "      $('#toggleButton').val('Show Code')\n",
       "    } else {\n",
       "      $('div.input').show('500');\n",
       "      $('#toggleButton').val('Hide Code')\n",
       "    }\n",
       "    code_shown = !code_shown\n",
       "  }\n",
       "\n",
       "  $( document ).ready(function(){\n",
       "    code_shown=false;\n",
       "    $('div.input').hide()\n",
       "  });\n",
       "</script>\n",
       "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
       "<style>\n",
       ".rendered_html td {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       ".rendered_html th {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<script>\n",
    "  function code_toggle() {\n",
    "    if (code_shown){\n",
    "      $('div.input').hide('500');\n",
    "      $('#toggleButton').val('Show Code')\n",
    "    } else {\n",
    "      $('div.input').show('500');\n",
    "      $('#toggleButton').val('Hide Code')\n",
    "    }\n",
    "    code_shown = !code_shown\n",
    "  }\n",
    "\n",
    "  $( document ).ready(function(){\n",
    "    code_shown=false;\n",
    "    $('div.input').hide()\n",
    "  });\n",
    "</script>\n",
    "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
    "<style>\n",
    ".rendered_html td {\n",
    "    font-size: xx-large;\n",
    "    text-align: left; !important\n",
    "}\n",
    ".rendered_html th {\n",
    "    font-size: xx-large;\n",
    "    text-align: left; !important\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext tikzmagic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Multilingual and Multicultural NLP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "+ Multilingual NLP\n",
    "+ Transfer learning\n",
    "+ Multicultural NLP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Linguistic diversity and inclusion\n",
    "\n",
    "Language distribution in the world is highly skewed:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/0e141942fa265142f41a2a26eb17b6005d3af29e/4-Table1-1.png\" width=90%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.acl-main.560/\">Joshi et al., 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### NLP across the resource landscape\n",
    "\n",
    "Resource distribution is skewed as well:\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/0e141942fa265142f41a2a26eb17b6005d3af29e/3-Figure2-1.png\" width=65%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.acl-main.560/\">Joshi et al., 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"dl-applications-figures/WS_mapping.png\" width=\"100%\"/>\n",
    "\n",
    "Source: http://ai.stanford.edu/blog/weak-supervision/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## How to get more labeled training data in your language?\n",
    "\n",
    "* Translating the training set\n",
    "* Translating the test set\n",
    "* Language-agnostic representations\n",
    "* Cross-lingual transfer learning\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/96c7f39c358343b6ea412697ed693fdd04a71516/2-Figure1-1.png\" width=\"70%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/W18-0550/\">Horbach et al., 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Annotation projection\n",
    "\n",
    "* When translating data (train or test), we need to **project the labels**.\n",
    "* Easy for text classification, but what about structured prediction, e.g., sequence labeling?\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/00cc6deb3cf2c9281ddcf4875aad3ee14c92e52f/3-Figure2-1.png\" width=\"70%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/D19-1100v2/\">Jain et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Common approach: projection based on word alignment.\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/00cc6deb3cf2c9281ddcf4875aad3ee14c92e52f/5-Figure5-1.png\" width=\"70%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/D19-1100v2/\">Jain et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Language-agnostic representations\n",
    "\n",
    "Instead of words, use **delexicalized** features, such as (universal) parts of speech.\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/7c0e45e79dd7fe664a020712f7ac697f60c1567d/2-Figure1-1.png\" width=\"90%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/N13-1126/\">Täckström et al., 2013</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Transfer learning\n",
    "\n",
    "Train on one task/domain/language, and use (or test) on another. Many different flavors:\n",
    "<img src=\"https://ruder.io/content/images/2019/08/transfer_learning_taxonomy.png\" width=70%>\n",
    "\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://ruder.io/thesis/neural_transfer_learning_for_nlp.pdf#page=64\">Ruder, 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Multi-task learning\n",
    "\n",
    "Simple - training one model with multiple objectives simultaneously:\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/161ffb54a3fdf0715b198bb57bd22f910242eb49/19-Figure1.2-1.png\" width=40%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.cs.cornell.edu/~caruana/mlj97.pdf\">Caruana, 1997</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Weight sharing can happen on different levels:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/ade0c116120b54b57a91da51235108b75c28375a/1-Figure1-1.png\" width=50%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/D17-1206\">Hashimoto et al, 2017</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Domain adaptation\n",
    "\n",
    "*Domain*: intuitively, a collection of texts from a certain coherent sort of discourse ([Plank, 2011](https://pure.rug.nl/ws/portalfiles/portal/10469718/09complete.pdf)).\n",
    "\n",
    "For example, product reviews for DVDs, electronics, and kitchen goods:\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/e505303ba5287f468773fbef22ab1abf6875efca/1-Figure1-1.png\" width=45%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.emnlp-main.639/\">Wright & Augenstein, 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Domain adaptation methods\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/11342d45911ee8a7c9e3a94117ce774ad7036172/2-Figure1-1.png\" width=80%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.coling-main.603/\">Ramponi & Plank, 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## A more general view on generalization\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/cc006a0790ba6fcd994f051e4ebb1af1ad322cc2/9-Figure3-1.png\" width=80%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/2210.03050\">Hupkes et al., 2022</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/cc006a0790ba6fcd994f051e4ebb1af1ad322cc2/4-Figure1-1.png\" width=80%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/2210.03050\">Hupkes et al., 2022</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Sequential transfer learning\n",
    "\n",
    "Train general-purpose models on large datasets as representations, then use them for specific tasks.\n",
    "\n",
    "Type-level representations ([word embeddings](dl-representations_simple.ipynb)):\n",
    "- word2vec ([Mikolov et al., 2013](https://arxiv.org/abs/1301.3781))\n",
    "- GloVe ([Pennington et al., 2014](https://www.aclweb.org/anthology/D14-1162))\n",
    "- ...\n",
    "\n",
    "Token-level representations ([contextualised representations](dl-representations_contextual.ipynb)):\n",
    "- ELMo ([Peters et al., 2018](https://www.aclweb.org/anthology/N18-1202))\n",
    "- BERT ([Devlin et al., 2019](https://www.aclweb.org/anthology/N19-1423))\n",
    "- ...\n",
    "\n",
    "Language models ([tranformer language models](dl-representations_contextual_transformers.ipynb)):\n",
    "- GPT-2 ([Radford et al., 2018](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf))\n",
    "- T5 ([Raffel et al., 2019](https://arxiv.org/abs/1910.10683))\n",
    "- BART ([Lewis et al., 2020](https://aclanthology.org/2020.acl-main.703/))\n",
    "- GPT-3 ([Brown et al., 2020](https://arxiv.org/abs/2005.14165))\n",
    "- T0 ([Sanh et al., 2021](https://arxiv.org/abs/2110.08207))\n",
    "- FLAN ([Wei et al., 2021](https://arxiv.org/abs/2109.01652))\n",
    "- ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Paradigm 1: pre-train and fine-tune\n",
    "\n",
    "LM variations (such as ELMo and BERT) can be fine-tuned with extra task-specific parameters:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/df2b0e26d0599ce3e70df8a9da02e51594e0e992/3-Figure1-1.png\" width=70%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/N19-1423.pdf\">Devlin et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Fine-tuning without extra parameters\n",
    "\n",
    "Encoder-decoder LMs (such as T5) can be directly fine-tuned on any task (in principle) posed **as text-to-text**.\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://1.bp.blogspot.com/-89OY3FjN0N0/XlQl4PEYGsI/AAAAAAAAFW4/knj8HFuo48cUFlwCHuU5feQ7yxfsewcAwCLcBGAsYHQ/s640/image2.png\" width=70%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/1910.10683\">Raffel et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Paradigm 2: prompting\n",
    "\n",
    "Large enough LMs (such as GPT-3) are capable of few-shot **in-context learning** without fine-tuning any parameters.\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/ff0b2681d7b05e16c46dfb71d980cc2f605907cd/3-Figure4-1.png\" width=90%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/2109.01652\">Wei et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Generalizing across tasks\n",
    "\n",
    "GPT-3 and multitask LMs (such as T0 and FLAN) are also capable of **zero-shot** task generalization:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://raw.githubusercontent.com/yongzx/bigscience-workshop.github.io/gh-pages/en/pages/uploads/images/Octopus.png\" width=70%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/2110.08207\">Sanh et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Cross-lingual transfer learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# Multilingual representations\n",
    "\n",
    "Trained on large, combined multilingual corpora *without cross-lingual supervision*.\n",
    "- mBERT ([Devlin et al., 2019](https://www.aclweb.org/anthology/N19-1423))\n",
    "- XLM, XLM-R ([Conneau et al., 2020](https://aclanthology.org/2020.acl-main.747/))\n",
    "- mBART ([Liu et al., 2020](https://aclanthology.org/2020.tacl-1.47/))\n",
    "- mT5 ([Xue et al., 2021](https://aclanthology.org/2021.naacl-main.41/))\n",
    "- XGLM ([Lin et al., 2021](https://arxiv.org/abs/2112.10668))\n",
    "- ...\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/d40e5a99078a9e759d3c2484e03f5794013a1455/3-Figure1-1.png\" width=80%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.acl-main.747/\">Conneau et al., 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "# Massively multilingual models\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/74276a37bfa50f90dfae37f767b2b67784bd402a/3-Table1-1.png\" width=80%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2021.naacl-main.41/\">Xue et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zero-shot cross-lingual transfer\n",
    "\n",
    "1. Pre-train multilingual representation\n",
    "2. Fine-tune on target task in English\n",
    "3. Predict on target language"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "mBERT is unreasonably effective at cross-lingual transfer!\n",
    "\n",
    "NER F1:\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/809cc93921e4698bde891475254ad6dfba33d03b/2-Table1-1.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "POS accuracy:\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/809cc93921e4698bde891475254ad6dfba33d03b/2-Table2-1.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/P19-1493\">Pires et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Why? (poll)\n",
    "\n",
    "See also [K et al., 2020](https://arxiv.org/pdf/1912.07840.pdf);\n",
    "[Wu and Dredze., 2019](https://www.aclweb.org/anthology/D19-1077.pdf)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Few-shot cross-lingual transfer\n",
    "\n",
    "1. Pre-train multilingual representation\n",
    "2. Fine-tune on target task in English\n",
    "3. **Fine-tune on a few examples in target language**\n",
    "4. Predict on target language"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/85dc7829455819283270eb643817bcf97133464d/7-Table4-1.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2020.emnlp-main.363/\">Lauscher et al., 2020</a>)\n",
    "</div>\n",
    "\n",
    "\n",
    "Few-shot is often much better than zero-shot: a few examples can go a long way."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Exercise: multilingual pre-training for MT\n",
    "Consider a machine translation system using a multilingual pre-trained encoder and decoder.\n",
    "<a href=\"https://docs.google.com/forms/d/14osHSWt2D3cpg4QTbqm3wHkYtSzLWcVeRMhymQdMZj0/edit\"><img src=\"../img/vauquois-2021.png\"></a>\n",
    "\n",
    "+ It depends on the encoding and model.\n",
    "+ If the performance is good, the attention should incorporate also pragmatic differences. But it might be difficult in practice.\n",
    "+ Because it encodes the input into a numerical representation of the meaning that is language-independent."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Multicultural knowledge and factual alignment\n",
    "\n",
    "Language models don't just learn *languages* — they also encode **world knowledge** learned from the corpora they were trained on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Probing factual knowledge: LAMA\n",
    "\n",
    "The **LAMA (Language Model Analysis)** benchmark ([Petroni et al., 2019](https://aclanthology.org/D19-1250/)) probes whether LMs recall simple factual triples such as:\n",
    "\n",
    "> (Paris, capital_of, ?) → “France”\n",
    "\n",
    "by converting knowledge base facts (e.g., from Wikidata) into *cloze prompts*:\n",
    "\n",
    "> \"Paris is the capital of [MASK].\"\n",
    "\n",
    "Performance is measured as the model’s accuracy in predicting the correct object token.\n",
    "\n",
    "<center>\n",
    "    <img src=\"../img/lama.png\" width=40%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/D19-1250/\">Petroni et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "But what happens when this knowledge is **language- or culture-specific**?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### mLAMA: cross-lingual knowledge probing\n",
    "\n",
    "The **mLAMA** benchmark ([Kassner et al., 2021](https://aclanthology.org/2021.eacl-main.284/)) extends LAMA to **53 languages**, testing how well multilingual LMs like mBERT or XLM-R remember the *same* facts across languages.\n",
    "\n",
    "Key finding:  \n",
    "> LMs often “know” the fact in English but fail to recall it in other languages — even when pre-trained multilingually.\n",
    "\n",
    "This reveals **unequal knowledge retention** and **cultural skew** toward English-centric corpora.\n",
    "\n",
    "<center>\n",
    "    <img src=\"../img/mlama.png\" width=70%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2021.eacl-main.284/\">Kassner et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Cross-lingual factual alignment and cultural bias\n",
    "\n",
    "Even if multilingual models can **translate**, they may not **align cultural knowledge** across languages.\n",
    "\n",
    "For instance:\n",
    "- The same country may have **different facts** (leaders, population stats) in local-language sources.\n",
    "- Cultural entities (e.g., “Thanksgiving”, “Diwali”) have **asymmetric representation**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "For example, the distribution of mentioned entities is highly skewed...\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/cdba1a75f59c0ff86397ba358637c17654d550e3/3-Figure2-1.png\" width=70%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2022.acl-long.239/\">Faisal et al., 2022</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "# Evaluating cross-lingual transfer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Multilingual evaluation benchmarks\n",
    "\n",
    "Collections of multilingual multi-task benchmarks:\n",
    "\n",
    "* XGLUE (<a href=\"https://aclanthology.org/2020.emnlp-main.484/\">Liang et al., 2020</a>)\n",
    "* XTREME (<a href=\"https://arxiv.org/abs/2003.11080\">Hu et al., 2020</a>): includes **TyDiQA**\n",
    "* XTREME-R (<a href=\"https://aclanthology.org/2021.emnlp-main.802/\">Ruder et al., 2021</a>) \n",
    "\n",
    "<center>\n",
    "    <a href=\"https://sites.research.google/xtreme\">\n",
    "    <img src=\"https://1.bp.blogspot.com/-5J6e2txWChk/XpSc_BaYFnI/AAAAAAAAFss/QCLROHrEutAN3GvOyfRzK8J7DA9yLY5GACLcBGAsYHQ/s640/XTREME%2BStill%2Bart_04%2Broboto.png\" width=\"60%\">\n",
    "    </a>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/2003.11080\">Hu et al., 2020</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Performance on XTREME\n",
    "\n",
    "<center>\n",
    "    <img src=\"../img/xtreme.png\" width=\"80%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2021.emnlp-main.802\">Ruder et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Performance on TyDiQA-GoldP\n",
    "\n",
    "<center>\n",
    "    <img src=\"../img/tydiqa.png\" width=\"80%\">\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2021.emnlp-main.802\">Ruder et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Cross-lingual semantic parsing\n",
    "\n",
    "<img src=\"../img/mcwq.png\" width=80%/>\n",
    "\n",
    "(from [Cui et al., 2022](https://aclanthology.org/2022.tacl-1.55/))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Results\n",
    "\n",
    "Evaluated `mT5-small` by Exact Match (%).\n",
    "\n",
    "|  | En | He | Kn | Zh |\n",
    "| :--------------- | -- | -- | -- | -- |\n",
    "| Training on X, testing on X        | 38.3 | 29.3 | 31.5 | 36.3 |\n",
    "| Training on English, testing on X       | 38.3 | 0.2 | 0.3 | 0.2 |\n",
    "\n",
    "Zero-shot cross-lingual evaluation is much harder than monolingual evaluation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Cross-cultural NLP\n",
    "\n",
    "Not everything can be translated. Native language (and culture) data is important!\n",
    "\n",
    "<center>\n",
    "    <img src=\"../img/x-cultural_axes.png\" width=70%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2022.acl-long.482/\">Hershcovich et al., 2022</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# Summary: From Multilingual to Multicultural NLP\n",
    "\n",
    "| 🌐 **Aspect** | **Multilingual NLP** | **Multicultural NLP** |\n",
    "|---------------|----------------------|------------------------|\n",
    "| **Main goal** | Cross-lingual *transfer* and *coverage* | Cross-cultural *representation* and *fairness* |\n",
    "| **Focus** | How well models generalize to new **languages** | How well models capture diverse **knowledge, values, and norms** |\n",
    "| **Key methods** | Translation, annotation projection, multilingual pre-training | Culturally grounded data, value-aligned evaluation, adaptation across societies |\n",
    "| **Evaluation benchmarks** | XTREME, XGLUE, TyDiQA... | mLAMA, BLEnD, NormAd, WorldValuesBench... |\n",
    "| **Challenges** | Linguistic diversity and data imbalance | Cultural bias, representational inequality, value alignment |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Further reading\n",
    "\n",
    "- [Ruder, 2017. An Overview of Multi-Task Learning in Deep Neural Networks](https://ruder.io/multi-task/)\n",
    "- [Ruder, 2019. The State of Transfer Learning in NLP](https://ruder.io/state-of-transfer-learning-in-nlp/)\n",
    "- [Ruder, 2016. A survey of cross-lingual word embedding models](https://ruder.io/cross-lingual-embeddings/)\n",
    "- [Søgaard, Anders; Vulic, Ivan; Ruder, Sebastian; Faruqui, Manaal. 2019. Cross-lingual word embeddings](https://www.morganclaypool.com/doi/abs/10.2200/S00920ED2V01Y201904HLT042)\n",
    "- [Wang, 2019. Cross-lingual transfer learning.](https://shanzhenren.github.io/csci-699-replnlp-2019fall/lectures/W6-L3-Cross_Lingual_Transfer.pdf)\n",
    "- [Pawar et al., 2025. Survey of Cultural Awareness in Language Models: Text and Beyond ](https://direct.mit.edu/coli/article/51/3/907/130804/Survey-of-Cultural-Awareness-in-Language-Models)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}