---
name: find-missing-translations
description: Use when comparing Android strings.xml locale files to find untranslated string resources, missing translation keys, or preparing translation work for a specific language
---

# Find Missing Translations

## Overview

Extract string resource keys from the default `values/strings.xml` that are absent in a target locale's `strings.xml`, excluding non-translatable entries. Output the missing keys and offer to translate them.

## When to Use

- Need to find untranslated strings for a specific locale
- Preparing a batch of strings for a translator
- Checking translation coverage after adding new features

## Background: Crowdin strip-identical behavior

This repo syncs translations via Crowdin (branch `l10n_crowdin_translations`). Crowdin's default export behavior **omits any translation that exactly equals the source**, so a key that the translator deliberately kept as English (common for brand terms like `"Nowhere Drop"`, single-word loanwords like `"Apps"` / `"Feed"` / `"Issues"`, or version prefixes like `"v%1$s"`) will not appear in the locale's `strings.xml` even though the Crowdin UI shows it as 100% translated.

What this means for this skill:

1. **The raw on-disk diff is the candidate set.** A key missing from a locale file is either genuinely untranslated *or* a source-identical entry Crowdin stripped. Both are reported; the human decides which to skip. The Crowdin web UI ("N untranslated") is the ground truth for what genuinely needs work.
2. **Source-identical entries are a small, recognizable minority.** Brand terms (`Nowhere X`), single-word loanwords (`Apps` / `Feed` / `Issues`), and bare version/format strings (`v%1$s`) are the usual cases. Skip these by inspection rather than translating them to something identical.
3. **Don't add source-identical fallbacks.** Android falls back to `values/strings.xml` at runtime, so a key intentionally kept as English already renders correctly, and Crowdin's next sync would strip a local duplicate anyway.

> **Historical note:** an earlier version of this skill tried to auto-filter the
> candidate list with a git "sync-timestamp" heuristic (skip any key added before
> the last `New Crowdin translations` commit). It was **dropped** because it
> produced false negatives: a key added shortly before an export that translators
> simply hadn't reached yet is genuinely missing, but the heuristic classified it
> as "Crowdin already decided." Trust the raw diff + the Crowdin UI instead.

## Target Locales

The default set of locales (unless the user specifies otherwise):

| Locale | Language | Directory |
|--------|----------|-----------|
| `cs-rCZ` | Czech | `values-cs-rCZ` |
| `pt-rBR` | Brazilian Portuguese | `values-pt-rBR` |
| `sv-rSE` | Swedish | `values-sv-rSE` |
| `de-rDE` | German | `values-de-rDE` |

## Technique

### 1. Identify files

```
Default: amethyst/src/main/res/values/strings.xml
Target:  amethyst/src/main/res/values-<locale>/strings.xml
```

### 2. Find missing keys using cs-rCZ as reference

Always diff against `cs-rCZ` first: it is the most complete locale and serves as the reference. Keys missing in `cs-rCZ` are usually missing in the other target locales too.

You MUST diff **both** `<string>` and `<plurals>`: a `<plurals>` in the source will never appear in a `<string>` diff. Forgetting `<plurals>` is the most common silent failure of this skill (it misses things like `music_playlist_track_count`, `notification_count_more`, etc.).

```bash
# Strings: extract translatable keys from default (exclude translatable="false")
echo "=== missing <string> ==="
comm -23 \
  <(grep '<string ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
  <(grep '<string ' amethyst/src/main/res/values-cs-rCZ/strings.xml | sed 's/.*name="\([^"]*\)".*/\1/' | sort)

# Plurals: same diff, keyed on the opening <plurals> tag
echo "=== missing <plurals> ==="
comm -23 \
  <(grep '<plurals ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
  <(grep '<plurals ' amethyst/src/main/res/values-cs-rCZ/strings.xml | sed 's/.*name="\([^"]*\)".*/\1/' | sort)
```

`<plurals>` translations need the per-locale CLDR category set (see Step 5 → "Plurals: handle with care").
Crowdin can asymmetrically strip keys across locales (each translator independently chose source-identical for different keys), so **cs-rCZ is not a reliable upper bound**. Diff **every** target locale and union the results; don't assume the cs-rCZ set covers the others. A quick per-locale count is a useful sanity check against the Crowdin UI's "N untranslated":

```bash
for locale in cs-rCZ de-rDE sv-rSE pt-rBR; do
  # Missing <string> keys for this locale
  ns=$(comm -23 \
    <(grep '<string ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
    <(grep '<string ' "amethyst/src/main/res/values-$locale/strings.xml" | sed 's/.*name="\([^"]*\)".*/\1/' | sort) | wc -l)
  # Missing <plurals> keys for this locale
  np=$(comm -23 \
    <(grep '<plurals ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
    <(grep '<plurals ' "amethyst/src/main/res/values-$locale/strings.xml" | sed 's/.*name="\([^"]*\)".*/\1/' | sort) | wc -l)
  echo "$locale: $ns strings, $np plurals missing"
done
```

### 3. Extract the missing entries

`<string>` is a single line; `<plurals>` is a multi-line block. Handle each appropriately.

```bash
# Missing <string>: full line from default strings.xml
while IFS= read -r key; do
  grep "name=\"$key\"" amethyst/src/main/res/values/strings.xml
done < <(comm -23 \
  <(grep '<string ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
  <(grep '<string ' amethyst/src/main/res/values-cs-rCZ/strings.xml | sed 's/.*name="\([^"]*\)".*/\1/' | sort))

# Missing <plurals>: extract the multi-line block (opening tag through </plurals>)
while IFS= read -r key; do
  awk -v key="$key" '
    $0 ~ "<plurals name=\"" key "\"" { in_p = 1 }
    in_p { print }
    /<\/plurals>/ { in_p = 0 }
  ' amethyst/src/main/res/values/strings.xml
done < <(comm -23 \
  <(grep '<plurals ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
  <(grep '<plurals ' amethyst/src/main/res/values-cs-rCZ/strings.xml | sed 's/.*name="\([^"]*\)".*/\1/' | sort))
```

### 4. Flag plural-shaped strings

Before translating, flag any missing `<string>` that looks like it should be a plural:

1. **A hardcoded `1` in a countable phrase** (e.g. `"1 new reply"`). This is a `<plurals>` resource, not a `<string>`. Hardcoding `1` in English forces every translator to either also hardcode `1` (breaking languages where the `one` category covers other numbers, e.g. some Slavic languages) or to silently change the meaning.
2. **A `%d` / `%1$d` placeholder in a clearly singular/plural sentence** (e.g. `"%1$d reply"`, `"%d follower"`). Even though the placeholder is parameterised, English-only `one`/`other` agreement won't survive translation into languages that need `few`/`many`.

Also **audit existing `<plurals>` resources** for two anti-patterns:

1. **`quantity="one"` items that hardcode the literal `1`** (instead of using a `%d` / `%1$d` placeholder). These are broken for languages where the `one` CLDR category covers more than just `n=1` (Russian, Ukrainian, Croatian, etc.).
2. **`quantity="zero"` items in any locale that doesn't natively use the `zero` CLDR category**, i.e. **everything except Arabic (`ar`) and Welsh (`cy`)**. ICU/CLDR maps `count=0` to `other` for English and all the locales we ship to (cs, de, pt-BR, sv, etc.), so `<item quantity="zero">` is **dead code** there: `getQuantityString(id, 0)` will pick `other`, never the zero entry, and the visible runtime string ends up `"…0 items"` instead of the intended `"…no items"`. If a UX genuinely wants special "no items" wording at count=0, that has to be a call-site `if (count == 0)` branch to a separate `<string>`, **not** a `quantity="zero"` plural item.

Flag and offer to fix:

```bash
# Scan every locale's strings.xml for <item quantity="one"> entries that
# hardcode "1" (or other literal digits) instead of using a placeholder.
# Looks at default + all values-* locales.
for f in amethyst/src/main/res/values/strings.xml amethyst/src/main/res/values-*/strings.xml; do
  awk -v file="$f" '
    /<plurals/ { in_plurals = 1 }
    in_plurals && /quantity="one"/ {
      # Extract the item text (between > and <)
      text = $0; sub(/^[^>]*>/, "", text); sub(/<.*$/, "", text)
      # Flag if it contains a digit AND no %d / %1$d placeholder
      if (text ~ /[0-9]/ && text !~ /%[0-9]*\$?d/) {
        print file ": one=\"" text "\""
      }
    }
    /<\/plurals>/ { in_plurals = 0 }
  ' "$f"
done
```

Then scan for dead `quantity="zero"` entries. CLDR's `zero` category is integer-bearing only in **Arabic (`ar`)** and **Welsh (`cy`)**. In every other locale, count=0 falls through to `other`, so an `<item quantity="zero">` entry is dead and likely a translator/author bug (or it silently never fires):

```bash
for f in amethyst/src/main/res/values/strings.xml amethyst/src/main/res/values-*/strings.xml; do
  # Skip Arabic and Welsh: they natively use the zero category.
  case "$f" in
    *values-ar*|*values-cy*) continue ;;
  esac
  awk -v file="$f" '
    /<plurals/ { in_plurals = 1 }
    in_plurals && /quantity="zero"/ {
      # Extract the item text (between > and <)
      text = $0; sub(/^[^>]*>/, "", text); sub(/<.*$/, "", text)
      print file ": zero=\"" text "\""
    }
    /<\/plurals>/ { in_plurals = 0 }
  ' "$f"
done
```

For each hit, warn the user that the entry is unreachable in that locale. The fix is to **remove the `<item quantity="zero">`** and, if the UX wanted distinct wording for count=0, add a separate `<string>` plus an `if (count == 0)` branch at the call site (see "Plurals: handle with care" below).
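A third audit in the same style can catch locale `<plurals>` blocks that copy English's `one`/`other` pair without the extra categories the language needs. The sketch below is only an illustration, not part of the skill: `required_categories` and `missing_categories` are hypothetical helper names, the category sets are taken from the list under "Plurals: handle with care", and the fixture file is invented for the demo. It assumes one `<item>` per line and a `<plurals name="…">` opening tag, as in the extraction awk above.

```bash
# Sketch only: both helpers are hypothetical, not part of the repo.

# Category sets from the "Plurals: handle with care" list; any language
# not listed there is treated as unknown rather than guessed.
required_categories() {
  case "$1" in
    en|de|sv|pt) echo "one other" ;;
    cs|pl|ru)    echo "one few many other" ;;
    ar)          echo "zero one two few many other" ;;
    *)           return 1 ;;
  esac
}

# Print each required category that the named <plurals> block lacks.
# Args: strings.xml path, plurals name, language code.
missing_categories() {
  required=$(required_categories "$3") || { echo "unknown language: $3" >&2; return 1; }
  present=$(awk -v key="$2" '
    $0 ~ "<plurals name=\"" key "\"" { in_p = 1 }
    in_p && match($0, /quantity="[a-z]+"/) {
      # Strip the leading quantity=" (10 chars) and the trailing quote
      print substr($0, RSTART + 10, RLENGTH - 11)
    }
    /<\/plurals>/ { in_p = 0 }
  ' "$1")
  for category in $required; do
    printf '%s\n' "$present" | grep -qx "$category" || echo "$category"
  done
}

# Demo fixture: a Czech block that only carries the English one/other pair.
demo=$(mktemp -d)
cat > "$demo/strings.xml" <<'EOF'
<resources>
    <plurals name="reply_count">
        <item quantity="one">%d odpověď</item>
        <item quantity="other">%d odpovědí</item>
    </plurals>
</resources>
EOF

missing_categories "$demo/strings.xml" reply_count cs
```

For the Czech fixture this reports `few` and `many`; running the same check with `de` reports nothing, since German only needs `one` and `other`.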
Quick scan over the missing keys:

```bash
# Flag missing English values that look like they should be <plurals>
while IFS= read -r key; do
  line=$(grep "name=\"$key\"" amethyst/src/main/res/values/strings.xml)
  # Hardcoded standalone "1" (word-boundary), or any %d / %1$d count placeholder
  if echo "$line" | grep -qE '>([^<]*\b1\b[^<]*|[^<]*%[0-9]*\$?d[^<]*)<'; then
    echo "PLURAL CANDIDATE: $line"
  fi
done < <(comm -23 \
  <(grep '<string ' amethyst/src/main/res/values/strings.xml | grep -v 'translatable="false"' | sed 's/.*name="\([^"]*\)".*/\1/' | sort) \
  <(grep '<string ' amethyst/src/main/res/values-cs-rCZ/strings.xml | sed 's/.*name="\([^"]*\)".*/\1/' | sort))
```

Example warning:

> ⚠️ `notification_count` is `"1 new reply"`. This hardcodes `"1"` and should likely be a `<plurals>` resource (e.g. `quantity="one"` → `"%d new reply"`, `quantity="other"` → `"%d new replies"`). Convert before translating?

Do not silently translate plural-shaped `<string>` entries; the wrong shape will then need to be fixed in every locale.

### 5. Present results and ask to translate

Output the missing entries as raw XML resource lines (copy-paste ready):

```xml
<string name="…">Valid</string>
<string name="…">Valid from %1$s</string>
<string name="…">Lists</string>
```

Also check `<string-array>` and `<plurals>` tags using the same approach if the project uses them.

#### Plurals: handle with care

When adding or proposing **`<plurals>`** entries, follow these rules:

- **Never hardcode `"1"`** in the English text of a `quantity="one"` item. Use the format placeholder (e.g. `%1$d` / `%d`) so the runtime substitutes the actual count. Hardcoding `"1"` breaks every language whose `one` category covers numbers other than 1 (e.g. some Slavic languages).
- **Don't assume `one` + `other` is enough.** CLDR plural categories vary by language: `zero`, `one`, `two`, `few`, `many`, `other`. Always include **every category the target language uses**, not just the categories present in English. Examples:
  - English (`en`): `one`, `other`
  - Czech (`cs`): `one`, `few`, `many`, `other`
  - Polish (`pl`): `one`, `few`, `many`, `other`
  - Russian (`ru`): `one`, `few`, `many`, `other`
  - Arabic (`ar`): `zero`, `one`, `two`, `few`, `many`, `other`
  - German / Swedish / Brazilian Portuguese: `one`, `other`
- When a missing string contains a count placeholder and is conceptually a singular/plural pair, **flag it before translating**: it may belong as a `<plurals>` resource rather than a single `<string>`. Surface this to the user before proposing translations.
- **Do not use `quantity="zero"` outside Arabic (`ar`) and Welsh (`cy`).** CLDR's `zero` category is integer-bearing only in those two languages. Android calls `PluralRules.select(0)` for the device locale; in English/German/Czech/Polish/Russian/Swedish/Portuguese/etc. it returns `other`, so the explicit `<item quantity="zero">` is never picked at runtime and the user sees `"…0 items"` instead of the intended wording. If the design calls for "no items" at count=0, model it as a separate `<string>` and an `if (count == 0)` branch at the call site:

  ```kotlin
  val label = if (count == 0) {
      stringRes(R.string.foo_no_items, dateLabel)
  } else {
      pluralStringResource(R.plurals.foo_items, count, dateLabel, count)
  }
  ```

- Reference: [Android `<plurals>` docs](https://developer.android.com/guide/topics/resources/string-resource#Plurals) and [CLDR plural rules](https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/language_plural_rules.html).

**Then ask the user:** "Would you like me to translate these missing strings into [list of target locales]?"

### 6. Adding translations (if approved)

When adding translated strings to locale files:

- **Append new strings at the bottom** of the file, just before the closing `</resources>` tag.
- Do NOT try to insert them in alphabetical or matching order; a separate process handles ordering.
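The append rule above can be sketched as a small shell helper. This is a minimal illustration under stated assumptions, not the repo's tooling: `append_before_close` is a hypothetical name, the fixture files are invented, and the sketch assumes the closing `</resources>` tag sits on a line of its own.

```bash
# Sketch only: append_before_close is a hypothetical helper.
# Inserts every line of the additions file just before the first line that
# contains </resources> in the target file; everything else is copied as-is.
append_before_close() {
  target=$1
  additions=$2
  tmp=$(mktemp)
  awk -v add="$additions" '
    /<\/resources>/ && !inserted {
      while ((getline line < add) > 0) print line
      close(add)
      inserted = 1
    }
    { print }
  ' "$target" > "$tmp" && mv "$tmp" "$target"
}

# Demo on throwaway files
workdir=$(mktemp -d)
printf '%s\n' '<resources>' '    <string name="a">A</string>' '</resources>' > "$workdir/strings.xml"
printf '%s\n' '    <string name="b">B</string>' > "$workdir/new.xml"

append_before_close "$workdir/strings.xml" "$workdir/new.xml"
cat "$workdir/strings.xml"
```

The new entry lands between the existing `a` entry and the closing tag, and no existing line is reordered, which leaves ordering to the separate process mentioned above.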
## Common Mistakes

- **Forgetting `translatable="false"`**: these should never appear in locale files.
- **Diffing only `<string>` and skipping `<plurals>`**: `<plurals>` is a separate resource type; a source `<plurals>` missing from a locale will never show up in a `<string>` diff. Always run the diff twice (once per resource type) as shown in Step 2. The same goes for `<string-array>` if the project uses it.
- **Trusting a git "sync-timestamp" heuristic to pre-filter the list**: this skill used to skip keys added before the last `New Crowdin translations` commit, on the theory that Crowdin had already "decided" them. It was dropped: a key added shortly before an export that translators hadn't reached yet is genuinely missing, so the heuristic silently dropped real work. Use the raw on-disk diff and reconcile against the Crowdin web UI's untranslated count instead.
- **Adding source-identical fallbacks locally**: they get overwritten on the next Crowdin sync. Android falls back to `values/strings.xml` at runtime anyway, so a key intentionally kept as English already renders correctly. Skip these by inspection (brand terms, loanwords, `v%1$s`-style strings); don't translate them to an identical value.
- **Diffing only cs-rCZ and skipping the other locales**: Crowdin can strip different keys in different locales (each translator's choice), so cs-rCZ is not a reliable upper bound. Diff each target locale and union the results.
- **Inserting strings in a specific position**: always append at the bottom; ordering is handled separately.
- **Hardcoding `"1"` in a `<plurals>` `quantity="one"` item**: always use the count placeholder; otherwise non-English `one` categories produce wrong text.
- **Copying English's `one`/`other` set into every locale**: each language must include all CLDR plural categories it uses (e.g. Czech needs `one`, `few`, `many`, `other`).
- **Using `<item quantity="zero">` to special-case count=0**: outside Arabic and Welsh, this entry is unreachable: ICU/CLDR maps 0 → `other`, so the runtime never picks the zero item and the user sees `"…0 items"`. Special-case at the call site with a separate `<string>` instead.
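To illustrate the "diff each target locale and union the results" advice, here is a minimal, self-contained sketch that runs on throwaway fixture files instead of the real `amethyst/src/main/res` tree. The helper name `string_keys`, the fixture keys, and the sample translations are all assumptions made for the demo; it covers only `<string>` entries, so a real run would repeat the same loop for `<plurals>`.

```bash
# Sketch only: string_keys and every fixture below are hypothetical.
export LC_ALL=C  # keep sort and comm on the same collation order

# Print the sorted, translatable <string> keys of one strings.xml.
string_keys() {
  grep '<string ' "$1" | grep -v 'translatable="false"' |
    sed -n 's/.*<string[^>]* name="\([^"]*\)".*/\1/p' | sort -u
}

# Fixtures: a default file plus two locales that each lack a different key.
res=$(mktemp -d)
mkdir -p "$res/values" "$res/values-cs-rCZ" "$res/values-de-rDE"
cat > "$res/values/strings.xml" <<'EOF'
<resources>
    <string name="build_tag" translatable="false">demo</string>
    <string name="feed">Feed</string>
    <string name="lists">Lists</string>
    <string name="valid">Valid</string>
</resources>
EOF
cat > "$res/values-cs-rCZ/strings.xml" <<'EOF'
<resources>
    <string name="feed">Kanál</string>
    <string name="lists">Seznamy</string>
</resources>
EOF
cat > "$res/values-de-rDE/strings.xml" <<'EOF'
<resources>
    <string name="lists">Listen</string>
    <string name="valid">Gültig</string>
</resources>
EOF

# Per-locale diff against the default keys, accumulated into one file.
string_keys "$res/values/strings.xml" > "$res/default.keys"
: > "$res/all.missing"
for locale in cs-rCZ de-rDE; do
  string_keys "$res/values-$locale/strings.xml" > "$res/$locale.keys"
  comm -23 "$res/default.keys" "$res/$locale.keys" > "$res/$locale.missing"
  echo "$locale: $(wc -l < "$res/$locale.missing" | tr -d ' ') missing"
  cat "$res/$locale.missing" >> "$res/all.missing"
done

# Union of everything that is missing in at least one locale.
union=$(sort -u "$res/all.missing")
printf 'union:\n%s\n' "$union"
```

In this fixture cs-rCZ lacks only `valid` and de-rDE lacks only `feed`, so a cs-rCZ-only diff would have missed `feed`; the union contains both keys, and the non-translatable `build_tag` never appears.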