--- name: entry-guidelines description: General quality standards for all je-dict-1 dictionary entries. Use when creating or revising any entry type. --- # Dictionary Entry Quality Guidelines When creating or revising dictionary entries for je-dict-1, follow these quality standards: ## CRITICAL: Write Each Entry Individually **DO NOT use Python scripts or automation to mass-produce entries.** Each dictionary entry must be written individually by hand, using: - Your own linguistic knowledge - The guidelines in this skill and related skills (`verb-entry`, `adjective-entry`, `particle-entry`, `other-entries`, `vocabulary-notes`) - Careful consideration of each word's unique characteristics **Why this matters:** - Each word has nuances that require individual attention - Examples must be natural and contextually appropriate - Notes should address learner-specific challenges for that word - Mass-produced entries lack the quality and depth learners need **The correct workflow:** 1. Select a word from `candidate_words.json` or user request 2. Research/consider the word's usage, collocations, and common patterns 3. Write the entry JSON directly using the Write tool 4. Validate: `python3 build/validate.py` 5. Repeat for each entry **After finishing all entries for a session:** ```bash python3 build/validate.py # Validate all entries python3 build/update_indexes.py # Update indexes and sync candidates python3 build/build_flat.py # Rebuild website (REQUIRED for GitHub Pages) git add entries/ docs/ *.json PROJECT_STATUS.md git commit -m "Add N new dictionary entries" git push ``` The `build_flat.py` step is critical - without it, new entries won't appear on the live site. The build uses an atomic process (builds to temp directory, then swaps) to prevent broken states if the build fails. **Never create scripts that generate entry content programmatically.** ## Before Creating a New Entry **IMPORTANT**: Always check if an entry already exists before creating a new one. ### Duplicate Definition **A word is a duplicate ONLY if BOTH the headword AND reading match exactly.** - **Homophones** (same reading, different headword) are **NOT duplicates** and should have separate entries - Example: 線香 (せんこう) and 先行 (せんこう) are different words - Example: 橋/箸/端 (all はし) are different words - **Homographs** (same headword, different reading) are **NOT duplicates** and should have separate entries - Example: 行く (いく) and 行く (ゆく) are different readings - Example: 明日 (あした) and 明日 (あす) are different readings ### Duplicate Check Process 1. **Run the duplicate check script**: ```bash python3 build/check_duplicate.py "食べる" "たべる" ``` - If it says "OK: ... is not in the dictionary or candidates" → Safe to create entry - If it says "DUPLICATE: ..." → SKIP this word, do NOT create a duplicate - **Informational notes** about homophones/homographs do NOT block entry creation 2. **Batch checking** (optional, to plan which candidates to work on): ```bash python3 build/check_duplicate.py --batch "食べる:たべる" "飲む:のむ" "書く:かく" ``` 3. **If the word was in candidate_words.json**: It will be automatically removed when you run `python3 build/update_indexes.py` after creating the entry. 4. **Only create new entries** for words that pass the duplicate check. This prevents duplicate entries and wasted effort on entries that must later be deleted. ## Content Guidelines 1. **Explain before exemplifying** - Definition first, then examples 2. **One meaning = one example minimum** - Every sense needs illustration 3. **Show grammatical connections** - Always demonstrate how words connect 4. **Prefer natural Japanese** - Avoid textbook stiffness 5. **Highlight non-obvious distinctions** - Focus on what learners cannot infer from English ## Consistency Guidelines 1. **Consistent depth across similar entries** - Don't over-explain one verb while under-explaining another 2. **Consistent structure within entry types** - All verbs should have same sections 3. **Consistent terminology** - Use same labels throughout (USAGE NOTES, not sometimes Notes) ## Example Sentence Guidelines **See the `example-sentences` skill for complete guidelines on:** - Minimum example counts per tier (5 for basic/core, 3 for general) - Progressive length requirements - Vocabulary restrictions by tier - Quality standards and formatting ### Key Requirements Summary 1. **Minimum counts**: Basic/core tiers need 5 examples per sense; general tier needs 3 2. **Progressive length**: Examples should get longer from first to last 3. **Vocabulary restrictions**: Basic tier examples must use tier-appropriate vocabulary 4. **Always include sense_numbers**: Every example must specify which definition sense(s) it illustrates ### Sense Numbers Requirement Every example sentence **must** have a `sense_numbers` field that links it to the definition(s) it illustrates: ```json "examples": [ { "id": "00001_word_ex1", "japanese": "...", "english": "...", "sense_numbers": [1] } ] ``` **Rules:** - **Single-sense entries**: Use `[1]` for all examples - **Multi-sense entries**: Each example must specify which sense(s) it demonstrates - **Examples illustrating multiple senses**: Use `[1, 2]` format - **Must reference valid senses**: Numbers must match `sense_number` values in definitions The validation script checks that all examples in multi-sense entries have valid sense_numbers. ## Furigana Requirements (CRITICAL) **All kanji MUST have furigana in ALL fields, including notes.** Format: `{漢字|かんじ}` This applies to: - Headwords - Example sentences - **Notes field** (idioms, collocations, cultural notes, etc.) - All explanatory text **Common mistakes to avoid:** ``` ✗ WRONG: 暖簾に腕押し ✓ RIGHT: {暖簾|のれん}に{腕押|うでお}し ✗ WRONG: 安堵の息をつく ✓ RIGHT: {安堵|あんど}の{息|いき}をつく ✗ WRONG: Sometimes written as 家鴨 ✓ RIGHT: Sometimes written as {家鴨|あひる} ``` Use compound readings for jukugo: `{友達|ともだち}` not `{友|とも}{達|だち}` **Verify before finalizing:** ```bash python3 build/verify_furigana.py ``` ## Entry Structure Every entry must include: - `id`: Format `{5-digit-number}_{romaji}` (e.g., `00396_taberu`) - `headword`: With furigana notation - `reading`: **Hiragana only** (see Reading Format below) - `part_of_speech`: Consistent terminology - `gloss`: Brief English equivalent - `definitions`: Array with sense_number, gloss, explanation - `examples`: 2-3 minimum, with id, Japanese, English, sense_numbers, and optional notes - `notes`: Usage notes, grammar patterns, common mistakes (see `vocabulary-notes` skill for formatting requirements) - `metadata`: Including vocabulary_tier (**always "general" for new entries**), created, modified timestamps ## Reading Format (CRITICAL) **All readings MUST be in hiragana, never katakana.** This applies to ALL entries, including: - Loanwords (katakana headwords like スキー, ストレージ) - Abbreviations (DM, PC, etc.) - Any word regardless of how the headword is written **Examples:** ``` ✓ CORRECT: headword: "スキー" reading: "すきー" ✓ CORRECT: headword: "DM" reading: "でぃーえむ" ✗ WRONG: headword: "スキー" reading: "スキー" ← Katakana readings cause duplicates! ``` **Why this matters:** - Katakana readings cause duplicate entries (same word with two different reading formats) - The dictionary uses readings for indexing and deduplication - Consistent hiragana readings ensure proper sorting and lookup **Note:** The long vowel mark `ー` is acceptable in hiragana readings (e.g., `すきー`, `すとれーじ`) since there is no hiragana equivalent. The validation script (`validate.py`) will report errors for entries with katakana readings. ## File Placement (CRITICAL) **Entries MUST be placed in the correct numeric range directory.** The path follows this pattern: `entries/{range}/{entry_id}.json` The **range directory** is determined by the numeric portion of the entry ID, rounded down to the nearest 500: - IDs 00000-00499 → `entries/00000/` - IDs 00500-00999 → `entries/00500/` - IDs 01000-01499 → `entries/01000/` - etc. ### Examples - Entry `00396_taberu` → `entries/00000/00396_taberu.json` - Entry `00538_aruku` → `entries/00500/00538_aruku.json` - Entry `01186_mukau` → `entries/01000/01186_mukau.json` - Entry `06237_fumikiru` → `entries/06000/06237_fumikiru.json` ### How to Get the Correct Path **ALWAYS run this command** to determine the correct path before writing: ```bash python3 build/get_entry_path.py ``` Example: ```bash python3 build/get_entry_path.py ふみきる 06237_fumikiru # Output: entries/06000/06237_fumikiru.json python3 build/get_entry_path.py こうりつてき 06240_kouritsuteki # Output: entries/06000/06240_kouritsuteki.json ``` The `validate.py` script checks for directory mismatches and will report errors. ## Metadata Timestamps **CRITICAL**: Timestamps MUST be actual current UTC time. The website converts UTC to JST (+9 hours) for display. Incorrect timestamps will show as wrong dates/times (often appearing hours or days in the future). ### How to Get the Correct Timestamp **ALWAYS run this command** to get the current UTC timestamp before writing each entry: ```bash python3 build/get_timestamp.py ``` This outputs the current UTC time, e.g.: `2026-01-12T10:45:30Z` Copy this exact output into both `created` and `modified` fields (for new entries) or just `modified` (for revisions). ### Why This Matters - The `Z` suffix means UTC (not local time, not JST) - The build script adds 9 hours to convert to JST for display - If you write `16:00:00Z` when actual UTC is `10:00`, it displays as **01:00 JST next day** (wrong!) - If you write `10:00:00Z` when actual UTC is `10:00`, it displays as **19:00 JST same day** (correct!) ### Common Mistakes to Avoid 1. **DO NOT** guess or estimate the timestamp 2. **DO NOT** use your perception of current time - always run the script 3. **DO NOT** use round hours like `12:00:00Z` or `15:00:00Z` (these are almost certainly wrong) 4. **DO NOT** copy timestamps from other entries 5. **DO NOT** write JST time with a Z suffix (this causes 9-hour errors) ### Validation Run `python3 build/validate.py` to check for: - Future timestamps (timestamp more than 24 hours ahead of current UTC time) - Suspiciously round timestamps (exactly `:00:00` seconds, likely not from the script) Note: The validator allows a 24-hour grace period for timestamps to accommodate CI/CD clock drift. ## Vocabulary Tier Policy **All new entries must be assigned to the "general" tier.** As of January 2026, the vocabulary tier realignment is complete: - **Basic tier** (795 entries): Fixed - contains foundational vocabulary - **Core tier** (1,998 entries): Fixed - contains essential adult communication vocabulary - **General tier** (4,566+ entries): All other vocabulary, including all new entries **Do NOT assign new entries to basic or core tiers** unless explicitly instructed by the user. The basic and core tiers have been curated to meet specific word count targets and maintain semantic group integrity. In `metadata.vocabulary_tier`, always use `"general"`: ```json "metadata": { "vocabulary_tier": "general", "created": "...", "modified": "..." } ``` ## Metadata Tags (REQUIRED) All entries must have properly structured tags in `metadata.tags`. This enables search, filtering, and export functionality. ### Required Tag Categories ```json "metadata": { "vocabulary_tier": "general", "tags": { "pos": ["noun"], // REQUIRED: Part of speech (array) "formality": "neutral", // REQUIRED: formal/neutral/informal/vulgar "politeness": "plain", // REQUIRED: honorific/humble/polite/plain "semantic": ["food"] // REQUIRED: Semantic category (array) }, "created": "...", "modified": "..." } ``` ### Part of Speech (`pos`) Valid values: `noun`, `verb-godan`, `verb-ichidan`, `verb-suru`, `verb-kuru`, `verb-irregular`, `adjective-i`, `adjective-na`, `adjective-no`, `adjective-taru`, `adverb`, `particle`, `conjunction`, `interjection`, `pronoun`, `counter`, `prefix`, `suffix`, `expression`, `pre-noun-adjectival`, `number`, `onomatopoeia`, `auxiliary` - Use arrays for multi-function words: `["noun", "verb-suru"]` - The array should list the most common/primary POS first ### Formality - `formal`: Used in formal/written contexts (敬語, 硬い表現) - `neutral`: Standard usage appropriate for most contexts (default) - `informal`: Casual/colloquial usage (くだけた表現) - `vulgar`: Strong/offensive language (use sparingly) ### Politeness (Keigo Classification) - `honorific`: 尊敬語 - Elevates the subject (いらっしゃる, おっしゃる) - `humble`: 謙譲語 - Lowers the speaker (申す, 参る) - `polite`: 丁寧語 - General polite forms (です/ます base forms) - `plain`: 普通体 - Plain/dictionary forms (default for most entries) ### Semantic Categories Choose the most appropriate category(ies) for the word's meaning: **Specific categories** (use when applicable): - Time: `time-day-of-week`, `time-month`, `time-season`, `time-period`, `time-general` - Nature: `animal-mammal`, `animal-bird`, `animal-fish`, `animal-insect`, `animal-general`, `plant-tree`, `plant-flower`, `plant-general`, `weather`, `geography` - Human: `body-part`, `body-internal`, `family`, `person`, `occupation` - Objects: `food`, `clothing`, `building`, `transportation`, `tool`, `furniture`, `electronics` - Abstract: `emotion`, `color`, `number`, `direction`, `size`, `quantity` - Actions: `movement`, `communication`, `cognition`, `existence`, `consumption` - Social: `greeting`, `education`, `work`, `leisure` **Fallback categories** (when no specific category fits): - `general`: For nouns without a specific semantic category - `action`: For verbs not fitting other action categories - `descriptive`: For adjectives and adverbs - `grammatical`: For particles and conjunctions - `expression`: For fixed expressions and interjections - `onomatopoeia`: For mimetic words ### Optional Tag Categories ```json "tags": { // ... required tags above ... "transitivity": "transitive", // For verbs: transitive/intransitive/both "style": ["spoken"], // written/spoken/literary/archaic/slang "domain": ["business"] // business/academic/technical/legal/medical/etc. } ``` - `transitivity`: Required for verbs - indicates if verb takes a direct object - `style`: Use when word is strongly associated with a medium - `domain`: Use when word is specialized/technical ### Tag Selection Tips 1. **Be specific when possible**: Use `food` not `general` for 寿司 2. **Multiple tags allowed**: 朝ご飯 can be `["food", "time-period"]` 3. **Match the primary meaning**: Tag based on the word's core meaning 4. **Check similar entries**: Ensure consistency with related words ## Quality Checklist Before finalizing any entry, verify: - [ ] **File placed in correct directory** (use `python3 build/get_entry_path.py `) - [ ] **All kanji have furigana** (headword, examples, AND notes) - [ ] Verify: `python3 build/verify_furigana.py ` shows "✓ OK" - [ ] **Tags are complete**: pos, formality, politeness, semantic all present - [ ] Examples progress from simple to complex - [ ] At least one collocation or fixed phrase is shown - [ ] Grammar patterns are explicitly demonstrated - [ ] Notes cover common learner mistakes - [ ] Notes are properly formatted (see `vocabulary-notes` skill) - [ ] Depth matches similar entries in the dictionary - [ ] All examples have valid sense_numbers - [ ] Run `python3 build/validate.py` to catch any directory or other errors