Male ratio: 92.89% (14863/16001), Female ratio: 4.47% (715/16001), Unknown ratio: 2.64% (423/16001).

## Known-Value Coverage

- Core fields considered: 22
- Known-value coverage across core fields: 57.09% (200979/352022)
- Average known core fields per biography: 12.56/22

## Geography Coverage

- Entries with resolved geocoded country: 86.48% (13838/16001)
- Entries not in current Germany (of resolved geocoded locations): 34.97% (4839/13838)

## Address and Geocoding Detail

- Entries with non-unknown address: 90.89% (14543/16001)
- `street_or_specific` addresses: 56.55% (9048/16001)
- `city_or_region_only` addresses: 20.65% (3305/16001)
- Hard-to-geocode addresses (`multiple_locations`/`ambiguous`/`country_only`): 13.69% (2190/16001)
- Unknown addresses (`address_class = unknown`): 9.11% (1458/16001)
- One modern city/town inferred: 90.62% (14500/16001)
- GeoNames-resolved rows: 86.48% (13838/16001)
- Exact-name GeoNames matches (of resolved rows): 93.20% (12897/13838)
- Population-fallback GeoNames matches (of resolved rows): 6.80% (941/13838)
- Address rows needing review (LLM or GeoNames stage): 14.02% (2243/16001)
- Top geocoded countries: Germany (8999), Austria (1293), Poland (659), France (551), United Kingdom (385)

## Review and Quality Flags

- Explicit quality issues (`quality_issue = true`): 3.47% (556/16001)
- Routed for review (`needs_validation = true`): 82.48% (13197/16001)
- Review-routed rows without explicit quality issue: 79.00% (12641/16001)
- Explicit quality issues among review-routed rows: 4.21% (556/13197)
- Cross-reference entries (`cross_reference = true`): 4.54% (727/16001)
- Identifier issues detected (`identifier_missing_detected = true`): 0.00% (0/16001)

## Family Information Coverage

- Any family info present (`father`/`mother`/`ancestors`/`spouse`/`children`): 58.51% (9362/16001)
- `father` present: 51.42% (8228/16001)
- `mother` present: 42.12% (6740/16001)
- `ancestors` present: 11.27% (1804/16001)
- `spouse` present: 44.97% (7195/16001)
- `children` present: 30.78% (4925/16001)
- Family-field completeness (all family slots): 36.11% (28892/80005)

## Party and Membership Coverage

- `political_party` present: 11.83% (1893/16001)
- `memberships` present: 40.01% (6402/16001)
- Party or memberships present: 43.55% (6969/16001)
- Top political parties (known values): nationalliberal (400), Zentrum (239), Konservativ (57), konservativ (51), liberal (47)
- Repeated memberships (>1 occurrence): Mitglied des Vereins Berliner Presse (28), Mitglied des Reichstags (27), Mitglied der II. sächsischen Kammer (25), Ehrenritter des Johanniter-Ordens (25), Verein Berliner Presse (20)

## Extraction Fact-F1 (Heuristic)

- Rows evaluated: 16001
- Fields evaluated: 12
- TP/FP/FN/MIS counts: 76377/11/1066/111
- Micro precision: 0.9984
- Micro recall: 0.9862
- Micro F1: 0.9923
- Macro F1: 0.9764
- Lowest field-F1 entries:
  - `collections`: F1 0.8802, P 0.9752, R 0.8021, TP 904, FP 0, FN 223, MIS 23
  - `political_party`: F1 0.9132, P 0.9962, R 0.8430, TP 1837, FP 0, FN 342, MIS 7
  - `hobbies`: F1 0.9589, P 0.9864, R 0.9328, TP 2028, FP 0, FN 146, MIS 28
  - `memberships`: F1 0.9885, P 0.9917, R 0.9853, TP 6356, FP 0, FN 95, MIS 53
  - `ancestors`: F1 0.9909, P 1.0000, R 0.9820, TP 1803, FP 0, FN 33, MIS 0

## Occupation Classification

- Coverage vs biographies in Excel: 100.00% (16001/16001)
- OpenAI-classified rows (`classification_method = openai`): 97.16% (15546/16001)
- Unknown occupation text (`classification_method = rule_fallback`): 2.84% (455/16001)
- Mean confidence (all classified rows): 0.8549
- Low-confidence OpenAI rows (`confidence < 0.75`): 11.03% (1715/15546)
- Unclear / mixed / title-only rows (`occupation_category = 11`): 5.42% (868/16001)
- Largest occupation groups: 1. Bildung & Wissenschaft 30.24% (4839), 7. Kunst, Literatur, Musik & Medien 21.05% (3369), 4. Staat, Verwaltung, Politik & Diplomatie 14.85% (2376), 5. Militaer 8.24% (1319), Remaining categories 25.61% (4098)
- Title markers in classified rows: nobility 9.58% (1533/16001), court 15.16% (2425/16001)

### Occupation Classification Figure

![Occupation groups in Degener biographies](occupation_classification_distribution.png)
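
## Fact-F1 Arithmetic Check

The report does not say how mismatches (MIS) enter the Fact-F1 scores. The published figures are reproducible, to four decimals, if MIS is counted against precision but not against recall, i.e. P = TP / (TP + FP + MIS) and R = TP / (TP + FN). The sketch below checks that reading against the reported counts. The formulas are an inference from the numbers, not a documented definition, and the names `FieldCounts`, `precision`, `recall`, and `f1` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldCounts:
    """Counts as listed in the report: TP, FP, FN, and MIS (mismatch)."""
    tp: int
    fp: int
    fn: int
    mis: int


def precision(c: FieldCounts) -> float:
    # ASSUMPTION (inferred from the reported values): mismatches count
    # as precision errors alongside false positives.
    denom = c.tp + c.fp + c.mis
    return c.tp / denom if denom else 0.0


def recall(c: FieldCounts) -> float:
    # ASSUMPTION (inferred from the reported values): mismatches do not
    # count against recall; only false negatives do.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1(c: FieldCounts) -> float:
    # Harmonic mean of the precision and recall defined above.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Counts copied from the report.
MICRO = FieldCounts(tp=76377, fp=11, fn=1066, mis=111)
COLLECTIONS = FieldCounts(tp=904, fp=0, fn=223, mis=23)

if __name__ == "__main__":
    for name, counts in (("micro", MICRO), ("collections", COLLECTIONS)):
        print(
            f"{name}: P={precision(counts):.4f} "
            f"R={recall(counts):.4f} F1={f1(counts):.4f}"
        )
```

Under this reading, the micro counts and the `collections` counts both round to the precision, recall, and F1 values listed in the report. The macro F1 (0.9764) cannot be rechecked this way, because only five of the twelve per-field scores are listed.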