---
name: autoresearch-genealogy
description: Structured prompts, vault templates, and autonomous research workflows for AI-assisted genealogy using Claude Code.
triggers:
  - help me research my family tree with AI
  - set up genealogy vault template
  - run autoresearch genealogy prompts
  - find ancestors using Claude Code
  - genealogy research workflow setup
  - cross-reference audit for family history
  - DNA chromosome analysis genealogy
  - organize genealogy documents with AI
---

# autoresearch-genealogy

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

A structured system of autoresearch prompts, Obsidian vault templates, archive guides, and methodology references for AI-assisted genealogy research. Built for Claude Code's autonomous research loops, adaptable to any AI tool or manual workflow.

---

## What This Project Does

- Provides 12 Claude Code `/autoresearch` prompts that autonomously search the web, update your vault, and self-verify results
- Supplies a complete 19-file Obsidian vault starter kit with YAML frontmatter and markdown templates
- Includes 24 country/region-specific archive guides (Europe, Americas, Oceania, Jewish genealogy)
- Offers 9 methodology reference documents covering confidence tiers, DNA guardrails, naming conventions, and source hierarchy
- Defines 7 step-by-step workflows for OCR pipelines, oral history, discrepancy resolution, and phase planning

---

## Installation

```bash
# Clone the repository
git clone https://github.com/mattprusak/autoresearch-genealogy.git
cd autoresearch-genealogy

# Copy vault template into your Obsidian vault
cp -r vault-template/ ~/path/to/your/ObsidianVault/genealogy/

# Or copy to any markdown editor folder
cp -r vault-template/ ~/Documents/my-genealogy/
```

No package manager or build step required — this is a pure markdown/prompt project.

---

## Project Structure

```
autoresearch-genealogy/
├── prompts/              # 12 autoresearch prompt files for Claude Code
├── vault-template/       # 19-file Obsidian vault starter kit
│   ├── Family_Tree.md
│   ├── Research_Log.md
│   ├── Open_Questions.md
│   ├── templates/        # Person, certificate, postcard, region, etc.
│   └── ...
├── archives/             # 24 country/region research guides
├── reference/            # 9 methodology documents
├── workflows/            # 7 step-by-step process guides
└── examples/             # 6 anonymized worked examples
```

---

## Quick Start Workflow

### Step 1: Seed your family tree

Open `vault-template/Family_Tree.md` and fill in what you already know, starting with yourself and working backward:

```markdown
---
title: Family Tree
last_updated: 2026-03-19
generations_documented: 3
lines_active: 2
---

# Family Tree

## Generation 1 (Self)
- **Name**: Jane Smith (b. 1985, Chicago, IL)

## Generation 2 (Parents)
- **Father**: John Smith (b. 1955, Detroit, MI)
- **Mother**: Mary O'Brien (b. 1958, Boston, MA)

## Generation 3 (Grandparents)
- **Paternal Grandfather**: Robert Smith (b. ~1920, unknown)
- **Paternal Grandmother**: Helen Kowalski (b. ~1925, Poland?)
```
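
Entries marked with `~`, `?`, or `unknown` in the seed tree are natural candidates for `Open_Questions.md`. The following is a minimal sketch for listing them, assuming the one-line entry format shown in the example above; the `uncertain_entries` helper and its regular expression are illustrative and not part of the project.

```python
import re

# Assumed entry format, copied from the Family_Tree.md example:
#   - **Role**: Name (b. YEAR, PLACE)
ENTRY = re.compile(
    r"^- \*\*(?P<role>.+?)\*\*: (?P<name>.+?) \(b\. (?P<year>.+?), (?P<place>.+?)\)\s*$"
)


def uncertain_entries(tree_text: str) -> list[dict]:
    """Return entries whose birth year or place is approximate or unknown."""
    flagged = []
    for line in tree_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # headings, frontmatter, and free text are ignored
        entry = match.groupdict()
        year, place = entry["year"], entry["place"]
        if year.startswith("~") or place.lower() == "unknown" or place.endswith("?"):
            flagged.append(entry)
    return flagged


tree = """\
- **Father**: John Smith (b. 1955, Detroit, MI)
- **Paternal Grandfather**: Robert Smith (b. ~1920, unknown)
- **Paternal Grandmother**: Helen Kowalski (b. ~1925, Poland?)
"""

# Print checkbox lines in the style used by Open_Questions.md
for entry in uncertain_entries(tree):
    print(f"- [ ] Confirm birth details for {entry['name']} ({entry['year']}, {entry['place']})")
```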

### Step 2: Scan physical documents

Photograph or scan certificates, letters, and postcards, then follow the OCR workflow:

```
See: workflows/ocr-pipeline.md
```

### Step 3: Run autoresearch prompts in Claude Code

```
/autoresearch prompts/01-tree-expansion.md
```

### Step 4: Audit and verify

```
/autoresearch prompts/02-cross-reference-audit.md
```

---

## Autoresearch Prompts — Reference

Each prompt in `prompts/` follows this structure:

```markdown
## Goal
[What this iteration should accomplish]

## Metric
[Measurable success condition — e.g., "increase sourced person files from N to N+10"]

## Direction
[Step-by-step instructions for the AI]

## Verify
[Cross-check to run after each iteration]

## Guard Rails
[What NOT to do — prevent hallucination, preserve source rigor]

## Iterations
[How many loops to run before stopping for human review]

## Protocol
[Output format, file naming, YAML fields to populate]
```
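
When writing or adapting a prompt, it can help to confirm that all seven sections are present before running it. The sketch below is one way to do that; the section names come from the structure above, while the `missing_sections` helper is hypothetical and not shipped with the project.

```python
import re

# Section headings taken from the prompt structure described above.
REQUIRED_SECTIONS = [
    "Goal", "Metric", "Direction", "Verify",
    "Guard Rails", "Iterations", "Protocol",
]


def missing_sections(prompt_text: str) -> list[str]:
    """Return the required section names that have no '## ' heading."""
    found = {
        m.group(1).strip()
        for m in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", prompt_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]


draft = "## Goal\nExpand the tree.\n\n## Metric\nSourced person files from 42 to 52.\n"
print(missing_sections(draft))
```

An empty list means every expected heading was found; it says nothing about whether the section bodies are well written.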

### All 12 Prompts

| File | Purpose |
|------|---------|
| `01-tree-expansion.md` | Push every branch back using web research |
| `02-cross-reference-audit.md` | Find and fix discrepancies between tree and sources |
| `03-findagrave-sweep.md` | Locate Find a Grave memorials for deceased ancestors |
| `04-gedcom-completeness.md` | Sync GEDCOM file with vault data |
| `05-source-citation-audit.md` | Verify every person has ≥2 independent sources |
| `06-unresolved-persons.md` | Identify and resolve unnamed people in documents |
| `07-timeline-gap-analysis.md` | Find life events where records should exist but don't |
| `08-open-question-resolution.md` | Systematically attack every open research question |
| `09-bygdebok-extraction.md` | Extract data from digitized local history books |
| `10-colonial-records-search.md` | Search pre-1800 colonial American records |
| `11-immigration-search.md` | Locate passenger manifests and naturalization records |
| `12-dna-chromosome-analysis.md` | Analyze per-chromosome ancestry data |

### Running a prompt in Claude Code

```
# In Claude Code terminal or chat:
/autoresearch prompts/08-open-question-resolution.md

# With a specific vault path context:
/autoresearch prompts/03-findagrave-sweep.md --context vault-template/Family_Tree.md
```

---

## Vault Template Files

### Person file template (`vault-template/templates/person.md`)

```markdown
---
full_name: ""
birth_date: ""
birth_place: ""
death_date: ""
death_place: ""
father: ""
mother: ""
spouse: ""
children: []
confidence: "Moderate Signal"  # Strong Signal | Moderate Signal | Speculative
sources: []
open_questions: []
last_updated: ""
---

# [Full Name]

## Life Events

| Event | Date | Place | Source |
|-------|------|-------|--------|
| Birth | | | |
| Marriage | | | |
| Death | | | |

## Sources

1. [Source 1 — type, repository, date accessed]
2. [Source 2 — type, repository, date accessed]

## Open Questions

- [ ] Question 1
- [ ] Question 2

## Notes

[Narrative summary, naming variant notes, contextual history]
```
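
Scripts that audit the vault need to read this frontmatter. The sketch below parses only the small YAML subset used in the person template (quoted scalars, inline `[]` lists, and simple `- item` lists) with the standard library; it is an assumption-laden illustration, and a full YAML library would be more robust. The `parse_frontmatter` name and the two-source check (mirroring the goal of `05-source-citation-audit.md`) are hypothetical.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse scalars and simple lists from a leading '---' frontmatter block.

    Limitations of this sketch: no nested mappings, and a ' #' inside a
    quoted value would be mistaken for a trailing comment.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of frontmatter
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("- ") and current_key is not None:
            # Block-style list item belonging to the most recent key
            if not isinstance(data[current_key], list):
                data[current_key] = []
            data[current_key].append(stripped[2:].strip().strip('"'))
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue
        value = value.split(" #", 1)[0].strip()  # drop trailing comment
        current_key = key.strip()
        if value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()
            data[current_key] = [v.strip().strip('"') for v in inner.split(",")] if inner else []
        else:
            data[current_key] = value.strip('"')
    return data


sample = """---
full_name: "Helen Kowalski"
confidence: "Moderate Signal"  # Strong Signal | Moderate Signal | Speculative
sources: []
children:
  - "John Smith"
---
# Helen Kowalski
"""

person = parse_frontmatter(sample)
if len(person["sources"]) < 2:
    print(f"{person['full_name']}: fewer than 2 sources listed")
```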

### Certificate transcription template (`vault-template/templates/certificate.md`)

```markdown
---
document_type: ""        # birth | death | marriage | baptism
document_date: ""
repository: ""
file_reference: ""
transcribed_by: ""
transcription_date: ""
confidence: ""
---

# Certificate: [Type] — [Name] — [Year]

## Transcription

[Verbatim transcription of the document]

## Key Data Extracted

- **Subject**: 
- **Date**: 
- **Place**: 
- **Witnesses/Informants**: 
- **Officiant**: 

## Discrepancies

[Note any conflicts with other sources]

## Image

![[filename.jpg]]
```

### Research log entry pattern (`vault-template/Research_Log.md`)

```markdown
## 2026-03-19 — Tree Expansion Session

**Prompt run**: 01-tree-expansion.md  
**Iterations**: 5  
**Metric start**: 42 sourced person files  
**Metric end**: 51 sourced person files  

### Searches Performed
- FamilySearch: "Kowalski Poznan 1880–1920" — 3 results, 2 useful
- Ancestry: "Smith Michigan census 1920" — found Robert Smith (b. 1919)
- FindAGrave: "Helen Kowalski Detroit" — memorial #12345678

### Negative Results (Important)
- No passenger manifest found for Stanislaw Kowalski, searched 1890–1910
- No church records found for O'Brien line in Cork pre-1850

### New Open Questions
- [ ] Was Robert Smith born in Michigan or Ohio? 1920 census says MI, 1930 says OH.
```

---

## Confidence Tier System

From `reference/confidence-tiers.md`:

```
Strong Signal    — Two or more independent primary sources agree
Moderate Signal  — One primary source, or two secondary sources agree
Speculative      — Logical inference, DNA suggestion, or single secondary source
```

Record the tier in the YAML frontmatter of every person file:

```markdown
---
confidence: "Moderate Signal"
---
```
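
The tier definitions can be read as a simple decision rule. The sketch below encodes them under two assumptions that the counts alone cannot check: the sources counted are independent of one another, and they agree on the claim in question. The `confidence_tier` function is illustrative, not part of the project.

```python
def confidence_tier(primary: int, secondary: int) -> str:
    """Map counts of agreeing, independent sources to a confidence tier.

    primary   -- number of primary sources supporting the claim
    secondary -- number of secondary sources supporting the claim
    """
    if primary >= 2:
        return "Strong Signal"
    if primary == 1 or secondary >= 2:
        return "Moderate Signal"
    # Inference, DNA suggestion, or a single secondary source
    return "Speculative"


print(confidence_tier(primary=2, secondary=0))
print(confidence_tier(primary=0, secondary=2))
print(confidence_tier(primary=0, secondary=1))
```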

---

## Archive Guides — Key Countries

Each guide in `archives/` covers:
- Where to find records (free vs paid)
- What AI tools can access directly vs what requires a browser
- Record types available by era

```
archives/
├── ireland.md
├── england-wales.md
├── scotland.md
├── norway.md
├── sweden.md
├── poland.md
├── germany.md
├── italy.md
├── france.md
├── spain-portugal.md
├── netherlands.md
├── austria.md
├── hungary.md
├── russia-ukraine.md
├── usa-colonial.md
├── usa-immigration.md
├── usa-census.md
├── usa-vital-records.md
├── african-american.md
├── canada.md
├── mexico-latin-america.md
├── australia-new-zealand.md
├── jewish-genealogy.md
└── ...
```

Example usage in a prompt:

```markdown
# In prompts/09-bygdebok-extraction.md
## Direction
Consult archives/norway.md for Digitalarkivet access patterns.
Search Bygdebok collections for the Rogaland region, 1750–1900.
```

---

## Common Patterns

### Pattern 1: New ancestor intake

When a new ancestor is found during research:

```markdown
1. Create person file from vault-template/templates/person.md
2. Set confidence per reference/confidence-tiers.md (number and type of independent sources)
3. Add to Family_Tree.md under correct generation
4. Log the discovery in Research_Log.md
5. Add unresolved questions to Open_Questions.md
6. Run 02-cross-reference-audit.md to check for conflicts
```

### Pattern 2: Resolving a discrepancy between sources

```markdown
# Open_Questions.md entry
## Q-042: Robert Smith birth state conflict
- 1920 census: born Michigan
- 1930 census: born Ohio
- Status: Unresolved
- Next step: Run 07-timeline-gap-analysis.md targeting Robert Smith
```

Then in Claude Code:

```
/autoresearch prompts/07-timeline-gap-analysis.md
# Focus: Robert Smith, b. ~1919, discrepancy Q-042
```
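
Conflicts like Q-042 can also be surfaced mechanically once claims are recorded per source. The sketch below groups claims by field and reports any field with more than one distinct value; the `(source, field, value)` tuple layout and the `find_conflicts` helper are assumptions made for illustration, and the birth-year rows are invented sample data.

```python
from collections import defaultdict


def find_conflicts(claims: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group (source, field, value) claims and return fields with disagreeing values."""
    by_field: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for source, field, value in claims:
        by_field[field].setdefault(value, []).append(source)
    # Keep only fields where the sources report more than one distinct value
    return {field: values for field, values in by_field.items() if len(values) > 1}


claims = [
    ("1920 census", "birth_state", "Michigan"),
    ("1930 census", "birth_state", "Ohio"),
    ("1920 census", "birth_year", "1919"),  # illustrative sample data
    ("1930 census", "birth_year", "1919"),  # illustrative sample data
]

for field, values in find_conflicts(claims).items():
    print(f"Conflict in {field}:")
    for value, sources in values.items():
        print(f"  {value}: {', '.join(sources)}")
```

Exact string comparison is deliberately strict here; spelling variants of the same place would be reported as conflicts and need human review.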

### Pattern 3: DNA-to-genealogy mapping

```markdown
# In vault-template/Genetic_Profile.md
---
test_company: AncestryDNA
test_date: 2024-11-01
ethnicity_summary:
  - region: Eastern Europe
    percentage: 38
  - region: Ireland/Scotland
    percentage: 31
---

# Then run:
/autoresearch prompts/12-dna-chromosome-analysis.md
```

### Pattern 4: Immigration research loop

```
# Run immigration search prompt
/autoresearch prompts/11-immigration-search.md

# Prompt will:
# 1. Pull all foreign-born ancestors from Family_Tree.md
# 2. Search passenger manifests (Ellis Island, Ancestry, FamilySearch)
# 3. Search naturalization records (NARA, Ancestry)
# 4. Update person files with ship name, arrival date, port
# 5. Log negative results for each unresolved ancestor
```

---

## Reference Documents

| File | Contents |
|------|---------|
| `reference/confidence-tiers.md` | Strong / Moderate / Speculative definitions |
| `reference/source-hierarchy.md` | Primary vs secondary vs derivative sources |
| `reference/dna-guardrails.md` | What DNA can and cannot prove; centimorgan thresholds |
| `reference/naming-conventions.md` | Patronymics, farm names, Polish przydomki |
| `reference/gedcom-guide.md` | GEDCOM field reference and export instructions |
| `reference/common-pitfalls.md` | AI hallucination patterns in genealogy, date traps |
| `reference/glossary.md` | Record type definitions, Latin terms, abbreviations |
| `reference/ai-capabilities.md` | What AI can access directly vs what requires a human |
| `reference/case-for-autoresearch.md` | Methodology rationale |

---

## Troubleshooting

### AI is inventing sources

Set guard rails explicitly in your prompt session:

```markdown
## Guard Rails (add to any prompt)
- Do NOT fabricate census record URLs or Ancestry record IDs
- If a source cannot be directly linked, mark as "reported" not "confirmed"
- All new claims require a real URL or repository reference
- When uncertain, add to Open_Questions.md — do not guess
```

### Vault files getting out of sync with GEDCOM

Run the completeness audit:

```
/autoresearch prompts/04-gedcom-completeness.md
```

This compares every person in your GEDCOM against vault person files and flags mismatches.
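
How the prompt performs this comparison is not specified here. As a rough illustration of the idea, the sketch below extracts individual names from GEDCOM text (`0 @...@ INDI` records with `1 NAME Given /Surname/` lines) and diffs them against a set of vault names; the `gedcom_names` and `compare` helpers are hypothetical, and matching on exact names is a simplifying assumption.

```python
def gedcom_names(gedcom_text: str) -> set[str]:
    """Collect names from level-1 NAME lines inside INDI records."""
    names = set()
    in_individual = False
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # A new top-level record starts; track whether it is an individual
            in_individual = len(parts) == 3 and parts[2].strip() == "INDI"
        elif level == "1" and in_individual and parts[1] == "NAME" and len(parts) == 3:
            # "John /Smith/" -> "John Smith"
            names.add(" ".join(parts[2].replace("/", " ").split()))
    return names


def compare(gedcom: set[str], vault: set[str]) -> dict[str, list[str]]:
    """Report names present on one side only."""
    return {
        "missing_from_vault": sorted(gedcom - vault),
        "missing_from_gedcom": sorted(vault - gedcom),
    }


sample_gedcom = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1955
0 @I2@ INDI
1 NAME Mary /O'Brien/
0 @F1@ FAM
1 HUSB @I1@
0 TRLR
"""
vault_names = {"John Smith", "Helen Kowalski"}

print(compare(gedcom_names(sample_gedcom), vault_names))
```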

### Name variants causing duplicate person files

Check `reference/naming-conventions.md` for the regions relevant to your family. Common traps:

- Norwegian farm name changes (Haugen → Bakke on emigration)
- Polish name Latinization in church records (Stanisław → Stanislaus)
- Irish anglicization (Ó Briain → O'Brien → Bryan)
- Spelling variation in census records ("Sakkarias" vs "Zacharias" — both valid)

Add aliases to the person file's YAML frontmatter:

```markdown
---
full_name: "Stanisław Kowalski"
name_variants:
  - "Stanislaus Kowalski"
  - "Stanley Kowalski"
  - "S. Kowalski"
---
```
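
With variants recorded, a lookup table can catch a duplicate before a new person file is created. The sketch below normalizes names (case, whitespace, diacritics) and maps every variant back to its canonical `full_name`; the `normalize_name` and `build_alias_index` helpers are illustrative assumptions, not part of the project. Note that `ł` and `ø` have no Unicode decomposition, so they are mapped explicitly.

```python
import unicodedata

# Letters that NFKD normalization does not decompose; extend for your regions.
_EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O"})


def normalize_name(name: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.translate(_EXTRA))
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def build_alias_index(people: dict[str, list[str]]) -> dict[str, str]:
    """Map each normalized name or variant to its canonical full_name."""
    index: dict[str, str] = {}
    for canonical, variants in people.items():
        for name in [canonical, *variants]:
            index[normalize_name(name)] = canonical
    return index


people = {
    "Stanisław Kowalski": ["Stanislaus Kowalski", "Stanley Kowalski", "S. Kowalski"],
}
index = build_alias_index(people)

candidate = "STANLEY  KOWALSKI"
existing = index.get(normalize_name(candidate))
print(existing or "no existing person file matches")
```

A match only means the names coincide; dates and places still need checking before two records are treated as the same person.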

### Autoresearch loop running too long

Each prompt has an `## Iterations` section. Set the limit explicitly:

```markdown
## Iterations
Run 3 iterations maximum, then stop and output a summary for human review.
```

### OCR producing poor results on old documents

See `workflows/ocr-pipeline.md`. General guidance:

1. Scan at 600 DPI minimum, or photograph at the highest resolution your camera allows
2. Use even, diffuse lighting — no flash
3. Pre-process with a contrast adjustment before running OCR
4. Use `vault-template/templates/transcription.md` to record both the OCR output and your manual corrections side by side

---

## Contributing

To add a new archive guide or prompt:

1. Follow the existing file structure and YAML frontmatter patterns
2. Use placeholder names in all examples (no real family data)
3. Open a PR with a brief description of what region or record type you've added

License: MIT