# Contributing to Linglib

Linglib is a community library for formal linguistics in Lean 4. This
guide describes the most common ways to contribute, with the
conventions you'll need to know inlined. The codebase is also
extensively documented in module docstrings — when in doubt, look at
existing files in the same area.

## Getting set up

1. Install [Lean 4 via elan](https://leanprover-community.github.io/get_started.html).
2. Clone the repo and fetch the mathlib cache:
   ```bash
   git clone https://github.com/hawkrobe/linglib
   cd linglib
   lake exe cache get
   lake build
   ```
3. Build takes ~5 minutes on a warm cache. `sorry`s in the codebase
   are intentional placeholders for in-progress proofs.

## Four entry points

### Formalize a paper

Most contributions start by picking a paper from a subfield you know
and writing a study file at `Linglib/Studies/{AuthorYear}.lean` (study
files are flat, named author-year).

A study file typically:
- Imports the relevant theory layer (`Linglib/Semantics/`,
  `Linglib/Syntax/`, `Linglib/Pragmatics/`, …).
- Imports any cross-linguistic data needed from `Linglib/Fragments/`.
- Encodes the paper's model — definitions, denotations, constraint
  ranking, whatever the paper specifies — in Lean syntax.
- Proves the predictions the paper claims, using the encoded model.
- Cites the paper in the module docstring with `@cite{key}` (add the
  BibTeX entry to `blog/references.bib` first).

Look at any existing file under `Linglib/Studies/` for a template —
`Studies/Jenks2018.lean` (definiteness) and
`Studies/FrankGoodman2012PMF.lean` (an RSA implicature model) are
reasonable starting points. Theory-neutral empirical data lives
separately under `Linglib/Phenomena/{Phenomenon}/`.

### Add a Fragment

A Fragment under `Linglib/Fragments/{Language}/` exposes the lexical
entries and typological metadata for one language. Even a sparse
fragment — say, a handful of pronouns and a tense paradigm — is
useful, because it lets phenomena studies cite real cross-linguistic
data instead of toy examples.

If you work on a language whose typological profile is poorly
represented (most non-Indo-European languages outside our existing
sample), a small contribution here goes a long way. See an existing
fragment (e.g. `Fragments/Mayan/Tseltalan.lean`) for the schema.

### Pick something off the wishlist

The [wishlist](https://linglib.io/wishlist/) is a
curated list of theorems, papers, languages, and tools we'd love to
see formalized — modeled on Lean's
[1000+ theorems](https://leanprover-community.github.io/1000.html).
Items are tagged by difficulty (A: quick win · B: moderate · C: hard ·
D: blocked on upstream).

To claim an item before starting,
[open a tracking issue](https://github.com/hawkrobe/linglib/issues/new)
linking the wishlist line so others know you're working on it. To
add a new wish, send a PR editing
[`blog/content/wishlist.md`](blog/content/wishlist.md).

If you find a stub whose statement seems wrong (the underlying claim
doesn't hold, or the formalization mis-states the empirical claim),
that's also a valuable contribution — open an issue explaining the
problem.

### Write a post or report a problem

The [`blog/`](blog/) directory accepts posts on formalization
decisions, comparisons of frameworks, write-ups of completed work,
or critical readings of existing files.

If a definition seems wrong, a theorem trivializes the empirical
content, or you spot an inconsistency between two theories that
should agree, [open an issue](https://github.com/hawkrobe/linglib/issues).
Critical readings are welcome — formalization makes mistakes visible,
and we want them caught.

## Code conventions

A few conventions are worth knowing before your first PR. Look at
existing files in the same area for full context — the codebase is
the canonical reference.

**File structure.** Every `.lean` file should have a `/-! ... -/`
module docstring placed *after* the `import` statements (Lean
requires imports to come first).

**Citations.** Cite papers with `@cite{key}`; never inline
`Author (Year)`. Add the BibTeX entry to `blog/references.bib` first
using `lastname-lastname-year` keys, and validate with
`python3 blog/scripts/gen_bibliography.py --check`. The rendered
bibliography is a build artifact (regenerated in CI) — don't commit it.

**Proof style.** Prefer structural tactics (`exact`, `simp only`,
`decide`, `omega`, `linarith`) over bare `simp` or `native_decide`.
Bare `simp` is fragile under simp-set changes; `native_decide`
bypasses kernel verification.

**`sorry` over weakening.** State the full theorem and use `sorry`
to mark it incomplete — a `sorry` warning is explicit, while a
weakened-but-proved theorem can be silently forgotten as not having
captured the intended claim.

**Lean style.** `autoImplicit` is disabled (declare type parameters
explicitly). Use `Type*` not `Type 0`. Prefer `Prop` with
`[DecidablePred]` over `Bool` for predicate-shape data. Use
`lowerCamelCase` for lemma names.

**Imports.** Only import what's needed — no `import Mathlib`
blanket. Cite the leaf file. Unnecessary imports slow builds and
create false dependency edges.

## Pull requests

- Branch from `main`, push to a fork, open a PR.
- One coherent change per PR. Small is good.
- The repo is **squash-merge only**, so the PR title becomes the commit
  on `main` — write it in mathlib's `type(scope): subject` form
  (lowercase imperative, ≤ 72 chars), e.g.
  `feat(Studies): formalize Jenks 2018 definiteness`.
- Put the longer story in the PR description; declare cross-PR
  dependencies with a `Depends on #NNNN` line.

If you're not sure whether a contribution makes sense, open an issue
first to discuss. Substantive feedback on direction is welcome before
you've written code.

## Scope of welcome

Contributions are welcome from undergraduates working through their
first formal-semantics course, postdocs with strong opinions about
particular frameworks, computational linguists with probabilistic
models to formalize, and Lean enthusiasts looking for a domain. You
do not need to be a working academic linguist to contribute, and you
do not need to know mathlib internals to start.