--- name: tlaplus-spec-generator description: "Automatically generate TLA+ specifications from source code (C/C++, Python) for formal verification of distributed systems. Use when users need to: (1) Generate TLA+ specs from program implementations, (2) Model distributed systems, consensus protocols, or concurrent algorithms, (3) Extract state variables, actions, and invariants from code, (4) Create formal specifications for model checking with TLC, (5) Verify safety and liveness properties of distributed systems. Particularly effective for message-passing systems, replication protocols, consensus algorithms, and distributed transactions." --- # TLA+ Spec Generator Automatically generate TLA+ specifications from program implementations for formal verification of distributed systems. ## Overview This skill transforms imperative programs (C/C++, Python) into declarative TLA+ specifications. It analyzes program structure to identify state variables, actions, and system behavior, then generates well-structured TLA+ modules suitable for model checking with TLC. ## Workflow ### 1. Analyze Source Code Generate TLA+ specification from source files: ```bash # Single file python3 scripts/generate_spec.py program.py -o Spec.tla # Multiple files python3 scripts/generate_spec.py server.py client.py protocol.py -o Protocol.tla # With module name python3 scripts/generate_spec.py distributed_system.c -o System.tla --module-name DistributedSystem ``` The generator automatically: - Detects programming language - Parses source code - Identifies state variables and actions - Extracts system structure - Generates TLA+ specification ### 2. Specify System Parameters For distributed systems, specify the number of processes: ```bash python3 scripts/generate_spec.py consensus.py -o Consensus.tla --processes 3 ``` This creates a constant `N` in the TLA+ spec representing the number of processes/nodes. ### 3. Generated Output The generator produces two files: **1. Spec.tla** - Complete TLA+ specification: ```tla ---- MODULE Spec ---- EXTENDS Naturals, Sequences, FiniteSets, TLC CONSTANTS N \* Number of processes VARIABLES state, messages, committed vars == <> TypeOK == /\ state \in [1..N -> {"Init", "Working", "Done"}] /\ messages \in SUBSET Messages /\ committed \in SUBSET Operations Init == /\ state = [p \in 1..N |-> "Init"] /\ messages = {} /\ committed = {} SendMessage(p, msg) == /\ state[p] = "Working" /\ messages' = messages \cup {msg} /\ UNCHANGED <> Next == \/ \E p \in 1..N, msg \in Messages : SendMessage(p, msg) Spec == Init /\ [][Next]_vars ==== ``` **2. Spec_mapping.txt** - Explanation of program-to-TLA+ mapping: - State variables and their types - Actions and what they modify - Processes identified - Constants defined ### 4. Refine the Specification The generated spec is a starting point. Refine it by: 1. **Review state variables** - Ensure all relevant state is captured 2. **Complete action definitions** - Add preconditions and effects 3. **Add invariants** - Define safety properties 4. **Add temporal properties** - Define liveness properties 5. **Adjust abstraction** - Balance detail vs. tractability ### 5. Model Check with TLC Create a TLC configuration file (Spec.cfg): ``` CONSTANTS N = 3 SPECIFICATION Spec INVARIANT TypeOK ``` Run TLC model checker: ```bash tlc Spec.tla -config Spec.cfg ``` ## Abstraction Levels Control the level of abstraction: ```bash # Low abstraction (more detail, larger state space) python3 scripts/generate_spec.py program.py -o Spec.tla --abstraction low # Medium abstraction (balanced, recommended) python3 scripts/generate_spec.py program.py -o Spec.tla --abstraction medium # High abstraction (minimal states, protocol-level) python3 scripts/generate_spec.py program.py -o Spec.tla --abstraction high ``` **Medium abstraction (default):** - Booleans → preserved - Integers → bounded ranges (0..N) - Enums → preserved as string sets - Collections → sequences or sets - Focus on control flow and key state ## Common Use Cases ### Distributed Consensus Protocol **Scenario:** Implementing Raft or Paxos consensus algorithm. **Approach:** 1. Generate spec from implementation 2. Identify processes (nodes), state variables (term, log, votes) 3. Extract actions (RequestVote, AppendEntries, etc.) 4. Add safety properties (leader uniqueness, log consistency) 5. Verify with TLC **See:** [distributed_patterns.md](references/distributed_patterns.md#pattern-2-consensus-protocols) ### Message-Passing System **Scenario:** Distributed system with processes communicating via messages. **Approach:** 1. Generate spec from message handlers 2. Model network as set of messages in transit 3. Define Send and Receive actions 4. Add properties (message ordering, delivery guarantees) 5. Check for deadlocks and livelocks **See:** [distributed_patterns.md](references/distributed_patterns.md#pattern-1-message-passing) ### Replication Protocol **Scenario:** Primary-backup or multi-master replication. **Approach:** 1. Generate spec from replication logic 2. Model replicas and their logs 3. Define replication and commit actions 4. Add consistency properties 5. Verify under failures **See:** [distributed_patterns.md](references/distributed_patterns.md#pattern-4-replication) ### Leader Election **Scenario:** Implementing leader election algorithm. **Approach:** 1. Generate spec from election code 2. Model process states (follower, candidate, leader) 3. Extract election actions 4. Add safety (at most one leader) and liveness (eventually a leader) 5. Verify correctness **See:** [distributed_patterns.md](references/distributed_patterns.md#pattern-3-leader-election) ## Advanced Options ### Focus on Specific Functions Extract only relevant functions: ```bash python3 scripts/generate_spec.py system.py -o Spec.tla \ --focus-functions send_message receive_message commit_transaction ``` ### Track Specific Variables Explicitly specify state variables: ```bash python3 scripts/generate_spec.py system.py -o Spec.tla \ --track-vars state messages committed_ops leader_id ``` ## Refining Generated Specs Generated specs need refinement. Common refinements: **1. Complete action preconditions:** ```tla \* Generated (incomplete) SendMessage(p, msg) == /\ messages' = messages \cup {msg} \* Refined (with precondition) SendMessage(p, msg) == /\ state[p] = "Active" \* Precondition /\ msg \notin messages \* No duplicates /\ messages' = messages \cup {msg} /\ UNCHANGED <> ``` **2. Add invariants:** ```tla \* Safety properties SafetyInvariant == /\ \A p \in Procs : state[p] \in ValidStates /\ Cardinality({p \in Procs : state[p] = "Leader"}) <= 1 \* Add to spec INVARIANT TypeOK INVARIANT SafetyInvariant ``` **3. Add liveness properties:** ```tla \* Eventually reach consensus PROPERTY <>[](\A p \in Procs : state[p] = "Committed") \* Every request is eventually processed PROPERTY \A req \in Requests : [](Submitted(req) => <>Processed(req)) ``` **4. Add fairness:** ```tla \* Weak fairness: continuously enabled actions eventually happen Spec == Init /\ [][Next]_vars /\ WF_vars(ReceiveMessage) \* Strong fairness: infinitely often enabled actions eventually happen Spec == Init /\ [][Next]_vars /\ SF_vars(ElectLeader) ``` ## Tips - **Start with small models** - Use N=2 or N=3 processes initially - **Check TypeOK first** - Ensures basic correctness before complex properties - **Use symmetry** - Add `SYMMETRY Permutations(Procs)` to reduce state space - **Abstract data values** - Focus on protocol behavior, not data content - **Iterate abstraction** - If TLC is too slow, increase abstraction - **Read the mapping file** - Understand how program maps to TLA+ - **Consult patterns** - See reference docs for common distributed system patterns ## Common Issues **State explosion:** TLC runs out of memory or takes too long. - Solution: Increase abstraction, reduce N, add state constraints, use symmetry **Deadlock detected:** TLC finds states with no enabled actions. - Solution: Check action preconditions, ensure progress is always possible, add fairness **Invariant violated:** TLC finds counterexample. - Solution: Review counterexample trace, fix implementation or weaken invariant **Spec too abstract:** Properties are trivially true. - Solution: Decrease abstraction, track more variables, preserve more detail ## References - **[tlaplus_syntax.md](references/tlaplus_syntax.md)**: Complete TLA+ syntax reference with examples - **[distributed_patterns.md](references/distributed_patterns.md)**: Common patterns for distributed systems (consensus, replication, leader election) ## Scripts - **generate_spec.py**: Main generation script (orchestrates entire process) - **program_analyzer.py**: Program analysis module (extracts state, actions, processes) - **tlaplus_generator.py**: TLA+ generation module (produces TLA+ specifications) ## External Tools - **TLA+ Toolbox**: IDE for writing and checking TLA+ specs (https://lamport.azurewebsites.net/tla/toolbox.html) - **TLC Model Checker**: Command-line model checker (included with Toolbox) - **TLAPS**: TLA+ Proof System for theorem proving (optional)