---
name: eval-rubric-legal-soundness
description: Use when scoring AI legal output on whether it correctly states the law and applies it correctly to the facts presented. A 0–5 rubric covering rule statement accuracy, application reasoning, citation reliability, jurisdiction fit, and currency of the law. The primary quality rubric and a deployment blocking gate if it drops significantly.
license: MIT
metadata:
  id: eval.rubric.legal-soundness
  category: eval
  priority: P0
  intent: [__eval__, legal-accuracy, rubric, soundness, quality]
  related: [eval-rubric-citation-quality, eval-rubric-jurisdiction-awareness, eval-rubric-completeness, eval-rubric-hallucination-detection, eval-llm-as-judge-system-prompt, eval-benchmark-runner]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"
---

# Eval Rubric — Legal Soundness (0–5)

## When to use this

Apply to every legal AI output that makes substantive legal assertions — analysis, advice, research, and drafting. Legal soundness is the primary quality dimension: a well-written response stating wrong law is dangerous; a poorly formatted response stating correct law is at least safe to act on.

This rubric is a **deployment blocking gate**: if the aggregate legal soundness score drops by more than 3% compared with the previous run, [[eval-regression-detector]] blocks the deployment pending investigation.
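
A minimal sketch of that gate check, assuming the 3% threshold is a relative drop measured against the previous run's aggregate score (the rubric does not say whether the drop is relative or absolute). The function name and signature are hypothetical and are not the interface of [[eval-regression-detector]]:

```python
def blocks_deployment(previous: float, current: float, max_drop: float = 0.03) -> bool:
    """Return True when the aggregate score fell by more than max_drop.

    Assumption: the drop is relative to the previous run's aggregate score.
    """
    if previous <= 0:
        # No meaningful baseline to compare against, so do not block.
        return False
    relative_drop = (previous - current) / previous
    return relative_drop > max_drop
```

For example, a move from 4.0 to 3.8 is a 5% relative drop and would block, while 4.0 to 3.9 (2.5%) would not.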

Run via [[eval-llm-as-judge-system-prompt]] using an ensemble of judge models.
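
The rubric does not specify how the ensemble's scores are combined. One possible approach, shown here purely as an illustration, takes the median of the judges' 0–5 scores and flags large disagreement for human review. The function name, the `max_spread` parameter, and its default are all assumptions:

```python
from statistics import median


def aggregate_judge_scores(scores: list[int], max_spread: int = 2) -> tuple[float, bool]:
    """Return (median score, needs_review) for one output scored by several judges.

    Assumption: median aggregation and the max_spread threshold are
    illustrative choices, not part of the rubric.
    """
    if not scores:
        raise ValueError("at least one judge score is required")
    if any(not 0 <= s <= 5 for s in scores):
        raise ValueError("judge scores must be between 0 and 5")
    # Flag the output when the judges disagree by more than max_spread points.
    needs_review = max(scores) - min(scores) > max_spread
    return median(scores), needs_review
```

The median is less sensitive than the mean to a single outlier judge, which is why it is used in this sketch.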

## Scoring (0–5)

| Score | Label | Criteria |
|---|---|---|
| **5** | Excellent | All legal propositions correctly stated; citations real and accurate; application reasoning sound and properly tied to the stated facts; counter-arguments or alternative interpretations addressed where they exist; law is current as of the response date |
| **4** | Good | Legal propositions mostly correct; minor citation inaccuracy (formatting issue or slightly wrong article number) or one missed nuance that does not materially affect the conclusion |
| **3** | Acceptable | Substantial correct content with at least one moderate error: a citation to the wrong provision (beyond the minor numbering slip allowed at score 4), a missed exception to a rule, or an application that reaches the right conclusion by imperfect reasoning |
| **2** | Poor | Significant legal errors, or omission of an applicable rule that would materially change the advice; would mislead a practitioner who relies on it without independent verification |
| **1** | Very poor | Multiple serious errors; foundational rule stated wrong; wrong law applied to the jurisdiction |
| **0** | Wrong / harmful | Materially incorrect to the point of being dangerous to act on — e.g., stating that a type of clause is enforceable when it is void in the stated jurisdiction; advising that no formality is required when notarization is mandatory |

## Sub-criteria

### Rule statement
Is the cited rule the right one, and is it correctly articulated?
- Is the legal rule identified at the right level of specificity? (Not just "UAE law governs" but the specific statute and principle)
- Is the rule accurately paraphrased? (Not overstated, not understated)
- Are exceptions to the rule mentioned where they are material?
- Is the rule the one that actually applies to the stated facts? (Not a general principle when a specific rule exists)

### Application
Does the analysis correctly apply the rule to the facts?
- Is the application logical? Does it follow from the stated rule?
- Are key counter-considerations addressed? (e.g., "The contract says X, but UAE courts have the power to modify penalty clauses under Article 390 of the Civil Transactions Law")
- Is the conclusion justified by the stated rule and facts?

### Citations
Are the cited authorities real, correctly attributed, and not fabricated?
(Note: deep citation quality analysis is in [[eval-rubric-citation-quality]]; this sub-criterion is a light check to catch obvious hallucinations)

### Jurisdiction fit
Does the answer apply the law of the correct jurisdiction?
(Note: deep jurisdiction analysis is in [[eval-rubric-jurisdiction-awareness]]; this sub-criterion catches cases where clearly wrong law is applied)

### Currency
Is the law as stated current as of the response date?
- Responses that rely on repealed or significantly amended law without acknowledging the change score lower.
- A response that cites the old UAE Labour Law (Federal Law No. 8 of 1980) without noting it was replaced by Federal Decree-Law No. 33 of 2021 scores ≤ 3.
- For post-training-cutoff changes: the model is not expected to know about them, but it must qualify the statement (for example, "as of my knowledge cutoff") when there is a reasonable risk that the law has changed.
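
The ≤ 3 ceiling for unacknowledged superseded law can be read as a cap applied after the judge assigns a raw score. A minimal sketch under that reading follows; the function and parameter names are hypothetical, and how the two flags are detected is outside its scope:

```python
def apply_currency_cap(score: int, relies_on_superseded_law: bool, acknowledges_change: bool) -> int:
    """Cap the score at 3 when superseded law is relied on without acknowledgment."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if relies_on_superseded_law and not acknowledges_change:
        # The cap only lowers a score; it never raises one.
        return min(score, 3)
    return score
```

A response citing Federal Law No. 8 of 1980 without mentioning its replacement would therefore drop from a raw 5 to 3, while a response already scored 2 stays at 2.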

## MENA-specific legal soundness checkpoints

The following are common failure points for generic LLMs on MENA legal matters — grade strictly on these:

| Issue | Failure mode |
|---|---|
| End-of-service gratuity (EOSG) calculation | Wrong formula (21-day vs 30-day, partial vs full for short tenure); ignoring DIFC vs onshore distinction |
| Penalty clauses | Stating they are per se enforceable without noting UAE/Lebanon courts' power to reduce |
| Non-competes | Stating they are easily enforceable in KSA (they are not) |
| Interest | Not flagging Shariah prohibition in KSA; not noting UAE Civil Transactions Law position |
| Company formation | Confusing DIFC, ADGM, onshore, and free-zone rules |
| Property ownership | Not flagging foreign ownership restrictions in UAE onshore / KSA |
| Choice of law | Not noting that UAE Labour Law protections cannot be waived by choice of foreign law for UAE-sited employees |

Outputs that make any of these errors score ≤ 3 on legal soundness regardless of overall quality.
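
One way to encode this rule in a scoring pipeline, sketched here with hypothetical names, is to enumerate the seven checkpoints and cap the score whenever a judge reports at least one failure. How the failures are detected is not covered:

```python
from enum import Enum


class MenaCheckpoint(Enum):
    """One member per row of the checkpoint table (names are illustrative)."""
    EOSG_CALCULATION = "eosg_calculation"
    PENALTY_CLAUSES = "penalty_clauses"
    NON_COMPETES = "non_competes"
    INTEREST = "interest"
    COMPANY_FORMATION = "company_formation"
    PROPERTY_OWNERSHIP = "property_ownership"
    CHOICE_OF_LAW = "choice_of_law"


CHECKPOINT_CAP = 3  # ceiling stated by the rubric for any checkpoint failure


def cap_for_checkpoint_failures(score: int, failed: set[MenaCheckpoint]) -> int:
    """Cap the score at CHECKPOINT_CAP if any checkpoint failed; otherwise keep it."""
    return min(score, CHECKPOINT_CAP) if failed else score
```

Keeping the failed checkpoints as a set, not a single boolean, lets a report show which issue triggered the cap.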

## Use in automated scoring

Inject this rubric definition into [[eval-llm-as-judge-system-prompt]]. Weight it 0.35 (highest weight) in the composite score:

```
composite_score = 0.35 × legal_soundness
               + 0.20 × citation_quality
               + 0.20 × jurisdiction_awareness
               + 0.15 × completeness
               + 0.10 × (binary hallucination gate)
```
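
The formula mixes four 0–5 rubric scores with a binary gate and does not state how the gate is scaled. The sketch below assumes the gate contributes 5 when it passes and 0 when it fails, so the composite stays on the 0–5 scale. That mapping, the dictionary keys, and the function name are assumptions for illustration:

```python
WEIGHTS = {
    "legal_soundness": 0.35,
    "citation_quality": 0.20,
    "jurisdiction_awareness": 0.20,
    "completeness": 0.15,
}
HALLUCINATION_WEIGHT = 0.10
MAX_SCORE = 5  # assumption: a passed binary gate is worth the top rubric score


def composite_score(rubric_scores: dict[str, float], hallucination_free: bool) -> float:
    """Weighted composite on a 0-5 scale using the weights from the formula above."""
    missing = WEIGHTS.keys() - rubric_scores.keys()
    if missing:
        raise KeyError(f"missing rubric scores: {sorted(missing)}")
    weighted = sum(WEIGHTS[name] * rubric_scores[name] for name in WEIGHTS)
    gate_value = MAX_SCORE if hallucination_free else 0
    return weighted + HALLUCINATION_WEIGHT * gate_value
```

Under this assumption, an output scoring 4 on every rubric but failing the hallucination gate gets 0.90 × 4 = 3.6, and a perfect output that passes the gate gets 5.0.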

## Limits & escalation

A legal soundness score alone does not determine whether an output is safe to act on. Always pair with [[eval-rubric-hallucination-detection]] (existence gate) and [[eval-rubric-jurisdiction-awareness]] (applicability gate). A score of 4/5 on legal soundness from a model that regularly fabricates citations is meaningless without the hallucination gate.
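
As an illustration of that pairing, the sketch below only surfaces a soundness score when both gates have passed. The `GatedResult` type and `usable_soundness` function are hypothetical names, and returning `None` for a gated-out score is one design choice among several:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatedResult:
    legal_soundness: int             # 0-5 score from this rubric
    hallucination_gate_passed: bool  # existence gate
    jurisdiction_gate_passed: bool   # applicability gate


def usable_soundness(result: GatedResult) -> Optional[int]:
    """Return the soundness score only if both gates passed, else None."""
    if result.hallucination_gate_passed and result.jurisdiction_gate_passed:
        return result.legal_soundness
    return None
```

Returning `None` in place of a lowered score keeps a gated-out 4/5 from being averaged into aggregate statistics as if it were trustworthy.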

## Related skills

- [[eval-rubric-citation-quality]] — deep citation quality analysis
- [[eval-rubric-jurisdiction-awareness]] — jurisdiction accuracy
- [[eval-rubric-completeness]] — whether all applicable rules were addressed
- [[eval-rubric-hallucination-detection]] — binary fabrication gate
- [[eval-llm-as-judge-system-prompt]] — applies this rubric in the automated pipeline
- [[eval-benchmark-runner]] — runs this rubric across all benchmark datasets