# verl math_equal() Arbitrary Code Execution via Unsafe eval()

## Basic Information

| Field | Details |
|------|------|
| **Vulnerability Name** | Indirect Prompt Injection to ACE via Unsafe `eval()` in Math Answer Grading |
| **Vulnerability Type** | CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (eval Injection) |
| **Affected Component** | `verl/utils/reward_score/prime_math/grader.py` Line 298-301 |
| **Affected Versions** | verl ≤ 0.7.0 (main branch still affected as of analysis date) |
| **CVSS 3.1** | **8.1 (High)** — `AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H` |
| **GitHub** | https://github.com/verl-project/verl |

### CVSS Score Explanation

| Dimension | Value | Rationale |
|------|----|------|
| Attack Vector | Network (N) | Attacker triggers remotely by poisoning training datasets, no local access required |
| Attack Complexity | High (H) | Requires indirectly controlling LLM output format via Prompt Injection, not direct parameter passing |
| Privileges Required | None (N) | No system authentication required |
| User Interaction | None (N) | Training/evaluation pipeline triggers automatically, no human intervention needed |
| Confidentiality | High (H) | Can read training data, model weights, API keys, etc. |
| Integrity | High (H) | Can tamper with model checkpoints, plant backdoors |
| Availability | High (H) | Can terminate training processes, destroy compute resources |

---

## Vulnerability Description

verl is an open-sou large model reinfoment learning training framework by ByteDance (19k+ Stars), supporting algorithms such as PPO and GRPO. In its math answer scoring module `prime_math/grader.py`, the `math_equal()` function is used to compare whether the model-generated answer is equivalent to the ground truth answer.

When the ground truth answer is a matrix type (containing `\begin{pmatrix}`) and the model's output answer starts with `[` and ends with `]`, the code directly calls Python's built-in `eval()` function on the model output without any input sanitization or sandbox isolation.

An attacker can use Indirect Prompt Injection (injecting malicious instructions into the training dataset) to induce the LLM to output a string containing malicious Python code when answering matrix-type math problems. This string is extracted by `match_answer()`, passed into `math_equal()`, and ultimately executed by `eval()`, achieving arbitrary code execution (ACE).

---

## Vulnerability SINK

**File**: `verl/utils/reward_score/prime_math/grader.py` Line 298-301

```python
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):   # ← SINK: directly eval untrusted input
        pred_matrix = eval(prediction)       # ← second eval
```

**Trigger Conditions (all three must be satisfied simultaneously):**

1. `reference` (ground truth answer) contains `\begin{pmatrix}` — i.e., a matrix-type math problem
2. `prediction` (answer extracted from model) starts with `[` and ends with `]`
3. `prediction` does not contain underscore `_` — otherwise `handle_base()` will truncate during the `normalize()` phase and throw a `ValueError`

---

## Call Stack

```
verl GRPO/PPO Training Loop
  └→ RewardManager.__call__()                    # compute reward score for each rollout
      └→ compute_score(solution_str, ground_truth)  # prime_math/__init__.py
          ├→ match_answer(solution_str)             # extract answer string from LLM output ← SOURCE
          │   └→ locate answer via keyword matching ("answer is", \boxed{} etc.)
          │   └→ return (is_matched, prediction)       # prediction without any security filtering
          └→ math_equal(prediction, reference)       # grader.py Line 174
              └→ normalize(prediction, pi)           # Line 188 — preprocessing
              │   └→ handle_base(prediction)         # detect underscore _ and handle base conversion
              │   └→ handle_pi(prediction, pi)       # detect \pi and replace
              └→ [matrix comparison branch] Line 298-301
                  └→ eval(prediction)                # ← SINK: arbitrary code execution
```

---

## SOURCE (Data Entry Point)

**File**: `verl/utils/reward_score/prime_math/__init__.py` Line 347-386

```python
def match_answer(response):
    is_matched = False
    # extract answer from LLM output via keywords
    for ans_marker in ["answer:", "answer is", "answers are"]:
        ans_idx = response.lower().rfind(ans_marker)
        if ans_idx != -1:
            is_matched = True
            response = response[ans_idx + len(ans_marker):].strip()
    # ... more extraction logic (\boxed{}, "is", "=" etc.) ...

    # require the answer to contain at least one digit
    is_matched = is_matched if any([c.isdigit() for c in response]) else False
    return is_matched, response
```

The input `response` to `match_answer()` comes from the raw output of the LLM model (`solution_str`). This function only performs text locating and slicing without any security filtering. In a Prompt Injection scenario, an attacker can manipulate problem descriptions in the training data to induce the model to output a malicious string in a specific format.

---

## Exploitation Prerequisites

| Condition | Description |
|------|------|
| Training data contains matrix-type problems | ground_truth must contain `\begin{pmatrix}`; such problems are common in the MATH dataset |
| LLM output can be controlled | Induce the model via Prompt Injection to output an answer in the `[malicious code]` format |
| Payload contains no underscore | `handle_base()` splits on `_` causing crashes; must use `exec()` instead of `__import__()` |
| Payload contains a digit | `match_answer()` line 384 requires at least one digit character in the answer |
| verl uses prime_math scoring | Requires `data_source` to correspond to MATH dataset or manually specifying the prime_math scoring function |

---

## Exploitation Steps

### Attack Scenario

An attacker injects a "matrix problem" containing Prompt Injection instructions into a public math dataset. When other researchers use the verl framework for GRPO/PPO training on that dataset, the LLM is induced to output a malicious payload when answering that problem, and verl's scoring function automatically triggers `eval()` to execute arbitrary code.

### Reproduction Flow

```
Step 1: Attacker crafts a poisoned math problem (containing Prompt Injection)
         ↓
Step 2: Local LLM (Qwen2.5-14B-Instruct) receives the poisoned problem
         ↓
Step 3: LLM is successfully injected, outputs an answer containing malicious code:
        "The answer is [exec("import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')")]"
         ↓
Step 4: verl compute_score() calls match_answer() to extract the answer
         ↓
Step 5: math_equal(prediction, reference) enters the matrix comparison branch
         ↓
Step 6: eval(prediction) executes the malicious code → writes to /tmp/verl-rce-proof.txt
         ↓
Step 7: RCE succeeds, attacker gains code execution on the training server
```

---

## Proof of Concept

The following PoC has been verified successfully on macOS + Ollama (qwen2.5:14b-instruct) + verl 0.7.0.

### Environment Setup

```bash
# 1. Install Ollama and pull the model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

# 2. Clone verl and install
git clone https://github.com/verl-project/verl.git
cd verl
pip install -e .
```

### PoC Script

**[POC_CODE](verl_rce.py)**

### Actual Run Video

**[POC_VIDEO](https://www.youtube.com/watch?v=X3FwfYS-Xq0)**

### SCREENSHOT

<img width="1660" height="1810" alt="screenshot" src="https://github.com/user-attachments/assets/3b7cdfe4-e4e3-47fa-8768-701ed2d136e1" />

---

## Impact Analysis

### Direct Impact

| Impact | Description |
|------|------|
| **Arbitrary Code Execution** | Attacker executes arbitrary system commands on the training/evaluation server |
| **Data Theft** | Read sensitive information such as training data, model weights, API keys, cloud credentials |
| **Supply Chain Attack** | Tamper with model checkpoints to plant backdoors, affecting all downstream users |
| **Lateral Movement** | Training clusters typically have high-privilege network access, can serve as a pivot for internal network penetration |

### Attack Scenarios

1. **Dataset Poisoning**: An attacker injects matrix problems containing Prompt Injection into public math datasets (e.g., MATH, derived versions of GSM8K); triggered when researchers train with verl on that dataset
2. **Malicious Few-Shot**: An attacker embeds malicious output templates in few-shot examples to induce the model to generate a payload during the evaluation phase
3. **Adversarial Input**: Carefully crafted math problems that use token-level optimization to maximize the probability of the model outputting a malicious payload

### Notes on Exploitation Difficulty

Although CVSS rates Attack Complexity as High, actual exploitation is not particularly difficult:

- In GRPO training, each prompt generates 8-64 rollouts; triggering requires only one successful hit
- In testing, Qwen2.5-14B-Instruct was successfully injected on the first attempt
- The `exec()` payload bypasses `handle_base()` preprocessing and can reliably reach `eval()`
- Matrix-type problems are common in the MATH dataset; attackers do not need to craft their own ground_truth

---

## Remediation Recommendations

### Short-Term Fix (Recommended, One-Line Change)

Replace `eval()` with `ast.literal_eval()`, which only allows parsing Python literals:

```python
import ast

# grader.py Line 298-301, before fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):        # ← dangerous
        pred_matrix = eval(prediction)            # ← dangerous

# After fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    try:
        parsed = ast.literal_eval(prediction)     # ← safe: only accepts literals
        if isinstance(parsed, list):
            pred_matrix = parsed
            # ... subsequent comparison logic ...
    except (ValueError, SyntaxError):
        pass
```

`ast.literal_eval()` only accepts Python literals such as strings, numbers, lists, and dictionaries, and will not execute arbitrary code.

### Long-Term Recommendations

1. **Global audit of `eval()` / `exec()` calls**: The `handle_pi()` function in `grader.py` (Line 82) also contains an `eval()` call; although wrapped with `contextlib.suppress(Exception)`, it should still be replaced
2. **Input validation allowlist**: Add format validation at the output stage of `match_answer()`, only allowing mathematical expression characters (digits, operators, brackets, commas)
3. **Sandbox isolation**: Sandbox the execution environment of the reward function, restricting access to modules such as `os` and `subprocess`
4. **Dependency security scanning**: Include `eval()` usage in CI/CD security checks to prevent new eval injection points from being introduced in subsequent code