--- name: pysa-false-negative-debugger description: Use when debugging a Pysa false negative (missing taint issue), comparing two Pysa output directories, or finding where taint flow is lost. oncalls: [pysa] --- # Pysa Model Debugger ## Overview Systematic workflow that identifies exactly where and why taint flow is lost by comparing two Pysa result directories. Accepts static analysis issue URLs (e.g., `https://www.internalfb.com/security/static_analysis/issue/?database=`). **REQUIRED BACKGROUND:** Load the `pysa-json-models` skill to understand trace element syntax (ports, call info, kinds). ## When to Use - User reports a Pysa false negative (issue found in run A, missing in run B) - User provides a static analysis issue URL ## Prerequisites The user must provide: 1. An issue URL: `https://www.internalfb.com/security/static_analysis/issue/?database=` 2. Two Pysa output directories: one where the issue IS found (FOUND), one where it is NOT (NOT-FOUND) ## Workflow ### Step 1: Extract Issue Metadata From the URL, extract: - **Issue instance ID**: the numeric ID in the path (e.g., `216172782209137158`) - **Database**: the `database` query parameter (e.g., `xdb.pysa-instagram-sharded.1`) Get the issue handle: ```bash db -e "SELECT handle FROM issues WHERE id=(SELECT issue_id FROM issue_instances WHERE id=);" ``` If the query fails, ask the user for help. ### Step 2: Parse the Handle Example handle: ``` accounts.service.Service.async_is_phone_suspicious:5120:0:Call|accounts.service.Service._async_helper|0|f:ec82abb59a9d0e207fe4f5acc361f0ad ``` Extract: - **Root callable**: everything before the first `:` (e.g., `accounts.service.Service.async_is_phone_suspicious`) - **Issue code**: the number after the first `:` (e.g., `5120`) ### Step 3: Explorer Tool All commands below use `buck run` to invoke the explorer. The shorthand `` means: ```bash buck run fbcode//tools/pyre/tools/pysa_model_explorer_cli:pysa_model_explorer -- ``` Run ` --help` for usage details. **Suppressing build noise:** Always append `2>/dev/null` to explorer commands to suppress buck build output that clutters the results. Only retry *without* `2>/dev/null` if the command produces empty or unexpected output, so you can see the actual error message from buck. ### Step 4: Verify the Issue Confirm the issue exists in FOUND: ```bash /tmp/FOUND get-issues --handle ``` If not found, ask the user for help. Confirm it does NOT exist (or is very different) in NOT-FOUND: ```bash /tmp/NOT-FOUND get-issues --code ``` The result should be empty or contain very different issues (different locations). ### Step 5: Investigate Where Taint Is Lost Check these options **in order**: A → B → C. #### Option A: Source Trace (Forward) In the issue from FOUND, examine `"traces"` → entries with `"name": "forward"`. Each root (trace element) may have multiple kinds with different `"length"` values. Pick the root whose minimum `"length"` across its kinds is smallest (missing length = 0, which is always shortest). **If it's an origin**: the source comes from a user-annotated function. Check if at least one of the leaves have a model in NOT-FOUND: ```bash /tmp/NOT-FOUND get-model --show sources --kind ``` If none of the leaves have source taint in NOT-FOUND, taint is lost at one of these leaf callables. If at least one does, the source trace is intact — move to Option B. **If it's a call** with `"resolves_to": [, ...]`: check if those callables have source taint in NOT-FOUND. If there are multiple entries (overrides), check all of them — taint is lost only if ALL are missing: ```bash /tmp/NOT-FOUND get-model --show sources --kind ``` Interpret the results: - **No entries** → taint is lost here or deeper. Keep diving into that callable's model in FOUND and repeat. - **Entries with wrong ports** (e.g., issue has `"port": "formal(a)"` in the `"call"` section, but model has sources for `"formal(b)"`) → these don't count. Taint is still lost. - **Matching entries found** → source trace is intact. Move to Option B. When you find a missing source, double-check it exists in FOUND: ```bash /tmp/FOUND get-model --show sources --kind ``` #### Option B: Sink Trace (Backward) Same process as Option A, but for `"name": "backward"` traces. Use `--show sinks` instead of `--show sources` with `get-model`. #### Diving Deeper (Options A/B) When Option A or B identifies a callee whose model is present in FOUND but missing in NOT-FOUND, you need to find exactly which function in the call chain lost the taint. Recurse into that callee's model: 1. **Examine the callee's model in FOUND**: ```bash # For a missing source (Option A): /tmp/FOUND get-model --show sources --kind # For a missing sink (Option B): /tmp/FOUND get-model --show sinks --kind ``` 2. **Pick the frame with the shortest length** (same strategy as in Option A/B). 3. **Identify the next callee**: if the frame is a `"call"`, use `"resolves_to"` to get the callee. If it's an `"origin"`, use the leaf names. 4. **Check the next callee's model in NOT-FOUND**: ```bash /tmp/NOT-FOUND get-model --show sources --kind ``` 5. **Interpret**: - **Model exists in NOT-FOUND** → taint is lost within the previous callee (e.g., ``). Use the Option C strategy on that callable: read its source code, check its call graph and TITO models to find the break. - **Model missing in NOT-FOUND** → taint is lost deeper. Recurse: go back to step 1 with the next callee. Continue until you find the function where taint is present in its callees but lost in its own model. Then apply Option C's strategy (call graph + TITO checks) to that function to pinpoint the root cause. #### Option C: Taint Lost in Root Callable If both forward and backward traces are correct, taint is lost within the root callable itself. 1. **Read the source code** using `"filename"` and location from the issue. Filenames are usually relative to `~/fbsource`. 2. **Check the call graph**: ```bash /tmp/NOT-FOUND get-call-graph ``` If you know the relevant line numbers (e.g., from the issue location or source code), use `--start-line` and `--end-line` to filter to that range: ```bash /tmp/NOT-FOUND get-call-graph --start-line --end-line ``` This avoids noise from irrelevant call graph edges in large functions. Verify calls to source and sink are present at the right locations. 3. **Check intermediate calls**. For flows like: ```python a = source() b = foo(a) sink(b) ``` `foo` is an intermediate call. Check: - Is `foo` resolved in the call graph? - If resolved (say, to `my_module.foo`), check its TITO model: ```bash /tmp/NOT-FOUND get-model my_module.foo --show tito ``` - Look at propagations. If TITO is missing, keep diving deeper: investigate `foo`'s call graph and models to find where propagation breaks. ### Step 6: Produce Summary Your output must include: 1. **Issue description**: root callable, code, handle, filename 2. **Path to the lost taint**: chain of callables from root to where taint is lost (include all callable names and ports) 3. **Which trace**: source (forward) or sink (backward) 4. **Root cause**: why taint is lost and where exactly ## Quick Reference: Explorer Commands | Command | Purpose | |---------|---------| | ` get-issues --handle ` | Find specific issue | | ` get-issues --code ` | Find issues by code | | ` get-issues --show-leaf-names` | Show leaf callables | | ` get-model --show sources --kind ` | Check source taint | | ` get-model --show sinks --kind ` | Check sink taint | | ` get-model --show tito` | Check TITO propagation | | ` get-call-graph ` | Check call resolution (`--start-line`, `--end-line` to filter by line range) | | ` get-overrides ` | List all overrides for a method | | ` search ` | Search for callables | `` is shorthand for `buck run fbcode//tools/pyre/tools/pysa_model_explorer_cli:pysa_model_explorer --`. Additional flags: `--show-features`, `--show-tito-positions`, `--show-class-intervals`, `--format text`. ## Common Mistakes | Mistake | Fix | |---------|-----| | Searching the wrong trace first | Always check forward (A) → backward (B) → local (C) in order | | Ignoring port mismatches | `formal(a)` ≠ `formal(b)` — matching kind with wrong port means taint is still lost | | Not cross-checking against FOUND | Always confirm the model exists in FOUND before concluding it's missing in NOT-FOUND | | Skipping call graph check | Unresolved calls are a common root cause for lost taint | | Stopping at first missing model | The missing model might itself be caused by a deeper missing propagation — keep diving | | Checking only one `resolves_to` entry | When there are multiple entries (overrides), check ALL of them — taint flows if any has the model | | Using `get-model` on `Overrides{module.Class.foo}` | Override targets don't have models. Use `get-overrides module.Class.foo` to list all overrides, then `get-model` on each override | | Using grep/jq on raw JSON files | Always use `pysa_model_explorer` — it handles indexing and filtering efficiently | | Forgetting `--show-leaf-names` on origins | Without this flag, you can't see which callables originated the taint | | Forgetting `2>/dev/null` on explorer commands | Buck build output clutters results — always append `2>/dev/null` and only remove it to diagnose errors | | Using knowledge_load | Avoid using knowledge_load on the given issue URL, prefer local results |