--- name: mem0-test-integration description: > Verify a Mem0 integration produced by /mem0-integrate. Runs in the same workspace on the same branch (loose coupling) — installs dependencies, runs the repo's native test suite, then exercises a real end-to-end smoke flow against the user's API key. Produces a scorecard. TRIGGER when: user has just run /mem0-integrate and says "verify", "test the integration", "run /mem0-test-integration", or when a .mem0-integration/ directory exists and tests have not been run yet on the current branch. DO NOT TRIGGER when: the user wants to run general project tests (defer to the repo's native test command), or when no prior /mem0-integrate run exists in the current branch (ask them to run /mem0-integrate first). This skill ONLY catches compile and runtime bugs by design. Logical integration errors — wrong data stored, wrong time retrieved, wrong user scoping — are on the human reviewer. license: Apache-2.0 metadata: author: mem0ai version: "0.1.0" category: ai-memory tags: "memory, integration, testing, tdd, platform, oss" coupling: loose mem0_tested_versions: "mem0ai (PyPI) >=2.0.0,<3.0.0; mem0ai (npm) >=3.0.0,<4.0.0" --- # mem0-test-integration Verifies what `/mem0-integrate` produced. Runs in the same workspace, on the same feature branch. Loose coupling — fast, catches compile and runtime bugs, does not catch logical errors. ## Canonical sources (use these, not ambient knowledge) All static checks and smoke-test shapes validate against these URLs. `WebFetch` each before running step 3. - Scope-tagged docs index: https://docs.mem0.ai/llms.txt - OpenAPI (Platform REST): https://docs.mem0.ai/openapi.json - Published SDK skill (canonical call patterns): https://raw.githubusercontent.com/mem0ai/mem0/main/skills/mem0/SKILL.md - Vercel AI SDK skill (if the target repo uses `@ai-sdk/*`): https://raw.githubusercontent.com/mem0ai/mem0/main/skills/mem0-vercel-ai-sdk/SKILL.md - SDK source (cross-check version against frontmatter `mem0_tested_versions`): - Repo root: https://github.com/mem0ai/mem0 - Python: https://github.com/mem0ai/mem0/tree/main/mem0 - TypeScript: https://github.com/mem0ai/mem0/tree/main/mem0-ts Read the `Delegated skill:` field in `.mem0-integration/plan.md` — if it names a skill URL, fetch that skill and use its example blocks as the reference for both static checks (step 3) and the smoke test (step 5). ## Non-invasiveness contract Every check in this skill assumes the integration is **additive and feature-flagged** (see `/mem0-integrate` "Integration principles"). Specifically: - `product.json` must contain a `feature_flag` field. - Steps 4–6 run in two passes: - **Pass A — flag unset.** All pre-existing tests must pass, smoke/E2E skip. The repo must behave like `main`. Any failure here is a **hard fail** — do not let the self-heal loop attempt a patch. - **Pass B — flag set.** New tests must pass, smoke and E2E run. - If Pass A fails, the scorecard marks `non_invasive: false` and sets `overall: fail` with a distinct reason code the integrator's heal loop refuses to touch. ## Preconditions Refuse to start unless ALL of the following are true: - `.mem0-integration/` directory exists in the repo root. - `.mem0-integration/product.json`, `goal.md`, and `plan.md` are readable and internally consistent (JSON parses, docs non-empty). - Current branch name begins with `mem0-integrate/` (set by the companion skill). Prevents accidental runs on unrelated branches. - Working tree is clean. The skill never modifies source files; any dirty state means the integration is mid-edit and not ready to verify. - The same API key the integration used is available in the environment (`MEM0_API_KEY` for Platform, `OPENAI_API_KEY` for OSS — read which from `product.json`). Interactive mode asks if missing; CI mode exits 2. Exit with a written rationale on any precondition failure. Never attempt to "fix up" state. ## Pipeline ### 1. Read the contract Load: - `product.json` → which language, which product (Platform vs OSS), which mem0 version, `write_site`, `read_site`. - `plan.md` → the mechanical contract (write pattern, read pattern, preserved behavior). - `goal.md` → the intent (displayed in the scorecard only; not tested). ### 2. Install dependencies Route by language from `product.json`: | Language | Command | |---|---| | Python | `pip install -e .` if editable, else `pip install -r requirements.txt`. Then `pip install mem0ai` if not already present at the pinned version. | | TypeScript / JavaScript | `npm install` (or `pnpm install` / `yarn install` if detected by lockfile). | If install fails → exit code 2 with stderr tail. Never move to testing if dependencies don't resolve. ### 3. Static sanity checks (fast, local, no API calls) - **Import check**: does the write-site file import the expected Mem0 surface? Authoritative list comes from `## Identify the User's Setup` in `https://docs.mem0.ai/llms.txt`: - Platform Python → `from mem0 import MemoryClient` - Platform TS → `import MemoryClient from "mem0ai"` - OSS Python → `from mem0 import Memory` - OSS TS → `import { Memory } from "mem0ai/oss"` If `plan.md` names a delegated skill (e.g., Vercel AI), use *that* skill's import signature instead of the list above. Mismatch → fail with line number. - **Version check**: installed `mem0ai` version falls in the range from this skill's `mem0_tested_versions`. Out of range → warn but continue. - **Type check** (TS tracks only): run `tsc --noEmit` or `tsup --dts`. Non-zero → fail. - **Lint** (if the repo has a linter configured): run the repo's own lint command. Lint failures from this skill's changes → fail; pre-existing lint failures → surface as a warning. - **Eager-init check**: grep the `write_site` and `read_site` files (paths from `product.json`) for `MemoryClient(` or `Memory(` at module scope — i.e., not inside a function, method, or class body. `MemoryClient()` validates the API key in `__init__` (network call) and OSS `Memory()` can eagerly initialize embedding/LLM providers — module-level instantiation hits the wire on import and breaks Pass A's test collection whenever the key is unset. Hit → fail with `file:line` and the lazy-init guidance from `/mem0-integrate` step 8 constraint #7. ### 4. Run the repo's native test suite (two passes) | Language | Test command (in priority order) | |---|---| | Python | `pytest` with the test files from step 5 of the companion skill, else `python -m unittest discover`. | | TypeScript / JavaScript | `npm test` if defined in package.json; else auto-detect `vitest` or `jest`. | **Pass A — `feature_flag` unset.** Run the *entire* pre-existing suite (excluding the new `test_mem0_*` files). **Must be 100% green.** Any failure here marks `non_invasive: false` in the scorecard and is a **hard fail** — the integrator's self-heal loop refuses to touch it. **Pass B — `feature_flag` set** (value from `product.json`). Run the full suite including the new tests. All must pass. Isolate integration-introduced failures using `git diff main..HEAD --name-only`. A test file that exists on `main` and fails only under the integration branch (flag set *or* unset) counts against the scorecard regardless of pass. A test file that already failed on `main` is surfaced as `pre_existing_unrelated` and does not count — but is still reported so the user can clean it up. Capture output to `.mem0-integration/test-stdout-flag-off.log` and `.mem0-integration/test-stdout-flag-on.log`. Scorecard reports pass/fail per pass. ### 5. Smoke test (real API call, shortest round-trip) Scripted end-to-end flow tailored to `product.json`. The call shapes below are the minimal ones; if `plan.md` names a delegated skill, use *that skill's* minimal example verbatim instead — it is the canonical shape for the detected stack. **Platform (Python):** from mem0 import MemoryClient c = MemoryClient() # uses MEM0_API_KEY uid = f"mem0-test-integration-{os.urandom(4).hex()}" c.add([{"role": "user", "content": "I prefer aisle seats"}], user_id=uid) hits = c.search("seat preference", user_id=uid) assert any("aisle" in h.get("memory", "") for h in hits), hits c.delete_all(user_id=uid) # clean up **Platform (TS):** same shape with `MemoryClient` from `"mem0ai"`. **OSS (Python / TS):** uses `Memory()` / `new Memory()` with default config (OpenAI LLM via `OPENAI_API_KEY`, local Qdrant). If the repo ships a `docker-compose.yml` with a Qdrant service, the skill starts it first and tears it down after. If no backing store is reachable → fail with a clear message naming the fix. The smoke test always uses a **disposable random user_id** prefixed with `mem0-test-integration-` so a failed cleanup doesn't pollute the user's real data. A background tidy step deletes any prefix-matching entries older than 24 hours on the next run. Capture output to `.mem0-integration/smoke-stdout.log`. ### 6. E2E integration test (run the app, exercise the flow) Unit tests + smoke prove the SDK works in isolation. This step is the real signal: **does memory actually appear in the app's user-visible output when the integration runs end-to-end?** Requires `plan.md` to contain an `E2E recipe:` section (authored by `/mem0-integrate` step 5). If absent → status `skipped` (not `fail`), note in scorecard that the repo has no runnable entry point. Recipe fields the skill reads: - `start` — shell command to launch the app using `$PORT` for any network port. Run in background with stdout/stderr teed to `.mem0-integration/e2e-app.log`. - `ready_probe` — how to detect readiness. `url=... status=...` polls an HTTP endpoint; `log="..."` waits for a substring in `e2e-app.log`; `sleep=N` waits N seconds (last resort). 60-second hard timeout. - `compose_services` — optional. If set, bring them up via `docker compose up -d ` before `start`, tear them down with `docker compose down` at the end. - `write_call` — triggers the Mem0 write path exactly once. Output is captured and surfaced on failure. 60-second hard timeout. - `write_async_wait_ms` — pause after `write_call` to let async memory flushes land. Default 0. - `read_call` — triggers the Mem0 read path. Typically a fresh session or new request that should surface the stored memory. - `read_assert` — substring, `regex=...`, or `jsonpath==` that must appear in `read_call`'s stdout. This is the E2E pass gate. Execution order: 1. Allocate an ephemeral TCP port; export as `PORT`. 2. Set `MEM0_USER_ID` to a disposable `mem0-test-integration-` value and export it, so the app can use the same scoping the smoke test does if the recipe wants cleanup. 3. Bring up `compose_services` if named. 4. Run `start` in the background. 5. Poll `ready_probe` until success or 60s timeout. Timeout → fail. 6. Run `write_call`. Non-zero exit → fail (but continue to cleanup). 7. Sleep `write_async_wait_ms`. 8. Run `read_call`. 9. Evaluate `read_assert` against `read_call`'s stdout. Miss → fail. 10. Cleanup (always, even on failure): SIGTERM the app, SIGKILL after 5s, `docker compose down` if services were started, `delete_all` memories matching `mem0-test-integration-*` on Platform scenarios. On any failure, the scorecard includes: - Last 40 lines of `e2e-app.log` - Full `write_call` output - Full `read_call` output - The expected vs actual for `read_assert` ### 7. Scorecard Write `.mem0-integration/scorecard.md` and `.mem0-integration/scorecard.json`: { "timestamp": "2026-04-20T14:03:11Z", "branch": "mem0-integrate/remember-user-preferences", "product": "platform", "language": "python", "mem0_version": "2.0.0", "non_invasive": true, "feature_flag": "MEM0_ENABLED", "results": { "install": {"status": "pass", "duration_ms": 12043}, "static_checks":{"status": "pass", "duration_ms": 812}, "unit_tests_flag_off": {"status": "pass", "duration_ms": 3920, "count": 47, "reason": "all pre-existing tests green with flag unset"}, "unit_tests_flag_on": {"status": "pass", "duration_ms": 4321, "count": 49}, "smoke_test": {"status": "pass", "duration_ms": 2890, "memory_id": "mem_..."}, "e2e_test": {"status": "pass", "duration_ms": 14200, "ready_probe_ms": 3100, "write_exit": 0, "read_assert_matched": true} }, "friction": { "dependency_install_retries": 0, "pre_existing_test_failures": 0, "warnings": ["mem0ai 2.0.0 pinned; consider 2.0.1 for fix X"] }, "overall": "pass" } The markdown version is human-readable and includes: - Goal doc + plan doc reprinted at top (so reviewers don't have to hunt). - Each check with pass/fail + log excerpt. - Friction summary. - Verbatim warnings from mem0 SDK (if any — e.g., deprecated field usage). - **Explicit "NOT checked" section** listing what loose coupling misses: "Whether the stored data is what the user wants stored. Whether search runs at the right moment. Whether user_id matches the actual session scope. Human review required." ### 8. Report + exit - Print the scorecard path + overall pass/fail to stdout. - **Do not commit the scorecard files.** They live in `.mem0-integration/`, which is gitignored. The user can inspect and optionally pin. - On fail: print the first failing step's log tail (last 40 lines) and stop. Do not attempt to fix anything. ## Artifacts (all under `.mem0-integration/`) | File | Purpose | Retention | |---|---|---| | `scorecard.md` | Human-readable verdict. | Overwritten per run. | | `scorecard.json` | Machine-readable verdict. Consumed by the CI scorecard workflow later. | Overwritten per run. | | `test-stdout-flag-off.log` | Step 4 Pass A (pre-existing suite, flag unset). | Overwritten per run. | | `test-stdout-flag-on.log` | Step 4 Pass B (full suite, flag set). | Overwritten per run. | | `smoke-stdout.log` | Full output from step 5. | Overwritten per run. | | `e2e-app.log` | Background app stdout/stderr from step 6. | Overwritten per run. | | `e2e-calls.log` | write_call + read_call invocations and outputs. | Overwritten per run. | ## Modes | Mode | Trigger | Behavior | |---|---|---| | Interactive (default) | TTY present, `MEM0_TEST_CI` unset | Asks for missing keys, prints friendly summaries. | | CI | `MEM0_TEST_CI=1` | Keys must be in env, no prompts, non-zero exit on any fail. JSON scorecard goes to stdout's tail for workflow parsing. | ## Invocation /mem0-test-integration # interactive, all steps /mem0-test-integration --ci # non-interactive /mem0-test-integration --skip-smoke # no API calls, no E2E /mem0-test-integration --skip-e2e # unit + smoke only (faster CI) /mem0-test-integration --only-smoke # just smoke /mem0-test-integration --only-e2e # just E2E (assumes deps installed) Composition: `--skip-*` can stack (`--skip-smoke --skip-e2e` = static + unit only, zero API cost). `--only-*` is mutually exclusive with all other flags. ## Exit codes | Code | Meaning | |---|---| | 0 | All checks passed. | | 1 | Precondition failed (no `.mem0-integration/`, wrong branch, dirty tree). | | 2 | Missing env key (CI mode) or dependency install failure. | | 3 | Static sanity check failed (wrong import, type error). | | 4 | Unit tests failed (Pass B — integration itself broken). | | 5 | Smoke test failed. | | 6 | E2E test failed (ready_probe timeout, write/read call failed, or read_assert miss). | | 7 | Non-invasiveness violation: Pass A failed (pre-existing tests broke). Integrator's heal loop refuses to touch this. | | 8 | Internal error (skill bug — report it). | ## Explicitly out of scope - **Modifying source files.** The skill is read-only against the repo. If verification exposes a bug, re-run `/mem0-integrate` on the same goal + plan; do not hand-patch. - **Fixing broken tests.** Failing unit tests are a signal that the integration is wrong, not that the tests are wrong. The skill does not "try a different test." - **Deep logical correctness.** The E2E step proves "something the user said earlier comes back later," which is a useful but shallow signal. It does NOT prove the integration picks the *right* facts to store, scopes `user_id` correctly across real users, or handles conflict resolution well. That's human review territory. - **Self-healing.** This skill never modifies source files. The paired `/mem0-integrate` skill in its default `--heal` mode consumes the scorecard produced here and drives its own remediation loop. Exit code 7 (non-invasiveness violation) is the explicit signal the heal loop must stop and surface to the user. - **Cross-branch comparisons.** No `main` baseline diffing. The scorecard reflects this branch only. - **Running against production data.** Every smoke test uses a disposable random user_id and cleans up after. Never touches any other user's data.