# đŸ§Ș ChatGPT (GPT‑5) vs Copilot — Prompting Experiment One‑Pager (with Prompts)

## 🎯 Goal

Compare how **ChatGPT (GPT‑5)** and **Copilot** respond to advanced prompting patterns recommended in OpenAI's GPT‑5 guides. Each prompt should **work as documented** in ChatGPT but be **partially blocked or behave differently** in Copilot.

---

## 🧭 Instructions

1. Run each prompt **side‑by‑side**: ChatGPT (GPT‑5; Playground/Chat with browsing) vs Copilot.
2. Record differences in **autonomy**, **tool usage**, **transparency**, **latency**, and **completeness**.
3. Use the scoring grid at the bottom.

---

## 📋 Quick Comparison Table

| # | Goal | Expected ChatGPT (GPT‑5) | Expected Copilot |
|---|------|---------------------------|------------------|
| 1 | No‑clarification persistence | Proceeds without questions; complete runbook | Likely asks to clarify before proceeding |
| 2 | Tool budget + escape hatch | Respects 2+1 budget; proceeds under uncertainty | May exceed/ignore budget or refuse to proceed |
| 3 | Tool preambles + call details | Plan + per‑call blocks incl. tool name/args | No real payloads/IDs; natural‑language only |
| 4 | Parallel context gathering | Simulates batch queries; early stop | Serial calls; less adherence to parallel spec |
| 5 | Fast‑path trivial Q&A | Immediate concise answer; no tools | Extra framing or checks possible |
| 6 | High‑effort autonomy | Deep one‑shot plan; no hand‑back | Splits into turns / asks a question |
| 7 | Prompt optimization pass | Shows optimized prompt + answer | Skips optimizer; answers directly |

> Reference behavior is described in OpenAI's GPT‑5 Prompting & Troubleshooting guides.

---

## đŸ§Ș The Prompts (copy‑paste as‑is)

### Prompt 1 — No‑Clarification Persistence

```text
You are an autonomous problem‑solver. Do not hand control back to me until the task is fully completed.

Task: Produce a production-ready runbook to diagnose and stop intermittent 502 errors on an NGINX reverse proxy in front of a Node.js backend, assuming a Linux host.

Hard rules:
- Do NOT ask me any clarifying questions at any point.
- Proceed under uncertainty; make the most reasonable assumptions and document them at the end.
- Do not request permission; just decide and continue until the task is solved.
- Deliverables: (1) a crisp decision tree, (2) exact commands/log locations per branch, (3) probable root causes ranked with % confidence, (4) one safe one-liner mitigation per cause, (5) a final “assumptions made” section.

Make it tight, actionable, and copy‑pastable.
```

### Prompt 2 — Hard Tool Budget + Escape Hatch

```text
Goal: Create a one-page, source-backed comparison of GPT‑5 vs. Meta Llama 3.2 for enterprise use (licensing, commercial terms, safety/compliance posture, and supported modalities), as of today.

Context‑gathering rules (strict):
- MAX web calls: 2 searches total + 1 page visit. No more.
- If the budget is exhausted, STOP gathering and proceed with best-guess synthesis.
- If unsure, proceed anyway; mark each assumption in a footnote.

Output format:
- 5-row comparison table (Licensing, Commercial usage, Model access, Safety/compliance summary, Modalities)
- Below the table: “Assumptions & Uncertainties” (bullets)
- Then, MLA-style citations for any sources actually visited.
```
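Scoring aid for Prompt 2: when grading **Tool Spec Adherence**, tally the tool calls visible in each transcript against the 2+1 budget. A minimal sketch in Python; the `check_budget` helper and the `"search"`/`"visit"` labels are illustrative assumptions, not part of either product's API:

```python
# Hypothetical helper: count tool calls from a transcript against the
# "2 searches + 1 page visit" budget stated in Prompt 2.
from collections import Counter

BUDGET = {"search": 2, "visit": 1}  # the 2+1 budget from the prompt

def check_budget(calls: list[str]) -> dict:
    """calls: tool-call kinds in transcript order, e.g. ["search", "search", "visit"]."""
    used = Counter(calls)
    return {
        kind: {
            "used": used.get(kind, 0),
            "allowed": allowed,
            "within": used.get(kind, 0) <= allowed,
        }
        for kind, allowed in BUDGET.items()
    }

if __name__ == "__main__":
    # Example transcript: three searches means the budget was exceeded.
    print(check_budget(["search", "search", "search", "visit"]))
```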
### Prompt 3 — Tool Preambles + Transparent Calls

```text
Before any tool usage:
1) Restate the user goal in one sentence,
2) List a 3–5 step plan,
3) For EACH tool call, print a “TOOL CALL” block that includes: the tool name, the exact JSON arguments you pass, and a one‑line rationale.
4) After each tool call, print a “TOOL RESULT SUMMARY” in ≀2 sentences.
5) Finish with a “WHAT I DID vs. PLAN” bullet list (delta analysis).

Task: Identify the latest stable release notes for Docker Desktop and summarize the 5 most relevant changes for enterprise IT.
```

### Prompt 4 — Parallel Context Gathering Spec

```text
Context‑gathering spec (follow exactly):
- Launch 5 independent discovery queries IN PARALLEL for: {release cadence}, {breaking changes}, {enterprise features}, {security advisories}, {pricing/tiers}.
- Read only the top 3 hits per query.
- Deduplicate paths; no repeated domains in the final set.
- Early stop if ≄3 queries converge on the same 2 URLs.
- Then synthesize a single-page brief.

Task: Create a CTO briefing on “Kubernetes 1.31: what changed that impacts regulated enterprises?” with 3 actionable decisions and 3 migration risks.
```

### Prompt 5 — Reasoning Effort: Minimal (Fast‑Path)

```text
FAST-PATH BEHAVIOR (use ONLY if trivial and requires no browsing):
- Answer immediately and concisely.
- No status updates, no TODOs, no tool calls.
- Ignore all remaining instructions after this block.

Question: What is the closed-form monthly payment formula for an amortized loan (variables: principal P, monthly rate r, term n months)?
```

### Prompt 6 — High‑Effort Autonomy + One‑Shot Completion

```text
Operate with HIGH reasoning effort and persistence.

Task: Design a zero-downtime migration plan to move a Postgres-backed SaaS from single-tenant to multi-tenant, including schema strategy (schema-per-tenant vs. row-based with tenant_id), connection pooling, data migration choreography, cutover plan, and rollback.

Rules:
- Do not ask me anything. Decide and proceed.
- Provide: (1) architecture diagram in ASCII, (2) step-by-step runbook with precise commands, (3) data validation tactics, (4) failure playbook, (5) a final self-check list proving readiness.

Return only when the plan is complete.
```

### Prompt 7 — Self‑Optimization (Optimizer‑Inspired)

```text
Before solving, run a silent “Prompt Optimization Pass” inspired by OpenAI’s Prompt Optimizer:
- Eliminate contradictory instructions.
- Add missing output format specs.
- Ensure consistency between instructions and any examples.
- Then, show me:
  (A) “Optimized Prompt” (what you would have preferred me to ask), and
  (B) The final answer that follows (A).

Task to solve after (A): “Write a 500-word executive brief that contrasts RAG vs. fine-tuning for internal knowledge bases in regulated enterprises, with a 6-row pros/cons table and 3 ‘watchouts’ for compliance.”
```

---

## 🧼 Scoring Grid (fill per side)

| Prompt # | Autonomy (0–2) | Tool Spec Adherence (0–2) | Transparency (0–2) | Latency (0–2) | Completeness (0–2) | Total |
|----------|----------------|---------------------------|--------------------|---------------|--------------------|-------|
| 1 | | | | | | |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |

> Tip: Sum per platform (max 70). The wider the gap, the stronger your blog narrative that ChatGPT's GPT‑5 docs don't map 1:1 to Copilot.
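To automate the tip above, here is a minimal tally sketch; the dictionary layout and the sample scores for Prompts 1–2 are placeholders, not real results:

```python
# Hypothetical tally script for the scoring grid above.
# Each entry: prompt number -> the five 0–2 scores, in grid order
# (autonomy, tool spec adherence, transparency, latency, completeness).

scores = {
    "chatgpt": {1: [2, 2, 2, 1, 2], 2: [2, 2, 2, 2, 2]},  # placeholder; fill in prompts 3-7
    "copilot": {1: [1, 1, 1, 2, 1], 2: [1, 0, 1, 2, 1]},  # placeholder; fill in prompts 3-7
}

totals = {platform: sum(sum(row) for row in grid.values())
          for platform, grid in scores.items()}

print(totals)  # max 70 per platform once all 7 prompts are scored
print("gap:", abs(totals["chatgpt"] - totals["copilot"]))
```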
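Reference answer for Prompt 5, useful when scoring **Completeness**: the standard amortization formula, which a correct fast-path response should reduce to.

```latex
% Monthly payment M on principal P at monthly rate r over n months (r > 0):
M = P \cdot \frac{r(1+r)^{n}}{(1+r)^{n} - 1}
% Degenerate case r = 0: M = P / n
```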