---
name: macro-agent
description: "Desktop macro control with image recognition. Commands: find, search, click-on, click, move-to, write, press, hotkey, scroll, drag, screenshot, region-capture, seq-create/add/run/list/delete. ALWAYS uses template matching (image search), NEVER fixed coordinates. Use 'region-capture' to capture new elements."
---

# Macro Agent

Desktop automation and UI control skill with **image recognition**.

## 🚨 CRITICAL: How to Handle User Requests

**BEFORE doing ANY action, ALWAYS check if a sequence exists for it:**

1. **FIRST** run `seq-list` to see available sequences
2. **LOOK** for sequences that match the user's intent (e.g., `whatsapp_send_marco` for "send message to Marco")
3. **IF sequence exists**: Use `seq-run ` then add your custom actions (write message, press enter)
4. **IF NO sequence exists**: Then use individual commands

### Common Workflow: Send Message to Contact

When user says "send message to X" or "envía mensaje a X":

```
1. seq-list                  # Check available sequences
2. seq-run whatsapp_send_    # Run the messaging sequence
3. write ""                  # Type the message
4. press enter               # Send it
```

**NEVER** use `hotkey super` or manual navigation when a sequence exists!

### Available Sequences (check with seq-list)

The user has pre-configured sequences for common tasks. Always check them first!

- `whatsapp_send_ross` - Opens WhatsApp and selects Ross contact
- `whatsapp_send_marco` - Opens WhatsApp and selects Marco contact
- Other sequences may exist - always run `seq-list` first!

## 🎯 How Element Detection Works

When using `click-on` or `move-to`, the agent **ALWAYS** uses image recognition:

1. Searches for element image on screen (template matching)
2. If not found → **FAILS** (no fallback to coordinates)

This ensures elements are found dynamically based on their actual position.
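To make the two detection steps above concrete, here is a toy sketch of template matching with no coordinate fallback. It is not the code in `macro_agent.py`, whose implementation this document does not show. The assumptions are: images are small nested lists of grayscale values instead of real screenshots, the score is a naive sum of absolute differences, and the names `match_template`, `locate`, and `max_diff` are hypothetical. Only the `image` / `not_found` values of `method` come from this document.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # toy grayscale image: rows of pixel values (assumption)


def match_template(screen: Grid, template: Grid, max_diff: int = 0) -> Optional[Tuple[int, int]]:
    """Return the (x, y) centre of the best match, or None if no window is close enough."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    best_pos, best_diff = None, None
    # Slide the template over every position where it fits entirely on screen.
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            # Sum of absolute differences between the template and this window.
            diff = sum(
                abs(screen[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            if best_diff is None or diff < best_diff:
                best_pos, best_diff = (x, y), diff
    if best_pos is None or best_diff > max_diff:
        return None  # step 2: not found, and no fallback to stored coordinates
    return (best_pos[0] + tw // 2, best_pos[1] + th // 2)


def locate(screen: Grid, template: Grid, name: str) -> Dict[str, object]:
    """Wrap the match in a result dict shaped like the skill's JSON output (hypothetical helper)."""
    hit = match_template(screen, template)
    if hit is None:
        return {"success": False, "target": name, "method": "not_found"}
    return {
        "success": True,
        "target": name,
        "method": "image",
        "coordinates": {"x": hit[0], "y": hit[1]},
    }
```

The point of the sketch is the failure path: when the template is not on screen, the result is `not_found` rather than a click at a remembered position.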
**Output includes `method` field:**

- `image` = Found by template matching ✅
- `not_found` = Image not visible on screen ❌

**If element not found:** You need to capture it first with `region-capture`.

## ⚠️ Important

**NO "navigate" command exists**. To navigate:

1. `find ` - Search for element info
2. `click-on ` - Click using image recognition (ALWAYS)

## Usage

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py [args]
```

## Commands Reference

| Action | Command | Example |
|--------|---------|---------|
| Search element | `find ` | `find brave` |
| Search text | `search ` | `search save` |
| Click element | `click-on ` | `click-on brave` |
| Click coords | `click X Y` | `click 500 300` |
| Move to element | `move-to ` | `move-to button` |
| Move to coords | `move X Y` | `move 500 300` |
| Write text | `write ` | `write "hello"` |
| Press key | `press ` | `press enter` |
| Hotkey | `hotkey ` | `hotkey ctrl c` |
| Scroll | `scroll N` | `scroll -3` |
| Screenshot | `screenshot ` | `screenshot test` |
| Region capture | `region-capture` | `region-capture` |

## Sequence Commands

| Command | Description |
|---------|-------------|
| `seq-create ` | Create new sequence |
| `seq-add ""` | Add action to sequence |
| `seq-show ` | View sequence |
| `seq-run ` | Execute sequence |
| `seq-list` | List all sequences |
| `seq-delete ` | Delete sequence |

## Output

JSON with:

- `success`: true/false
- `action`: Command executed
- `target`: Element name (if applicable)
- `coordinates`: {x, y} position
- `message`: Result description

## Data Locations

- **Elements**: `~/.copilot/skills/macro-agent/data/elements.json` (element definitions)
- **Captures**: `~/.copilot/skills/macro-agent/data/captures/` (template images)
- **Sequences**: `~/.copilot/skills/macro-agent/data/sequences/` (action sequences)

## Examples

### Find and Click App

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py find chrome
python ~/.copilot/skills/macro-agent/macro_agent.py click-on chrome
```

### Type and Submit

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py write "search query"
python ~/.copilot/skills/macro-agent/macro_agent.py press enter
```

### Keyboard Shortcut

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py hotkey ctrl shift s
```

### Create and Run Sequence

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py seq-create my_macro
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on file_menu"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "wait 0.5"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-add my_macro "click-on save_option"
python ~/.copilot/skills/macro-agent/macro_agent.py seq-run my_macro
```

## Capture New Elements

```bash
python ~/.copilot/skills/macro-agent/macro_agent.py region-capture
```

Keys: `f`=freeze, `c`/`Space`=capture, `+/-`=resize, `q`/`ESC`=quit

## 📱 Example: Send WhatsApp Message

User says: "Envía mensaje a Marco diciendo hola" ("Send a message to Marco saying hola")

**CORRECT approach:**

```bash
# 1. First check sequences
seq-list

# 2. Found whatsapp_send_marco! Run it
seq-run whatsapp_send_marco

# 3. Type and send
write "hola"
press enter
```

**WRONG approach (NEVER do this):**

```bash
# ❌ WRONG - Don't manually navigate!
hotkey super
wait 500
# Unnecessary when a sequence exists; use sequences!
```

## 🔄 Decision Flow

```
User Request
    ↓
Run seq-list
    ↓
Sequence exists? ──YES──→ seq-run → Additional actions (write, press)
    ↓ NO
Use individual commands (click-on, write, press, etc.)
```
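
The decision flow above can also be sketched as a small planning function. This is only an illustration under stated assumptions: the output of `seq-list` has already been parsed into a list of names (its format is not documented here), messaging sequences follow the `whatsapp_send_` plus lowercase contact pattern suggested by the two listed examples, and `pick_sequence` and `plan_message` are hypothetical helpers, not part of `macro_agent.py`.

```python
from typing import List, Optional


def pick_sequence(available: List[str], contact: str) -> Optional[str]:
    """Return the matching sequence name, or None if no sequence exists."""
    # Assumed naming pattern, inferred from whatsapp_send_ross / whatsapp_send_marco.
    wanted = f"whatsapp_send_{contact.strip().lower()}"
    return wanted if wanted in available else None


def plan_message(available: List[str], contact: str, message: str) -> List[str]:
    """Build the command list for a 'send message to <contact>' request."""
    sequence = pick_sequence(available, contact)
    if sequence is None:
        # NO branch: nothing to run; the caller falls back to individual commands.
        return []
    # YES branch: run the sequence, then the additional actions (write, press).
    return [f"seq-run {sequence}", f'write "{message}"', "press enter"]


if __name__ == "__main__":
    names = ["whatsapp_send_ross", "whatsapp_send_marco"]  # as if read from seq-list
    for command in plan_message(names, "Marco", "hola"):
        print(command)
```

An empty list marks the NO branch so the caller can tell "no sequence" apart from a plan. The sketch does not escape quotes inside `message`, which a real wrapper would have to handle.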