---
name: ai-tool-designer
description: Guide for designing effective tools for AI agents. Use when creating tools for custom agent systems or any AI tool interfaces. Provides principles for tool naming, input/output design, error handling, and evaluation methodologies that maximize agent effectiveness.
license: Complete terms in LICENSE.txt
---

# AI Agent Tool Designer

## Overview

This skill provides comprehensive guidance for designing tools that AI agents can use effectively. Whether you are building custom agent tools or other AI-accessible interfaces, these principles maximize agent success in accomplishing real-world tasks.

Note: Use the more specific mcp-builder skill if you want to create an MCP server.

The quality of a tool system is measured not by how comprehensively it implements features, but by how well it enables AI agents to accomplish realistic, complex tasks using only the tools provided.

---

## Agent-Centric Design Principles

Before implementing any tool system, understand these foundational principles for designing tools that AI agents can use effectively:

### 1. Build for Workflows, Not Just API Endpoints

**Principle:** Design thoughtful, high-impact workflow tools rather than simply wrapping existing API endpoints.

**Why it matters:** Agents need to accomplish complete tasks, not just make individual API calls. Tools that consolidate related operations reduce the number of steps agents must take and improve success rates.

**How to apply:**
- Consolidate related operations (e.g., `schedule_event` that both checks availability and creates the event)
- Focus on tools that enable complete tasks, not just individual API calls
- Consider what workflows agents actually need to accomplish, not just what the underlying API offers
- Ask: "What is the user trying to accomplish?" rather than "What does the API provide?"

**Examples:**
- ❌ Bad: Separate tools `check_calendar_availability`, `create_calendar_event`, `send_event_notification`
- ✅ Good: Single tool `schedule_event` with parameters for checking conflicts and sending notifications
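
A minimal sketch of the consolidated tool, assuming a hypothetical in-memory `Calendar` class in place of a real calendar API. The class, the `notify:` flag, and the response keys are illustrative names, not part of any specific library.

```ruby
require "time"

# Hypothetical in-memory calendar standing in for a real calendar API.
class Calendar
  Event = Struct.new(:title, :starts_at, :ends_at)

  def initialize
    @events = []
  end

  # Returns the stored events that overlap the given time range.
  def conflicts(starts_at, ends_at)
    @events.select { |e| e.starts_at < ends_at && starts_at < e.ends_at }
  end

  def add(title, starts_at, ends_at)
    Event.new(title, starts_at, ends_at).tap { |e| @events << e }
  end
end

# One workflow tool: checks availability and creates the event in a single
# call, instead of exposing each step as a separate tool.
def schedule_event(calendar, title:, starts_at:, ends_at:, notify: false)
  from = Time.iso8601(starts_at)
  to = Time.iso8601(ends_at)

  clashes = calendar.conflicts(from, to)
  unless clashes.empty?
    names = clashes.map(&:title).join(", ")
    return { ok: false,
             error: "Time slot conflicts with: #{names}. Try a different starts_at/ends_at." }
  end

  event = calendar.add(title, from, to)
  # A real tool would send the notification here when notify is true.
  { ok: true, title: event.title, starts_at: starts_at, ends_at: ends_at, notified: notify }
end

calendar = Calendar.new
schedule_event(calendar, title: "Standup",
               starts_at: "2024-01-15T10:00:00Z", ends_at: "2024-01-15T10:30:00Z")
```

The agent makes one call and gets either a confirmation or a conflict message naming the clashing events, so it never has to chain an availability check and a create call itself.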

### 2. Optimize for Limited Context

**Principle:** Agents have constrained context windows - make every token count.

**Why it matters:** When agents run out of context, they fail to complete tasks. Verbose tool outputs force agents to make difficult decisions about what information to keep or discard.

**How to apply:**
- Return high-signal information, not exhaustive data dumps
- Provide "concise" vs "detailed" response format options (default to concise)
- Default to human-readable identifiers over technical codes (names over IDs when possible)
- Consider the agent's context budget as a scarce resource
- Implement character limits and graceful truncation (typically 25,000 characters)
- Use pagination with reasonable defaults (20-50 items)

**Examples:**
- ❌ Bad: Return all 50 fields from a user object, including metadata, internal IDs, and timestamps in multiple formats
- ✅ Good: Return name, email, role, and key status fields; offer `detailed=true` parameter for full data
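
One way to implement the concise-by-default behavior is sketched below. The field list in `CONCISE_USER_FIELDS` and the sample record are assumptions chosen for illustration.

```ruby
# Fields returned by default; everything else requires detailed: true.
# The exact field list is an assumption for this sketch.
CONCISE_USER_FIELDS = %i[name email role status].freeze

# Returns a trimmed view of a user record unless detailed output is requested.
def format_user(user, detailed: false)
  detailed ? user : user.slice(*CONCISE_USER_FIELDS)
end

user = {
  id: "U123456", name: "John Doe", email: "john@example.com",
  role: "developer", status: "active",
  created_at_epoch: 1_705_314_600, avatar_url_512: "https://example.com/a512.png"
}

format_user(user)                 # only name, email, role, status
format_user(user, detailed: true) # the full record
```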

### 3. Design Actionable Error Messages

**Principle:** Error messages should guide agents toward correct usage patterns, not just report failures.

**Why it matters:** Agents learn tool usage through feedback. Clear, educational errors help agents self-correct and succeed on retry.

**How to apply:**
- Suggest specific next steps in error messages
- Make errors educational, not just diagnostic
- Include examples of correct usage when parameters are invalid
- Guide agents toward solutions: "Try using filter='active_only' to reduce results"
- Avoid technical jargon; use natural language

**Examples:**
- ❌ Bad: "Error 400: Invalid request"
- ✅ Good: "The limit parameter must be between 1 and 100. You provided 500. Try using limit=50 and pagination with offset to retrieve more results."
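
A small sketch of how such a message could be produced. `validate_limit` and `MAX_LIMIT` are hypothetical names; the message states the constraint, echoes the bad value, and suggests a concrete next step.

```ruby
MAX_LIMIT = 100

# Returns nil when `limit` is valid, otherwise an error message that states
# the constraint, echoes the rejected value, and suggests a fix.
def validate_limit(limit)
  return nil if limit.is_a?(Integer) && limit.between?(1, MAX_LIMIT)

  "The limit parameter must be an integer between 1 and #{MAX_LIMIT}. " \
    "You provided #{limit.inspect}. " \
    "Try using limit=50 and pagination with offset to retrieve more results."
end

validate_limit(50)  # => nil
validate_limit(500) # => message explaining the valid range
```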

### 4. Follow Natural Task Subdivisions

**Principle:** Tool names and organization should reflect how humans think about tasks, not just API structure.

**Why it matters:** Agents use tool names and descriptions to decide which tool to call. Natural naming improves tool discovery and reduces wrong tool selections.

**How to apply:**
- Tool names should reflect human mental models of tasks
- Group related tools with consistent prefixes for discoverability
- Design tools around natural workflows, not just API structure
- Use action-oriented naming: `search_users`, `create_project`, `send_message`
- Include service/system prefix to avoid conflicts: `slack_send_message` not just `send_message`

**Examples:**
- ❌ Bad: `api_endpoint_users_post`, `api_endpoint_users_get`, `api_endpoint_users_delete`
- ✅ Good: `create_user`, `search_users`, `delete_user`

### 5. Use Evaluation-Driven Development

**Principle:** Create realistic evaluation scenarios early and let agent feedback drive tool improvements.

**Why it matters:** Only by testing tools with actual agents can you discover usability issues. Prototype quickly and iterate based on real agent performance.

**How to apply:**
- Create 10+ complex, realistic questions agents should answer using your tools
- Test with actual AI agents attempting to solve these questions
- Observe where agents struggle, make mistakes, or run out of context
- Iterate on tool design based on agent feedback
- Measure success by agent task completion rate, not feature completeness

**Process:**
1. Build initial tools based on these principles
2. Create evaluation questions (see [Evaluation Guide](./references/evaluation_guide.md))
3. Test with agents
4. Identify failure patterns
5. Refine tools
6. Repeat
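
Steps 3 and 4 can be summarized with a few lines of code. This is a sketch only: the `EvalResult` struct and the case-insensitive exact-match comparison are assumptions, and the evaluation guide may prescribe a different scoring method.

```ruby
# Pairs an evaluation question with the expected answer and the answer the
# agent actually produced. Exact-match scoring is an assumption of this sketch.
EvalResult = Struct.new(:question, :expected, :actual) do
  def passed?
    expected.to_s.strip.downcase == actual.to_s.strip.downcase
  end
end

# Summarizes a run: the completion rate plus the questions that failed,
# which are the starting point for the next round of tool refinement.
def summarize(results)
  failed = results.reject(&:passed?)
  rate = results.empty? ? 0.0 : (results.size - failed.size).to_f / results.size
  { completion_rate: rate.round(2), failed_questions: failed.map(&:question) }
end
```

Tracking the failed questions alongside the rate keeps the focus on failure patterns rather than on a single score.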

---

## Tool Design Framework

Follow this systematic framework when designing any tool for AI agents:

### Phase 1: Planning

**1. Identify Core Workflows**
- List the most valuable operations agents need to perform
- Prioritize tools that enable the most common and important use cases
- Consider which tools work together to enable complex workflows

**2. Design Input Schemas**
- Use strong schema validation (for example, dry-validation in Ruby, or JSON Schema)
- Include proper constraints (min/max length, regex patterns, ranges)
- Provide clear, descriptive field descriptions with examples
- Set sensible defaults to reduce required parameters

**3. Design Output Formats**
- Support multiple formats (JSON for programmatic, Markdown for human-readable)
- Define consistent response structures across similar tools
- Plan for large-scale usage (thousands of users/resources)
- Implement character limits and truncation strategies
- Include pagination metadata (`has_more`, `next_offset`, `total_count`)

**4. Plan Error Handling**
- Design clear, actionable, agent-friendly error messages
- Handle authentication and authorization errors gracefully
- Consider rate limiting and timeout scenarios
- Provide guidance on how to proceed after errors

### Phase 2: Implementation

**Tool Naming Conventions:**
- Use snake_case: `search_users`, `create_project`
- Include service prefix: `github_create_issue`, `slack_send_message`
- Be action-oriented: start with verbs (get, list, search, create, update, delete)
- Be specific: avoid generic names that could conflict

**Tool Descriptions:**
Write comprehensive descriptions that include:
- One-line summary of what the tool does
- Detailed explanation of purpose and functionality
- When to use this tool (and when NOT to use it)
- Parameter descriptions with examples
- Return value schema
- Error handling guidance

**Tool Annotations** (if supported by your system):
- `readOnlyHint: true` for read-only operations
- `destructiveHint: false` for non-destructive operations
- `idempotentHint: true` if repeated calls with the same arguments have no additional effect
- `openWorldHint: true` if interacting with external systems
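
The naming, description, and annotation guidance can be combined in one definition. The sketch below uses a plain Ruby hash; the hash layout, the `slack_search_users` tool, and the `valid_tool_name?` helper are assumptions for illustration, not the API of a specific agent framework.

```ruby
# Hypothetical tool definition; the layout is illustrative only.
SEARCH_USERS_TOOL = {
  name: "slack_search_users",
  description: <<~DESC,
    Search Slack users by name or email.

    Use when you need a user's ID or profile before messaging them.
    Do NOT use to list every user in the workspace; add a query instead.

    Parameters:
      query (string, required): name or email fragment, e.g. "john"
      limit (integer, default 20): results per page, 1 to 100

    Returns: users (name, email, role), has_more, next_offset.
    On error: the message says which parameter to change and how.
  DESC
  annotations: {
    readOnlyHint: true,      # only reads data
    destructiveHint: false,  # never deletes or overwrites
    idempotentHint: true,    # repeating the call changes nothing
    openWorldHint: true      # talks to an external system
  }
}.freeze

# Verbs accepted after the service prefix (an assumed list for this sketch).
TOOL_VERBS = %w[get list search create update delete send].freeze

# Checks the naming conventions: snake_case, service prefix, verb, then object.
def valid_tool_name?(name)
  return false unless name.match?(/\A[a-z][a-z0-9]*(_[a-z0-9]+)+\z/)

  _service, verb, *rest = name.split("_")
  TOOL_VERBS.include?(verb) && !rest.empty?
end
```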

### Phase 3: Refinement

**Code Quality Checklist:**
- ✅ No duplicated code between tools (DRY principle)
- ✅ Shared logic extracted into reusable functions
- ✅ Similar operations return similar formats (consistency)
- ✅ All external calls have error handling
- ✅ Full type coverage (type hints, TypeScript types)
- ✅ Every tool has comprehensive documentation

**Testing:**
- Test with valid and invalid inputs
- Test error handling paths
- Test with real AI agents using evaluation questions
- Test pagination and large result sets
- Test character limits and truncation

---

## Response Format Guidelines

All tools that return data should support multiple formats for flexibility:

### JSON Format (`response_format="json"`)
**Purpose:** Machine-readable structured data for programmatic processing

**Best practices:**
- Include all available fields and metadata
- Use consistent field names and types
- Suitable for when agents need to process data further
- Return IDs alongside names for precision

**Example:**
```json
{
  "users": [
    {
      "id": "U123456",
      "name": "John Doe",
      "email": "john@example.com",
      "role": "developer",
      "active": true
    }
  ],
  "total_count": 150,
  "count": 20,
  "has_more": true,
  "next_offset": 20
}
```

### Markdown Format (`response_format="markdown"`, typically default)
**Purpose:** Human-readable formatted text for user presentation

**Best practices:**
- Use headers, lists, and formatting for clarity
- Convert timestamps to a readable format ("2024-01-15 10:30 UTC" rather than epoch seconds)
- Show display names with IDs in parentheses ("@john.doe (U123456)")
- Omit verbose metadata (show one profile image URL, not all sizes)
- Group related information logically
- Use when presenting information to end users

**Example:**
```markdown
## Users (20 of 150)

- **John Doe** (@john.doe)
  - Email: john@example.com
  - Role: Developer
  - Status: Active

- **Jane Smith** (@jane.smith)
  - Email: jane@example.com
  - Role: Designer
  - Status: Active

*Showing 20 results. Use offset=20 to see more.*
```
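
A sketch of a single renderer that supports both formats. `render_users` and its keyword arguments are hypothetical, and users are assumed to be hashes with `:id`, `:name`, `:email`, and `:role` keys.

```ruby
require "json"

# Renders one page of users as JSON (full structure) or Markdown (readable
# summary). Markdown is the default, matching the guideline above.
def render_users(users, total_count:, offset: 0, response_format: "markdown")
  next_offset = offset + users.size
  has_more = next_offset < total_count

  case response_format
  when "json"
    JSON.generate({ users: users, total_count: total_count, count: users.size,
                    has_more: has_more, next_offset: has_more ? next_offset : nil })
  when "markdown"
    lines = ["## Users (#{users.size} of #{total_count})", ""]
    users.each do |u|
      lines << "- **#{u[:name]}** (#{u[:id]})"
      lines << "  - Email: #{u[:email]}"
      lines << "  - Role: #{u[:role].capitalize}"
    end
    if has_more
      lines << ""
      lines << "*Showing #{users.size} results. Use offset=#{next_offset} to see more.*"
    end
    lines.join("\n")
  else
    raise ArgumentError,
          "Unknown response_format #{response_format.inspect}. Use \"json\" or \"markdown\"."
  end
end
```

Both branches are built from the same page and the same pagination values, so the two formats cannot drift apart.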

---

## Pagination Best Practices

For tools that list resources:

**Implementation requirements:**
- Always respect the `limit` parameter (never load all results when a limit is specified)
- Implement offset-based or cursor-based pagination
- Return pagination metadata: `has_more`, `next_offset`/`next_cursor`, `total_count`
- Never load all results into memory for large datasets
- Default to reasonable limits (20-50 items typical)

**Response structure:**
```json
{
  "items": [...],
  "total_count": 150,
  "count": 20,
  "offset": 0,
  "has_more": true,
  "next_offset": 20
}
```

**Clear guidance in responses:**
Include instructions for getting more data:
- "Showing 20 of 150 results. Use offset=20 to see the next page."
- "Results truncated. Add filters to narrow the search."
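
A minimal offset-based sketch of the response structure above. `paginate`, `DEFAULT_PAGE_SIZE`, and `MAX_PAGE_SIZE` are hypothetical names, and `items` is an in-memory array only for illustration; a real tool would pass `limit` and `offset` down to the data source rather than loading everything.

```ruby
DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100

# Returns one page of `items` plus the metadata an agent needs to continue.
def paginate(items, limit: DEFAULT_PAGE_SIZE, offset: 0)
  limit = limit.clamp(1, MAX_PAGE_SIZE)
  offset = [offset, 0].max
  page = items[offset, limit] || []
  next_offset = offset + page.size
  has_more = next_offset < items.size

  response = {
    items: page, total_count: items.size, count: page.size, offset: offset,
    has_more: has_more, next_offset: has_more ? next_offset : nil
  }
  if has_more
    response[:message] =
      "Showing #{page.size} of #{items.size} results. " \
      "Use offset=#{next_offset} to see the next page."
  end
  response
end

first_page = paginate((1..150).to_a)
first_page[:message] # => "Showing 20 of 150 results. Use offset=20 to see the next page."
```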

---

## Character Limits and Truncation

To prevent overwhelming context windows:

**Implementation:**
- Define a `CHARACTER_LIMIT` constant (typically 25,000 characters)
- Check response size before returning
- Truncate gracefully with clear indicators
- Provide guidance on how to filter/paginate for complete results

**Example handling:**
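```ruby
require "json"

CHARACTER_LIMIT = 25_000

# `build_response` and the :items key are illustrative names.
# Halves the item list once (keeping at least one item) when the
# serialized response exceeds CHARACTER_LIMIT, and tells the agent how to get the rest.
def build_response(data)
  response = { items: data }
  if JSON.generate(response).length > CHARACTER_LIMIT
    truncated_data = data[0...[1, data.length / 2].max]
    response[:items] = truncated_data
    response[:truncated] = true
    response[:truncation_message] =
      "Response truncated from #{data.length} to #{truncated_data.length} items. " \
      "Use 'offset' parameter or add filters like status='active' to see more."
  end
  response
end
```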

---

## Input Validation Best Practices

**Security and usability:**
- Validate all parameters against schema before processing
- Sanitize file paths to prevent directory traversal
- Validate URLs and external identifiers
- Check parameter sizes and ranges
- Prevent command injection in system calls
- Return clear validation errors with examples of correct format
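
As one example of the checks above, a directory traversal guard might look like the sketch below. `safe_path` is a hypothetical helper; it normalizes the path lexically and does not resolve symlinks, which a production check may also need to handle.

```ruby
# Resolves `user_path` inside `base_dir` and rejects anything that escapes it.
# Lexical check only: symlinks are not resolved in this sketch.
def safe_path(base_dir, user_path)
  base = File.expand_path(base_dir)
  full = File.expand_path(user_path, base)

  unless full == base || full.start_with?(base + File::SEPARATOR)
    raise ArgumentError,
          "Path #{user_path.inspect} is outside the allowed directory. " \
          "Use a relative path such as \"reports/2024.txt\"."
  end
  full
end
```

Comparing against `base + File::SEPARATOR` rather than `base` alone prevents a sibling directory such as `/srv/database` from passing a check meant for `/srv/data`.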

**Schema design:**
- Use strong validation (dry-validation, JSON Schema)
- Include constraints (minLength, maxLength, pattern, minimum, maximum)
- Provide detailed field descriptions with examples
- Mark required vs optional parameters clearly
- Set sensible defaults where possible
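
The schema guidance can be sketched without external gems as follows. This is a hand-rolled check covering only a few constraints, not a JSON Schema or dry-validation implementation; `SEARCH_USERS_SCHEMA`, its fields, and `validate_params` are hypothetical.

```ruby
# Schema for a hypothetical search tool: types, constraints, defaults, and
# descriptions with examples.
SEARCH_USERS_SCHEMA = {
  "query"  => { type: String, required: true, min_length: 2, max_length: 200,
                description: "Name or email fragment, e.g. \"john\"" },
  "limit"  => { type: Integer, default: 20, minimum: 1, maximum: 100,
                description: "Maximum results to return (1 to 100), e.g. 20" },
  "status" => { type: String, default: "active", pattern: /\A(active|inactive|all)\z/,
                description: "One of active, inactive, or all" }
}.freeze

# Applies defaults and checks constraints. Returns [params, errors].
def validate_params(input, schema)
  params = {}
  errors = []

  schema.each do |name, rule|
    value = input.fetch(name, rule[:default])
    if value.nil?
      errors << "#{name} is required. #{rule[:description]}." if rule[:required]
      next
    end

    ok = value.is_a?(rule[:type])
    ok &&= value.length.between?(rule[:min_length], rule[:max_length]) if rule[:min_length]
    ok &&= value.between?(rule[:minimum], rule[:maximum]) if rule[:minimum]
    ok &&= value.match?(rule[:pattern]) if rule[:pattern]

    if ok
      params[name] = value
    else
      errors << "#{name} is invalid (got #{value.inspect}). #{rule[:description]}."
    end
  end

  [params, errors]
end
```

Because each error reuses the field description, the agent sees the expected format and an example in the same message that reports the failure.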

---

## Resources

This skill includes reference documentation for deeper exploration:

### references/tool_design_patterns.md
Comprehensive patterns and anti-patterns for common tool design scenarios with detailed examples.

### references/evaluation_guide.md
Complete methodology for creating evaluation questions that test tool effectiveness with AI agents, including how to run evaluations and interpret results.

---

## Further Reading

For detailed examples and advanced patterns:
- [Tool Design Patterns](./references/tool_design_patterns.md) - Comprehensive patterns and examples
- [Evaluation Guide](./references/evaluation_guide.md) - Testing methodology and evaluation creation