--- name: json-transformer description: Transform, manipulate, and analyze JSON data structures with advanced operations. --- # JSON Transformer Skill Transform, manipulate, and analyze JSON data structures with advanced operations. ## Instructions You are a JSON transformation expert. When invoked: 1. **Parse and Validate JSON**: - Parse JSON from files, strings, or APIs - Validate JSON structure and schema - Handle malformed JSON gracefully - Pretty-print and format JSON - Detect and fix common JSON issues 2. **Transform Data Structures**: - Reshape nested objects and arrays - Flatten and unflatten structures - Extract specific paths (JSONPath, JMESPath) - Merge and combine JSON documents - Filter and map data 3. **Advanced Operations**: - Convert between JSON and other formats (CSV, YAML, XML) - Apply transformations (jq-style operations) - Query and search JSON data - Diff and compare JSON documents - Generate JSON from schemas 4. **Data Manipulation**: - Add, update, delete properties - Rename keys - Convert data types - Sort and deduplicate - Calculate aggregate values ## Usage Examples ``` @json-transformer data.json @json-transformer --flatten @json-transformer --path "users[*].email" @json-transformer --merge file1.json file2.json @json-transformer --to-csv data.json @json-transformer --validate schema.json ``` ## Basic JSON Operations ### Parsing and Writing #### Python ```python import json # Parse JSON string data = json.loads('{"name": "John", "age": 30}') # Parse from file with open('data.json', 'r') as f: data = json.load(f) # Write JSON to file with open('output.json', 'w') as f: json.dump(data, f, indent=2) # Pretty print print(json.dumps(data, indent=2, sort_keys=True)) # Compact output compact = json.dumps(data, separators=(',', ':')) # Handle special types from datetime import datetime import decimal def json_encoder(obj): if isinstance(obj, datetime): return obj.isoformat() if isinstance(obj, decimal.Decimal): return float(obj) raise TypeError(f"Type {type(obj)} not serializable") json.dumps(data, default=json_encoder) ``` #### JavaScript ```javascript // Parse JSON string const data = JSON.parse('{"name": "John", "age": 30}'); // Parse from file (Node.js) const fs = require('fs'); const data = JSON.parse(fs.readFileSync('data.json', 'utf8')); // Write JSON to file fs.writeFileSync('output.json', JSON.stringify(data, null, 2)); // Pretty print console.log(JSON.stringify(data, null, 2)); // Custom serialization const json = JSON.stringify(data, (key, value) => { if (value instanceof Date) { return value.toISOString(); } return value; }, 2); ``` #### jq (Command Line) ```bash # Pretty print cat data.json | jq '.' # Compact output cat data.json | jq -c '.' # Sort keys cat data.json | jq -S '.' # Read from file, write to file jq '.' input.json > output.json ``` ### Validation #### Python (jsonschema) ```python from jsonschema import validate, ValidationError # Define schema schema = { "type": "object", "properties": { "name": {"type": "string"}, "age": {"type": "number", "minimum": 0}, "email": {"type": "string", "format": "email"} }, "required": ["name", "email"] } # Validate data data = {"name": "John", "email": "john@example.com", "age": 30} try: validate(instance=data, schema=schema) print("Valid JSON") except ValidationError as e: print(f"Invalid: {e.message}") # Validate against JSON Schema draft from jsonschema import Draft7Validator validator = Draft7Validator(schema) errors = list(validator.iter_errors(data)) for error in errors: print(f"Error at {'.'.join(str(p) for p in error.path)}: {error.message}") ``` #### JavaScript (ajv) ```javascript const Ajv = require('ajv'); const ajv = new Ajv(); const schema = { type: 'object', properties: { name: { type: 'string' }, age: { type: 'number', minimum: 0 }, email: { type: 'string', format: 'email' } }, required: ['name', 'email'] }; const validate = ajv.compile(schema); const data = { name: 'John', email: 'john@example.com', age: 30 }; if (validate(data)) { console.log('Valid JSON'); } else { console.log('Invalid:', validate.errors); } ``` ## Data Extraction and Querying ### JSONPath Queries #### Python (jsonpath-ng) ```python from jsonpath_ng import jsonpath, parse data = { "users": [ {"name": "John", "age": 30, "email": "john@example.com"}, {"name": "Jane", "age": 25, "email": "jane@example.com"} ] } # Extract all user names jsonpath_expr = parse('users[*].name') names = [match.value for match in jsonpath_expr.find(data)] # Result: ['John', 'Jane'] # Extract emails of users over 25 jsonpath_expr = parse('users[?(@.age > 25)].email') emails = [match.value for match in jsonpath_expr.find(data)] # Nested extraction data = { "company": { "departments": [ { "name": "Engineering", "employees": [ {"name": "Alice", "salary": 100000}, {"name": "Bob", "salary": 90000} ] } ] } } jsonpath_expr = parse('company.departments[*].employees[*].name') names = [match.value for match in jsonpath_expr.find(data)] ``` #### jq ```bash # Extract field echo '{"name": "John", "age": 30}' | jq '.name' # Extract from array echo '[{"name": "John"}, {"name": "Jane"}]' | jq '.[].name' # Filter array echo '[{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]' | \ jq '.[] | select(.age > 25)' # Extract nested fields cat data.json | jq '.users[].email' # Multiple fields cat data.json | jq '.users[] | {name: .name, email: .email}' # Conditional extraction cat data.json | jq '.users[] | select(.age > 25) | .email' ``` ### JMESPath Queries #### Python (jmespath) ```python import jmespath data = { "users": [ {"name": "John", "age": 30, "tags": ["admin", "developer"]}, {"name": "Jane", "age": 25, "tags": ["developer"]}, {"name": "Bob", "age": 35, "tags": ["manager"]} ] } # Simple extraction names = jmespath.search('users[*].name', data) # Result: ['John', 'Jane', 'Bob'] # Filtering admins = jmespath.search('users[?contains(tags, `admin`)]', data) # Multiple conditions senior_devs = jmespath.search( 'users[?age > `28` && contains(tags, `developer`)]', data ) # Projections result = jmespath.search('users[*].{name: name, age: age}', data) # Nested queries data = { "departments": [ { "name": "Engineering", "employees": [ {"name": "Alice", "skills": ["Python", "Go"]}, {"name": "Bob", "skills": ["JavaScript", "Python"]} ] } ] } python_devs = jmespath.search( 'departments[*].employees[?contains(skills, `Python`)].name', data ) ``` ## Data Transformation ### Flattening Nested JSON #### Python ```python def flatten_json(nested_json, parent_key='', sep='.'): """ Flatten nested JSON structure """ items = [] for key, value in nested_json.items(): new_key = f"{parent_key}{sep}{key}" if parent_key else key if isinstance(value, dict): items.extend(flatten_json(value, new_key, sep=sep).items()) elif isinstance(value, list): for i, item in enumerate(value): if isinstance(item, dict): items.extend(flatten_json(item, f"{new_key}[{i}]", sep=sep).items()) else: items.append((f"{new_key}[{i}]", item)) else: items.append((new_key, value)) return dict(items) # Example nested = { "user": { "name": "John", "address": { "city": "New York", "zip": "10001" }, "tags": ["admin", "developer"] } } flat = flatten_json(nested) # Result: { # 'user.name': 'John', # 'user.address.city': 'New York', # 'user.address.zip': '10001', # 'user.tags[0]': 'admin', # 'user.tags[1]': 'developer' # } ``` #### JavaScript ```javascript function flattenJSON(obj, prefix = '', result = {}) { for (const [key, value] of Object.entries(obj)) { const newKey = prefix ? `${prefix}.${key}` : key; if (value && typeof value === 'object' && !Array.isArray(value)) { flattenJSON(value, newKey, result); } else if (Array.isArray(value)) { value.forEach((item, index) => { if (typeof item === 'object') { flattenJSON(item, `${newKey}[${index}]`, result); } else { result[`${newKey}[${index}]`] = item; } }); } else { result[newKey] = value; } } return result; } ``` ### Unflattening JSON ```python def unflatten_json(flat_json, sep='.'): """ Unflatten a flattened JSON structure """ result = {} for key, value in flat_json.items(): parts = key.split(sep) current = result for i, part in enumerate(parts[:-1]): # Handle array notation if '[' in part: array_key, index = part.split('[') index = int(index.rstrip(']')) if array_key not in current: current[array_key] = [] # Extend array if needed while len(current[array_key]) <= index: current[array_key].append({}) current = current[array_key][index] else: if part not in current: current[part] = {} current = current[part] # Set the final value final_key = parts[-1] if '[' in final_key: array_key, index = final_key.split('[') index = int(index.rstrip(']')) if array_key not in current: current[array_key] = [] while len(current[array_key]) <= index: current[array_key].append(None) current[array_key][index] = value else: current[final_key] = value return result ``` ### Merging JSON #### Python ```python def deep_merge(dict1, dict2): """ Deep merge two dictionaries """ result = dict1.copy() for key, value in dict2.items(): if key in result and isinstance(result[key], dict) and isinstance(value, dict): result[key] = deep_merge(result[key], value) else: result[key] = value return result # Example base = { "user": {"name": "John", "age": 30}, "settings": {"theme": "dark"} } override = { "user": {"age": 31, "email": "john@example.com"}, "settings": {"language": "en"} } merged = deep_merge(base, override) # Result: { # 'user': {'name': 'John', 'age': 31, 'email': 'john@example.com'}, # 'settings': {'theme': 'dark', 'language': 'en'} # } ``` #### jq ```bash # Merge two JSON files jq -s '.[0] * .[1]' file1.json file2.json # Deep merge jq -s 'reduce .[] as $item ({}; . * $item)' file1.json file2.json ``` ### Transforming Keys ```python def transform_keys(obj, transform_fn): """ Transform all keys in JSON structure """ if isinstance(obj, dict): return {transform_fn(k): transform_keys(v, transform_fn) for k, v in obj.items()} elif isinstance(obj, list): return [transform_keys(item, transform_fn) for item in obj] else: return obj # Convert to snake_case import re def to_snake_case(text): s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', text) return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower() data = { "firstName": "John", "lastName": "Doe", "userInfo": { "emailAddress": "john@example.com" } } snake_case_data = transform_keys(data, to_snake_case) # Result: { # 'first_name': 'John', # 'last_name': 'Doe', # 'user_info': {'email_address': 'john@example.com'} # } # Convert to camelCase def to_camel_case(text): components = text.split('_') return components[0] + ''.join(x.title() for x in components[1:]) ``` ## Format Conversion ### JSON to CSV #### Python ```python import json import csv import pandas as pd # Using pandas (recommended) data = [ {"name": "John", "age": 30, "email": "john@example.com"}, {"name": "Jane", "age": 25, "email": "jane@example.com"} ] df = pd.DataFrame(data) df.to_csv('output.csv', index=False) # Using csv module with open('output.csv', 'w', newline='') as csvfile: if data: writer = csv.DictWriter(csvfile, fieldnames=data[0].keys()) writer.writeheader() writer.writerows(data) # Handle nested JSON def flatten_for_csv(data): """Flatten nested JSON for CSV export""" if isinstance(data, list): return [flatten_json(item) for item in data] return flatten_json(data) flattened = flatten_for_csv(data) pd.DataFrame(flattened).to_csv('output.csv', index=False) ``` #### jq ```bash # Convert JSON array to CSV cat data.json | jq -r '.[] | [.name, .age, .email] | @csv' # With headers cat data.json | jq -r '["name", "age", "email"], (.[] | [.name, .age, .email]) | @csv' ``` ### JSON to YAML #### Python ```python import json import yaml # JSON to YAML with open('data.json', 'r') as json_file: data = json.load(json_file) with open('data.yaml', 'w') as yaml_file: yaml.dump(data, yaml_file, default_flow_style=False) # YAML to JSON with open('data.yaml', 'r') as yaml_file: data = yaml.safe_load(yaml_file) with open('data.json', 'w') as json_file: json.dump(data, json_file, indent=2) ``` ### JSON to XML #### Python ```python import json import xml.etree.ElementTree as ET def json_to_xml(json_obj, root_name='root'): """Convert JSON to XML""" def build_xml(parent, obj): if isinstance(obj, dict): for key, val in obj.items(): elem = ET.SubElement(parent, key) build_xml(elem, val) elif isinstance(obj, list): for item in obj: elem = ET.SubElement(parent, 'item') build_xml(elem, item) else: parent.text = str(obj) root = ET.Element(root_name) build_xml(root, json_obj) return ET.tostring(root, encoding='unicode') # Example data = {"user": {"name": "John", "age": 30}} xml_string = json_to_xml(data) ``` ## Advanced Transformations ### jq-Style Transformations #### Python (pyjq) ```python import pyjq data = { "users": [ {"name": "John", "age": 30, "city": "New York"}, {"name": "Jane", "age": 25, "city": "San Francisco"}, {"name": "Bob", "age": 35, "city": "New York"} ] } # Select and transform result = pyjq.all('.users[] | {name, age}', data) # Filter and group result = pyjq.all('group_by(.city) | map({city: .[0].city, count: length})', data) # Complex transformation result = pyjq.all(''' .users | map(select(.age > 25)) | sort_by(.age) | reverse ''', data) ``` #### jq Examples ```bash # Map over array echo '[1,2,3,4,5]' | jq 'map(. * 2)' # Filter and transform cat users.json | jq '.users | map(select(.age > 25) | {name, email})' # Group by field cat data.json | jq 'group_by(.category) | map({category: .[0].category, count: length})' # Calculate sum cat orders.json | jq '[.[] | .amount] | add' # Create new structure cat users.json | jq '{ total: length, users: [.[] | {name, email}], avgAge: ([.[] | .age] | add / length) }' # Conditional logic cat data.json | jq '.[] | if .status == "active" then .name else empty end' ``` ### Complex Restructuring ```python def restructure_json(data): """ Example: Transform flat user records into hierarchical structure """ # Input: [ # {"userId": 1, "name": "John", "orderId": 101, "product": "A"}, # {"userId": 1, "name": "John", "orderId": 102, "product": "B"}, # {"userId": 2, "name": "Jane", "orderId": 103, "product": "C"} # ] # Output: [ # { # "userId": 1, # "name": "John", # "orders": [ # {"orderId": 101, "product": "A"}, # {"orderId": 102, "product": "B"} # ] # }, # { # "userId": 2, # "name": "Jane", # "orders": [{"orderId": 103, "product": "C"}] # } # ] from collections import defaultdict users = defaultdict(lambda: {"orders": []}) for record in data: user_id = record["userId"] if "name" not in users[user_id]: users[user_id]["userId"] = user_id users[user_id]["name"] = record["name"] users[user_id]["orders"].append({ "orderId": record["orderId"], "product": record["product"] }) return list(users.values()) ``` ### Array Operations ```python import json def unique_by_key(array, key): """Remove duplicates based on key""" seen = set() result = [] for item in array: value = item.get(key) if value not in seen: seen.add(value) result.append(item) return result def sort_by_key(array, key, reverse=False): """Sort array by key""" return sorted(array, key=lambda x: x.get(key, ''), reverse=reverse) def group_by_key(array, key): """Group array elements by key""" from collections import defaultdict groups = defaultdict(list) for item in array: groups[item.get(key)].append(item) return dict(groups) # Example usage users = [ {"name": "John", "age": 30, "city": "New York"}, {"name": "Jane", "age": 25, "city": "San Francisco"}, {"name": "Bob", "age": 35, "city": "New York"}, {"name": "Alice", "age": 28, "city": "San Francisco"} ] # Sort by age sorted_users = sort_by_key(users, 'age') # Group by city by_city = group_by_key(users, 'city') ``` ## JSON Diff and Comparison ```python import json from deepdiff import DeepDiff def json_diff(obj1, obj2): """Compare two JSON objects and return differences""" diff = DeepDiff(obj1, obj2, ignore_order=True) return diff # Example old = { "name": "John", "age": 30, "addresses": [{"city": "New York"}] } new = { "name": "John", "age": 31, "addresses": [{"city": "San Francisco"}] } diff = json_diff(old, new) print(json.dumps(diff, indent=2)) # Manual diff def simple_diff(obj1, obj2, path=""): """Simple diff implementation""" diffs = [] if type(obj1) != type(obj2): diffs.append(f"{path}: type changed from {type(obj1)} to {type(obj2)}") return diffs if isinstance(obj1, dict): all_keys = set(obj1.keys()) | set(obj2.keys()) for key in all_keys: new_path = f"{path}.{key}" if path else key if key not in obj1: diffs.append(f"{new_path}: added") elif key not in obj2: diffs.append(f"{new_path}: removed") elif obj1[key] != obj2[key]: diffs.extend(simple_diff(obj1[key], obj2[key], new_path)) elif isinstance(obj1, list): if len(obj1) != len(obj2): diffs.append(f"{path}: length changed from {len(obj1)} to {len(obj2)}") for i, (item1, item2) in enumerate(zip(obj1, obj2)): diffs.extend(simple_diff(item1, item2, f"{path}[{i}]")) elif obj1 != obj2: diffs.append(f"{path}: changed from {obj1} to {obj2}") return diffs ``` ## Schema Generation ```python def generate_schema(data, name="root"): """ Generate JSON Schema from data """ if isinstance(data, dict): properties = {} required = [] for key, value in data.items(): properties[key] = generate_schema(value, key) if value is not None: required.append(key) schema = { "type": "object", "properties": properties } if required: schema["required"] = required return schema elif isinstance(data, list): if data: return { "type": "array", "items": generate_schema(data[0], name) } return {"type": "array"} elif isinstance(data, bool): return {"type": "boolean"} elif isinstance(data, int): return {"type": "integer"} elif isinstance(data, float): return {"type": "number"} elif isinstance(data, str): return {"type": "string"} elif data is None: return {"type": "null"} return {} # Example sample_data = { "name": "John", "age": 30, "email": "john@example.com", "active": True, "tags": ["developer", "admin"], "address": { "city": "New York", "zip": "10001" } } schema = generate_schema(sample_data) print(json.dumps(schema, indent=2)) ``` ## Utility Functions ### Pretty Print with Colors ```python from pygments import highlight from pygments.lexers import JsonLexer from pygments.formatters import TerminalFormatter def pretty_print_json(data): """Print JSON with syntax highlighting""" json_str = json.dumps(data, indent=2, sort_keys=True) print(highlight(json_str, JsonLexer(), TerminalFormatter())) ``` ### Safe Access with Default Values ```python def safe_get(data, path, default=None): """ Safely get nested value from JSON path: "user.address.city" or ["user", "address", "city"] """ if isinstance(path, str): path = path.split('.') current = data for key in path: if isinstance(current, dict): current = current.get(key) elif isinstance(current, list) and key.isdigit(): index = int(key) current = current[index] if 0 <= index < len(current) else None else: return default if current is None: return default return current # Example data = {"user": {"address": {"city": "New York"}}} city = safe_get(data, "user.address.city") # "New York" country = safe_get(data, "user.address.country", "Unknown") # "Unknown" ``` ## Command Line Tools ### Using jq ```bash # Format JSON cat messy.json | jq '.' # Extract specific fields cat data.json | jq '.users[] | {name, email}' # Filter arrays cat data.json | jq '.[] | select(.age > 30)' # Transform keys to lowercase cat data.json | jq 'with_entries(.key |= ascii_downcase)' # Merge multiple JSON files jq -s 'add' file1.json file2.json file3.json # Convert to CSV cat data.json | jq -r '.[] | [.name, .age, .email] | @csv' ``` ### Using Python (command line) ```bash # Pretty print python -m json.tool input.json # Compact output python -c "import json; print(json.dumps(json.load(open('data.json')), separators=(',',':')))" # Extract field python -c "import json; data=json.load(open('data.json')); print(data['users'][0]['name'])" ``` ## Best Practices 1. **Always validate JSON** before processing 2. **Use schema validation** for API contracts 3. **Handle errors gracefully** (malformed JSON) 4. **Use appropriate libraries** (jq, jmespath, jsonpath) 5. **Preserve data types** during transformations 6. **Document complex transformations** 7. **Use version control** for schema definitions 8. **Test transformations** with edge cases 9. **Consider memory usage** for large files 10. **Use streaming parsers** for very large JSON ## Common Patterns ### API Response Transformation ```python def transform_api_response(response): """Transform API response to application format""" return { "users": [ { "id": user["userId"], "name": f"{user['firstName']} {user['lastName']}", "email": user["emailAddress"], "active": user["status"] == "active" } for user in response.get("data", {}).get("users", []) ], "pagination": { "page": response.get("page", 1), "total": response.get("totalResults", 0) } } ``` ### Configuration Merging ```python def merge_configs(base_config, user_config): """Merge user configuration with base configuration""" result = deep_merge(base_config, user_config) # Validate required fields required = ["database", "api_key"] for field in required: if field not in result: raise ValueError(f"Missing required field: {field}") return result ``` ## Notes - Always handle edge cases (null, empty arrays, missing keys) - Use appropriate tools for the job (jq for CLI, pandas for data science) - Consider performance for large JSON files - Validate schemas in production environments - Keep transformations idempotent when possible - Document expected JSON structure - Use TypeScript/JSON Schema for type safety