# Data Format & Technical Specifications

Technical documentation for data structures, file formats, and storage specifications.

## File Structure

```
memcord/
├── memory_slots/              # Internal JSON storage with enhanced metadata
│   ├── project_alpha.json     # With tags, groups, descriptions, compression
│   ├── meeting_notes.json     # Organized, searchable, optimized storage
│   └── ...
├── archives/                  # Archived memory slots (compressed)
│   ├── index.json             # Archive index with metadata
│   ├── old_project_archived.json
│   └── ...
├── shared_memories/           # Exported files
│   ├── project_alpha.md
│   ├── project_alpha.txt
│   └── ...
├── docs/                      # Documentation
│   ├── PRD.md                 # Product Requirements Document
│   ├── installation.md        # Installation guide
│   ├── tools-reference.md     # Complete tools documentation
│   ├── search-and-query.md    # Search & AI features
│   ├── data-format.md         # This file
│   ├── troubleshooting.md     # Support documentation
│   └── examples.md            # Usage examples
└── src/memcord/               # Source code
    ├── server.py              # Main MCP server with 23 tools
    ├── storage.py             # Enhanced storage with compression/archive support
    ├── models.py              # Enhanced data models with compression metadata
    ├── compression.py         # Content compression utilities
    ├── archival.py            # Archival system for long-term storage
    ├── search.py              # Search engine with TF-IDF scoring
    ├── query.py               # Natural language query processing
    └── summarizer.py          # Text summarization
```

## Memory Slot Data Structure

### Enhanced Memory Slot JSON Schema

```json
{
  "slot_name": "project_alpha",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T15:30:00Z",
  "tags": ["project", "urgent", "backend"],
  "group_path": "projects/alpha/development",
  "description": "Development discussions for Project Alpha",
  "priority": 1,
  "is_archived": false,
  "archived_at": null,
  "archive_reason": null,
  "metadata": {
    "total_entries": 5,
    "total_characters": 12500,
    "last_summary_at": "2024-01-01T14:00:00Z",
    "search_keywords": ["api", "database", "authentication"],
    "contributors": ["alice", "bob"],
    "project_phase": "development"
  },
  "entries": [
    {
      "id": "entry_001",
      "type": "manual_save",
      "content": "Full conversation text...",
      "timestamp": "2024-01-01T12:00:00Z",
      "metadata": {
        "word_count": 850,
        "participant_count": 3,
        "topics": ["api_design", "authentication"],
        "decisions": ["use_jwt_tokens", "implement_rate_limiting"]
      },
      "compression_info": {
        "is_compressed": false,
        "algorithm": "none",
        "original_size": null,
        "compressed_size": null,
        "compression_ratio": null,
        "compressed_at": null
      }
    },
    {
      "id": "entry_002",
      "type": "auto_summary",
      "content": "Summary of key points...",
      "timestamp": "2024-01-01T12:30:00Z",
      "original_length": 5000,
      "summary_length": 750,
      "compression_ratio": 0.15,
      "metadata": {
        "summary_type": "progressive",
        "key_points": 5,
        "decisions_count": 2,
        "action_items": 3
      },
      "compression_info": {
        "is_compressed": true,
        "algorithm": "gzip",
        "original_size": 5000,
        "compressed_size": 750,
        "compression_ratio": 0.15,
        "compressed_at": "2024-01-01T12:30:00Z"
      }
    }
  ]
}
```

### Field Specifications

#### Root Level Fields

- **slot_name**: Unique identifier for the memory slot
- **created_at**: ISO 8601 timestamp of creation
- **updated_at**: ISO 8601 timestamp of last modification
- **tags**: Array of lowercase strings for categorization
- **group_path**: Hierarchical path using forward slashes
- **description**: Human-readable description of the slot's purpose
- **priority**: Integer (1-5) indicating importance
- **is_archived**: Boolean indicating whether the slot has been archived
- **archived_at**: ISO 8601 timestamp of archival, or `null` if the slot is not archived
- **archive_reason**: Reason recorded at archival time, or `null` if the slot is not archived
- **metadata**: Object containing aggregate information
- **entries**: Array of individual memory entries

#### Entry Types

- **manual_save**: Exact conversation text saved manually
- **auto_summary**: AI-generated summary with compression
- **import**: Content imported from external sources
- **merge**: Combined content from multiple slots

#### Metadata Schema

```json
{
  "total_entries": "number",
  "total_characters": "number",
  "last_summary_at": "ISO 8601 timestamp",
  "search_keywords": ["array", "of", "strings"],
  "contributors": ["array", "of", "participants"],
  "project_phase": "string",
  "custom_fields": {
    "arbitrary": "key-value pairs"
  }
}
```

## Export Formats

### Markdown Format (.md)

```markdown
# Memory Slot: project_alpha

**Created:** 2024-01-01T12:00:00Z
**Updated:** 2024-01-01T15:30:00Z
**Tags:** project, urgent, backend
**Group:** projects/alpha/development

## Description
Development discussions for Project Alpha

## Entries

### Entry 1 - Manual Save (2024-01-01T12:00:00Z)
Full conversation text...

### Entry 2 - Auto Summary (2024-01-01T12:30:00Z)
*Summary (15% compression: 5000 → 750 characters)*

Summary of key points...
```

### Plain Text Format (.txt)

```
MEMORY SLOT: project_alpha
================================
Created: 2024-01-01T12:00:00Z
Updated: 2024-01-01T15:30:00Z
Tags: project, urgent, backend
Group: projects/alpha/development

Description: Development discussions for Project Alpha

ENTRIES
-------

[2024-01-01T12:00:00Z] Manual Save:
Full conversation text...

[2024-01-01T12:30:00Z] Auto Summary (15% compression):
Summary of key points...
```

### JSON Format (.json)

The complete memory slot data structure shown above, with all metadata preserved.

## Search Index Structure

### Inverted Index Format

```json
{
  "term": "database",
  "documents": [
    {
      "slot_name": "project_alpha",
      "entry_id": "entry_001",
      "tf": 0.05,
      "positions": [45, 123, 289],
      "context": "...database migration plan..."
    }
  ],
  "idf": 2.1,
  "total_frequency": 15
}
```

### Search Result Format

```json
{
  "query": "database migration",
  "total_results": 3,
  "execution_time_ms": 12,
  "results": [
    {
      "slot_name": "project_alpha",
      "entry_id": "entry_001",
      "score": 0.85,
      "snippet": "...discussing the database migration plan...",
      "tags": ["project", "backend"],
      "group_path": "projects/alpha/development",
      "timestamp": "2024-01-01T12:00:00Z",
      "matched_terms": ["database", "migration"],
      "highlights": [
        {"start": 45, "end": 53, "term": "database"},
        {"start": 54, "end": 63, "term": "migration"}
      ]
    }
  ]
}
```

## MCP File Resources

### Resource URI Format

- `memory://slot_name.md` - Markdown export
- `memory://slot_name.txt` - Plain text export
- `memory://slot_name.json` - Full JSON data

### Resource Metadata

```json
{
  "uri": "memory://project_alpha.md",
  "name": "project_alpha.md",
  "description": "Memory slot: project_alpha (Markdown format)",
  "mimeType": "text/markdown",
  "size": 2048,
  "lastModified": "2024-01-01T15:30:00Z",
  "tags": ["project", "urgent", "backend"],
  "group": "projects/alpha/development"
}
```

## Archive Storage Structure

### Archive Index Format

```json
{
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T15:30:00Z",
  "total_archives": 3,
  "total_original_size": 45000,
  "total_archived_size": 12000,
  "entries": {
    "old_project": {
      "slot_name": "old_project",
      "original_path": "memory_slots/old_project.json",
      "archive_path": "archives/old_project_archived.json",
      "archived_at": "2024-01-01T15:30:00Z",
      "archive_reason": "project_completed",
      "original_size": 15000,
      "archived_size": 4000,
      "compression_ratio": 0.27,
      "last_accessed": "2023-12-01T10:00:00Z",
      "entry_count": 12,
      "tags": ["project", "completed"],
      "group_path": "projects/legacy"
    }
  }
}
```

### Archived Memory Slot Format

In an archived slot, each entry's `content` holds Base64-encoded, gzip-compressed text. The entry's `compression_info` records the sizes before and after compression.

```json
{
  "slot_name": "old_project",
  "created_at": "2023-01-01T12:00:00Z",
  "updated_at": "2023-12-01T10:00:00Z",
  "is_archived": true,
  "archived_at": "2024-01-01T15:30:00Z",
  "archive_reason": "project_completed",
  "tags": ["project", "completed"],
  "group_path": "projects/legacy",
  "entries": [
    {
      "type": "manual_save",
      "content": "H4sIAAAAAAACA+y9B3QUVRY...==",
      "timestamp": "2023-01-01T12:00:00Z",
      "compression_info": {
        "is_compressed": true,
        "algorithm": "gzip",
        "original_size": 5000,
        "compressed_size": 1200,
        "compression_ratio": 0.24,
        "compressed_at": "2024-01-01T15:30:00Z"
      }
    }
  ]
}
```

## Storage Implementation

### File System Layout

```
memory_slots/
├── index.json                 # Global index of all slots
├── tags.json                  # Tag usage tracking
├── groups.json                # Group hierarchy
├── project_alpha.json         # Individual slot files
├── meeting_notes.json
└── ...

archives/
├── index.json                 # Archive index with metadata
├── old_project_archived.json  # Compressed archived slots
├── legacy_data_archived.json
└── ...

shared_memories/
├── project_alpha.md           # Exported markdown
├── project_alpha.txt          # Exported text
├── project_alpha.json         # Exported JSON
└── ...

.cache/
├── search_index.json          # Search index cache
├── term_frequencies.json      # TF-IDF data
└── query_cache/               # Cached query results
    ├── query_hash_1.json
    └── ...
```

### Atomic Operations

- **File writes**: Use temporary files with atomic rename
- **Index updates**: Batch operations with rollback capability
- **Cache invalidation**: Automatic cleanup on data changes
- **Backup creation**: Automatic versioning of critical files

## Data Validation

### Schema Validation Rules

- **slot_name**: Required; alphanumeric characters, underscores, and hyphens only
- **timestamps**: Must be valid ISO 8601 format
- **tags**: Array of strings, lowercase, no spaces
- **group_path**: Forward-slash separated, no leading or trailing slashes
- **compression_ratio**: Float between 0.05 and 0.5
- **priority**: Integer between 1 and 5

### Content Sanitization

- **HTML stripping**: Remove potentially harmful HTML tags
- **Encoding validation**: Ensure UTF-8 compliance
- **Size limits**: Configurable maximum entry sizes
- **Type checking**: Validate entry types against allowed values

## Performance Characteristics

### Storage Efficiency

- **Content compression**: Gzip compression for entries larger than 1 KB
- **Archive compression**: Additional compression for long-term storage
- **Incremental updates**: Only modified fields are rewritten
- **Index optimization**: Periodic cleanup and rebuilding
- **Memory usage**: Lazy loading of large memory slots
- **Space savings**: 30-70% size reduction from compression

### Search Performance

- **Index size**: ~10% of total content size
- **Query time**: Sub-second for most queries
- **Memory usage**: Bounded by configurable limits
- **Concurrent access**: Thread-safe read operations

## Migration & Compatibility

### Version Compatibility

- **Schema versioning**: Automatic migration between versions
- **Backward compatibility**: Support for older formats
- **Forward compatibility**: Graceful handling of newer fields
- **Data integrity**: Validation during migration

### Export/Import

- **Standard formats**: JSON, Markdown, Plain text
- **Metadata preservation**: Full fidelity in JSON exports
- **Bulk operations**: Efficient handling of large datasets
- **Error handling**: Robust recovery from partial failures
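The Markdown layout shown under Export Formats can be produced from a slot's JSON with a small renderer. This is a minimal sketch, not memcord's actual exporter. It assumes the slot is already loaded as a `dict` with the fields from the Enhanced Memory Slot JSON Schema. It also assumes the percentage in the summary line is the entry's `compression_ratio` expressed as a percent. `render_markdown` is a hypothetical helper name.

```python
def render_markdown(slot: dict) -> str:
    """Render a memory slot dict in the Markdown export layout (hypothetical helper)."""
    lines = [
        f"# Memory Slot: {slot['slot_name']}",
        "",
        f"**Created:** {slot['created_at']}",
        f"**Updated:** {slot['updated_at']}",
        f"**Tags:** {', '.join(slot.get('tags', []))}",
        f"**Group:** {slot.get('group_path', '')}",
        "",
        "## Description",
        slot.get("description", ""),
        "",
        "## Entries",
    ]
    for number, entry in enumerate(slot.get("entries", []), start=1):
        # "manual_save" -> "Manual Save", "auto_summary" -> "Auto Summary"
        label = entry["type"].replace("_", " ").title()
        lines += ["", f"### Entry {number} - {label} ({entry['timestamp']})"]
        if entry["type"] == "auto_summary":
            # Assumed: the percentage shown is compression_ratio * 100.
            pct = round(entry["compression_ratio"] * 100)
            lines.append(
                f"*Summary ({pct}% compression: "
                f"{entry['original_length']} → {entry['summary_length']} characters)*"
            )
            lines.append("")
        lines.append(entry["content"])
    return "\n".join(lines) + "\n"
```

Fed the two entries from the schema example, this reproduces the Markdown sample above line for line. The plain-text layout would follow the same loop with different header strings.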
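The Inverted Index Format and Search Result Format can be illustrated with a small in-memory TF-IDF index. This is a sketch under stated assumptions, and memcord's `search.py` may tokenize, weight or normalize scores differently:

- Tokens are lowercase `\w+` runs.
- `positions` are character offsets, consistent with the `highlights` example.
- `tf` is a term's count divided by the entry's token count.
- `idf` is the natural log of (number of entries / entries containing the term).
- A result's `score` is the sum of `tf * idf` over matched query terms.

`build_index`, `search` and the flat entry dicts are hypothetical. `context`, `snippet` and `highlights` are omitted.

```python
import math
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")

def build_index(entries: list[dict]) -> dict[str, dict]:
    """Build {term: record} dicts shaped like the Inverted Index Format."""
    postings = defaultdict(list)
    for entry in entries:
        matches = [(m.group().lower(), m.start()) for m in TOKEN.finditer(entry["content"])]
        positions = defaultdict(list)
        for term, start in matches:
            positions[term].append(start)  # character offsets into the entry content
        for term, offsets in positions.items():
            postings[term].append({
                "slot_name": entry["slot_name"],
                "entry_id": entry["entry_id"],
                "tf": len(offsets) / len(matches),
                "positions": offsets,
            })
    total_docs = len(entries)
    return {
        term: {
            "term": term,
            "documents": docs,
            "idf": math.log(total_docs / len(docs)),
            "total_frequency": sum(len(d["positions"]) for d in docs),
        }
        for term, docs in postings.items()
    }

def search(index: dict[str, dict], query: str) -> list[dict]:
    """Rank entries by summed tf*idf over the distinct query terms they contain."""
    scores = defaultdict(float)
    matched = defaultdict(list)
    for term in dict.fromkeys(t.lower() for t in TOKEN.findall(query)):
        record = index.get(term)
        if record is None:
            continue
        for doc in record["documents"]:
            key = (doc["slot_name"], doc["entry_id"])
            scores[key] += doc["tf"] * record["idf"]
            matched[key].append(term)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"slot_name": s, "entry_id": e, "score": round(score, 4), "matched_terms": matched[(s, e)]}
        for (s, e), score in ranked
    ]
```

A term that appears in every entry gets `idf = 0` under this formula, so very common words contribute nothing to the score. This is the usual motivation for TF-IDF weighting.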
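Resolving a `memory://` URI from the Resource URI Format into a slot name, export format and MIME type takes a suffix split and a lookup. This is a minimal sketch. Only `text/markdown` appears in the Resource Metadata example; the `text/plain` and `application/json` mappings for `.txt` and `.json` are assumptions based on the standard MIME types. `parse_memory_uri` is a hypothetical helper.

```python
MIME_TYPES = {
    "md": "text/markdown",       # as in the Resource Metadata example
    "txt": "text/plain",         # assumed standard MIME type
    "json": "application/json",  # assumed standard MIME type
}

def parse_memory_uri(uri: str) -> tuple[str, str, str]:
    """Split 'memory://<slot_name>.<ext>' into (slot_name, ext, mime_type)."""
    prefix = "memory://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a memory:// URI: {uri!r}")
    # rpartition splits on the last dot, so slot names stay intact.
    slot_name, dot, ext = uri[len(prefix):].rpartition(".")
    if not dot or not slot_name or ext not in MIME_TYPES:
        raise ValueError(f"unsupported memory resource: {uri!r}")
    return slot_name, ext, MIME_TYPES[ext]
```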
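The Archive Index Format keeps one record per archived slot plus running totals. The sketch below registers one archived slot and refreshes the totals. It makes these assumptions:

- `compression_ratio` is `archived_size / original_size` rounded to two decimals. This is consistent with the `old_project` example: 4000 / 15000 ≈ 0.27.
- The `total_*` fields are plain sums over all records.
- Timestamps use the same `...Z` UTC form as the rest of the document.

`register_archive` is a hypothetical name, not memcord's `archival.py` API.

```python
from datetime import datetime, timezone

def _now_iso() -> str:
    """Current UTC time in the document's ISO 8601 'Z' form."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def register_archive(index: dict, record: dict) -> dict:
    """Insert or replace one slot's record in the archive index and refresh the totals."""
    record = dict(record)  # avoid mutating the caller's dict
    # Assumed definition: archived size as a fraction of the original size.
    record["compression_ratio"] = round(record["archived_size"] / record["original_size"], 2)
    index.setdefault("created_at", _now_iso())
    index.setdefault("entries", {})[record["slot_name"]] = record
    records = index["entries"].values()
    index["total_archives"] = len(index["entries"])
    index["total_original_size"] = sum(r["original_size"] for r in records)
    index["total_archived_size"] = sum(r["archived_size"] for r in records)
    index["updated_at"] = _now_iso()
    return index
```

Recomputing the totals from the records, rather than incrementing them, keeps them correct when a slot is re-archived and its record is replaced.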
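Storage Efficiency states that entries larger than 1 KB are gzip-compressed. The Archived Memory Slot Format stores such content as Base64 text with a `compression_info` record. A round trip can be sketched with the standard `gzip` and `base64` modules. The following are assumptions, not memcord's `compression.py`, and the function names are hypothetical:

- The threshold is 1024 bytes of UTF-8.
- `original_size` counts the raw bytes.
- `compressed_size` counts the stored Base64 characters.
- `compression_ratio` is compressed over original.

```python
import base64
import gzip
from datetime import datetime, timezone

COMPRESSION_THRESHOLD = 1024  # assumed: "over 1 KB" measured in UTF-8 bytes

def compress_entry(entry: dict) -> dict:
    """Return a copy of entry whose content is gzip+Base64 if it exceeds the threshold."""
    raw = entry["content"].encode("utf-8")
    if len(raw) <= COMPRESSION_THRESHOLD:
        return {**entry, "compression_info": {
            "is_compressed": False, "algorithm": "none", "original_size": None,
            "compressed_size": None, "compression_ratio": None, "compressed_at": None,
        }}
    # Base64 makes the binary gzip stream safe to store in a JSON string.
    encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {**entry, "content": encoded, "compression_info": {
        "is_compressed": True,
        "algorithm": "gzip",
        "original_size": len(raw),
        "compressed_size": len(encoded),
        "compression_ratio": round(len(encoded) / len(raw), 2),
        "compressed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }}

def decompress_content(entry: dict) -> str:
    """Recover the original text from an entry produced by compress_entry."""
    info = entry.get("compression_info") or {}
    if not info.get("is_compressed"):
        return entry["content"]
    return gzip.decompress(base64.b64decode(entry["content"])).decode("utf-8")
```

Base64 of the gzip magic bytes `1f 8b 08` is `H4sI`. This explains the prefix of the `content` value in the archived-slot example.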
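The Atomic Operations rule for file writes ("temporary files with atomic rename") follows a well-known pattern:

1. Write to a temporary file in the same directory as the target, so the rename stays on one filesystem.
2. Flush and `fsync` the temporary file.
3. `os.replace` it over the target.

This is a minimal standard-library sketch. `atomic_write_json` is a hypothetical helper, not memcord's `storage.py` API.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    """Write data as JSON so readers see either the old file or the new one, never a partial write."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process dies mid-write, the target still holds its previous contents. At worst, a stray `.tmp` file is left behind for cleanup.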
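The Schema Validation Rules map fairly directly onto checks. The sketch below collects violations instead of stopping at the first one. Several details are assumptions, and `validate_slot` is a hypothetical helper rather than memcord's validator:

- "Alphanumeric" is read as ASCII letters and digits.
- ISO 8601 is approximated with `datetime.fromisoformat` after mapping a trailing `Z` to `+00:00`.
- `group_path` segments must be non-empty.
- The 0.05-0.5 `compression_ratio` rule is applied to the summary ratio stored on the entry itself.

```python
import re
from datetime import datetime

SLOT_NAME = re.compile(r"[A-Za-z0-9_-]+")     # assumed: ASCII letters/digits, '_' and '-'
GROUP_PATH = re.compile(r"[^/]+(?:/[^/]+)*")  # no leading/trailing slashes (also rejects "a//b")

def _is_iso8601(value) -> bool:
    """Approximate ISO 8601 check via datetime.fromisoformat (trailing 'Z' mapped to UTC)."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True

def _is_valid_tag(tag) -> bool:
    return (isinstance(tag, str) and tag != "" and tag == tag.lower()
            and not any(ch.isspace() for ch in tag))

def validate_slot(slot: dict) -> list[str]:
    """Return rule violations for one slot; an empty list means it passed."""
    errors = []
    name = slot.get("slot_name")
    if not isinstance(name, str) or not SLOT_NAME.fullmatch(name):
        errors.append("slot_name: required; letters, digits, '_' and '-' only")
    for field in ("created_at", "updated_at"):
        if field in slot and not _is_iso8601(slot[field]):
            errors.append(f"{field}: not an ISO 8601 timestamp")
    for tag in slot.get("tags", []):
        if not _is_valid_tag(tag):
            errors.append(f"tags: {tag!r} must be lowercase with no spaces")
    group = slot.get("group_path")
    if group is not None and (not isinstance(group, str) or not GROUP_PATH.fullmatch(group)):
        errors.append("group_path: no leading or trailing slashes")
    priority = slot.get("priority")
    if priority is not None and (isinstance(priority, bool) or not isinstance(priority, int)
                                 or not 1 <= priority <= 5):
        errors.append("priority: must be an integer between 1 and 5")
    for entry in slot.get("entries", []):
        if "timestamp" in entry and not _is_iso8601(entry["timestamp"]):
            errors.append(f"{entry.get('id')}: timestamp is not ISO 8601")
        # Assumed: the 0.05-0.5 rule applies to the entry-level summary ratio.
        ratio = entry.get("compression_ratio")
        if ratio is not None and not 0.05 <= ratio <= 0.5:
            errors.append(f"{entry.get('id')}: compression_ratio outside 0.05-0.5")
    return errors
```

Returning every violation at once suits bulk imports and migrations, where a single pass should report all problems in a file.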