--- namespace: aiwg platforms: [all] name: Integrity Verification description: SHA-256 checksum manifest generation, self-verification, and PREMIS fixity patterns category: media-curator --- # Integrity Verification Cryptographic checksum verification patterns for detecting bit rot, tampering, and transfer errors in media archives. Implements self-verifying manifests with PREMIS fixity metadata. ## Manifest Generation Script Complete bash implementation for generating self-verifying checksum manifests: ```bash #!/bin/bash set -euo pipefail # Archive Checksum Manifest Generator # Generates self-verifying SHA-256 checksum manifest # Usage: ./generate-checksums.sh /path/to/archive ARCHIVE_PATH="${1:-.}" CHECKSUM_FILE="CHECKSUMS.sha256" TEMP_FILE="/tmp/checksums-$$.tmp" # Validate archive exists if [ ! -d "$ARCHIVE_PATH" ]; then echo "Error: Archive directory not found: $ARCHIVE_PATH" >&2 exit 1 fi cd "$ARCHIVE_PATH" echo "Generating checksums for: $ARCHIVE_PATH" # Find all files, exclude checksum manifest itself # Use null-terminated strings for handling filenames with spaces find . -type f ! -name "$CHECKSUM_FILE" -print0 | \ sort -z | \ xargs -0 sha256sum > "$TEMP_FILE" # Count files FILE_COUNT=$(wc -l < "$TEMP_FILE") echo "Found $FILE_COUNT files" # Compute manifest hash (hash of the checksum content) MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}') # Generate timestamp (ISO 8601 UTC with nanosecond precision) TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) # Write final manifest with self-verifying header { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum" cat "$TEMP_FILE" } > "$CHECKSUM_FILE" # Clean up rm "$TEMP_FILE" echo "✓ Generated $CHECKSUM_FILE" echo " Manifest hash: $MANIFEST_HASH" echo " Timestamp: $TIMESTAMP" echo " Files: $FILE_COUNT" ``` **Key features**: - Handles filenames with spaces via null-terminated strings (`-print0`, `-z`, `-0`) - Deterministic output (sorted by path) - Self-verifying header with manifest hash - ISO 8601 UTC timestamps with nanosecond precision - Exit on error (`set -euo pipefail`) ## Verification Commands ### Quick Manifest Integrity Check Verify manifest has not been tampered with (sub-second): ```bash #!/bin/bash # Quick verification - manifest integrity only CHECKSUM_FILE="CHECKSUMS.sha256" if [ ! -f "$CHECKSUM_FILE" ]; then echo "✗ Checksum manifest not found: $CHECKSUM_FILE" >&2 exit 1 fi # Extract expected hash from header EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') if [ -z "$EXPECTED" ]; then echo "✗ Manifest header missing or malformed" >&2 exit 1 fi # Compute actual hash of manifest content (lines 4+) ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') # Compare if [ "$EXPECTED" = "$ACTUAL" ]; then echo "✓ Manifest integrity verified" echo " Hash: $EXPECTED" exit 0 else echo "✗ Manifest has been tampered with" >&2 echo " Expected: $EXPECTED" >&2 echo " Actual: $ACTUAL" >&2 exit 1 fi ``` **Use case**: Daily automated checks. Fast execution regardless of archive size. **Exit codes**: - `0` - Manifest integrity verified - `1` - Manifest corrupted or tampered ### Full File Verification Verify all files match their checksums: ```bash #!/bin/bash # Full verification - manifest integrity + all files CHECKSUM_FILE="CHECKSUMS.sha256" # Step 1: Verify manifest integrity echo "Step 1: Verifying manifest integrity..." EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" != "$ACTUAL" ]; then echo "✗ Manifest integrity check failed - stopping" >&2 exit 1 fi echo "✓ Manifest integrity verified" # Step 2: Verify all files echo "Step 2: Verifying all files..." if tail -n +4 "$CHECKSUM_FILE" | sha256sum -c; then echo "✓ All files verified successfully" exit 0 else echo "✗ One or more files failed verification" >&2 exit 1 fi ``` **Output format** (from `sha256sum -c`): ``` ./audio/episode-001.opus: OK ./audio/episode-002.opus: OK ./video/recording.mp4: FAILED sha256sum: WARNING: 1 computed checksum did NOT match ``` **Exit codes**: - `0` - All files verified successfully - `1` - Verification failed (manifest or files) ### Quiet Mode Show only failures: ```bash #!/bin/bash # Quiet verification - only show failures CHECKSUM_FILE="CHECKSUMS.sha256" # Quick manifest check (silent) EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" != "$ACTUAL" ]; then echo "MANIFEST: FAILED" >&2 exit 1 fi # Verify files (quiet mode - only show failures) tail -n +4 "$CHECKSUM_FILE" | sha256sum -c --quiet EXIT_CODE=$? if [ $EXIT_CODE -eq 0 ]; then # Silent success exit 0 else # sha256sum already printed failures to stderr exit 1 fi ``` **Use case**: Cron jobs, automated monitoring, CI/CD pipelines. **Output**: Nothing on success, only failed files on failure. ## Regeneration After Changes Script for regenerating manifest after archive modifications: ```bash #!/bin/bash set -euo pipefail # Regenerate checksum manifest after archive changes # Usage: ./fix-checksums.sh /path/to/archive ARCHIVE_PATH="${1:-.}" CHECKSUM_FILE="CHECKSUMS.sha256" BACKUP_FILE="CHECKSUMS.sha256.bak" TEMP_FILE="/tmp/checksums-$$.tmp" cd "$ARCHIVE_PATH" # Backup existing manifest if [ -f "$CHECKSUM_FILE" ]; then cp "$CHECKSUM_FILE" "$BACKUP_FILE" echo "Backed up existing manifest to $BACKUP_FILE" fi # Generate new manifest echo "Regenerating checksums..." find . -type f ! -name "$CHECKSUM_FILE" ! -name "$BACKUP_FILE" -print0 | \ sort -z | \ xargs -0 sha256sum > "$TEMP_FILE" FILE_COUNT=$(wc -l < "$TEMP_FILE") MANIFEST_HASH=$(sha256sum "$TEMP_FILE" | awk '{print $1}') TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 $CHECKSUM_FILE | sha256sum" cat "$TEMP_FILE" } > "$CHECKSUM_FILE" rm "$TEMP_FILE" # Detect changes if [ -f "$BACKUP_FILE" ]; then echo "" echo "Changes detected:" # Extract file paths from old and new manifests tail -n +4 "$BACKUP_FILE" | awk '{print $2}' | sort > /tmp/old-files-$$.txt tail -n +4 "$CHECKSUM_FILE" | awk '{print $2}' | sort > /tmp/new-files-$$.txt # Added files ADDED=$(comm -13 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$ADDED" ]; then echo " Added:" echo "$ADDED" | sed 's/^/ /' fi # Removed files REMOVED=$(comm -23 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$REMOVED" ]; then echo " Removed:" echo "$REMOVED" | sed 's/^/ /' fi # Modified files (different hash for same path) # This requires comparing hashes, not just paths COMMON_FILES=$(comm -12 /tmp/old-files-$$.txt /tmp/new-files-$$.txt) if [ -n "$COMMON_FILES" ]; then while IFS= read -r file; do OLD_HASH=$(grep -F "$file" "$BACKUP_FILE" | awk '{print $1}') NEW_HASH=$(grep -F "$file" "$CHECKSUM_FILE" | awk '{print $1}') if [ "$OLD_HASH" != "$NEW_HASH" ]; then echo " Modified: $file" fi done <<< "$COMMON_FILES" fi rm /tmp/old-files-$$.txt /tmp/new-files-$$.txt fi echo "" echo "✓ Generated new $CHECKSUM_FILE" echo " Manifest hash: $MANIFEST_HASH" echo " Files: $FILE_COUNT" ``` **Features**: - Backs up existing manifest to `.bak` file - Regenerates checksums for all current files - Reports added, removed, and modified files - Preserves backup for comparison ## VERIFY.md Template Human-readable instructions placed in archive root: ```markdown # Archive Integrity Verification This archive contains a self-verifying checksum manifest (`CHECKSUMS.sha256`) for detecting corruption, tampering, or transfer errors. ## Archive Information - **Generated**: {TIMESTAMP} - **Total files**: {FILE_COUNT} - **Total size**: {TOTAL_SIZE} - **Manifest hash**: {MANIFEST_HASH} ## Quick Verification (30 seconds) Verify the manifest has not been tampered with: \`\`\`bash EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}') ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}') [ "$EXPECTED" = "$ACTUAL" ] && echo "✓ Verified" || echo "✗ Tampered" \`\`\` ## Full Verification (10-60 minutes) Verify all files match their checksums: \`\`\`bash tail -n +4 CHECKSUMS.sha256 | sha256sum -c \`\`\` ## Recommended Schedule | Frequency | Verification Type | Purpose | |-----------|-------------------|---------| | Daily | Quick (manifest only) | Detect tampering | | Weekly | Full (all files) | Detect bit rot | | After transfer | Full | Verify transfer integrity | | Before backup | Full | Ensure source integrity | ## Automated Monitoring Add to crontab for daily verification: \`\`\`cron # Daily quick check at 2am 0 2 * * * cd /path/to/archive && tail -n +4 CHECKSUMS.sha256 | sha256sum -c --quiet || echo "Verification failed" | mail -s "Archive Alert" admin@example.com \`\`\` ## If Verification Fails ### Manifest Integrity Failure The manifest itself has been corrupted or tampered with. **Recovery**: 1. Restore `CHECKSUMS.sha256` from backup 2. If no backup, regenerate manifest (see below) ### File Verification Failure One or more files have been corrupted or modified. **Identify failures**: \`\`\`bash tail -n +4 CHECKSUMS.sha256 | sha256sum -c 2>&1 | grep FAILED \`\`\` **Recovery**: 1. Restore failed files from backup 2. If intentional modification, regenerate manifest ## Regenerate Manifest After making changes to the archive (add/remove/modify files): \`\`\`bash # Backup existing manifest cp CHECKSUMS.sha256 CHECKSUMS.sha256.bak # Regenerate find . -type f ! -name "CHECKSUMS.sha256" -print0 | \\ sort -z | \\ xargs -0 sha256sum > /tmp/checksums.tmp MANIFEST_HASH=$(sha256sum /tmp/checksums.tmp | awk '{print $1}') TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ) { echo "# MANIFEST_HASH: $MANIFEST_HASH" echo "# Generated: $TIMESTAMP" echo "# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum" cat /tmp/checksums.tmp } > CHECKSUMS.sha256 rm /tmp/checksums.tmp \`\`\` ## Technical Details - **Hash algorithm**: SHA-256 (NIST FIPS 180-4) - **Timestamp format**: ISO 8601 UTC with nanosecond precision - **Self-verification**: Manifest hash prevents undetected tampering - **Coverage**: All files except `CHECKSUMS.sha256` itself ## Support For questions or issues: - Documentation: https://aiwg.io/media-curator - Issues: https://github.com/jmagly/aiwg/issues - Command reference: `aiwg verify-archive --help` --- *Generated by AIWG Media Curator v{VERSION}* ``` **Template variables**: - `{TIMESTAMP}` - ISO 8601 generation timestamp - `{FILE_COUNT}` - Number of files in manifest - `{TOTAL_SIZE}` - Archive size (e.g., "4.2 GB") - `{MANIFEST_HASH}` - SHA-256 hash of manifest - `{VERSION}` - AIWG version ## Scheduled Verification ### Cron Pattern for Daily Checks ```cron # Quick verification daily at 2am 0 2 * * * cd /media/archives/podcast && /usr/local/bin/aiwg verify-archive . --verify --quiet || echo "Archive verification failed: $(pwd)" | mail -s "Archive Alert" admin@example.com # Full verification weekly on Sunday at 3am 0 3 * * 0 cd /media/archives/podcast && /usr/local/bin/aiwg verify-archive . --verify 2>&1 | mail -s "Weekly Archive Verification" admin@example.com ``` ### Systemd Timer **Service file** (`/etc/systemd/system/archive-verify.service`): ```ini [Unit] Description=Verify media archive checksums After=network.target [Service] Type=oneshot User=media WorkingDirectory=/media/archives/podcast ExecStart=/usr/local/bin/aiwg verify-archive . --verify --quiet StandardOutput=journal StandardError=journal ``` **Timer file** (`/etc/systemd/system/archive-verify.timer`): ```ini [Unit] Description=Daily archive verification [Timer] OnCalendar=daily Persistent=true [Install] WantedBy=timers.target ``` **Enable timer**: ```bash sudo systemctl enable archive-verify.timer sudo systemctl start archive-verify.timer ``` ### Monitoring Integration **Prometheus exporter pattern**: ```bash #!/bin/bash # Export verification status as Prometheus metrics CHECKSUM_FILE="CHECKSUMS.sha256" METRICS_FILE="/var/lib/prometheus/node_exporter/archive_integrity.prom" # Quick verification EXPECTED=$(grep '^# MANIFEST_HASH:' "$CHECKSUM_FILE" | awk '{print $3}') ACTUAL=$(tail -n +4 "$CHECKSUM_FILE" | sha256sum | awk '{print $1}') if [ "$EXPECTED" = "$ACTUAL" ]; then MANIFEST_OK=1 else MANIFEST_OK=0 fi # Write metrics cat > "$METRICS_FILE" <` **Format**: `<64-char-hex> ` ### PREMIS 3.0 Fixity **Purpose**: Record fixity information for digital preservation. **Key elements**: - `messageDigest` - Hash value with algorithm prefix (e.g., `sha256:abc123...`) - `messageDigestAlgorithm` - Algorithm name (`SHA-256`) - `messageDigestOriginator` - Software that computed hash - `fixityCheckDateTime` - When fixity was verified **Use case**: Long-term digital preservation requiring audit trails. ### W3C PROV-O **Entity-Activity-Agent model**: - **Entity**: Archive or checksum manifest - **Activity**: Checksum generation or verification - **Agent**: Software (AIWG Media Curator) that performed activity **Key relationships**: - `wasGeneratedBy` - Entity generated by activity - `used` - Activity used entity - `wasAssociatedWith` - Activity performed by agent - `wasAttributedTo` - Entity attributed to agent ### ISO 8601 Timestamps **Format**: `YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ` **Requirements**: - Always UTC (trailing `Z`) - Nanosecond precision (9 decimal places) - Monotonic (lexicographically sortable) **Bash command**: `date -u +%Y-%m-%dT%H:%M:%S.%NZ` **Example**: `2026-02-14T18:45:22.387654321Z` ## Performance Characteristics | Operation | Time Complexity | Example Duration (100 GB archive) | |-----------|-----------------|-----------------------------------| | Quick verify (manifest only) | O(1) | < 1 second | | Full verify (all files) | O(n × file_size) | 10-60 minutes (disk-bound) | | Generate manifest | O(n × file_size) | 10-60 minutes (disk-bound) | **Optimization tips**: - Use SSD storage for faster checksumming - Enable parallel hashing for multi-core systems (GNU parallel) - Quick verification detects 99% of tampering instantly - Schedule full verification during off-peak hours ## Error Scenarios | Error | Cause | Recovery | |-------|-------|----------| | Manifest integrity failure | Manifest file corrupted/tampered | Restore from backup or regenerate | | File verification failure | File corrupted or modified | Restore file from backup | | Missing manifest | New archive or manifest deleted | Generate new manifest | | Permission denied | Cannot read files | Fix permissions, run as appropriate user | | Disk full | Cannot write manifest | Free disk space | | Hash mismatch | File changed since manifest generated | Regenerate manifest if intentional | ## Integration Examples ### Git Pre-Commit Hook Verify archive integrity before committing: ```bash #!/bin/bash # .git/hooks/pre-commit if [ -f CHECKSUMS.sha256 ]; then echo "Verifying archive integrity..." if ! aiwg verify-archive . --verify --quiet; then echo "Error: Archive verification failed" >&2 echo "Run 'aiwg verify-archive . --fix' to regenerate checksums" >&2 exit 1 fi echo "✓ Archive integrity verified" fi ``` ### CI/CD Pipeline GitHub Actions workflow: ```yaml name: Verify Archive Integrity on: [push, pull_request] jobs: verify: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Install AIWG run: npm install -g aiwg - name: Verify archive run: aiwg verify-archive media/archives/podcast --verify ``` ### Backup Verification Verify backup integrity after rsync: ```bash #!/bin/bash # Backup and verify script SOURCE="/media/source/podcast" DEST="/media/backup/podcast" # Sync files rsync -av --delete "$SOURCE/" "$DEST/" # Verify destination cd "$DEST" if aiwg verify-archive . --verify --quiet; then echo "✓ Backup verified successfully" else echo "✗ Backup verification failed" >&2 exit 1 fi ``` ## References - @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before regenerating checksums (overwrites existing CHECKSUMS.sha256) - @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/acquire/SKILL.md — Acquisition skill that invokes integrity verification via --verify-after flag - @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance tracking that records fixity events alongside checksums - @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/verify-archive/SKILL.md — High-level verify-archive skill that uses these integrity patterns