--- name: data-retention-archiving-planner description: Plans and implements data retention policies with archival strategies, compliance requirements, automated cleanup jobs, and cold storage migration. Use for "data retention", "data archival", "GDPR compliance", or "storage optimization". --- # Data Retention & Archiving Planner Manage data lifecycle with automated retention and archiving. ## Retention Policy Document ```markdown # Data Retention Policy ## Retention Periods | Data Type | Hot Storage | Cold Storage | Total Retention | Reason | | --------------------- | ----------- | ------------ | --------------- | ----------------- | | User accounts | Active | N/A | Indefinite | Business need | | Order history | 2 years | 5 years | 7 years | Tax compliance | | Logs | 30 days | 90 days | 120 days | Operational | | Analytics events | 90 days | 1 year | 15 months | Business insights | | Audit trails | 1 year | 6 years | 7 years | Legal compliance | | User sessions | 30 days | None | 30 days | Security | | Failed login attempts | 90 days | None | 90 days | Security | ## Compliance Requirements ### GDPR (EU) - Right to erasure (right to be forgotten) - Data minimization - Storage limitation ### HIPAA (Healthcare) - Minimum 6 years retention - Secure archival required ### SOX (Financial) - 7 years retention for financial records - Immutable audit trails ### PCI DSS (Payments) - 1 year minimum for audit logs - 3 months minimum for transaction logs ``` ## Archive Schema Design ```sql -- Hot database: Current active data CREATE TABLE orders ( id BIGSERIAL PRIMARY KEY, user_id BIGINT NOT NULL, total DECIMAL(10,2) NOT NULL, status TEXT NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT NOW(), updated_at TIMESTAMP NOT NULL DEFAULT NOW() ); -- Cold database: Archived historical data CREATE TABLE orders_archive ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL, total DECIMAL(10,2) NOT NULL, status TEXT NOT NULL, created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL, archived_at TIMESTAMP NOT NULL DEFAULT NOW() ); -- Create partition for time-based archival CREATE TABLE orders_2024_q1 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); CREATE TABLE orders_2024_q2 PARTITION OF orders FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); ``` ## Archival Job Implementation ```typescript // jobs/archive-orders.ts import { PrismaClient } from "@prisma/client"; const prisma = new PrismaClient(); const archivePrisma = new PrismaClient({ datasources: { db: { url: process.env.ARCHIVE_DATABASE_URL, }, }, }); interface ArchivalJob { table: string; retentionDays: number; batchSize: number; } async function archiveOrders() { const cutoffDate = new Date(); cutoffDate.setDate(cutoffDate.getDate() - 730); // 2 years console.log(`šŸ“¦ Archiving orders older than ${cutoffDate.toISOString()}`); let archived = 0; let hasMore = true; while (hasMore) { await prisma.$transaction(async (tx) => { // Find orders to archive const ordersToArchive = await tx.order.findMany({ where: { created_at: { lt: cutoffDate }, status: { in: ["delivered", "cancelled"] }, }, take: 1000, }); if (ordersToArchive.length === 0) { hasMore = false; return; } // Copy to archive database await archivePrisma.order.createMany({ data: ordersToArchive.map((order) => ({ ...order, archived_at: new Date(), })), skipDuplicates: true, }); // Delete from hot database await tx.order.deleteMany({ where: { id: { in: ordersToArchive.map((o) => o.id) }, }, }); archived += ordersToArchive.length; console.log(` Archived ${archived} orders...`); }); // Rate limiting await new Promise((resolve) => setTimeout(resolve, 100)); } console.log(`āœ… Archived ${archived} orders total`); } // Schedule: Run nightly archiveOrders(); ``` ## Automated Cleanup Jobs ```typescript // jobs/cleanup-old-data.ts interface CleanupJob { table: string; column: string; retentionDays: number; } const CLEANUP_JOBS: CleanupJob[] = [ { table: "sessions", column: "created_at", retentionDays: 30, }, { table: "password_reset_tokens", column: "created_at", retentionDays: 1, }, { table: "failed_login_attempts", column: "attempted_at", retentionDays: 90, }, { table: "analytics_events", column: "created_at", retentionDays: 90, }, ]; async function runCleanupJobs() { console.log("šŸ—‘ļø Running cleanup jobs...\n"); for (const job of CLEANUP_JOBS) { const cutoffDate = new Date(); cutoffDate.setDate(cutoffDate.getDate() - job.retentionDays); const result = await prisma.$executeRawUnsafe( ` DELETE FROM "${job.table}" WHERE "${job.column}" < $1 `, cutoffDate ); console.log( `āœ… ${job.table}: Deleted ${result} rows older than ${job.retentionDays} days` ); } console.log("\nāœ… Cleanup complete!"); } ``` ## Soft Delete Pattern ```typescript // Soft delete for GDPR compliance model User { id Int @id @default(autoincrement()) email String @unique name String deletedAt DateTime? // NULL = active, NOT NULL = deleted createdAt DateTime @default(now()) updatedAt DateTime @updatedAt @@index([deletedAt]) } // Middleware to filter soft-deleted records prisma.$use(async (params, next) => { if (params.action === 'findMany' || params.action === 'findFirst') { params.args.where = { ...params.args.where, deletedAt: null, // Only show non-deleted }; } return next(params); }); // Hard delete after retention period async function purgeDeletedUsers() { const cutoffDate = new Date(); cutoffDate.setDate(cutoffDate.getDate() - 90); // 90 days retention const result = await prisma.user.deleteMany({ where: { deletedAt: { lt: cutoffDate }, }, }); console.log(`šŸ—‘ļø Purged ${result.count} deleted users`); } ``` ## Cold Storage Migration ```bash #!/bin/bash # scripts/migrate-to-s3.sh # Dump old orders to S3 for cold storage CUTOFF_DATE="2022-01-01" echo "šŸ“¦ Migrating orders to S3..." # 1. Export to CSV psql $DATABASE_URL -c "\COPY ( SELECT * FROM orders WHERE created_at < '$CUTOFF_DATE' ) TO STDOUT WITH CSV HEADER" | gzip > orders_archive.csv.gz # 2. Upload to S3 aws s3 cp orders_archive.csv.gz s3://my-cold-storage/orders/ # 3. Verify upload if aws s3 ls s3://my-cold-storage/orders/orders_archive.csv.gz; then echo "āœ… Uploaded to S3" # 4. Delete from database psql $DATABASE_URL -c "DELETE FROM orders WHERE created_at < '$CUTOFF_DATE'" echo "āœ… Deleted from database" else echo "āŒ S3 upload failed, skipping deletion" exit 1 fi ``` ## Compliance Automation ```typescript // Right to be forgotten (GDPR) async function deleteUserData(userId: number) { console.log(`šŸ—‘ļø Deleting user data for user ${userId}...`); await prisma.$transaction(async (tx) => { // 1. Anonymize orders (keep for business records) await tx.order.updateMany({ where: { userId }, data: { userId: null, shippingAddress: "[DELETED]", billingAddress: "[DELETED]", }, }); // 2. Delete personal data await tx.userProfile.delete({ where: { userId } }); await tx.paymentMethod.deleteMany({ where: { userId } }); await tx.address.deleteMany({ where: { userId } }); // 3. Soft delete user account await tx.user.update({ where: { id: userId }, data: { email: `deleted-${userId}@example.com`, name: "[DELETED]", deletedAt: new Date(), }, }); }); console.log(`āœ… User data deleted`); } ``` ## Monitoring & Alerting ```typescript // Monitor archive job health async function checkArchivalHealth() { // Check oldest active order const oldestOrder = await prisma.order.findFirst({ orderBy: { created_at: "asc" }, }); const age = Date.now() - oldestOrder.created_at.getTime(); const ageDays = age / (1000 * 60 * 60 * 24); if (ageDays > 750) { // > 2 years + buffer console.error("āš ļø Orders older than retention period found!"); await sendAlert({ title: "Archive job failing", message: `Oldest order is ${ageDays.toFixed(0)} days old`, }); } // Check archive database size const archiveCount = await archivePrisma.order.count(); console.log(`šŸ“Š Archive database: ${archiveCount} orders`); // Check hot database size const hotCount = await prisma.order.count(); console.log(`šŸ“Š Hot database: ${hotCount} orders`); } ``` ## Restore from Archive ```typescript // Restore archived order (e.g., for audit) async function restoreArchivedOrder(orderId: number) { // Find in archive const archivedOrder = await archivePrisma.order.findUnique({ where: { id: orderId }, }); if (!archivedOrder) { throw new Error("Order not found in archive"); } // Copy to hot database await prisma.order.create({ data: { ...archivedOrder, archived_at: undefined, }, }); console.log(`āœ… Restored order ${orderId} from archive`); } ``` ## Schedule Configuration ```yaml # cron schedule for archival jobs jobs: archive-orders: schedule: "0 2 * * *" # 2 AM daily command: "npm run job:archive-orders" cleanup-sessions: schedule: "0 3 * * *" # 3 AM daily command: "npm run job:cleanup-sessions" purge-deleted-users: schedule: "0 4 * * 0" # 4 AM Sunday command: "npm run job:purge-deleted" health-check: schedule: "0 */6 * * *" # Every 6 hours command: "npm run job:check-archival-health" ``` ## Best Practices 1. **Define clear policies**: Document retention periods 2. **Automate everything**: Manual cleanup is unreliable 3. **Test restore**: Regularly test archive restoration 4. **Monitor job health**: Alert on failures 5. **Compliance first**: Meet legal requirements 6. **Soft delete**: Before hard delete 7. **Batch operations**: Avoid locking tables ## Output Checklist - [ ] Retention policy documented - [ ] Archive schema designed - [ ] Archival jobs implemented - [ ] Cleanup jobs automated - [ ] Soft delete pattern (if applicable) - [ ] Cold storage migration - [ ] GDPR compliance (right to be forgotten) - [ ] Job scheduling configured - [ ] Monitoring and alerting - [ ] Restore procedure tested