--- name: conducting-post-incident-lessons-learned description: Facilitate structured post-incident reviews to identify root causes, document what worked and failed, and produce actionable recommendations to improve future incident response. domain: cybersecurity subdomain: incident-response tags: - incident-response - lessons-learned - post-incident - after-action-review - process-improvement mitre_attack: - T1190 - T1566 - T1078 version: '1.0' author: mahipal license: Apache-2.0 nist_csf: - RS.MA-01 - RS.MA-02 - RS.AN-03 - RC.RP-01 --- # Conducting Post-Incident Lessons Learned ## When to Use - After any security incident has been fully resolved and recovery completed - Following tabletop exercises or IR simulations - After significant near-miss events - Quarterly review of accumulated incident trends - When IR playbooks need updating based on real-world experience ## Prerequisites - Incident fully resolved (containment, eradication, recovery complete) - Incident timeline and documentation gathered - All incident responders available for review session - Meeting space for collaborative discussion - Incident ticketing system data for metrics analysis ## Workflow ### Step 1: Gather Incident Data ```bash # Export incident timeline from ticketing system curl -s "https://thehive.local/api/v1/case/$CASE_ID/timeline" \ -H "Authorization: Bearer $THEHIVE_API_KEY" | jq '.' > incident_timeline.json # Extract detection and response metrics from SIEM index=notable incident_id="IR-2024-042" | stats min(_time) as first_alert, max(_time) as last_alert, count as total_alerts, dc(src) as unique_sources # Compile all responder actions and timestamps grep -E "timestamp|action|analyst" /var/log/ir/IR-2024-042/*.json | \ python3 -m json.tool > compiled_actions.json ``` ### Step 2: Conduct Blameless Post-Mortem Meeting ``` Structured Agenda (90 minutes): 1. Incident summary (5 min) - Factual overview 2. Timeline walkthrough (20 min) - Chronological events 3. What worked well (15 min) - Positive outcomes 4. What needs improvement (15 min) - Gaps and failures 5. Root cause analysis (15 min) - 5 Whys or fishbone 6. Action items (10 min) - Specific improvements with owners 7. Playbook updates (10 min) - Changes to IR procedures Blameless Principles: - Focus on systems and processes, not individuals - Assume best intentions with available information - Seek to understand, not to blame ``` ### Step 3: Perform Root Cause Analysis ```bash # 5 Whys analysis example: # Why 1: Why did ransomware encrypt production servers? # Answer: Attacker had domain admin credentials # Why 2: Why did attacker have domain admin credentials? # Answer: Kerberoasted a service account and cracked it # Why 3: Why was the service account password crackable? # Answer: Used a 12-character dictionary-based password # Why 4: Why was the service account password weak? # Answer: No enforcement of service account password policy # Why 5: Why was there no service account password policy? # Answer: PAM was not implemented for service accounts # ROOT CAUSE: Lack of privileged access management ``` ### Step 4: Calculate Response Metrics ```python from datetime import datetime events = { 'compromise': '2024-01-10 14:00:00', 'detection': '2024-01-15 08:30:00', 'triage': '2024-01-15 08:45:00', 'containment': '2024-01-15 09:30:00', 'eradication': '2024-01-16 14:00:00', 'recovery': '2024-01-18 16:00:00', 'closure': '2024-01-25 10:00:00', } fmt = '%Y-%m-%d %H:%M:%S' times = {k: datetime.strptime(v, fmt) for k, v in events.items()} print(f"Dwell Time: {times['detection'] - times['compromise']}") print(f"MTTD: {times['triage'] - times['detection']}") print(f"MTTC: {times['containment'] - times['detection']}") print(f"MTTR: {times['recovery'] - times['eradication']}") print(f"Total Duration: {times['closure'] - times['detection']}") ``` ### Step 5: Document Findings and Create Action Items ```bash # Create tracked action items in project management curl -X POST "https://jira.local/rest/api/2/issue" \ -H "Authorization: Bearer $JIRA_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "fields": { "project": {"key": "SEC"}, "summary": "Implement PAM for service accounts (IR-2024-042)", "issuetype": {"name": "Task"}, "priority": {"name": "High"}, "assignee": {"name": "security_engineer"}, "duedate": "2024-03-15" } }' ``` ### Step 6: Update Playbooks and Detection Rules ```yaml # New Sigma detection rule based on incident learnings title: Kerberoasting Activity Detected status: stable description: Detects Kerberoasting based on IR-2024-042 lessons logsource: product: windows service: security detection: selection: EventID: 4769 TicketEncryptionType: '0x17' condition: selection level: high tags: - attack.credential_access - attack.t1558.003 ``` ## Key Concepts | Concept | Description | |---------|-------------| | Blameless Post-Mortem | Reviewing incidents focusing on systems, not blaming individuals | | Root Cause Analysis | Identifying the fundamental reason the incident occurred | | 5 Whys | Iterative questioning technique to find root cause | | MTTD | Mean Time to Detect - time from compromise to detection | | MTTC | Mean Time to Contain - time from detection to containment | | MTTR | Mean Time to Recover - time from eradication to full recovery | | Continuous Improvement | Iterating on IR processes based on real incident data | ## Tools & Systems | Tool | Purpose | |------|---------| | TheHive/ServiceNow | Incident timeline and documentation | | Jira/Azure DevOps | Action item tracking | | Confluence/SharePoint | Lessons learned documentation | | Splunk/Elastic | Incident metrics and detection improvement | | Sigma | Detection rule development | ## Common Scenarios 1. **Ransomware Post-Mortem**: Review entire kill chain from initial access to encryption. Identify detection gaps and backup failures. 2. **Phishing Campaign Review**: Analyze why users clicked, why email filters missed it, and how to improve training. 3. **Cloud Misconfiguration Incident**: Review IaC pipeline, CSPM coverage, and change management process. 4. **Insider Threat Review**: Examine DLP effectiveness, access control gaps, and user monitoring capabilities. 5. **Third-Party Breach Impact**: Review vendor risk assessment process and data sharing agreements. ## Output Format - Post-incident review meeting minutes - Root cause analysis document - Incident metrics report (MTTD, MTTC, MTTR) - Action items list with owners and deadlines - Updated IR playbooks and detection rules - Executive summary for leadership