https://github.com/lwindolf/cloud-outages Cloud Outages 2026-04-26T22:20:19.768Z https://github.com/jpmonette/feed Recent major cloud outages. New additions to the 10+ years outage index. <![CDATA[2026-02-20 Cloudflare]]> https://github.com/lwindolf/cloud-outages/2026/2026/2026-02-20 Cloudflare.md 2026-02-20T00:00:00.000Z Root Cause
  • BYOIP Prefix Mass Deletion Outage
  • missing empty check on filter parameter
  • A buggy automated cleanup task attempted to delete BYOIP prefixes.
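
The missing empty check can be illustrated with a minimal sketch. This is not Cloudflare's code: the function name, the `status` field, and the `pending_delete` value are hypothetical, chosen only to show how a cleanup task can refuse an empty filter instead of treating it as "match everything".

```python
from typing import Iterable, Optional


def select_prefixes_for_cleanup(
    prefixes: Iterable[dict], status_filter: Optional[str]
) -> list[dict]:
    """Return only the prefixes whose status equals the filter.

    Hypothetical helper: an empty or missing filter is rejected rather
    than silently selecting every prefix.
    """
    if status_filter is None or not status_filter.strip():
        raise ValueError("refusing to run cleanup with an empty filter")
    return [p for p in prefixes if p.get("status") == status_filter]


# Made-up sample data (documentation address ranges), for illustration only.
prefixes = [
    {"cidr": "192.0.2.0/24", "status": "active"},
    {"cidr": "198.51.100.0/24", "status": "pending_delete"},
]

to_delete = select_prefixes_for_cleanup(prefixes, "pending_delete")
```

With the guard in place, a call such as `select_prefixes_for_cleanup(prefixes, "")` raises `ValueError` instead of returning all prefixes.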

Impact

  • customers lost CDN, Spectrum, Dedicated Egress, and Magic Transit

Duration

5h

Sources

]]>
<![CDATA[2026-02-07 Microsoft Azure]]> https://github.com/lwindolf/cloud-outages/2026/2026/2026-02-07 Microsoft Azure.md 2026-02-07T00:00:00.000Z Root Cause
  • "The event began following a power interruption affecting one of the datacenters within the region, after which impact manifested as infrastructure availability loss and service disruptions across multiple dependent workloads in the region."

Impact

  • West US region
  • the region has no AZs, so nearly all services were down for anyone without multi-region failover

Duration

20h

Sources

]]>
<![CDATA[2025-12-05 Cloudflare]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-12-05 Cloudflare.md 2025-12-05T00:00:00.000Z Impact
  • 28% of HTTP traffic

Root Cause

  • botched fix for React vulnerability

Duration

25min

Sources

]]>
<![CDATA[2025-11-18 Cloudflare]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-11-18 Cloudflare.md 2025-11-18T00:00:00.000Z Root Cause
  • faulty configuration replicated globally

Impact

  • global
  • services affected: Cloudflare Sites and Services (Access, Bot Management, CDN/Cache, Dashboard, Firewall, Network, WARP, Workers)
  • ChatGPT, X, many websites

Duration

9.5h

Sources

]]>
<![CDATA[2025-10-20 AWS]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-10-20 AWS.md 2025-10-20T00:00:00.000Z Root Cause
  • DNS problems with DynamoDB in us-east-1
  • HA is implemented via DNS entries; a faulty DNS update made DynamoDB unavailable

Impact

  • many gaming platforms down

Duration

  • 3h (according to post mortem)
  • 15h (according to techtarget)

Sources

]]>
<![CDATA[2025-07-09 Outlook]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-07-09 Outlook.md 2025-07-09T00:00:00.000Z Impact
  • outlook.com
  • unable to access virtual mailboxes
  • sign in issues

Root Cause

A recent service update to an authentication component unintentionally prevented access for a subset of users, resulting in intermittent service unavailability.

Duration

19-21h

Sources

]]>
<![CDATA[2025-06-15 Heroku]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-06-15 Heroku.md 2025-06-15T00:00:00.000Z Root Cause
  • automated OS Update on production took networking routes down

Duration

23h

Sources

]]>
<![CDATA[2025-06-12 Google Cloud]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-06-12 Google Cloud.md 2025-06-12T00:00:00.000Z Impact
  • Many Google Cloud locations
  • Secondary effects caused by Cloudflare being affected
  • Google Cloud, Google Workspace and Google Security Operations products experienced increased 503 errors in external API requests, impacting customers.

Root Cause

  • control plane policy bug in quota management
  • binary crash loop in each region deployment

Duration

  • overall 7h
  • most regions fixed after 2h

Sources

]]>
<![CDATA[2025-05-24 X]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-05-24 X.md 2025-05-24T00:00:00.000Z Root Cause
  • fire in data center PDX11

Duration

2.5h

Sources

]]>
<![CDATA[2025-01-08 Microsoft Azure]]> https://github.com/lwindolf/cloud-outages/2025/2025/2025-01-08 Microsoft Azure.md 2025-01-08T00:00:00.000Z Root Cause
  • A networking configuration change in East US 2 created issues across multiple Azure services.

Duration

50h

Sources

]]>
<![CDATA[2024-07-19 Crowdstrike]]> https://github.com/lwindolf/cloud-outages/2024/2024/2024-07-19 Crowdstrike.md 2024-07-19T00:00:00.000Z Root Cause

The CrowdStrike agent downloaded a new update that caused a reboot loop.

Impact

  • All infrastructure running Windows with Crowdstrike agent installed
  • Worst global impact seen so far

Duration

  • hours to days due to the need to recover affected end user systems manually

Sources

]]>
<![CDATA[2024-07-18 Microsoft Azure]]> https://github.com/lwindolf/cloud-outages/2024/2024/2024-07-18 Microsoft Azure.md 2024-07-18T00:00:00.000Z Root Cause

Between 21:40 UTC on 18 July 2024 and 12:15 UTC on 19 July 2024, customers may have experienced issues with multiple Azure services in the Central US region due to an Azure Storage availability event. This issue affected Virtual Machine (VM) availability, which caused downstream impact on multiple Azure services, including failures of service management operations and connectivity or availability of services. Services with dependencies on the impacted Virtual Machines would have been affected.

Impact

  • Azure VMs in the Central US region

Duration

  • ~14h

Media

]]>
<![CDATA[2024-05-02 Google Cloud]]> https://github.com/lwindolf/cloud-outages/2024/2024/2024-05-02 Google Cloud.md 2024-05-02T00:00:00.000Z Root Cause

?

Impact

  • Infrastructure of client UniSuper was deleted, including backups

Duration

%

Sources

]]>
<![CDATA[2024-04-08 Rackspace]]> https://github.com/lwindolf/cloud-outages/2024/2024/2024-04-08 Rackspace.md 2024-04-08T00:00:00.000Z Root Cause

?

Impact

  • "impacted multiple downstream providers, as well as Rackspace customers within multiple regions including the U.S., Japan, Vietnam, Spain, Canada, Germany, Singapore, France, the Netherlands, the U.K., Brazil, and South Africa"

Duration

14min

Sources

]]>
<![CDATA[2024-01-26 Microsoft Teams]]> https://github.com/lwindolf/cloud-outages/2024/2024/2024-01-26 Microsoft Teams.md 2024-01-26T00:00:00.000Z Root Cause
  • "a networking issue impacting a portion of the Teams service"

Duration

2.5h

Sources

]]>
<![CDATA[2023-11-02 Cloudflare]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-11-02 Cloudflare.md 2023-11-02T00:00:00.000Z Root Cause

data center outage + high availability did not work

Duration

2d

Impact

Cloudflare control panel and analytics outage

Sources

]]>
<![CDATA[2023-07-05 Azure]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-07-05 Azure.md 2023-07-05T00:00:00.000Z Root Cause

fiber cut caused by severe weather conditions in the Netherlands

Duration

8h

Impact

Region West Europe partially down

Sources

]]>
<![CDATA[2023-06-13 AWS-us-east1]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-06-13 AWS-us-east1.md 2023-06-13T00:00:00.000Z Impact

Service degradation of 104 AWS services (that were using AWS Lambda)

Duration

3h

Root Cause

Lambda scaling crossing a new threshold hit a functional bug

Sources

]]>
<![CDATA[2023-04-25 GCP-europe-west-9]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-04-25 GCP-europe-west-9.md 2023-04-25T00:00:00.000Z Root Cause

fire after cooling system water pipe leak

Duration

1d

Impact

  • Cloud region europe-west9 was offline (one day)
  • Zone europe-west9-a was offline (two weeks)

Sources

]]>
<![CDATA[2023-04-07 SpaceX]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-04-07 SpaceX.md 2023-04-07T00:00:00.000Z Impact

No connection

Duration

2h

Root Cause

Expired certificate
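
Expiry of this kind can be caught ahead of time by monitoring the certificate's `notAfter` date. The following is a minimal sketch, not the provider's tooling: the function names and the 30-day threshold are assumptions. It expects the date string in the format returned by Python's `ssl.SSLSocket.getpeercert()["notAfter"]`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left until the certificate's notAfter date (negative if expired)."""
    # cert_time_to_seconds parses e.g. "Jan  5 09:34:43 2018 GMT" into epoch seconds.
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: float = 30) -> bool:
    """True if the certificate expires within threshold_days (assumed default)."""
    return days_until_expiry(not_after, now) <= threshold_days


# Example with a fixed "now" so the result does not depend on the wall clock.
now = datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc)
remaining = days_until_expiry("Jan  5 09:34:43 2018 GMT", now)
```

In a real check, `now` would be `datetime.now(timezone.utc)` and the date string would come from the certificate served by the endpoint being monitored.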

Sources

]]>
<![CDATA[2023-03-09 Datadog]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-03-09 Datadog.md 2023-03-09T00:00:00.000Z Root Cause

automatic OS update takes network down

Duration

2d

Impact

Service outage

Sources

]]>
<![CDATA[2023-02-16 GCP]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-02-16 GCP.md 2023-02-16T00:00:00.000Z Root Cause

network update caused traffic disruption

Duration

6h

Impact

Gmail, YouTube, Google Drive partial outage

Sources

]]>
<![CDATA[2023-02-13 Oracle OCI]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-02-13 Oracle OCI.md 2023-02-13T00:00:00.000Z Root Cause

performance problems in DNS-based load management

Duration

3d

Impact

OCI Vault, API Gateway, Oracle Digital Assistant and OCI Search with OpenSearch

Sources

]]>
<![CDATA[2023-01-25 Microsoft Teams]]> https://github.com/lwindolf/cloud-outages/2023/2023/2023-01-25 Microsoft Teams.md 2023-01-25T00:00:00.000Z Root Cause

network configuration error

Duration

1h

Impact

World-wide MS Teams outage

Sources

]]>
<![CDATA[2022-12-05 AWS US East2]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-12-05 AWS US East2.md 2022-12-05T00:00:00.000Z Root Cause

unclear

Duration

75min

Impact

US East2 connectivity issues

Sources

]]>
<![CDATA[2022-10-25 Whatsapp]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-10-25 Whatsapp.md 2022-10-25T00:00:00.000Z Root Cause

backend application service failure

Duration

2h

Impact

Users unable to send/receive messages

Sources

]]>
<![CDATA[2022-09-15 Zoom]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-09-15 Zoom.md 2022-09-15T00:00:00.000Z Root Cause

unclear

Duration

1h

Impact

worldwide, no meetings possible

Sources

]]>
<![CDATA[2022-08-09 Google Search+Maps]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-08-09 Google Search+Maps.md 2022-08-09T00:00:00.000Z Root Cause

software update

Duration

1h

Impact

Google Search, Google Maps globally unavailable

Sources

]]>
<![CDATA[2022-07-08 AWS US East2 AZ1]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-07-08 AWS US East2 AZ1.md 2022-07-08T00:00:00.000Z Root Cause

power failure

Duration

20min

Impact

AZ1 of US East2 without connectivity

Sources

]]>
<![CDATA[2022-06-21 Cloudflare]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-06-21 Cloudflare.md 2022-06-21T00:00:00.000Z Root Cause

A change to the network configuration in those locations caused an outage [1]

Duration

1h 15min

Impact

many affected websites

Sources

]]>
<![CDATA[2022-04-05 Atlassian]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-04-05 Atlassian.md 2022-04-05T00:00:00.000Z Root Cause

human error in global-scale orchestration: instead of a component being shut down, product instances were terminated

Impact

400 companies and anywhere from 50,000 to 400,000 users had no access to JIRA, Confluence, OpsGenie, JIRA Status page, and other Atlassian Cloud services

Duration

">14days for some customers"

Sources

]]>
<![CDATA[2022-03-01 Apple]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-03-01 Apple.md 2022-03-01T00:00:00.000Z Root Cause

DNS problems

Impact

App Store, Maps, TV

Duration

4h

Sources

]]>
<![CDATA[2022-02-22 Slack]]> https://github.com/lwindolf/cloud-outages/2022/2022/2022-02-22 Slack.md 2022-02-22T00:00:00.000Z Root Cause

Quote from the Slack status page: "A configuration change inadvertently led to a sudden increase in activity on our database infrastructure. Due to this increased activity, the affected databases failed to serve incoming requests to connect to Slack."

Impact

  • Slack not loading

Duration

5h

Status Page

]]>
<![CDATA[2021-12-08 AWS]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-12-08 AWS.md 2021-12-08T00:00:00.000Z Impact
  • various services in us-east-1

Duration

4h

Sources

]]>
<![CDATA[2021-10-04 Facebook]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-10-04 Facebook.md 2021-10-04T00:00:00.000Z Impact
  • Facebook, Instagram, WhatsApp down

Duration

6h

Sources

]]>
<![CDATA[2021-06-08 Fastly]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-06-08 Fastly.md 2021-06-08T00:00:00.000Z Impact
  • global incident
  • high origin loads
  • "Customers could continue to experience a period of increased origin load and lower Cache Hit Ratio (CHR)."

Duration

2h

Root Cause

  • unknown

Status Page

]]>
<![CDATA[2021-05-11 Salesforce]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-05-11 Salesforce.md 2021-05-11T00:00:00.000Z Impact
  • All services not available due to DNS outage

Duration

4h

Root Cause

  • failed global DNS change

Status Page

]]>
<![CDATA[2021-03-23 quay.io]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-03-23 quay.io.md 2021-03-23T00:00:00.000Z Impact
  • No image pulls possible

Duration

4h

Root Cause

  • somehow AWS related

Status Page

]]>
<![CDATA[2021-03-10 OVH SBG Datacenters]]> https://github.com/lwindolf/cloud-outages/2021/2021/2021-03-10 OVH SBG Datacenters.md 2021-03-10T00:00:00.000Z Impact
  • 4 datacenters down
  • 2 destroyed
  • recovery >10days

Sources

Provider Status Page

]]>
<![CDATA[2020-11-26 AWS]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-11-26 AWS.md 2020-11-26T00:00:00.000Z Root Cause

??

Duration

??

Impact

  • only us-east-1
  • Roku, Adobe, Glassdoor, Autodesk, The Wall Street Journal, 1Password
  • Kinesis Data Streams API and other dependent services

Sources

]]>
<![CDATA[2020-08-24 Zoom]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-08-24 Zoom.md 2020-08-24T00:00:00.000Z Root Cause
  • not disclosed

Duration

3h

Sources

Status Page

]]>
<![CDATA[2020-06-29 Github]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-06-29 Github.md 2020-06-29T00:00:00.000Z Duration

2h

Impact

  • FIXME

Sources

]]>
<![CDATA[2020-06-10 IBM Cloud]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-06-10 IBM Cloud.md 2020-06-10T00:00:00.000Z Duration

several hours

Impact

  • cloud down globally

Sources

]]>
<![CDATA[2020-05-17 Zoom]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-05-17 Zoom.md 2020-05-17T00:00:00.000Z Root Cause
  • undisclosed

Duration

7h

Impact

  • customers unable to join meetings

Sources

]]>
<![CDATA[2020-05-12 Slack]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-05-12 Slack.md 2020-05-12T00:00:00.000Z Root Cause
  • scaling up automation failure
  • new servers were not added to LBs, causing continuous performance degradation
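
A failure like this can be detected by comparing the servers the autoscaler provisioned with the backends actually registered in the load balancer. The sketch below is only an illustration under assumed inputs: the function name and the server identifiers are hypothetical, and in practice both sets would come from the respective provider or LB APIs.

```python
def missing_from_load_balancer(provisioned: set[str], registered: set[str]) -> set[str]:
    """Servers that were scaled up but never registered as LB backends."""
    return provisioned - registered


# Made-up identifiers for illustration only.
provisioned = {"web-1", "web-2", "web-3", "web-4"}
registered = {"web-1", "web-2"}

unregistered = missing_from_load_balancer(provisioned, registered)
if unregistered:
    # A real check would raise an alert here instead of printing.
    print("not behind the load balancer:", sorted(unregistered))
```

Running such a reconciliation after every scale-up event is one way to notice that new capacity is not receiving traffic.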

Duration

3h (everyone), 1d (Electron app users)

Impact

  • no messages could be sent

Sources

Status Page

]]>
<![CDATA[2020-03-03 Azure]]> https://github.com/lwindolf/cloud-outages/2020/2020/2020-03-03 Azure.md 2020-03-03T00:00:00.000Z Root Cause
  • datacenter ventilation malfunction caused hardware to overheat

Duration

6h

Impact

  • East US

Sources

]]>
<![CDATA[2019-07-18 Slack]]> https://github.com/lwindolf/cloud-outages/2019/2019/2019-07-18 Slack.md 2019-07-18T00:00:00.000Z Root Cause
  • unavailability of some servers, performance degradation

Duration

~7h

Impact

  • connectivity issues
  • 10-25% error rate

Sources

]]>
<![CDATA[2019-06-24 Verizon]]> https://github.com/lwindolf/cloud-outages/2019/2019/2019-06-24 Verizon.md 2019-06-24T00:00:00.000Z Root Cause
  • BGP route leak
  • Route propagation

Duration

3h

Impact

  • Google, AWS, Reddit, Netflix, Cloudflare customers

Sources

]]>
<![CDATA[2019-06-02 GCP Outage]]> https://github.com/lwindolf/cloud-outages/2019/2019/2019-06-02 GCP Outage.md 2019-06-02T00:00:00.000Z Root Cause
  • Network control plane
  • automation tool

Duration

  • 4h

Impact

  • G-Suite, Gmail, Google Docs, Google Drive, Google Cloud, YouTube
  • Vimeo, Shopify, Discord, Snapchat

Sources

Status Page

]]>
<![CDATA[2019-05-18 Salesforce]]> https://github.com/lwindolf/cloud-outages/2019/2019/2019-05-18 Salesforce.md 2019-05-18T00:00:00.000Z Root Cause
  • an internal DB update script granted users overly broad privileges

Duration

  • ~15h

Impact

  • all customers were shut off to prevent unauthorized data access

Sources

]]>