LmCast :: Stay tuned in

Cloud Outages Highlight the Need for Resilient, Secure Infrastructure Recovery

Recorded: Oct. 31, 2025, 1 p.m.

Original Summarized

Two massive technical outages over the past year underscore the need for cybersecurity teams to consider how to recover safely from disruptions without creating new security risks.

Arielle Waldman, Features Writer, Dark Reading
October 30, 2025

An Amazon Web Services (AWS) outage on Oct. 19 caused significant disruptions to numerous websites and online services. Error messages splashed across users' screens as they attempted to access popular sites like Amazon itself, as well as Snapchat and Disney+. The outage lasted two days, but spillover effects sprawled across industries.

On Wednesday, the Microsoft Azure cloud platform and the Microsoft 365 service experienced a multi-hour outage due to what Microsoft described as "an inadvertent configuration change." The Azure outage crippled critical business applications, bringing many organizations to a standstill.

Like last year's CrowdStrike outage, these outages exemplified the blast radius that occurs when one or two vendors dominate a market area and own the infrastructure that everyone else relies on. While both incidents were the result of technical glitches, these extensive disruptions have serious cybersecurity implications for enterprises.

Chaos Creates Confusion

Organizations from large enterprises down to small businesses use AWS to host websites, applications, and databases. The cloud provider offers security tools to help companies bolster identity and access management and data protection.
Even if a company isn't affected directly, providers it uses for identity, incident response (IR), or threat detection could be down. The AWS outage affected services that many security and identity management services depend on, including EC2, DynamoDB, and Network Load Balancer.

When a widespread cloud outage like the AWS incident occurs, it doesn't necessarily indicate an active security breach, but it can create vulnerabilities that lead to problems for enterprises, explains Ketaki Borade, senior analyst of infrastructure security at Omdia.

"During the downtime and chaos of restoring services, IT teams can inadvertently leave gaps in monitoring or patching, similar to leaving a window unlocked while rushing out for a trip, that can become potential entry points for threat actors," Borade tells Dark Reading.

Everyone's Vulnerable

Outages create security blind spots and put pressure on enterprises to restore services quickly, which can lead teams to bypass security controls. Heightened vulnerability surfaces are a real concern for enterprises following outages, agrees Jean-Christophe Gaillard, founder and CEO of Corix Partners. Enterprises may be forced to restart systems in degraded or fallback modes that lack standard security controls. Returning to standard secure configurations can take time, giving attackers an opportunity to strike.

"Emergency patches and configuration changes are often made without proper review or security, potentially leading to misconfigurations or insecure settings that attackers can exploit," Gaillard tells Dark Reading.

Change management breakdowns could also lead to cybersecurity problems. Enterprises may forget to return to standard protocols altogether, which would leave systems and data exposed, adds Gaillard.

But IT teams aren't the only ones who need to remain vigilant during widespread outages.
Attackers may view it as the perfect opportunity to conduct phishing campaigns, sending messages that prompt users to "verify credentials" or "restore access," warns Borade.

How Resilient Are Enterprises?

Incidents like these highlight the risks of relying solely on a single cloud provider, so enterprises need resilient security strategies and contingency plans to stay protected, she recommends.

Resiliency levels play a large role when it comes to the extent of cybersecurity issues enterprises face following prolonged outages, like the one AWS recently suffered. In many cases, such as a ransomware attack, enterprises need strong backup systems to recover.

In cases like this, it's important to have more comprehensive fallback arrangements, because recovery requires more than effective data backup systems, says Rik Turner, chief analyst at Omdia. Any local or on-premises backup systems should be fully up to date with all the necessary patches if they are to take over securely, he explains.

"If an organization has the ability, which I suspect is pretty rare, to fully switch to an alternative cloud provider while an AWS or any other cloud service provider is down, that will need to be both a seamless and a secure switchover process," Turner says. "Frankly, I can see it being fraught with issues."

Can AI Play a Role in Recovery?

Artificial intelligence can support mitigation efforts during widespread outages. Wild Moose, an AI-powered site reliability engineering platform, emerged from stealth this week, focusing on addressing cloud outages. During widespread outages, it can be difficult to distinguish cyberattacks from technical failures, which delays appropriate responses, says Yasmin Dunsky, Wild Moose CEO and co-founder. AI is used as part of incident response to uncover the root cause, a vital but difficult assessment for security teams to make.
Wild Moose can help address security concerns that stem from technical glitches with its rapid root cause analysis. It can extend its analysis to the affected company's dependents and customers, Dunsky says.

"This helps both the affected company and its downstream customers understand the scope of the problem and coordinate their response, rather than each organization independently scrambling to diagnose the same root cause," she says.

Borade sees the benefits of AI in mitigating the impact of highly disruptive outages that lead to cascading effects. AI systems can detect anomalies faster than humans, trigger automated responses, and even suggest remediation steps, she adds.

"That said, it's worth noting the irony: We're often using automation to fix issues caused by automation," she says. "It's a bit of a 'Who watches the Watchmen?' scenario, which is why human oversight remains critical."

About the Author: Arielle Waldman is a Boston-based features writer for Dark Reading covering all things cybersecurity.

Copyright © 2025 TechTarget, Inc. d/b/a Informa TechTarget.

Recent widespread cloud outages at Amazon Web Services (AWS) and Microsoft Azure underscore the need for enterprises to develop resilient and secure infrastructure recovery strategies. These incidents, detailed by Dark Reading, highlight the risks of over-reliance on a single cloud provider and the potential for cascading disruptions across interconnected systems. Both outages resulted from technical glitches rather than attacks, yet, as Ketaki Borade of Omdia explains, the chaos of restoring services can leave gaps in monitoring or patching that become entry points for threat actors.

The AWS outage disrupted core services, including EC2, DynamoDB, and Network Load Balancer, on which many security and identity management services depend, so even companies that were not directly affected could lose their identity, incident response, or threat detection providers. Rik Turner of Omdia notes that recovery requires more than data backups: any local or on-premises fallback systems must be fully patched, and the ability to switch entirely to an alternative cloud provider is, he suspects, rare and likely to be fraught with issues. The recovery process itself, cautions Jean-Christophe Gaillard of Corix Partners, presents opportunities for attackers. IT teams under pressure to restore services quickly may introduce security gaps through unreviewed emergency patches, hasty configuration changes, or bypassed controls, and may forget to return to standard protocols afterward. Attackers may also exploit the confusion with phishing campaigns that prompt users to "verify credentials" or "restore access."

Outages also create security blind spots. Enterprises may be forced to restart systems in degraded or fallback modes that lack standard security controls, and returning to standard secure configurations takes time, giving attackers a window to strike. The focus on restoring functionality can overshadow the step of restoring security.

Artificial intelligence offers a potential countermeasure. Wild Moose, a newly emerged AI-powered site reliability engineering platform, aims to identify the root cause of outages rapidly, which matters because technical failures can be hard to distinguish from cyberattacks during a widespread outage. As Wild Moose CEO Yasmin Dunsky explains, the analysis can extend beyond the affected company to its dependents and customers, allowing a more coordinated response. Borade adds that AI systems can detect anomalies faster than humans, trigger automated responses, and suggest remediation steps, but she notes the irony of using automation to fix issues caused by automation, which is why human oversight remains critical.

The lessons of these recent outages are clear: resilience demands greater diversification of cloud infrastructure, coupled with recovery plans that prioritize security. Navigating future disruptions will require both advanced technology, such as AI-powered monitoring and incident response, and a shift from a purely operational focus to a security-centric approach to cloud recovery.