Responding to the Recent AWS Outage: How Open LMS Ensured Service Resilience
On October 19–20, 2025, AWS experienced a major service disruption in one of its largest U.S. data regions. The event affected several of Amazon’s core cloud services, including the database, computing, and load-balancing systems that many organizations depend on to manage their workflows and daily operations. Because Open LMS runs on AWS infrastructure, the incident temporarily affected portions of our hosted environment. We want to share what happened, how it affected our clients, and how our teams worked together to restore stability as quickly as possible.
What Happened
The disruption began late in the evening on October 19 when AWS encountered a failure in DynamoDB's internal DNS management system, essentially a problem in how DynamoDB updated and managed its own internal service records. A rare software race condition caused those records to be deleted by mistake, which prevented other AWS systems from connecting to DynamoDB.
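For readers who want a picture of how a race condition can end with records deleted, here is a minimal Python sketch. It is a generic illustration only and not AWS's implementation: the `RecordStore` class, the plan version numbers, and the IP addresses are all hypothetical. A delayed worker overwrites a newer plan with an older one, and a routine cleanup then removes what it takes to be an outdated plan.

```python
class RecordStore:
    """Toy record store. All names and behavior here are hypothetical."""

    def __init__(self):
        self.active_version = 0
        self.records = []

    def apply(self, version, ips, check_version):
        # With check_version=True, a plan that is not newer than the
        # active one is rejected instead of overwriting it.
        if check_version and version <= self.active_version:
            return False
        self.active_version = version
        self.records = list(ips)
        return True

    def cleanup_older_than(self, version):
        # Removes records if they belong to a plan older than `version`.
        if self.active_version < version:
            self.records = []


def run(check_version):
    store = RecordStore()
    store.apply(2, ["10.0.0.2"], check_version)  # fast worker applies the newer plan
    store.apply(1, ["10.0.0.1"], check_version)  # delayed worker applies an older plan
    store.cleanup_older_than(2)                  # fast worker cleans up old plans
    return store.records


print(run(check_version=False))  # records end up empty
print(run(check_version=True))   # the newer plan's records survive
```

In this toy model, the version check is what keeps the delayed worker from replacing current records with stale ones that the cleanup step then removes.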
That single issue set off a chain reaction across other AWS services:
- DynamoDB: The DNS failure made DynamoDB unreachable for a few hours, impacting both customers and internal AWS services.
- EC2: Since EC2 relies on DynamoDB for certain internal processes, new instance launches failed or were heavily throttled. While existing servers remained healthy, attempts to add new web nodes were delayed, causing intermittent slow performance and connection errors.
- NLB (Network Load Balancer): The delay in EC2 launches disrupted NLB health checks, causing some backend targets to cycle in and out of service. This led to increased connection errors and slower response times for some clients.
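The chain reaction above can be pictured as a small dependency graph. The Python sketch below is illustrative only: the `DEPENDS_ON` map encodes just the relationships described in this post, and the labels and the `impacted_by` helper are assumptions made for the example, not a model of AWS internals.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it relies on.
DEPENDS_ON = {
    "DynamoDB": ["DynamoDB DNS"],
    "EC2 instance launches": ["DynamoDB"],
    "NLB health checks": ["EC2 instance launches"],
}


def impacted_by(failed, depends_on):
    """Return the services transitively affected by `failed`, nearest first."""
    # Invert the map so we can walk from a failure to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = {failed}
    order = []
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(impacted_by("DynamoDB DNS", DEPENDS_ON))
# ['DynamoDB', 'EC2 instance launches', 'NLB health checks']
```

Walking the graph from the failed component reproduces the order of impact described in the list: DynamoDB first, then EC2 launches, then NLB health checks.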

Impact on Open LMS
During the incident, many Open LMS clients experienced gateway timeouts, slower site performance, or temporary loss of access. Once AWS resolved the DynamoDB DNS issue, site availability began to improve, but performance continued to fluctuate while AWS worked through EC2 and NLB recovery.
Open LMS’s Infrastructure and DevOps teams quickly identified the issue as an upstream AWS problem and immediately activated internal response protocols. We closely monitored site health, adjusted scaling as needed, and worked alongside our Customer Success and Support teams to keep clients informed and supported throughout the event.
Even amid a complex global outage, our focus remained clear: restore service for our clients as quickly as possible.

Resilience in Action
The redundant and fault-tolerant design of the Open LMS architecture enabled our services to recover more than an hour before AWS reported full regional recovery. By 2:00 PM ET, nearly all Open LMS-hosted sites were stable and fully operational again, while other LMS platforms remained down. Thanks to Open LMS’s global infrastructure, only sites in North America were affected by the outage. Clients in other regions, such as APAC and EMEA, continued to operate normally.
A Resilient Foundation
The Open LMS platform is built on a cloud-native architecture designed for flexibility and fault tolerance. Each service component, including application servers, load balancers, and data systems, is monitored continuously and distributed across multiple availability zones. Automated health checks, scaling rules, and traffic management systems allow our platform to adapt quickly when upstream issues occur. This layered approach to resilience enables Open LMS to maintain stability and restore performance rapidly, even during large-scale cloud events.
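To give a sense of what automated health checks and scaling rules can look like, here is a minimal Python sketch. It is not the Open LMS implementation. The `TargetHealth` class, the `desired_nodes` function, and every threshold and capacity figure are hypothetical, chosen only to show the idea: a node changes state only after several consecutive check results, which dampens the in-and-out cycling that happens when checks are noisy, and the node count follows load within fixed limits.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Tracks one node's health using consecutive-result thresholds (hypothetical values)."""
    unhealthy_threshold: int = 3  # consecutive failures before marking unhealthy
    healthy_threshold: int = 2    # consecutive passes before marking healthy again
    healthy: bool = True
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            # Result agrees with the current state, so reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


def desired_nodes(requests_per_sec, capacity_per_node, min_nodes=2, max_nodes=20):
    """Simple scaling rule: enough nodes for the load, clamped to a fixed range."""
    needed = math.ceil(requests_per_sec / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


target = TargetHealth()
checks = [False, False, True, False, False, False, True, True]
print([target.record(c) for c in checks])
# [True, True, True, True, True, False, False, True]
print(desired_nodes(900, 200))  # 5
```

In the sample run, two isolated failures do not remove the node from service. Only three failures in a row do, and two passes in a row bring it back.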
While upstream provider outages are beyond our control, our preparation and response are not. This cloud event reinforced that Open LMS’s systems, and more importantly, our people, are ready to handle unexpected disruptions.
Our Commitment to You
At Open LMS, we understand that our platform plays a critical role in supporting educators, learners, and organizations around the world. Every minute of uptime matters. During this outage, our teams across Infrastructure, DevOps, Support, and Customer Success worked together seamlessly to communicate transparently and resolve issues swiftly. This experience reinforced a simple truth: resilience is not only about technology; it's also about the people behind it.
Open LMS remains deeply committed to providing a reliable, world-class learning platform that supports learning everywhere, every day, when you need it most. Take a tour of our LMS today, or request a demo to learn how we can support you.