Service Disruption on September 9–10, 2025
Summary
On September 9, 2025, Northpass experienced a service disruption affecting all AWS-hosted customers. Users reported difficulties logging in, accessing courses, and completing activities. Azure-hosted customers were not affected.
Impact
- Duration: ~2 hours of major disruption, followed by slower performance for several more hours
- Affected customers: AWS-hosted environments only
- Symptoms: login errors, slow page loads, delayed certificates, and issues with integrations (e.g., Zoom sessions)
Root Cause
The disruption was caused by our primary database exhausting its available input/output operations per second (IOPS) in AWS. Once that limit was reached, database reads and writes began to queue, which slowed critical operations and caused delays across the platform.
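To illustrate why hitting an IOPS ceiling causes delays that outlast the traffic peak, here is a minimal sketch of a one-second-tick queue model. The function name, the demand figures, and the limit of 1,000 IOPS are illustrative assumptions, not measurements from this incident.

```python
def simulate_backlog(demand_per_sec, iops_limit):
    """Return the queued I/O backlog after each one-second tick."""
    backlog = 0
    history = []
    for demand in demand_per_sec:
        # Work waiting this second: leftover requests plus new ones.
        pending = backlog + demand
        # Storage completes at most iops_limit operations per second.
        completed = min(pending, iops_limit)
        backlog = pending - completed
        history.append(backlog)
    return history


# Hypothetical demand curve: a peak above the limit, then a drop below it.
demand = [800, 1200, 1500, 1500, 900, 600, 600]
print(simulate_backlog(demand, iops_limit=1000))
# → [0, 200, 700, 1200, 1100, 700, 300]
```

In this toy model the backlog keeps draining for several ticks after demand falls back under the limit, which matches the general pattern of a major disruption followed by a period of slower performance.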
Resolution
Our engineering team took immediate action to stabilize the system, including expanding capacity and reducing system load. Once traffic normalized, performance returned to expected levels.
Next Steps (Preventing Recurrence)
We are implementing the following permanent improvements:
- Upgrading our AWS database storage to a higher-performance type with more IOPS capacity
- Improving monitoring and alerting to detect database pressure earlier
- Optimizing how we process background tasks to reduce load during peak usage
- Optimizing database queries to lower database load and improve reliability
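
As a rough illustration of what detecting database pressure earlier could look like, the sketch below raises an alert when IOPS utilization stays above a threshold for several consecutive samples. The function name, the 80% threshold, and the three-sample window are hypothetical choices made for this example; they do not describe our actual monitoring configuration.

```python
def should_alert(utilization_samples, threshold=0.8, sustained=3):
    """Return True if the latest `sustained` samples are all at or above threshold.

    utilization_samples: fractions of provisioned IOPS in use, oldest first.
    """
    if len(utilization_samples) < sustained:
        # Not enough data yet to call the pressure sustained.
        return False
    recent = utilization_samples[-sustained:]
    return all(sample >= threshold for sample in recent)


# Sustained pressure: the last three samples are all at or above 80%.
print(should_alert([0.55, 0.83, 0.86, 0.91]))  # → True
# A brief spike that recovers does not trigger the alert.
print(should_alert([0.55, 0.95, 0.60, 0.85]))  # → False
```

Requiring several consecutive samples trades a little detection delay for fewer false alarms from short spikes, while still warning well before capacity is fully exhausted.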