Korur
KOR-2024-C002
Cloud / SaaS

Northwave Cloud — Chaos engineering before a make-or-break launch

A B2B SaaS platform uncovered 12 hidden failure modes through controlled chaos experiments — and fixed every one before the traffic spike that would have exposed them.

The challenge

Northwave's SaaS platform had grown fast and passed every functional test, but nobody knew how it behaved under real failure. Load testing showed healthy averages, yet the team had a nagging suspicion that the green dashboards were hiding fragile dependencies.

A flagship enterprise customer was about to onboard, roughly tripling sustained traffic. The platform had never run at that level, and a visible outage during the rollout would have jeopardised the contract and the company's reputation in a crowded market.

The engineering team needed evidence — not opinions — about where the system would break, and they needed it before the launch, not during the incident retro afterwards.

Our solution

Korur designed a series of controlled chaos experiments against a production-like environment: killing instances, injecting network latency, throttling the database connection pool and severing third-party dependencies one at a time. Each experiment had a clear hypothesis and a defined blast radius so nothing ran uncontrolled.
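As a rough illustration of what "a clear hypothesis and a defined blast radius" can look like in practice, here is a minimal Python sketch. It is not the tooling used in this engagement: the `Experiment` fields, `select_targets`, `run_experiment` and the callables passed in are all hypothetical names chosen for the example, and the fault is simulated rather than injected into real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical experiment definition (illustrative field names)."""
    name: str
    hypothesis: str        # what we expect to stay true during the fault
    blast_radius: float    # max fraction of instances the fault may touch
    max_error_rate: float  # hypothesis is refuted above this error rate


def select_targets(instances: list[str], blast_radius: float) -> list[str]:
    """Pick at most `blast_radius` of the fleet, always sparing one instance."""
    limit = min(int(len(instances) * blast_radius), len(instances) - 1)
    return instances[:max(limit, 0)]


def run_experiment(
    experiment: Experiment,
    instances: list[str],
    inject: Callable[[str], None],
    restore: Callable[[str], None],
    measure_error_rate: Callable[[], float],
) -> dict:
    """Inject the fault, measure, and always roll back afterwards."""
    targets = select_targets(instances, experiment.blast_radius)
    injected: list[str] = []
    try:
        for target in targets:
            inject(target)
            injected.append(target)
        observed = measure_error_rate()
    finally:
        # Roll back every fault that was actually applied, even on error.
        for target in injected:
            restore(target)
    return {
        "experiment": experiment.name,
        "targets": targets,
        "observed_error_rate": observed,
        "hypothesis_held": observed <= experiment.max_error_rate,
    }
```

In this sketch the blast radius is enforced before any fault is applied, and the rollback runs in a `finally` block, so a failed measurement cannot leave the environment degraded.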

The experiments quickly exposed failures the dashboards never showed: a retry storm that amplified a single slow query into a full outage, a cache stampede on cold start, and a queue that, whenever a downstream service timed out, silently dropped messages instead of surfacing the error.
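The case study does not state how many service layers or retries were involved, so the numbers below are purely hypothetical. As a back-of-the-envelope sketch of why a retry storm can turn one slow query into an outage: if every layer in a call chain independently retries a failed call, the worst-case number of attempts reaching the bottom layer grows multiplicatively.

```python
def worst_case_amplification(layers: int, retries_per_layer: int) -> int:
    """Worst-case attempts hitting the deepest dependency per user request.

    Assumes each layer makes 1 initial attempt plus `retries_per_layer`
    retries, and that every attempt fails (for example, because the
    dependency is slow enough to trip every timeout).
    """
    if layers < 0 or retries_per_layer < 0:
        raise ValueError("layers and retries_per_layer must be non-negative")
    return (1 + retries_per_layer) ** layers


# Hypothetical example: gateway -> service -> data layer, 3 retries each.
print(worst_case_amplification(layers=3, retries_per_layer=3))  # 64
```

Under these assumed numbers, a single user request can produce up to 64 attempts against the already-struggling database, which is why retry budgets and backoff are common remediations.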

We worked alongside Northwave's engineers to remediate each finding — adding circuit breakers, backpressure and timeouts — then re-ran the same experiments to prove the fixes held. Resilience moved from a hope to a measured, repeatable property.
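For readers unfamiliar with the circuit breaker pattern mentioned above, here is a minimal, single-threaded Python sketch. It is an illustration under stated assumptions, not Northwave's implementation: the class name, the thresholds and the injectable `clock` parameter are all hypothetical, and a production version would also need locking and metrics.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of piling load on a sick dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping a downstream call as `breaker.call(fetch_invoice, invoice_id)` (where `fetch_invoice` is a placeholder for any dependency call) means that after repeated failures callers get an immediate `CircuitOpenError` rather than waiting on timeouts, which limits the retry amplification described earlier.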

The results

12 failure modes identified
12 of 12 fixed before production
0 launch-day incidents
3x sustained traffic increase absorbed

The chaos experiments found problems our load tests had been hiding for a year. We walked into our biggest launch ever knowing exactly how the system fails — and having already fixed it. That confidence was priceless.

Sanne de Groot

VP Engineering, Northwave Cloud
