
Using Chaos Engineering to Build Resilient Distributed Applications

March 2, 2017

Distributed systems are hard to build and hard to test adequately. End-to-end testing often assumes ideal conditions: stable networks and machines that never fail. Real-world distributed systems, however, must contend with variable latencies, unreliable networks and failing disks, all of which can lead to difficult-to-detect bugs that typically surface during crisis scenarios. Chaos Engineering addresses these problems by injecting faults into real-world systems (e.g. by killing containers or changing networking behaviour) and then monitoring the distributed system for deviations from pre-defined steady-state behaviours.
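The experiment loop described above (check the steady state, inject a fault, observe for deviations, roll back) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `SimulatedCluster`, `success_rate`, `run_experiment`, the replica names and the 99% threshold are hypothetical, the "system" is an in-memory toy rather than real containers, and none of it reflects the API of any particular Chaos Engineering library.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedCluster:
    """Toy stand-in for a real system (hypothetical, for illustration only)."""
    alive: dict = field(default_factory=lambda: {"a": True, "b": True, "c": True})
    retries: int = 2  # extra attempts a client makes, each on a different replica

    def kill(self, name):
        # Fault injection: mark one replica as dead.
        self.alive[name] = False

    def restart(self, name):
        # Rollback: bring the replica back.
        self.alive[name] = True

    def request(self, rng):
        # A request succeeds if any of the randomly chosen replicas is alive.
        k = min(1 + self.retries, len(self.alive))
        candidates = rng.sample(sorted(self.alive), k=k)
        return any(self.alive[name] for name in candidates)


def success_rate(cluster, rng, n=1000):
    """Steady-state metric: fraction of n simulated requests that succeed."""
    return sum(cluster.request(rng) for _ in range(n)) / n


def run_experiment(cluster, victim, threshold=0.99, seed=0):
    """Run one chaos experiment against the simulated cluster."""
    rng = random.Random(seed)

    # 1. Confirm the steady-state hypothesis holds before injecting anything.
    baseline = success_rate(cluster, rng)
    if baseline < threshold:
        raise RuntimeError("system is not in steady state; aborting experiment")

    # 2. Inject the fault, 3. measure again, 4. always roll back.
    cluster.kill(victim)
    try:
        under_fault = success_rate(cluster, rng)
    finally:
        cluster.restart(victim)

    return {
        "baseline": baseline,
        "under_fault": under_fault,
        "resilient": under_fault >= threshold,
    }
```

Calling `run_experiment(SimulatedCluster(), "b")` models clients that retry on other replicas, so losing one replica leaves the steady state intact. Calling `run_experiment(SimulatedCluster(retries=0), "b")` models clients with no retries, and the experiment reports a deviation because roughly a third of requests land on the dead replica.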

By performing such experiments against production-like systems, Chaos Engineering allows code to be proved resilient and operations staff to be trained and drilled in handling crisis scenarios. This talk introduces and motivates the need for docker-compose-testkit, an open source library written in Scala to support the runtime verification of distributed applications and Chaos Engineering. Under the hood, the library interfaces with instrumented real-world systems using extensible effects and Monix observables. This enables both the functional reuse of Chaos experiments and the ability to generate these experiments by analysing abstract models of our real-world systems. Using real-world examples and live coding, this talk will demonstrate how docker-compose-testkit's extensive fault injection capabilities (à la Chaos Engineering) can be used to prove the resiliency of distributed applications.
