Anyone who has been hovering around the Netflix world for the last few years will realize that the company is perhaps the number one poster child for cloud engineering. Seemingly every time Amazon Web Services had an outage (which, in AWS's defense, isn't often), Netflix, a huge AWS customer, would uncannily enjoy continuity of service. Was Netflix using some alchemy to keep its servers up? Or was AWS granting special favors to the movie disruptor?

Actually, the answer was far simpler than that: Netflix has always been an exemplar of planning for failure. Its core engineering tenet was to accept that, as Forrest Gump so famously put it, "Shit Happens," and to have second, third and fourth options ready in the event of service degradation. Netflix perfected this approach and, as part of it, introduced a set of tools under the banner of "Chaos Engineering," the idea being that if something can go wrong, it will, and you had better be ready for it.

Looking to find some Gremlins on your back

While Netflix open sourced a number of different tools, there was still a gap in the market between organizations with the engineering resources of a Netflix and those that need help implementing this kind of approach. This is where Gremlin comes in.

The company, founded by CEO Kolton Andrus and CTO Matthew Fornaciari in 2016, is all about making the services of those who ply their trade on the internet more reliable. Gremlin's simulated-failure offerings help to reduce downtime and, as a result, maximize revenue for its customers. Gremlin has signed up a host of high-profile customers in its short existence, including DTCC, Expedia, Remind, Twilio, and Walmart.

Limiting the limitations

When Gremlin first launched, however, it was following what is known in the parlance of the day as a Minimum Viable Product (MVP) approach. That approach saw Gremlin offering services to simulate attacks on hosts and servers. That was good, but there was a glaring problem, especially for a company at the forefront of modern infrastructure approaches: the world has gotten very excited about containers, and Gremlin didn't have a container answer.

That is changing today with the news that Gremlin is rolling out support for containers, which will allow customers to automatically discover Docker containers within the Gremlin UI and safely run chaos engineering experiments on them. Customers can, for example, run resource attacks that overload CPU or memory; run network attacks that simulate DNS problems or latency; or randomly shut down containers to see whether their architecture can handle it.
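To make the "randomly shut down containers" experiment concrete, here is a minimal Python sketch. It is not Gremlin's API or implementation: the `Container` type, the `shutdown_experiment` function, the replica counts and the `min_healthy` threshold are all hypothetical, and the "cluster" is just an in-memory list. The sketch states a steady-state hypothesis (every service keeps at least `min_healthy` running replicas), stops a random sample of containers, and reports which services would have broken that hypothesis.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Container:
    # Hypothetical stand-in for a discovered Docker container.
    name: str
    service: str
    running: bool = True


def shutdown_experiment(containers, kill_count, min_healthy, seed=None):
    """Stop `kill_count` randomly chosen containers, then return the set of
    services whose running replica count fell below `min_healthy`."""
    rng = random.Random(seed)  # seeded so an experiment can be replayed
    for victim in rng.sample(containers, kill_count):
        victim.running = False
    # Counter returns 0 for services with no running replicas left.
    healthy = Counter(c.service for c in containers if c.running)
    services = {c.service for c in containers}
    return {s for s in services if healthy[s] < min_healthy}


if __name__ == "__main__":
    fleet = [Container(f"web-{i}", "web") for i in range(3)]
    fleet.append(Container("db-0", "db"))  # single replica: a likely weak spot
    broken = shutdown_experiment(fleet, kill_count=2, min_healthy=1, seed=7)
    print("services below threshold:", sorted(broken) or "none")
```

Whether the single-replica `db` service shows up in the output depends on which containers the random draw happens to hit, which is exactly the kind of weakness such an experiment is meant to surface before a real failure does.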

The value that Chaos Engineering delivers is obvious, and one Gremlin customer, Under Armour, is particularly effusive. Senior Engineering Manager Paul Osman said:

Chaos Engineering has been a big part of our migration to containerized infrastructure. We use Gremlin to test various failure scenarios and build confidence in the resiliency of our microservices. The ability to target containerized services with an easy-to-use UI has reduced the amount of time it takes us to do fault injection significantly.

And the costs of outages are not insignificant. IHS Markit, in its report The Cost of Server, Application, and Network Downtime, estimated that downtime is costing North American organizations $700 billion per year. At the per-company level, Gartner puts the average cost of downtime at $5,600 per minute, which works out to more than $300,000 per hour.

Featuring Multiple Attacks and Container Discovery

Gremlin's failure-injection platform, as applied to Docker environments, is a well-thought-out application of Chaos Engineering to this use case. In particular:

  • The “multiple attacks” feature allows DevOps teams to better prepare for real-world disasters by simulating compounding issues
  • Container Discovery enables teams to automatically run experiments on Dockerized infrastructure as it expands, contracts, and shifts across hosts
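The two features above can be illustrated with a small Python sketch. Again, this is not Gremlin's API: the attack functions, the additive/multiplicative latency model, the `discover`/`run_compound` helpers and the container names are all hypothetical. "Discovery" here simply means the target list is re-read at run time rather than fixed in advance, and "multiple attacks" means several effects are applied to the same target in one experiment, so their combined impact can be checked against a latency budget.

```python
from typing import Callable, Dict, List

# Hypothetical attack model: each attack transforms a request latency (ms).
Attack = Callable[[float], float]


def latency_attack(extra_ms: float) -> Attack:
    # Network attack: adds a fixed delay to every request.
    return lambda ms: ms + extra_ms


def cpu_attack(slowdown: float) -> Attack:
    # CPU attack: contention stretches processing time by a factor.
    return lambda ms: ms * slowdown


def discover(inventory: Dict[str, float]) -> List[str]:
    # Re-read the target list at run time, so containers that were added,
    # removed or rescheduled since the last run are picked up automatically.
    return sorted(inventory)


def run_compound(inventory: Dict[str, float], attacks: List[Attack],
                 budget_ms: float) -> Dict[str, float]:
    """Apply every attack (in list order) to each discovered container's
    baseline latency and return the containers that exceed the budget."""
    over = {}
    for name in discover(inventory):
        ms = inventory[name]
        for attack in attacks:
            ms = attack(ms)
        if ms > budget_ms:
            over[name] = ms
    return over


if __name__ == "__main__":
    baseline = {"api-1": 40.0, "api-2": 60.0}
    attacks = [cpu_attack(1.5), latency_attack(50.0)]
    print(run_compound(baseline, attacks, budget_ms=120.0))  # prints {'api-2': 140.0}
```

With these example numbers, neither the CPU attack nor the added latency alone pushes either container past the 120 ms budget, but together they take `api-2` to 140 ms. That is the kind of compounding problem a single-attack test would miss.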


This is a logical progression for the company. For the first time, delivering chaos on tap is seen as a positive move!

Ben Kepes

Ben Kepes is a technology evangelist, an investor, a commentator and a business adviser. Ben covers the convergence of technology, mobile, ubiquity and agility, all enabled by the Cloud. His areas of interest extend to enterprise software, software integration, financial/accounting software, platforms and infrastructure as well as articulating technology simply for everyday users.
