chaos testing | The Blog Pros

December 27, 2021

Systematic and Chaotic Testing: A Way to Achieve Cloud Resilience

In today’s digital era, where downtime translates into a shutdown, it is imperative to build resilient cloud infrastructure. During the pandemic, for example, IT maintenance teams could no longer be on-premises to reboot a server in the data center. If the on-premises hardware goes down, this can block access to data and software, halt productivity, and cause overall business loss. One solution is to move your IT operations to cloud infrastructure backed by 24/7 support from remote team members. In this scenario, the cloud essentially acts as a savior.

Companies have recently been making full use of the cloud’s potential, so the observability and resilience of cloud operations have become imperative: downtime now equates to disconnection and business loss.

October 22, 2021

How Chaos Mesh Helps Apache APISIX Improve System Stability

Apache APISIX is a cloud-native, high-performance, scalable microservices API gateway. It is one of the Apache Software Foundation's top-level projects and serves hundreds of companies around the world, processing their mission-critical traffic in industries including finance, Internet services, manufacturing, retail, and telecom operators. Our customers include NASA, the European Union's digital factory, China Mobile, and Tencent.

As the community grows, Apache APISIX's features interact with external components more frequently, making the system more complex and increasing the likelihood of errors. To identify potential system failures and build confidence in the production environment, we introduced the concept of Chaos Engineering.

June 24, 2021

Chaos Engineering Makes Disciplined Microservices

Chaos and discipline: you might be thinking that these two words contradict each other, so how can chaos make disciplined microservices?

But the universal truth is that discipline means the absence of chaos, so until you have experienced chaos, you cannot be disciplined.

August 12, 2020

The Principles of Chaos Engineering

Resilience is what teams that use Kubernetes to run apps and microservices in containers aim for. When a system is resilient, it can handle losing a portion of its microservices and components without the entire system becoming inaccessible.

Resilience is achieved by building the system from loosely coupled microservices. When a system is resilient, individual microservices can be updated or taken down without having to bring the entire system down. Scaling becomes easier too, since you don’t have to scale the whole cloud environment at once.
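To make the idea concrete, here is a minimal, hypothetical sketch of a toy chaos experiment in Python. It simulates one service as a few in-process replicas behind a simple failover balancer, deliberately takes some replicas down, and checks whether requests still succeed. The names `Replica`, `LoadBalancer`, and `chaos_experiment` are illustrative assumptions, not part of Kubernetes or any chaos tool; a real experiment would inject faults into running pods instead of flipping a flag.

```python
import random


class Replica:
    """Hypothetical in-process stand-in for one instance of a microservice."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class LoadBalancer:
    """Tries each replica in order and fails over when one is unavailable."""

    def __init__(self, replicas):
        self.replicas = replicas

    def call(self, request):
        errors = []
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError as exc:
                errors.append(str(exc))
        raise RuntimeError("all replicas failed: " + "; ".join(errors))


def chaos_experiment(balancer, kill_count, rng):
    """Take down `kill_count` random replicas and report whether the service still answers."""
    victims = rng.sample(balancer.replicas, kill_count)
    for victim in victims:
        victim.up = False  # inject the fault
    try:
        balancer.call("GET /health")
        return True
    except RuntimeError:
        return False
    finally:
        for victim in victims:
            victim.up = True  # restore the steady state after the experiment


if __name__ == "__main__":
    rng = random.Random(42)
    lb = LoadBalancer([Replica(f"svc-{i}") for i in range(3)])
    print(chaos_experiment(lb, kill_count=2, rng=rng))  # True: one replica survives
    print(chaos_experiment(lb, kill_count=3, rng=rng))  # False: nothing left to fail over to
```

Losing two of three replicas leaves the service reachable, while losing all three does not. The `finally` block restores every replica, so each experiment starts from the same steady state.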

October 10, 2019

Designing Fault-Tolerant Microservices With Toxiproxy and Cucumber

Designing fault-tolerant microservices.

You may also like: Making Your Microservices Resilient and Fault Tolerant

Thinking About Fault Tolerance From Day 0

Fault tolerance, along with security and other traits, is hard to factor in after the service is already built. It comes from careful design decisions made from the moment your service is born.

July 16, 2019

A Key to Success: Failure with Chaos Engineering [Video]

Test in Production is back! In May, we hosted the meetup at the Microsoft Reactor in San Francisco. The focus of this event was the culture of failure. Specifically, we wanted to hear how the culture of failure (avoiding failure, recovering from failure, and learning from failure) affects how we test in production.

Ana Medina, Chaos Engineer at Gremlin, spoke about how performing Chaos Engineering experiments and celebrating failure helps engineers build muscle memory, spend more time building features, and build more resilient complex systems.