6.2K stars! We recommend an open source chaos engineering test platform: Chaos Mesh!

1. Introduction to Chaos Mesh

Chaos Mesh is an open-source chaos engineering platform designed to help users test, validate, and improve the reliability and stability of their applications in a production environment. Using fault injection and chaos engineering principles, Chaos Mesh can simulate a variety of failure scenarios, such as network latency, node failures, and disk failures, to help users identify and fix potential problems in their systems.

Project Address:

/mirrors/Chaos-Mesh
/pingcap/chaos-mesh

2. Chaos Mesh characteristics:

Diverse fault injection: Chaos Mesh supports a variety of fault injection methods, including network failure, node failure, disk failure, etc. Users can choose the appropriate fault injection method for testing according to their needs.
Fine-grained fault control: Users can make fine-grained configurations of fault injection through the console provided by Chaos Mesh, including the type of fault, injection time, injection range, etc., in order to better simulate the fault situation in the actual production environment.
Observability and monitoring: Chaos Mesh provides rich monitoring and observability features that allow users to monitor the effects of fault injection in real time and understand the stability and reliability of the system.
Containerization support: Chaos Mesh is built for Kubernetes. It is deployed into a cluster and runs chaos engineering experiments against the workloads there, helping users better understand the stability and reliability of containerized applications.
Flexible scheduling strategies: Users can define the scheduling policy for fault injection according to their needs, including timed triggering, periodic triggering, etc., in order to better control the timing and frequency of fault injection.
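To make the scheduling idea concrete, here is a minimal sketch that builds a periodically triggered experiment as a Kubernetes object. It assumes a Chaos Mesh 2.x style API (the chaos-mesh.org/v1alpha1 group and the Schedule kind); older releases may use different names, so check the documentation for your version. The object name, the cron expression, and the app=myapp label are hypothetical.

```python
import json


def scheduled_network_delay(name: str, cron: str, app_label: str,
                            latency: str, duration: str,
                            namespace: str = "default") -> dict:
    """Build a Schedule manifest that runs a network delay on a cron schedule.

    Assumption: field names follow the Chaos Mesh 2.x Schedule resource.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": cron,               # standard cron expression
            "historyLimit": 2,              # how many finished runs to keep
            "concurrencyPolicy": "Forbid",  # do not overlap runs
            "type": "NetworkChaos",
            "networkChaos": {
                "action": "delay",
                "mode": "one",              # affect one matching pod
                "selector": {
                    "namespaces": [namespace],
                    "labelSelectors": {"app": app_label},
                },
                "delay": {"latency": latency},
                "duration": duration,
            },
        },
    }


# Hypothetical values: a 100ms delay for 60s at the start of every hour.
manifest = scheduled_network_delay("hourly-delay", "0 * * * *", "myapp", "100ms", "60s")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON as well as YAML.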

Overall, Chaos Mesh is a powerful chaos engineering platform that helps users to improve system reliability and stability by performing system stability testing and fault simulation in production environments.

3. Chaos Mesh Installation Steps

1. Download Chaos Mesh: You can get the latest version of the installation file from the Chaos Mesh GitHub repository.
2. Deploy Chaos Mesh: You can deploy Chaos Mesh with Helm by running the following commands:

helm repo add chaos-mesh 
helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing --version=0.12.0

3. Verify the deployment: Wait for the deployment to complete, and then run the following command to check whether Chaos Mesh has been deployed successfully:

kubectl get pods -n chaos-testing
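If you want to automate this check, the following is a minimal sketch that reads the same pod list as JSON and reports any pod that is not yet running. The function names are hypothetical, and a Running phase does not guarantee that every container is ready, so treat it as a first-pass check only.

```python
import json
import subprocess
from typing import List


def pods_not_running(pod_list: dict) -> List[str]:
    """Return the names of pods whose phase is neither Running nor Succeeded."""
    unhealthy = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            unhealthy.append(pod["metadata"]["name"])
    return unhealthy


def check_namespace(namespace: str = "chaos-testing") -> List[str]:
    """Ask kubectl for the pods in `namespace` and list the unhealthy ones.

    Requires kubectl on PATH and access to the cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return pods_not_running(json.loads(result.stdout))
```

Calling check_namespace() against the cluster returns an empty list once every pod in chaos-testing has started.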

Chaos Mesh experiments currently support the following main fault injection operations:

pod-kill: Simulates a Kubernetes Pod being killed.
pod-failure: Simulates the continuous unavailability of a Kubernetes Pod, which can be used to simulate node downtime and unavailability scenarios.
network-delay: Simulates network delay.
network-loss: Simulates network packet loss.
network-duplication: Simulates network packet duplication.
network-corrupt: Simulates network packet corruption.
network-partition: Simulates a network partition.
I/O delay: Simulates file system I/O delay.
I/O errno: Simulates file system I/O errors.
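As an illustration of how the pod faults in this list can be described, here is a minimal sketch that builds a PodChaos object for pod-kill or pod-failure. It assumes the chaos-mesh.org/v1alpha1 API group used by newer Chaos Mesh releases (older releases may differ); the experiment name and the app=myapp label are hypothetical.

```python
import json
from typing import Optional


def pod_chaos(name: str, action: str, app_label: str,
              namespace: str = "default",
              duration: Optional[str] = None) -> dict:
    """Build a PodChaos manifest for the pod-kill or pod-failure action."""
    if action not in ("pod-kill", "pod-failure"):
        raise ValueError(f"unsupported action for this sketch: {action}")
    spec = {
        "action": action,
        "mode": "one",  # pick one pod among those matching the selector
        "selector": {
            "namespaces": [namespace],
            "labelSelectors": {"app": app_label},
        },
    }
    if duration is not None:
        # Only meaningful for pod-failure, which keeps the pod unavailable.
        spec["duration"] = duration
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


print(json.dumps(pod_chaos("kill-one-myapp-pod", "pod-kill", "myapp"), indent=2))
```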

4. Steps for using Chaos Mesh

1. Create a fault injection experiment: Use the Chaos Mesh console or a command-line tool to create a fault injection experiment, selecting parameters such as the fault type, the target application, and the injection time.

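For example, to create a network latency experiment, specify the target application and the network latency parameters to simulate. The following command illustrates the idea with a command-line tool. Note that in Chaos Mesh itself, experiments are defined as Kubernetes custom resources (for example, a NetworkChaos object) and are typically applied with kubectl or created in the dashboard, so check which subcommands your chaosctl version actually provides: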

chaosctl create network-delay --time 30s --target myapp --duration 60s

The --time parameter specifies the delay time, which is set to 30 seconds here.
The --target parameter specifies the target application, which is set to myapp.
The --duration parameter specifies the duration of the experiment, which is set to 60 seconds here.
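The same three parameters can also be expressed as a NetworkChaos object. The sketch below is one possible way to build it, assuming the chaos-mesh.org/v1alpha1 API group of newer Chaos Mesh releases (the pinned 0.12.0 chart may use a different group) and assuming the target pods carry a hypothetical app=myapp label.

```python
import json


def network_delay(name: str, app_label: str, latency: str, duration: str,
                  namespace: str = "default") -> dict:
    """Build a NetworkChaos manifest that delays traffic of the selected pods."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "delay",
            "mode": "all",  # affect every pod matching the selector
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},  # the target application
            },
            "delay": {"latency": latency},  # the delay time
            "duration": duration,           # how long the experiment lasts
        },
    }


# Values taken from the command above: 30s delay, target myapp, 60s duration.
manifest = network_delay("my-network-delay", "myapp", "30s", "60s")
print(json.dumps(manifest, indent=2))
```

In practice a latency in the millisecond range (for example "100ms") is a more common starting point than 30 seconds.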

2. Run the experiment: Launch the network latency experiment you created and observe in real time how the target application behaves under network latency. You can run the experiment using the following command:

chaosctl start network-delay --name my-network-delay
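When an experiment is defined as a Kubernetes object, applying it to the cluster is what starts it. The following is a minimal sketch of that step, assuming kubectl is on PATH and the manifest is a plain dictionary; the helper names are hypothetical.

```python
import json
import subprocess


def render_manifest(manifest: dict) -> str:
    """Serialize a manifest to JSON, which kubectl accepts in place of YAML."""
    return json.dumps(manifest, indent=2, sort_keys=True)


def apply_manifest(manifest: dict) -> str:
    """Send the manifest to the cluster with `kubectl apply -f -` via stdin."""
    result = subprocess.run(
        ["kubectl", "apply", "-f", "-"],
        input=render_manifest(manifest),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Calling apply_manifest(manifest) with an experiment manifest returns kubectl's confirmation message, or raises subprocess.CalledProcessError if the cluster rejects the object.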

3. Monitor and observe: Use the monitoring and observability features provided by Chaos Mesh to monitor the effect of the network delay experiment in real time and understand the stability and reliability of the system.

4. Analyze the results: Analyze the data and logs collected during the experiment run to assess system performance, and make adjustments and optimizations as needed.

5. Adjust the experiment parameters: Based on the experiment results and feedback, adjust the parameters of the fault injection experiment, such as the fault type, injection time, and injection scope, to better simulate failures in a real production environment.

6. End the experiment: When the experiment duration has elapsed, you can end the experiment using the following command:

chaosctl stop my-network-delay
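For experiments created as Kubernetes objects, deleting the object ends the experiment. Here is a minimal sketch, assuming kubectl is on PATH; the helper names are hypothetical, and the kind and name must match the object you created.

```python
import subprocess
from typing import List


def delete_command(kind: str, name: str, namespace: str = "default") -> List[str]:
    """Build the kubectl command that deletes one chaos experiment object."""
    # kubectl accepts the lowercased kind as the resource name.
    return ["kubectl", "delete", kind.lower(), name, "-n", namespace]


def stop_experiment(kind: str, name: str, namespace: str = "default") -> None:
    """Delete the experiment object, which ends the fault injection."""
    subprocess.run(delete_command(kind, name, namespace), check=True)


print(" ".join(delete_command("NetworkChaos", "my-network-delay")))
```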

The above steps will allow you to install and use Chaos Mesh for Chaos Engineering experiments to help improve the reliability and stability of your system. Please ensure that you use Chaos Engineering tools carefully in a production environment to avoid unnecessary impact on your system.