GitHub – Polly-Contrib/Simmy: Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET
Simmy
Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET. It is releasing April 2019 and works with Polly v7.0.0 onwards.
Simmy allows you to introduce a chaos-injection policy or policies at any location where you execute code through Polly.
Motivation
There are a lot of questions when it comes to chaos-engineering and making sure that a system is actually ready to face the worst possible scenarios:
- Is my system resilient enough?
- Am I handling the right exceptions/scenarios?
- How will my system behave if X happens?
- How can I test without waiting for a handled (or even unhandled) exception to happen in my production environment?
Using Polly helps me introduce resilience to my project, but I don’t want to have to wait for expected or unexpected failures to test it out. My resilience could be wrongly implemented; testing the scenarios is not straight forward; and mocking failure of some dependencies (for example a cloud SaaS or PaaS service) is not always straightforward.
What do I need, to simulate chaotic scenarios in my production environment?
- A way to mock failures of dependencies (any service dependency for example).
- Define when to fail based on some external factors – maybe global configuration or some rule.
- A way to revert easily, to control the blast radius.
- Production grade, to run this in a production or near-production system with automation.
Chaos policies
Simmy offers the following chaos-injection policies:
Policy
What does the policy do?
Exception
Injects exceptions in your system.
Result
Substitute results to fake faults in your system.
Latency
Injects latency into executions before the calls are made.
Behavior
Allows you to inject any extra behaviour, before a call is placed.
Usage
Step 1: Set up the Monkey Policy
Inject exception
var
chaosPolicy
=
MonkeyPolicy
.InjectException
(Action
<
InjectOutcomeOptions
<
Exception
>>
);
For example:
//
Following example causes the policy to throw SocketException with a probability of 5% if enabledvar
fault
=
new
SocketException
(errorCode
:10013
);var
chaosPolicy
=
MonkeyPolicy
.InjectException
(with
=>
with
.Fault
(fault
) .InjectionRate
(0
.
05
) .Enabled
() );
Inject result
var
chaosPolicy
=
MonkeyPolicy
.InjectResult
(Action
<
InjectOutcomeOptions
<
TResult
>>
);
For example:
//
Following example causes the policy to return a bad request HttpResponseMessage with a probability of 5% if enabledvar
result
=
new
HttpResponseMessage
(HttpStatusCode
.BadRequest
);var
chaosPolicy
=
MonkeyPolicy
.InjectResult
<HttpResponseMessage
>(with
=>
with
.Result
(result
) .InjectionRate
(0
.
05
) .Enabled
() );
Inject latency
var
chaosPolicy
=
MonkeyPolicy
.InjectLatency
(Action
<
InjectLatencyOptions
>
);
For example:
//
Following example causes policy to introduce an added latency of 5 seconds to a randomly-selected 10% of the calls.var
isEnabled
=
true
;var
chaosPolicy
=
MonkeyPolicy
.InjectLatency
(with
=>
with
.Latency
(TimeSpan
.FromSeconds
(5
)) .InjectionRate
(0
.
1
) .Enabled
(isEnabled
) );
Inject behavior
var
chaosPolicy
=
MonkeyPolicy
.InjectBehaviour
(Action
<
InjectBehaviourOptions
>
);
For example:
//
Following example causes policy to execute a method to restart a virtual machine; the probability that method will be executed is 1% if enabledvar
chaosPolicy
=
MonkeyPolicy
.InjectBehaviour
(with
=>
with
.Behaviour
(()=>
restartRedisVM
()) .InjectionRate
(0
.
01
) .EnabledWhen
((ctx
,ct
)=>
isEnabled
(ctx
,ct
)) );
Parameters
All the parameters are expressed in a Fluent-builder syntax way.
Enabled
Determines whether the policy is enabled or not.
- Configure that the monkey policy is enabled.
PolicyOptions
.Enabled
();
- Receives a boolean value indicating whether the monkey policy is enabled.
PolicyOptions
.Enabled
(bool
);
- Receives a delegate which can be executed to determine whether the monkey policy should be enabled.
PolicyOptions
.EnabledWhen
(Func
<
Context
,CancellationToken
,bool
>
);
InjectionRate
A decimal between 0 and 1 inclusive. The policy will inject the fault, randomly, that proportion of the time, eg: if 0.2, twenty percent of calls will be randomly affected; if 0.01, one percent of calls; if 1, all calls.
- Receives a double value between [0, 1] indicating the rate at which this monkey policy should inject chaos.
PolicyOptions
.InjectionRate
(Double
);
- Receives a delegate which can be executed to determine the rate at which this monkey policy should inject chaos.
PolicyOptions
.InjectionRate
(Func
<
Context
,CancellationToken
,Double
>
);
Fault
The fault to inject. The Fault
api has overloads to build the policy in a generic way: PolicyOptions.Fault<TResult>(...)
- Receives an exception to configure the fault to inject with the monkey policy.
PolicyOptions
.Fault
(Exception
);
- Receives a delegate representing the fault to inject with the monkey policy.
PolicyOptions
.Fault
(Func
<
Context
,CancellationToken
,Exception
>
);
Result
The result to inject.
- Receives a generic TResult value to configure the result to inject with the monkey policy.
PolicyOptions
.Result
<TResult
>(TResult
);
- Receives a delegate representing the result to inject with the monkey policy.
PolicyOptions
.Result
<TResult
>(Func
<
Context
,CancellationToken
,TResult
>
);
Latency
The latency to inject.
- Receives a TimeSpan value to configure the latency to inject with the monkey policy.
PolicyOptions
.Latency
(TimeSpan
);
- Receives a delegate representing the latency to inject with the monkey policy.
PolicyOptions
.Latency
(Func
<
Context
,CancellationToken
,TimeSpan
>
);
Behaviour
The behaviour to inject.
- Receives an Action to configure the behaviour to inject with the monkey policy.
PolicyOptions
.Behaviour
(Action
);
- Receives a delegate representing the Action to inject with the monkey policy.
PolicyOptions
.Behaviour
(Action
<
Context
,CancellationToken
>
);
Context-driven behaviour
All parameters are available in a Func<Context, ...>
form. This allows you to control the chaos injected:
- in a dynamic manner: by eg driving the chaos from external configuration files
- in a targeted manner: by tagging your policy executions with a
Context.OperationKey
and introducing chaos targeting particular tagged operations
The example app demonstrates both these approaches in practice.
Step 2: Execute code through the Monkey Policy
//
Executes through the chaos policy directlychaosPolicy
.Execute
(()=>
someMethod
());//
Executes through the chaos policy using ContextchaosPolicy
.Execute
((ctx
)=>
someMethod
(),context
);//
Wrap the chaos policy inside other Polly resilience policies, using PolicyWrapvar
policyWrap
=
Policy
.Wrap
(fallbackPolicy
,timeoutPolicy
,chaosLatencyPolicy
);policyWrap
.Execute
(()=>
someMethod
())//
All policies are also available in async forms.var
chaosLatencyPolicy
=
MonkeyPolicy
.InjectLatencyAsync
(with
=>
with
.Latency
(TimeSpan
.FromSeconds
(5
)) .InjectionRate
(0
.
1
) .Enabled
() );var
policyWrap
=
Policy
.WrapAsync
(fallbackPolicy
,timeoutPolicy
,chaosLatencyPolicy
);var
result
=
await
policyWrap
.ExecuteAsync
(token
=>
service
.GetFoo
(parametersBar
,token
),myCancellationToken
);//
For general information on Polly policy syntax see: https://github.com/App-vNext/Polly
It is usual to place the Simmy policy innermost in a PolicyWrap. By placing the chaos policies innermost, they subvert the usual outbound call at the last minute, substituting their fault or adding extra latency. The existing Polly policies – further out in the PolicyWrap – still apply, so you can test how the Polly resilience you have configured handles the chaos/faults injected by Simmy.
Note: The above examples demonstrate how to execute through a Simmy policy directly, and how to include a Simmy policy in an individual PolicyWrap. If your policies are configured by .NET Core DI at StartUp, for example via HttpClientFactory, there are also patterns which can configure Simmy into your app as a whole, at StartUp. See the Simmy Sample App discussed below.
Example app: Controlling chaos via configuration and Polly.Context
This Simmy sample app shows different approaches/patterns for how you can configure Simmy to introduce chaos policies in a project. Patterns demonstrated are:
- Configuring
StartUp
so that Simmy chaos policies are only introduced in builds for certain environments (for instance, Dev but not Prod). - Configuring Simmy chaos policies to be injected into the app without changing any existing Polly configuration code.
- Injecting faults or chaos by modifying external configuration.
The patterns shown in the sample app are intended as starting points but are not mandatory. Simmy is very flexible, and we would love to hear how you use it!
Wrapping up
All chaos policies (Monkey policies) are designed to inject behavior randomly (faults, latency or custom behavior), so a Monkey policy allows you to specify an injection rate between 0 and 1 (0-100%) thus, the higher is the injection rate the higher is the probability to inject them. Also it allows you to specify whether or not the random injection is enabled, that way you can release/hold (turn on/off) the monkeys regardless of injection rate you specify, it means, if you specify an injection rate of 100% but you tell to the policy that the random injection is disabled, it will do nothing.
Further information
See Issues for latest discussions on taking Simmy forward!
Credits
Simmy was the brainchild of @mebjas and @reisenberger. The major part of the implementation was by @vany0114 and @mebjas, with contributions also from @reisenberger of the Polly team.
Blogs and architecture samples around Simmy
Blog posts
Samples
-
Dylan Reisenberger presents an intentionally simple example .NET Core WebAPI app demonstrating how we can set up Simmy chaos policies for certain environments and without changing any existing configuration code injecting faults or chaos by modifying external configuration.
-
Geovanny Alzate Sandoval made a microservices based sample application to demonstrate how chaos engineering works with Simmy using chaos policies in a distributed system and how we can inject even a custom behavior given our needs or infrastructure, this time injecting custom behavior to generate chaos in our Service Fabric Cluster.
-
Bjørn Einar Bjartnes made a red-green load-testing resilience workshop to understand how errors and resiliency mechanisms affect a system under load. It has been used to run workshops at for example NDC Oslo and there is a video from the workshop at DotNext.