Draft chaos engineering definition/whitepaper #3

Open
caniszczyk opened this issue Apr 26, 2018 · 28 comments

Comments

@caniszczyk
Contributor

No description provided.

@seeker89
Contributor

Keen to help with that!

@Lawouach
Member

Happy to support the effort too.

@3rdman
Contributor

3rdman commented May 18, 2018

Me too :)

@mattforni

Ping

@caniszczyk
Contributor Author

The best bet for now is to contribute to the proposal here, which sketches an outline of what could become a whitepaper/landscape:

https://docs.google.com/document/d/1BeeJZIyReCFNLJQrZjwA4KMlUJelxFFEv3IwED16lHE/edit?ts=5ace0eab#heading=h.k8f5ndt8affu

Here are my ideas for a draft outline. I would love feedback, since I'm still new to this space:

  • What is chaos engineering?
  • A history of chaos engineering
  • Chaos Engineering Use Cases
  • Planning Experiments
  • Chaos Engineering in Cloud Native Systems
  • Chaos Culture: Planning Chaos/GameDays
  • Conclusion

@ramin
Contributor

ramin commented May 21, 2018

ping

@Lawouach
Member

@caniszczyk That document is getting hard to navigate and make sense of. I'm happy to move it to this repo so we can start using GH issues instead.

GH is not really a document-collaboration tool, but if we clearly mark each section in the proposal, we could simply refer to each section from GH issues for discussion.

@seeker89
Contributor

seeker89 commented May 21, 2018 via email

@Lawouach
Member

Regarding the outline @caniszczyk, it's a good starting point. I might add a section regarding chaos engineering in relation to other disciplines/practices: security, CI/CD... Basically, where does CE fit in the toolchain? But maybe this is covered by "CE in Cloud Native Systems"?

@3rdman
Contributor

3rdman commented May 21, 2018

I agree with @Lawouach and @seeker89, the Google doc got crowded fast :)

We could just do a bit of Markdown on individual sections and then generate something, e.g. a PDF, when needed.
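
To make the Markdown idea concrete, here is a minimal sketch in Python of stitching per-section files into one document. The file layout is an assumption for illustration only: one `.md` file per section in a directory, with numeric prefixes (for example `01-what-is-chaos-engineering.md`) so that sorting by name gives the reading order. The function name `assemble_sections` is hypothetical too.

```python
from pathlib import Path


def assemble_sections(section_dir: Path) -> str:
    """Concatenate every *.md file in section_dir, in filename order.

    Assumes (hypothetically) that files carry a numeric prefix such as
    "01-", "02-" so that sorted() yields the intended section order.
    """
    parts = []
    for path in sorted(section_dir.glob("*.md")):
        # Strip surrounding whitespace so sections are separated by
        # exactly one blank line in the combined document.
        parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts) + "\n"
```

The combined text could then be written to a single file and handed to a converter such as pandoc (for instance `pandoc WHITEPAPER.md -o WHITEPAPER.pdf`, which needs a PDF engine installed) whenever a PDF is wanted.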

@caniszczyk
Contributor Author

At everyone's suggestion, I moved what we had in the gdoc here:

https://github.com/chaoseng/wg-chaoseng/blob/master/WHITEPAPER.md

It needs a lot of work but now we can start iterating via pull requests.

cc: @chaoseng/maintainers

@joaoasrosa
Contributor

@caniszczyk +1

@Lawouach
Member

Hey all,

Here is a strawman structure for the whitepaper. Hopefully it will help the discussion :)


Chaos Engineering Whitepaper v0.1

What is Chaos Engineering?

Short History

Principles

Objective: Harness and Improve System Resilience

Benefits for Cloud Native Systems

Relation to Existing Software and Operational Practices

Use Cases

Practicing Chaos Engineering

Chaos Engineering Flow

Define a Baseline

State the Hypothesis to Confirm/Refute

Determine a Perturbation to Perform

Chaos Engineering Perturbations

Degrade Network Conditions

Vary Computing Resources

Stress to the Limits

Simulate Data Loss

Change ACL Permissions

Provoke a Security Breach

Chaos Engineering Automation

Continuous Chaos Engineering

Chaos Engineering Reporting

Report Findings
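
As a rough illustration of the "Chaos Engineering Flow" items in this outline (baseline, hypothesis, perturbation, then reporting), here is a minimal sketch in Python. Everything in it is hypothetical: `ToyService`, `measure_success_rate`, `run_experiment`, the replica counts and the 0.9 threshold are made up for illustration and are not taken from any tool in the landscape.

```python
from dataclasses import dataclass


@dataclass
class ToyService:
    """Hypothetical stand-in for a real system: replicas behind a
    round-robin balancer that does NOT retry on failure."""
    replicas: int = 3
    healthy: int = 3

    def handle(self, request_id: int) -> bool:
        # A request succeeds only if the replica it lands on is healthy.
        return (request_id % self.replicas) < self.healthy


def measure_success_rate(service: ToyService, requests: int = 300) -> float:
    """Probe the system and return the fraction of successful requests."""
    ok = sum(service.handle(i) for i in range(requests))
    return ok / requests


def run_experiment(service: ToyService, threshold: float = 0.9) -> dict:
    # 1. Define a baseline: how does the system behave untouched?
    baseline = measure_success_rate(service)
    # 2. Hypothesis: the success rate stays >= threshold with one replica down.
    # 3. Perturbation: take one replica out of service.
    service.healthy -= 1
    try:
        observed = measure_success_rate(service)
    finally:
        # Always roll the perturbation back, even if probing fails.
        service.healthy += 1
    # 4. Report findings.
    return {
        "baseline": baseline,
        "observed": observed,
        "hypothesis_holds": observed >= threshold,
    }


if __name__ == "__main__":
    print(run_experiment(ToyService()))
```

In this toy setup the hypothesis is refuted, because the balancer keeps sending a third of the traffic to the dead replica. That is the kind of finding the "Report Findings" step would capture.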

@veggiemonk
Contributor

veggiemonk commented Jun 25, 2018

Hi @Lawouach

Thank you for taking the time to organize things a bit.
Where does the landscape fit in this structure?
Can it be put in another document?

@Lawouach
Member

Hey @veggiemonk. Thanks! It looks like nothing when I look at it now, but finding the right phrasing took me half a day. Formalizing is hard :D

It depends on how we organize the whitepaper. One option is to list a few examples in each section (so, for instance, under "Degrade Network Conditions" we could mention Gremlin, Pumba, Muxy...), so that each topic sits next to its potential vendors.

The other is to continue with a long list of vendors at the bottom of the paper.

@veggiemonk
Contributor

Hi @Lawouach, I totally understand that's hard work! 🙏

For now, the landscape doesn't need to be too formal because the list isn't actually that long. As a suggestion, let's keep it at the end.

What do you think?

I don't know if the white paper is the right place for this, but what about renaming the section "Chaos Engineering Flow" to "How to start Chaos Engineering"?
As a first step, we could add "Set up monitoring".
As a second step, we could add "Warn users/developers about it"?

It seems pretty basic but without that it can be hard/dangerous to do CE.
Maybe it is too simple for this paper.

What are your views on that?

@Lawouach
Member

Interesting, I like the guidelines approach indeed.

There is certainly room for a section around the theory, as per the principles. But a "how to get started" one would be very welcome indeed!

@russmiles

A "how to get started" section, plus links to the product landscape and the getting-started points there, would be awesome.

@veggiemonk
Contributor

Ok let's see what kind of resources we can gather in there.

@ramin
Contributor

ramin commented Jun 25, 2018

A section of case studies and papers from the field was also something we discussed in the last meeting. Maybe as a final section on 'Further Reading'?

@Lawouach thank you so much for getting this started!

What do people think about starting a branch with @Lawouach's structure as a README that we can open PRs against, with sections filled in? A merged PR would count as approval, we could go deeper on specific content for each section, and we would link each PR in this issue.

@Lawouach
Member

I think I will refine it, taking into account the comments that were made. Give me a moment :)

@Lawouach
Member

Lawouach commented Jun 25, 2018


Chaos Engineering Whitepaper v0.1

What is Chaos Engineering?

Short History

Principles

Discuss the steady state, experiment, etc. Just to set the "theory"?

Why Practice Chaos Engineering?

Harness and Improve System Resilience

If Chaos Engineering isn't the goal per se, what is? Resiliency? Reliability?

Benefits for Cloud Native Systems

Software and Operational Practices In Production

A clear indication that, whereas testing and CI/CD are mostly upstream practices, Chaos Engineering is very much downstream and acts against a live system. Would that make sense?

Use Cases

The current use cases are a good starting point, but should we detail them, at a depth similar to what we find in the serverless whitepaper?

Practicing Chaos Engineering

Getting Started With Chaos Engineering

Is my system ready to endure Chaos Engineering?

Should we hint at what minimal level you need to be at before getting started? I mean, what if your system is barely resilient as it is?

Do I need to get started in production?

While we may want this, starting in prod may not fit "getting started" scenarios.

Communicate with the Organization

This is where we need to continue the discussion and figure out how far we want/can go with the patterns.

Should we talk about gamedays, for instance? Observability?

The following phases may or may not be useful. I think it would be valuable if we could describe what it means to deal with chaos in those various cases, but is it the right place?

Chaos Engineering Perturbations

Degrade Network Conditions

Vary Computing Resources

Stress to the Limits

Simulate Data Loss

Change ACL Permissions

Provoke a Security Breach

Assume application fails to restart

Chaos Engineering Automation

Continuous Chaos Engineering

Chaos Engineering Reporting

Report Findings

Landscape


@veggiemonk
Contributor

That looks good! Thanks @Lawouach for the hard work!

I think a PR is in order for us to move forward.

@mattforni

@chaoseng/maintainers (CC @caniszczyk) just out of curiosity, what is the plan for iterating on this document now? I had a few minutes this afternoon and wanted to add some of my thoughts here, but it's a bit difficult to know where to start.

I'm happy to just take some time, make some edits and submit a PR for consideration, but didn't want to ruffle any feathers or step on any toes. Would it be beneficial to assign topics to individuals to comment on? Just thinking out loud here.

@Lawouach
Member

Hey @mattforni, I'd say it's totally fine to offer PRs to the document?

On my side, I used this issue because it felt like a quicker way to get started, but I do wonder whether that would scale to a whole document :D

@veggiemonk
Contributor

PRs are the way to move forward! ⏩

@caniszczyk
Contributor Author

caniszczyk commented Jun 28, 2018 via email

@Lawouach
Member

Lawouach commented Jul 6, 2018

Started on my train of thought in #41
