nebula-chaos

This repository has been deprecated and is no longer maintained.

Chaos framework for the Storage Service

Plan Intro

nebula-chaos ships with several built-in plans. Each plan is a JSON file in the conf directory. A plan must specify a set of instances (usually Nebula graph, meta, and storage services) and a set of actions. The actions can be of different types and together form a DAG; the dependencies between actions are specified in the depends field. Most actions must also name the Nebula instance they apply to in the inst_index field. You can add custom plans that follow these rules.
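As a rough illustration of the structure described above, the following Python sketch builds a plan-like dictionary and derives a valid execution order from the depends field. Only depends and inst_index come from the description; every other key (name, instances, actions, type), the action type names, and the assumption that depends holds indices into the action list are hypothetical, so check a real file in conf for the actual schema.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical plan; only "depends" and "inst_index" are named in the text above.
plan = {
    "name": "example_plan",
    "instances": [
        {"type": "meta"},
        {"type": "storage"},
        {"type": "graph"},
    ],
    "actions": [
        {"type": "start", "inst_index": 0, "depends": []},
        {"type": "start", "inst_index": 1, "depends": [0]},
        {"type": "start", "inst_index": 2, "depends": [1]},
        {"type": "write_circle", "inst_index": 2, "depends": [2]},
        {"type": "check_integrity", "inst_index": 2, "depends": [3]},
    ],
}


def execution_order(plan):
    """Return action indices in an order that respects every dependency."""
    # Map each action index to the set of indices it depends on.
    graph = {i: set(a["depends"]) for i, a in enumerate(plan["actions"])}
    # static_order() raises graphlib.CycleError if the actions do not form a DAG.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(plan)
print(order)
```

Because an action only becomes runnable once everything in its depends list has finished, a topological order is one valid serial schedule; independent actions could also run concurrently.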

A utility that draws a flow chart of a plan is included. Use it like this: python3 src/tools/FlowChart.py conf/scale_up_and_down.json.

Start all services, write data, create a checkpoint, write some more data, then restore from the checkpoint. Finally, verify the result by checking that the data is the same as it was when the checkpoint was created.

Clean all WALs of the specified space, start all services, write a circle, then check data integrity.
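The phrase "write a circle" is not defined in this document. One plausible reading, sketched below in Python, is that the writer stores a ring of vertices where each vertex points to the next and the last points back to the first, so the integrity check can walk the ring and confirm it closes after the expected number of hops. This is an interpretation, not the project's confirmed behavior; the in-memory dictionary stands in for the storage service, and both function names are hypothetical.

```python
def write_circle(store, start, count):
    """Write `count` vertices where each one points to its successor."""
    for i in range(count):
        vid = start + i
        store[vid] = start + (i + 1) % count  # last vertex points back to start


def check_circle(store, start, count):
    """Walk the ring from `start`; it is intact if it closes after `count` distinct hops."""
    vid = start
    seen = set()
    for _ in range(count):
        if vid not in store or vid in seen:
            return False  # a lost or corrupted write breaks the ring
        seen.add(vid)
        vid = store[vid]
    return vid == start


store = {}
write_circle(store, start=100, count=5)
intact = check_circle(store, 100, 5)

del store[102]  # simulate data loss after a disturbance
broken = not check_circle(store, 100, 5)
print(intact, broken)
```

The appeal of a ring is that a single missing or stale record anywhere makes the walk fail, so the check does not need a separate copy of the expected data.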

Start all services, disturb (randomly kill a storage service, clean its data path, restart it) while writing a circle, then check data integrity.

Start all services, disturb (randomly kill a storage service, truncate some bytes from the last WAL of the specified space and part, restart it) while writing a circle, then check data integrity.
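The truncation step above can be pictured with a small Python sketch that removes a number of bytes from the end of a file. It runs against a throwaway file, not a real WAL; the helper name, the file name, and the byte counts are all hypothetical, and how nebula-chaos locates the last WAL of a space and part is not shown here.

```python
import os
import tempfile


def truncate_tail(path, num_bytes):
    """Drop up to `num_bytes` from the end of the file and return the new size."""
    size = os.path.getsize(path)
    new_size = max(0, size - num_bytes)  # never truncate to a negative length
    os.truncate(path, new_size)
    return new_size


# Demonstrate on a temporary file rather than a real WAL.
with tempfile.TemporaryDirectory() as tmp:
    fake_wal = os.path.join(tmp, "fake.wal")  # hypothetical file name
    with open(fake_wal, "wb") as f:
        f.write(b"x" * 1024)
    new_size = truncate_tail(fake_wal, 100)
    print(new_size)
```

Cutting the tail of the last log file imitates a crash in the middle of a write, which is the case the restart has to recover from.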

Use integer vids: start all services, then disturb (randomly kill and restart a storage service) while writing and reading with integer vids.

Use string vids: start all services, then disturb (randomly kill and restart a storage service) while writing and reading with string vids.

Start all services, then kill and restart all storage services while writing.

Start 3 storage services, add a 4th storage service using balance data while writing a circle, then check data integrity. Then stop the 1st storage service and remove it using balance data while writing a circle, then check data integrity. Likewise, add the 1st storage service back and remove the 4th storage service.

Start all services, disturb (randomly drop all packets of a storage service, recover later) while writing a circle, then check data integrity. The network partition is based on iptables. Make sure the user has sudo privileges and can run iptables without a password.

PS: all storage services in random_network_partition and random_traffic_control must be deployed on different IP addresses. Because the source port of a storage service is not known, the IP address is the only way to identify the service.
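To make the IP-based matching concrete, here is a hedged Python sketch that only builds the kind of iptables commands such a partition could use; it does not execute them. The exact rules nebula-chaos installs are not shown in this document, so the choice of chains and the helper names are assumptions. The options used (-A, -D, -s, -d, -j DROP) are standard iptables syntax.

```python
def partition_commands(ip):
    """Build iptables commands that drop all traffic from and to `ip`."""
    return [
        ["sudo", "iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"],
        ["sudo", "iptables", "-A", "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


def recover_commands(ip):
    """Build the matching commands that delete those rules again."""
    return [
        ["sudo", "iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"],
        ["sudo", "iptables", "-D", "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


# 192.0.2.10 is a documentation-only example address.
for cmd in partition_commands("192.0.2.10") + recover_commands("192.0.2.10"):
    # To apply a rule, pass the list to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Since the rules match on address alone, two storage services sharing one IP would both be cut off, which is why each service needs its own address.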

Start all services, disturb (randomly delay all packets of a storage service, recover later) while writing a circle, then check data integrity. Traffic control is based on tcconfig, a wrapper around the tc command. Install tcconfig first. Because it calls the tc and ip commands, grant them the required capabilities with the following commands so that they can be run by a non-root user.

setcap cap_net_admin+ep /usr/sbin/tc
setcap cap_net_raw,cap_net_admin+ep /usr/sbin/ip
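For orientation, the Python sketch below builds a pair of tcconfig commands that add and later remove a fixed delay on a network device, without running them. The device name and delay value are placeholders, the helper name is hypothetical, and the options nebula-chaos actually passes are not shown in this document. Option spelling has changed between tcconfig releases, so confirm it with tcset --help on your installation.

```python
def delay_commands(device, delay_ms):
    """Return (set_cmd, del_cmd) for adding and removing a fixed packet delay."""
    set_cmd = ["tcset", device, "--delay", f"{delay_ms}ms"]
    del_cmd = ["tcdel", device, "--all"]  # removes every rule on the device
    return set_cmd, del_cmd


set_cmd, del_cmd = delay_commands("eth0", 100)  # placeholder device and delay
print(" ".join(set_cmd))
print(" ".join(del_cmd))
```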

Start all services, disturb (cat /dev/zero into a file until the disk is full) while writing a circle. The storage services that use the directory are expected to crash. Then clean up the mock file, restart the services, and check data integrity.

Use a ramdisk or tmpfs with limited size to test this plan, otherwise the whole disk will be occupied.
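The effect of the cat /dev/zero step can be imitated by the Python sketch below, which writes zeros until the filesystem reports that it is full. Unlike the plan, the sketch also takes a byte limit so that it is safe to try on an ordinary disk; the function name, the limit, and the file name are hypothetical.

```python
import errno
import os
import tempfile


def fill_with_zeros(path, limit_bytes, chunk_size=1 << 20):
    """Write zeros to `path` until the disk is full or `limit_bytes` is reached.

    Returns the number of bytes written.
    """
    written = 0
    chunk = bytes(chunk_size)
    # Unbuffered I/O, so a full disk surfaces in write() and not later in close().
    with open(path, "wb", buffering=0) as f:
        try:
            while written < limit_bytes:
                n = min(chunk_size, limit_bytes - written)
                written += f.write(chunk[:n])
        except OSError as e:
            if e.errno != errno.ENOSPC:  # only "no space left" is expected
                raise
    return written


with tempfile.TemporaryDirectory() as tmp:
    mock_file = os.path.join(tmp, "mock_file")  # hypothetical file name
    written = fill_with_zeros(mock_file, limit_bytes=2_500_000)
    print(written)
    os.remove(mock_file)  # the cleanup step before restarting the services
```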

Start all services, disturb (simulate slow disk I/O) while writing a circle, then check data integrity. SystemTap is used to simulate slow disk I/O. The major and minor fields are the MAJOR/MINOR device IDs of the disk on which the storage service's data path is mounted.

yum install systemtap

You may also need to install kernel-devel and kernel-debuginfo (their versions must match the running kernel).
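One way to look up the MAJOR/MINOR device IDs mentioned above is the following Python sketch, which reads them from the filesystem metadata of a path. The helper name is hypothetical and "/" stands in for the storage data path. On some filesystems (for example tmpfs, overlayfs, or btrfs) the reported device is a virtual one, so compare the result with the output of lsblk.

```python
import os


def device_ids(path):
    """Return (major, minor) of the device holding the filesystem at `path`."""
    dev = os.stat(path).st_dev
    return os.major(dev), os.minor(dev)


major, minor = device_ids("/")  # replace "/" with the storage data path
print(major, minor)
```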

Start all services, balance leader, turn off auto_compactions, and set wal_ttl to 60s. Write about 10G of data with five concurrent threads, then view the leader distribution of the current space. Enable forced compression, turn on auto_compactions, wait a while, and view the leader distribution again. Compare the two results to see whether the leaders have changed.

Start all services, balance leader, turn off auto_compactions, and set wal_ttl to 60s. Write data with storage perf and stop writing after the specified time, then view the leader distribution of the current space. Enable forced compression, turn on auto_compactions, wait a while, and view the leader distribution again. Compare the two results to see whether the leaders have changed. The storage perf binary must be supplied by the user; the stable version is Git commit 1cd031fa.
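The final comparison in these two plans amounts to diffing two snapshots of the leader distribution. A minimal Python sketch follows, assuming each snapshot has already been reduced to a mapping from host to leader count; how nebula-chaos collects and represents the distribution is not shown in this document, and the host names and counts below are made up.

```python
def leader_changes(before, after):
    """Return {host: (before_count, after_count)} for hosts whose count differs."""
    hosts = set(before) | set(after)
    return {
        h: (before.get(h, 0), after.get(h, 0))
        for h in sorted(hosts)
        if before.get(h, 0) != after.get(h, 0)
    }


# Made-up snapshots taken before and after the compaction phase.
before = {"storage-1": 34, "storage-2": 33, "storage-3": 33}
after = {"storage-1": 34, "storage-2": 30, "storage-3": 36}
changes = leader_changes(before, after)
print(changes)
```

An empty result means no host gained or lost leaders between the two snapshots.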

Start all services, write some data with an index, then check that the index is consistent with the data.

Start all services, write some data, rebuild the index, then check that the index is consistent with the data.

Start all services, write some data, then overwrite the previous data while rebuilding the index at the same time, and check that the index is consistent with the data.
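What checking the index against the data could mean is sketched below in Python with plain dictionaries: the index is rebuilt from the data and compared with the index under test. This is an illustrative model only. The actual check in nebula-chaos is not described in this document, and the function name and sample values are hypothetical.

```python
def index_matches_data(data, index):
    """True if `index` (value -> set of ids) is exactly what `data` (id -> value) implies."""
    expected = {}
    for vid, value in data.items():
        expected.setdefault(value, set()).add(vid)
    return expected == index


data = {1: "red", 2: "blue", 3: "red"}
index = {"red": {1, 3}, "blue": {2}}
consistent = index_matches_data(data, index)

data[3] = "blue"  # overwrite a value without updating the index
stale = not index_matches_data(data, index)
print(consistent, stale)
```

The overwrite case is the interesting one: an index entry left pointing at the old value is exactly the inconsistency the third plan tries to provoke.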

Contributors

bright-starry-sky, critical27, dangleptr, darionyaphet, kikimo, panda-sheep, sherman-the-tank
