
namazu's Introduction

Namazu: Programmable Fuzzy Scheduler for Testing Distributed Systems


Namazu (formerly named Earthquake) is a programmable fuzzy scheduler for testing real implementations of distributed systems such as ZooKeeper.


Namazu permutes Java function calls, Ethernet packets, filesystem events, and injected faults in various orders so as to find implementation-level bugs in the distributed system. Namazu can also control the non-determinism of thread interleaving (by calling sched_setattr(2) with randomized parameters), so it can also be used for testing standalone multi-threaded software.

Basically, Namazu permutes events in a random order, but you can write your own state exploration policy (in Golang) for finding deep bugs efficiently.

Namazu (鯰) means a catfish 🐟 in Japanese.

Blog: http://osrg.github.io/namazu/

Twitter: @NamazuFuzzTest

Looking for Namazu Swarm (Distributed Parallel CI)?

Namazu Swarm executes multiple CI jobs in parallel across a Docker cluster. Namazu Swarm is developed as a part of Namazu, but it does not depend on Namazu (although you can combine them).

Namazu Swarm is hosted at osrg/namazu-swarm.

Found and Reproduced Bugs

🆕=Found, 🔁=Reproduced

Flaky integration tests

| Issue | Reproducibility (traditional) | Reproducibility (Namazu) | Note |
| --- | --- | --- | --- |
| 🆕 ZOOKEEPER-2212 (race) | 0% | 21.8% | In traditional testing, we could not reproduce the issue in 5,000 runs (60 hours). We newly found the issue and improved its reproducibility using the Namazu Ethernet inspector. Note that the reproducibility improvement depends on its configuration (see also #137). Blog article and repro code (Ryu SDN version and Netfilter version) are available. |

Flaky xUnit tests (picked out, please see also #125)

| Issue | Reproducibility (traditional) | Reproducibility (Namazu) | Note |
| --- | --- | --- | --- |
| 🔁 YARN-4548 | 11% | 82% | Used Namazu process inspector. |
| 🔁 ZOOKEEPER-2080 | 14% | 62% | Used Namazu Ethernet inspector. Blog article and repro code are available. |
| 🔁 YARN-4556 | 2% | 44% | Used Namazu process inspector. |
| 🔁 YARN-5043 | 12% | 30% | Used Namazu process inspector. |
| 🔁 ZOOKEEPER-2137 | 2% | 16% | Used Namazu process inspector. |
| 🔁 YARN-4168 | 1% | 8% | Used Namazu process inspector. |
| 🔁 YARN-1978 | 0% | 4% | Used Namazu process inspector. |
| 🔁 etcd #5022 | 0% | 3% | Used Namazu process inspector. |

We also improved reproducibility of some flaky etcd tests (to be documented).

Others

| Issue | Note |
| --- | --- |
| 🆕 YARN-4301 (fault tolerance) | Used Namazu filesystem inspector and Namazu API. Repro code is available. |
| 🆕 etcd command line client (etcdctl) #3517 (timing specification) | Used Namazu Ethernet inspector. Repro code is available. The issue has been fixed in #3530, and it also led to a hint for #3611. |

Talks

Talks about Namazu Swarm

Getting Started

Installation

The installation process is very simple:

$ sudo apt-get install libzmq3-dev libnetfilter-queue-dev
$ go get github.com/osrg/namazu/nmz

Currently, Namazu is tested with Go 1.6.

You can also download the latest binary from here.

Container Mode

The following instruction shows how you can start Namazu Container, the simplified, Docker-like CLI for Namazu.

$ sudo nmz container run -it --rm -v /foo:/foo ubuntu bash

In Namazu Container, you can run arbitrary commands that might be flaky. JUnit tests are interesting to try.

nmzc$ git clone something
nmzc$ cd something
nmzc$ for f in $(seq 1 1000);do mvn test; done

You can also specify a config file (the --nmz-autopilot option of nmz container). A typical configuration file (config.toml) looks like this:

# Policy for observing events and yielding actions
# You can also implement your own policy.
# Default: "random"
explorePolicy = "random"

[explorePolicyParam]
  # for Ethernet/Filesystem/Java inspectors, events are non-deterministically delayed.
  # minInterval and maxInterval are bounds for the non-deterministic delays
  # Default: 0 and 0
  minInterval = "80ms"
  maxInterval = "3000ms"

  # for Ethernet/Filesystem inspectors, you can specify fault-injection probability (0.0-1.0).
  # Default: 0.0
  faultActionProbability = 0.0

  # for Process inspector, you can specify how to schedule processes
  # "mild": execute processes with randomly prioritized SCHED_NORMAL/SCHED_BATCH scheduler.
  # "extreme": pick up some processes and execute them with SCHED_RR scheduler. others are executed with SCHED_BATCH scheduler.
  # "dirichlet": execute processes with SCHED_DEADLINE scheduler. Dirichlet-distribution is used for deciding runtime values.
  # Default: "mild"
  procPolicy = "extreme"

[container]
  # Default: false
  enableEthernetInspector = true
  ethernetNFQNumber = 42
  # Default: true
  enableProcInspector = true
  procWatchInterval = "1s"
  # Default: true (for volumes (`-v /foo:/bar`))
  enableFSInspector = true

For other parameters, please refer to config.go and randompolicy.go.

Non-container Mode

Process inspector

$ sudo nmz inspectors proc -pid $TARGET_PID -watch-interval 1s

By default, all the processes and the threads under $TARGET_PID are randomly scheduled.

You can also specify a config file by running with -autopilot config.toml.

You can also set -orchestrator-url (e.g. http://127.0.0.1:10080/api/v3) and -entity-id for distributed execution.

Note that the process inspector may not be effective for reproducing short-running flaky tests, but it is still effective for long-running tests: issue #125.

The guide for reproducing flaky Hadoop tests (please use nmz instead of microearthquake): FOSDEM slide 42.

Filesystem inspector (FUSE)

$ mkdir /tmp/{nmzfs-orig,nmzfs}
$ sudo nmz inspectors fs -original-dir /tmp/nmzfs-orig -mount-point /tmp/nmzfs -autopilot config.toml
$ $TARGET_PROGRAM_WHICH_ACCESSES_TMP_NMZFS
$ sudo fusermount -u /tmp/nmzfs

By default, all the read, mkdir, and rmdir accesses to the files under /tmp/nmzfs are randomly scheduled. /tmp/nmzfs-orig is just used as the backing storage. (Note that you have to set explorePolicyParam.minInterval and explorePolicyParam.maxInterval in the config file.)

You can also inject faults (currently just injects -EIO) by setting explorePolicyParam.faultActionProbability in the config file.

Ethernet inspector (Linux netfilter_queue)

$ iptables -A OUTPUT -p tcp -m owner --uid-owner $(id -u johndoe) -j NFQUEUE --queue-num 42
$ sudo nmz inspectors ethernet -nfq-number 42
$ sudo -u johndoe $TARGET_PROGRAM
$ iptables -D OUTPUT -p tcp -m owner --uid-owner $(id -u johndoe) -j NFQUEUE --queue-num 42

By default, all the packets for johndoe are randomly scheduled (with some optimization for TCP retransmission).

You can also inject faults (currently just drop packets) by setting explorePolicyParam.faultActionProbability in the config file.

Ethernet inspector (Openflow 1.3)

You have to install ryu and hookswitch for this feature.

$ sudo pip install ryu hookswitch
$ sudo hookswitch-of13 ipc:///tmp/hookswitch-socket --tcp-ports=4242,4243,4244
$ sudo nmz inspectors ethernet -hookswitch ipc:///tmp/hookswitch-socket

Please also refer to doc/how-to-setup-env-full.md for this feature.

Java inspector (AspectJ, byteman)

To be documented

How to Contribute

We welcome your contribution to Namazu. Please feel free to send your pull requests on github!

$ cd $GOPATH/src/github.com/osrg
$ git clone https://github.com/YOUR_GITHUB_ACCOUNT/namazu.git
$ cd namazu
$ git checkout -b your-branch
$ ./build
$ your-editor foo.go
$ ./clean && ./build && go test -race ./nmz/...
$ git commit -a -s

Copyright

Copyright (C) 2015 Nippon Telegraph and Telephone Corporation.

Released under Apache License 2.0.


Advanced Guide

Distributed execution

Basically please follow these examples: example/zk-found-2212.ryu, example/zk-found-2212.nfqhook

Step 1

Prepare config.toml for distributed execution. Example:

# executed in `nmz init`
init = "init.sh"

# executed in `nmz run`
run = "run.sh"

# executed in `nmz run` as the test oracle
validate = "validate.sh"

# executed in `nmz run` as the clean-up script
clean = "clean.sh"

# REST port for the communication.
# You can also set pbPort for ProtocolBuffers (Java inspector)
restPort = 10080

# of course you can also set explorePolicy here as well

Step 2

Create the materials directory, and put the *.sh scripts into it.

Step 3

Run nmz init --force config.toml materials /tmp/x.

This command executes init.sh for initializing the workspace /tmp/x. init.sh can access the materials directory as ${NMZ_MATERIALS_DIR}.

Step 4

Run for f in $(seq 1 100);do nmz run /tmp/x; done.

This command starts the orchestrator, and executes run.sh, validate.sh, and clean.sh for testing the system (100 times).

run.sh should invoke multiple Namazu inspectors: nmz inspectors <proc|fs|ethernet> -entity-id _some_unique_string -orchestrator-url http://127.0.0.1:10080/api/v3

*.sh can access the /tmp/x/{00000000, 00000001, 00000002, ..., 00000063} directory as ${NMZ_WORKING_DIR}, which is intended for putting test results and some relevant information. (Note: 0x63==99)

validate.sh should exit with zero for successful executions, and with non-zero status for failed executions.

clean.sh is an optional clean-up script that runs after each execution.

Step 5

Run nmz summary /tmp/x for summarizing the result.

If you have JaCoCo coverage data, you can run java -jar bin/nmz-analyzer.jar --classes-path /somewhere/classes /tmp/x for counting execution patterns as in FOSDEM slide 18.


API for your own exploration policy

// implements nmz/explorepolicy/ExplorePolicy interface
type MyPolicy struct {
	actionCh chan Action
}

func (p *MyPolicy) ActionChan() chan Action {
	return p.actionCh
}

func (p *MyPolicy) QueueEvent(event Event) {
	// Possible events:
	//  - JavaFunctionEvent (byteman)
	//  - PacketEvent (Netfilter, Openflow)
	//  - FilesystemEvent (FUSE)
	//  - ProcSetEvent (Linux procfs)
	//  - LogEvent (syslog)
	fmt.Printf("Event: %s\n", event)
	// You can also inject fault actions
	//  - PacketFaultAction
	//  - FilesystemFaultAction
	//  - ProcSetSchedAction
	//  - ShellAction
	action, err := event.DefaultAction()
	if err != nil {
		panic(err)
	}
	// send in a goroutine so as to make the function non-blocking.
	// (Note that nmz/util/queue/TimeBoundedQueue provides
	// better semantics and determinism, this is just an example.)
	go func() {
		fmt.Printf("Action ready: %s\n", action)
		p.actionCh <- action
		fmt.Printf("Action passed: %s\n", action)
	}()
}

func NewMyPolicy() ExplorePolicy {
	return &MyPolicy{actionCh: make(chan Action)}
}

func main(){
	RegisterPolicy("mypolicy", NewMyPolicy)
	os.Exit(CLIMain(os.Args))
}

Please refer to example/template for further information.

Semi-deterministic replay

If an event structure has a replay_hint hash string (that does not contain time-dependent or random things), you can semi-deterministically replay a scenario using time.Duration(hash(seed,replay_hint) % maxInterval). No record is required for replaying.

We have a PoC for ZOOKEEPER-2212. Please refer to #137.

We also implemented a similar thing for Go: go-replay.

Known Limitation

After running Namazu (process inspector with explorePolicyParam.procPolicy = "dirichlet") many times, sched_setattr(2) can fail with EBUSY. This seems to be a kernel bug; we're looking into it.

FAQs

Q. The example test always fails (or always succeeds). What does it mean?

A. Probably it is due to a misconfiguration. Please check the logs.

e.g. example/zk-found-2212.nfqhook:

$ nmz init --force config.toml materials /tmp/zk-2212
$ nmz run /tmp/zk-2212
Validation failed: ...
$ ls -l /tmp/zk-2212/00000000/
total 296
drwxr-xr-x 2 root      root   4096 Sep  5 05:30 actions/
-rw-r--r-- 1 root      root   1098 Sep  5 05:30 check-fle-states.log
-rw-r--r-- 1 root      root      2 Sep  5 05:30 check-fle-states.result
srwxr-xr-x 1 root      root      0 Sep  5 05:29 ether_inspector=
-rw-r--r-- 1 root      root  33369 Sep  5 05:30 history
-rw-r--r-- 1 root      root  97856 Sep  5 05:30 inspector.log
-rw-r--r-- 1 root      root      6 Sep  5 05:29 inspector.pid
-rw-r--r-- 1 root      root 126836 Sep  5 05:30 nfqhook.log
-rw-r--r-- 1 root      root      6 Sep  5 05:29 nfqhook.pid
-rw-r--r-- 1 root      root   1302 Sep  5 05:30 nmz.log
-rw-r--r-- 1 root      root     71 Sep  5 05:30 result.json
drwxr-xr-x 3 nfqhooked root   4096 Sep  5 05:30 zk1/
drwxr-xr-x 3 nfqhooked root   4096 Sep  5 05:30 zk2/
drwxr-xr-x 3 nfqhooked root   4096 Sep  5 05:30 zk3/

If an error is recorded in inspector.log or nfqhook.log, probably the ZooKeeper packet inspector (written in python, misc/pynmz) is not working due to some dependency issue. Please install required packages accordingly.

You may also need to adjust some parameters in /tmp/zk-2212/config.toml, such as explorePolicyParam.maxInterval and explorePolicyParam.minInterval, for higher reproducibility.


If you have any questions, please do not hesitate to contact us via GitHub issues or via Gitter.

namazu's People

Contributors

akihirosuda, fuku-ys, gitter-badger, mitake, nolouch, philips, soramichi, v01dstar


namazu's Issues

Go 1.6: panic: runtime error: cgo argument has Go pointer to Go pointer

This should be golang/go#14210.

$ sudo EQ_DEBUG=1 ./bin/earthquake inspectors ethernet -autopilot ~/WORK/ether.toml -nfq-number 42
..
panic: runtime error: cgo argument has Go pointer to Go pointer [recovered]
        panic: runtime error: cgo argument has Go pointer to Go pointer

goroutine 1 [running]:
panic(0xbf6fc0, 0xc8211b9d00)
        /home/suda/go/src/runtime/panic.go:464 +0x3e6
github.com/osrg/earthquake/earthquake/util/core.Recoverer()
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/util/core/coreutil.go:50 +0x106
panic(0xbf6fc0, 0xc8211b9d00)
        /home/suda/go/src/runtime/panic.go:426 +0x4e9
github.com/AkihiroSuda/go-netfilter-queue._cgoCheckPointer0(0xa8c7e0, 0xc8211abe58, 0xc8211b9cf0, 0x1, 0x1, 0xa4d380)
        github.com/AkihiroSuda/go-netfilter-queue/_obj/_cgo_gotypes.go:59 +0x4d
github.com/AkihiroSuda/go-netfilter-queue.NewNFQueue(0x100013a002a, 0xffff, 0xdd0b60, 0x0, 0x0)
        /home/suda/gopath/src/github.com/AkihiroSuda/go-netfilter-queue/netfilter.go:101 +0x47a
github.com/osrg/earthquake/earthquake/inspector/ethernet.(*NFQInspector).Serve(0xc8200e1600, 0x0, 0x0)
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/inspector/ethernet/ethernet_nfq.go:53 +0x2c4
github.com/osrg/earthquake/earthquake/cli/inspectors.runEtherInspector(0xc82000a180, 0x4, 0x4, 0x1)
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/cli/inspectors/ethernet.go:125 +0x903
github.com/osrg/earthquake/earthquake/cli/inspectors.etherCmd.Run(0xc82000a180, 0x4, 0x4, 0xc8211d1b30)
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/cli/inspectors/ethernet.go:62 +0x35
github.com/osrg/earthquake/earthquake/cli/inspectors.(*etherCmd).Run(0x18f20e8, 0xc82000a180, 0x4, 0x4, 0xeaaec0)
        <autogenerated>:2 +0xab
github.com/mitchellh/cli.(*CLI).Run(0xc8211b63c0, 0xc8211af2c0, 0x0, 0x0)
        /home/suda/gopath/src/github.com/mitchellh/cli/cli.go:153 +0x56e
github.com/osrg/earthquake/earthquake/cli.inspectorsCmd.Run(0xc82000a170, 0x5, 0x5, 0xc8211d1d28)
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/cli/inspectors.go:51 +0x1f2
github.com/osrg/earthquake/earthquake/cli.(*inspectorsCmd).Run(0x18f20e8, 0xc82000a170, 0x5, 0x5, 0xeaaea8)
        <autogenerated>:5 +0xab
github.com/mitchellh/cli.(*CLI).Run(0xc8211b6300, 0xc8211af170, 0x0, 0x0)
        /home/suda/gopath/src/github.com/mitchellh/cli/cli.go:153 +0x56e
github.com/osrg/earthquake/earthquake/cli.CLIMain(0xc82000a150, 0x7, 0x7, 0x0)
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/cli/main.go:36 +0x2d6
main.main()
        /home/suda/gopath/src/github.com/osrg/earthquake/earthquake/main.go:25 +0x3b

History Storage

Purposes

  • Greedy state exploration
  • Statistics of explored state patterns

Candidates:

  • JSON DBs
    • MongoDB
    • CouchDB
  • Graph DBs
    • Cayley
    • HypergraphDB
    • Neo4j
    • Tinkerpop

Flaky test? (endpoint)

Build 193 (successful)
https://travis-ci.org/osrg/earthquake/builds/124123660

ok      github.com/osrg/earthquake/earthquake/cli   2.055s  coverage: 0.3% of statements
ok      github.com/osrg/earthquake/earthquake/cli/inspectors    2.028s  coverage: 14.3% of statements
ok      github.com/osrg/earthquake/earthquake/cli/tools 1.014s  coverage: 2.5% of statements
ok      github.com/osrg/earthquake/earthquake/endpoint  11.878s coverage: 72.1% of statements
ok      github.com/osrg/earthquake/earthquake/endpoint/local    2.870s  coverage: 85.4% of statements
ok      github.com/osrg/earthquake/earthquake/endpoint/pb   1.076s  coverage: 76.8% of statements
....

Build 195 (fail)
https://travis-ci.org/osrg/earthquake/builds/124127161

ok      github.com/osrg/earthquake/earthquake/cli   2.184s  coverage: 0.3% of statements
ok      github.com/osrg/earthquake/earthquake/cli/inspectors    2.241s  coverage: 14.3% of statements
ok      github.com/osrg/earthquake/earthquake/cli/tools 1.016s  coverage: 2.5% of statements
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
The build has been terminated

scientific paper

hello

I'm having trouble finding out whether you have published any paper related to Namazu. Please provide a link and a BibTeX entry if you can.

Include packet []byte in PacketEvent

Now earthquake has its own Ethernet inspector and does not depend on pyearthquake for Ethernet inspection.
So including the packet []byte in PacketEvent makes sense for those who want to analyze the []byte in their own ExplorePolicy.
Perhaps @mitake is interested in this.

init command fails when the materials dir and the storage dir are on different devices

-----------------
root@ubuntu-01:~/earthquake# bin/earthquake init example/zk.byteman.add_node/config_dumb.json 
example/zk.byteman.add_node/materials /eq_test
failed to link (src:
example/zk.byteman.add_node/materials/quorumStart.sh(example/zk.byteman.add_node/materials/quorumStart.sh), dst: 
/eq_test/materials/quorumStart.sh): link example/zk.byteman.add_node/materials/quorumStart.sh 
/eq_test/materials/quorumStart.sh: invalid cross-device link
link example/zk.byteman.add_node/materials/quorumStart.sh /eq_test/materials/quorumStart.sh: invalid cross-device link
-----------------

Besides hard-linking, a copy option should be added.

Flaky test: nmz/endpoint

4919228
https://travis-ci.org/osrg/namazu/builds/134960743

ok      github.com/osrg/namazu/nmz/cli  2.768s  coverage: 0.3% of statements
ok      github.com/osrg/namazu/nmz/cli/container/run    2.432s  coverage: 0.0% of statements
ok      github.com/osrg/namazu/nmz/cli/inspectors   2.568s  coverage: 12.6% of statements
ok      github.com/osrg/namazu/nmz/cli/tools    1.024s  coverage: 2.5% of statements
ok      github.com/osrg/namazu/nmz/container    2.534s  coverage: 0.0% of statements
ok      github.com/osrg/namazu/nmz/container/ns 1.025s  coverage: 0.0% of statements
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
The build has been terminated

Typical output:
https://travis-ci.org/osrg/namazu/builds/134956146

ok      github.com/osrg/namazu/nmz/cli  2.220s  coverage: 0.3% of statements
ok      github.com/osrg/namazu/nmz/cli/container/run    2.226s  coverage: 0.0% of statements
ok      github.com/osrg/namazu/nmz/cli/inspectors   2.230s  coverage: 12.6% of statements
ok      github.com/osrg/namazu/nmz/cli/tools    1.017s  coverage: 2.5% of statements
ok      github.com/osrg/namazu/nmz/container    2.223s  coverage: 0.0% of statements
ok      github.com/osrg/namazu/nmz/container/ns 1.015s  coverage: 0.0% of statements
ok      github.com/osrg/namazu/nmz/endpoint 2.404s  coverage: 72.1% of statements    <--- this
...

namazu has error when using O_DIRECT

Hello:
I wrote a program to verify the NAMAZU file system fault injection.
./nmz inspectors fs -mount-point /tmp/nmzfs-mnt -original-dir /tmp/nmzfs-orig -autopilot config.toml

The program is as follows:
test_fs.txt

The program opens a file with the direct I/O option (O_DIRECT).
Writes succeed when the file is under the normal directory /tmp/nmzfs-orig, but they fail when the file is under /tmp/nmzfs-mnt.
The error is:
write ./direct_io.data failed: Invalid argument
write ./direct_io.data failed: Invalid argument
write ./direct_io.data failed: Invalid argument

I checked the source code and found that the filesystem is ultimately mounted by fusermount, so I added the direct_io option to the fusermount options in namazu/vendor/github.com/osrg/hookfs/hookfs/server.go.

pathFs := pathfs.NewPathNodeFs(hookfs, nil)
conn := nodefs.NewFileSystemConnector(pathFs.Root(), opts)
originalAbs, _ := filepath.Abs(hookfs.Original)
var the_opt = []string{"nonempty", "direct_io"}
mOpts := &fuse.MountOptions{
	AllowOther: true,
	Name:       hookfs.FsName,
	FsName:     originalAbs,
	Options:    the_opt,
}

But fusermount does not support the direct_io option. The error is:

[NMZ-INF] 16:34:40.02: fusermount the cmd is [/bin/fusermount /tmp/nmzfs-mnt -o nonempty,direct_io,allow_other,subtype=hookfs,fsname=/tmp/nmzfs-orig]
(at mount_linux.go:50)
[NMZ-WRN] 16:34:40.02: ignoring restPort: -1 (at endpoint.go:86)
[NMZ-WRN] 16:34:40.02: ignoring pbPort: -1 (at endpoint.go:97)
/bin/fusermount: mount failed: Invalid argument
[NMZ-CRT] 16:34:40.02: fusermount exited with code 256
(at fs.go:103)
[NMZ-CRT] 16:34:40.02: PANIC: fusermount exited with code 256
(at coreutil.go:49)
[NMZ-INF] 16:34:40.02: Hint: For debug info, please set "NMZ_DEBUG" to 1. (at coreutil.go:53)

Can you help me solve the problem? Looking forward to your early reply, thank you very much.


ethernet_test fails intermittently

https://travis-ci.org/osrg/earthquake

Seems related to socket shutdown.

https://github.com/osrg/earthquake/blob/8dcaf8be0e6fb206d0fd853078ee4abba9dafa25/earthquake/inspector/ethernet/ethernet_test.go

https://travis-ci.org/osrg/earthquake/builds/113319662

[EQ-DBG] 05:28:37.06: LOCAL EP handled action Signal{map[string]interface {}{"uuid":"92fd2562-6d82-4df7-89e2-62e56f6f8808", "entity":"_dummy_entity_id", "class":"EventAcceptanceAction", "option":map[string]interface {}{}, "type":"action", "event_uuid":"4831b94c-1233-4a02-b280-ce6806c8b6b3"}} (at log.go:210) 
[EQ-INF] 05:28:37.06: Shutting down.. (at log.go:266) 
PASS
coverage: 46.2% of statements
Assertion failed: pfd.revents & POLLIN (signaler.cpp:193)
SIGABRT: abort
PC=0x7f90e2404cc9 m=3
signal arrived during cgo execution
goroutine 25 [syscall, locked to thread]:
runtime.cgocall(0x9b5910, 0xc82003fd08, 0x0)
    /home/travis/.gimme/versions/go/src/runtime/cgocall.go:123 +0x97 fp=0xc82003fcc0 sp=0xc82003fc98
github.com/vaughan0/go-zmq._Cfunc_zmq_poll(0xc821976580, 0x3, 0xffffffffffffffff, 0xc800000000)
    ??:0 +0x64 fp=0xc82003fd08 sp=0xc82003fcc0
github.com/vaughan0/go-zmq.(*PollSet).Poll(0xc82003fee0, 0xffffffffffffffff, 0x0, 0x2, 0x0)
    /home/travis/gopath/src/github.com/vaughan0/go-zmq/poll.go:107 +0x22f fp=0xc82003fda0 sp=0xc82003fd08
github.com/vaughan0/go-zmq.(*Channels).processSockets(0xc82001e420)
    /home/travis/gopath/src/github.com/vaughan0/go-zmq/channels.go:114 +0x6ea fp=0xc82003ffa8 sp=0xc82003fda0
runtime.goexit()
    /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc82003ffb0 sp=0xc82003ffa8
created by github.com/vaughan0/go-zmq.(*Socket).ChannelsBuffer
    /home/travis/gopath/src/github.com/vaughan0/go-zmq/channels.go:38 +0x3e2
goroutine 1 [running]:
    goroutine running on other thread; stack unavailable
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /home/travis/.gimme/versions/go/src/runtime/asm_amd64.s:2006 +0x1
goroutine 20 [semacquire]:
sync.runtime_Syncsemacquire(0xc820064e10)
    /home/travis/.gimme/versions/go/src/runtime/sema.go:241 +0x1a0
sync.(*Cond).Wait(0xc820064e00)
    /home/travis/.gimme/versions/go/src/sync/cond.go:63 +0x85
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0xc8200c41b0, 0x0)
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:50 +0x14d
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0xc8200c41b0)
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:63 +0x48
created by github.com/cihub/seelog.NewAsyncLoopLogger
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:40 +0xd9
goroutine 21 [semacquire]:
sync.runtime_Syncsemacquire(0xc820064f90)
    /home/travis/.gimme/versions/go/src/runtime/sema.go:241 +0x1a0
sync.(*Cond).Wait(0xc820064f80)
    /home/travis/.gimme/versions/go/src/sync/cond.go:63 +0x85
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0xc8200c42d0, 0x0)
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:50 +0x14d
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0xc8200c42d0)
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:63 +0x48
created by github.com/cihub/seelog.NewAsyncLoopLogger
    /home/travis/gopath/src/github.com/cihub/seelog/behavior_asynclooplogger.go:40 +0xd9
goroutine 4 [select]:
github.com/osrg/earthquake/earthquake/endpoint/local.(*LocalEndpoint).eventRoutine(0x116a920)
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/endpoint/local/localendpoint.go:44 +0x478
created by github.com/osrg/earthquake/earthquake/endpoint/local.(*LocalEndpoint).Start
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/endpoint/local/localendpoint.go:100 +0x1da
goroutine 5 [select]:
github.com/osrg/earthquake/earthquake/endpoint/local.(*LocalEndpoint).actionRoutine(0x116a920)
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/endpoint/local/localendpoint.go:71 +0x482
created by github.com/osrg/earthquake/earthquake/endpoint/local.(*LocalEndpoint).Start
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/endpoint/local/localendpoint.go:101 +0x1fc
goroutine 6 [select]:
github.com/osrg/earthquake/earthquake/util/mockorchestrator.(*MockOrchestrator).routine(0xc821b09780)
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/util/mockorchestrator/mockorchestrator.go:81 +0x225
created by github.com/osrg/earthquake/earthquake/util/mockorchestrator.(*MockOrchestrator).Start
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/util/mockorchestrator/mockorchestrator.go:99 +0x43
goroutine 23 [chan receive]:
github.com/osrg/earthquake/earthquake/inspector/transceiver.(*LocalTransceiver).routine(0x1169b70)
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/inspector/transceiver/localtransceiver.go:74 +0x6d
created by github.com/osrg/earthquake/earthquake/inspector/transceiver.(*LocalTransceiver).Start
    /home/travis/gopath/src/github.com/osrg/earthquake/earthquake/inspector/transceiver/localtransceiver.go:85 +0x43
goroutine 24 [select]:
github.com/vaughan0/go-zmq.(*Channels).processOutgoing(0xc82001e420)
    /home/travis/gopath/src/github.com/vaughan0/go-zmq/channels.go:72 +0x3b2
created by github.com/vaughan0/go-zmq.(*Socket).ChannelsBuffer
    /home/travis/gopath/src/github.com/vaughan0/go-zmq/channels.go:37 +0x3c0
rax    0x0
rbx    0x0
rcx    0xffffffffffffffff
rdx    0x6
rdi    0x525e
rsi    0x5260
rbp    0x1
rsp    0x7f90e0b4baa8
r8     0x7f90e0b4c700
r9     0xc1
r10    0x8
r11    0x202
r12    0x0
r13    0x0
r14    0x7f90e0b4bd08
r15    0x7f90e0b4bd04
rip    0x7f90e2404cc9
rflags 0x202
cs     0x33
fs     0x0
gs     0x0
FAIL    github.com/osrg/earthquake/earthquake/inspector/ethernet    1.239s

Enable fs inspector in container mode

Namazu currently supports the proc inspector and the Ethernet inspector in container mode. It would be perfect if Namazu could support the fs inspector in container mode as well.

Use three-valued logic in verification to handle experiment failure

The system under test can get into a weird state due to an invalid configuration of Earthquake itself.

For example, when the interval parameter of the random explorer is extremely large, a ZooKeeper instance can get into a weird state.
(The stat command returns "This ZooKeeper instance is not currently serving requests".)
Such a weird state should not be regarded as a bug in ZooKeeper.
We might need three-valued logic in verification to handle such a state.
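
A minimal sketch of what a three-valued verdict could look like is shown below. Everything in it is hypothetical: the Verdict type, its value names, and the idea of feeding a separate "experiment healthy" check into the decision are assumptions made for illustration, not an existing Earthquake API.

```go
package main

import "fmt"

// Verdict is a three-valued test result (hypothetical type).
type Verdict int

const (
	Passed       Verdict = iota // the oracle accepted the run
	Failed                      // the oracle rejected the run: a candidate bug
	Inconclusive                // the experiment itself was broken
)

func (v Verdict) String() string {
	switch v {
	case Passed:
		return "passed"
	case Failed:
		return "failed"
	default:
		return "inconclusive"
	}
}

// judge combines a health check of the experiment with the oracle result.
// An unhealthy experiment is never reported as a bug in the tested system.
func judge(experimentHealthy, oraclePassed bool) Verdict {
	if !experimentHealthy {
		return Inconclusive
	}
	if oraclePassed {
		return Passed
	}
	return Failed
}

func main() {
	fmt.Println(judge(true, true))   // passed
	fmt.Println(judge(true, false))  // failed
	fmt.Println(judge(false, false)) // inconclusive
}
```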

Make EQ itself fault-tolerant

The orchestrator must tolerate unexpected deaths and revivals of processes.

In the current implementation, an action sent to a dead process is lost.
Even if the process is revived, the action is still lost because the revived process has a different TCP socket.
5ee85d0

Workarounds

  • Introduce ack for actions
  • Use ZooKeeper or Consul (too complicated?)

LOW HANGING FRUIT: make goreportcard happy

goreportcard says A+ for Namazu v0.2.0, but there are still things we should do:

| Component | Score |
| --- | --- |
| go_vet | 100% |
| gocyclo | 98% |
| gofmt | 98% |
| golint | 41% |
| ineffassign | 98% |
| license | 100% |
| misspell | 98% |

gocyclo: 98%

namazu/nmz/cli/init.go
  Line 108: warning: cyclomatic complexity 22 of function _init() is high (> 15) (gocyclo)
namazu/nmz/explorepolicy/random/randompolicy.go
  Line 156: warning: cyclomatic complexity 17 of function (*Random).LoadConfig() is high (> 15) (gocyclo)

gofmt 98%

namazu/nmz/cli/tools/visualize.go
  Line 1: warning: file is not gofmted (gofmt)
namazu/nmz/signal/action_sched_procset_test.go
  Line 1: warning: file is not gofmted (gofmt)

golint 41%

...

ineffassign 98%

namazu/nmz/inspector/transceiver/resttransceiver.go
  Line 92: warning: err assigned and not used (ineffassign)
  Line 69: warning: err assigned and not used (ineffassign)
  Line 47: warning: err assigned and not used (ineffassign)
namazu/nmz/container/ns/boot.go
  Line 42: warning: err assigned and not used (ineffassign)

misspell 98%

namazu/nmz/endpoint/pb/pbendpoint.go
  Line 30: warning: 53:found "diferrent" a misspelling of "different" (misspell)
namazu/nmz/endpoint/pb/pbendpoint_test.go
  Line 107: warning: 55:found "diferrent" a misspelling of "different" (misspell)

earthquake-container: cannot parse explorePolicyParam

Earthquake container cannot parse explorePolicyParam correctly, and hence default values are used.

In full-stack Earthquake, int64 values are converted to float64 ones by {{config.DumpToJsonFile}}.
https://github.com/osrg/earthquake/blob/89221e5405072010607526f7797509bc4d7901f1/earthquake/util/config/config.go#L58

However, in Earthquake container, as {{config.DumpToJsonFile}} is not used, the glitch happens.

I'm going to eliminate {{config.DumpToJsonFile}} and refactor configuration functions.

Circle CI is failing

https://circleci.com/gh/osrg/earthquake/124

Setting up fuse (2.9.2-4ubuntu4.15.04.1) ...
Creating fuse group...
Adding group `fuse' (GID 113) ...
Done.
Creating fuse device...
mknod: 'fuse-': Operation not permitted
makedev fuse c 10 229 root root 0660: failed
chown: cannot access '/dev/fuse': No such file or directory
dpkg: error processing package fuse (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 fuse
E: Sub-process /usr/bin/dpkg returned an error code (1)
lxc-start: The container failed to start.
lxc-start: Additional information can be obtained by setting the --logfile and --logpriority options.

docker build -t osrg/earthquake . returned exit code 1

INFO[0273] The command [/bin/sh -c apt-get install -y default-jdk maven] returned a non-zero code: 100 Action failed: docker build -t osrg/earthquake .

runtime panic of process inspector

The panic below occurred while I was running the etcd e2e test:

panic: Invalid parameter n: 0

goroutine 37 [running]:
panic(0xa304a0, 0xc420fdb6c0)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/osrg/namazu/vendor/github.com/leesper/go_rng.DirichletGenerator.FlatDirichlet(0xc42048c208, 0x0, 0xc421e05b60, 0x0, 0xc421e05b60)
        /home/mitake/gopath/src/github.com/osrg/namazu/vendor/github.com/leesper/go_rng/dirichlet.go:56 +0x1ad
github.com/osrg/namazu/nmz/explorepolicy/random.(*dirichlet).dirichletSchedDeadline(0xc42048c2a8, 0x16414c0, 0x0, 0x0, 0xf4240, 0x3ff0000000000000, 0x0)
        /home/mitake/gopath/src/github.com/osrg/namazu/nmz/explorepolicy/random/dirichlet.go:56 +0x8b
github.com/osrg/namazu/nmz/explorepolicy/random.(*dirichlet).Action(0xc42048c2a8, 0xc421e5c540, 0xc4220c2050, 0xc4200d6b60, 0xc4200d6b60, 0xc4220c2050)
        /home/mitake/gopath/src/github.com/osrg/namazu/nmz/explorepolicy/random/dirichlet.go:42 +0x9c
github.com/osrg/namazu/nmz/explorepolicy/random.(*Random).makeActionForEvent(0xc42018cb40, 0x10e42e0, 0xc421e5c540, 0xc42129bf78, 0x2, 0x0, 0x0)
        /home/mitake/gopath/src/github.com/osrg/namazu/nmz/explorepolicy/random/randompolicy.go:303 +0xa7
github.com/osrg/namazu/nmz/explorepolicy/random.(*Random).dequeueEventRoutine(0xc42018cb40)
        /home/mitake/gopath/src/github.com/osrg/namazu/nmz/explorepolicy/random/randompolicy.go:323 +0x10d
created by github.com/osrg/namazu/nmz/explorepolicy/random.New
        /home/mitake/gopath/src/github.com/osrg/namazu/nmz/explorepolicy/random/randompolicy.go:118 +0x1d6

I used a script like the one below to run the test repeatedly:

#! /bin/bash

for i in `seq 1 1000`; do
    GOPATH=`pwd`/gopath nmz inspectors proc -watch-interval 1ms -stdout nmz/log-$i -autopilot dirichlet.toml -cmd "./e2e.test -bin-dir bin -test.run TestCtlV3Lock"
done

The content of dirichlet.toml is:

explorePolicy = "random"
[explorePolicyParam]
 procPolicy = "dirichlet"

Any ideas? @AkihiroSuda
