
sensord's People

Contributors

blalor, gorsuch, joekarl, warwickp

sensord's Issues

sci-fi: a goroutine per check?

Rather than have a central scheduler, it might be more interesting to have a single goroutine per check, attempting to fire once per second.

The main routine's job would then be to periodically poll for checks, spawning goroutines for new checks and killing the goroutines of checks that have gone away.
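
A minimal sketch of that shape, assuming a hypothetical Check type, a measure function, and a fetchChecks manifest poller (none of these names are sensord's actual API):

    package main

    import "time"

    // Check is an illustrative stand-in for whatever the manifest returns.
    type Check struct {
        ID  string
        URL string
    }

    func measure(c Check) { /* perform the HTTP check and record the result */ }

    // runCheck is the per-check goroutine: one measurement attempt per second
    // until its stop channel is closed.
    func runCheck(c Check, stop <-chan struct{}) {
        t := time.NewTicker(time.Second)
        defer t.Stop()
        for {
            select {
            case <-t.C:
                measure(c)
            case <-stop:
                return
            }
        }
    }

    // reconcile is what the main routine would run after each manifest poll:
    // spawn goroutines for new checks, stop goroutines for removed ones.
    func reconcile(running map[string]chan struct{}, latest []Check) {
        seen := map[string]bool{}
        for _, c := range latest {
            seen[c.ID] = true
            if _, ok := running[c.ID]; !ok {
                stop := make(chan struct{})
                running[c.ID] = stop
                go runCheck(c, stop)
            }
        }
        for id, stop := range running {
            if !seen[id] {
                close(stop)
                delete(running, id)
            }
        }
    }

    func main() {
        running := map[string]chan struct{}{}
        for {
            reconcile(running, fetchChecks()) // fetchChecks would poll the central manifest
            time.Sleep(30 * time.Second)
        }
    }

    func fetchChecks() []Check { return nil /* fetch and decode the manifest here */ }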

Scaling up sensord

Ok, so for kicks I threw 800 health check endpoints into a list and spun up a single sensord instance. Using the Canary Watch dashboard I saw that the number of points available for each of the checks was only 2-3, where it should be up around 180 (my retention value). I set the MEASURER_COUNT to 20 and found I had 10-13 points per URL. I raised MEASURER_COUNT again to 200, but didn't get another increase in retained points.

Thoughts on how I can scale this up?

Stream Fanout?

Right now, the streaming mechanism is simple (which I like) and works like a queue. Measurements are pushed onto a channel; every web client that connects gets a copy of that channel and streams down some share of the measurements. If there are no clients, measurements block, which is just fine. See the README for a usage example.

Illustration of how things will look when the ingestd component is in play:

sensord -> ingestd -> redis -> canaryd

Multiple ingestd instances can be put in place to process the stream faster.

But what of the case in which I want redundant ingestd instances or other clients? For example, I may want to put aggregators in multiple regions to help bring some more tolerance to the system. If one region goes down, the other, operating independently, has a chance at survival.

A simple illustration:

canaryd <- redis <- ingestd <- sensord -> ingestd -> redis -> canaryd

In a typical message bus, this would be achieved by topic subscription / fanout. If sensord had a redis instance underneath, we could lean on its pub/sub capabilities. However, it seems that we should be able to do this in process, keeping our footprint small.

How do we safely do that here, defending sensord against slow readers?
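
One possible in-process answer, sketched below: a broadcaster that hands each subscriber its own buffered channel and publishes with a non-blocking send, so a slow reader misses measurements rather than stalling the sensor. The Measurement type, buffer size, and drop-on-full policy are assumptions for illustration, not sensord's current behaviour:

    package fanout

    import "sync"

    // Measurement is an illustrative stand-in for sensord's measurement struct.
    type Measurement struct {
        CheckID string
        Millis  float64
    }

    type Broadcaster struct {
        mu   sync.Mutex
        subs map[chan Measurement]struct{}
    }

    func NewBroadcaster() *Broadcaster {
        return &Broadcaster{subs: make(map[chan Measurement]struct{})}
    }

    // Subscribe gives each client its own channel with a small buffer that
    // absorbs brief stalls.
    func (b *Broadcaster) Subscribe() chan Measurement {
        ch := make(chan Measurement, 64)
        b.mu.Lock()
        b.subs[ch] = struct{}{}
        b.mu.Unlock()
        return ch
    }

    func (b *Broadcaster) Unsubscribe(ch chan Measurement) {
        b.mu.Lock()
        delete(b.subs, ch)
        b.mu.Unlock()
        close(ch)
    }

    // Publish never blocks: a subscriber whose buffer is full simply misses
    // this measurement, so one slow reader cannot back up the whole sensor.
    func (b *Broadcaster) Publish(m Measurement) {
        b.mu.Lock()
        defer b.mu.Unlock()
        for ch := range b.subs {
            select {
            case ch <- m:
            default:
            }
        }
    }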

ensure that a given check cannot be run concurrently

Each check gets a scheduler goroutine that tries to take a measurement once per second. If there are enough spare measurement goroutines and the check is slow, multiple measurements can end up running for a single check at the same time.

I would like to prevent that from happening.
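
One way to do that, sketched under assumed names: give each check a capacity-one "in flight" channel and skip the tick if a token is already held. Check, NewCheck, and measure are illustrative stand-ins, not sensord's actual API:

    package main

    import "log"

    // Check is illustrative; inFlight holds a token while a measurement runs.
    type Check struct {
        ID       string
        inFlight chan struct{}
    }

    func NewCheck(id string) *Check {
        return &Check{ID: id, inFlight: make(chan struct{}, 1)}
    }

    func measure(c *Check) { /* perform the actual HTTP check */ }

    // TryMeasure runs a measurement only if none is already in progress for
    // this check; otherwise it reports false and the caller skips this tick.
    func (c *Check) TryMeasure() bool {
        select {
        case c.inFlight <- struct{}{}:
            defer func() { <-c.inFlight }()
            measure(c)
            return true
        default:
            return false
        }
    }

    func main() {
        c := NewCheck("example")
        log.Printf("measured=%v", c.TryMeasure())
    }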

2.0 Meta Issue - Simplification and Integration

This issue contains the goals for sensord 2.0. I'm aiming to simplify everything, and focus on making it easier to use sensord with a more traditional open source monitoring environment (statsd, graphite, etc).

The sensors will continue to poll a central manifest, and that manifest will be extended so that each check can specify a destination store for its results.
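
For illustration only, an extended manifest entry might carry its outputs alongside the check definition; the field names and output identifiers below are guesses, not a committed format:

    package manifest

    // Check sketches how a manifest entry might name its output destinations.
    type Check struct {
        ID  string `json:"id"`
        URL string `json:"url"`
        // Outputs selects where this check's results go,
        // e.g. "canaryd", "stdout", "statsd", "librato".
        Outputs []string `json:"outputs"`
    }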

A general outline of the goals (subject to change):

  • define outputs via a manifest
    • canaryd (soon to be deprecated)
    • stdout (likely in logfmt)
    • statsd
    • librato
  • rename repo to canaryio/canary
  • sensord becomes a package under that (perhaps one of many)
  • experimental Heroku support

When this lands, the plan is to close down the canaryio/meta repo and eventually deprecate the canaryio/canaryd tooling. This should leave us with a much more focused and arguably more valuable tool.

Provide configuration for check period

I feel like the period between checks should be configurable. I'm not certain, but I think each URL is currently hit once per second, which seems a bit excessive; once per minute would be sufficient for my purposes.
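
As a sketch of what configurability could look like, the period could be read from an environment variable (CHECK_PERIOD is a hypothetical name, not an existing sensord setting) and fall back to the current one-second default:

    package main

    import (
        "log"
        "os"
        "time"
    )

    // checkPeriod reads a hypothetical CHECK_PERIOD environment variable and
    // falls back to the current behaviour of one second.
    func checkPeriod() time.Duration {
        v := os.Getenv("CHECK_PERIOD")
        if v == "" {
            return time.Second
        }
        d, err := time.ParseDuration(v) // e.g. "1s", "30s", "1m"
        if err != nil {
            log.Fatalf("invalid CHECK_PERIOD %q: %v", v, err)
        }
        return d
    }

    func main() {
        log.Printf("measuring each check every %s", checkPeriod())
    }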

Consider redesign?

I think we should be able to make this component much simpler than gorsuch/canary.sensor. I propose that we:

  • ditch redis
  • use channels only
  • CHECKS_URL - points to a place where it might pick up the checks it should be working on
  • MEASUREMENTS_URL - points to an endpoint where we can POST our results

It'll be up to the source, not us, to worry about partitioning / sharding the data. Our job is simply to measure and stay visible.

Using unbuffered or small buffered channels will let us ditch all of the redis back-off logic and throw out a dependency. Removing the HTTP server will make this thing much lighter and let us ditch a dependency, too.
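
A rough sketch of that shape, with assumed JSON formats and helper names, just to show how little would be left once redis and the HTTP server are gone:

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "net/http"
        "os"
    )

    // Check and Measurement are assumptions about the JSON shapes involved.
    type Check struct {
        ID  string `json:"id"`
        URL string `json:"url"`
    }

    type Measurement struct {
        CheckID string  `json:"check_id"`
        T       float64 `json:"t"` // total request time in seconds
    }

    // getChecks fetches the current work list from CHECKS_URL.
    func getChecks(url string) ([]Check, error) {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        var checks []Check
        return checks, json.NewDecoder(resp.Body).Decode(&checks)
    }

    // postMeasurements drains the results channel and POSTs each measurement
    // to MEASUREMENTS_URL. With an unbuffered channel, measurers simply wait
    // for this sender instead of needing redis-style back-off logic.
    func postMeasurements(url string, in <-chan Measurement) {
        for m := range in {
            body, err := json.Marshal(m)
            if err != nil {
                log.Printf("marshal: %v", err)
                continue
            }
            resp, err := http.Post(url, "application/json", bytes.NewReader(body))
            if err != nil {
                log.Printf("post: %v", err)
                continue
            }
            resp.Body.Close()
        }
    }

    func main() {
        checks, err := getChecks(os.Getenv("CHECKS_URL"))
        if err != nil {
            log.Fatal(err)
        }
        results := make(chan Measurement) // unbuffered by design
        go postMeasurements(os.Getenv("MEASUREMENTS_URL"), results)
        log.Printf("measuring %d checks", len(checks))
        // measurer goroutines would range over checks and send into results
    }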

go-curl still kinda sucks

Partly related to #57, and partly I'm just venting here.

I keep running into this panic:

2014/06/26 21:13:15 fn=udpPusher endpoint=canaryio-canaryd.dev.docker:5000
panic: error calling Getinfo


goroutine 43 [running]:
runtime.panic(0x613060, 0xc2102dd760)
        /usr/lib64/golang/src/pkg/runtime/panic.c:266 +0xb6
github.com/andelf/go-curl.(*CURL).Getinfo(0xc2102e8720, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/tmp.MsXsQ1mOQ4/go/src/github.com/canaryio/sensord/Godeps/_workspace/src/github.com/andelf/go-curl/easy.go:356 +0x475
main.(*Check).Measure(0xc210308340, 0x7fffe9e57f26, 0x9, 0xc21000a840, 0x1, ...)
        /tmp/tmp.MsXsQ1mOQ4/go/src/github.com/canaryio/sensord/sensord.go:114 +0x3f6
main.measurer(0x7fffe9e57f26, 0x9, 0xc21000a840, 0x1, 0x1, ...)
        /tmp/tmp.MsXsQ1mOQ4/go/src/github.com/canaryio/sensord/sensord.go:128 +0x123
created by main.main
        /tmp/tmp.MsXsQ1mOQ4/go/src/github.com/canaryio/sensord/sensord.go:299 +0x665

The panic is coming from github.com/andelf/go-curl/easy.go#L356, which is called by sensord.go#L114. I'm pretty sure this is because I'm running against libcurl-7.19.7-37.el6_5.3.x86_64 (CentOS 6.5) and CURLINFO_LOCAL_IP was added in 7.21 (I think). I have no idea why this worked fine for me until today. 💣🔮💩 (╯°□°)╯︵ ┻━┻

I'm currently re-building my sensord package to use the latest version of curl compiled from source. Hopefully that solves the current problem (which I realize is exacerbated by the old-enough-to-drink version I'm running against).

I've also seen some other sketchy go-curl crashes whose details I can't recall right now. @tobz and I have kicked around the idea of ripping out go-curl and replacing it with straight-up net/http. Some of the timings would be difficult to retain without reimplementing the HTTP client, like time to connect and time to first byte, but the DNS lookup time could be kept by timing net.LookupIP() and using the resulting address when making the HTTP request. I'm not sure whether the trade-off is worthwhile (the timings are the heart and soul of Canary!), but I'm ready to put go-curl in cement boots right now.
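
For what it's worth, a rough sketch of the net.LookupIP approach (illustrative only; it ignores HTTPS details like TLS ServerName and does not recover timings such as time to connect or time to first byte):

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/http"
        "net/url"
        "time"
    )

    // measure times the DNS lookup separately via net.LookupIP, then makes the
    // request against the resolved IP while preserving the original Host header.
    func measure(rawurl string) error {
        u, err := url.Parse(rawurl)
        if err != nil {
            return err
        }

        dnsStart := time.Now()
        ips, err := net.LookupIP(u.Hostname())
        if err != nil || len(ips) == 0 {
            return fmt.Errorf("lookup %s: %v", u.Hostname(), err)
        }
        dnsTime := time.Since(dnsStart)

        port := u.Port()
        if port == "" {
            port = "80"
        }
        target := u.Scheme + "://" + net.JoinHostPort(ips[0].String(), port) + u.RequestURI()
        req, err := http.NewRequest("GET", target, nil)
        if err != nil {
            return err
        }
        req.Host = u.Hostname() // keep name-based virtual hosts working

        start := time.Now()
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        log.Printf("dns=%s total=%s status=%d", dnsTime, time.Since(start), resp.StatusCode)
        return nil
    }

    func main() {
        if err := measure("http://example.com/"); err != nil {
            log.Fatal(err)
        }
    }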

Necessary Components

  • scheduler goroutine
  • checker goroutine(s)
  • recording goroutine(s)
  • web server for delivery
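
A skeleton of how those pieces might be wired together with channels; the names, channel shapes, and pool size are illustrative only:

    package main

    import (
        "log"
        "net/http"
    )

    // Illustrative types; not sensord's actual structures.
    type Check struct{ URL string }
    type Measurement struct {
        URL    string
        Millis float64
    }

    func scheduler(out chan<- Check)                      { /* decide which check runs next and send it */ }
    func checker(in <-chan Check, out chan<- Measurement) { /* take a measurement for each check received */ }
    func recorder(in <-chan Measurement)                  { /* persist or stream each measurement */ }

    func main() {
        checks := make(chan Check)
        results := make(chan Measurement)

        go scheduler(checks)
        for i := 0; i < 4; i++ { // small pool of checker goroutines
            go checker(checks, results)
        }
        go recorder(results)

        // web server for delivery of the measurement stream
        log.Fatal(http.ListenAndServe(":5000", nil))
    }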
