
gostatsd's Introduction

gostatsd


An implementation of Etsy's statsd in Go, based on original code from @kisielk.

The project provides a server called "gostatsd", which works much like Etsy's version, as well as a library for developing customized servers.

Backends are pluggable and only need to support the backend interface.

Being written in Go, it is able to use all CPU cores, which makes it easy to scale up the server based on load.

Building the server

Gostatsd currently targets Go 1.21.3. If you are compiling from source, please ensure you are running this version.

From the gostatsd directory run make build. The binary will be built in build/bin/<arch>/gostatsd.

You will need to install the Golang build dependencies by running make setup in the gostatsd directory. This must be done before the first build, and again if the dependencies change. A protobuf installation is expected to be found in the tools/ directory. Managing this in a platform agnostic way is difficult, but PRs are welcome. Hopefully it will be sufficient to use the generated protobuf files in the majority of cases.

If you are unable to build gostatsd please check your Go version, and try running make setup again before reporting a bug.
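For reference, a typical first build from a fresh checkout looks like this:

make setup   # install build dependencies; rerun if the dependencies change
make build   # binary is written to build/bin/<arch>/gostatsd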

Running the server

gostatsd --help gives a complete description of available options and their defaults. You can use make run to run the server with just the stdout backend to display info on screen.
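For example, a minimal local invocation that prints aggregated metrics to the console might look like this (flag values are illustrative):

gostatsd --backends=stdout --flush-interval=10s --verbose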

You can also run through docker by running make run-docker which will use docker-compose to run gostatsd with a graphite backend and a grafana dashboard.

While not generally tested on Windows, it should work. Maximum throughput is likely to be better on a linux system, however.

The server listens for UDP packets by default. You can use unix sockets by providing an absolute path to the socket in the metrics-addr configuration option. The socket mode used in this case is SOCK_DGRAM. Note that unix sockets only work on Linux, and that the conn-per-reader configuration option is ignored when they are used.
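For example, to listen on a unix datagram socket rather than UDP (the socket path here is illustrative):

gostatsd --metrics-addr=/var/run/gostatsd/metrics.sock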

Configuring the server mode

The server can currently run in two modes: standalone and forwarder. It is configured through the top level server-mode configuration setting. The default is standalone.

In standalone mode, raw metrics are processed and aggregated as normal, and aggregated data is submitted to configured backends (see below).

This configuration mode allows the following configuration options:

  • expiry-interval: interval before metrics are expired, see Metric expiry and persistence section. Defaults to 5m. 0 to disable, -1 for immediate.
  • expiry-interval-counter: interval before counters are expired, defaults to the value of expiry-interval.
  • expiry-interval-gauge: interval before gauges are expired, defaults to the value of expiry-interval.
  • expiry-interval-set: interval before sets are expired, defaults to the value of expiry-interval.
  • expiry-interval-timer: interval before timers are expired, defaults to the value of expiry-interval.
  • flush-aligned: whether or not the flush should be aligned. Setting this will flush at an exact time interval. With a 10 second flush-interval, if the service happens to be started at 12:47:13, then flushing will occur at 12:47:20, 12:47:30, etc, rather than 12:47:23, 12:47:33, etc. This removes query time ambiguity in a multi-server environment. Defaults to false.
  • flush-interval: duration for how long to batch metrics before flushing. Should be an order of magnitude less than the upstream flush interval. Defaults to 1s.
  • flush-offset: offset for flush interval when flush alignment is enabled. For example, with an offset of 7s and an interval of 10s, it will flush at 12:47:10+7 = 12:47:17, etc.
  • ignore-host: indicates whether or not an explicit host field will be added to all incoming metrics and events. Defaults to false.
  • max-readers: the number of UDP receivers to run. Defaults to 8 or the number of logical cores, whichever is less.
  • max-parsers: the number of workers available to parse metrics. Defaults to the number of logical cores.
  • max-workers: the number of aggregators to process metrics. Defaults to the number of logical cores.
  • max-queue-size: the size of the buffers between parsers and workers. Defaults to 10000, monitored via channel.* metric, with dispatch_aggregator_batch and dispatch_aggregator_map channels.
  • max-concurrent-events: the maximum number of concurrent events to be dispatching. Defaults to 1024, monitored via channel.* metric, with backend_events_sem channel.
  • estimated-tags: provides a hint to the system as to how many tags are expected to be seen on any particular metric, so that memory can be pre-allocated, reducing churn. Defaults to 4. Note: this is only a hint, and it is safe to send more.
  • log-raw-metric: logs raw metrics received from the network. Defaults to false.
  • metrics-addr: the address to listen to metrics on. Defaults to :8125. Using a file path instead of host:port will create a Unix Domain Socket in the specified path instead of using UDP.
  • namespace: a namespace to prefix all metrics with. Defaults to ''.
  • statser-type: configures where internal metrics are sent to. May be internal, which sends them through the internal processing pipeline; logging, which logs them; or null, which drops them. Defaults to internal, or null if the NewRelic backend is enabled.
  • percent-threshold: configures the "percentiles" sent on timers. Space separated string. Defaults to 90.
  • heartbeat-enabled: emits a metric named heartbeat every flush interval, tagged by version and commit. Defaults to false.
  • receive-batch-size: the number of datagrams to attempt to read. It is more CPU efficient to read multiple, however it takes extra memory. See the Memory allocation for read buffers section below for details. Defaults to 50.
  • conn-per-reader: attempts to create a connection for every UDP receiver. Not supported by all OS versions. It will be ignored when unix sockets are used for the connection. Defaults to false.
  • bad-lines-per-minute: the number of metrics which fail to parse to log per minute. This is used to prevent a bad client spamming malformed statsd data, while still logging some information to enable troubleshooting. Defaults to 0.
  • hostname: sets the hostname on internal metrics.
  • timer-histogram-limit: specifies the maximum number of buckets on histograms. See Timer histograms below.
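As an illustration, a standalone server could combine several of these options in a configuration file like the following (values are examples only, not recommendations; see gostatsd --help for authoritative defaults):

server-mode='standalone'
flush-interval='1s'
expiry-interval='5m'
expiry-interval-gauge='10m'
percent-threshold='90 99'
estimated-tags=8
bad-lines-per-minute=5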

In forwarder mode, raw metrics are collected from a frontend, and instead of being aggregated they are sent via http to another gostatsd server after passing through the processing pipeline (cloud provider, static tags, filtering, etc).

A forwarder server is intended to run on-host and collect metrics, forwarding them on to a central aggregation service. At present the central aggregation service can only scale vertically, but horizontal scaling through clustering is planned.

Aligned flushing is deliberately not supported in forwarder mode, as it would impact the central aggregation server due to all forwarder nodes transmitting at once, and because many forwarding flushes are expected to occur per central flush anyway.

Configuring forwarder mode requires a configuration file, with a section named http-transport. The raw version spoken is not configurable per server (see HTTP.md for version guarantees). The configuration section allows the following configuration options:

  • compress: boolean indicating if the payload should be compressed. Defaults to true.
  • compression-type: compression algorithm to use, one of zlib or lz4. Defaults to zlib. Note lz4 is non-standard so make sure the downstream consumer can handle Content-Encoding='lz4'.
  • compression-level: compression level to use (0-9). 0 = best speed, 9 = best compression. Defaults to 9.
  • api-endpoint: configures the endpoint to submit raw metrics to. This setting should be just a base URL, for example https://statsd-aggregator.private, with no path. Required, no default.
  • max-requests: maximum number of requests in flight. Defaults to 1000 (which is probably too high).
  • concurrent-merge: maximum number of concurrent goroutines allowed to merge metrics before forwarding. Defaults to 1 for backward compatibility.
  • max-request-elapsed-time: duration for the maximum amount of time to try submitting data before giving up. This includes retries. Defaults to 30s (which is probably too high). Setting this value to -1 will disable retries.
  • consolidator-slots: number of slots in the metric consolidator. Memory usage is a function of this. Lower values may cause blocking in the pipeline (back pressure). A UDP only receiver will never use more than the number of configured parsers (--max-parsers option). Defaults to the value of --max-parsers, but may require tuning for HTTP based servers.
  • transport: see TRANSPORT.md for how to configure the transport.
  • custom-headers: a map of strings that are added to each request sent, to allow for additional network routing / request inspection. Not required, default is empty. Example: --custom-headers='{"region" : "us-east-1", "service" : "event-producer"}'
  • dynamic-headers: similar to custom-headers, but the header values are extracted from metric tags matching the provided list of strings. Tag names are canonicalized by first replacing underscores with hyphens, then converting the first letter and each letter after a hyphen to upper case; the rest are converted to lower case. If a tag is specified in both custom-headers and dynamic-headers, the value set by custom-headers takes precedence. Not required, default is empty. Example: --dynamic-headers='["region", "service"]'. This is an experimental feature and may be removed or changed in future versions.
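As a sketch, a forwarder configuration file might look like the following (the endpoint and values are illustrative):

server-mode='forwarder'

[http-transport]
compress=true
compression-type='zlib'
api-endpoint='https://statsd-aggregator.private'
max-requests=100
max-request-elapsed-time='30s'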

The following settings from the previous section are also supported:

  • expiry-*
  • ignore-host
  • max-readers
  • max-parsers
  • estimated-tags
  • log-raw-metric
  • metrics-addr
  • namespace
  • statser-type
  • heartbeat-enabled
  • receive-batch-size
  • conn-per-reader
  • bad-lines-per-minute
  • hostname

Running as a Lambda Extension (experimental feature)

Gostatsd can be run as a lambda extension in forwarder mode. Metrics are flushed at the end of each lambda invocation by default. The flush interval is ignored for custom metrics; internal metrics are still flushed on a best-effort basis using the configured flush interval.

To support flushes based on the runtime function, a lambda telemetry server is started at the reserved lambda hostname sandbox on port 8083. This can be configured by setting the lambda-extension-telemetry-address configuration parameter. This will need to be done if port 8083 is not available within the lambda runtime.

The flush-per-invocation setting can be disabled by setting lambda-extension-manual-flush to false, however this is not recommended unless the lambda is constantly invoked. Since extensions are suspended once the user lambda function returns, disabling it may lead to metric loss (for in-flight requests) or metric delay until the next invocation for lambdas that are sparsely invoked.

Configurable options:

  • lambda-extension-telemetry-address: address that the extension telemetry server should listen on.
  • lambda-extension-manual-flush: boolean indicating whether the lambda should flush per invocation and disregard the flush interval.

All options specified in the previous section for the forwarder are also configurable, with the following caveats:

  • dynamic-headers are not supported
  • flush-interval will not be respected when lambda-extension-manual-flush is set to true
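As a sketch, a lambda extension that cannot use the default telemetry port might set (the port here is illustrative):

lambda-extension-telemetry-address='sandbox:8084'
lambda-extension-manual-flush=true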

Metric expiry and persistence

After a metric has been sent to the server, the server will continue to send the metric to the configured backend until it expires, even if no additional metrics are sent from the client. The value sent depends on the metric type:

  • counter: sends 0 for both rate and count.
  • gauge: sends the last received value.
  • set: sends 0.
  • timer: sends non-percentile values of 0. Percentile values are not sent at all (see issue #135).

Setting an expiry interval of 0 will persist metrics forever. If metrics are not carefully controlled in such an environment, the server may run out of memory or overload the backend receiving the metrics. Setting a negative expiry interval will result in metrics not being persisted at all.

Each metric type has its own interval, which is configured using the following precedence (from highest to lowest): expiry-interval-<type> > expiry-interval > default (5 minutes).
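For example, to expire gauges after one minute while leaving other metric types on a ten minute expiry:

expiry-interval='10m'
expiry-interval-gauge='1m'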

Configuring HTTP servers

The service supports multiple HTTP servers, with different configurations for different requirements. All http servers are named in the top level http-servers setting. It should be a space separated list of names. Each server is then configured by creating a section in the configuration file named http.<servername>. An http server section has the following configuration options:

  • address: the address to bind to
  • enable-prof: boolean indicating if profiler endpoints should be enabled. Defaults to false.
  • enable-expvar: boolean indicating if expvar endpoints should be enabled. Defaults to false.
  • enable-ingestion: boolean indicating if ingestion should be enabled. Defaults to false.
  • enable-healthcheck: boolean indicating if healthchecks should be enabled. Defaults to true.

For example, to configure a server with a localhost-only diagnostics endpoint, and a regular ingestion endpoint that can sit behind an ELB, the following configuration could be used:

backends='stdout'
http-servers='receiver profiler'

[http.receiver]
address='0.0.0.0:8080'
enable-ingestion=true

[http.profiler]
address='127.0.0.1:6060'
enable-expvar=true
enable-prof=true

There is no capability to run an https server at this point in time, and no auth (which is why you might want different addresses). You could also put a reverse proxy in front of the service. Documentation for the endpoints can be found in HTTP.md.

Configuring backends

Refer to backends for configuration options for the backends.

Cloud providers

Cloud providers are a way to automatically enrich metrics with metadata from a cloud vendor.

Refer to cloud providers for configuration options for the cloud providers.

Configuring timer sub-metrics

By default, timer metrics will result in aggregated metrics of the form (exact name varies by backend):

<base>.Count
<base>.CountPerSecond
<base>.Mean
<base>.Median
<base>.Lower
<base>.Upper
<base>.StdDev
<base>.Sum
<base>.SumSquares

In addition, the following aggregated metrics will be emitted for each configured percentile:

<base>.Count_XX
<base>.Mean_XX
<base>.Sum_XX
<base>.SumSquares_XX
<base>.Upper_XX - for positive only
<base>.Lower_-XX - for negative only

These can be controlled through the disabled-sub-metrics configuration section:

[disabled-sub-metrics]
# Regular metrics
count=false
count-per-second=false
mean=false
median=false
lower=false
upper=false
stddev=false
sum=false
sum-squares=false

# Percentile metrics
count-pct=false
mean-pct=false
sum-pct=false
sum-squares-pct=false
lower-pct=false
upper-pct=false

By default (for compatibility), they are all false and the metrics will be emitted.

Timer histograms (experimental feature)

Timer histograms, inspired by the Prometheus implementation, can be enabled on a per-time-series basis using the gsd_histogram meta tag, with a value containing the histogram bucketing definition (joined with _), e.g. gsd_histogram:-10_0_2.5_5_10_25_50.

It will:

  • output additional counter time series with name <base>.histogram and le tags specifying histogram buckets.
  • disable default sub-aggregations for timers e.g. <base>.Count, <base>.Mean, <base>.Upper, <base>.Upper_XX, etc.

For a timer with the gsd_histogram:-10_0_2.5_5_10_25_50 meta tag, the following time series will be generated:

  • <base>.histogram with tag le:-10
  • <base>.histogram with tag le:0
  • <base>.histogram with tag le:2.5
  • <base>.histogram with tag le:5
  • <base>.histogram with tag le:10
  • <base>.histogram with tag le:25
  • <base>.histogram with tag le:50
  • <base>.histogram with tag le:+Inf

Each time series will contain the total number of timer data points with a value less than or equal to the le value, e.g. the counter <base>.histogram with the tag le:5 will contain the number of all observations with a value no greater than 5. The counter <base>.histogram with tag le:+Inf is equivalent to <base>.count and contains the total number of observations.
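As a worked example, for a timer with buckets -10_0_2.5_5_10_25_50 and observed values 1, 3, 7 and 30, the emitted bucket counts would be le:-10 = 0, le:0 = 0, le:2.5 = 1, le:5 = 2, le:10 = 3, le:25 = 3, le:50 = 4, and le:+Inf = 4; each bucket is cumulative, as in Prometheus histograms.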

All original timer tags are preserved and added to all the time series.

To limit cardinality, the timer-histogram-limit option can be specified to limit the number of buckets that will be created (the default is math.MaxUint32). A value of 0 won't disable the feature; instead 0 buckets will be emitted, which effectively drops metrics with gsd_histogram tags.

Incorrect meta tag values will be handled in a best-effort manner, i.e.

  • gsd_histogram:10__20_50 & gsd_histogram:10_incorrect_20_50 will generate le:10, le:20, le:50 and le:+Inf buckets
  • gsd_histogram:incorrect will result in only le:+Inf bucket

This is an experimental feature and it may be removed or changed in future versions.

Load testing

There is a tool under cmd/loader with support for a number of options which can be used to generate synthetic statsd load. There is also another load generation tool under cmd/tester which is deprecated and will be removed in a future release.

Help for the loader tool can be found through --help.

Sending metrics

The server listens for UDP packets on the address given by the --metrics-addr flag, aggregates them, then sends them to the backend servers given by the --backends flag (space separated list of backend names).

Currently supported backends are:

  • cloudwatch
  • datadog
  • graphite
  • influxdb
  • newrelic
  • statsdaemon
  • stdout

The format of each metric is:

<bucket name>:<value>|<type>\n
  • <bucket name> is a string like abc.def.g, just like a graphite bucket name
  • <value> is a string representation of a floating point number
  • <type> is one of c, g, or ms for "counter", "gauge", and "timer" respectively.

A single packet can contain multiple metrics, each ending with a newline.

Optionally, gostatsd supports sample rates (for simple counters, and for timer counters) and tags:

  • <bucket name>:<value>|c|@<sample rate>\n where sample rate is a float between 0 and 1
  • <bucket name>:<value>|c|@<sample rate>|#<tags>\n where tags is a comma separated list of tags
  • <bucket name>:<value>|<type>|#<tags>\n where tags is a comma separated list of tags

The tag format is either simple or key:value.
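For example, all of the following are valid lines (names and tags are illustrative):

abc.def.g:10|c
abc.def.g:10|c|@0.1
abc.def.g:320|ms|#environment:production,debug
abc.def.g:42|g|#region:us-east-1

The second line is a counter sampled at 10%; the third is a timer with one key:value tag and one simple tag.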

A simple way to test your installation or send metrics from a script is to use echo and the netcat utility nc:

echo 'abc.def.g:10|c' | nc -w1 -u localhost 8125

Monitoring

Many metrics for the internal processes are emitted. See METRICS.md for details. Go expvar is also exposed if the --profile flag is used.

Memory allocation for read buffers

By default gostatsd will batch read multiple packets to optimise read performance. The amount of memory allocated for these read buffers is determined by the config options:

max-readers * receive-batch-size * 64KB (max packet size)
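For example, on a machine with 8 or more logical cores and the default receive-batch-size, this works out to 8 * 50 * 64KB = 25.6MB reserved for read buffers.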

The metric avg_packets_in_batch can be used to track the average number of datagrams received per batch, and the --receive-batch-size flag used to tune it. There may be some benefit to tuning the --max-readers flag as well.

Using the library

In your source code:

import "github.com/atlassian/gostatsd/pkg/statsd"

Note that this project uses Go modules for dependency management.

Documentation can be found via go doc github.com/atlassian/gostatsd/pkg/statsd or at https://godoc.org/github.com/atlassian/gostatsd/pkg/statsd

Versioning

Gostatsd uses semver versioning for both API and configuration settings, however it does not use it for packages.

This is due to gostatsd being an application first and a library second. Breaking API changes occur regularly, and the overhead of managing this is too burdensome.

Contributors

Pull requests, issues and comments welcome. For pull requests:

  • Add tests for new features and bug fixes
  • Follow the existing style
  • Separate unrelated changes into multiple pull requests

See the existing issues for things to start contributing.

For bigger changes, make sure you start a discussion first by creating an issue and explaining the intended change.

Atlassian requires contributors to sign a Contributor License Agreement, known as a CLA. This serves as a record stating that the contributor is entitled to contribute the code/documentation/translation to the project and is willing to have it used in distributions and derivative works (or is willing to transfer ownership).

Prior to accepting your contributions we ask that you please follow the appropriate link below to digitally sign the CLA. The Corporate CLA is for those who are contributing as a member of an organization and the individual CLA is for those contributing as an individual.

License

Copyright (c) 2012 Kamil Kisiel. Copyright (c) 2016-2020 Atlassian Pty Ltd and others.

Licensed under the MIT license. See LICENSE file.


gostatsd's Issues

No SIGINT handler

gostatsd doesn't terminate if stdin is a tty and SIGINT (CTRL-C) is sent.

Ability to configure log level

if v.GetBool(ParamVerbose) {
log.SetLevel(log.DebugLevel)
}

Would be great to get this exposed as log levels rather than just --verbose, so we can turn off INFO in certain use cases. For example, using this package in a test suite to write statsd metrics to stdout.

Rate limiting for event dispatching

Currently a goroutine per backend is created for each incoming event and there is no limit on how many we can start. To avoid running out of memory and/or saturating the backend, rate limiting should be implemented in the event dispatcher. It should be backend-agnostic.

Problems with statsdaemon backend writing to etsy statsd

I'm trying to set this service up to feed into an etsy statsd server with the statsdaemon backend. The etsy server is rejecting the messages as invalid because it doesn't understand the tag syntax:

Sep 1 19:39:39 statsd-cyanite-1 statsd: 1 Sep 19:39:39 - DEBUG: Bad line: 0.002660,g,#aggregator_id in msg "statsd.processing_time:0.002660|g|#aggregator_id:1,s:luneta-3.ec2.nearbuysystems.com"

I can't figure out how to remove the tags from the backend and I'm also surprised that no one has run into this before. Am I missing something stupid? If not I'd be happy to contribute a fix. Would you prefer a new backend or a config option for the statsdaemon one?

Set timeouts on all socket operations

  • net.Dial() does DNS resolution, establishes connection/etc with no timeout by default;
  • PacketConn.ReadFrom()/PacketConn.WriteTo() and Conn.Read()/Conn.Write() do reads/writes with no timeout by default;

We should set timeouts to some reasonable values, otherwise they may default to no timeout or to some system-defined timeout, which is not what we want most of the time.

FakePacketConn.Close() is a noop

statsd_test uses a fake socket to generate packets, and depends on handlePacket to report an error before checking ctx for expiry. As part of a larger refactoring I am doing, this behavior is changing and packet handling can no longer error, which removes all indication of "done" from the testing.

Scaling out

In your documentation section Load balancing and scaling out, you mention it's possible to send requests to multiple gostatsd instances, which can then send the stats to another main gostatsd instance.

I would like to do something similar although, instead of sending requests to a load-balanced pool of gostatsd instances, each server would report data to its own local gostatsd process, which would then report those stats to a central gostatsd server. Would you have any configuration examples I could follow to achieve this type of setup?

Thank you

Graphite backend ignores tags

The functions inside preparePayload in /backend/backends/graphite/graphite.go accept the tagsKey parameter and each metric is prepared based on its tag. However the tag is never used in constructing the metric name to send to graphite.

So what happens is that if you have metric key a.b.c that is broken up into 10 tags, the metric a.b.c is sent to graphite 10 times with the same name & timestamp, but with the various values for each of the tags. Graphite then simply takes the last one to arrive and puts that in as the value for that timestamp; the other values are lost.

Ideally if there is a tag, the preparePayload function would insert it as (probably) a dot suffix after the key name.

Cloud provider metadata lookup should be asynchronous

Currently cloud provider lookup for metadata happens synchronously in the receiver goroutine(s). That obviously means it blocks the particular receiver for the duration of the call. By default the number of receivers is equal to the number of cores. On a small box there is only one core, so if the single receiver blocks we do not read data from the socket while waiting for the response. This may lead to dropped packets, even when the rate of incoming packets is modest, if the system runs out of socket buffer space.

To fix this cloud provider lookups should be asynchronous. How exactly this should be implemented is not clear to me yet. Thinking.

TestCloudHandlerExpirationAndRefresh is flaky

Have not investigated the cause.

--- FAIL: TestCloudHandlerExpirationAndRefresh (0.00s)
    --- FAIL: TestCloudHandlerExpirationAndRefresh/1.2.3.4 (0.90s)
        Error Trace:    cloud_handler_test.go:61
                        cloud_handler_test.go:31
        Error:          Not equal:
                        expected: []gostatsd.IP{"1.2.3.4", "1.2.3.4"}
                        received: []gostatsd.IP{"1.2.3.4", "1.2.3.4", "1.2.3.4"}

                        Diff:
                        --- Expected
                        +++ Actual
                        @@ -1,2 +1,3 @@
                        -([]gostatsd.IP) (len=2) {
                        +([]gostatsd.IP) (len=3) {
                        + (gostatsd.IP) (len=7) "1.2.3.4",
                          (gostatsd.IP) (len=7) "1.2.3.4",
FAIL
FAIL    github.com/atlassian/gostatsd/pkg/statsd        9.919s

Figure out what is wrong with requests to Datadog

{"level":"info","msg":"[datadog] failed request status: 400\n\u003chtml\u003e\u003cbody\u003e\u003ch1\u003e400 Bad request\u003c/h1\u003e\nYour browser sent an invalid request.\n\u003c/body\u003e\u003c/html\u003e\n","time":"2017-01-26T20:25:08Z"}

google pubsub support

We recently added support for publishing to google pubsub to the carbon-relay-ng project, and consuming from pubsub to the go-carbon project. It's working well enough that I'm considering also adding pubsub support to gostatsd. Right now our pipeline looks like:

clients -> [gostatsd x 3 (ingestion tier)] -> [gostatsd x1 (aggregator)] -> [carbon-relay-ng] -> [google pubsub] -> [go-carbon]

Instead of having the gostatsd aggregator tier funnel into carbon-relay-ng over TCP we could have gostatsd publish in graphite format to pubsub, bypassing the TCP pipe to carbon-relay-ng. I suspect this could add a little more robustness to the pipeline. For example, right now when we update the carbon-relay-ng pods (it's a kubernetes deployment) there is a minute or so of lost metrics from the gostatsd tier. In this setup the pipeline would look like this:

clients -> [gostatsd x 3 (ingestion tier)] -> [gostatsd x1 (aggregator)] -> [google pubsub] -> [go-carbon]

In addition to gostatsd publishing to google pubsub in graphite format as in the example above, it might also be desirable for the gostatsd ingestion tier to publish statsd format to pubsub, such as:

clients -> [gostatsd x 3 (ingestion tier)] -> [google pubsub] -> [gostatsd x1 (aggregator)] -> [google pubsub] -> [go-carbon]

The goal being fewer dropped metrics when the gostatsd-aggregator process is restarted or dies.

Questions for gostatsd team:

  1. Would the project consider PRs that added google-pubsub support?
  2. If so, is there a recommended approach? In looking at the existing graphite-TCP backend code I think I would end up copying significant amounts of code to format the payload into graphite linemode format. So would it be better to add a configuration option to the existing graphite backend to toggle between TCP and google-pubsub? Or, create a new separate 'google pubsub' backend?

Handle InvalidInstanceID.NotFound

From logs:

{"level":"info","msg":"Error retrieving instance details from cloud provider for 10.124.204.122: error listing AWS instances: InvalidInstanceID.NotFound: The instance IDs 'i-0945cd029ef92983b, i-0fb66467917dd1e6a' do not exist\n\tstatus code: 400, request id: 14e9165b-5bc7-448b-8dc7-6d1b24e84227","time":"2017-01-27T01:05:47Z"}

{"level":"info","msg":"Error retrieving instance details from cloud provider for 10.124.218.209: error listing AWS instances: InvalidInstanceID.NotFound: The instance IDs 'i-0945cd029ef92983b, i-0fb66467917dd1e6a' do not exist\n\tstatus code: 400, request id: 3d91c78d-da66-4391-84ba-65b7b65b1662","time":"2017-01-27T01:05:47Z"}

{"level":"info","msg":"Error retrieving instance details from cloud provider for 10.124.212.12: error listing AWS instances: InvalidInstanceID.NotFound: The instance IDs 'i-0945cd029ef92983b, i-0fb66467917dd1e6a' do not exist\n\tstatus code: 400, request id: 02a83a26-7470-42c7-9f7b-cea0c15ccf7f","time":"2017-01-27T01:05:47Z"}

AWS documentation: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html#CommonErrors

The specified instance does not exist. Ensure that you have indicated the region in which the instance is located, if it's not in the default region. This error may occur because the ID of a recently created instance has not propagated through the system. For more information, see Eventual Consistency.

I think it is worth handling this edge case explicitly instead of spamming logs.

make build error pkg/statsd/batched_reader.go:41: undefined: ipv6.Message

I'm trying to build gostatsd tag 3.1.1 on Go 1.8.3. After running make setup and make build I get:

# github.com/atlassian/gostatsd/pkg/statsd
pkg/statsd/batched_reader.go:41: undefined: ipv6.Message
pkg/statsd/batched_reader.go:45: br.conn.ReadBatch undefined (type *ipv6.PacketConn has no field or method ReadBatch)

Full debug:

jlory@jlory-ubuntu:~/go/src/github.com/atlassian/gostatsd$ make setup
go get -u github.com/Masterminds/glide
go get -u github.com/alecthomas/gometalinter
gometalinter --install
Installing:
  aligncheck
  deadcode
  dupl
  errcheck
  gas
  goconst
  gocyclo
  goimports
  golint
  gosimple
  gotype
  gotypex
  ineffassign
  interfacer
  lll
  megacheck
  misspell
  safesql
  staticcheck
  structcheck
  unconvert
  unparam
  unused
  varcheck
glide install --strip-vendor
[INFO]	Downloading dependencies. Please wait...
[INFO]	--> Found desired version locally github.com/ash2k/stager 6e9c7b0eacd465286fac042bfb29a170aa8c2c3f!
[INFO]	--> Found desired version locally github.com/aws/aws-sdk-go 72df63b404d3f9820db08c73176c1b277d9f614f!
[INFO]	--> Found desired version locally github.com/cenkalti/backoff 61ba96c4d1002f22e06acb8e34a7650611125a63!
[INFO]	--> Found desired version locally github.com/fsnotify/fsnotify 4da3e2cfbabc9f751898f250b49f2439785783a1!
[INFO]	--> Found desired version locally github.com/go-ini/ini c787282c39ac1fc618827141a1f762240def08a3!
[INFO]	--> Found desired version locally github.com/go-redis/redis 975882d73d21759d45a4eb49652064083bc23e61!
[INFO]	--> Found desired version locally github.com/hashicorp/hcl 68e816d1c783414e79bc65b3994d9ab6b0a722ab!
[INFO]	--> Found desired version locally github.com/jmespath/go-jmespath bd40a432e4c76585ef6b72d3fd96fb9b6dc7b68d!
[INFO]	--> Found desired version locally github.com/magiconair/properties 8d7837e64d3c1ee4e54a880c5a920ab4316fc90a!
[INFO]	--> Found desired version locally github.com/mitchellh/mapstructure d0303fe809921458f417bcf828397a65db30a7e4!
[INFO]	--> Found desired version locally github.com/pelletier/go-toml 16398bac157da96aa88f98a2df640c7f32af1da2!
[INFO]	--> Found desired version locally github.com/sirupsen/logrus 89742aefa4b206dcf400792f3bd35b542998eb3b!
[INFO]	--> Found desired version locally github.com/spf13/afero ee1bd8ee15a1306d1f9201acc41ef39cd9f99a1b!
[INFO]	--> Found desired version locally github.com/spf13/cast acbeb36b902d72a7a4c18e8f3241075e7ab763e4!
[INFO]	--> Found desired version locally github.com/spf13/jwalterweatherman 12bd96e66386c1960ab0f74ced1362f66f552f7b!
[INFO]	--> Found desired version locally github.com/spf13/pflag 7aff26db30c1be810f9de5038ec5ef96ac41fd7c!
[INFO]	--> Found desired version locally github.com/spf13/viper 25b30aa063fc18e48662b86996252eabdcf2f0c7!
[INFO]	--> Found desired version locally github.com/stretchr/testify 890a5c3458b43e6104ff5da8dfa139d013d77544!
[INFO]	--> Found desired version locally golang.org/x/crypto c84b36c635ad003a10f0c755dff5685ceef18c71!
[INFO]	--> Found desired version locally golang.org/x/net 0a9397675ba34b2845f758fe3cd68828369c6517!
[INFO]	--> Found desired version locally golang.org/x/sys 314a259e304ff91bd6985da2a7149bbf91237993!
[INFO]	--> Found desired version locally golang.org/x/text 1cbadb444a806fd9430d14ad08967ed91da4fa0a!
[INFO]	--> Found desired version locally golang.org/x/time 6dc17368e09b0e8634d71cac8168d853e869a0c7!
[INFO]	--> Found desired version locally gopkg.in/yaml.v2 eb3733d160e74a9c7e442f435eb3bea458e1d19f!
[INFO]	--> Found desired version locally github.com/davecgh/go-spew 04cdfd42973bb9c8589fd6a731800cf222fde1a9!
[INFO]	--> Found desired version locally github.com/pmezard/go-difflib d8ed2627bdf02c080bf22230dbb337003b7aba2d!
[INFO]	Setting references.
[INFO]	--> Setting version for github.com/aws/aws-sdk-go to 72df63b404d3f9820db08c73176c1b277d9f614f.
[INFO]	--> Setting version for github.com/ash2k/stager to 6e9c7b0eacd465286fac042bfb29a170aa8c2c3f.
[INFO]	--> Setting version for github.com/jmespath/go-jmespath to bd40a432e4c76585ef6b72d3fd96fb9b6dc7b68d.
[INFO]	--> Setting version for github.com/cenkalti/backoff to 61ba96c4d1002f22e06acb8e34a7650611125a63.
[INFO]	--> Setting version for github.com/hashicorp/hcl to 68e816d1c783414e79bc65b3994d9ab6b0a722ab.
[INFO]	--> Setting version for github.com/mitchellh/mapstructure to d0303fe809921458f417bcf828397a65db30a7e4.
[INFO]	--> Setting version for github.com/go-redis/redis to 975882d73d21759d45a4eb49652064083bc23e61.
[INFO]	--> Setting version for github.com/magiconair/properties to 8d7837e64d3c1ee4e54a880c5a920ab4316fc90a.
[INFO]	--> Setting version for github.com/spf13/jwalterweatherman to 12bd96e66386c1960ab0f74ced1362f66f552f7b.
[INFO]	--> Setting version for github.com/sirupsen/logrus to 89742aefa4b206dcf400792f3bd35b542998eb3b.
[INFO]	--> Setting version for github.com/spf13/viper to 25b30aa063fc18e48662b86996252eabdcf2f0c7.
[INFO]	--> Setting version for github.com/stretchr/testify to 890a5c3458b43e6104ff5da8dfa139d013d77544.
[INFO]	--> Setting version for github.com/spf13/afero to ee1bd8ee15a1306d1f9201acc41ef39cd9f99a1b.
[INFO]	--> Setting version for github.com/spf13/cast to acbeb36b902d72a7a4c18e8f3241075e7ab763e4.
[INFO]	--> Setting version for github.com/fsnotify/fsnotify to 4da3e2cfbabc9f751898f250b49f2439785783a1.
[INFO]	--> Setting version for github.com/go-ini/ini to c787282c39ac1fc618827141a1f762240def08a3.
[INFO]	--> Setting version for github.com/pelletier/go-toml to 16398bac157da96aa88f98a2df640c7f32af1da2.
[INFO]	--> Setting version for github.com/spf13/pflag to 7aff26db30c1be810f9de5038ec5ef96ac41fd7c.
[INFO]	--> Setting version for golang.org/x/crypto to c84b36c635ad003a10f0c755dff5685ceef18c71.
[INFO]	--> Setting version for golang.org/x/net to 0a9397675ba34b2845f758fe3cd68828369c6517.
[INFO]	--> Setting version for github.com/pmezard/go-difflib to d8ed2627bdf02c080bf22230dbb337003b7aba2d.
[INFO]	--> Setting version for github.com/davecgh/go-spew to 04cdfd42973bb9c8589fd6a731800cf222fde1a9.
[INFO]	--> Setting version for gopkg.in/yaml.v2 to eb3733d160e74a9c7e442f435eb3bea458e1d19f.
[INFO]	--> Setting version for golang.org/x/sys to 314a259e304ff91bd6985da2a7149bbf91237993.
[INFO]	--> Setting version for golang.org/x/time to 6dc17368e09b0e8634d71cac8168d853e869a0c7.
[INFO]	--> Setting version for golang.org/x/text to 1cbadb444a806fd9430d14ad08967ed91da4fa0a.
[INFO]	Exporting resolved dependencies...
[INFO]	--> Exporting github.com/fsnotify/fsnotify
[INFO]	--> Exporting github.com/hashicorp/hcl
[INFO]	--> Exporting github.com/ash2k/stager
[INFO]	--> Exporting github.com/go-redis/redis
[INFO]	--> Exporting github.com/magiconair/properties
[INFO]	--> Exporting github.com/go-ini/ini
[INFO]	--> Exporting github.com/aws/aws-sdk-go
[INFO]	--> Exporting github.com/mitchellh/mapstructure
[INFO]	--> Exporting github.com/jmespath/go-jmespath
[INFO]	--> Exporting github.com/pelletier/go-toml
[INFO]	--> Exporting github.com/stretchr/testify
[INFO]	--> Exporting github.com/spf13/cast
[INFO]	--> Exporting github.com/spf13/viper
[INFO]	--> Exporting github.com/spf13/jwalterweatherman
[INFO]	--> Exporting github.com/spf13/afero
[INFO]	--> Exporting github.com/cenkalti/backoff
[INFO]	--> Exporting github.com/spf13/pflag
[INFO]	--> Exporting github.com/sirupsen/logrus
[INFO]	--> Exporting github.com/davecgh/go-spew
[INFO]	--> Exporting github.com/pmezard/go-difflib
[INFO]	--> Exporting golang.org/x/net
[INFO]	--> Exporting golang.org/x/crypto
[INFO]	--> Exporting golang.org/x/sys
[INFO]	--> Exporting gopkg.in/yaml.v2
[INFO]	--> Exporting golang.org/x/text
[INFO]	--> Exporting golang.org/x/time
[INFO]	Replacing existing vendor dependencies
[INFO]	Removing nested vendor and Godeps/_workspace directories...
[INFO]	Removing: /home/jlory/go/src/github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/awsmigrate/awsmigrate-renamer/vendor
[INFO]	Removing: /home/jlory/go/src/github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/vendor
[INFO]	Removing: /home/jlory/go/src/github.com/atlassian/gostatsd/vendor/github.com/stretchr/testify/vendor
go get -u github.com/githubnemo/CompileDaemon
go get -u github.com/jstemmer/go-junit-report
go get -u golang.org/x/tools/cmd/goimports
jlory@jlory-ubuntu:~/go/src/github.com/atlassian/gostatsd$ make build
gofmt -w=true -s $(find . -type f -name '*.go' -not -path "./vendor/*")
goimports -w=true -d $(find . -type f -name '*.go' -not -path "./vendor/*")
go build -i -v -o build/bin/$(uname -s | tr A-Z a-z)/gostatsd -ldflags "-s -X main.Version=$(git describe --abbrev=0 --tags) -X main.GitCommit=$(git rev-parse --short HEAD) -X main.BuildDate=$(date +%Y-%m-%d-%H:%M)" github.com/atlassian/gostatsd/cmd/gostatsd
github.com/atlassian/gostatsd/vendor/golang.org/x/sys/unix
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/hcl/strconv
github.com/atlassian/gostatsd/vendor/github.com/magiconair/properties
github.com/atlassian/gostatsd/vendor/github.com/mitchellh/mapstructure
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/hcl/token
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/hcl/ast
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/hcl/scanner
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/json/token
github.com/atlassian/gostatsd/vendor/github.com/pelletier/go-toml
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/json/scanner
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/hcl/parser
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl/json/parser
github.com/atlassian/gostatsd/vendor/github.com/spf13/afero/mem
github.com/atlassian/gostatsd/vendor/github.com/hashicorp/hcl
github.com/atlassian/gostatsd/vendor/github.com/fsnotify/fsnotify
github.com/atlassian/gostatsd/vendor/golang.org/x/text/transform
github.com/atlassian/gostatsd/vendor/golang.org/x/text/unicode/norm
github.com/atlassian/gostatsd/vendor/github.com/spf13/cast
github.com/atlassian/gostatsd/vendor/github.com/spf13/jwalterweatherman
github.com/atlassian/gostatsd/vendor/github.com/spf13/pflag
github.com/atlassian/gostatsd/vendor/gopkg.in/yaml.v2
github.com/atlassian/gostatsd/vendor/golang.org/x/net/context
github.com/atlassian/gostatsd/vendor/github.com/cenkalti/backoff
github.com/atlassian/gostatsd/vendor/golang.org/x/crypto/ssh/terminal
github.com/atlassian/gostatsd/vendor/github.com/spf13/afero
github.com/atlassian/gostatsd/vendor/github.com/sirupsen/logrus
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/awserr
github.com/atlassian/gostatsd/vendor/github.com/go-ini/ini
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/endpoints
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/client/metadata
github.com/atlassian/gostatsd/vendor/github.com/jmespath/go-jmespath
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/credentials
github.com/atlassian/gostatsd/vendor/github.com/spf13/viper
github.com/atlassian/gostatsd/vendor/golang.org/x/net/http2/hpack
github.com/atlassian/gostatsd/vendor/golang.org/x/text/unicode/bidi
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/awsutil
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws
github.com/atlassian/gostatsd/vendor/github.com/ash2k/stager/wait
github.com/atlassian/gostatsd/vendor/github.com/ash2k/stager
github.com/atlassian/gostatsd
github.com/atlassian/gostatsd/vendor/golang.org/x/text/secure/bidirule
github.com/atlassian/gostatsd/vendor/golang.org/x/net/bpf
github.com/atlassian/gostatsd/vendor/golang.org/x/net/idna
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/request
github.com/atlassian/gostatsd/pkg/backends/datadog
github.com/atlassian/gostatsd/pkg/backends/sender
github.com/atlassian/gostatsd/pkg/backends/null
github.com/atlassian/gostatsd/pkg/backends/stdout
github.com/atlassian/gostatsd/pkg/backends/graphite
github.com/atlassian/gostatsd/pkg/backends/statsdaemon
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/client
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/corehandlers
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol/rest
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/ec2metadata
github.com/atlassian/gostatsd/pkg/backends
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol/query/queryutil
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/signer/v4
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/credentials/endpointcreds
github.com/atlassian/gostatsd/vendor/golang.org/x/net/lex/httplex
github.com/atlassian/gostatsd/vendor/golang.org/x/net/http2
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/defaults
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol/query
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/private/protocol/ec2query
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/service/sts
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/service/ec2
github.com/atlassian/gostatsd/pkg/statser
github.com/atlassian/gostatsd/vendor/golang.org/x/net/internal/iana
github.com/atlassian/gostatsd/vendor/golang.org/x/net/internal/socket
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/credentials/stscreds
github.com/atlassian/gostatsd/vendor/github.com/aws/aws-sdk-go/aws/session
github.com/atlassian/gostatsd/vendor/golang.org/x/net/ipv6
github.com/atlassian/gostatsd/vendor/golang.org/x/time/rate
github.com/atlassian/gostatsd/pkg/statsd
# github.com/atlassian/gostatsd/pkg/statsd
pkg/statsd/batched_reader.go:41: undefined: ipv6.Message
pkg/statsd/batched_reader.go:45: br.conn.ReadBatch undefined (type *ipv6.PacketConn has no field or method ReadBatch)
github.com/atlassian/gostatsd/pkg/cloudproviders/aws
github.com/atlassian/gostatsd/pkg/cloudproviders
Makefile:33: recipe for target 'build' failed
make: *** [build] Error 2

version 0.15.0 locks up after one flush window

We're trying gostatsd. Under load (around 40k incoming metrics per second) version 0.14.0 recorded it was receiving around 10k metrics per flush window (60s):

DEBU[0600] numStats: 10571 packetsReceived: 10537

I just tried the new release (0.15) and it locked up after one flush window. That is, no more metrics were sent and the app didn't stop when CTRL-C was pressed. I had to CTRL-Z and kill the pid.

./gostatsd_0.15 --flush-interval 60s --verbose --config-path "./config.toml" --metrics-addr "0.0.0.0:8125"
INFO[0000] Starting server
INFO[0000] No cloud provider specified
INFO[0000] [graphite] address=10.0.0.202:2013 dialTimeout=5s writeTimeout=30s
INFO[0000] Initialised backend "graphite"
DEBU[0060] Sending 2054 metrics to backend graphite
DEBU[0060] Sending 3320 metrics to backend graphite
DEBU[0060] Sending 6079 metrics to backend graphite
DEBU[0060] Sending 1848 metrics to backend graphite

config.toml:

[graphite]
        address = "10.0.0.202:2013"
        legacy_namespace = false
        global_prefix = "stats"
        prefix_timer = ""
        prefix_counter = ""

10.0.0.202:2013 is carbon relay

Any idea where I should look to debug this?

We're running on ubuntu 14.04

Add tags to own metrics

Gostatsd's own internal metrics do not get tags from the cloud provider applied at the moment. This should be fixed.

Add option to remove tags from graphite backend

Since graphite doesn't support tags, the normalizeBucketName() method at https://github.com/atlassian/gostatsd/blob/master/backend/backends/graphite/graphite.go#L32 translates metrics from something like

stats.test.counter

to

stats.test.counter.statsd_source_id.192.168.1.1.statsd_source_id.127.0.0.1

which really breaks things for us (trying to replace etsy/statsd with gostatsd). I got around it by commenting out lines 33-38 of graphite.go, but ideally either tags support should be removed entirely from the graphite backend, or a flag like "--no-tags" should be added to the startup. I did play with --default-tags but there doesn't appear to be a way to turn it off completely. I'd put in a PR for the latter, but I don't have enough Go experience to figure that out in short order.

Datadog 413 request entity too large

time="2016-06-03T05:04:55Z" level=warning msg="[datadog] failed to send metrics, sleeping for 456.466795ms: received bad status code 413"
Should have a limit on the payload size and split the request into several (send them concurrently).

sending metrics to backend failed error messages

I'm trying out this version of gostatsd, it looks really promising. When running gostatsd with either of these startup commands:

gostatsd --backends graphite --config-path ./config.toml --flush-interval 10s --verbose

or

gostatsd --backends "statsdaemon" --config-path ./config.toml --verbose

I see this error message every time it flushes the metrics, yet they do show up in graphite (tcpdump confirms it). I've exhausted my very limited knowledge of go trying to figure out why https://github.com/atlassian/gostatsd/blob/master/statsd/flusher.go#L113 is logging an error. Any ideas?

Fix web console

Currently commented out due to refactoring. Need to re-implement it; it should just show internal statistics, including backend statistics.

Empty body with non-zero Content-Length?

Sometimes this happens:

time="2016-05-27T03:54:51Z" level=warning msg="[datadog] failed to send metrics, sleeping for 46.906778157s:
error POSTing: Post https://app.datadoghq.com/api/v1/series?api_key=*****:
http: ContentLength=2975673 with Body length 0"

Panic in Datadog backend

panic: reflect: slice index out of range [recovered]
#011panic: reflect: slice index out of range
goroutine 349 [running]:
panic(0x9ac080, 0xc820407aa0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/runtime/panic.go:481 +0x3e6
encoding/json.(*encodeState).marshal.func1(0xc820205cb8)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:269 +0x11f
panic(0x9ac080, 0xc820407a90)
#011/usr/local/Cellar/go/1.6.2/libexec/src/runtime/panic.go:443 +0x4e9
reflect.Value.Index(0x9927a0, 0xc82023f9e0, 0x197, 0x175, 0x0, 0x0, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/reflect/value.go:854 +0x151
encoding/json.(*arrayEncoder).encode(0xc82001e1e8, 0xc820166000, 0x9927a0, 0xc82023f9e0, 0x197, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:690 +0xd2
encoding/json.(*arrayEncoder).(encoding/json.encode)-fm(0xc820166000, 0x9927a0, 0xc82023f9e0, 0x197, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:697 +0x51
encoding/json.(*sliceEncoder).encode(0xc82001e1f0, 0xc820166000, 0x9927a0, 0xc82023f9e0, 0x197, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:667 +0xb0
encoding/json.(*sliceEncoder).(encoding/json.encode)-fm(0xc820166000, 0x9927a0, 0xc82023f9e0, 0x197, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:676 +0x51
encoding/json.(*structEncoder).encode(0xc82016a330, 0xc820166000, 0xb2a980, 0xc82023f9e0, 0x199, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:587 +0x2c4
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xc820166000, 0xb2a980, 0xc82023f9e0, 0x199, 0xc82023f900)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:601 +0x51
encoding/json.(*ptrEncoder).encode(0xc82001e1f8, 0xc820166000, 0xacaca0, 0xc82023f9e0, 0x16, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:709 +0xea
encoding/json.(*ptrEncoder).(encoding/json.encode)-fm(0xc820166000, 0xacaca0, 0xc82023f9e0, 0x16, 0xc82023f900)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:714 +0x51
encoding/json.(*encodeState).reflectValue(0xc820166000, 0xacaca0, 0xc82023f9e0, 0x16)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:301 +0x6b
encoding/json.(*encodeState).marshal(0xc820166000, 0xacaca0, 0xc82023f9e0, 0x0, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:274 +0xa9
encoding/json.Marshal(0xacaca0, 0xc82023f9e0, 0x0, 0x0, 0x0, 0x0, 0x0)
#011/usr/local/Cellar/go/1.6.2/libexec/src/encoding/json/encode.go:139 +0x84
github.com/atlassian/gostatsd/backend/backends/datadog.(*client).post(0xc82000b180, 0xc1bc50, 0xe, 0xc15088, 0x7, 0xacaca0, 0xc82023f9e0, 0x0, 0x0)
#011/Users/mmazurskiy/gopath/src/github.com/atlassian/gostatsd/backend/backends/datadog/datadog.go:229 +0x52
github.com/atlassian/gostatsd/backend/backends/datadog.(*client).postMetrics(0xc82000b180, 0xc82023f9e0, 0x0, 0x0)
#011/Users/mmazurskiy/gopath/src/github.com/atlassian/gostatsd/backend/backends/datadog/datadog.go:200 +0x7b
github.com/atlassian/gostatsd/backend/backends/datadog.(*client).SendMetricsAsync.func1.1(0xc82000b180, 0xc82023f9e0, 0x7f249cdb4ee0, 0xc82003ab40, 0xc820458120)
#011/Users/mmazurskiy/gopath/src/github.com/atlassian/gostatsd/backend/backends/datadog/datadog.go:120 +0x3d
created by github.com/atlassian/gostatsd/backend/backends/datadog.(*client).SendMetricsAsync.func1
#011/Users/mmazurskiy/gopath/src/github.com/atlassian/gostatsd/backend/backends/datadog/datadog.go:125 +0x6f
Jun 22 00:37:05 ip-10-116-8-214 systemd[1]: statsd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

Preserve IP->metadata cache across restarts

When the process is restarted (e.g. due to an update) the IP-to-metadata cache is empty. Why this is bad:

  • All incoming metrics are piled up in buffers, blocked by the cloud provider;
  • The cloud provider needs to satisfy a lot of requests, likely hitting the allowed request rate limit.

This leads to increased error rates and latency of metric processing. To solve this, the cache should be preserved across restarts (with a TTL for each entry).

Host tag shouldn't be set for percentiles

Percentiles should be calculated across hosts. Because we add the host tag automatically (as per the Datadog agent), percentiles are not calculated properly. Fixing this issue might make the code fairly complex though; not sure, as I haven't looked into it for a while.

Metrics are automatically tagged by client hostname

See line 315 in aggregator.go

tagsKey := formatTagsKey(m.Tags, m.Hostname)

This results in all incoming metrics being tagged by hostname. This means that, especially for timer metrics, the calculations (min, max, median, etc.) are all fragmented by the hostname tag. When you have a large (and load-dependent) number of logstash instances sending metrics to gostatsd, it means that we can't get the correct numbers out at the end.

I'm wondering if this could be made optional or if there is another way to configure the program to not shard on the tags if not needed.
