Markdown files for Datadog's longform blog posts: https://www.datadoghq.com/blog/
Please read our contribution guidelines before opening a new issue or pull request.
I work on a product with multiple endpoints, but due to the sensitive nature of the data, we only want to send NGINX logs for one endpoint. Does Datadog support this? If yes, how can we enable it?
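To make the ask concrete, here is the kind of config I was hoping for, as a minimal sketch (my guess, not anything from the docs: it assumes log collection via the Agent tailing the access log, and the endpoint path /api/orders is a hypothetical placeholder) in conf.d/nginx.d/conf.yaml:

    logs:
      - type: file
        path: /var/log/nginx/access.log
        source: nginx
        service: nginx
        log_processing_rules:
          # Ship only the log lines that match the one endpoint; everything else is dropped
          - type: include_at_match
            name: only_one_endpoint
            pattern: /api/orders   # hypothetical path; replace with the real endpoint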
Is there any roadmap to add documentation APIs for Spark as well?
On the page https://www.datadoghq.com/blog/how-to-collect-haproxy-metrics/, the parameters for enabling stats on HAProxy don't work with the latest version (1.6): https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#4
listen stats :9000
should be replaced by
listen stats
    bind :9000
Hello, I'm wondering if the dd-agent can collect custom application metrics exposed to cAdvisor, without the need to use the StatsD agent.
Hi,
The suggested method for looking at network stats works only if there is a single process inside the container. In most cases, we might instead have a shell script that launches all the other required processes and then either enters an infinite loop or monitors the applications it launched.
https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/ says:
Metric to watch: CPU utilization
Tracking the amount of CPU your pods are using compared to their configured requests and limits, as well as CPU utilization at the node level, will give you important insight into cluster performance.
However, it doesn't explain how to do that. The https://docs.datadoghq.com/containers/kubernetes/data_collected/ page doesn't show any metric with pod or container-level CPU usage information. What metric should queries use to get that info?
Just trying to create a timeseries graph based on a wildcard match against the instance name.
{
  "requests": [
    {
      "q": "avg:system.load.1{name:content*}",
      "type": "line",
      "conditional_formats": [],
      "aggregator": "avg"
    }
  ],
  "viz": "timeseries"
}
You get the idea: trying to match any server whose name starts with "content", but it breaks horribly. No carnage, but it just turns red. Am I missing something? My google-fu is failing me.
Hello, I have gone through the post https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/. It is very well written and good to read.
I have a query regarding some Kubernetes metrics that I could not see clearly mentioned on the web.
Is it possible to get garbage collector metrics? I saw there are some for Go in cAdvisor, but is there anything specific for the Java JVM?
I saw the article below:
https://www.robustperception.io/measuring-java-garbage-collection-with-prometheus/
But do you have a specific way we can do it?
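To make the question concrete, my guess is that something like the Agent's JMX integration could be the route, e.g. a conf.d/jmx.d/conf.yaml along these lines (a sketch only; the host and port are placeholders, and I'm not certain GC metrics are part of the default JVM metrics):

    init_config:

    instances:
      - host: localhost   # hypothetical; the JVM's JMX remote host
        port: 7199        # hypothetical; the JVM's JMX remote port
        collect_default_jvm_metrics: true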
Hello All...
In the "Metric to watch: Volume queue length" Section in The "Part 1: Key metrics for Amazon EBS monitoring" article, it was mentioned that "A rule of thumb for SSD volumes is to aim for a queue length of one for every 500 IOPS available" and the source for that statement is https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html#UnderstandingQueueLength , which has been updated to be "we recommend that you target a queue length of 1 for every 1000 IOPS available".
So, Could you please update your article to reflect the latest changes in the documentation?
Thank you,
Ahmed
Quote from https://www.datadoghq.com/blog/tomcat-architecture-and-performance
Upon startup, Tomcat will create threads based on the value set for minSpareThreads and increase that number based on demand, up to the number of maxThreads. If the maximum number of threads is reached, and all threads are busy, incoming requests are placed in a queue (acceptCount) to wait for the next available thread. The server will only continue to accept a certain number of concurrent connections (as determined by maxConnections). When the queue is full and the number of connections hits maxConnections, any additional incoming clients will start receiving Connection Refused errors.
The bold part is incorrect: Tomcat will continue to accept new connections until the number of concurrent connections reaches maxConnections. Once maxConnections is reached, the OS will queue new connections up to acceptCount.
Hi, I'm looking for a way to monitor frontends by Host header in an HAProxy config where we route by Host header. For example, consider the following config:
frontend http-in
    bind *:80
    log /dev/log len 65535 local1 info
    capture request header User-Agent len 30
    capture request header X-Request-ID len 36
    capture request header Host len 32
    # Frontend rules for host header routing
    use_backend user if { hdr(Host) -i user user.example.com }
    use_backend login if { hdr(Host) -i login login.example.com }

backend user
    mode http
    server-template user 10 _user._tcp.service.consul resolvers consul resolve-prefer ipv4 check

backend login
    mode http
    server-template login 10 _login._tcp.service.consul resolvers consul resolve-prefer ipv4 check
Is there a way to get stats for all frontends, in particular haproxy.frontend.response.4xx, broken down by the Host header?
env:
  - name: API_KEY
    Value: "YOUR_API_KEY_HERE"

Value must be lowercase value.
In the section "Configure the Agent" there is a problem which delayed my integration.
This section is supposed to be for NGINX and not for NGINXPlus. For NGINXPlus we also have to add "use_plus_api: true" which is not mentioned in this guide.
conf.yaml
init_config:
instances:
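For reference, my working instance config for NGINX Plus looks roughly like this (the status URL is specific to my setup, so treat it as a placeholder):

    init_config:

    instances:
      - nginx_status_url: http://localhost:8080/api   # placeholder; point at your NGINX Plus API endpoint
        use_plus_api: true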
On Docker for Mac, if we go to Preferences > Advanced, we see a Swap field that defaults to 1 GB.
Can you please tell me if there is a command or API to get the current value set for the swap?
Does Datadog monitor OpenStack now that Neutron is the preferred networking component? The document written in 2015 goes into detail on monitoring with Nova, but that is not an option for those using Neutron.
I would appreciate it if someone would document how the Datadog Agent should be configured to send Celery metrics to Datadog.
In the Garbage Collection section of the "How to monitor Elasticsearch performance" article, the link to Oracle's Garbage Collection article is not working anymore:
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
Keep up the great work, those articles rock!
Apologies if this isn't the proper forum for this kind of feedback. After reading the time series graph 101 blog post, I wanted to mention this kind of visualization in case it's not already on your roadmap. I haven't seen it available in any of the monitoring systems I've used.
The dashboard shown at https://github.com/DataDog/the-monitor/blob/master/varnish/monitor_varnish_using_datadog.md is blank.
This is probably related to DataDog/dd-agent#1459.
Anyone following the guide will end up with a blank dashboard, wondering if something went wrong, even though the graphs are visible under Infrastructure > Apps > Varnish.
Is there a way I can fix the dashboard (".MAIN.") without having to rewrite everything?
Edit: Varnish 5.2 and datadog-agent 5.18.1.
Is there a way to monitor NGINX 5xx errors with NGINX (open source)?
I saw an old article that says it's only available for NGINX Plus, but the most recent article doesn't say anything about that.
In EKS, automountServiceAccountToken has to be set to true for the Datadog Agent to work.
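For anyone hitting this, the relevant part of the Agent DaemonSet pod spec would look roughly like the sketch below (assuming the service account is named datadog-agent, as in the article's manifests):

    spec:
      template:
        spec:
          serviceAccountName: datadog-agent
          # Required in EKS so the Agent pod gets the service account token mounted
          automountServiceAccountToken: true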
Hi,
We monitor our Postgres database using the Datadog Agent's PostgreSQL integration, as well as by tracing the queries our Go applications generate. We use ddtrace's gorm package to trace the database queries.
When investigating the requests for a query, the values for trace.postgres.query.hits and postgresql.queries.count return completely different results within the same timeframe.
Is this expected? Why is this happening?
Hello,
I am Yilin, from the Apache APISIX community. Apache APISIX is a cloud-native API gateway and a top-level project of the Apache Software Foundation. You can get more details on GitHub: https://github.com/apache/apisix.
We recently released a plugin that integrates Datadog with Apache APISIX. I think the plugin is very meaningful for developers and for both communities. In addition, it will help publicize Datadog and Apache APISIX and let more developers and companies know about us.
I am reaching out to see if we can have this blog posted here. What do you think?
On the datadoghq.com website, the article "How to monitor Elasticsearch performance" should mention that thread_pool.bulk was renamed to thread_pool.write in Elasticsearch 6.3.
You are giving an example JSON log for Nginx in this guide :
https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/#use-json-logs-for-automatic-parsing
it's quite a poor example, as it uses a status attribute where Datadog's standard attribute http.status_code should be used.
Some of our articles contain internal links to other Datadog pages and articles, e.g.
/blog/monitoring-101-collecting-data/
These links won't work from within GitHub, and they can be confusing to external contributors (see for example #68), so we should either:
Is it possible to configure the Datadog Agent to gather stats using the HAProxy stats socket rather than a localhost URL?
This article, https://www.datadoghq.com/blog/how-to-collect-haproxy-metrics/, does explain how to set up stats collection over the socket, but the next article only shows how to integrate using the localhost endpoint.
Thanks for any help.
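To make the ask concrete, I was hoping for something like the sketch below in haproxy.d/conf.yaml (an assumption on my part that the check accepts a unix:// URL; the socket path is whatever "stats socket" is set to in haproxy.cfg):

    init_config:

    instances:
      # Point the check at the stats socket instead of the HTTP stats endpoint
      - url: unix:///var/run/haproxy.sock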
Reword this section, please:
"Despite being pre-1.0 (current version is 0.9.0.1), it is production-ready"
Kafka is now at 2.0.
Hi folks,
Yesterday I ran into an issue collecting Docker metrics on CoreOS v1068.6.0 using a shell script. I was trying to collect the memory usage of a container by cat-ing the file /sys/fs/cgroup/memory/system.slice/docker-$CONTAINER_ID/memory.usage_in_bytes.
I kept getting the same value, and it wasn't the right one. Digging around, I found that this file has been moved to /sys/fs/cgroup/memory/init.scope/system.slice/docker-$CONTAINER_ID/memory.usage_in_bytes.
So basically, in newer versions of CoreOS, the metrics path has changed from /sys/fs/cgroup/<METRIC>/system.slice/docker-$CONTAINER_ID/<METRIC_VALUE> to /sys/fs/cgroup/<METRIC>/init.scope/system.slice/docker-$CONTAINER_ID/<METRIC_VALUE>.
I tested this against two CoreOS clusters, one on version 1010.6.0 and the other on 1068.6.0. The new path also exists in the latest version of CoreOS (1068.9.0).
Maybe you should update the metrics collection page
Thanks!
If HAProxy isn't terminating SSL, the metrics look a bit misleading (a large red number for 2xx).
Are there any plans for an alternate, SSL-passthrough-centric dashboard (connections per second, response times, etc.)?
Please verify, but the description seems inaccurate - see the runOnce function in the Sender.java code.
In the Elasticsearch integration doc, I see that a few metrics are missing from the Metrics section, for example jvm.buffer_pools.* and jvm.classes.*. Can someone let me know how to configure these in elastic.d/conf.yaml?
Also, is there a way to exclude specific metrics in conf.yaml? For example, let's say I am not interested in elasticsearch.cgroup.cpu.stat.number_of_times_throttled and want it not to be collected.
Agent version: 7.21.1
Per DataDog/integrations-core#2582 (comment)
- name: DD_KUBERNETES_KUBELET_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

needs to be replaced with

- name: DD_KUBERNETES_KUBELET_HOST
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
Issue with deploy_droplet.py in https://github.com/DataDog/the-monitor/blob/master/openstack/devstack/deploy_droplet.py
○ → deploy_droplet.py
File "/usr/bin/deploy_droplet.py", line 33
print "IP: " + IP
^
SyntaxError: Missing parentheses in call to 'print'
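(The script is written for Python 2; under Python 3, line 33 would need to be print("IP: " + IP).)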
Update Kafka series to match corp-hugo.
https://www.datadoghq.com/blog/eks-monitoring-datadog/#create-and-deploy-the-cluster-agent-manifest
Based on the provided documentation, kubectl apply -f /path/to/datadog-cluster-agent.yaml should work, but it fails with an error:

$ kubectl apply -f /path/to/datadog-cluster-agent.yaml
service/datadog-cluster-agent created
error: unable to recognize "/path/to/datadog-cluster-agent.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"
Fixes required (see the sketch below):
- Change apiVersion: extensions/v1beta1 to apiVersion: apps/v1
- Add a selector with matchLabels (app: datadog-cluster-agent) under the Deployment's spec
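Putting both fixes together, the top of the Deployment manifest would look roughly like this (a sketch; the labels are assumed to match the article's manifest):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: datadog-cluster-agent
    spec:
      selector:
        matchLabels:
          app: datadog-cluster-agent
      template:
        metadata:
          labels:
            app: datadog-cluster-agent
        # ...rest of the pod template unchanged from the article's manifest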
Please update the documentation with these fixes, as otherwise other users will run into the same issues.
Hello,
The information posted here is all incorrect (you will spend a lot of time following it and end up with pods that are either erroring or crashing).
Part 3: https://www.datadoghq.com/blog/eks-monitoring-datadog/
You should delete this page so people do not waste time, and instead direct readers to the official documentation (which I am trying right now, after spending hours following outdated documentation).
I was following the docs at https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#metrics-collection-nginx-logs and noticed extra slashes in the rendered output which aren't in the example source.
It looks like the Markdown parser or HTML generator is inserting extra escape slashes before each $.
Expected:
the-monitor/nginx/how_to_collect_nginx_metrics.md
Lines 207 to 209 in c98a018
log_format nginx '$remote_addr - $remote_user [$time_local] '
                 '"$request" $status $body_bytes_sent $request_time '
                 '"$http_referer" "$http_user_agent"';
Great documentation! However, the links to Parts 1 and 3 are broken. Thanks.
The template charts for indexing latency and query latency might be wrong; they show different results than Kibana.
Take query latency, for example. The template uses the rate of fetch time plus the rate of query time.
It might be better to use derivatives to get the same chart as Kibana:
(derivative of fetch time + derivative of query time) / derivative of query total
The JSON for this is:
{ "viz": "timeseries", "status": "done", "requests": [ { "q": "( derivative(sum:elasticsearch.search.fetch.time{$cluster}) + derivative(sum:elasticsearch.search.query.time{$cluster}) ) * 1000 / derivative(sum:elasticsearch.search.query.total{$cluster})", "aggregator": "avg", "conditional_formats": [], "type": "line" } ], "autoscale": true }
The article https://www.datadoghq.com/blog/monitor-elasticsearch-datadog/#building-custom-elasticsearch-dashboards
has a nice-looking Elasticsearch dashboard; can you provide its source? Even if it doesn't work out of the box, I'd rather start with something.
https://www.datadoghq.com/blog/how-to-collect-haproxy-metrics/
has this fragment
listen stats # Define a listen section called "stats"
    bind :9000 # Listen on localhost:9000

but that does NOT make the HAProxy stats service bind only to localhost:9000; bind :9000 listens on port 9000 on all interfaces. Restricting it to localhost would require bind 127.0.0.1:9000.
Hello,
I have an HAProxy question.
Is it possible to monitor the number of backends currently UP / DOWN (the kind of thing you can see by opening HATop)?
I'd quite like to see that information and have monitors against it.
Thanks for your help.
We have set server and client timeouts, but I have not been able to measure them from the UI. I want to know how many clients are hitting the timeout and when, and the same for the server.