
celery-exporter's Introduction

celery-exporter

[celery-tasks-by-task dashboard screenshot]

Why another exporter?

While I was adding Celery monitoring to a client site I realized that the existing exporters either didn't work, exposed incorrect metric values or didn't expose the metrics I needed. So I wrote this exporter, which essentially wraps the built-in Celery monitoring API and exposes all of the event metrics to Prometheus in real-time.

Features

  • Tested for both Redis and RabbitMQ
  • Uses the built-in real-time monitoring component in Celery to expose Prometheus metrics
  • Tracks task status (task-started, task-succeeded, task-failed, etc.)
  • Tracks which workers are running and the number of active tasks
  • Follows the Prometheus exporter best practices
  • Deployed as a Docker image or a single-file Python binary (via PyInstaller)
  • Exposes a health check endpoint at /health
  • Grafana dashboards provided by the Celery-mixin
  • Prometheus alerts provided by the Celery-mixin

Dashboards and alerts

Alerting rules can be found here. By default we alert if:

  • A task failed in the last 10 minutes.
  • No Celery workers are online.

Tweak these to suit your use-case.

The Grafana dashboard (seen in the image above) is here. You can import it directly into your Grafana instance.

There's another Grafana dashboard that shows an overview of Celery tasks. An image can be found in ./images/celery-tasks-overview.png. It can also be found here.

Usage

Celery needs to be configured to send events to the broker which the exporter will collect. You can either enable this via Celery configuration or via the Celery CLI.

Enable events using the CLI

To enable events via the CLI, run the command below. Note that this does not enable the task-sent event, which has to be turned on in the configuration (see task_send_sent_event below). The other events work out of the box.

$ celery -A <myproject> control enable_events

Enable events using the configuration:

# In celeryconfig.py
worker_send_task_events = True
task_send_sent_event = True

Configuration in Django:

# In settings.py
CELERY_WORKER_SEND_TASK_EVENTS = True
CELERY_TASK_SEND_SENT_EVENT = True
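
If you configure Celery programmatically rather than through a settings module, the same two options can be set on the app's conf object. A minimal sketch, where the app name and broker URL are placeholders:

# sketch only; "myproject" and the broker URL are placeholders
from celery import Celery

app = Celery("myproject", broker="redis://localhost:6379/1")

# Equivalent to worker_send_task_events / task_send_sent_event in celeryconfig.py
app.conf.worker_send_task_events = True
app.conf.task_send_sent_event = True
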
Running the exporter

Using Docker:

docker run -p 9808:9808 danihodovic/celery-exporter --broker-url=redis://redis.service.consul/1

Using the Python binary (for non-Docker environments):

curl -L https://github.com/danihodovic/celery-exporter/releases/download/latest/celery-exporter -o ./celery-exporter
chmod +x ./celery-exporter
./celery-exporter --broker-url=redis://redis.service.consul/1
Kubernetes

There's a Helm chart in the directory charts/celery-exporter for deploying celery-exporter to Kubernetes using Helm.

Environment variables

All arguments can be specified using environment variables with a CE_ prefix:

docker run -p 9808:9808 -e CE_BROKER_URL=redis://redis danihodovic/celery-exporter
Specifying optional broker transport options

While the default options may be fine for most cases, there may be a need to specify optional broker transport options. This can be done by specifying one or more --broker-transport-option parameters as follows:

docker run -p 9808:9808 danihodovic/celery-exporter --broker-url=redis://redis.service.consul/1 \
  --broker-transport-option global_keyprefix=danihodovic \
  --broker-transport-option visibility_timeout=7200

For extended transport options, such as sentinel_kwargs, you can pass a JSON string, for example:

docker run -p 9808:9808 danihodovic/celery-exporter --broker-url=sentinel://sentinel.service.consul/1 \
  --broker-transport-option master_name=my_master \
  --broker-transport-option sentinel_kwargs="{\"password\": \"sentinelpass\"}"

The list of available broker transport options can be found here: https://docs.celeryq.dev/projects/kombu/en/stable/reference/kombu.transport.redis.html
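
These options correspond to Celery's broker_transport_options setting, which the exporter hands to its Celery app instance. For reference, a minimal sketch of the equivalent application-side configuration (the app name and option values below are placeholders):

# sketch only; app name and values are placeholders
from celery import Celery

app = Celery("myproject", broker="sentinel://sentinel.service.consul:26379/1")
app.conf.broker_transport_options = {
    "master_name": "my_master",
    "visibility_timeout": 7200,
    "sentinel_kwargs": {"password": "sentinelpass"},
}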

Specifying an optional retry interval

By default, celery-exporter will raise an exception and exit if there are any errors communicating with the broker. If preferred, one can have the celery-exporter retry connecting to the broker after a certain period of time in seconds via the --retry-interval parameter as follows:

docker run -p 9808:9808 danihodovic/celery-exporter --broker-url=redis://redis.service.consul/1 \
  --retry-interval=5
Grafana Dashboards & Prometheus Alerts

Head over to the Celery-mixin in this subdirectory to generate rules and dashboards suited to your Prometheus setup.

Metrics

Name Description Type
celery_task_sent_total Sent when a task message is published. Counter
celery_task_received_total Sent when the worker receives a task. Counter
celery_task_started_total Sent just before the worker executes the task. Counter
celery_task_succeeded_total Sent if the task executed successfully. Counter
celery_task_failed_total Sent if the execution of the task failed. Counter
celery_task_rejected_total The task was rejected by the worker, possibly to be re-queued or moved to a dead letter queue. Counter
celery_task_revoked_total Sent if the task has been revoked. Counter
celery_task_retried_total Sent if the task failed, but will be retried in the future. Counter
celery_worker_up Indicates if a worker has recently sent a heartbeat. Gauge
celery_worker_tasks_active The number of tasks the worker is currently processing. Gauge
celery_task_runtime_bucket Histogram of runtime measurements for each task. Histogram
celery_queue_length The number of messages in the broker queue. Gauge
celery_active_consumer_count The number of active consumers on the broker queue (only works for the RabbitMQ and Qpid brokers; see here for details). Gauge
celery_active_worker_count The number of active workers consuming from the broker queue. Gauge
celery_active_process_count The number of active worker processes consuming from the broker queue. Each worker may have more than one process. Gauge
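
To sanity-check what the exporter exposes, you can parse the /metrics endpoint directly. A minimal sketch, assuming the requests and prometheus_client packages and an exporter listening locally on port 9808:

# sketch only; host/port assume a locally running exporter
import requests
from prometheus_client.parser import text_string_to_metric_families

resp = requests.get("http://localhost:9808/metrics", timeout=5)
for family in text_string_to_metric_families(resp.text):
    if family.name.startswith("celery_"):
        for sample in family.samples:
            print(sample.name, sample.labels, sample.value)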

Used in production at https://findwork.dev and https://django.wtf.

Development

Pull requests are welcome here!

To start developing, run the commands below to prepare your environment after cloning the repository:

# Install dependencies and pre-commit hooks
poetry install
pre-commit install

# Test everything works fine
pre-commit run --all-files
docker-compose up -d
pytest --broker=memory      --log-level=DEBUG
pytest --broker=redis       --log-level=DEBUG
pytest --broker=rabbitmq    --log-level=DEBUG

Contributors

Made with contrib.rocks.

celery-exporter's People

Contributors

abellotti, adinhodovic, aequitas, ahelmy, allburov, amandahla, claudinoac, danihodovic, dependabot[bot], derom, evgkrsk, gciria, hiaselhans, homholueng, kittywaresz, likesavabutworse, max-wittig, pb-dod, roganartu, savinaroja, shuternay, spankratov, steinbrueckri, tversteeg, vokracko, wyattw-orca, xulaus


celery-exporter's Issues

Can't pass json-like string as --broker-transport-option

Hi!

I want to connect this exporter to Sentinel that uses Redis; in my case that means I have to pass a dict object to the Celery app instance (https://github.com/danihodovic/celery-exporter/blob/master/src/exporter.py#L211), but currently that's impossible because of the --broker-transport-option handling logic: it's restricted to a key=value pattern and the value will always be a string.

What I'm trying to achieve:

docker run -p 9808:9808 -it --rm danihodovic/celery-exporter --broker-url sentinel://:[email protected]:26379/1 --broker-transport-option master_name=master01 --broker-transport-option sentinel_kwargs="{\"password\": \"xxx\"}" --log-level DEBUG

What I get (this happens because the self.sentinel_kwargs value is the string {"password": "xxx"}, but the Redis client class expects a dict):

  File "celery-exporter-playground/venv/lib/python3.9/site-packages/redis/sentinel.py", line 186, in <listcomp>
    Redis(hostname, port, **self.sentinel_kwargs)
TypeError: redis.client.Redis() argument after ** must be a mapping, not str

Is it possible to add parsing of JSON-like strings, transforming them into Python dicts before updating the transport_options object with the value? (As implemented in https://github.com/OvalMoney/celery-exporter/blob/master/celery_exporter/__main__.py#L130?)
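
For illustration, the kind of parsing being requested could look like the sketch below (this is not the exporter's actual code; the function name is made up):

import json

def parse_transport_option_value(raw: str):
    # Interpret values like '{"password": "xxx"}' as JSON (dicts, numbers, ...);
    # fall back to the plain string for simple key=value options.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw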

Add helm repository

How are you adding the helm repository?
I see index.html in the gh-pages branch. But it does not work.

$ helm repo add celery-exporter https://github.com/danihodovic/celery-exporter

Error: looks like "https://github.com/danihodovic/celery-exporter" is not a valid chart repository or cannot be reached: failed to fetch https://github.com/danihodovic/celery-exporter/index.yaml : 404 Not Found

Bug: Queue length doesn't seem right

I'm doing a bit of load testing and I'm seeing roughly correct numbers for sent, but the tasks don't reflect in the queue length all the time. The tasks are short (10s) so maybe this is a queue length polling time issue?

image

Add Sentinel support for the celery_queue_size metric

This metric is currently gated behind a small list of supported transports that doesn't include Sentinel:

def track_queue_metrics(self):
    with self.app.connection() as connection:  # type: ignore
        transport = connection.info()["transport"]
        acceptable_transports = ["redis", "rediss", "amqp", "memory"]
        if transport not in acceptable_transports:
            logger.debug(
                f"Queue length tracking is only implemented for {acceptable_transports}"
            )
            return
        # request workers to response active queues
        # we need to cache queue info in exporter in case all workers are offline
        # so that no worker response to exporter will make active_queues return None
        queues = self.app.control.inspect().active_queues() or {}
        for info_list in queues.values():
            for queue_info in info_list:
                self.queue_cache.add(queue_info["name"])
        track_length = lambda q, l: self.celery_queue_length.labels(
            queue_name=q
        ).set(l)
        for queue in self.queue_cache:
            if transport in ["redis", "rediss"]:
                queue_length = redis_queue_length(connection, queue)
                track_length(queue, queue_length)
            elif transport in ["amqp", "memory"]:
                queue_length = rabbitmq_queue_length(connection, queue)
                track_length(queue, queue_length)
                consumer_count = rabbitmq_queue_consumer_count(connection, queue)
                self.celery_active_consumer_count.labels(queue_name=queue).set(
                    consumer_count
                )

From what I can tell, it's because the redis implementation doesn't have Sentinel support (and that this exporter leans on Celery itself for all other metrics). See:

def redis_queue_length(connection, queue: str) -> int:
    return connection.default_channel.client.llen(queue)

I think this should be fairly easy to add, by just looking up the master every time we need the queue size, per https://stackoverflow.com/a/45886136.

I'm happy to take a crack at a diff. We started using this exporter to get queue sizes, but obviously want to stick with sentinel support if we can.
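
For illustration, the Sentinel lookup suggested above could look roughly like the sketch below (assuming the redis package; none of these names come from the exporter's code):

from redis.sentinel import Sentinel

def sentinel_queue_length(sentinel_hosts, master_name: str, queue: str, **sentinel_kwargs) -> int:
    # sentinel_hosts is a list of (host, port) tuples
    sentinel = Sentinel(sentinel_hosts, **sentinel_kwargs)
    master = sentinel.master_for(master_name)
    # Celery's Redis transport stores each queue as a list, so LLEN gives its length
    return master.llen(queue)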

Is it possible to add a pre-commit configuration for pylint/black?

@danihodovic have you thought about adding a pre-commit configuration to the repo and calling it instead of separate commands?

I see that you did a great job caching all the PyPI dependencies in GitHub Actions; to speed up pre-commit we can use https://pre-commit.ci/ to run these checks (without it, setting up these hooks takes too long).

I'd like to create a PR for it, if you don't mind pre-commit!

I've been using pre-commit in personal and open source projects for a long time. It really helps to not have to care about all the linters, you just call pre-commit install once after git pull! e.g. https://github.com/devopshq/artifactory

Memory usage grows non-stop

Hello!

First of all, thank you for this amazing project.

We're seeing the memory usage for the container with the exporter growing non-stop over time. It grows until an OOM error arises and Kubernetes restarts the container.

This is a look at the memory usage over 7 days:

Screenshot 2023-06-02 at 11 24 36

I'm unsure if this is related, but our current Prometheus configuration is as follows:

  • 10s scrape interval.
  • 10s timeout.

Here is how our Grafana dashboard metrics looked for the said week:

Screenshot 2023-06-02 at 11 26 17

Due to #157 we switched from a Deployment to a StatefulSet, and it greatly improved our Prometheus query latency. However, I worry about the OOM errors, and I'm unsure what configuration may be the cause of them.

Thanks in advance!

Edit: Just for clarification, we're using the official Docker image without any modifications, the only parameter is the Redis Broker Queue URL.

helm installation with sentinel cluster and broker-transport-option

Hello. I'm going to deploy celery-exporter using Helm. I have a 3-node Redis Sentinel cluster, so how can I set CE_BROKER_URL for the Sentinel cluster? For a single Redis instance I just give the one IP, but here I have no idea. Also, how can I apply "broker-transport-option master_name=my_master", for example? Using docker commands I can do it, but how can I apply these values with a Helm installation? For example, should I use CE_BROKER_TRANSPORT_OPTION as an env in the values YAML?

kombu.exceptions.ContentDisallowed: Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)

Hello,

Currently running celery with Django. I require these configurations

CELERY_TASK_SERIALIZER = "pickle"
CELERY_ACCEPT_CONTENT = ["json", "pickle"]

However, the celery-exporter container is throwing these errors:

[2023-07-24 13:41:01,397] ERROR in app: Exception on /metrics [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/app/src/http_server.py", line 32, in metrics
    current_app.config["metrics_puller"]()
  File "/app/src/exporter.py", line 139, in scrape
    self.track_queue_metrics()
  File "/app/src/exporter.py", line 219, in track_queue_metrics
    for worker, stats in (self.app.control.inspect().stats() or {}).items()
  File "/usr/local/lib/python3.10/site-packages/celery/app/control.py", line 243, in stats
    return self._request('stats')
  File "/usr/local/lib/python3.10/site-packages/celery/app/control.py", line 106, in _request
    return self._prepare(self.app.control.broadcast(
  File "/usr/local/lib/python3.10/site-packages/celery/app/control.py", line 741, in broadcast
    return self.mailbox(conn)._broadcast(
  File "/usr/local/lib/python3.10/site-packages/kombu/pidbox.py", line 344, in _broadcast
    return self._collect(reply_ticket, limit=limit,
  File "/usr/local/lib/python3.10/site-packages/kombu/pidbox.py", line 386, in _collect
    self.connection.drain_events(timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/kombu/connection.py", line 316, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 971, in drain_events
    get(self._deliver, timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 584, in get
    ret = self.handle_event(fileno, event)
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 566, in handle_event
    return self.on_readable(fileno), self
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 562, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/redis.py", line 967, in _brpop_read
    self.connection._deliver(loads(bytes_to_str(item)), dest)
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 991, in _deliver
    callback(message)
  File "/usr/local/lib/python3.10/site-packages/kombu/transport/virtual/base.py", line 624, in _callback
    return callback(message)
  File "/usr/local/lib/python3.10/site-packages/kombu/messaging.py", line 620, in _receive_callback
    decoded = None if on_m else message.decode()
  File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 194, in decode
    self._decoded_cache = self._decode()
  File "/usr/local/lib/python3.10/site-packages/kombu/message.py", line 198, in _decode
    return loads(self.body, self.content_type,
  File "/usr/local/lib/python3.10/site-packages/kombu/serialization.py", line 242, in loads
    raise self._for_untrusted_content(content_type, 'untrusted')
kombu.exceptions.ContentDisallowed: Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)

celery_task_rejected and celery_task_failed metrics are not updating even if the celery task is failing

I have integrated this celery-exporter in my Django app and set these two fields to True in settings:
CELERY_WORKER_SEND_TASK_EVENTS = True
CELERY_TASK_SEND_SENT_EVENT = True

The celery_task_failed metric is not getting updated even after my Celery task fails to execute and throws an exception. I have used the Docker image and passed my broker URL. I have attached my deployment YAML file and the metric data, which shows only some data and not the failed task data.

Metric.txt

celery-exporter.txt

celery-exporter does not work with SQS Broker

I've noticed a PR for SQS support - #1

I followed the prerequisites, like enabling Celery events with the -E flag (enable_events), and ran celery-exporter --broker-url <SQS_Queue_URL> -l localhost:8888

It starts with the following logs:

[2022-09-28 19:57:57,757] botocore.credentials:INFO: Found credentials from IAM Role: <IAM_ROLE>
[2022-09-28 19:57:57,907] kombu.mixins:INFO: Connected to sqs://<SQS_URL>//

I could see that two queues were created, with the names:

  1. celeryev-<UUID>
  2. <UUID>-reply-celery-pidbox

However, I could not see the Celery metrics. Currently, it shows the following when scraping the URL:

python_gc_objects_collected_total{generation="0"} 1742.0
python_gc_objects_collected_total{generation="1"} 4233.0
python_gc_objects_collected_total{generation="2"} 257.0
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
python_gc_collections_total{generation="0"} 233.0
python_gc_collections_total{generation="1"} 21.0
python_gc_collections_total{generation="2"} 1.0
python_info{implementation="CPython",major="3",minor="8",patchlevel="13",version="3.8.13"} 1.0
process_virtual_memory_bytes 5.22747904e+08
process_resident_memory_bytes 7.9413248e+07
process_start_time_seconds 1.66439535912e+09
process_cpu_seconds_total 1.53
process_open_fds 14.0
process_max_fds 1024.0
celery_workers{namespace="celery"} 0.0

I've also observed the following error, which I did not observe while testing celery-prometheus-exporter (it does not show metrics either).

[2022-09-28 19:58:12,470] workers-thread:ERROR: Error while pinging workers

Questions:

  1. Is there any prerequisite that seems to be missed?
  2. Is SQS supported & tested as a broker for Celery metrics?
  3. Is there a way to work around this issue?

Let me know if you need any additional information.

How do I actually connect this to grafana?

image

When I put the url in grafana, I get a 404:

image

The endpoint seems to produce a lot of logs

# HELP celery_task_sent_total Sent when a task message is published.
# TYPE celery_task_sent_total counter
celery_task_sent_total{hostname="web-669dd797b5-cbtnj",name="printing.tasks.execute_print_job"} 1.0
celery_task_sent_total{hostname="p1-metal",name="api_in.tasks.reprocess_cdtp_orders"} 8.0
celery_task_sent_total{hostname="p1-metal",name="banking.tasks.get_all_account_transactions"} 10.0
celery_task_sent_total{hostname="collider-7",name="api_in.tasks.reprocess_cdtp_orders"} 0.0
celery_task_sent_total{hostname="p1kube",name="banking.tasks.get_all_account_transactions"} 0.0
celery_task_sent_total{hostname="p1-metal",name="documents.tasks.auto_create_all_documents"} 8.0
celery_task_sent_total{hostname="p1",name="documents.tasks.auto_create_all_documents"} 0.0
celery_task_sent_total{hostname="p1-metal",name="robotics.tasks.collect_thermo_hygrometer_data"} 8.0
celery_task_sent_total{hostname="p1kube",name="robotics.tasks.collect_thermo_hygrometer_data"} 0.0
celery_task_sent_total{hostname="p1-metal",name="hr.tasks.apply_work_period_application_seconds"} 36.0
celery_task_sent_total{hostname="p1kube",name="hr.tasks.apply_work_period_application_seconds"} 0.0
celery_task_sent_total{hostname="p1-metal",name="covid_packaging.tasks.auto_update_suggested_weights"} 36.0
celery_task_sent_total{hostname="p1-metal",name="p1.celery.ReadyRollVin_save_garment_shrinkage"} 8.0
celery_task_sent_total{hostname="p1kube",name="p1.celery.ReadyRollVin_save_garment_shrinkage"} 0.0
celery_task_sent_total{hostname="p1kube",name="covid_packaging.tasks.auto_update_suggested_weights"} 0.0
celery_task_sent_total{hostname="p1-metal",name="p1.celery.create_amazon_orders_cron"} 36.0
celery_task_sent_total{hostname="collider-4",name="p1.celery.create_amazon_orders_cron"} 0.0
celery_task_sent_total{hostname="p1-metal",name="production.tasks.process_all_output_card_files"} 71.0
celery_task_sent_total{hostname="p1",name="production.tasks.process_all_output_card_files"} 0.0
celery_task_sent_total{hostname="p1-metal",name="gov.tasks.check_pack_data_status"} 15.0
celery_task_sent_total{hostname="p1-metal",name="garments_cutting.tasks.send_all_stranded_pvcos"} 15.0
celery_task_sent_total{hostname="p1",name="gov.tasks.check_pack_data_status"} 0.0
celery_task_sent_total{hostname="collider-9",name="garments_cutting.tasks.send_all_stranded_pvcos"} 0.0
celery_task_sent_total{hostname="celery-celery-57cf47cd46-thnfl",name="banking.tasks.get_transactions"} 36.0

Missing metrics in prometheus.

Hi, I am missing 'fail' metrics such as
image

Every metric that has to do with failures I don't get; the others work fine.

I am using the latest version of prometheus, grafana and celery-exporter.

My celery start script:

celery -A tasks worker -l INFO -E -c 10 -n worker1@%h

Celery dockerfile

FROM python:3.8.6

RUN useradd -ms /bin/bash celery
WORKDIR /home/kektop
USER celery

COPY celery/* ./

RUN pip install -r requirements.txt
ENV PATH="${PATH}:/home/celery/.local/bin"

#CMD ["tail", "-f", "/dev/null"]
CMD ["bash", "celery_start.sh"]

celeryconfig.py

broker_url = 'amqp://guest:guest@localhost:5672//'

imports = ('tasks',)
worker_send_task_events = True
task_send_sent_event = True

No data values for most tasks metrics

I am running the exporter as a Docker container and have enabled events, but when I do a curl on the endpoint most metrics have no values.

CONTAINER ID   IMAGE                         COMMAND                  CREATED        STATUS        PORTS                                       NAMES
a3483de3a78b   danihodovic/celery-exporter   "python /app/cli.py …"   16 hours ago   Up 16 hours   0.0.0.0:9808->9808/tcp, :::9808->9808/tcp   bold_taussig

Below shows the response to enabling events.

[root@txslntvpa2v sevone_webhook_celery_nonprod]# celery -A webhookDevCelery.celery control enable_events
->  celery@txslntvpa2v: OK
        task events enabled

Below are the curl results. Just wanted to make sure I am not missing something in regard to enabling this.

# HELP celery_task_sent_total Sent when a task message is published.
# TYPE celery_task_sent_total counter
# HELP celery_task_received_total Sent when the worker receives a task.
# TYPE celery_task_received_total counter
# HELP celery_task_started_total Sent just before the worker executes the task.
# TYPE celery_task_started_total counter
# HELP celery_task_succeeded_total Sent if the task executed successfully.
# TYPE celery_task_succeeded_total counter
# HELP celery_task_failed_total Sent if the execution of the task failed.
# TYPE celery_task_failed_total counter
# HELP celery_task_rejected_total The task was rejected by the worker, possibly to be re-queued or moved to a dead letter queue.
# TYPE celery_task_rejected_total counter
# HELP celery_task_revoked_total Sent if the task has been revoked.
# TYPE celery_task_revoked_total counter
# HELP celery_task_retried_total Sent if the task failed, but will be retried in the future.
# TYPE celery_task_retried_total counter
# HELP celery_worker_up Indicates if a worker has recently sent a heartbeat.
# TYPE celery_worker_up gauge
celery_worker_up{hostname="txslntvpa2v"} 1.0
# HELP celery_worker_tasks_active The number of tasks the worker is currently processing
# TYPE celery_worker_tasks_active gauge
celery_worker_tasks_active{hostname="txslntvpa2v"} 0.0
# HELP celery_task_runtime Histogram of task runtime measurements.
# TYPE celery_task_runtime histogram
# HELP celery_queue_length The number of message in broker queue.
# TYPE celery_queue_length gauge
celery_queue_length{queue_name="celery"} 0.0
# HELP celery_active_consumer_count The number of active consumer in broker queue.
# TYPE celery_active_consumer_count gauge
# HELP celery_active_worker_count The number of active workers in broker queue.
# TYPE celery_active_worker_count gauge
celery_active_worker_count{queue_name="celery"} 1.0
# HELP celery_active_process_count The number of active processes in broker queue.
# TYPE celery_active_process_count gauge
celery_active_process_count{queue_name="celery"} 8.0

Add option to set log-level

Thanks @danihodovic for picking up the torch and making this exporter 🙌

I noticed the default log-level for the Docker version of the exporter is set to DEBUG. This can create quite a lot of noise in production envs where we might not need that level of verbosity.

Could you add a CLI option to allow users to specify the log-level?

Helm installation broken after introducing liveness and readiness probes

When trying to install helm chart, validation errors appear:

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error validating "": error validating data: 
[ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe.httpGet): unknown field "failureThreshold" in io.k8s.api.core.v1.HTTPGetAction, ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe.httpGet): unknown field "periodSeconds" in io.k8s.api.core.v1.HTTPGetAction, 
ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe.httpGet): unknown field "successThreshold" in io.k8s.api.core.v1.HTTPGetAction, ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe.httpGet): unknown field "timeoutSeconds" in io.k8s.api.core.v1.HTTPGetAction, 
ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe.httpGet): unknown field "failureThreshold" in io.k8s.api.core.v1.HTTPGetAction, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe.httpGet): unknown field "periodSeconds" in io.k8s.api.core.v1.HTTPGetAction, 
ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe.httpGet): unknown field "successThreshold" in io.k8s.api.core.v1.HTTPGetAction, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe.httpGet): unknown field "timeoutSeconds" in io.k8s.api.core.v1.HTTPGetAction]

Since I don't have permission to create PRs, here's the solution (the last 4 lines had the wrong indentation):

          readinessProbe:
            httpGet:
              path: /health
              port: http
            timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds | default "5" }}
            failureThreshold: {{ .Values.livenessProbe.failureThreshold | default "5" }}
            periodSeconds: {{ .Values.livenessProbe.periodSeconds | default "10" }}
            successThreshold: {{ .Values.livenessProbe.successThreshold | default "1" }}

The same applies to readinessProbe.

Feature Request: Timezone aware y-axis

I'd like to be able to have the time in my timezone.

Currently I see:
image

If I migrate to time series

image

I can then use my browser's timezone

image

this fixes it

image

I'm happy to create a PR for the Grafana, but not sure where that repo is.

celery_worker_up gauge is not updated when a worker drops without sending a worker-offline event

celery_worker_up continues to emit 1 for a given worker hostname indefinitely after it is killed if it never sends a worker-offline event (e.g. it is SIGKILLed).

Relevant code block:

handlers = {
    "worker-heartbeat": self.track_worker_heartbeat,
    "worker-online": lambda event: self.track_worker_status(event, True),
    "worker-offline": lambda event: self.track_worker_status(event, False),
}

You can verify this easily, by starting up a worker and killing it. You'll see something like this:

tl in host3 on ~ via 🐍 v2.7.16 took 5s
❯ curl localhost:9808/metrics -s | rg celery_worker_up
# HELP celery_worker_up Indicates if a worker has recently sent a heartbeat.
# TYPE celery_worker_up gauge
celery_worker_up{hostname="host1"} 1.0
celery_worker_up{hostname="host2"} 1.0
celery_worker_up{hostname="host3"} 1.0

tl in host3 on ~ via 🐍 v2.7.16 took 1s
❯ sc-restart celery-exporter.service
[sudo] password for tl:

tl in host3 on ~ via 🐍 v2.7.16 took 34s
❯ curl localhost:9808/metrics -s | rg celery_worker_up
# HELP celery_worker_up Indicates if a worker has recently sent a heartbeat.
# TYPE celery_worker_up gauge
celery_worker_up{hostname="host1"} 1.0
celery_worker_up{hostname="host2"} 1.0

I have two ideas for how to handle this:

  1. support some kind of "worker timeout" option, and after not seeing a heartbeat for that long celery_worker_up is set to 0 for that hostname. Users would need to set this in alignment with their celery heartbeat interval settings, though. A long default (10 minutes?) still seems better than never, though.
  2. export a new metric: celery_worker_last_heartbeat_timestamp - the unix timestamp of the last heartbeat received for a given worker

(2) seems a lot simpler than (1), which I guess needs some kind of background thread, though leaving (1) around seems like a footgun. I'm happy to submit a diff for (2) as it allows greater precision in alerting than the binary up metric.
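
For illustration, option (2) could be as small as the sketch below (assuming prometheus_client; the names and handler wiring are illustrative, not the exporter's actual code):

from prometheus_client import Gauge

worker_last_heartbeat = Gauge(
    "celery_worker_last_heartbeat_timestamp",
    "Unix timestamp of the last heartbeat received from a worker.",
    ["hostname"],
)

def handle_worker_heartbeat(event):
    # Celery worker events carry 'hostname' and 'timestamp' fields
    worker_last_heartbeat.labels(hostname=event["hostname"]).set(event["timestamp"])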

Docker container crashes unexpectedly

Hello!
I'm using the Docker image on the latest version and sometimes the container randomly crashes and exports wrong data, or no data at all, to Prometheus. Sometimes it happens with the logs that I attached below and sometimes with no logs at all. I don't know if it's due to bad configuration or what. I just pass the Redis broker URI to the container with no other configuration.

$ docker logs -f celery_exporter

Socket error
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/waitress/channel.py", line 121, in handle_write
    flush()
  File "/usr/local/lib/python3.9/site-packages/waitress/channel.py", line 259, in _flush_some
    num_sent = self.send(chunk)
  File "/usr/local/lib/python3.9/site-packages/waitress/wasyncore.py", line 431, in send
    result = self.socket.send(data)
TimeoutError: [Errno 110] Connection timed out
total open connections reached the connection limit, no longer accepting new connections

Exception while fetching queue length

Creating this issue so I don't lose track of it.

I was giving the latest 0.4.0 build a try as we could use the new celery_queue_length metrics. However, for our use, I'm getting an exception for each of the queues we have and those counters are staying at 0.

Our usage is:

python cli.py --port 8000 --broker-url "redis://...:6379" --retry-interval 5 --log-level INFO \ 
                       --broker-transport-option global_keyprefix=aab-

In the output, we get an error like the following for each queue:

2022-07-22 20:38:18.033 | ERROR    | src.exporter:track_queue_length:127 - Queue check_azure_subscription_and_create_cloud_account declare failed: Channel.queue_declare: (404) NOT_FOUND - no queue 'check_azure_subscription_and_create_cloud_account' in vhost '/'
Traceback (most recent call last):

  File "/usr/lib64/python3.9/threading.py", line 930, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7f08c5f77ee0>
    └ <Thread(waitress-0, started daemon 139675548083968)>
  File "/usr/lib64/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
    │    └ <function Thread.run at 0x7f08c5f77c10>
    └ <Thread(waitress-0, started daemon 139675548083968)>
  File "/usr/lib64/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <Thread(waitress-0, started daemon 139675548083968)>
    │    │        │    └ (0,)
    │    │        └ <Thread(waitress-0, started daemon 139675548083968)>
    │    └ <bound method ThreadedTaskDispatcher.handler_thread of <waitress.task.ThreadedTaskDispatcher object at 0x7f08bff4f460>>
    └ <Thread(waitress-0, started daemon 139675548083968)>

  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 84, in handler_thread
    task.service()
    │    └ <function HTTPChannel.service at 0x7f08c012d4c0>
    └ <waitress.channel.HTTPChannel connected 10.128.4.1:56130 at 0x7f08bfef6940>

  File "/usr/local/lib/python3.9/site-packages/waitress/channel.py", line 397, in service
    task.service()
    │    └ <function Task.service at 0x7f08c00db700>
    └ <waitress.task.WSGITask object at 0x7f08bfef6040>


  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 168, in service
    self.execute()
    │    └ <function WSGITask.execute at 0x7f08c00dbb80>
    └ <waitress.task.WSGITask object at 0x7f08bfef6040>

  File "/usr/local/lib/python3.9/site-packages/waitress/task.py", line 434, in execute
    app_iter = self.channel.server.application(environ, start_response)
               │    │       │      │           │        └ <function WSGITask.execute.<locals>.start_response at 0x7f08bd0993a0>
               │    │       │      │           └ {'REMOTE_ADDR': '10.128.4.1', 'REMOTE_HOST': '10.128.4.1', 'REMOTE_PORT': '56130', 'REQUEST_METHOD': 'GET', 'SERVER_PORT': '8...
               │    │       │      └ <Flask 'src.http_server'>
               │    │       └ <waitress.server.TcpWSGIServer listening 0.0.0.0:8000 at 0x7f08bfee4f10>
               │    └ <waitress.channel.HTTPChannel connected 10.128.4.1:56130 at 0x7f08bfef6940>
               └ <waitress.task.WSGITask object at 0x7f08bfef6040>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
           │    │        │        └ <function WSGITask.execute.<locals>.start_response at 0x7f08bd0993a0>
           │    │        └ {'REMOTE_ADDR': '10.128.4.1', 'REMOTE_HOST': '10.128.4.1', 'REMOTE_PORT': '56130', 'REQUEST_METHOD': 'GET', 'SERVER_PORT': '8...
           │    └ <function Flask.wsgi_app at 0x7f08c0188f70>
           └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
               │    └ <function Flask.full_dispatch_request at 0x7f08c0188550>
               └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
         │    └ <function Flask.dispatch_request at 0x7f08c01884c0>
         └ <Flask 'src.http_server'>

  File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
           │    │           │    │              │    │            │   └ {}
           │    │           │    │              │    │            └ <Request 'http://10.128.5.5:8000/metrics' [GET]>
           │    │           │    │              │    └ 'celery_exporter.metrics'
           │    │           │    │              └ <Rule '/metrics' (OPTIONS, HEAD, GET) -> celery_exporter.metrics>
           │    │           │    └ {'static': <function Flask.__init__.<locals>.<lambda> at 0x7f08c00a5550>, 'celery_exporter.index': <function index at 0x7f08c...
           │    │           └ <Flask 'src.http_server'>
           │    └ <function Flask.ensure_sync at 0x7f08c0188820>
           └ <Flask 'src.http_server'>

  File "/opt/celery-exporter/src/http_server.py", line 32, in metrics
    current_app.config["metrics_puller"]()
    └ <Flask 'src.http_server'>

> File "/opt/celery-exporter/src/exporter.py", line 122, in track_queue_length
    ret = connection.default_channel.queue_declare(
          │          └ <property object at 0x7f08c1a96860>
          └ <Connection: redis://cloudigrade-redis.ephemeral-ztg6s3.svc:6379// at 0x7f08bd02ef70>

  File "/usr/local/lib/python3.9/site-packages/kombu/transport/virtual/base.py", line 516, in queue_declare
    raise ChannelError(
          └ <class 'amqp.exceptions.ChannelError'>

amqp.exceptions.ChannelError: Channel.queue_declare: (404) NOT_FOUND - no queue 'check_azure_subscription_and_create_cloud_account' in vhost '/'

Quickly looking at the new code, the logged exception is coming from here:

logger.exception(f"Queue {queue} declare failed: {str(ex)}")

Temporarily changing it to logger.info quiets down the error, but there is still a caught exception and the counters are set to 0.

We've been running celery-exporter without any issues since May.

No queue, no events

I'm using Django + Celery + Redis + django-celery-results on a single host. Everything works like a charm. I'm trying to run celery-exporter (from cli, git cloned source master branch, tried 0.4.1 as well) and get the following:

  • worker-heartbeat event is successfully received
  • task-sent event for all initiated tasks is successfully received
  • no other task events are received
  • track_queue_length shows Queue 'celery' not found error in debug log

LLEN celery in redis-cli shows that queue. The Celery versions are the same in Django and celery-exporter: 5.2.3. I have no idea where to look further.

Grafana dashboard example

Hi, I can't figure out how to generate a dashboard that looks the same as in your screenshot. Maybe you could add the dashboard JSON to the repo?

support transport options

Hi,

It would be great if a transport_options param were added here, so in the case where the broker is redis-sentinel I could pass the master name.

Cheers.

Segmentation fault using the binary

Hello, I tried to execute the celery-exporter binary and I get a Segmentation fault ...

Perhaps a pip install would be better than distributing PyInstaller binaries?

receiving celery_worker_tasks_active = 0

celery_worker_tasks_active{hostname="worker2"} 0.0
celery_worker_tasks_active{hostname="worker1"} 0.0

Hey! I get 0 active tasks even though there are currently 10. It seems to me that celery-exporter is not updating active tasks often enough.

Add version tagged Docker image

It would be nice to have a version-tagged Docker image instead of just latest. I don't want any surprises when deploying in production.

run failed maybe caused by environment variable

Is there some special environment variable that causes the run to fail with the following reason?

2022-10-11 09:09:28.632 | ERROR    | src.exporter:run:255 - celery-exporter exception '[Errno 2] No such file or directory: '/home/app/gauge_all_1.db'', retrying in 0 seconds.

here is my run config

apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-exporter
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: celery-exporter
  template:
    metadata:
      labels:
        app: celery-exporter
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: celery-exporter
        image: danihodovic/celery-exporter:latest
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 9808
            name: metrics
            protocol: TCP
        args: ["--broker-url=redis://zeno-redis:6379/0"]
        envFrom:
          - configMapRef:
              name: my-config-map
        resources:
          requests:
            memory: 400Mi
          limits:
            memory: 800Mi

After I removed the envFrom field, it worked.

Cannot install with poetry for some reason.

Hey there,

Trying to install this with poetry but I keep getting the following:

  • Installing celery-exporter (0.5.3 0ea8dba): Failed

  EnvCommandError

  Command ['/home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/bin/pip', 'install', '--no-deps', '-U', '/home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/src/celery-exporter'] errored with the following return code 1, and output: 
  Processing /home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/src/celery-exporter
    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
    Preparing metadata (pyproject.toml): started
    Preparing metadata (pyproject.toml): finished with status 'error'
    error: subprocess-exited-with-error
    
    × Preparing metadata (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [16 lines of output]
        Traceback (most recent call last):
          File "/home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
            main()
          File "/home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
            json_out['return_val'] = hook(**hook_input['kwargs'])
          File "/home/routhinator/.cache/pypoetry/virtualenvs/theden-django-N-6mPsJq-py3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
            return hook(metadata_directory, config_settings)
          File "/tmp/pip-build-env-v5uv1_uj/overlay/lib/python3.10/site-packages/poetry/core/masonry/api.py", line 41, in prepare_metadata_for_build_wheel
            builder = WheelBuilder(poetry)
          File "/tmp/pip-build-env-v5uv1_uj/overlay/lib/python3.10/site-packages/poetry/core/masonry/builders/wheel.py", line 57, in __init__
            super().__init__(poetry, executable=executable)
          File "/tmp/pip-build-env-v5uv1_uj/overlay/lib/python3.10/site-packages/poetry/core/masonry/builders/builder.py", line 83, in __init__
            self._module = Module(
          File "/tmp/pip-build-env-v5uv1_uj/overlay/lib/python3.10/site-packages/poetry/core/masonry/utils/module.py", line 69, in __init__
            raise ModuleOrPackageNotFound(
        poetry.core.masonry.utils.module.ModuleOrPackageNotFound: No file/folder found for package celery-exporter
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed
  
  × Encountered error while generating package metadata.
  ╰─> See above for output.
  
  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  
  [notice] A new release of pip available: 22.2.2 -> 22.3.1
  [notice] To update, run: pip install --upgrade pip
  

  at ~/.local/lib/python3.10/site-packages/poetry/utils/env.py:1195 in _run
      1191│                 output = subprocess.check_output(
      1192│                     cmd, stderr=subprocess.STDOUT, **kwargs
      1193│                 )
      1194│         except CalledProcessError as e:
    → 1195│             raise EnvCommandError(e, input=input_)
      1196│ 
      1197│         return decode(output)
      1198│ 
      1199│     def execute(self, bin, *args, **kwargs):


I have added celery-exporter as a requirement and pointed it at the latest release tag like so:

celery-exporter = {git = "https://github.com/danihodovic/celery-exporter.git", tag = "celery-exporter-0.4.1"}

Is there a PyPi release I'm missing? Or install directions I am missing?

While I'm at it - do I need to add this to installed_apps? I don't see this mentioned in the readme. Will the metrics come out in django-prometheus exporter metrics? Or is there a port configuration when running with Django that I am missing in the README?

[QUESTION] What is the memory requirement for celery-exporter?

We have a docker service that's using the image danihodovic/celery-exporter

We have a memory limit set for this service at 100M. However, we find that the celery-exporter service's memory use steadily climbs, and consistently around the ~3m mark this limit is reached and the celery-exporter service gets killed and restarted. So I was wondering how much memory is usually needed to run this tool?

Almost no data is shown in the dashboard

Hello again and thank you for your exporter :) I see the data at the endpoint I've configured and I can see that this endpoint is properly scraped by Prometheus, but the Grafana dashboard I've imported from your link only shows the number of active workers. Any idea why?

My Grafana version is v8.1.4

Thank you

image

Here is what the center panel's query explorer looks like:

image

Metrics become very large over time on Kubernetes platform

Installed on kubernetes via the helm chart provided on this repo.

I see the metrics endpoint also has a hostname label. The thing with Kubernetes is that the hostname gets a randomly generated suffix if you use a Deployment resource, so every restart/update of the pods generates a new hostname.

I'm also using Kubernetes CronJobs to call Celery tasks. This also generates a new hostname every time a job is called.

And now the Grafana dashboard load time worsens as time goes by. Do you know of any approach I can take to tackle this issue?

celery_worker_up reporting 1 for turned off workers

Hi,
Has anyone noticed, when scaling the Celery workers (pods) up/down, that the metric celery_worker_up just keeps rising and reports all previous workers as running?
When I try to run celery -A proj inspect stats it returns the correct number of workers.
Looking at the code I don't see anything weird.

Thanks a lot for help

Refusing to deserialize untrusted content of type pickle

File "/usr/local/lib/python3.9/site-packages/kombu/messaging.py", line 620, in _receive_callback
decoded = None if on_m else message.decode()
File "/usr/local/lib/python3.9/site-packages/kombu/message.py", line 194, in decode
self._decoded_cache = self._decode()
File "/usr/local/lib/python3.9/site-packages/kombu/message.py", line 198, in _decode
return loads(self.body, self.content_type,
File "/usr/local/lib/python3.9/site-packages/kombu/serialization.py", line 242, in loads
raise self._for_untrusted_content(content_type, 'untrusted')
kombu.exceptions.ContentDisallowed: Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)

Deployed celery-exporter using helm chart.

Error create image from Dockerfile

Hello @danihodovic!!

I've a problem with image creation

Step 7/10 : RUN poetry config virtualenvs.create false && poetry install --no-interaction #!COMMIT
 ---> Running in 3212599f0a54
Skipping virtualenv creation, as specified in config file.
Installing dependencies from lock file

  SolverProblemError

  Because celery (5.0.5 git rev allow-heartbeats-in-tests) depends on pycurl (==7.43.0.5) which doesn't match any versions, celery is forbidden.

  So, because celery-exporter depends on celery (5.0.5 git rev allow-heartbeats-in-tests), version solving failed.

  at /usr/local/lib/python3.8/site-packages/poetry/puzzle/solver.py:241 in _solve
      237│             packages = result.packages
      238│         except OverrideNeeded as e:
      239│             return self.solve_in_compatibility_mode(e.overrides, use_latest=use_latest)
      240│         except SolveFailure as e:
    → 241│             raise SolverProblemError(e)
      242│ 
      243│         results = dict(
      244│             depth_first_search(
      245│                 PackageNode(self._package, packages), aggregate_package_nodes
The command '/bin/sh -c poetry config virtualenvs.create false && poetry install --no-interaction #!COMMIT' returned a non-zero code: 1

I tried changing the dependency and the celery repository, but it's still failing.

Adding more metrics

Hi,

I have just found this great repo right when we are adding a lot of monitoring at work (also for celery).

I would love to have a few more metrics (don't we all :P):

  • queuing time
  • task run time
  • queue length (per worker, per task name etc.). This may not be easy as each broker requires different treatment but rabbitmq/redis are most popular, could get started with those.

Our workload is strictly CPU intensive and we need to know when we are saturating the workers and need to buy more VMs.

I was thinking of trying to implement this and make a PR, it would be a damn shame just to do it for my work only :)

Would you be ok with me making a PR for this? Cannot promise anything time-wise as it is super busy now but I should get started this weekend (just get the repo and start testing my options, reading the code etc., design approach to this).

I think the main idea is to reuse what you already have and add more handlers. To measure the queueing time it's simply:
task started time - task received time, but please correct me if I am wrong here.

We could also check the latency: task received time - task sent time.
The task run time should already be present on the event itself, I believe (I haven't inspected the events in debug mode for a while though).
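
For illustration, a queueing-time measurement along those lines might look like the sketch below (assuming prometheus_client and Celery's standard task event fields; the names are made up, not the exporter's code):

from prometheus_client import Histogram

task_queue_time = Histogram(
    "celery_task_queue_time_seconds",
    "Time between task-received and task-started.",
    ["name"],
)
_received = {}  # task uuid -> (timestamp, task name)

def on_task_received(event):
    _received[event["uuid"]] = (event["timestamp"], event["name"])

def on_task_started(event):
    entry = _received.pop(event["uuid"], None)
    if entry is not None:
        received_ts, name = entry
        task_queue_time.labels(name=name).observe(event["timestamp"] - received_ts)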

Initially the metric could be by host/task name as the existing ones you have, later maybe could be combined for queues/workers if possible.

Please let me know what you think about this idea and if contribution is ok :).

Tom

Statistics in Grafana not working

I deployed the exporter, but the statistics in Grafana do not work.

Prometheus:

  - job_name: "Celery"
    static_configs:
    - targets: ["10.50.0.1:9808"]

curl http://10.50.0.1:9808/metrics:

# HELP celery_task_sent_total Sent when a task message is published.
# TYPE celery_task_sent_total counter
celery_task_sent_total{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 11.0
celery_task_sent_total{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 0.0
celery_task_sent_total{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 0.0
celery_task_sent_total{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 0.0
celery_task_sent_total{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 0.0
celery_task_sent_total{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 0.0
# HELP celery_task_sent_created Sent when a task message is published.
# TYPE celery_task_sent_created gauge
celery_task_sent_created{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 1.6855433891455138e+09
celery_task_sent_created{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 1.6855776002541292e+09
celery_task_sent_created{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 1.6855776002659323e+09
celery_task_sent_created{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 1.6855776002684853e+09
celery_task_sent_created{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 1.685577600423241e+09
celery_task_sent_created{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 1.68559200013313e+09
# HELP celery_task_received_total Sent when the worker receives a task.
# TYPE celery_task_received_total counter
celery_task_received_total{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 13.0
celery_task_received_total{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 1.0
celery_task_received_total{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 1.0
celery_task_received_total{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 1.0
celery_task_received_total{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 1.0
celery_task_received_total{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 1.0
# HELP celery_task_received_created Sent when the worker receives a task.
# TYPE celery_task_received_created gauge
celery_task_received_created{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 1.6855433895425422e+09
celery_task_received_created{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 1.6855776002541919e+09
celery_task_received_created{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 1.685577600265991e+09
celery_task_received_created{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 1.6855776002685437e+09
celery_task_received_created{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 1.6855776004233027e+09
celery_task_received_created{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 1.685592000133328e+09
# HELP celery_task_started_total Sent just before the worker executes the task.
# TYPE celery_task_started_total counter
celery_task_started_total{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 13.0
celery_task_started_total{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 1.0
celery_task_started_total{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 1.0
celery_task_started_total{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 1.0
celery_task_started_total{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 1.0
celery_task_started_total{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 1.0
# HELP celery_task_started_created Sent just before the worker executes the task.
# TYPE celery_task_started_created gauge
celery_task_started_created{hostname="instance32",name="app.email_services.send_email",queue_name="celery"} 1.6855433895425956e+09
celery_task_started_created{hostname="instance32",name="app.tasks.search_for_new_available_staff",queue_name="celery"} 1.6855776002542472e+09
celery_task_started_created{hostname="instance32",name="app.tasks.search_for_new_available_shifts",queue_name="celery"} 1.6855776002660449e+09
celery_task_started_created{hostname="instance32",name="app.tasks.search_for_expired_shifts",queue_name="celery"} 1.6855776002685978e+09
celery_task_started_created{hostname="instance32",name="app.tasks.search_for_unconfirmed_shifts",queue_name="celery"} 1.6855776004233575e+09
celery_task_started_created{hostname="instance32",name="celery.backend_cleanup",queue_name="celery"} 1.6855920001335135e+09

Celery / Tasks / Overview:

image

Question - a lot of metrics with the same name but different "hostname"

Hi! Thank you for creating and supporting such a helpful project!

I have a question - we use celery==5.2.6 and the latest celery-exporter (docker) and using Grafana Cloud (free version)
After a week I see that we ran out of the free tier on Grafana and we have more than 10,000 metrics for Celery!

What happened:
There are a lot of metrics with hostname=gen1234@myhostname and only one with celery@myhostname:

celery_task_received_total{hostname="gen516136@myhostname",name="app.tasks.send_email"} 0.0
celery_task_received_total{hostname="gen516136@myhostname",name="app.tasks.send_email"} 0.0
celery_task_received_total{hostname="celery@myhostname",name="app.tasks.send_email"} 0.0

It looks like the gen1234 metrics are created for each thread in the worker, and after a restart of the Celery worker daemon celery-exporter keeps sending these metrics to Prometheus even if they don't have new data. Restarting helps here, so I just configured the CD process so that we restart celery-exporter's Docker container after the Celery workers.

The question is - is there a way to filter out them from the metrics and what is their purpose?

Need prerequisites to run.

First I had to switch my base Docker image to an OS with ldd version > 2.28; now I'm stuck on

ModuleNotFoundError:
No module named 'celery.fixups'

Probably an option to install it via pip would be nice.

Suggestion: add README for setting "celery-exporter" as datasource

As the title suggests, it would be nice if there was a README on how to add this exporter as a datasource in Grafana.

For me, I'm having trouble getting my local Grafana instance to find the exporter's HTTP server. I'm using Compose to bring up my containers, but I still keep getting a 404 when adding a new datasource (http://celery-exporter:9808/metrics). I'm able to see my tasks at localhost:9808/metrics, as well as curl the metrics endpoint from a container on the network. It's quite strange.

Edit:

Figured it out. Your Flask server does not expose api/v1/query, which Grafana relies on to query metrics

Screen Shot 2021-04-02 at 3 40 33 PM

Edit 2:

Adding image from container curl
Screen Shot 2021-04-02 at 3 57 48 PM
