prometheus / client_python Goto Github PK

View Code? Open in Web Editor NEW

3.9K 71.0 794.0 6.05 MB

Prometheus instrumentation library for Python applications

License: Apache License 2.0

Python 100.00%

instrumentation python prometheus metrics

client_python's Introduction

Prometheus Python Client

The official Python client for Prometheus.

Installation

pip install prometheus-client

This package can be found on PyPI.

Documentation

Documentation is available on https://prometheus.github.io/client_python

client_python's People

Contributors

Stargazers

Watchers

Forkers

brian-brazil andreafagan korfuri afefelov marcusmartins arturhoo justyns gdvalle barkerd427 machineperson andersschuller kvikas njohns-grovo clusterhq pbdeuchler iksaif shamrin pombredanne deedubs bendemaree jinty samdroid-apps willhogan ericl devopstiger bugcy013 kingsqv rmohr cigor alex thomasdesr spfz syncano ouziel-slama fauxfaux ulope paulcavallaro vad jpbelleau acdha rod-dot-codes rud dewsly dattanh pramodtoraskar leadpages jonashaag tim-seoss deathowl mrueg informatic seanhoughton lispython techscientist maxiv-kitscontrols brianjgmartin pyatil antialiasis jamessewell jcaxmacher supriya-premkumar microdog danni-m okdtsk tomscytale tuksik mariamsawires kyleroot stephanerb savoie beezz dds justdoit0823 gustavofcarvalho blackduckx real666maverick kstrempel namliz madskristiansen dergraf colemanja91 dmick closeio melnikk quantumghost kehao95 reymont bz2 manics hedgepigdaniel hugoren shadow4125 aimestereo aneeshkp edwardbetts plynte noirbee david-caro insequent abuisine

client_python's Issues

Metrics all to zero ... noob

Hi,
thanks for your work on the lib!
disclaimer: i'm a noob in prometheus and in using this lib... so i might miss something obvious

i'm trying to use it in my flask app
so i did :

c = Counter('my_failures_total', 'Description of counter')

@app.route('/brol')
def brol():
    c.inc()     # Increment by 1
    c.inc(1.6)  # Increment by given value
    return 'hello world'


if __name__ == '__main__':
    start_http_server(9012)
    app.run()

i go with my browser on localhost:5000/brol multiple times...
then go to localhost:9012 and the response i got is :

# HELP my_failures_total Description of counter
# TYPE my_failures_total counter
my_failures_total 0.0

even if i refresh /brol or :9012 the my_failures_total is always 0.

btw i often got in my console


Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Users/bmaron/Code/profiler-gui/venv/lib/python2.7/site-packages/prometheus_client/exposition.py", line 86, in run
    httpd = HTTPServer((addr, port), MetricsHandler)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 420, in __init__
    self.server_bind()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/BaseHTTPServer.py", line 108, in server_bind
    SocketServer.TCPServer.server_bind(self)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 434, in server_bind
    self.socket.bind(self.server_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 48] Address already in use

i suppose it's related, but tried to relaunch, change port, ... it seems to be caused by the way i instantiate the start_http_server

Add the standard exports

We should have the standard exports such as cpu time, start time and memory.

generate_latest causes a 500 internal server error

I'm using https://github.com/ekesken/prom_marathon_exporter to get metrics from marathon.

It uses the generate_latest function from exposition.py which seems to crash on following metric: "jvm.threads.deadlocks":{"value":[]} which is in the gauges array on the marathon master. It is expecting a float in stead of an empty array.

I'm currently using marathon Version 1.1.2.

I'm not sure whether this is an issue with this module or with the prom_marathon_exporter.

Stacktrace:
ERROR:http:Exception on /metrics [GET]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functionsrule.endpoint
File "/prom_marathon_exporter/http.py", line 25, in metrics
prom_metrics = generate_latest(REGISTRY)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 67, in generate_latest
output.append('{0}{1} {2}\n'.format(name, labelstr, core._floatToGoString(value)))
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 558, in _floatToGoString
elif math.isnan(d):
TypeError: a float is required

Support sharing registry across multiple workers (where possible)

Similar to prometheus/client_ruby#9, when using the python client library on workers that get load balanced by something like gunicorn or uwsgi, each scrape it hits only one worker since they can't share state with others.

At least uwsgi supports sharing memory: http://uwsgi-docs.readthedocs.org/en/latest/SharedArea.html
This should be used to share the registry across all workers. Maybe gunicorn supports something similar.

The description of the repo is… misleading ;)

"Prometheus instrumentation library for Go applications" should be "Prometheus instrumentation library for Python applications" I think.

Please add support for pushgateways protected with basic authentication

Currently there doesn't seem to be any way to provide basic authentication credentials for talking to pushgateways (urllib2 doesn't allow putting them in the URL). Would be great if this could be added.

Non-string label values cause export-time failures

Label values are not checked for string-likeness, and are not explicitly converted to strings. This causes the following problems:

c = Counter('http_requests_total_by_method', 'Count of requests by HTTP method.', ['method']

def handle_request(request):
  c.labels(request.method).inc()

to fail silently if request.method is None (or a tuple, or an object, etc.). At export time, the following will happen:

Exception happened during processing of request from ('127.0.0.1', 55703)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
    self.handle()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "/.../env/local/lib/python2.7/site-packages/prometheus_client-0.0.7-py2.7.egg/prometheus_client/__init__.py", line 404, in do_GET
    self.wfile.write(generate_latest(REGISTRY))
  File ".../env/local/lib/python2.7/site-packages/prometheus_client-0.0.7-py2.7.egg/prometheus_client/__init__.py", line 392, in generate_latest
    for k, v in labels.items()]))
AttributeError: 'NoneType' object has no attribute 'replace'

There are several ways around this:

Raise a ValueError in c.labels() if the label values are not all string-like
In c.labels() or in generate_latest, coerce all label values to the unicode (or string) type
In generate_latest, drop labels that can't be rendered, but render the other labels for that metric.
In generate_latest, drop metrics that can't be rendered, but render the rest.

I'm not a fan of the options that drop data, because there's no worse monitoring than monitoring that lies to you. I think coercing to string may work, but I'd prefer if the code could raise. It already raises if the code provides the wrong number of label values, so raising on the wrong type of label values isn't unexpected.

Add set(>=_value) for Counters

Counters should only increase, but sometimes – when instrumenting third party software like I do right now – I only get snapshots of the counters. But it’s still totally counters so it would be nice if I could treat them as such.

If you’d agree this is useful, I’d make a quick PR that adds a protected set() method that allows to set a counter to >=self._value.

Callback function to let program know that collection is starting

I would like to collect metrics from a program I can't modify. The only way is to query it with UDP packets. Querying the program can be easily implemented in python. However, it is only necessary to do this as often as prometheus queries the instance.
As I see it, there is currently no way to let the program know when prometheus is making a http request. Such information could be used to trigger collection from the program via UDP once, saving ressources and bandwidth.

how to add labels with pushgateway example?

g.labels says:
object has no attribute 'labels'
thanks

Handle multiprocess setups using preloading and equivilents

Sometimes application raise error.

At line https://github.com/prometheus/client_python/blob/master/prometheus_client/core.py#L361

Stacktrace (последний вызов снизу):

  File "flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "flask/app.py", line 1473, in full_dispatch_request
    rv = self.preprocess_request()
  File "flask/app.py", line 1666, in preprocess_request
    rv = func()
  File "core/middleware.py", line 43, in before_request_middleware
    metrics.requests_total.labels(env_role=metrics.APP_ENV_ROLE, method=request.method, url_rule=rule).inc()
  File "prometheus_client/core.py", line 498, in labels
    self._metrics[labelvalues] = self._wrappedClass(self._name, self._labelnames, labelvalues, **self._kwargs)
  File "prometheus_client/core.py", line 599, in __init__
    self._value = _ValueClass(self._type, name, name, labelnames, labelvalues)
  File "prometheus_client/core.py", line 414, in __init__
    files[file_prefix] = _MmapedDict(filename)
  File "prometheus_client/core.py", line 335, in __init__
    for key, _, pos in self._read_all_values():
  File "prometheus_client/core.py", line 361, in _read_all_values
    encoded = struct.unpack_from('{0}s'.format(encoded_len).encode(), self._m, pos)[0]

We run application via uwsgi:

[uwsgi]
chdir=/usr/src/app/
env = APP_ROLE=dev_uwsgi
wsgi-file = /usr/src/app/app.wsgi
master=True
vacuum=True
max-requests=5000
harakiri=120
post-buffering=65536
workers=16
listen=4000
# socket=0.0.0.0:8997
stats=/tmp/uwsgi-app.stats
logger=syslog:uwsgi_app_stage,local0
buffer-size=65536
http = 0.0.0.0:8051

Decorators break function signatures

The client does a common anti-pattern of doing a closure with *args,**kwargs but that sadly breaks introspection of wrapped methods.

This is a concrete problem e.g. if you wrap Pyramid views that get different arguments depending on how many arguments are present (either context, request or just request).

You can find more information on this in these blog posts: https://github.com/GrahamDumpleton/wrapt/tree/master/blog

Solutions to it is to either use or copy/paste from the wrapt (better) or decorator (OK-ish) packages.

Not at all clear how to add labels to the Summary(...) time() decorator pattern

In the README, in one of the samples, we have:

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):

But what if I want to add labels to this? I can't figure out how to make it work. My current code which doesn't work is:

TRADE_REQUEST_TIME = Summary(
    'request_processing_seconds',
    'Time spent processing a request',
    labelnames=('method', 'endpoint', 'whitelabel'),
    labelvalues=('get', '/trade/', settings.WHITELABEL_NAME))


@TRADE_REQUEST_TIME.time()
def render_trade_page(request, sport):
    ...

But I get:

    @TRADE_REQUEST_TIME.time()
AttributeError: '_LabelWrapper' object has no attribute 'time'

Any advice?

Reduce locks in multiproc mode

Currently there's a lock on the metric, and one on the mmapped file.

Given that multiproc is in use it's unlikely that any threading is going on (and there's still the GIL) so these could both be replaced by a single global lock. This would bring us down to one mutex, which is the same as non-multiproc which means we'd have about the same performance.

Print out response body when a 500 Internal Server Error is received from the pushgateway

I accidentally sent malformed data to a push gateway and got back a 500 Internal Server Error. The body of the response had information about the malformed data but all that was printed was the fact that I got a 500 Internal Server Error.

Example code:

import prometheus_client

reg = prometheus_client.CollectorRegistry()
c1 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c1.labels({"baz": 1, "stuff": 2}).inc(1)
c2 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c2.labels({"baz": 3, "stuff": 4}).inc(2)
prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)

Output:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 110, in push_to_gateway
    _use_gateway('PUT', gateway, job, registry, grouping_key, timeout)
  File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 144, in _use_gateway
    resp = build_opener(HTTPHandler).open(request, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error

Calling instance_ip_grouping_key leads to a ResourceWarning on Python 3.4

I noticed this when working on fixing #46. When I run python3 setup.py test, I get the following warning in the test output:

test_instance_ip_grouping_key (tests.test_client.TestPushGateway) ... tests/test_client.py:478: ResourceWarning: unclosed <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 58422)>
self.assertTrue('' != instance_ip_grouping_key()['instance'])

I can also reproduce this warning outside of the tests by running python3 -Wdefault and then calling instance_ip_grouping_key.

Should this function explicitly close the socket it opens instead of relying on it to be cleaned up automatically?

start_http_server should put metrics on a /metrics url by default

Since prometheus assumes the metrics will be on /metrics by default, the client library should make this assumption too (and provide a configurable to change it).

Double quotes in labels shouldn't result in malformed data

Test script:

#!/usr/bin/env python2
from BaseHTTPServer import HTTPServer
from prometheus_client import Counter
from prometheus_client import MetricsHandler

counter = Counter('cronjobs_total', 'All cronjobs', ['server', 'username', 'command'])
counter.labels('someserver', 'someaccount', 'cd "/foo/bar/"').inc()

server_address = ('', 7700)
httpd = HTTPServer(server_address, MetricsHandler)
httpd.serve_forever()

When you add this as a Prometheus endpoint you will get a scraping error:

text format parsing error in line 3: unexpected end of label value %!q(*string=0xc20a6bc040)

The generated output when visiting port 7700 shows unespaced double quotes:

# HELP cronjobs_total All cronjobs
# TYPE cronjobs_total counter
cronjobs_total{username="someaccount",command="cd "/foo/bar/"",server="someserver"} 1.0

Release 0.0.9

Hi :) I'm currently using the client from head because I need d183810 which isn't included in 0.0.8 (even though the source control suggests otherwise). Can you release the latest changes to Pypi? Thanks!

`start_http_server` fails when given an IPv6 address to bind

It would be good not to force an IPv4 fallback.

Pushgateway shouldn't default to default registry

See #39. This should be a required parameter.

tests fail for Python 3.4 on Debian 8

I forked the repo, have working tests on Python2, but they fail on Python 3.4.

python3 --version: Python 3.4.2
Debian 8 (jessie)

python3 setup.py test 2> out.txt

test_block_decorator (tests.test_client.TestCounter) ... ok
test_function_decorator (tests.test_client.TestCounter) ... ok
test_increment (tests.test_client.TestCounter) ... ok
test_negative_increment_raises (tests.test_client.TestCounter) ... ok
test_block_decorator (tests.test_client.TestGauge) ... ok
test_function_decorator (tests.test_client.TestGauge) ... ok
test_gauge (tests.test_client.TestGauge) ... ok
test_gauge_function (tests.test_client.TestGauge) ... ok
test_counter (tests.test_client.TestGenerateText) ... ok
test_escaping (tests.test_client.TestGenerateText) ... ok
test_gauge (tests.test_client.TestGenerateText) ... ok
test_nonnumber (tests.test_client.TestGenerateText) ... ok
test_summary (tests.test_client.TestGenerateText) ... ok
test_unicode (tests.test_client.TestGenerateText) ... ok
test_block_decorator (tests.test_client.TestHistogram) ... ok
test_function_decorator (tests.test_client.TestHistogram) ... ok
test_histogram (tests.test_client.TestHistogram) ... ok
test_labels (tests.test_client.TestHistogram) ... ok
test_setting_buckets (tests.test_client.TestHistogram) ... ok
test_child (tests.test_client.TestMetricWrapper) ... ok
test_incorrect_label_count_raises (tests.test_client.TestMetricWrapper) ... ok
test_invalid_names_raise (tests.test_client.TestMetricWrapper) ... ok
test_labels_by_dict (tests.test_client.TestMetricWrapper) ... ok
test_labels_coerced_to_string (tests.test_client.TestMetricWrapper) ... ok
test_namespace_subsystem_concatenated (tests.test_client.TestMetricWrapper) ... ok
test_non_string_labels_raises (tests.test_client.TestMetricWrapper) ... ok
test_remove (tests.test_client.TestMetricWrapper) ... ok
test_namespace (tests.test_client.TestProcessCollector) ... ok
test_working (tests.test_client.TestProcessCollector) ... ok
test_working_584 (tests.test_client.TestProcessCollector) ... ok
test_working_fake_pid (tests.test_client.TestProcessCollector) ... ok
test_delete (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "DELETE /job/my_job HTTP/1.1" 201 -
ERROR
test_instance_ip_grouping_key (tests.test_client.TestPushGateway) ... /root/client_python/tests/test_client.py:479: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketType.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 53921)>
  self.assertTrue('' != instance_ip_grouping_key()['instance'])
ok
test_push (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job HTTP/1.1" 201 -
ERROR
test_push_with_complex_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job/a/9/b/a%2F+z HTTP/1.1" 201 -
ERROR
test_push_with_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job/a/9 HTTP/1.1" 201 -
ERROR
test_pushadd (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "POST /job/my_job HTTP/1.1" 201 -
ERROR
est_pushadd_with_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "DELETE /job/my_job/a/9 HTTP/1.1" 201 -
ERROR
test_block_decorator (tests.test_client.TestSummary) ... ok
test_function_decorator (tests.test_client.TestSummary) ... ok
test_summary (tests.test_client.TestSummary) ... ok
test_labels (tests.graphite_bridge.TestGraphiteBridge) ... ok
test_nolabels (tests.graphite_bridge.TestGraphiteBridge) ... ok
test_sanitizing (tests.graphite_bridge.TestGraphiteBridge) ... ok

======================================================================
ERROR: test_delete (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/client_python/tests/test_client.py", line 465, in test_delete
    delete_from_gateway(self.address, "my_job")
  File "/root/client_python/prometheus_client/exposition.py", line 103, in delete_from_gateway
    _use_gateway('DELETE', gateway, job, None, grouping_key, timeout)
  File "/root/client_python/prometheus_client/exposition.py", line 122, in _use_gateway
    resp = build_opener(HTTPHandler).open(request, timeout=timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1202, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1177, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

======================================================================
ERROR: test_push (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_push_with_complex_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_push_with_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_pushadd (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_pushadd_with_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
----------------------------------------------------------------------
Ran 44 tests in 0.062s

FAILED (errors=6)

Histogram for Event listeners

I am new to this and I don't know how to solve this.

We are using locust.io for our testing and we want to add prometheus for our metrics. Locust provides events http://docs.locust.io/en/latest/extending-locust.html which has request type , name, time and length for all methods. Here is the sample

from locust import events

def my_success_handler(request_type, name, response_time, response_length, **kw):
    print "Successfully fetched: %s" % (name)

events.request_success += my_success_handler

We would like to use this functionality to provide histogram data for our metrics. But one thing we realized is that we couldn't provide labels to histogram to identify which request is it for.

Is there any way we could achieve this instead of creating an histogram for each method. We have 10's of REST API calls we test with locust.

Thanks

RFE: Summary doesn't provide quantiles

ValueError when getting open file descriptors limit

Code in process_collectors.py relies on parsing /proc/PID/limits when gettings system limits. However, on some configurations, there could be "unlimited" string instead of a numeric value in that file. This lead to ValueError when this string is passed into float method.

Full example exception is as follows:

Traceback (most recent call last):
  File "/venv/lib/python3.5/site-packages/tornado/web.py", line 1443, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "/venv/lib/python3.5/site-packages/video/stats/events/endpoint/endpoint.py", line 194, in get
    resp = generate_latest(core.REGISTRY)
  File "/venv/lib/python3.5/site-packages/prometheus_client/exposition.py", line 33, in generate_latest
    for metric in registry.collect():
  File "/venv/lib/python3.5/site-packages/prometheus_client/core.py", line 54, in collect
    for metric in collector.collect():
  File "/venv/lib/python3.5/site-packages/prometheus_client/process_collector.py", line 83, in collect
    'Maximum number of open file descriptors.', value=float(line.split()[3]))
ValueError: could not convert string to float: 'unlimited'

This happens in a Docker container with ubuntu, for instance, which has:

root@87220b5caf91:/app# cat /proc/76888/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            9223372036854775807  bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             35184372088832       bytes
Max core file size        unlimited            9223372036854775807  bytes
Max resident set          unlimited            1073741824           bytes
Max processes             unlimited            4000                 processes
Max open files            unlimited            65536                files
Max locked memory         unlimited            1073741824           bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       128                  512                  signals
Max msgqueue size         unlimited            8192                 bytes

Parsing untyped metrics

Parsing of untyped metrics yeilds more metric families than it should (got 3, expected 1 with 2 samples).
Or am I missing something?

>> 
from prometheus_client.parser import text_string_to_metric_families
text = """# HELP redis_connected_clients Redis connected clients
# TYPE redis_connected_clients untyped
redis_connected_clients{instance="rough-snowflake-web",port="6380"} 10.0
redis_connected_clients{instance="rough-snowflake-web",port="6381"} 12.0
"""
for m in text_string_to_metric_families(text):
    print m.name, m.samples

...

redis_connected_clients []
redis_connected_clients [(u'redis_connected_clients', {u'instance': u'rough-snowflake-web', u'port': u'6380'}, 10.0)]
redis_connected_clients [(u'redis_connected_clients', {u'instance': u'rough-snowflake-web', u'port': u'6381'}, 12.0)]

Too many decimal places in output

Some bibliography so it doesn't sound like pure opinion:

[from Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics]

Also, Mathematica, Octave, MATLAB and other mathematical software hide some decimal places when it doesn't matter.

I get some lines like the following:

request_processing_seconds_sum 9.3512252295
request_latency_seconds_sum 15.77115141185

And I am suggesting that they get changed to:

request_processing_seconds_sum 9.351
request_latency_seconds_sum 15.771

How many decimal places we are going to use is open for discussion, but 10 seems not an OK decision for a final product.

There is also a need to define where we are going to perform the rounding (we could truncate, but rounding seems statistically fairer). It could be done in the _samples functions of the classes defined in core.py, but maybe there is a better place.

Thanks for reading.

ImportError: No module named 'resource' on Windows

I tried to use the client on Windows with python 2.7 and i have the following error :

ImportError: No module named 'resource'

This error occurs because 'resource' module is only available for UNIX system.

Is 'resource' module usage could be optional in a Windows application ?

Regards.

Problem with labels method on metrics objects

I am expecting to be able to set labels on a Counter like in the examples:

from prometheus_client import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels(method='get', endpoint='/').inc()

I am using prometheus-client (0.0.14) from PyPI.

$ python
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 26 2016, 12:10:39) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from prometheus_client import Counter, CollectorRegistry
>>> reg = CollectorRegistry()
>>> c = Counter('items_total', 'number of items total', registry=reg)
>>> c.labels(first=1, another=2).inc()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Counter' object has no attribute 'labels'
>>>
>>> type(c)
<class 'prometheus_client.core.Counter'>
>>> dir(c)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_reserved_labelnames', '_samples', '_type', '_value', 'collect', 'count_exceptions', 'inc']

The Counter object does not seem to have a labels method attached to it. Perhaps some recent changes has affected the application of the decorator that applies it to the metrics?
#99 kind of indicates this same problem but has words relating it to pushgateway.

Expose http endpoint details

I’m building an infrastructure that uses service discovery to find metrics. Now, I find it very tedious to set ports by hand so I prefer to listen on port 0 and then use service discovery to tell prometheus about it:

>>> from BaseHTTPServer import HTTPServer
>>> h = HTTPServer(("", 0), None)
>>> h.server_port
57782
>>> h.server_name
'alpha-2.local'

Currently there’s no way to achieve that using client_python. It would be super helpful if you’d expose httpd from https://github.com/prometheus/client_python/blob/master/prometheus_client/exposition.py#L64 .

As a bonus point, this also solves the multiple processes problem from #30 in a more prometheus-like way I find: just expose multiple metrics and let prometheus figure it out.

Allow GraphiteBridge to set a prefix

Hi,

while testing and maybe transitioning, I’m very grateful for the GraphiteBridge.

However as far as I can tell, there’s no way to namespace (with a dot) my metrics and they just land in the main “directory”.

Could you please add a way to set a prefix that is prepended to all metrics?

Multiprocessing works differently in Linux and Windows

Hi
That's my small example script. Create a subdirectory "metrics" in the folder from you are running the script and set the environment variable.

#!/usr/bin/python
'''
This is just an example of how to use prometheus_client in multiprocessing application
'''
import os
if 'prometheus_multiproc_dir' not in os.environ:
    os.environ['prometheus_multiproc_dir']='metrics'

from prometheus_client import Counter
from prometheus_client import start_http_server
from prometheus_client import CollectorRegistry
from prometheus_client.multiprocess import MultiProcessCollector
from prometheus_client import core
from multiprocessing import Pool


def worker(n):
    counter = Counter('counter', 'counter', registry=None)
    counter.inc()

def main():
    print("Main PID", os.getpid())
    clear_metrics_dir()
    registry = CollectorRegistry()
    MultiProcessCollector(registry)
    core.REGISTRY = registry
    start_http_server(8000)

    p = Pool(processes=10)
    p.map_async(worker, range(300000))

    p.close()
    p.join()
    print(registry.get_sample_value('counter'))


def clear_metrics_dir():
    dir = os.getcwd()
    os.chdir(os.environ['prometheus_multiproc_dir'])
    [os.remove(file) for file in os.listdir()]
    os.chdir(dir)


if __name__ == '__main__':
    main()

If I run this in Windows I got this output:

Main PID 27372
300000.0

and there are 10 files in the metrics subdirectory.

But on Linux the output is different:

Main PID 24671
42961.0

And there is only one file in the directory.

I'm not quite sure but my guess is the following:
In the core.py at lines 441-444 the variable _ValueClass is initialized:

if 'prometheus_multiproc_dir' in os.environ:
    _ValueClass = _MultiProcessValue()
else:
    _ValueClass = _MutexValue

And then that class is instantiated and it creates the files where the metrics data will be stored.
The class itself is created in the function _MultiProcessValue() and it uses os.getpid() as a default value for the argument.
In Windows when you start another processes with multiprocessing your main module is reloaded for each of the processes. And the prometheus_client library is also reloaded and the class is constructed for each of the new process.
In Lunux the module is loaded only once and therefore the class is created also only once.
All that makes the library write all the metrics of all the processes to the same file on Linux! That makes the metrics basically wrong, and in the worst case the file got corrupted and the HTTP server can not read from the file.
Though the stuff seems to work well if the load on the processes is small. E.g. if you put sleep in the worker function and reduce the number of cycles it can (sometimes) work fine.

If i change the core.py in this way, it works fine on both Linux and Windows (lines 395-406):

def _MultiProcessValue(__pid=os.getpid()):
    files = {}
    files_lock = Lock()

    class _MmapedValue(object):
        '''A float protected by a mutex backed by a per-process mmaped file.'''

        _multiprocess = True

        def __init__(self, typ, metric_name, name, labelnames, labelvalues, multiprocess_mode='', **kwargs):
            pid = os.getpid()
            if typ == 'gauge':

but then the unit tests will most probably fail.

Lack of process_* metrics

Recently I’ve run into the problem that my processes don't have any process_ metrics although the platform (Ubuntu Trusty) hasn’t changed.

I’m not sure how to debug the problem, do you have any idea what could lead to this problem? They are not using the new multiprocessing feature.

Initializing a metric with a tuple for labels leads to incorrect label count

g = prometheus_client.Gauge(
            "my_super_special_metric",
           "This is  ametric", 
           ("label"))
g.labels({"label" : "somevalue"}).set(1)

Leads to "incorrect label names".
Changing it to:

g = prometheus_client.Gauge(
            "my_super_special_metric",
           "This is  ametric", 
           ["label"])
g.labels({"label" : "somevalue"}).set(1)

works normally.

Since both are iterable types, it would be idiomatic python to ensure that any non-string sequence type supplied for the labels be treated as a list of labels.

how to instrument a bottle app

Hi, this a newbie question. How can I instrument a python WSGI bottle or flask app that has its own server ?

two threads ?

import sys

from bottle import route, run
from prometheus_client import start_http_server, Summary
import random
import time

REQUEST_TIME = Summary('request_processing_seconds','Time Spent processing request')

@route('/')
@REQUEST_TIME.time()
def handler():
    print "Hello world"
    time.sleep(random.random)

run(host='0.0.0.0', port=8080)

How to remove counters and labels from web server export

Hello
Could you please explain me how to delete some counters/labels from output?
For example i created a gauge
g = Gauge('superapp_performance', 'SuperDuper app RPS gauge',
['instance','location','country'])
Then i updating it with info from apps:
g.labels(instance_id,location_name,country_name).set(hugeRPS)

And at some point i don't have this instance anymore, and i want to delete information about it. How to do it ?
Thanks

Add text format parser

Zmon has written a partial implementation, we should provide one out of the box.

Adding a custom path to the http server for health checks

Hi guys, I am using the library to implement a grpc service running on mesos (via aurora) and using the prometheus port to handle health checks as well as exporting metrics. I wonder therefore if there is an easy way to add a new path to the http server used within. In other words, I want port:/internal/status or similar to return empty HTTP:OK without any metrics.

Thanks and cheers,
Simon

process_collector.py assumes /proc/self/fd is readable for process_open_fds

Apache with prefork MPM, not threaded, runs process_collector.py as the chosen non-root user, but /proc/self/fd/ is still owned by root:root so its 0500 permissions cause an OSError exception, e.errno == EACCES, on the os.listdir() attempt. A patch to give up on that metric when that occurs: process_open_fds.txt

stop_http_server()

There is no way to stop the metrics server if you need to reconfigure (change the port).

Export _INF publicly

Hi,

For my Django exporter I need to import _INF from prometheus_client because I build my own buckets (powers of 2, powers of 10) and I need _INF at the end.

Could you expose it publicly in the API (in init.py ideally, since it was moved in 0.0.10)?

Throw 500 error if an exception is encountered

Hello all,

At the moment if a Python exception is encountered it looks like a 200 reponse with a blank body is returned. This does not result in up{job='jobname'} returning 0.

That's fine if you are using an up_jobname metric alongside the up metric - but some people dual use the up metric so it means exporter OR app are down (as suggested in the documentation).

An option to choose the behavior would be great.

In the meantime is there a way to explicitly throw a 5xx?

Cheers,
James

An exception in a custom collector results in a 200 with a blank body

If you have a custom collector registered, and that collector exceptions, then the client produces a 200 with no metrics; not even the build in ones.

#!/usr/bin/python3

import time
from prometheus_client import start_http_server
from prometheus_client.core import REGISTRY


class CustomCollector(object):
    def collect(self):
        raise Exception('bang')

REGISTRY.register(CustomCollector())
start_http_server(23456)

while True:
    time.sleep(1e9)

% curl -v http://localhost:23456/
*   Trying ::1...
* connect to ::1 port 23456 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 23456 (#0)
> GET / HTTP/1.1
> Host: localhost:23456
> User-Agent: curl/7.47.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.5.1+
< Date: Thu, 16 Jun 2016 09:18:40 GMT
< Content-Type: text/plain; version=0.0.4; charset=utf-8
< 
* Closing connection 0

----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 43974)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 341, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 415, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 403, in handle_one_request
    method()
  File "/home/faux/code/client_python/prometheus_client/exposition.py", line 76, in do_GET
    self.wfile.write(generate_latest(core.REGISTRY))
  File "/home/faux/code/client_python/prometheus_client/exposition.py", line 55, in generate_latest
    for metric in registry.collect():
  File "/home/faux/code/client_python/prometheus_client/core.py", line 54, in collect
    for metric in collector.collect():
  File "err.py", line 10, in collect
    raise Exception('bang')
Exception: bang
----------------------------------------

I believe this should either return a 500, or return the rest of the metrics, with this missing, or the rest of the metrics and a client_generation_errors{collector="CustomCollector"} 1, or...

can generate malformed data by creating a duplicate Counter

I accidentally generated malformed data by creating two Counters on the same Registry that had the same name, help text.

Example code:

import prometheus_client

reg = prometheus_client.CollectorRegistry()
c1 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c1.labels({"baz": 1, "stuff": 2}).inc(1)
c2 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c2.labels({"baz": 3, "stuff": 4}).inc(2)
prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)

Data sent in request to push gateway:

# HELP foo bar
# TYPE foo counter
foo{baz="1",stuff="2"} 1.0
# HELP foo bar
# TYPE foo counter
foo{baz="3",stuff="4"} 2.0

Response sent from the push gateway:

text format parsing error in line 4: second HELP line for metric name "foo"

I'd prefer either getting an error from prometheus_client telling me that I can't add two identical metrics. Alternatively, I'd like for prometheus_client to merge the two metrics and not send two identical HELP lines.

Use getrusage in ProcessCollector for CPU

This gets more accurate numbers and also works on Mac - however only works on the process itself.

Support for co-routines

Hi,

currently, the Summary timing metric (probably others too but that’s the one I played with) is useless if you want to use it with generators (yield/yield from) since they return immediately but run much longer.

This is especially painful if want to use asyncio on Python 3.4+.

It would be great if the prometheus Python client could grow some kind of support for that use case.

Add support for Tornado based exporter

We're considering using this in the Jupyter project, and would love for it to have a Tornado exporter similar to the twisted one.

Partial Labels

Unless I’m missing something, it’s currently necessary to specify all labels at once. It would be nice tho if I could do something like this:

M = Counter("a counter", "counter that counts", ["id", "type"])
...
c = M.labels({"id": id})  # id is determined at runtime/initialization
...
c.labels({"type": "internal").inc()  # type is unclear until it happens
...
c.labels({"type": "external").inc()

Does that make sense?

Gauge should allow timing durations

Useful for batch jobs

Multiproc files not cleaned up if worker gets segfault/oom/...

In multiproc mode, if one of the gunicorn workers gets killed because of a segfault or out-of-memory situation, it doesn't get a chance to clean up its .db files. These files are thus left in the multiproc folder, distorting the metrics.

Maybe this can be solved by moving the cleanup hook into the gunicorn "master" process. My suggestion is to solve this in the Python client instead, by deleting all .db files with dead PIDs before aggregating the metrics.