The official Python client for Prometheus.
pip install prometheus-client
This package can be found on PyPI.
Documentation is available on https://prometheus.github.io/client_python
Prometheus instrumentation library for Python applications
License: Apache License 2.0
Hi,
thanks for your work on the lib!
Disclaimer: I'm new to Prometheus and to this library, so I might be missing something obvious.
I'm trying to use it in my Flask app. So I did:
c = Counter('my_failures_total', 'Description of counter')

@app.route('/brol')
def brol():
    c.inc()     # Increment by 1
    c.inc(1.6)  # Increment by given value
    return 'hello world'

if __name__ == '__main__':
    start_http_server(9012)
    app.run()
I visit localhost:5000/brol in my browser multiple times, then go to localhost:9012, and the response I get is:
# HELP my_failures_total Description of counter
# TYPE my_failures_total counter
my_failures_total 0.0
Even if I refresh /brol or :9012, my_failures_total is always 0.
By the way, I often get this in my console:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/Users/bmaron/Code/profiler-gui/venv/lib/python2.7/site-packages/prometheus_client/exposition.py", line 86, in run
httpd = HTTPServer((addr, port), MetricsHandler)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 420, in __init__
self.server_bind()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/BaseHTTPServer.py", line 108, in server_bind
SocketServer.TCPServer.server_bind(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 434, in server_bind
self.socket.bind(self.server_address)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 48] Address already in use
I suppose it's related. I tried relaunching, changing the port, etc.; it seems to be caused by the way I instantiate the start_http_server.
We should have the standard exports such as cpu time, start time and memory.
I'm using https://github.com/ekesken/prom_marathon_exporter to get metrics from marathon.
It uses the generate_latest function from exposition.py, which seems to crash on the following metric: "jvm.threads.deadlocks":{"value":[]}, which is in the gauges array on the marathon master. It expects a float instead of an empty array.
I'm currently using Marathon version 1.1.2.
I'm not sure whether this is an issue with this module or with the prom_marathon_exporter.
Stacktrace:
ERROR:http:Exception on /metrics [GET]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/prom_marathon_exporter/http.py", line 25, in metrics
prom_metrics = generate_latest(REGISTRY)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 67, in generate_latest
output.append('{0}{1} {2}\n'.format(name, labelstr, core._floatToGoString(value)))
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/core.py", line 558, in _floatToGoString
elif math.isnan(d):
TypeError: a float is required
Similar to prometheus/client_ruby#9: when using the Python client library on workers that are load balanced by something like gunicorn or uwsgi, each scrape hits only one worker, since the workers can't share state with the others.
At least uwsgi supports sharing memory: http://uwsgi-docs.readthedocs.org/en/latest/SharedArea.html
This should be used to share the registry across all workers. Maybe gunicorn supports something similar.
"Prometheus instrumentation library for Go applications" should be "Prometheus instrumentation library for Python applications" I think.
Currently there doesn't seem to be any way to provide basic authentication credentials for talking to pushgateways (urllib2 doesn't allow putting them in the URL). It would be great if this could be added.
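As a workaround sketch: the credentials can be attached as an explicit Authorization header on the request before it is opened. The helper name below is invented, and the PUT override mirrors what exposition.py already does with get_method:

```python
import base64
from urllib.request import Request  # urllib2.Request on Python 2


def basic_auth_push_request(url, data, username, password):
    # Hypothetical helper: build a pushgateway PUT request carrying
    # Basic auth, since urllib won't accept user:pass in the URL.
    auth = base64.b64encode(('%s:%s' % (username, password)).encode()).decode()
    req = Request(url, data=data)
    req.add_header('Authorization', 'Basic ' + auth)
    req.add_header('Content-Type', 'text/plain; version=0.0.4; charset=utf-8')
    req.get_method = lambda: 'PUT'  # force PUT the way exposition.py does
    return req
```

The request would then be passed to build_opener(...).open() exactly as the library does today.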
Label values are not checked for string-likeness, and are not explicitly converted to strings. This causes code like the following:

c = Counter('http_requests_total_by_method', 'Count of requests by HTTP method.', ['method'])

def handle_request(request):
    c.labels(request.method).inc()

to fail silently if request.method is None (or a tuple, or an object, etc.). At export time, the following will happen:
Exception happened during processing of request from ('127.0.0.1', 55703)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
self.handle()
File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
self.handle_one_request()
File "/usr/lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
method()
File "/.../env/local/lib/python2.7/site-packages/prometheus_client-0.0.7-py2.7.egg/prometheus_client/__init__.py", line 404, in do_GET
self.wfile.write(generate_latest(REGISTRY))
File ".../env/local/lib/python2.7/site-packages/prometheus_client-0.0.7-py2.7.egg/prometheus_client/__init__.py", line 392, in generate_latest
for k, v in labels.items()]))
AttributeError: 'NoneType' object has no attribute 'replace'
There are several ways around this:

- in c.labels(), raise if the label values are not all string-like
- in generate_latest, coerce all label values to the unicode (or string) type
- in generate_latest, drop labels that can't be rendered, but render the other labels for that metric
- in generate_latest, drop metrics that can't be rendered, but render the rest.

I'm not a fan of the options that drop data, because there's no worse monitoring than monitoring that lies to you. I think coercing to string may work, but I'd prefer if the code could raise. It already raises if the code provides the wrong number of label values, so raising on the wrong type of label values isn't unexpected.
Counters should only increase, but sometimes – when instrumenting third-party software, as I am right now – I only get snapshots of the counters. They are still counters, though, so it would be nice if I could treat them as such.
If you agree this is useful, I'd make a quick PR that adds a protected set() method that allows setting a counter to a value >= self._value.
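Roughly what I have in mind (class name invented purely for the sketch; the real change would live on the counter's value cell):

```python
class MonotonicCell(object):
    """Sketch of a protected set(): the stored value may move
    forward but never backward."""

    def __init__(self):
        self._value = 0.0

    def set(self, value):
        # Reject decreases so counter semantics are preserved.
        if value < self._value:
            raise ValueError('counters can only increase')
        self._value = value

    def get(self):
        return self._value
```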
I would like to collect metrics from a program I can't modify. The only way is to query it with UDP packets. Querying the program can easily be implemented in Python, but it only needs to happen as often as Prometheus scrapes the instance.
As I see it, there is currently no way to let my code know when Prometheus is making an HTTP request. Such information could be used to trigger collection from the program via UDP once per scrape, saving resources and bandwidth.
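A custom collector may already give this hook: its collect() method runs once per scrape, so the UDP round-trip only happens when Prometheus actually asks. A sketch, where fetch_stats and the metric name are made up:

```python
from prometheus_client.core import GaugeMetricFamily


class UDPAppCollector(object):
    """Sketch: a collector whose collect() runs on every scrape."""

    def __init__(self, fetch_stats):
        # fetch_stats is a hypothetical callable doing the UDP
        # round-trip and returning a dict of parsed values.
        self._fetch_stats = fetch_stats

    def collect(self):
        stats = self._fetch_stats()  # executed once per scrape
        g = GaugeMetricFamily('app_connections',
                              'Connections reported by the target program')
        g.add_metric([], stats['connections'])
        yield g
```

Registering it with REGISTRY.register(UDPAppCollector(fetch)) would then tie collection to scrape time.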
g.labels says:
object has no attribute 'labels'
thanks
Sometimes the application raises an error.
At line https://github.com/prometheus/client_python/blob/master/prometheus_client/core.py#L361
Stacktrace (most recent call last):
File "flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "flask/app.py", line 1473, in full_dispatch_request
rv = self.preprocess_request()
File "flask/app.py", line 1666, in preprocess_request
rv = func()
File "core/middleware.py", line 43, in before_request_middleware
metrics.requests_total.labels(env_role=metrics.APP_ENV_ROLE, method=request.method, url_rule=rule).inc()
File "prometheus_client/core.py", line 498, in labels
self._metrics[labelvalues] = self._wrappedClass(self._name, self._labelnames, labelvalues, **self._kwargs)
File "prometheus_client/core.py", line 599, in __init__
self._value = _ValueClass(self._type, name, name, labelnames, labelvalues)
File "prometheus_client/core.py", line 414, in __init__
files[file_prefix] = _MmapedDict(filename)
File "prometheus_client/core.py", line 335, in __init__
for key, _, pos in self._read_all_values():
File "prometheus_client/core.py", line 361, in _read_all_values
encoded = struct.unpack_from('{0}s'.format(encoded_len).encode(), self._m, pos)[0]
We run the application via uwsgi:
[uwsgi]
chdir=/usr/src/app/
env = APP_ROLE=dev_uwsgi
wsgi-file = /usr/src/app/app.wsgi
master=True
vacuum=True
max-requests=5000
harakiri=120
post-buffering=65536
workers=16
listen=4000
# socket=0.0.0.0:8997
stats=/tmp/uwsgi-app.stats
logger=syslog:uwsgi_app_stage,local0
buffer-size=65536
http = 0.0.0.0:8051
The client commits a common anti-pattern: wrapping with a bare *args, **kwargs closure, which sadly breaks introspection of wrapped methods.
This is a concrete problem e.g. if you wrap Pyramid views, which take different arguments depending on how many are present (either context, request or just request).
You can find more information on this in these blog posts: https://github.com/GrahamDumpleton/wrapt/tree/master/blog
Solutions are to either use, or copy/paste from, the wrapt (better) or decorator (OK-ish) packages.
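To illustrate with stdlib only (wrapt goes further, but this shows the core of the problem): a bare closure hides the wrapped function's signature, while functools.wraps sets __wrapped__, which inspect.signature follows.

```python
import functools
import inspect


def naive_decorator(func):
    def wrapper(*args, **kwargs):   # this closure hides the signature
        return func(*args, **kwargs)
    return wrapper


def wraps_decorator(func):
    @functools.wraps(func)          # sets __wrapped__, which
    def wrapper(*args, **kwargs):   # inspect.signature follows
        return func(*args, **kwargs)
    return wrapper


def view(context, request):
    return request
```

With the naive version, inspect.signature reports (*args, **kwargs); with the wraps version it reports (context, request), which is what frameworks like Pyramid inspect.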
In the README, in one of the samples, we have:
# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
But what if I want to add labels to this? I can't figure out how to make it work. My current code which doesn't work is:
TRADE_REQUEST_TIME = Summary(
'request_processing_seconds',
'Time spent processing a request',
labelnames=('method', 'endpoint', 'whitelabel'),
labelvalues=('get', '/trade/', settings.WHITELABEL_NAME))
@TRADE_REQUEST_TIME.time()
def render_trade_page(request, sport):
...
But I get:
@TRADE_REQUEST_TIME.time()
AttributeError: '_LabelWrapper' object has no attribute 'time'
Any advice?
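Edit: the following variant seems to work — declare only labelnames on the Summary, then call .time() on a labelled child rather than on the parent wrapper (local registry used just to keep the sketch self-contained):

```python
from prometheus_client import CollectorRegistry, Summary

registry = CollectorRegistry()  # local registry for the example
TRADE_REQUEST_TIME = Summary(
    'request_processing_seconds',
    'Time spent processing a request',
    ['method', 'endpoint', 'whitelabel'],
    registry=registry)


# .time() lives on the labelled child, not on the _LabelWrapper parent:
@TRADE_REQUEST_TIME.labels('get', '/trade/', 'example').time()
def render_trade_page(request, sport):
    return 'rendered'
```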
Currently there's a lock on the metric, and one on the mmapped file.
Given that multiproc is in use it's unlikely that any threading is going on (and there's still the GIL) so these could both be replaced by a single global lock. This would bring us down to one mutex, which is the same as non-multiproc which means we'd have about the same performance.
I accidentally sent malformed data to a push gateway and got back a 500 Internal Server Error. The body of the response had information about the malformed data but all that was printed was the fact that I got a 500 Internal Server Error.
Example code:
import prometheus_client
reg = prometheus_client.CollectorRegistry()
c1 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c1.labels({"baz": 1, "stuff": 2}).inc(1)
c2 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c2.labels({"baz": 3, "stuff": 4}).inc(2)
prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)
Output:
Traceback (most recent call last):
File "test.py", line 8, in <module>
prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 110, in push_to_gateway
_use_gateway('PUT', gateway, job, registry, grouping_key, timeout)
File "/usr/local/lib/python2.7/dist-packages/prometheus_client/exposition.py", line 144, in _use_gateway
resp = build_opener(HTTPHandler).open(request, timeout=timeout)
File "/usr/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
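Since HTTPError objects are file-like, the library could read the gateway's explanation off the error itself before re-raising. A sketch (wrapper name invented; urllib2.HTTPError on Python 2, urllib.error.HTTPError on Python 3):

```python
import io
from urllib.error import HTTPError  # urllib2.HTTPError on Python 2


def describe_push_error(err):
    # Sketch: the pushgateway's description of the malformed data
    # is in the response body, readable from the error object.
    return 'push failed with %s: %s' % (err.code, err.read())


# Simulated 500 carrying a body, as a pushgateway would return:
fake = HTTPError('http://localhost:8000/metrics/job/example', 500,
                 'Internal Server Error', None,
                 io.BytesIO(b'text format parsing error'))
```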
I noticed this when working on fixing #46. When I run python3 setup.py test, I get the following warning in the test output:
test_instance_ip_grouping_key (tests.test_client.TestPushGateway) ... tests/test_client.py:478: ResourceWarning: unclosed <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 58422)>
self.assertTrue('' != instance_ip_grouping_key()['instance'])
I can also reproduce this warning outside of the tests by running python3 -Wdefault and then calling instance_ip_grouping_key.
Should this function explicitly close the socket it opens instead of relying on it to be cleaned up automatically?
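An explicit close would look something like this (a sketch based on the function's documented behaviour, using contextlib.closing rather than relying on GC):

```python
import socket
from contextlib import closing


def instance_ip_grouping_key():
    '''Sketch: grouping key with instance set to the IP address,
    closing the throwaway UDP socket deterministically.'''
    with closing(socket.socket(socket.AF_INET, socket.SOCK_DGRAM)) as s:
        # connect() on a UDP socket sends no packets; it only picks
        # the local address, which getsockname() then reports.
        s.connect(('localhost', 0))
        return {'instance': s.getsockname()[0]}
```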
Since Prometheus assumes the metrics will be on /metrics by default, the client library should make this assumption too (and provide a configuration option to change it).
Test script:
#!/usr/bin/env python2
from BaseHTTPServer import HTTPServer
from prometheus_client import Counter
from prometheus_client import MetricsHandler
counter = Counter('cronjobs_total', 'All cronjobs', ['server', 'username', 'command'])
counter.labels('someserver', 'someaccount', 'cd "/foo/bar/"').inc()
server_address = ('', 7700)
httpd = HTTPServer(server_address, MetricsHandler)
httpd.serve_forever()
When you add this as a Prometheus endpoint you will get a scraping error:
text format parsing error in line 3: unexpected end of label value %!q(*string=0xc20a6bc040)
The generated output when visiting port 7700 shows unescaped double quotes:
# HELP cronjobs_total All cronjobs
# TYPE cronjobs_total counter
cronjobs_total{username="someaccount",command="cd "/foo/bar/"",server="someserver"} 1.0
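The text exposition format expects label values to have backslash, newline and double quote escaped; a minimal sketch of that escaping (helper name invented, backslashes must be handled first):

```python
def escape_label_value(v):
    # Escape backslash first so the escapes added afterwards
    # are not themselves re-escaped.
    return v.replace('\\', r'\\').replace('\n', r'\n').replace('"', r'\"')
```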
Hi :) I'm currently using the client from head because I need d183810, which isn't included in 0.0.8 (even though the source control suggests otherwise). Can you release the latest changes to PyPI? Thanks!
It would be good not to force an IPv4 fallback.
See #39. This should be a required parameter.
I forked the repo, have working tests on Python2, but they fail on Python 3.4.
python3 --version: Python 3.4.2
Debian 8 (jessie)
python3 setup.py test 2> out.txt
test_block_decorator (tests.test_client.TestCounter) ... ok
test_function_decorator (tests.test_client.TestCounter) ... ok
test_increment (tests.test_client.TestCounter) ... ok
test_negative_increment_raises (tests.test_client.TestCounter) ... ok
test_block_decorator (tests.test_client.TestGauge) ... ok
test_function_decorator (tests.test_client.TestGauge) ... ok
test_gauge (tests.test_client.TestGauge) ... ok
test_gauge_function (tests.test_client.TestGauge) ... ok
test_counter (tests.test_client.TestGenerateText) ... ok
test_escaping (tests.test_client.TestGenerateText) ... ok
test_gauge (tests.test_client.TestGenerateText) ... ok
test_nonnumber (tests.test_client.TestGenerateText) ... ok
test_summary (tests.test_client.TestGenerateText) ... ok
test_unicode (tests.test_client.TestGenerateText) ... ok
test_block_decorator (tests.test_client.TestHistogram) ... ok
test_function_decorator (tests.test_client.TestHistogram) ... ok
test_histogram (tests.test_client.TestHistogram) ... ok
test_labels (tests.test_client.TestHistogram) ... ok
test_setting_buckets (tests.test_client.TestHistogram) ... ok
test_child (tests.test_client.TestMetricWrapper) ... ok
test_incorrect_label_count_raises (tests.test_client.TestMetricWrapper) ... ok
test_invalid_names_raise (tests.test_client.TestMetricWrapper) ... ok
test_labels_by_dict (tests.test_client.TestMetricWrapper) ... ok
test_labels_coerced_to_string (tests.test_client.TestMetricWrapper) ... ok
test_namespace_subsystem_concatenated (tests.test_client.TestMetricWrapper) ... ok
test_non_string_labels_raises (tests.test_client.TestMetricWrapper) ... ok
test_remove (tests.test_client.TestMetricWrapper) ... ok
test_namespace (tests.test_client.TestProcessCollector) ... ok
test_working (tests.test_client.TestProcessCollector) ... ok
test_working_584 (tests.test_client.TestProcessCollector) ... ok
test_working_fake_pid (tests.test_client.TestProcessCollector) ... ok
test_delete (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "DELETE /job/my_job HTTP/1.1" 201 -
ERROR
test_instance_ip_grouping_key (tests.test_client.TestPushGateway) ... /root/client_python/tests/test_client.py:479: ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketType.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 53921)>
self.assertTrue('' != instance_ip_grouping_key()['instance'])
ok
test_push (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job HTTP/1.1" 201 -
ERROR
test_push_with_complex_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job/a/9/b/a%2F+z HTTP/1.1" 201 -
ERROR
test_push_with_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "PUT /job/my_job/a/9 HTTP/1.1" 201 -
ERROR
test_pushadd (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "POST /job/my_job HTTP/1.1" 201 -
ERROR
test_pushadd_with_groupingkey (tests.test_client.TestPushGateway) ... 127.0.0.1 - - [13/Sep/2015 13:18:04] "DELETE /job/my_job/a/9 HTTP/1.1" 201 -
ERROR
test_block_decorator (tests.test_client.TestSummary) ... ok
test_function_decorator (tests.test_client.TestSummary) ... ok
test_summary (tests.test_client.TestSummary) ... ok
test_labels (tests.graphite_bridge.TestGraphiteBridge) ... ok
test_nolabels (tests.graphite_bridge.TestGraphiteBridge) ... ok
test_sanitizing (tests.graphite_bridge.TestGraphiteBridge) ... ok
======================================================================
ERROR: test_delete (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/root/client_python/tests/test_client.py", line 465, in test_delete
delete_from_gateway(self.address, "my_job")
File "/root/client_python/prometheus_client/exposition.py", line 103, in delete_from_gateway
_use_gateway('DELETE', gateway, job, None, grouping_key, timeout)
File "/root/client_python/prometheus_client/exposition.py", line 122, in _use_gateway
resp = build_opener(HTTPHandler).open(request, timeout=timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
'_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1202, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.4/urllib/request.py", line 1177, in do_open
r = h.getresponse()
File "/usr/lib/python3.4/http/client.py", line 1172, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 321, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
======================================================================
ERROR: test_push (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_push_with_complex_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_push_with_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_pushadd (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
======================================================================
ERROR: test_pushadd_with_groupingkey (tests.test_client.TestPushGateway)
----------------------------------------------------------------------
[....]
----------------------------------------------------------------------
Ran 44 tests in 0.062s
FAILED (errors=6)
I am new to this and I don't know how to solve this.
We are using locust.io for our testing and we want to add Prometheus for our metrics. Locust provides events (http://docs.locust.io/en/latest/extending-locust.html) which carry the request type, name, time and length for all methods. Here is a sample:
from locust import events
def my_success_handler(request_type, name, response_time, response_length, **kw):
print "Successfully fetched: %s" % (name)
events.request_success += my_success_handler
We would like to use this functionality to provide histogram data for our metrics. But one thing we realized is that we couldn't provide labels to the histogram to identify which request it is for.
Is there any way we could achieve this, instead of creating a histogram for each method? We have tens of REST API calls we test with locust.
Thanks
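Edit: a labelled Histogram appears to do exactly this — one metric, with the request type and name as labels (metric name is made up; local registry used only to keep the sketch self-contained):

```python
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()  # local registry for the example
REQUEST_LATENCY = Histogram(
    'locust_request_latency_seconds',      # hypothetical metric name
    'Latency reported by locust events',
    ['request_type', 'name'],
    registry=registry)


def my_success_handler(request_type, name, response_time, response_length, **kw):
    # locust reports response_time in milliseconds
    REQUEST_LATENCY.labels(request_type, name).observe(response_time / 1000.0)
```

Hooked up via events.request_success += my_success_handler, every REST call then lands in the same histogram under its own label combination.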
Code in process_collector.py relies on parsing /proc/PID/limits when getting system limits. However, on some configurations that file can contain the string "unlimited" instead of a numeric value. This leads to a ValueError when the string is passed to float.
Full example exception is as follows:
Traceback (most recent call last):
File "/venv/lib/python3.5/site-packages/tornado/web.py", line 1443, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/venv/lib/python3.5/site-packages/video/stats/events/endpoint/endpoint.py", line 194, in get
resp = generate_latest(core.REGISTRY)
File "/venv/lib/python3.5/site-packages/prometheus_client/exposition.py", line 33, in generate_latest
for metric in registry.collect():
File "/venv/lib/python3.5/site-packages/prometheus_client/core.py", line 54, in collect
for metric in collector.collect():
File "/venv/lib/python3.5/site-packages/prometheus_client/process_collector.py", line 83, in collect
'Maximum number of open file descriptors.', value=float(line.split()[3]))
ValueError: could not convert string to float: 'unlimited'
This happens in a Docker container with ubuntu, for instance, which has:
root@87220b5caf91:/app# cat /proc/76888/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited 9223372036854775807 bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 35184372088832 bytes
Max core file size unlimited 9223372036854775807 bytes
Max resident set unlimited 1073741824 bytes
Max processes unlimited 4000 processes
Max open files unlimited 65536 files
Max locked memory unlimited 1073741824 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 128 512 signals
Max msgqueue size unlimited 8192 bytes
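A guard along these lines would avoid the crash (helper name invented; whether infinity or some sentinel like -1 is the right value for "unlimited" is up for debate):

```python
def parse_limit(field):
    # Sketch: map the literal "unlimited" from /proc/PID/limits to
    # infinity instead of letting float() raise a ValueError.
    if field == 'unlimited':
        return float('inf')
    return float(field)
```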
Parsing of untyped metrics yields more metric families than it should (I got 3, but expected 1 family with 2 samples).
Or am I missing something?
>>> from prometheus_client.parser import text_string_to_metric_families
>>> text = """# HELP redis_connected_clients Redis connected clients
... # TYPE redis_connected_clients untyped
... redis_connected_clients{instance="rough-snowflake-web",port="6380"} 10.0
... redis_connected_clients{instance="rough-snowflake-web",port="6381"} 12.0
... """
>>> for m in text_string_to_metric_families(text):
...     print m.name, m.samples
...
redis_connected_clients []
redis_connected_clients [(u'redis_connected_clients', {u'instance': u'rough-snowflake-web', u'port': u'6380'}, 10.0)]
redis_connected_clients [(u'redis_connected_clients', {u'instance': u'rough-snowflake-web', u'port': u'6381'}, 12.0)]
Some bibliography so it doesn't sound like pure opinion:
[from Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics]
Also, Mathematica, Octave, MATLAB and other mathematical software hide some decimal places when it doesn't matter.
I get some lines like the following:
request_processing_seconds_sum 9.3512252295
request_latency_seconds_sum 15.77115141185
And I am suggesting that they get changed to:
request_processing_seconds_sum 9.351
request_latency_seconds_sum 15.771
How many decimal places to use is open for discussion, but 10 seems like too many for a final product.
There is also a need to define where to perform the rounding (we could truncate, but rounding seems statistically fairer). It could be done in the _samples functions of the classes defined in core.py, but maybe there is a better place.
Thanks for reading.
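For concreteness, the kind of render-time rounding I mean (helper name invented; Python's format spec already rounds rather than truncates):

```python
def render_value(value, places=3):
    # Hypothetical render-time rounding of a sample value.
    return '{0:.{1}f}'.format(value, places)
```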
I tried to use the client on Windows with Python 2.7 and I get the following error:
ImportError: No module named 'resource'
This error occurs because the 'resource' module is only available on UNIX systems.
Could the 'resource' module usage be made optional, so the client can be used in a Windows application?
Regards.
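A guarded import along these lines would make the dependency optional (helper name invented; the collector would simply skip the affected metrics when resource is None):

```python
try:
    import resource
except ImportError:
    # 'resource' is POSIX-only; on Windows fall back to "no limits known".
    resource = None


def soft_fd_limit():
    # Hypothetical helper: soft fd limit, or None where unavailable.
    if resource is None:
        return None
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```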
I am expecting to be able to set labels on a Counter like in the examples:
from prometheus_client import Counter
c = Counter('my_requests_total', 'HTTP Failures', ['method', 'endpoint'])
c.labels(method='get', endpoint='/').inc()
I am using prometheus-client (0.0.14) from PyPI.
$ python
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 26 2016, 12:10:39)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from prometheus_client import Counter, CollectorRegistry
>>> reg = CollectorRegistry()
>>> c = Counter('items_total', 'number of items total', registry=reg)
>>> c.labels(first=1, another=2).inc()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Counter' object has no attribute 'labels'
>>>
>>> type(c)
<class 'prometheus_client.core.Counter'>
>>> dir(c)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_reserved_labelnames', '_samples', '_type', '_value', 'collect', 'count_exceptions', 'inc']
The Counter object does not seem to have a labels method attached to it. Perhaps some recent change has affected the application of the decorator that applies it to the metrics?
#99 kind of indicates this same problem, but relates it to the pushgateway.
I’m building an infrastructure that uses service discovery to find metrics. Now, I find it very tedious to set ports by hand so I prefer to listen on port 0 and then use service discovery to tell prometheus about it:
>>> from BaseHTTPServer import HTTPServer
>>> h = HTTPServer(("", 0), None)
>>> h.server_port
57782
>>> h.server_name
'alpha-2.local'
Currently there’s no way to achieve that using client_python. It would be super helpful if you’d expose httpd from https://github.com/prometheus/client_python/blob/master/prometheus_client/exposition.py#L64 .
As a bonus point, this also solves the multiple processes problem from #30 in a more prometheus-like way I find: just expose multiple metrics and let prometheus figure it out.
Hi,
while testing and maybe transitioning, I’m very grateful for the GraphiteBridge.
However as far as I can tell, there’s no way to namespace (with a dot) my metrics and they just land in the main “directory”.
Could you please add a way to set a prefix that is prepended to all metrics?
Hi
That's my small example script. Create a subdirectory "metrics" in the folder from which you run the script, and set the environment variable.
#!/usr/bin/python
'''
This is just an example of how to use prometheus_client in a multiprocessing application
'''
import os
if 'prometheus_multiproc_dir' not in os.environ:
    os.environ['prometheus_multiproc_dir'] = 'metrics'

from prometheus_client import Counter
from prometheus_client import start_http_server
from prometheus_client import CollectorRegistry
from prometheus_client.multiprocess import MultiProcessCollector
from prometheus_client import core
from multiprocessing import Pool

def worker(n):
    counter = Counter('counter', 'counter', registry=None)
    counter.inc()

def main():
    print("Main PID", os.getpid())
    clear_metrics_dir()
    registry = CollectorRegistry()
    MultiProcessCollector(registry)
    core.REGISTRY = registry
    start_http_server(8000)
    p = Pool(processes=10)
    p.map_async(worker, range(300000))
    p.close()
    p.join()
    print(registry.get_sample_value('counter'))

def clear_metrics_dir():
    dir = os.getcwd()
    os.chdir(os.environ['prometheus_multiproc_dir'])
    [os.remove(file) for file in os.listdir()]
    os.chdir(dir)

if __name__ == '__main__':
    main()
If I run this on Windows I get this output:
Main PID 27372
300000.0
and there are 10 files in the metrics subdirectory.
But on Linux the output is different:
Main PID 24671
42961.0
And there is only one file in the directory.
I'm not quite sure, but my guess is the following:
In core.py at lines 441-444 the variable _ValueClass is initialized:
if 'prometheus_multiproc_dir' in os.environ:
_ValueClass = _MultiProcessValue()
else:
_ValueClass = _MutexValue
Then that class is instantiated, and it creates the files where the metrics data will be stored.
The class itself is created in the function _MultiProcessValue(), which uses os.getpid() as a default value for the argument.
On Windows, when you start other processes with multiprocessing, your main module is reloaded in each of the processes. The prometheus_client library is reloaded too, and the class is constructed anew for each new process.
On Linux the module is loaded only once, and therefore the class is also created only once.
All that makes the library write the metrics of all the processes to the same file on Linux! That makes the metrics basically wrong, and in the worst case the file gets corrupted and the HTTP server cannot read from it.
That said, it seems to work well if the load on the processes is small. E.g. if you put a sleep in the worker function and reduce the number of cycles, it can (sometimes) work fine.
If I change core.py in this way, it works fine on both Linux and Windows (lines 395-406):
def _MultiProcessValue(__pid=os.getpid()):
    files = {}
    files_lock = Lock()

    class _MmapedValue(object):
        '''A float protected by a mutex backed by a per-process mmaped file.'''
        _multiprocess = True

        def __init__(self, typ, metric_name, name, labelnames, labelvalues, multiprocess_mode='', **kwargs):
            pid = os.getpid()
            if typ == 'gauge':
but then the unit tests will most probably fail.
Recently I’ve run into the problem that my processes don't have any process_ metrics, although the platform (Ubuntu Trusty) hasn’t changed.
I’m not sure how to debug the problem; do you have any idea what could lead to it? The processes are not using the new multiprocessing feature.
g = prometheus_client.Gauge(
    "my_super_special_metric",
    "This is a metric",
    ("label"))
g.labels({"label": "somevalue"}).set(1)
Leads to "incorrect label names".
Changing it to:
g = prometheus_client.Gauge(
    "my_super_special_metric",
    "This is a metric",
    ["label"])
g.labels({"label": "somevalue"}).set(1)
works normally.
Since both are iterable, it would be idiomatic Python to treat any non-string sequence supplied for the labels as a list of label names.
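The root cause is worth spelling out: ("label") is not a one-element tuple. Without a trailing comma the parentheses are just grouping, so the value is a plain string, and iterating it yields single characters rather than one label name:

```python
# A one-element tuple needs a trailing comma; without it the parentheses
# are only grouping and you get a plain string.
single = ("label")       # -> str
proper = ("label",)      # -> actual one-element tuple

assert isinstance(single, str)
assert list(single) == ['l', 'a', 'b', 'e', 'l']  # five "label names"!
assert list(proper) == ['label']                  # one label name
```

That explains the "incorrect label names" error: the client sees five one-character label names instead of one.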
Hi, this is a newbie question. How can I instrument a Python WSGI bottle or flask app that has its own server? Two threads?
import random
import time

from bottle import route, run
from prometheus_client import start_http_server, Summary

REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

@route('/')
@REQUEST_TIME.time()
def handler():
    time.sleep(random.random())  # was random.random, which raises TypeError
    return 'Hello world'

run(host='0.0.0.0', port=8080)
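For context, "two threads" is exactly the usual pattern: start_http_server() spawns its own daemon thread, so it can be called once before handing the main thread to bottle's blocking run(). A stdlib-only sketch of that shape (the handler body and metric line are placeholders, not the real client's code):

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder response standing in for the real exposition output.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"metrics_placeholder 1.0\n")

    def log_message(self, *args):
        pass  # keep request logging quiet

# Port 0 asks the OS for any free port; a real setup would pick a fixed one.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
# The main thread is now free to run the app's own blocking server,
# e.g. bottle's run(host='0.0.0.0', port=8080).
```

Prometheus then scrapes the metrics port while the app serves traffic on its own port.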
Hello
Could you please explain how to delete some counters/labels from the output?
For example, I created a gauge:
g = Gauge('superapp_performance', 'SuperDuper app RPS gauge',
          ['instance', 'location', 'country'])
Then I update it with info from the apps:
g.labels(instance_id, location_name, country_name).set(hugeRPS)
At some point I don't have this instance anymore, and I want to delete the information about it. How do I do that?
Thanks
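Conceptually (this is a toy model, not the real client's internals), a labelled gauge is a map keyed by the tuple of label values, so "forgetting" an instance amounts to dropping its key so the series is no longer exported:

```python
class GaugeFamily:
    """Toy model of a labelled gauge: child series keyed by label values."""

    def __init__(self, labelnames):
        self.labelnames = tuple(labelnames)
        self._children = {}

    def labels(self, *values):
        assert len(values) == len(self.labelnames)
        return self._children.setdefault(tuple(values), {"value": 0.0})

    def remove(self, *values):
        # Stop exporting this series entirely.
        del self._children[tuple(values)]

g = GaugeFamily(['instance', 'location', 'country'])
g.labels('i-123', 'dc1', 'us')["value"] = 1200.0
g.remove('i-123', 'dc1', 'us')
assert not g._children  # the series is gone from the output
```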
Zmon has written a partial implementation, we should provide one out of the box.
Hi guys, I am using the library to implement a gRPC service running on Mesos (via Aurora), using the Prometheus port to handle health checks as well as to export metrics. I wonder, therefore, if there is an easy way to add a new path to the HTTP server used within. In other words, I want port:/internal/status or similar to return an empty HTTP OK without any metrics.
Thanks and cheers,
Simon
Apache with the prefork MPM (not threaded) runs process_collector.py as the chosen non-root user, but /proc/self/fd/ is still owned by root:root, so its 0500 permissions cause an OSError (e.errno == EACCES) on the os.listdir() attempt. A patch to give up on that metric when this occurs: process_open_fds.txt
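The patch's idea can be sketched like this (illustrative, not the collector's actual code): return nothing for the open-fds metric instead of raising when /proc/self/fd is unreadable.

```python
import errno
import os

def open_fds(fd_dir="/proc/self/fd"):
    """Count open file descriptors; None if the metric is unavailable."""
    try:
        return len(os.listdir(fd_dir))
    except OSError as e:
        if e.errno in (errno.EACCES, errno.ENOENT):
            # e.g. a prefork child running as a non-root user while the
            # directory is still owned by root: skip the metric this scrape.
            return None
        raise
```

The caller would simply omit process_open_fds from the exposition when None comes back.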
There is no way to stop the metrics server if you need to reconfigure it (e.g. change the port).
Hi,
For my Django exporter I need to import _INF from prometheus_client, because I build my own buckets (powers of 2, powers of 10) and I need _INF at the end.
Could you expose it publicly in the API (ideally in __init__.py, since it was moved in 0.0.10)?
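As a workaround while it stays private, float("inf") compares equal to the client's _INF and works as the final bucket bound, so custom buckets can be built without importing it at all:

```python
# Build custom histogram bucket bounds without the private _INF constant.
INF = float("inf")

powers_of_two = [2.0 ** i for i in range(10)] + [INF]
powers_of_ten = [10.0 ** i for i in range(6)] + [INF]

assert powers_of_two[-1] == INF
assert powers_of_two == sorted(powers_of_two)  # bounds must be ascending
```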
Hello all,
At the moment, if a Python exception is encountered, it looks like a 200 response with a blank body is returned. This does not result in up{job='jobname'} returning 0.
That's fine if you are using an up_jobname metric alongside the up metric, but some people dual-use the up metric to mean that the exporter OR the app is down (as suggested in the documentation).
An option to choose the behaviour would be great.
In the meantime, is there a way to explicitly return a 5xx?
Cheers,
James
If you have a custom collector registered, and that collector raises an exception, then the client produces a 200 with no metrics; not even the built-in ones.
#!/usr/bin/python3
import time

from prometheus_client import start_http_server
from prometheus_client.core import REGISTRY

class CustomCollector(object):
    def collect(self):
        raise Exception('bang')

REGISTRY.register(CustomCollector())
start_http_server(23456)
while True:
    time.sleep(1e9)
% curl -v http://localhost:23456/
* Trying ::1...
* connect to ::1 port 23456 failed: Connection refused
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 23456 (#0)
> GET / HTTP/1.1
> Host: localhost:23456
> User-Agent: curl/7.47.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.5.1+
< Date: Thu, 16 Jun 2016 09:18:40 GMT
< Content-Type: text/plain; version=0.0.4; charset=utf-8
<
* Closing connection 0
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 43974)
Traceback (most recent call last):
File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 341, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/usr/lib/python3.5/http/server.py", line 415, in handle
self.handle_one_request()
File "/usr/lib/python3.5/http/server.py", line 403, in handle_one_request
method()
File "/home/faux/code/client_python/prometheus_client/exposition.py", line 76, in do_GET
self.wfile.write(generate_latest(core.REGISTRY))
File "/home/faux/code/client_python/prometheus_client/exposition.py", line 55, in generate_latest
for metric in registry.collect():
File "/home/faux/code/client_python/prometheus_client/core.py", line 54, in collect
for metric in collector.collect():
File "err.py", line 10, in collect
raise Exception('bang')
Exception: bang
----------------------------------------
I believe this should either return a 500; or return the rest of the metrics, with this one missing; or return the rest of the metrics plus a client_generation_errors{collector="CustomCollector"} 1; or...
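The first option can be sketched in a few lines (illustrative only, not the client's actual exposition code): catch the failure while generating the body and turn it into a 500 instead of an empty 200.

```python
class BadCollector:
    def collect(self):
        raise Exception('bang')

def generate(collectors):
    """Render all collectors' metrics; propagates any collector exception."""
    lines = []
    for collector in collectors:
        for metric in collector.collect():
            lines.append(str(metric))
    return ("\n".join(lines) + "\n").encode("utf-8")

def metrics_response(collectors):
    try:
        return 200, generate(collectors)
    except Exception as e:
        # Surface the failure to the scraper instead of a blank success.
        return 500, ("# collection failed: %s\n" % e).encode("utf-8")

status, body = metrics_response([BadCollector()])
assert status == 500
```

With a 500, Prometheus marks the scrape as failed and up goes to 0, which matches what most people expect.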
I accidentally generated malformed data by creating two Counters with the same name and help text on the same Registry.
Example code:
import prometheus_client
reg = prometheus_client.CollectorRegistry()
c1 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c1.labels({"baz": 1, "stuff": 2}).inc(1)
c2 = prometheus_client.Counter("foo", "bar", labelnames=["baz", "stuff"], registry=reg)
c2.labels({"baz": 3, "stuff": 4}).inc(2)
prometheus_client.push_to_gateway('localhost:8000', job='example', registry=reg)
Data sent in request to push gateway:
# HELP foo bar
# TYPE foo counter
foo{baz="1",stuff="2"} 1.0
# HELP foo bar
# TYPE foo counter
foo{baz="3",stuff="4"} 2.0
Response sent from the push gateway:
text format parsing error in line 4: second HELP line for metric name "foo"
I'd prefer to get an error from prometheus_client telling me that I can't add two identical metrics. Alternatively, I'd like prometheus_client to merge the two metrics and not send two identical HELP lines.
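The first suggestion amounts to making registration strict. A minimal sketch of that behaviour (not the client's actual CollectorRegistry):

```python
class StrictRegistry:
    """Toy registry that rejects a second metric with a known name."""

    def __init__(self):
        self._names = set()

    def register(self, name):
        if name in self._names:
            raise ValueError("duplicated metric name: %r" % name)
        self._names.add(name)

reg = StrictRegistry()
reg.register("foo")
try:
    reg.register("foo")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
assert duplicate_rejected
```

Failing fast at registration time would have caught the bug before the pushgateway rejected the payload.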
This gets more accurate numbers and also works on Mac; however, it only works for the process itself.
Hi,
currently, the Summary timing metric (probably others too, but that's the one I played with) is useless if you want to use it with generators (yield / yield from), since they return immediately but run much longer.
This is especially painful if you want to use asyncio on Python 3.4+.
It would be great if the Prometheus Python client could grow some kind of support for that use case.
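What generator-aware timing could look like, as a hedged sketch (time_generator and the observe callback are hypothetical, not the client's API): measure until the generator is exhausted rather than when it is created.

```python
import time

def time_generator(observe):
    """Decorate a generator function; report its total run time to observe()."""
    def decorator(gen_func):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                yield from gen_func(*args, **kwargs)
            finally:
                # Fires when the generator is exhausted (or closed early),
                # not when it is merely constructed.
                observe(time.perf_counter() - start)
        return wrapper
    return decorator

durations = []

@time_generator(durations.append)
def slow_items():
    for i in range(3):
        time.sleep(0.01)
        yield i

assert list(slow_items()) == [0, 1, 2]
assert durations[0] >= 0.025  # covers all three sleeps, not just creation
```

A real implementation would call observe on a Summary (and need an async-aware variant for coroutines), but the shape of the fix is the same.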
We're considering using this in the Jupyter project, and would love for it to have a Tornado exporter similar to the Twisted one.
Unless I'm missing something, it's currently necessary to specify all labels at once. It would be nice, though, if I could do something like this:
M = Counter("a_counter", "counter that counts", ["id", "type"])
...
c = M.labels({"id": id})  # id is determined at runtime/initialization
...
c.labels({"type": "internal"}).inc()  # type is unclear until it happens
...
c.labels({"type": "external"}).inc()
Does that make sense?
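The requested partial binding can be modelled as currying: each labels() call returns a child that remembers the values bound so far and shares the parent's storage. A toy sketch of that hypothetical API (the real client requires all label values in one call):

```python
class PartialCounter:
    """Toy counter allowing label values to be bound incrementally."""

    def __init__(self, labelnames, _bound=None, _values=None):
        self.labelnames = frozenset(labelnames)
        self._bound = dict(_bound or {})
        self.values = _values if _values is not None else {}

    def labels(self, **kv):
        merged = {**self._bound, **kv}
        # Children share the same value storage as the parent.
        return PartialCounter(self.labelnames, merged, self.values)

    def inc(self, amount=1):
        assert set(self._bound) == self.labelnames, "labels still missing"
        key = tuple(sorted(self._bound.items()))
        self.values[key] = self.values.get(key, 0) + amount

M = PartialCounter(["id", "type"])
c = M.labels(id="42")                # 'id' bound at initialization
c.labels(type="internal").inc()      # 'type' bound once it is known
c.labels(type="external").inc()
assert len(M.values) == 2            # two distinct series recorded
```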
Useful for batch jobs
In multiproc mode, if one of the gunicorn workers is killed by a segfault or an out-of-memory situation, it doesn't get a chance to clean up its .db files. These files are left behind in the multiproc folder, distorting the metrics.
Maybe this could be solved by moving the cleanup hook into the gunicorn "master" process. My suggestion is to solve it in the Python client instead, by deleting all .db files with dead PIDs before aggregating the metrics.
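The suggested cleanup could be sketched like this (assumptions: the pid is embedded in the file name, e.g. counter_1234.db, and a POSIX os.kill existence check is available; this is not the client's actual code):

```python
import glob
import os
import re

def pid_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user

def prune_dead(multiproc_dir):
    """Delete .db files whose embedded pid no longer maps to a live process."""
    removed = []
    for path in glob.glob(os.path.join(multiproc_dir, "*_*.db")):
        m = re.search(r"_(\d+)\.db$", os.path.basename(path))
        if m and not pid_alive(int(m.group(1))):
            os.unlink(path)
            removed.append(path)
    return removed
```

Running this before aggregation would keep crashed workers' stale files from distorting the totals, without relying on the master's exit hook ever firing.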