zalando / pgobserver

A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases

Home Page: https://zalando.github.io/PGObserver/

License: Other

Shell 1.40% Python 33.33% PLpgSQL 24.98% JavaScript 1.01% CSS 2.31% Makefile 0.04% HTML 13.41% Java 21.12% Dockerfile 0.42% TSQL 1.98%
monitoring data-infrastructure

pgobserver's Introduction

PGObserver

PGObserver is a battle-tested monitoring solution for your PostgreSQL databases. It covers almost all the metrics provided by the database engine's internal statistics collector, and works out of the box with all PostgreSQL versions (beginning with 9.0) as well as AWS RDS. You don’t have to install any non-standard, server-side database extensions to take advantage of its core functionality, nor do you need to register any privileged users.

Monitored Metrics Include:

  • Stored procedure data: number of calls, run time per procedure, self time per procedure
  • All executed statements data: query runtimes, call counts, time spent on IO
    • based on the pg_stat_statements module, which must be enabled on the DB
  • Table IO statistics: number of sequential and index scans, number of inserts, (hot) updates, deletes, table and index size, heap hits vs disk hits
  • General database indicators: number of backends, exceptions, deadlocks, temporary files written
  • Schema usage: procedure calls, inserts/updates/deletes (IUD)
  • CPU load: needs a plpythonu stored procedure from the sql/data_collection_helpers folder (plpythonu isn't available on RDS)
  • WAL (XLOG) volumes
  • Index usage
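
For illustration, most of these counters come straight from the standard statistics views. A query along these lines (a sketch, not the gatherer's exact SQL) returns the table IO metrics listed above:

-- A sketch of the kind of query a gatherer runs; not PGObserver's exact SQL.
-- Heap vs. disk hits come from the companion view pg_statio_user_tables.
SELECT s.schemaname,
       s.relname,
       s.seq_scan,
       s.idx_scan,
       s.n_tup_ins,
       s.n_tup_upd,
       s.n_tup_hot_upd,
       s.n_tup_del,
       pg_total_relation_size(s.relid) AS total_size_bytes
  FROM pg_stat_user_tables s
 ORDER BY total_size_bytes DESC;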

For some metrics you must install data-gathering wrapper functions (stored procedures) on the server being monitored; these let you circumvent the superuser requirement, as sketched below.
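
The pattern (a hypothetical sketch, not one of the shipped helpers from the sql folder) is a SECURITY DEFINER function created once by a superuser, which the unprivileged monitoring user is then allowed to execute:

-- Hypothetical example of the wrapper pattern; the real helpers live in the
-- repository's sql folder. Assumes the zz_utils schema used by PGObserver's
-- helpers exists. Must be created by a superuser.
CREATE OR REPLACE FUNCTION zz_utils.get_stat_statements()
RETURNS SETOF pg_stat_statements AS
$$
  SELECT * FROM pg_stat_statements;  -- normally superuser-only detail
$$ LANGUAGE sql SECURITY DEFINER;    -- executes with the owner's rights

-- 'pgobserver_gatherer' is a placeholder for your monitoring role.
GRANT EXECUTE ON FUNCTION zz_utils.get_stat_statements() TO pgobserver_gatherer;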

Go here for some PGObserver screenshots and a nice illustration by Zalando Tech resident artist Kolja Wilcke.

Additional Features

With some extra setup (see instructions), you can also:

  • monitor blocked processes (needs a cron script on the host DB)
  • monitor pg_stat_statements (needs an enabled pg_stat_statements extension)
  • do cron aggregations to speed up sproc-load and database-size graphs; these are useful when monitoring tens of instances
  • export metrics data to InfluxDB for custom charting/dashboarding with Grafana or some other tool

Status

PGObserver is still in use, but it no longer receives active attention or development, as stored procedure usage has dropped in new projects.

How PGObserver Works

A Java application gathers metrics by querying PostgreSQL performance views (pg_stat_*). You can configure gathering intervals for the different metrics on a per-host, per-metric basis. This enables you to gather more details for critical systems and provide fewer details for less-important systems — thereby reducing the amount of data stored.

Additionally, you can configure sets of hosts to monitor from different Java processes — for example, when deploying to multiple locations with limited accessibility.

PGObserver’s frontend is a standalone Python + CherryPy application; the "screenshots" folder includes basic examples. Charts are rendered with the JS Flot library.

To help you generate minimalistic test data for a local test setup, we’ve included the testdata.py script: https://github.com/zalando/PGObserver/blob/master/frontend/src/testdata.py

Quick Test Run Using Vagrant

Make sure you've installed the latest version of Vagrant. Clone PGObserver to the machine where you want to run it, then run the following from the PGObserver base directory:

git clone https://github.com/zalando/PGObserver.git
cd PGObserver
vagrant up

This last step will take a while, as PGObserver performs the following inside the virtual machine:

  • Fetches and starts an official PostgreSQL 9.3 Docker image
  • Compiles the gatherer for you, creates a Docker image, and starts it inside the VM
  • Creates a Docker image for the frontend and starts it inside the VM
  • Exposes ports 38080 and 38081 for the frontend and the gatherer, respectively. You can then open the frontend on port 38080 and configure a database cluster to monitor — e.g., http://localhost:38080/hosts/

The easiest way to run it somewhere else is to change the config files and create your own Docker images to deploy. Just point it to the PostgreSQL cluster where you created the PGObserver database.

Setup

Install:

  • Python 2.7 (to run PGObserver’s frontend)
  • pip, then install the Python dependencies:
    pip install -r frontend/requirements.txt
  • the PostgreSQL contrib modules pg_trgm and btree_gist; these should come with your operating system distribution in a package named postgresql-contrib (or similar)

Create the schema by executing the SQL files from the sql/schema folder on the Postgres database where you want to store monitoring data:

cat sql/schema/*.sql | psql -1 -f - -d my_pgobserver_db
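
To sanity-check the schema load, you can list the created tables (the monitor_data schema name is taken from the repository's schema files):

-- Quick sanity check after loading the schema files.
SELECT table_name
  FROM information_schema.tables
 WHERE table_schema = 'monitor_data'
 ORDER BY table_name;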

Configuration

Start by preparing your configuration files for gatherer and frontend; the provided examples are good starting points.

  • set your database connection parameters: name, host, and port
  • configure the usernames and passwords for the gatherer and the frontend; find the defaults here
  • set gather_group (important for the gatherer only; it enables running many gatherer processes)
  • create an unprivileged user on the database you want to monitor, so it can do selects from the system catalogs (see the sketch below)
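
For the last point, a minimal sketch (role name, password, and database name are placeholders; an ordinary login role can already read most pg_stat_* views):

-- Placeholder names; pick your own role name and password.
CREATE ROLE pgobserver_gatherer LOGIN PASSWORD 'secret';
GRANT CONNECT ON DATABASE my_monitored_db TO pgobserver_gatherer;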

Configuring Hosts to Monitor

You can either:

  • Insert an entry into the monitor_data.hosts table with the connection details and to-be-monitored features of the cluster you want to monitor (use the same password that you used in the previous step); a sketch of such an insert follows this list; OR
  • use the frontend web application's /hosts page (next step), entering all needed data and pressing "add", followed by "reload" to refresh the menus
    • set host_gather_group to decide which gatherer monitors which cluster
    • to decide which schemas are scanned for sproc statistics, review the table sproc_schemas_monitoring_configuration. Defaults are provided.
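
If you go the SQL route, the insert looks roughly like this (the column names and values are illustrative; check the sql/schema files for the authoritative definition of monitor_data.hosts):

-- Illustrative only; consult the sql/schema files for the real column list.
INSERT INTO monitor_data.hosts
       (host_name, host_port, host_db, host_user, host_password,
        host_gather_group, host_enabled)
VALUES ('db1.example.com', 5432, 'shop_db', 'pgobserver_gatherer',
        'secret', 'default', true);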

Some features will require you to create corresponding helper functions on the databases being monitored:

  • CPU load monitoring requires a stored procedure from cpu_load.sql. This is a plpythonu function, so a superuser is needed.
  • For pg_stat_statements monitoring, you need this file.
  • For table & index bloat query, you need this.
  • Blocking processes monitoring requires setup from this folder.

Run the frontend by going into the "frontend" folder and running run.sh, which launches "python src/web.py" and puts it in the background:

frontend$ ./run.sh --config frontend.yaml

Build the data gatherer single jar, including dependencies, by going to the "gatherer" folder and running:

mvn clean verify assembly:single

Start the data-monitoring daemons by running the gatherer's run.sh.

Troubleshooting Hint


You might have to change your PostgreSQL server configuration to gather certain types of statistics. Please refer to the Postgres documentation on The Statistics Collector and pg_stat_statements.
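
For example, stored procedure timings require function tracking, and pg_stat_statements must be preloaded. A sketch for PostgreSQL 9.4+ (on older versions, edit postgresql.conf directly):

-- Requires a superuser; shared_preload_libraries takes effect only
-- after a server restart, track_functions after a reload.
ALTER SYSTEM SET track_functions = 'all';
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
SELECT pg_reload_conf();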

Contributions

PGObserver welcomes contributions from the community. Please go to the Issues page to learn more about planned project enhancements and known bugs. Feel free to make a pull request and we'll take a look.

Thank You

Thank you to our Zalando contributors, as well as Fabian Genter.

License

Copyright 2012 Zalando GmbH

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

pgobserver's People

Contributors

a1exsh, alexeyklyukin, codechimp, deric, evakbe, fabian-gentner, hjacobs, igoramadas, jan-m, jbspeakr, kmoppel, lappleapple, mawenzel, s-soroosh, slitsche, sstirbu, williamn

pgobserver's Issues

setup instructions

The setup instructions are a little sparse.

Some additional steps I had to do:

$ easy_install cherrypy 
$ easy_install Jinja2
$ easy_install argparse

Some additional things that could be included:

  • what version of python is required
  • what python dependencies are required
  • what versions of postgresql are supported

Additionally the setup requires directly inserting rows into the database. I think it would be worthwhile to make a simple install script to initialize the data.

No such file pgobserver.yaml Error (vagrant build).

I know there is a sample one in the gatherer directory, but I'm not sure about the vagrant build process:

==> default: ldconfig deferred processing now taking place
==> default:  ---> a4b7b19c1c01
==> default: Removing intermediate container 1f5d7dcf1b93
==> default: Step 2 : RUN mkdir -p /app
==> default:  ---> Running in 430f78046c53
==> default:  ---> 2d56fb5b4e81
==> default: Removing intermediate container 430f78046c53
==> default: Step 3 : COPY pgobserver.yaml /app/pgobserver.yaml
==> default: pgobserver.yaml: no such file or directory

Failing "Quick Test Run Using Vagrant"

Hi,

I tried to test PGObserver using the Quick Test Run Using Vagrant, and I got an error at Step 6:

default: Step 6 : RUN pip install -r /app/requirements.txt
default: ---> Running in b64ab7ca8a57
default: Downloading/unpacking CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: Cannot fetch index base URL http://pypi.python.org/simple/
default: Could not find any downloads that satisfy the requirement CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: No distributions at all found for CherryPy==3.2.4 (from -r /app/requirements.txt (line 1))
default: Storing complete log in /root/.pip/pip.log
default: The command '/bin/sh -c pip install -r /app/requirements.txt' returned a non-zero code: 1
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

It may be related to PyPI; if so, feel free to close this issue.

Thank you for your work!

Improve log output with current database context

In some cases, the current database is not logged in the log files.
For instance:

SEVERE: Error during Load gathering java.sql.SQLException: Timed out waiting for a free available connection. at com.jolbox.bonecp.DefaultConnectionStrategy.getConnectionInternal(DefaultConnectionStrategy.java:88)

does not allow checking the server side.

Ignore the 'recent 2 API schemas' rule for sprocs if newer versions of them were not called yet

On one occasion, looking at the sprocs/all page of PGObserver, I couldn't find a sproc that was actively being called on the running database. The reason was that it was older than 2 API versions. The problem is, no newer versions of that sproc were called (the backend was still at the N-3 API version), and the data for it was missing altogether.

The limitations, AFAIK, are there because when a sproc changes significantly over time, the data for the older version becomes irrelevant. Another reason is to reduce the amount of data to process. I think we can still satisfy both those requirements if we include the data for a sproc older than N-2 API revisions only when the following conditions are true:

  • the sproc belongs to at least N-8 API or later (if one API release corresponds to a single week, it's unlikely that we will be running anything older than 2 months).
  • there is no data for the calls of the sproc with the same name, but later API revisions.
  • the sproc was called recently (i.e., during the past week)

In other words, if the sproc A exists in R8, R9, R10, R11 and R12 APIs, the latest one in the database is R12 and it was called with R8 and R10 versions only during the last week, then we do include data for R10 only, because there were no calls for R11 and R12.

If, on the other hand, the same sproc gets called with R11 and R12, then only those 2 revisions are included, discarding the other ones.

frontend error

Hi!!

I'm trying to start the frontend via run.sh and I get this error:

Traceback (most recent call last):
  File "web.py", line 88, in <module>
    main()
  File "web.py", line 76, in main
    root.report = report.Report()
AttributeError: 'NoneType' object has no attribute 'report'

You guys can help?

Thanks!

Wrong REGEX to check for valid S3 URL

While trying to deploy the frontend, I receive the following error:

Exception: Invalid S3 url: https://s3.eu-central-1.amazonaws.com/laas-config/config.yaml    

Most likely, the reason is the regex to check for valid S3 URLs in https://github.com/zalando/PGObserver/blob/149e01867d0be82acfe4fdf184782a53fe4a4c05/frontend/src/aws_s3_configreader.py:

    m = re.match('https?://s3-(.*)\.amazonaws.com/(.*)', s3_url)
    if not m:
        raise Exception('Invalid S3 url: {}'.format(s3_url))

This regex works for all S3 URLs in eu-west-1 where at least our links look as follows:

https://s3-eu-west-1.amazonaws.com/zalando-stups-mint-123456-eu-west-1/app-credentials/client.json

However, in eu-central-1, the links seem to have a slightly different structure:

https://s3.eu-central-1.amazonaws.com/zalando-stups-mint-123456-eu-central-1/pgobserverfrontend/client.json

Stored procedure data not collected

In the web interface, the stored procedure timings are not being collected. Is there an extension that needs to be installed? Am I missing any steps?
The only step I have not done is create the CPU load scripts, since I do not need it.

PGObserver Docker image

Trying to set up PGObserver on Fedora 25, I ran into several problems.

Therefore I wonder: is there an all-in-one 'one-click' PGObserver Docker image planned or already existing?

If not, would it be something you would like to have, or do you think it's not a good idea, if I were to create one?

Best,
Oliver

pg_stat_statements not working

Hi, I've got a problem with logging pg_stat_statements. I've got the module enabled in the database (the view pg_stat_statements exists in Schemas->public->Views) and there's of course some data. But when I try to access it on the frontend, it doesn't show any statistics. Also, the host selection is empty (see the picture).

[screenshot: 2017-04-28 11:28:52]

The error log shows only this: ERROR: function zz_utils.get_load_average() does not exist, but I think it's not related to my problem. Any help would be appreciated :)

configuration problems

Hi there,

I have problems displaying all the tables with Top 10 SProcs. On the same page, Load Average 15min Sprocs shows a graph. I have already configured the pg_stat_statements module in my cluster:

shared_preload_libraries = 'pg_stat_statements'         # (change requires restart)
pg_stat_statements.max = 10000
pg_stat_statements.track = all

With this enabled, I am able to do a
select * from pg_stat_statements
when connected to my database in psql and get the expected result, but not in the PGObserver web frontend.

I get a warning message in pgmon_java_*****:

Nov 24, 2016 2:23:25 PM de.zalando.pgobserver.gatherer.SprocGatherer getSchemasToBeMonitored WARNING: NOT nspname LIKE ANY (array['public','pg\_%','information_schema','tmp%','temp%']::text[]) AND nspname LIKE ANY (array['%']::text[]) Nov 24, 2016 2:23:25 PM de.zalando.pgobserver.gatherer.SprocGatherer gatherData

Are there some configurations of PostgreSQL that I am missing?
Could you please give me some more advice?

Thanks

Matthias

500 Internal Server Error on /perfstatstatements

It seems like the frontend has trouble reading some queries from the pg_stat_statements table:

500 Internal Server Error
...
perf_stat_statements.html, line 84, in block "body"
    <tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)

This row seems to cause trouble in perf_stat_statements.html:

<tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>

Same problem with the perf_stat_statements_detailed.html by the way:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 210: ordinal not in range(128)

Hopefully you know how to fix it?

EDIT:

I guess I found the query which causes the problem:

UPDATE afg SET status = status_in, updated_at = now(), display = display_in, urls = urls_in, type = type_in, headline = headline_in, description_1 = description_1_in, description_2 = description_2_in, destination_url = destination (...)

I guess the (...) is the problem, because when I delete

<tr class="hiddenrow" style="display:none" ><td colspan="8"><textarea readonly>{{ d['query'] }}</textarea></td>

from perf_stat_statements.html, the /perfstatstatements loads fine. And the show graph url loads perfectly fine for all queries, except for the queries which start with

UPDATE afg SET status = status_in, updated_at = now(), 

So I guess the (...) is causing problems.

error in version check of maven in build.sh

The command gathering the Maven version in build.sh should read
MAVEN_VERSION=$(mvn --version | grep -o '3.' | head -n 1)

The original (with head first and then grep) writes "3.\n3." into the variable, which is different from "3.".

Improve index bloat estimation

We had an index that was shown as a 0.3 bloat ratio (meaning 0.3%) and 0 bytes to win by the PGObserver estimator, while the estimate at https://github.com/ioguix/pgsql-bloat-estimation/blob/master/btree/btree_bloat.sql showed it as 50% bloated.

After creating a new index with the same parameters, it was indeed smaller than the original one (the original was 18 GB, and the replacement took only 7 GB). Therefore, it looks like the bloat estimator needs to be re-examined.
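
A quick way to cross-check any estimator is to rebuild the index and compare sizes (a sketch; the index, table, and column names are hypothetical):

-- Hypothetical names; CONCURRENTLY avoids blocking writes.
CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_column);
SELECT pg_size_pretty(pg_relation_size('my_index'))     AS old_size,
       pg_size_pretty(pg_relation_size('my_index_new')) AS new_size;
DROP INDEX CONCURRENTLY my_index;  -- once the new index is validated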

PGObserver resets its accumulated sp_calls metrics on failover of the underlying DB

During a failover of the database being monitored, we observed that PGObserver resets its sp_calls (total number of calls) metric to 0, confusing our reporting script. This is a bug; it should instead continue from the previously collected values and, possibly, forgo writing new data while those measurements are not available.

Provide more Documentation for the Frontend

Hi there,

I tried to use the tool today.

It was not possible for me to get it running properly. At least for the Python part, I was missing documentation on how to get it running. Can you provide me with some more information?

Typo in schema.sql

There is a typo in the schema.sql script, line 120.

It is "host_grather_group" and should be "host_gather_group".

pgobserver.conf

Hi,
I'm trying to install PGObserver. I have cloned the project from git. However, I cannot find the pgobserver.conf file. I'm referring to the third line in the setup guide:
Copy pgobserver.conf to the home folder: ~/.pgobserver.conf

Running find etc on the PGObserver directory didn't return anything useful.

Thanks and best regards,
Andreas

Top pg_stat_statements entries, Host: is empty

Hi,

I have a problem in the Pg_stat_statements report: I have no data in Host; if I change the URL to one of my "host_ui_shortname" values, the data are displayed correctly.

It is strange that only on this page my hosts are not loaded.

Could you please give me some information to resolve this problem?

Thanks
luca

configuration problems

Hey there,

I tried this tool today and unfortunately I am having some problems. I am able to start the frontend and also the gatherer. The frontend also displays some information about my configured database, so I guess that the basic configuration is okay. However, on several pages of the frontend, no data is displayed. For example, the tables Top 10 Sprocs last 1 hour by total run time and Top 10 Sprocs last 3 hours by total run time are not displayed, while on the same page, Load Average 15min Sprocs only shows a graph. Furthermore, the following tables don't show anything:

  • Perf. Issues - Tables
  • Perf. Issues - SProcs
  • Perf. Issues - Indexes
  • Pg_Stat_Statements report

Especially the Pg_Stat_Statements report is very important for me, but it does not display any data. I have already configured the pg_stat_statements module in my cluster like this (added in postgresql.conf):

shared_preload_libraries = 'pg_stat_statements'         # (change requires restart)
pg_stat_statements.max = 10000
pg_stat_statements.track = all

Restarted everything, but still no luck.

With this enabled, I am able to do a
select * from pg_stat_statements
when connected to my database in psql and get the expected result, but not in the PGObserver web frontend.

Furthermore, I get the following error message when trying to access the "Top Tables" page:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 656, in respond
    response.body = self.handler()
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 188, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 34, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/ilexx/Downloads/PGObserver/frontend/src/tablesfrontend.py", line 132, in index
    top_tables = tabledata.getTopTables(None, date_from, date_to, order, limit)
  File "/home/ilexx/Downloads/PGObserver/frontend/src/tabledata.py", line 388, in getTopTables
    pattern = '%' + pattern + '%'
TypeError: cannot concatenate 'str' and 'NoneType' objects

Are there some configurations of PostgreSQL that I am missing?
Could you please give me some more advice?
So far, thanks...

Add database column to Performance Analyses views

At the moment we have multiple databases in one cluster. We even have multiple clusters on one host.

If we want to use the Performance Analyses views, we can only select hosts. If you have configured multiple databases on one cluster and one host, you have multiple hosts in the selection box but no way to see which database you are actually on.

It would be really helpful to use the UI Longname or UI Shortname in the menu to view the performance stuff, instead of the hostname. That way you can view all the other databases on the same host or cluster as well.

Why not gathering public schema procedures

Why are public schema procedures dropped in SprocGatherer#getQuery?
I have some procedures in the public schema and I need to gather their statistics. Can I include the public schema in the SQL query if there are no side effects?

How to measure Sprocs (stored procedures)?

It's not very clear to me how to measure the sproc statistics.

So far I know I've got these 3 tables:

sproc_performance_data
sproc_schemas_monitoring_configuration
sprocs

This is my sproc_schemas_monitoring_configuration table:
[screenshot: 2015-11-04 15:52:31]

Do I need to insert rows into those other 2 tables, or should this be enough? And should the scmc_host_id be equal to the host id of the monitored host in the hosts table?

Index Statistics per object

We're using pgObserver to monitor our postgresql instances. It's really nice and has provided us with an easy way to view database activity.

As best as I can tell from looking at the gatherer code there is currently no collection of index statistics. It would be really useful if information about index use and bloat was available.

We're interested in the data that is available from postgres as described here:
https://wiki.postgresql.org/wiki/Index_Maintenance
http://www.postgresql.org/docs/current/static/monitoring-stats.html#MONITORING-STATS-SETUP

Is this feature planned?

Thanks for making pgObserver!
