
eida-statistics's People

Contributors

actions-user, jschaeff, vpet98

eida-statistics's Issues

Use a templating system for the documentation

In order to insert dynamic content (like the full URL) in the documentation, it would be useful to adopt a templating system (jinja2 is a popular option).

The URL prefix can be taken from environment variables such as:

EIDASTATS_API_HOST=server.exemple.gr
EIDASTATS_API_PATH=/eidaws/statistics/1
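A minimal sketch of the idea, assuming the documentation source contains a `$base_url` placeholder (a hypothetical name). It uses the standard library's `string.Template` instead of jinja2 so that it runs without extra dependencies. Reading `EIDASTATS_API_PROTO` (mentioned in the https issue) and defaulting it to `https` are assumptions.

```python
import os
from string import Template


def api_base_url(environ=os.environ):
    """Build the public URL prefix from environment variables.

    EIDASTATS_API_HOST and EIDASTATS_API_PATH come from the issue;
    the EIDASTATS_API_PROTO fallback to "https" is an assumption.
    """
    proto = environ.get("EIDASTATS_API_PROTO", "https")
    host = environ["EIDASTATS_API_HOST"]
    # Normalize the path so it always has exactly one leading slash
    path = "/" + environ.get("EIDASTATS_API_PATH", "").strip("/")
    return f"{proto}://{host}{path}"


def render_doc(template_text, environ=os.environ):
    # $base_url is a hypothetical placeholder used in the documentation source;
    # safe_substitute leaves any other $names untouched instead of failing
    return Template(template_text).safe_substitute(base_url=api_base_url(environ))
```

With jinja2 the same variable would simply be passed to `render()`; the point is that the prefix lives in the deployment environment, not in the documentation text.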

Permission error on table dataselect_stats is not reported to the client

We should reply with error 500 in such cases and roll back the transaction. In the log below, the error is logged but the service still reports success:

2023-04-06 14:30:23,068 INFO  [ws_eidastats.helper_functions:134][MainThread] Registering 3557 statistics.
2023-04-06 14:30:23,094 ERROR [ws_eidastats.helper_functions:142][MainThread] Postgresql error 42501 registering statistic
2023-04-06 14:30:23,094 ERROR [ws_eidastats.helper_functions:143][MainThread] ERROR:  permission denied for table dataselect_stats

2023-04-06 14:30:23,094 INFO  [ws_eidastats.helper_functions:144][MainThread] Statistics successfully registered
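A minimal sketch of the expected behaviour, using the standard library's sqlite3 module as a stand-in for the PostgreSQL backend; `register_statistics` and the table columns are hypothetical. The connection's context manager commits when the block succeeds and rolls back when an exception escapes, and the error becomes a 500 instead of a success message. psycopg2 connections can be used as context managers in the same way.

```python
import logging
import sqlite3

log = logging.getLogger("ws_eidastats")


def register_statistics(conn, rows):
    """Insert all rows in one transaction; return (http_status, message)."""
    try:
        # Commits if every insert succeeds, rolls back if any insert raises
        with conn:
            conn.executemany(
                "INSERT INTO dataselect_stats (node, bytes) VALUES (?, ?)", rows
            )
    except sqlite3.Error as err:
        log.error("Database error registering statistics: %s", err)
        return 500, "Statistics could not be registered"
    # Only reached when the transaction was committed
    log.info("Statistics successfully registered")
    return 200, "Statistics successfully registered"
```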

Inconsistency in clients cardinality

(Reported by @vpet98)
I noticed an inconsistency, large enough that I am not sure it should be ignored, in the number of clients (HLL objects) in the results returned by the webservice.

Try this: https://ws.resif.fr/eidaws/statistics/1/dataselect/public?start=2023-01&country=GR&details=country&format=json
Then the same query at node level: https://ws.resif.fr/eidaws/statistics/1/dataselect/public?start=2023-01&country=GR&level=node&details=country&format=json

You would expect the sum of the clients in the results of the second query to be approximately equal to the clients in the first query. But the difference is quite noticeable (78 clients for the first query, 103 in total for the second).
It is even worse for countries with more clients (in another example I had 2232 vs 3115 clients).

My SQL query includes this in the select clause: hll_union_agg(dataselect_stats.clients), which should be correct.
Then I use this library: https://github.com/AdRoll/python-hll.
As its README indicates, I print the cardinality for each row returned by the SQL query like this: HLL.from_bytes(NumberUtil.from_hex(row.clients[2:], 0, len(row.clients[2:]))).cardinality()

Could you have a quick look at it if there is time?
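One possible explanation, assuming some clients request data from several nodes: adding up per-node distinct counts counts those clients once per node, while the EIDA-level query unions the client sets first. The sketch below uses exact Python sets instead of HLL sketches (the client identifiers are made up); HLL only approximates these same two quantities.

```python
def sum_of_node_counts(clients_by_node):
    """What adding up the per-node 'clients' values computes."""
    return sum(len(clients) for clients in clients_by_node.values())


def distinct_clients(clients_by_node):
    """What a single union over all nodes (one hll_union_agg group) estimates."""
    return len(set().union(*clients_by_node.values()))


# Hypothetical client identifiers per node for one country and period
example = {
    "NODE_A": {"c1", "c2", "c3"},
    "NODE_B": {"c2", "c3", "c4"},
    "NODE_C": {"c3", "c5"},
}
```

Here the per-node sum is 8 while only 5 distinct clients exist, so a node-level total above the EIDA-level figure is expected whenever clients overlap across nodes.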

eida_statsman: add an interface to manage network and node policies

  • toggle default policy on a node
    • when an operator tries to change the policy on a node, there are two possible behaviours:
      • if the default policy is changed to "open", make sure that all networks are open and show the operator the list of networks with the resulting restriction
      • otherwise, make sure all networks conform to the default policy. Opening networks has to be done manually
  • toggle policy on a network
  • list policies for networks (optionally filter by node)

Starttime mandatory

To be more consistent with other FDSN webservices and to reduce the default size of responses, make starttime mandatory; endtime can remain optional.
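A minimal validation sketch, assuming the parameters are named `start` and `end` and use the `YYYY-MM` format seen in the example URLs of this project; `validate_time_window` is a hypothetical helper returning an HTTP status and a message.

```python
import re

# YYYY-MM with a month between 01 and 12, as in start=2023-01
MONTH_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def validate_time_window(params):
    """Return (http_status, message) for the start/end query parameters."""
    start = params.get("start")
    if not start:
        return 400, "Parameter 'start' is mandatory"
    if not MONTH_RE.match(start):
        return 400, f"Invalid start '{start}', expected YYYY-MM"
    end = params.get("end")
    if end is not None:
        if not MONTH_RE.match(end):
            return 400, f"Invalid end '{end}', expected YYYY-MM"
        # Zero-padded YYYY-MM strings compare correctly as plain strings
        if end < start:
            return 400, "'end' must not be before 'start'"
    return 200, "ok"
```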

All nodes: upgrade eida-statistics-aggregator to 0.6.0

Hello @ALL

I released a new version of the dataselect statistics aggregator.
This release identifies temporary networks by their extended identifier. This is important for the statistics, because otherwise we mix up statistics from different networks sharing the same short network code.

Could all nodes please upgrade? Depending on your installation method, this should not take much more work than:

pip3 install --upgrade eida-statistics-aggregator

Please note that the minimum Python version is 3.6, but the aggregator can run in its own isolated environment without problems. It has been tested up to Python 3.10.

Please report in this issue when you're done:

  • ODC
  • GFZ
  • KOERI
  • UIB-NORSAR
  • LMU
  • NIEP
  • ICGC
  • ETH
  • BGR
  • NOA
  • AFAD
  • INGV

Extra information at the top of the CSV output

I really like the extra comment lines you included at the top of the CSV (#40).
Could you please consider including an extra piece of information, for instance rejected or malformed parameters?

# request_parameters: start=2022-01&end=2022-12&details=month&format=csv
# rejected_parameters: groupby=day
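A minimal sketch of how such header lines could be produced. `ALLOWED_PARAMETERS` is a hypothetical subset of the accepted parameter names, and only unknown parameter names are reported here; malformed values would need the per-parameter validation as well.

```python
from urllib.parse import urlencode

# Hypothetical subset of accepted parameter names
ALLOWED_PARAMETERS = ("start", "end", "details", "level", "format", "network", "country")


def split_parameters(params, allowed=ALLOWED_PARAMETERS):
    """Separate known parameters from unknown ones, keeping the request order."""
    accepted = {k: v for k, v in params.items() if k in allowed}
    rejected = {k: v for k, v in params.items() if k not in allowed}
    return accepted, rejected


def csv_header(version, params):
    """Build the '#' comment lines placed above the CSV column names."""
    accepted, rejected = split_parameters(params)
    lines = [f"# version: {version}", f"# request_parameters: {urlencode(accepted)}"]
    if rejected:
        lines.append(f"# rejected_parameters: {urlencode(rejected)}")
    return "\n".join(lines) + "\n"
```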

Sort CSV output

CSV output should be sorted by date when details=month or year

Example:
curl -X 'GET' 'https://ws.resif.fr/eidaws/statistics/1/dataselect/public?start=2022-01&end=2022-12&details=month&format=csv'

# version: 1.0.0
# request_parameters: start=2022-01&end=2022-12&details=month&format=csv
date,node,network,station,location,channel,country,bytes,nb_reqs,nb_successful_reqs,clients
2022-09,*,*,*,*,*,*,49249517419520,93309158,61742567,3752
2022-04,*,*,*,*,*,*,52075391539200,70253741,56097249,5135
2022-03,*,*,*,*,*,*,35866232961024,76959640,62862467,6096
2022-07,*,*,*,*,*,*,47809205437440,100682394,86495962,4220
2022-08,*,*,*,*,*,*,41827452808448,199812690,111005715,3361
2022-10,*,*,*,*,*,*,34598181185536,84436994,64883858,4267
2022-06,*,*,*,*,*,*,54756623463168,92399681,75015880,4025
2022-12,*,*,*,*,*,*,75743023855104,115305619,82503762,4524
2022-02,*,*,*,*,*,*,49705000816128,92485574,76534546,4626
2022-05,*,*,*,*,*,*,70791218339072,69100676,53027093,4143
2022-11,*,*,*,*,*,*,31853315838464,122664892,65935181,4714
2022-01,*,*,*,*,*,*,47874364480512,70079038,57161798,3733
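A minimal sketch of sorting before serialization, assuming each result row is a dict with a `date` key in `YYYY-MM` or `YYYY` form; `write_sorted_csv` is a hypothetical helper. Such zero-padded dates sort chronologically as plain strings. An `ORDER BY` in the SQL query would achieve the same result.

```python
import csv
import io


def write_sorted_csv(rows, fieldnames, header_comments=()):
    """Serialize rows as CSV, oldest date first, after optional '#' comments."""
    out = io.StringIO()
    for comment in header_comments:
        out.write(f"# {comment}\n")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # "2022-01" < "2022-09" < "2022-10" holds for zero-padded strings
    writer.writerows(sorted(rows, key=lambda row: row["date"]))
    return out.getvalue()
```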

All webservice methods in one Flask application

Currently, the webservices /statistics/1/* and /dataselectstats are written to run as separate Flask applications.

I would like to serve both in one single application:

POST /dataselectstats => statistics ingestion
GET /dataselectstats => statistics query
GET /query
GET /health
GET / => documentation

Also, do not include the statistics/1/ prefix in the routes, as it will be set on the deployment side.

You can reorganize the project to split the routes and the methods as you see fit.
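A framework-free sketch of the routing idea as a plain WSGI application, assuming ingestion is an HTTP POST; all handler names and payloads are hypothetical placeholders. Per the WSGI specification, the mount prefix chosen at deployment arrives in `SCRIPT_NAME` and the route in `PATH_INFO`, so routes can be declared without `/eidaws/statistics/1`. Flask, which builds on Werkzeug, honours `SCRIPT_NAME` in the same way.

```python
import json


# Hypothetical handlers: each returns (status line, JSON-serializable payload)
def ingest_stats(environ):
    return "200 OK", {"status": "statistics ingested"}


def query_stats(environ):
    return "200 OK", {"statistics": []}


def query(environ):
    return "200 OK", {"results": []}


def health(environ):
    return "200 OK", {"status": "ok"}


def documentation(environ):
    return "200 OK", {"documentation": "placeholder"}


# Declared without the deployment prefix, which arrives in SCRIPT_NAME
ROUTES = {
    ("POST", "/dataselectstats"): ingest_stats,
    ("GET", "/dataselectstats"): query_stats,
    ("GET", "/query"): query,
    ("GET", "/health"): health,
    ("GET", "/"): documentation,
}


def application(environ, start_response):
    """Single WSGI entry point dispatching on (method, PATH_INFO)."""
    path = environ.get("PATH_INFO") or "/"
    handler = ROUTES.get((environ["REQUEST_METHOD"], path))
    if handler is None:
        status, payload = "404 Not Found", {"error": f"no route for {path}"}
    else:
        status, payload = handler(environ)
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```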

Output of human example links

Hello,

Thanks for this very nice webservice.
Playing with the example links for humans, I have a question about the CSV content.

The nb_reqs column always shows None. Shouldn't it be at least equal to the nb_successful_reqs column?

Also, the country column always shows *. Maybe this feature is not implemented yet?

Group all statistics regarding restricted networks under "Other"

When giving statistics to a user who is not authorized to see them
AND
When there is more than one level in the result
Show all the restricted statistics summed up in an "Other" network item.

If there is only one restricted network in the result, reply 403 Forbidden.
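A minimal sketch of the grouping step, assuming one row per network with additive counters; `group_restricted` and `is_authorized` are hypothetical names. The `clients` column is deliberately left out: HLL sketches cannot be added, so they would have to be merged with an HLL union instead. The 403 rule is not covered here.

```python
COUNTERS = ("bytes", "nb_reqs", "nb_successful_reqs")


def group_restricted(rows, is_authorized):
    """Keep authorized rows and sum all other rows into one "Other" row."""
    visible, other = [], None
    for row in rows:
        if is_authorized(row["network"]):
            visible.append(row)
            continue
        if other is None:
            other = {"network": "Other", **{name: 0 for name in COUNTERS}}
        for name in COUNTERS:
            other[name] += row[name]
    if other is not None:
        visible.append(other)
    return visible
```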

Empty stats for GFZ

Thanks for publishing this interface. When retrieving yearly network statistics for each node I get results for all nodes except GFZ:

https://ws.resif.fr/eidaws/statistics/1/dataselect/query?start=2022-01&end=2022-12&datacenter=GFZ&aggregate_on=month,station,country&format=json

returns an empty result. The same happens with unknown data center names. It would be better to return an error if the data center name is invalid.

I also tried "../submit/.." instead of "../dataselect/..". This doesn't work at all.
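A minimal sketch of rejecting unknown data center names with a 400, assuming the parameter may hold a comma-separated list; `KNOWN_DATACENTERS` is a hypothetical hard-coded set, where the service would instead load the node names from its database.

```python
# Hypothetical set: the service would load the real node names from its database
KNOWN_DATACENTERS = frozenset({"RESIF", "GFZ", "ODC", "ETH", "INGV"})


def check_datacenter(value, known=KNOWN_DATACENTERS):
    """Return (http_status, message) for the datacenter parameter."""
    if not value:
        return 200, "all datacenters"
    requested = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in known]
    if unknown:
        return 400, f"Unknown datacenter(s): {', '.join(unknown)}"
    return 200, ", ".join(requested)
```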

A simple method to get nodes and networks

We are missing two public endpoints:

  • /nodes to list all nodes in json format with their default policy
  • /networks to list all known networks with their restriction policy

The endpoint _nodes could be deleted.

Use just one connection to the database backend

Instead of opening a new connection to the SQL backend on each request, use SQLAlchemy's native mechanism to interact with the database.

This is usually done with a singleton object managing the database connection; all the other functions build the SQL statements and pass them to this object.
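A minimal sketch of the singleton pattern, using the standard library's sqlite3 module as a stand-in; `Database`, `get_database`, `count_stats` and `DB_PATH` are hypothetical names. With SQLAlchemy, this role is typically played by the `Engine` returned by `create_engine`, created once per process, which manages a connection pool.

```python
import sqlite3
import threading
from functools import lru_cache

DB_PATH = ":memory:"  # hypothetical; the service would read its DSN from configuration


class Database:
    """Owns the single shared connection (sqlite3 stand-in for an SQLAlchemy engine)."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path, check_same_thread=False)
        # Serialize access: one sqlite3 connection must not run statements concurrently
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        # Commit on success, roll back on error, then return all fetched rows
        with self._lock, self._conn:
            return self._conn.execute(sql, params).fetchall()


@lru_cache(maxsize=None)
def get_database():
    """Create the Database on first use and return the same object afterwards."""
    return Database(DB_PATH)


def count_stats(node):
    # Query functions only build SQL and delegate execution to the shared object
    rows = get_database().execute(
        "SELECT COUNT(*) FROM dataselect_stats WHERE node = ?", (node,)
    )
    return rows[0][0]
```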

Add a webservice for getting the statistics

The first task is to design an API following the OpenAPI 3 standard, for instance using the Swagger online tools.

To design a suitable API, you can look at the matrix document: its first two rows define the questions and the granularity level.

The code attached to this project needs a better documentation, I'm on it (see issue #11)

The datamodel is specified in the code : https://github.com/EIDA/eida-statistics/tree/main/backend_database

You can use this project to bring up your own empty database if needed.

You can create a directory for the webservice specification and implementation at the root of this project.

Layout of the documentation

  • Change the title (Swagger UI -> EIDA statistics)
  • Remove the banner where the user can change the openapi.yaml URL

Strange distribution of data from some nodes.

Something strange happens with network FR.

FR seems to be distributed through RESIF, ETH and ICGC.

Could it be that ETH stopped logging FR at the beginning of 2022? If so, this might be a temporary problem, but it would be nice to understand what is happening and whether something needs to be fixed.

A clear bug is that the number of users per year only shows ETH.


See result of this query: https://ws.resif.fr/eidaws/statistics/1/dataselect/public?network=FR&start=2021-01&end=2023-12&level=node&format=json

Attachment: public(2).csv

Change parameter aggregate_on

On the /public and /restricted methods, replace aggregate_on with:

level

  • one value among datacenter, network, station, location, channel
  • if no value is provided, the server responds at EIDA level, with all datacenters grouped

details

Shows the details of the query.
Possible values are:

  • month or year
  • countries

Multiple values are allowed. If both month and year are specified, reply 400 with a helpful error message.
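A minimal parsing sketch of these two parameters, assuming multiple `details` values are comma-separated; `BadRequest`, `parse_level` and `parse_details` are hypothetical names, and the web layer would turn `BadRequest` into a 400 response carrying its message.

```python
LEVELS = ("datacenter", "network", "station", "location", "channel")
DETAILS = ("month", "year", "countries")


class BadRequest(ValueError):
    """Invalid parameter; the web layer would answer 400 with this message."""


def parse_level(value):
    if not value:
        return None  # no level: answer at EIDA level, all datacenters grouped
    if value not in LEVELS:
        raise BadRequest(f"level must be one of: {', '.join(LEVELS)}")
    return value


def parse_details(value):
    items = {item.strip() for item in (value or "").split(",") if item.strip()}
    unknown = items.difference(DETAILS)
    if unknown:
        raise BadRequest(f"unknown details value(s): {', '.join(sorted(unknown))}")
    if {"month", "year"} <= items:
        raise BadRequest("details cannot contain both 'month' and 'year'; choose one")
    return items
```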

Inefficient caching of FdsnNetExtender.extend()

FdsnNetExtender.extend(self, net, date_string) has lru_cache(maxsize=1000), but since date_string differs most of the time, the caching seems inefficient. In any case, I can see URLs like http://www.fdsn.org/ws/networks/1/query?fdsn_code=3E being downloaded hundreds of times. This sometimes causes an exception, which seems to be the reason for incomplete statistics at GFZ.

Maybe date_string should be reduced to the year (two different temporary networks with the same code never exist in the same year?). Alternatively, I would suggest caching the result of urlopen(request).

eida_statsman: manage authorizations

We said that the central operator should manage authorizations for networks.

The CLI eida_statsman should help us do that, for example:

eida_statsman network set group ABCD

Serve the OpenAPI spec file over the right protocol (https)

In the branch fix_openap3_proto, the deployment uses pyramid_openapi3 from vpet's GitHub repository.

Now we need to force the protocol to https, and I don't remember how to do so in the code. @vpet98, can you help?

It should be configured with an environment variable EIDASTATS_API_PROTO
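One generic approach, sketched under assumptions: a small WSGI middleware that overwrites `wsgi.url_scheme`, which Pyramid (through WebOb) uses when building request and application URLs. Whether pyramid_openapi3 derives the spec's server URL from these request URLs is an assumption to verify. `ForceScheme` is a hypothetical name; it would wrap the app, e.g. `app = ForceScheme(config.make_wsgi_app())`.

```python
import os


class ForceScheme:
    """WSGI middleware overriding the URL scheme seen by the wrapped application."""

    def __init__(self, app, scheme=None):
        self.app = app
        # Falls back to EIDASTATS_API_PROTO, then to "https" (assumed default)
        self.scheme = scheme or os.environ.get("EIDASTATS_API_PROTO", "https")

    def __call__(self, environ, start_response):
        # Generated URLs now use this scheme even when the proxy talks plain http
        environ["wsgi.url_scheme"] = self.scheme
        return self.app(environ, start_response)
```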

Add more functional tests

Wherever there is logic in the code, we should test that it does what it should.

For instance, the restriction function with these use cases:

  • node with default policy OPEN, network without restriction inversion
  • node with default policy OPEN, network with restriction inversion
  • node with default policy CLOSED, network with restriction inversion
  • node with default policy CLOSED, network without restriction inversion
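A sketch of these four cases as a table-driven test, assuming the rule is "a network follows its node's default policy unless it has a restriction inversion". `Policy` and `is_restricted` are hypothetical stand-ins for the project's own restriction function.

```python
from enum import Enum


class Policy(Enum):
    OPEN = "open"
    CLOSED = "closed"


def is_restricted(node_default, inverted):
    """Assumed rule: the node default applies unless the network inverts it."""
    restricted_by_default = node_default is Policy.CLOSED
    return restricted_by_default != inverted


# (node default policy, network has restriction inversion, expected restricted)
CASES = [
    (Policy.OPEN, False, False),
    (Policy.OPEN, True, True),
    (Policy.CLOSED, True, False),
    (Policy.CLOSED, False, True),
]


def test_restriction_use_cases():
    for node_default, inverted, expected in CASES:
        assert is_restricted(node_default, inverted) is expected, (node_default, inverted)
```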

Query without arguments should fail

We should make some arguments mandatory and not allow dumping the whole database by issuing /query without parameters.

Maybe make one of the start/end parameters mandatory.

Upgrade fdsnnetextender

In order to fix #8, I released a new version of the fdsnnetextender package, on which the aggregator relies.

@ALL could you update fdsnnetextender on all nodes?

pip install --upgrade fdsnnetextender

The targeted version is 3.3.0.

  • ODC
  • GFZ
  • KOERI
  • UIB-NORSAR
  • LMU
  • NIEP
  • ICGC
  • ETH
  • BGR
  • NOA
  • INGV
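Before reporting, each node could check the installed version with a small script like the one below. It is a sketch, assuming Python 3.8 or later (for `importlib.metadata`) and purely numeric dotted version strings; `is_up_to_date` and `version_tuple` are hypothetical helpers, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """'3.3.0' -> (3, 3, 0); raises ValueError on non-numeric parts like 'post1'."""
    return tuple(int(part) for part in text.split("."))


def is_up_to_date(distribution, minimum):
    """True if the distribution is installed at or above the minimum version."""
    try:
        installed = version(distribution)
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

For example, `is_up_to_date("fdsnnetextender", "3.3.0")` or `is_up_to_date("eida-statistics-aggregator", "0.6.0")`, run in the environment where the aggregator is installed.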
