
azafea's Issues

Add a composite index on ping country + created_at

@ramcq tried to run the following query:

SELECT DISTINCT p.country,
                (SELECT count(pq1.id)
                   FROM ping_v1 pq1
                   WHERE pq1.country = p.country
                   AND pq1.created_at >= '2019-01-01'::date
                   AND pq1.created_at < '2019-04-01'::date) AS q1,
                (SELECT count(pq2.id)
                   FROM ping_v1 pq2
                   WHERE pq2.country = p.country
                   AND pq2.created_at >= '2019-04-01'::date
                   AND pq2.created_at < '2019-07-01'::date) AS q2
  FROM ping_v1 p
  WHERE p.created_at >= (now() - '1 day'::interval);

And it was very slow. The query plan looks like this:

                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=4780.17..4782.27 rows=144 width=32)
   ->  Sort  (cost=4780.17..4780.70 rows=210 width=32)
         Sort Key: p.country, ((SubPlan 1)), ((SubPlan 2))
         ->  Bitmap Heap Scan on ping_v1 p  (cost=5.78..4772.07 rows=210 width=32)
               Recheck Cond: (created_at >= (now() - '1 day'::interval))
               ->  Bitmap Index Scan on ix_ping_v1_created_at  (cost=0.00..5.73 rows=210 width=0)
                     Index Cond: (created_at >= (now() - '1 day'::interval))
               SubPlan 1
                 ->  Aggregate  (cost=11.31..11.32 rows=1 width=8)
                       ->  Bitmap Heap Scan on ping_v1 pq1  (cost=4.18..11.30 rows=1 width=4)
                             Recheck Cond: ((created_at >= '2019-01-01'::date) AND (created_at < '2019-04-01'::date))
                             Filter: ((country)::text = (p.country)::text)
                             ->  Bitmap Index Scan on ix_ping_v1_created_at  (cost=0.00..4.18 rows=3 width=0)
                                   Index Cond: ((created_at >= '2019-01-01'::date) AND (created_at < '2019-04-01'::date))
               SubPlan 2
                 ->  Aggregate  (cost=11.31..11.32 rows=1 width=8)
                       ->  Bitmap Heap Scan on ping_v1 pq2  (cost=4.18..11.30 rows=1 width=4)
                             Recheck Cond: ((created_at >= '2019-04-01'::date) AND (created_at < '2019-07-01'::date))
                             Filter: ((country)::text = (p.country)::text)
                             ->  Bitmap Index Scan on ix_ping_v1_created_at  (cost=0.00..4.18 rows=3 width=0)
                                   Index Cond: ((created_at >= '2019-04-01'::date) AND (created_at < '2019-07-01'::date))

Adding an index on country alone does not have any significant impact.

However, after adding a composite index on country and created_at, the query plan becomes:

                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=3465.26..3467.36 rows=144 width=32)
   ->  Sort  (cost=3465.26..3465.78 rows=210 width=32)
         Sort Key: p.country, ((SubPlan 1)), ((SubPlan 2))
         ->  Bitmap Heap Scan on ping_v1 p  (cost=5.78..3457.16 rows=210 width=32)
               Recheck Cond: (created_at >= (now() - '1 day'::interval))
               ->  Bitmap Index Scan on ix_ping_v1_created_at  (cost=0.00..5.73 rows=210 width=0)
                     Index Cond: (created_at >= (now() - '1 day'::interval))
               SubPlan 1
                 ->  Aggregate  (cost=8.17..8.18 rows=1 width=8)
                       ->  Index Scan using ix_ping_v1_country_created_at on ping_v1 pq1  (cost=0.15..8.17 rows=1 width=4)
                             Index Cond: (((country)::text = (p.country)::text) AND (created_at >= '2019-01-01'::date) AND (created_at < '2019-04-01'::date))
               SubPlan 2
                 ->  Aggregate  (cost=8.17..8.18 rows=1 width=8)
                       ->  Index Scan using ix_ping_v1_country_created_at on ping_v1 pq2  (cost=0.15..8.17 rows=1 width=4)
                             Index Cond: (((country)::text = (p.country)::text) AND (created_at >= '2019-04-01'::date) AND (created_at < '2019-07-01'::date))

The query still has a high estimated cost, but it's roughly 27% lower than before.

This will require implementing #4 first.
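
For reference, here's a minimal sketch of how the composite index could be declared with SQLAlchemy; the model is hypothetical and only shows the columns involved, with the index name taken from the second query plan:

from sqlalchemy import Column, DateTime, Index, Integer, Unicode
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Ping(Base):
    __tablename__ = 'ping_v1'

    id = Column(Integer, primary_key=True)
    country = Column(Unicode(length=3))
    created_at = Column(DateTime(timezone=True), nullable=False)

    # Composite index matching ix_ping_v1_country_created_at in the plan above
    __table_args__ = (
        Index('ix_ping_v1_country_created_at', 'country', 'created_at'),
    )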

Consider BRIN indexes on created_at columns

We have many very large tables with a creation date column. The obvious example is ping_v1, which has 42 million rows at the time of writing, and a created_at column recording the time each ping was received. There is, of course, an index on created_at. Empirically, however, PostgreSQL only uses it for ranges below about 4½ months; beyond that, it prefers a sequential scan of the whole table, which takes almost a minute.

During a random walk through the PostgreSQL documentation (I need better hobbies) I stumbled upon BRIN indexes.

BRIN stands for Block Range Index. BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table. […] For example, a table storing a store's sale orders might have a date column on which each order was placed, and most of the time the entries for earlier orders will appear earlier in the table as well[.]

BRIN indexes can satisfy queries via regular bitmap index scans, and will return all tuples in all pages within each range if the summary info stored by the index is consistent with the query conditions. The query executor is in charge of rechecking these tuples and discarding those that do not match the query conditions — in other words, these indexes are lossy. Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples.

created_at actually comes from the timestamp at which eos-activation-server received the ping, and rows of this table may be written by the two instances of the azafea daemon pulling from Redis in parallel, so rows are not guaranteed to be in created_at order within the table. They are very strongly correlated, however. Assuming that the id field correlates exactly with physical order, of the 28,771 rows from the past hour at the time of writing, only 63 are out of ascending created_at order. Across the entire table of 42 million rows, 91,304 are out of order, by a total offset of 72.

So I believe switching the created_at index to a BRIN index would allow it to be used in more cases. It's worth a try, at least.

According to the documentation, BRIN indexes do require some special maintenance, though.
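
As a sketch, the switch itself is plain DDL, and the maintenance function can be called the same way (the DSN is illustrative):

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://azafea@localhost/azafea')

with engine.begin() as conn:
    # Replace the existing btree index with a BRIN one
    conn.execute(text('DROP INDEX ix_ping_v1_created_at'))
    conn.execute(text(
        'CREATE INDEX ix_ping_v1_created_at ON ping_v1 USING brin (created_at)'))
    # The "special maintenance": summarize page ranges added since index creation
    conn.execute(text("SELECT brin_summarize_new_values('ix_ping_v1_created_at')"))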

Allow ignoring some metrics events

Some events' UUIDs have changed over time, and the old UUIDs have been deprecated.

Currently those old events end up in the "unknown" tables, where they accumulate for no purpose.

We should have an ignore list for those old events.
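
As a sketch, this could be a simple set consulted before an event is routed to the "unknown" tables (the UUID below is a placeholder, not one of the real deprecated ones):

IGNORED_EVENT_UUIDS = frozenset({
    '00000000-0000-0000-0000-000000000000',  # placeholder for a deprecated event
})

def is_ignored(event_uuid: str) -> bool:
    # If this returns True, drop the event instead of storing it as unknown
    return event_uuid in IGNORED_EVENT_UUIDS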

Add a command to renormalize vendors

Something like:

$ azafea -c config.toml normalize-vendors MODEL_NAME COLUMN_NAME

This will be necessary to really fix #22.

The current vendor mapping is also something I threw together quickly, so it will be refined over time, and we will need such a command each time that happens.
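
Roughly, the command could boil down to something like this sketch, assuming a normalize_vendor() helper wrapping the current mapping (all names are illustrative):

def do_normalize_vendors(dbsession, model, column_name):
    # Re-apply the vendor mapping to every existing row of the given model
    for instance in dbsession.query(model):
        raw_vendor = getattr(instance, column_name)
        setattr(instance, column_name, normalize_vendor(raw_vendor))

    dbsession.commit()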

Docker image does not build

Yo. With the EOL of Ubuntu 19.04, the Docker image no longer builds:

$ podman build --tag azafea .
STEP 1: FROM ubuntu:disco
Getting image source signatures
Copying blob 0a4ccbb24215 done
Copying blob 4dc9c2fff018 done
Copying blob 5ff1eaecba77 done
Copying blob c0f243bc6706 done
Copying config c88ac1f841 done
Writing manifest to image destination
Storing signatures
STEP 2: ENV LANG C.UTF-8
03253aa050eac02e2a013a9d976efeca759419a3687e9415bd0763e8510fad59
STEP 3: WORKDIR /opt/azafea/src
32bcd4c990adefe5f48b33dc0e34fdc8ac2c0827e3752f101131906771e1a320
STEP 4: COPY Pipfile.lock .
4fa7b5c1b7438a8a0e627e2ed5f2a141d85cda003eb1c2022ffa0a11ef2afc0f
STEP 5: ARG build_type
63956f5e5f602095f3fbb575998a2ad1a17836569126ab1688a704fd3d6ceaad
STEP 6: RUN apt --quiet --assume-yes update &&     apt --quiet --assume-yes --no-install-recommends install         gcc         gir1.2-glib-2.0         gobject-introspection         libcairo2-dev         libffi-dev         libgirepository-1.0-1         libgirepository1.0-dev         libglib2.0-dev         libpq5         libpq-dev         python3         python3-dev         python3-pip         python3-setuptools         python3-wheel         &&     pip3 install pipenv &&     pipenv install --ignore-pipfile &&     if [ "${build_type}" = "dev" ]; then         pipenv install --ignore-pipfile --dev     ; else         apt --quiet --assume-yes autoremove --purge             gcc             libcairo2-dev             libffi-dev             libgirepository1.0-dev             libglib2.0-dev             libpq-dev             python3-dev             &&         rm -rf /var/cache/{apt,debconf}                /var/lib/apt/lists/*                /var/log/{apt,dpkg.log}                ~/.cache     ; fi
Ign:1 http://security.ubuntu.com/ubuntu disco-security InRelease
Err:2 http://security.ubuntu.com/ubuntu disco-security Release
  404  Not Found [IP: 2001:67c:1562::15 80]
Ign:3 http://archive.ubuntu.com/ubuntu disco InRelease
Ign:4 http://archive.ubuntu.com/ubuntu disco-updates InRelease
Ign:5 http://archive.ubuntu.com/ubuntu disco-backports InRelease
Err:6 http://archive.ubuntu.com/ubuntu disco Release
  404  Not Found [IP: 2001:67c:1360:8001::24 80]
Err:7 http://archive.ubuntu.com/ubuntu disco-updates Release
  404  Not Found [IP: 2001:67c:1360:8001::24 80]
Err:8 http://archive.ubuntu.com/ubuntu disco-backports Release
  404  Not Found [IP: 2001:67c:1360:8001::24 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu disco-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu disco-backports Release' does not have a Release file.
Error: error building at STEP "RUN apt --quiet --assume-yes update &&     apt --quiet --assume-yes --no-install-recommends install         gcc         gir1.2-glib-2.0         gobject-introspection         libcairo2-dev         libffi-dev         libgirepository-1.0-1         libgirepository1.0-dev         libglib2.0-dev         libpq5         libpq-dev         python3         python3-dev         python3-pip         python3-setuptools         python3-wheel         &&     pip3 install pipenv &&     pipenv install --ignore-pipfile &&     if [ "${build_type}" = "dev" ]; then         pipenv install --ignore-pipfile --dev     ; else         apt --quiet --assume-yes autoremove --purge             gcc             libcairo2-dev             libffi-dev             libgirepository1.0-dev             libglib2.0-dev             libpq-dev             python3-dev             &&         rm -rf /var/cache/{apt,debconf}                /var/lib/apt/lists/*                /var/log/{apt,dpkg.log}                ~/.cache     ; fi": error while running runtime: exit status 100

It seems Ubuntu deletes its update repos once a release hits EOL? That's pretty aggressive. Anyway, I tried upgrading to 19.10:

diff --git a/Dockerfile b/Dockerfile
index 425b321..1df323b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM ubuntu:disco
+FROM ubuntu:eoan
 
 ENV LANG C.UTF-8

But that fails because it hits a file descriptor limit:

Collecting zipp>=0.5 (from importlib-metadata<2,>=0.12; python_version < "3.8"->virtualenv->pipenv)
  Downloading https://files.pythonhosted.org/packages/b2/34/bfcb43cc0ba81f527bc4f40ef41ba2ff4080e047acb0586b56b3d017ace4/zipp-3.1.0-py3-none-any.whl
Building wheels for collected packages: distlib
  Running setup.py bdist_wheel for distlib ... done
  Stored in directory: /root/.cache/pip/wheels/6e/e8/db/c73dae4867666e89ba3cfbc4b5c092446f0e584eda6f409cbb
Successfully built distlib
Installing collected packages: certifi, virtualenv-clone, appdirs, filelock, distlib, zipp, importlib-metadata, virtualenv, pipenv
Successfully installed appdirs-1.4.3 certifi-2020.4.5.1 distlib-0.3.0 filelock-3.0.12 importlib-metadata-1.6.0 pipenv-2018.11.26 virtualenv-20.0.18 virtualenv-clone-0.5.4 zipp-3.1.0
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 430, in run
    wheel_cache.cleanup()
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/temp_dir.py", line 58, in __exit__
    self.cleanup()
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/temp_dir.py", line 81, in cleanup
    rmtree(self.path)
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/share/python-wheels/six-1.12.0-py2.py3-none-any.whl/six.py", line 693, in reraise
    raise value
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/misc.py", line 111, in rmtree
    onerror=rmtree_errorhandler)
  File "/usr/lib/python3.7/shutil.py", line 494, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.7/shutil.py", line 432, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.7/shutil.py", line 432, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.7/shutil.py", line 432, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 2 more times]
  File "/usr/lib/python3.7/shutil.py", line 436, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/usr/lib/python3.7/shutil.py", line 434, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 24] Too many open files: '__pycache__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 143, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 430, in run
    wheel_cache.cleanup()
  File "/usr/lib/python3/dist-packages/pip/_internal/req/req_tracker.py", line 32, in __exit__
    self.cleanup()
  File "/usr/lib/python3/dist-packages/pip/_internal/req/req_tracker.py", line 67, in cleanup
    self._temp_dir.cleanup()
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/temp_dir.py", line 81, in cleanup
    rmtree(self.path)
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 212, in call
    raise attempt.get()
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/share/python-wheels/six-1.12.0-py2.py3-none-any.whl/six.py", line 693, in reraise
    raise value
  File "/usr/share/python-wheels/retrying-1.3.3-py2.py3-none-any.whl/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/misc.py", line 111, in rmtree
    onerror=rmtree_errorhandler)
  File "/usr/lib/python3.7/shutil.py", line 485, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/misc.py", line 119, in rmtree_errorhandler
    if os.stat(path).st_mode & stat.S_IREAD:
OSError: [Errno 24] Too many open files: '/tmp/pip-req-tracker-1urt61iw'
Error: error building at STEP "RUN apt --quiet --assume-yes update &&     apt --quiet --assume-yes --no-install-recommends install         gcc         gir1.2-glib-2.0         gobject-introspection         libcairo2-dev         libffi-dev         libgirepository-1.0-1         libgirepository1.0-dev         libglib2.0-dev         libpq5         libpq-dev         python3         python3-dev         python3-pip         python3-setuptools         python3-wheel         &&     pip3 install pipenv &&     pipenv install --ignore-pipfile &&     if [ "${build_type}" = "dev" ]; then         pipenv install --ignore-pipfile --dev     ; else         apt --quiet --assume-yes autoremove --purge             gcc             libcairo2-dev             libffi-dev             libgirepository1.0-dev             libglib2.0-dev             libpq-dev             python3-dev             &&         rm -rf /var/cache/{apt,debconf}                /var/lib/apt/lists/*                /var/log/{apt,dpkg.log}                ~/.cache     ; fi": error while running runtime: exit status 2

I tried increasing the soft fd limit:

diff --git a/Dockerfile b/Dockerfile
index 425b321..2b21912 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -7,6 +7,7 @@ WORKDIR /opt/azafea/src
 COPY Pipfile.lock .
 
 ARG build_type
+RUN ulimit -n 4096
 RUN apt --quiet --assume-yes update && \
     apt --quiet --assume-yes --no-install-recommends install \
         gcc 

That failed because both the hard and soft limits inside the container build are 1024. (Not sure why; my host machine has a higher hard limit, so podman must be lowering the limit itself.)

I tried increasing the hard limit as well:

diff --git a/Dockerfile b/Dockerfile
index 425b321..f47e2cf 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM ubuntu:disco
+FROM ubuntu:eoan
 
 ENV LANG C.UTF-8
 
@@ -7,6 +7,9 @@ WORKDIR /opt/azafea/src
 COPY Pipfile.lock .
 
 ARG build_type
+
+RUN ulimit -H -n 4096
+RUN ulimit -S -n 4096
 RUN apt --quiet --assume-yes update && \
     apt --quiet --assume-yes --no-install-recommends install \
         gcc \

That failed:

STEP 6: RUN ulimit -H -n 4096
/bin/sh: 1: ulimit: error setting limit (Operation not permitted)
Error: error building at STEP "RUN ulimit -H -n 4096": error while running runtime: exit status 2

Not quite sure why I have permission to use apt but not ulimit. I don't know much about containers.

I tried upgrading to 20.04, since that seems like a good environment to target for the next couple of years:

diff --git a/Dockerfile b/Dockerfile
index 425b321..1df323b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM ubuntu:disco
+FROM ubuntu:focal
 
 ENV LANG C.UTF-8

That failed due to dpkg repeatedly crashing. The focal image seems to be in really bad shape.

I gave up on Ubuntu and switched to Debian:

diff --git a/Dockerfile b/Dockerfile
index 425b321..1df323b 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM ubuntu:disco
+FROM debian:buster
 
 ENV LANG C.UTF-8

That got a lot farther than the other attempts, but eventually failed after installing pipenv:

To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (2: No such file or directory)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
Error: error building at STEP "RUN apt --quiet --assume-yes update &&     apt --quiet --assume-yes --no-install-recommends install         gcc         gir1.2-glib-2.0         gobject-introspection         libcairo2-dev         libffi-dev         libgirepository-1.0-1         libgirepository1.0-dev         libglib2.0-dev         libpq5         libpq-dev         python3         python3-dev         python3-pip         python3-setuptools         python3-wheel         &&     pip3 install pipenv &&     pipenv install --ignore-pipfile &&     if [ "${build_type}" = "dev" ]; then         pipenv install --ignore-pipfile --dev     ; else         apt --quiet --assume-yes autoremove --purge             gcc             libcairo2-dev             libffi-dev             libgirepository1.0-dev             libglib2.0-dev             libpq-dev             python3-dev             &&         rm -rf /var/cache/{apt,debconf}                /var/lib/apt/lists/*                /var/log/{apt,dpkg.log}                ~/.cache     ; fi": error while running runtime: exit status 100

I tried converting the Dockerfile to use CentOS, but hit the 1024 fd limit there too. I even used ulimit -H -n 4096 to raise the hard fd limit on my host, to make sure it wasn't what was being used to set the limit inside the container, but that had no effect. At this point, I think I'll give up on containers and try installing the old-fashioned way…
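
One thing I haven't tried yet: podman build seems to have a --ulimit option, which might side-step the in-container limit without touching the Dockerfile at all:

$ podman build --ulimit nofile=4096:4096 --tag azafea .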

Make it easier to write integration tests

Writing integration tests at the moment is a bit too complex. We want more tests, and as such we want to make writing them easier. 🙂

There is a lot of duplication right now between all the existing integration tests. Can we share some of it?

Each test function is responsible for clearing its database (dropping all tables) at the end, but it's very easy to forget to do that, and then test functions start interfering with one another, causing failures that are very hard to debug. In addition, if an integration test fails before the end, its database won't be cleared.
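
One possible direction, as a sketch: move the cleanup into a pytest fixture, so individual tests can't forget it and the database is cleared even when a test fails partway through (the db fixture and its drop_all_tables() method are hypothetical):

import pytest

@pytest.fixture()
def clean_db(db):
    yield db
    # Runs even if the test body raised, unlike cleanup code at the end of a test
    db.drop_all_tables()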

Implement ACLs in PostgreSQL

Azafea processes and stores everything it gets sent.

In the case of Endless, that means it stores data from multiple deployments.

We eventually want Endless employees and contractors not to have access to everything, but only to the data corresponding to the deployments they work on.

That means figuring out a way to allow access only to certain rows when querying.

One possibility is to grant row-level permissions: https://www.postgresql.org/docs/11/ddl-rowsecurity.html

Another possibility would be to create views which pre-filter the data, and grant people access only to certain views.
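
As a sketch of the row-level security option, assuming a hypothetical user_deployments table mapping PostgreSQL roles to deployment image prefixes:

ALTER TABLE ping_configuration_v1 ENABLE ROW LEVEL SECURITY;

CREATE POLICY deployment_access ON ping_configuration_v1
    USING (EXISTS (SELECT 1
                     FROM user_deployments ud
                    WHERE ud.role_name = current_user
                      AND image LIKE ud.image_prefix || '%'));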

Add a dummy handler by default?

We could have a handler which does nothing except logging the values it pulls from its Redis queue.

This would provide a nice way to check that everything is working before spending time on writing a custom event handler.

It would also provide a working configuration out of the box, contrary to what we have now (Azafea refuses to run unless admins configure an event handler), as well as an example handler to help people write custom ones.

However, we need to make sure that this dummy handler is disabled when other handlers are configured.
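
As a sketch of how small it could be, assuming (hypothetically) that a handler is a module exposing a process(dbsession, record) callable:

import logging

log = logging.getLogger(__name__)

def process(dbsession, record: bytes) -> None:
    # Log the raw value pulled from the Redis queue; store nothing
    log.info('Received record: %s', record)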

Figure out data migrations

Right now, event handlers can define their own custom models to store data in. The tables are created before running Azafea, with the initdb command.

However, if an existing model needs to change its schema, our only option is to move its current data to a different table, run initdb again, and finally move the old data into the new table with an ad-hoc script.

We can probably do better with Alembic, adding a new migrate command which would run the migrations of all the models associated with the configured queues (see the sketch after the list below).

This will also require some documentation:

  • the page about how to write custom handlers will need some additions for writing migrations;
  • the deployment page will need to cover how to run migrations.
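
As for the command itself, here is a sketch using Alembic's Python API (the config attribute names are illustrative):

from alembic import command
from alembic.config import Config as AlembicConfig

def do_migrate(config):
    # Run the migrations of every model associated with a configured queue
    for queue in config.queues.values():
        alembic_cfg = AlembicConfig()
        alembic_cfg.set_main_option('script_location', queue.migrations_dir)
        alembic_cfg.set_main_option('sqlalchemy.url', config.postgresql.url)
        command.upgrade(alembic_cfg, 'head')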

Better handle missing configuration options

The config system is nice in that it validates the configuration, and specifically the types of the options, hopefully giving administrators good error messages about what failed.

But try it with the following configuration:

[queues.te]

And you get a completely mystifying error:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mathieu/Projects/endless/azafea/azafea/__main__.py", line 26, in <module>
    sys.exit(args.subcommand(args))
  File "/home/mathieu/Projects/endless/azafea/azafea/cli.py", line 120, in do_print_config
    config = Config.from_file(args.config)
  File "/home/mathieu/Projects/endless/azafea/azafea/config/__init__.py", line 177, in from_file
    queues[name] = Queue(**queue_options)
TypeError: __init__() missing 1 required positional argument: 'handler'

We should be able to do better.
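
One way to do better, as a sketch, assuming the config classes are dataclasses (InvalidConfigurationError is illustrative):

import dataclasses

def check_required_options(cls, section, options):
    # Required fields are those without a default or a default factory
    required = {f.name for f in dataclasses.fields(cls)
                if f.default is dataclasses.MISSING
                and f.default_factory is dataclasses.MISSING}
    missing = sorted(required - options.keys())

    if missing:
        raise InvalidConfigurationError(
            f'Missing required option(s) in [{section}]: {", ".join(missing)}')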

Handle failing connections to Redis/PostgreSQL

Right now, Azafea checks whether it can connect to Redis and PostgreSQL at startup.

However, those connections might break during operation, and Azafea can't just crash when that happens.
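
Something like this sketch is the kind of resilience we probably want, shown here for the Redis side (names are illustrative):

import logging
import time

import redis

log = logging.getLogger(__name__)

def pull_with_retry(redis_client, queue_name, max_backoff=60):
    # Retry with exponential backoff instead of crashing on a broken connection
    backoff = 1

    while True:
        try:
            return redis_client.brpop(queue_name)
        except redis.exceptions.ConnectionError:
            log.warning('Lost connection to Redis, retrying in %ds', backoff)
            time.sleep(backoff)
            backoff = min(backoff * 2, max_backoff)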

Error replaying unknown metrics

In the dev environment we got the following error running metrics-2 replay-unknown:

[ERROR] azafea.event_processors.endless.metrics.events._base: Metric event c75af67f-cf2f-433d-a060-a670087d93a1 takes no payload, but got <<@a{sv} {}>>
[ERROR] azafea.event_processors.endless.metrics.events._base: An error occured while processing the event:
Traceback (most recent call last):
  File "/azafea/azafea/event_processors/endless/metrics/events/_base.py", line 520, in replay_unknown_singular_events
    payload=payload)
  File "<string>", line 4, in __init__
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 441, in _initialize_instance
    manager.dispatch.init_failure(self, args, kwargs)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 438, in _initialize_instance
    return manager.original_init(*mixed[1:], **kwargs)
  File "/azafea/azafea/event_processors/endless/metrics/events/_base.py", line 122, in __init__
    payload_fields = self._parse_payload(payload)
  File "/azafea/azafea/event_processors/endless/metrics/events/_base.py", line 145, in _parse_payload
    raise WrongPayloadError(f'Metric event {self.__event_uuid__} needs a '
azafea.event_processors.endless.metrics.events._base.WrongPayloadError: Metric event 449ec188-cb7b-45d3-a0ed-291d943b9aa6 needs a a{sv} payload, but got [{'AllowSystemInstallation': <true>, 'AllowUserInstallation': <true>, 'IsAdministrator': <true>, 'OarsFilter': <('oars-1.1', @a{ss} {})>, 'AppFilter': <(false, @as [])>}, {'AllowSystemInstallation': <false>, 'AllowUserInstallation': <true>, 'IsAdministrator': <false>, 'OarsFilter': <('oars-1.1', @a{ss} {})>, 'AppFilter': <(false, @as [])>}, {'AllowSystemInstallation': <true>, 'AllowUserInstallation': <true>, 'IsAdministrator': <true>, 'OarsFilter': <('oars-1.1', @a{ss} {})>, 'AppFilter': <(false, @as [])>}] (aa{sv})
[The same traceback repeats several more times, each raising WrongPayloadError for metric event 449ec188-cb7b-45d3-a0ed-291d943b9aa6 with a similar aa{sv} payload, until the replay finally aborts:]
**
GLib:ERROR:../glib/gvariant-serialiser.c:1367:g_variant_serialised_n_children: code should not be reached

Try out timescaledb

TimescaleDB is a PostgreSQL extension: https://github.com/timescale/timescaledb

It seems like it could help manage the database going forward, adding things like automatic partitioning of time series.

It could even improve performance depending on the size of the data: https://blog.timescale.com/timescaledb-vs-6a696248104e/

However, there are two caveats to pay attention to:

  1. how does this work with SQLAlchemy?

    it's probably possible to make the two work nicely together, but it might take a bit of experimentation at first (see the sketch after this list);

  2. we need to be very careful to stick to the open source edition

    TimescaleDB is open-core, and the source is available even for the proprietary additions, so we need to carefully check which of its features we use and avoid depending on the nonfree ones (at least in Azafea itself; custom event handlers are free to do so if they are willing to pay for the proprietary features)
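
For the first caveat, the simplest case probably needs no SQLAlchemy support at all: converting a (freshly created, still empty) table to a hypertable is plain SQL, something like this sketch (the DSN is illustrative):

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://azafea@localhost/azafea')

with engine.begin() as conn:
    # create_hypertable() is TimescaleDB's own (open source) function;
    # this partitions ping_v1 on its created_at column
    conn.execute(text("SELECT create_hypertable('ping_v1', 'created_at')"))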

Should we move the activation/ping handlers out of this repository?

Azafea is designed to let anybody write their own handlers, depending on the metrics they collect and on how they want to process and store them in PostgreSQL.

We currently have two event handlers in the tree:

  • azafea/event_processors/activation/v1.py
  • azafea/event_processors/ping/v1.py

Both are specific to the kind of data Endless collects and how we process it, and they might not make sense for other organizations interested in deploying Azafea.

It could also create the false expectation that we will accept everybody's custom handlers into this repository, or even that people must submit their handlers here or Azafea won't be able to load them.

Should we move them out to a separate repository, maybe something like azafea_endless?

Docker image does not use config generated by entrypoint

In the current code, the entrypoint script renders config.toml.j2 to /tmp/config.toml from environment variables. However, the ENTRYPOINT command does not instruct azafea to use it, so azafea tries to read the configuration from /etc/azafea/config.toml unless you pass -c /tmp/config.toml as part of the command at runtime.

The fix would be simple: add -c /tmp/config.toml to the ENTRYPOINT array. However, that's at odds with the running documentation, which says you should bind mount /etc/azafea from the host with the expectation that azafea will use /etc/azafea/config.toml. If the ENTRYPOINT command is changed to use -c /tmp/config.toml, that usage would break.

I think the options are:

  1. Make the change to have the image use /tmp/config.toml and update the documentation to pass the configuration as environment variables. This is how the postgres image (like many others in the Docker world) is handled.
  2. Drop the custom entrypoint script and assume the configuration has been added to the container at /etc/azafea/config.toml, as the documentation says.

I prefer option 1, as it fits better with container orchestration software. The reason this came up is that someone is trying to use azafea on AWS ECS, and it can't really work out of the box without some customization. In either case, you can always build an image with a custom entrypoint to handle the configuration the way you prefer; that's what we do at Endless, and why we don't run into this problem.
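
For completeness, option 1 would be a one-line change along these lines (the exact array in the current Dockerfile may differ; this is illustrative):

ENTRYPOINT ["./entrypoint.sh", "python", "-m", "azafea", "-c", "/tmp/config.toml"]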

Instructions for installing postgres don't work

The instructions for installing PostgreSQL have a couple of problems. The first is that the instructions start out using /var/lib/pgsql/ but later switch to /var/lib/postgresql. Only the first two lines use the /var/lib/pgsql/ form, so they should probably be switched to /var/lib/postgresql.

Then the next step only works when using sudo docker rather than podman:

$ podman run --env=PGDATA=/var/lib/postgresql/azafea/data/pgdata --env=POSTGRES_PASSWORD=S3cretPgAdminP@ssw0rd --publish=5432:5432 --volume=/var/lib/postgresql/azafea/data:/var/lib/postgresql/azafea/data:rw postgres:latest
mkdir: cannot create directory ‘/var/lib/postgresql/azafea/data/pgdata’: Permission denied

At this point in the instructions, /var/lib/postgresql/azafea/data already exists and is owned by root, but /var/lib/postgresql/azafea/data/pgdata does not exist. If I create that directory too, the podman command fails with:

chmod: changing permissions of '/var/lib/postgresql/azafea/data/pgdata': Permission denied

At this point, I'm wondering which user is supposed to own which directories here. I understand that the next step in the installation instructions has me create an azafea user inside the container, but no such user exists on the host system, and if I were to create that user on the host, the uids would not match anyway. This is probably an easy problem, but I'm a stupid desktop developer and don't know anything.
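
For anyone else hitting this with rootless podman: the postgres image runs as uid 999 inside the container, which maps to a subordinate uid on the host, so (as far as I understand it) the bind-mounted directory needs to be created by your own user and then chowned through the user namespace, something like:

$ podman unshare chown -R 999:999 /var/lib/postgresql/azafea/data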

Optimize some queries

Here is a dump of queries I've been asked to run on activation/ping data.

Real-world queries provide good insight into what should be optimized and how (adding indexes, splitting columns, adding pre-processed tables, …).

Let's make good use of PostgreSQL's EXPLAIN on those!


  • Countries with the most new OEM activations over the past 365 days:
SELECT count(id) AS count, country
  FROM activation_v1
  WHERE image LIKE 'eosoem-%'
    AND created_at >= NOW() - INTERVAL '365 DAYS'
  GROUP BY country
  ORDER BY count DESC
  LIMIT 10;
  • Countries with the most active OEM installations over the past 365 days, limited to machines which have sent 8 pings or more:
SELECT count(ping_v1.id) AS count, ping_v1.country
  FROM ping_v1
  JOIN ping_configuration_v1 ON ping_v1.config_id = ping_configuration_v1.id
  WHERE ping_configuration_v1.image LIKE 'eosoem-%'
    AND ping_v1.count >= 8
    AND ping_v1.created_at >= NOW() - INTERVAL '365 DAYS'
  GROUP BY ping_v1.country
  ORDER BY count DESC
  LIMIT 10;
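
For example, wrapping the first of these in EXPLAIN (ANALYZE, BUFFERS) gives both the plan and the actual timings:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(id) AS count, country
  FROM activation_v1
  WHERE image LIKE 'eosoem-%'
    AND created_at >= NOW() - INTERVAL '365 DAYS'
  GROUP BY country
  ORDER BY count DESC
  LIMIT 10;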

Figure out storing very big integers in PostgreSQL

Some of the Endless metrics events have values that are unsigned 64-bit integers.

The biggest integer type PostgreSQL can store, bigint, is a signed 64-bit integer, which is too small to hold those.

Off the top of my head, we have a few possibilities, which need to be investigated:

  1. pretend they are signed 64-bit integers: we could just store them as is; the value in the DB would be wrong, but it could be cast when stored/retrieved (see the sketch after this list); that would make queries harder?

  2. store them as binary blobs (bytea): we could store the binary representation of the numbers (e.g. 10 would be stored as 1010); that would make queries harder?

  3. store them as strings: we could store the string representation of the numbers (e.g. 10 would be stored as "10"); queries wouldn't be too hard, it would just be frustrating to always have to remember the quotes;

  4. store them as numeric(20, 0): this type allows numbers of arbitrary size (up to 1000 digits!) with exact precision (as opposed to floats); however, calculations with such numbers are very slow compared to integers.

There might be other possibilities…
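
As a sketch of the first option, the range shift could be hidden behind a SQLAlchemy TypeDecorator so that Python code only ever sees the unsigned value (illustrative, untested):

from sqlalchemy.types import BigInteger, TypeDecorator

class UnsignedBigInt(TypeDecorator):
    impl = BigInteger
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # Map [0, 2**64) onto [-2**63, 2**63): the stored value is "wrong"
        # but round-trips exactly
        if value is None:
            return None
        return value - 2**64 if value >= 2**63 else value

    def process_result_value(self, value, dialect):
        if value is None:
            return None
        return value + 2**64 if value < 0 else value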

Support Redis SSL

Currently, if you want to connect to an SSL-enabled Redis server, you need to use a TLS proxy. It should be possible to do this natively, since redis-py appears to support it with an ssl keyword argument.
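
A sketch of what that could look like with redis-py (the host and certificate path are illustrative):

import redis

r = redis.Redis(
    host='redis.example.com',
    port=6380,
    ssl=True,
    ssl_cert_reqs='required',
    ssl_ca_certs='/etc/ssl/certs/ca-certificates.crt',
)
r.ping()  # raises if the TLS connection cannot be established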

Better handle unknown configuration options

The config system is nice in that it validates the configuration, and specifically the types of the options, hopefully giving administrators good error messages about what failed.

But try it with the following configuration:

[main]
foo = "bar"

And you get a completely mystifying error:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mathieu/Projects/endless/azafea/azafea/__main__.py", line 26, in <module>
    sys.exit(args.subcommand(args))
  File "/home/mathieu/Projects/endless/azafea/azafea/cli.py", line 120, in do_print_config
    config = Config.from_file(args.config)
  File "/home/mathieu/Projects/endless/azafea/azafea/config/__init__.py", line 161, in from_file
    main = Main(**overrides.get('main', {}))
TypeError: __init__() got an unexpected keyword argument 'foo'

We should be able to do better.

This can happen with a simple typo, for example:

[postgresql]
pasword = "super secret"

Such a typo would be hard to spot for someone stressfully trying to deploy Azafea against a deadline, and we can't just ignore the unknown option, because then the default password would be used.

So we should still abort completely in such cases (as we currently do), but with a proper error message.
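
A sketch of what that could look like, again assuming the config classes are dataclasses (InvalidConfigurationError is illustrative); difflib can even suggest the intended option:

import dataclasses
import difflib

def check_unknown_options(cls, section, options):
    known = {f.name for f in dataclasses.fields(cls)}

    for name in sorted(options.keys() - known):
        close = difflib.get_close_matches(name, known, n=1)
        hint = f' (did you mean {close[0]!r}?)' if close else ''
        raise InvalidConfigurationError(
            f'Unknown option {name!r} in section [{section}]{hint}')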
