ckan-service-provider's Issues

SSL handshake error resulting in "Process completed but unable to post to result_url"

Related to ckan/datapusher#82

Whenever CKAN runs with a self-signed TLS certificate, DataPusher processes the data but fails to complete jobs with a TLS handshake error:

2017/11/30 23:22:19 [info] 21780#21780: *278 SSL_do_handshake() failed (SSL: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca:SSL alert number 48) while SSL handshaking, client: 192.168.2.185, server: 0.0.0.0:5000

This happens even if SSL_VERIFY = False is set in datapusher_settings.py.

This is caused by the send_result() method in web.py, which is called at the end of job processing. Since this method is part of ckan-service-provider and not datapusher, it is unaffected by the SSL_VERIFY setting and forces verification anyway.
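
One way the setting could be honored is to pass it through to the HTTP call. The following is only a minimal sketch: post_result, ssl_verify and the injectable post argument are hypothetical names for illustration, and the real send_result() may be structured differently. The only library behavior relied on is the verify parameter of requests.post.

```python
import json


def post_result(result_url, job_dict, api_key=None, ssl_verify=True, post=None):
    """Post a finished job to result_url, honoring the caller's TLS setting.

    post_result, ssl_verify and post are hypothetical names used for
    illustration; they are not taken from ckan-service-provider.
    """
    if post is None:
        # Imported lazily so the sketch can be exercised without requests.
        import requests
        post = requests.post

    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = api_key

    try:
        response = post(
            result_url,
            data=json.dumps(job_dict),
            headers=headers,
            verify=ssl_verify,  # False skips certificate verification
        )
    except Exception:
        # Sketch only: treat any transport or TLS failure as "not delivered".
        return False
    return response.status_code == 200
```

A client such as datapusher could then forward its own SSL_VERIFY value as ssl_verify instead of relying on the default.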

Show stats

Show how many jobs failed and how many succeeded at /status. Suggested by @tobes.
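
As a rough illustration of what such stats could look like, the sketch below counts jobs per status. It assumes each job is a dict with a "status" key whose values include "complete", "error" and "pending"; both the helper name job_stats and that data shape are assumptions, not the project's actual API.

```python
from collections import Counter


def job_stats(jobs):
    """Count jobs per status from an iterable of job dicts.

    The "status" key and the status names are assumptions for this sketch.
    """
    counts = Counter(job.get("status", "unknown") for job in jobs)
    return {
        "complete": counts.get("complete", 0),
        "error": counts.get("error", 0),
        "pending": counts.get("pending", 0),
    }
```

The resulting dict could be merged into whatever /status already returns.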

Release Needed to Fix SQLAlchemy Incompatibility

The version of ckan-service-provider on PyPI requires a specific version of SQLAlchemy, which is causing us a lot of trouble (see issues datacats/ckan-multisite#37, datacats/datacats#334). This was fixed in #30.

If a new release of ckan-service-provider were published on PyPI, our automated tooling could pull down the correct version of SQLAlchemy without weird hackery such as installing the correct version of SQLAlchemy manually or grabbing ckan-service-provider from Git.

Consider switching from APScheduler to celery

I am seriously considering switching to celery before releasing the datapusher, because there are a number of problems with APScheduler. The biggest disadvantage of celery, the setup, does not really apply here because there won't be many set-ups anyway. Also, setting up the service with a separate worker process is not simple either.

The reasons for celery are:

  • More functionality, such as resubmitting
  • Better tool support such as https://github.com/mher/flower for monitoring
  • Better suited for web applications. In APScheduler you really have to think about the scope of your connections and everything. In celery people have already thought about all this.
  • Logging: APScheduler makes it really difficult (see http://stackoverflow.com/questions/15392058/capture-logs-in-apscheduler)
  • More docs and examples
  • Switching probably isn't too much work because we can reuse the routing and everything

@kindly Since you initially chose APScheduler, I'd be interested in your thoughts.

Engine disposal for true concurrency

In SQLAlchemy, the Engine refers to a connection pool.

Typically, "the Engine is intended to normally be a permanent fixture established up-front and maintained throughout the lifespan of an application. It is not intended to be created and disposed on a per-connection basis; it is instead a registry that maintains both a pool of connections as well as configurational information about the database and DBAPI in use, as well as some degree of internal caching of per-database resources."

However, as pointed out in the Engine Disposal section of https://docs.sqlalchemy.org/en/14/core/connections.html:

"When a program uses multiprocessing or fork(), and an Engine object is copied to the child process, Engine.dispose() should be called so that the engine creates brand new database connections local to that fork. Database connections generally do not travel across process boundaries."

This bug was unearthed while working on the datapusher to make it concurrent - i.e. having it use PostgreSQL, using multiple uwsgi workers.

ckan/datapusher#200
ckan/datapusher#198

ckan-service-provider maintains a job database using sqlalchemy. Currently, the SQLALCHEMY_DATABASE_URI defaults to sqlite - which is not meant to be used as a concurrent database, resulting in database locks.

Changing SQLALCHEMY_DATABASE_URI to a postgresql connect string eliminated the database lock issues. However, since database connections do not travel across process boundaries, psycopg2 was giving another error:

(psycopg2.OperationalError) SSL error: decryption failed or bad record mac

I resolved this issue by setting 'lazy-apps = true' in uwsgi. (ckan/datapusher#201 (comment))

Another fix, which would apply not just to datapusher but to other ckan-service-provider clients as well, would be to give each worker/process its own Engine by calling Engine.dispose().

https://virtualandy.wordpress.com/2019/09/04/a-fix-for-operationalerror-psycopg2-operationalerror-ssl-error-decryption-failed-or-bad-record-mac/
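
The per-process Engine idea can be sketched with a small wrapper that remembers which process last used the engine and disposes the inherited pool when the PID changes. This is a sketch under assumptions, not the project's code: PerProcessEngine is a hypothetical name, and the wrapped object only needs a dispose() method, as a SQLAlchemy Engine has.

```python
import os


class PerProcessEngine:
    """Hand out an engine, discarding pooled connections after a fork.

    PerProcessEngine is a hypothetical helper for illustration. The wrapped
    object only needs a dispose() method, as SQLAlchemy's Engine provides.
    """

    def __init__(self, engine):
        self._engine = engine
        self._pid = os.getpid()

    def get(self):
        pid = os.getpid()
        if pid != self._pid:
            # We are in a different process than the one that last used the
            # engine, so drop the inherited pool; new connections are opened
            # lazily in this process.
            self._engine.dispose()
            self._pid = pid
        return self._engine
```

Code that currently uses a module-level engine would call get() before each unit of work, so a forked uwsgi worker never reuses its parent's connections.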

Synchronous jobs via GET

We need a simple endpoint that lets users add a task via a URL. The data should also be returned immediately at this URL. Basically, it should behave like a static resource.

flask-login and Werkzeug versions are incompatible

flask-login 0.5.0 and Werkzeug 2.1.x are not compatible, due to flask-login using a function that was removed in 2.1.0.

In our situation we are using the CKAN Datapusher library in a docker container, which imports the ckanserviceprovider web module. It started failing to run after a recent build, due to this error at startup:

Traceback (most recent call last):
  File "/etc/ckan/datapusher.wsgi", line 2, in <module>
    import ckanserviceprovider.web as web
  File "/usr/lib/ckan/datapusher/lib/python3.7/site-packages/ckanserviceprovider/web.py", line 13, in <module>
    import flask_login as flogin
  File "/usr/lib/ckan/datapusher/lib/python3.7/site-packages/flask_login/__init__.py", line 16, in <module>
    from .login_manager import LoginManager
  File "/usr/lib/ckan/datapusher/lib/python3.7/site-packages/flask_login/login_manager.py", line 24, in <module>
    from .utils import (login_url as make_login_url, _create_identifier,
  File "/usr/lib/ckan/datapusher/lib/python3.7/site-packages/flask_login/utils.py", line 13, in <module>
    from werkzeug.security import safe_str_cmp
ImportError: cannot import name 'safe_str_cmp' from 'werkzeug.security' (/usr/lib/ckan/datapusher/lib/python3.7/site-packages/werkzeug/security.py)

I've explicitly installed the newer version of flask-login to get around this for now, but hopefully this could be improved in this library?

Use sqlalchemy.engine_from_config

@amercader We were able to switch from sqlite to postgres by simply replacing the sqlalchemy_database_uri default sqlite uri with a postgres uri.

However, if I want to pass more engine configuration settings through SQLAlchemy to Postgres, it's not possible.

Similar to CKAN and the datastore, ckan-service-provider should use sqlalchemy.engine_from_config so that users can pass more engine configuration settings through.
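
A minimal sketch of how this might look, assuming the service keeps Flask-style uppercase settings: every SQLALCHEMY_* key is translated into the "sqlalchemy."-prefixed form that engine_from_config reads. The helper names and any setting other than SQLALCHEMY_DATABASE_URI (for example SQLALCHEMY_POOL_SIZE) are hypothetical.

```python
def to_engine_config(settings, prefix="SQLALCHEMY_"):
    """Translate SQLALCHEMY_* settings into "sqlalchemy.*" config keys.

    Hypothetical helper: only SQLALCHEMY_DATABASE_URI comes from the
    existing settings; other key names are assumptions.
    """
    config = {}
    for key, value in settings.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):].lower()
        if name == "database_uri":
            name = "url"  # engine_from_config expects the URL under "url"
        config["sqlalchemy." + name] = value
    return config


def make_engine(settings):
    """Build an Engine from the translated settings."""
    # Imported lazily so the key translation can be tried without SQLAlchemy.
    from sqlalchemy import engine_from_config

    return engine_from_config(to_engine_config(settings), prefix="sqlalchemy.")
```

With this, a setting such as SQLALCHEMY_POOL_SIZE = "10" would reach the engine as pool_size without any further change to the service.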

CartoDB support

CartoDB is increasingly being used to store, access and visualize geospatial data in the cloud. As recorded in this issue, there is a need for CKAN-CartoDB integration, though opinions differ on how far this integration should go.

  • I personally would like to be able to add a resource to CKAN via a CartoDB link and ideally have a preview of the data (using the CartoDB API). We have other resource types than CartoDB, so the default service-provider features should remain.
  • The original feature request by @Bu1G is a step further, with CKAN controlling CartoDB as a datastore.

@acouch has developed CartoDB integration for DKAN, replacing the native datastore with CartoDB, but that code cannot be ported to CKAN as the codebase is different (see comment).

Any idea what needs to be done to tackle this?

Note: I'm still new to CKAN and not sure if this is functionality the ckan-service-provider can provide. If not, to which repository should I add this issue?

BUG: APScheduler version needs to be no greater than 3.9.1.post1

If ckanserviceprovider is installed from PyPI, it pulls in APScheduler v3.10 (released Jan 31, 2023), which prevents it from processing jobs properly:

install_requires = [
    "APScheduler>=2.1.2,<4",
    "Flask>=1.1.1",
    "SQLAlchemy>=1.3.15,<1.4.0",
    "requests>=2.23.0",
    "future",
]

Even if ckanserviceprovider is installed from source, it still doesn't work, because requirements.txt specifies APScheduler 3.9.1, which was yanked:

# Suggested versions
APScheduler==3.9.1
requests==2.27.1
Flask==2.1.1
Flask-Login==0.5.0
future==0.18.2

Proposed solution:

  • set APScheduler>=2.1.2,<3.10.0 in setup.py
  • set APScheduler==3.9.1.post1 in requirements.txt

Version of flask-login is not pinned and update has broken CSP

flask-login is unpinned in setup.py. While it was on 0.2.11 everything was fine, but it has now been upgraded to 0.3.0, in which a method (current_user.is_authenticated()) became a property.

Options for fixes:

  1. Pin flask-login to 0.2.11
  2. Pin to 0.3.0 and change is_authenticated() to is_authenticated.
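
A third possibility, shown here only as a sketch, is to read the flag in a way that tolerates both forms, so the code works before and after 0.3.0. The helper name user_flag is hypothetical.

```python
def user_flag(user, name):
    """Read a Flask-Login user flag whether it is a method or a property.

    user_flag is a hypothetical helper, not part of Flask-Login.
    """
    value = getattr(user, name)
    # Before 0.3.0 the attribute is a bound method; from 0.3.0 on it is a bool.
    return value() if callable(value) else value
```

A call such as user_flag(current_user, "is_authenticated") would replace the direct method call; the same helper covers is_active and is_anonymous.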

Mismatch between requirement versions in setup.py and requirements.txt

setup.py has requirements unpinned:

      install_requires=[
            'APScheduler',
            'Flask',
            'SQLAlchemy',
            'requests',
            'flask-admin',
            'flask-login'
      ],

requirements.txt has (some) requirements pinned:

APScheduler==2.0.3
Flask==0.9
SQLAlchemy==0.7.8
requests==0.14.1
flask-admin
flask-login

I think they should be pinned and only listed in one place. I don't know how I ended up with APScheduler==3.0.0rc1 which breaks things.

Incompatible with required Flask_login version

Since 0.3.0, is_active and other attributes are no longer functions:

see: https://github.com/maxcountryman/flask-login/blob/master/CHANGES#L77

The currently required version of Flask-Login is 0.5.0, which breaks ckan-service-provider:

Exception on /user [GET]
Traceback (most recent call last):
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/lib/ckan/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/lib/ckan/lib/python3.7/site-packages/ckanserviceprovider/web.py", line 327, in user
    'is_active': user.is_active(),
TypeError: 'bool' object is not callable
500 GET /user (172.17.0.1) 1.27ms

See https://github.com/maxcountryman/flask-login/blob/master/CHANGES#L67

setup.py incompatible with CKAN deps

Current CKAN master specifies sqlalchemy 1.1.11, so we get an error when installing this and then CKAN's requirements.txt:

ckanserviceprovider 0.0.7 has requirement SQLAlchemy<1.3.0,>=1.2.7, but you'll have sqlalchemy 1.1.11 which is incompatible.

This has occurred since this PR was merged: #39
