
resolver's People

Contributors

bert-packed, joris-packed, lod3, netsensei, nvgeele, pieterdp


resolver's Issues

Configuration form does not save the Logo URL value

1/ Log into the resolver as an administrator
2/ Go to http://resolver.domain/resolver/settings (settings page)
3/ Add a URL to the logo (e.g. http://vlaamsekunstcollectie.be/images/header3_nl.gif)
4/ Click on save
5/ You will be redirected to the form. In Chrome, the value appears to be saved: it is still filled out in the input field, but no green notification message appears saying that the form was actually saved.
6/ Navigate away from the page (e.g. click on entities)
7/ Go back to settings (click on Settings in the menu)
8/ Observe how the Logo URL field is blank, as if nothing was ever filled out.

=> I'm seeing the same behaviour with the "default notice" field.

If I add something like "test" => the value is saved (a green notification tells me so).
If I delete the input => the "test" value is not removed or overwritten.
If I amend the input to something like "testtest" => the value is updated.

Supervisor installation instructions throw errors

Clean install on a vagrant Ubuntu box. I've created a 'resolver' user and installed the resolver under /home/resolver/resolver

Following https://github.com/PACKED-vzw/resolver/blob/master/INSTALL.md with these settings:

/etc/supervisor/conf.d/resolver.conf:

[program:resolver]
directory = /home/resolver/resolver
command = bash supervisor/start_server.sh
user = resolver
autostart = true
autorestart = true

/home/resolver/resolver/supervisor/start_server.sh:

##
# Configuration settings (EDIT THESE)
##
RESOLVER_USER="resolver"
RESOLVER_NAME="resolver"
PROXY_NAME="127.0.0.1"
PROXY_PORT="8080"
RESOLVER_DIR="/home/resolver/resolver"

This is the error when I try to start supervisor:

resolver: ERROR (abnormal termination)

/var/log/supervisor/resolver-err.log contains:

gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
Traceback (most recent call last):
  File "/usr/local/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 185, in run
    super(Application, self).run()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 71, in run
    Arbiter(self).run()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 196, in run
    self.halt(reason=inst.reason, exit_status=inst.exit_status)
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 292, in halt
    self.stop()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 343, in stop
    time.sleep(0.1)
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 209, in handle_chld
    self.reap_workers()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 459, in reap_workers
    raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>

So far, I could isolate the error to the venv environment not being activated correctly:

if [ ! -d "$c_dir""/""$server_name""/bin" ]; then
    virtualenv "$server_name"
    . "$server_name""/bin/activate"
else
    . "$server_name""/bin/activate"
fi

Not quite sure what's wrong with them. If I try to execute them directly from the shell, I get the same error.

su resolver
cd /home/resolver/resolver
bash supervisor/start_server.sh

Then again, this command doesn't give me any trouble at all (save for a little typo in that file: timout should be timeout!)

su resolver
cd /home/resolver/resolver
bash run_gunicorn.sh

Acceptable data formats are hardcoded in the DB

The acceptable data formats for the Data document type are hardcoded in the model, and thus also in the database (as an enum).

More flexibility is required: the acceptable types should be settable in a configuration file, or the restriction should be removed entirely. It doesn't really matter which type a data document has.
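A minimal sketch of how this could look, assuming a Flask-style app.config and hypothetical option names:

DEFAULT_DATA_FORMATS = {'html', 'json', 'xml', 'pdf'}  # hypothetical defaults

def allowed_data_formats(app):
    # Read the allow-list from configuration instead of a DB enum.
    return set(app.config.get('DATA_FORMATS', DEFAULT_DATA_FORMATS))

def format_is_acceptable(app, fmt):
    allowed = allowed_data_formats(app)
    # An empty allow-list disables the check entirely.
    return not allowed or fmt in allowed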

POST accepts form-data/x-www-form-urlencoded, but PUT only JSON

A POST request to the API only accepts form-data values in the body, but PUT requests require JSON.

Expected: either both of them require JSON, or both require form-data. To avoid breaking backwards compatibility, the proposal is to add form-data support to PUT.
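A minimal sketch of what that could look like in a Flask view (the route name is an assumption): JSON bodies keep working, with form data as the fallback:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/entity/<id>', methods=['PUT'])  # hypothetical route
def update_entity(id):
    # Prefer a JSON body, but fall back to form-encoded data so that
    # PUT accepts the same payloads as POST.
    params = request.get_json(silent=True)
    if params is None:
        params = request.form.to_dict()
    return jsonify(params)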

Import breaks entirely with a SQL foreign key constraint violation

I'm trying to import 4.000 rows of data. The previous version of the resolver completed the import successfully, importing all 4.000 rows.

I just updated the test installation I'm using to v1.5 of the resolver. Importing fails after 49 imported entities with this error:

(IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`resolver`.`document`, CONSTRAINT `document_ibfk_1` FOREIGN KEY (`entity_id`) REFERENCES `entity` (`id`) ON DELETE CASCADE ON UPDATE CASCADE)') 'INSERT INTO document (entity_id, enabled, notes, url, type) VALUES (%s, %s, %s, %s, %s)' ('1923-AE-49+50', 0, '', None, 'data')

I've tried dropping the entire database, creating a new database and initialising it via initialise.py, but the import still fails with the error above.

I will send the offending CSV file by mail. This issue serves as a reference.

(screenshot of the error attached)

Branching and creating formal releases

Currently, there are 4 active git branches:

  • master
  • docker
  • flask-sqlalchemy
  • link-templates

We should take care that development can be tracked in a clear and consistent manner by all stakeholders. This includes: (outside) developers, system administrators, institutions relying on the software,...

This way, we can announce/push upgrades/updates without breaking older installations.

Here are a few proposals:

1/
the 'master' branch contains the latest commits. I propose to use this branch for mainline development (merging pull requests, fixes, etc.)

2/
Feature branches should be merged back in the 'master' branch once development is finished.
Remove remotely pushed feature branches once they have served their purpose to avoid confusion.

e.g. are we able to remove the 'link-templates' branch, or is it still in use?

3/
We should tag and create formal releases once the software hits a milestone (fixed a number of critical bugs, improvements, etc.)

Given that the software is already starting to be used, we should move ahead with the actions in point 3.

What are your thoughts?

Reset password form is not complete

The functionality of the 'reset password' form is not complete. Adding a user requires you to:

  • Confirm your password.
  • Restrict your password to a string of > 7 and < 64 characters.

The latter requirement is especially important. The current reset form allows you to set a password of < 7 characters. But those passwords are - apparently - invalid and won't let you log in (see the validation sketch after the steps below):

How to reproduce:

1/ Create a new user with a password > 7 characters
2/ Reset the user's password
3/ Use a password < 7 characters (e.g. foobar)
4/ Log out
5/ Try to log in => notice that you are redirected to the sign-in form and that you don't get any warning or error messages.
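A minimal validator sketch, assuming WTForms (which Flask-WTF builds on; field names are assumptions), mirroring the rules of the 'add user' form:

from wtforms import Form, PasswordField, validators

class ResetPasswordForm(Form):
    # Enforce the same length rule as the 'add user' form and require
    # the password to be confirmed. (Field names are assumptions.)
    password = PasswordField('New password', [
        validators.Length(min=8, max=64),
        validators.EqualTo('confirm', message='Passwords must match'),
    ])
    confirm = PasswordField('Confirm new password')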

Performing requests to the API (api/login) throws a 400 Bad Request error

When performing a login ($server/api/login) request, the following error is returned:

HTTP/1.1 400 BAD REQUEST
Date: Mon, 14 Sep 2015 13:53:33 GMT
Server: gunicorn/19.1.0
Content-Type: text/html
Content-Length: 148
Connection: close

<title>400 Bad Request</title>

Bad Request

CSRF token missing or incorrect.

Username & password are correct (but changing them does not make a difference). According to the source code, the API is CSRF-exempt, but something still seems to be wrong.
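For reference, this is roughly how a CSRF exemption is declared with Flask-WTF (a sketch, not the resolver's actual code); if /api/login still answers with "CSRF token missing or incorrect", the exemption is probably not reaching this view:

from flask import Flask
from flask_wtf.csrf import CsrfProtect  # renamed CSRFProtect in later Flask-WTF releases

app = Flask(__name__)
csrf = CsrfProtect(app)

@app.route('/api/login', methods=['POST'])
@csrf.exempt  # this view should be skipped by the CSRF check
def api_login():
    return 'ok'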

Amended web server configuration

When you navigate to a page on domain.be (e.g. domain.be/about) you should see a page published on the website of the institution. The collection/* and resolver/* paths should point to the resolver application.

At this point, I'm looking at 2 different approaches:

  1. Permanent redirecting

Webserver A, listening on domain.be, forwards all traffic for those URLs to a completely different domain (e.g. resolver.domain.be) where a webserver B is configured per the resolver documentation.

This can be achieved by adding several 301 redirect rules in the configuration of webserver A.

ex. http://vlaamsekunstcollectie.be/collection/work/id/0000_GRO1270_II

  2. Amended web server configuration

Webserver A listens on domain.be and handles all traffic, but the configuration splits out the requests:

  • All website traffic will be routed to the document root where the CMS is hosted.
  • Resolver URLs will be processed via the proxy configuration and passed on to gunicorn.

In this case, no second domain with 301 redirects is involved. This would be the preferred setup.

It took me a bit to figure out how to configure the latter situation, so as a reference, this is a working nginx configuration:

server {
    listen 80;

    # Make site accessible from http://domain.be/
    # This is a catch-all domain configuration.
    # see: http://nginx.org/en/docs/http/server_names.html#miscellaneous_names
    server_name _;

    location /resolver {
        proxy_pass      http://127.0.0.1:8080;
        proxy_redirect      off;
        proxy_set_header    Host        $host;
        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_read_timeout  300s;
    }

    location /collection {
        proxy_pass      http://127.0.0.1:8080;
        proxy_redirect      off;
        proxy_set_header    Host        $host;
        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_read_timeout  300s;
    }

    location /static {
        proxy_pass      http://127.0.0.1:8080;
        proxy_redirect      off;
        proxy_set_header    Host        $host;
        proxy_set_header    X-Real-IP   $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_read_timeout  300s;
    }
    location / {
        # process all regular requests and pass them on to the CMS
    }
}
  1. This configuration could probably be optimised.
  2. We should add this to the documentation.

Large imports break the import functionality

Description
Trying to import a large CSV file (6600 rows / 1.2 MB) breaks the importer.

Steps to reproduce

  1. Create a large import file via Open Refine
  2. Navigate to the import interface
  3. Import the file in the resolver

Actual behavior

  • The browser ends up on a 502 or 504 error page
  • The file is only partially imported
  • There are no log files allowing me to see whether the error is triggered by a particular row in the CSV or because the process times out.

Expected behaviour

  • A correct import and a redirect to a "success" landing page with a terse report of what happened.

Proposed resolution

Setting the --timeout flag on the gunicorn process to 240 seconds does some good: the CSV file gets imported entirely. However, I still see a 504 timeout message in the browser and no reports or log files.

Investigate other ways to import large CSV files.
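One direction worth investigating, sketched under assumptions (make_document is a hypothetical row-to-model helper): commit in batches so that no single transaction spans the whole file, and ideally move the work out of the request into a background job:

import csv

BATCH_SIZE = 500  # commit in chunks instead of one long transaction

def import_csv(path, session):
    with open(path) as fh:
        for i, row in enumerate(csv.reader(fh), start=1):
            session.add(make_document(row))  # hypothetical row-to-model helper
            if i % BATCH_SIZE == 0:
                session.commit()
    session.commit()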

Bulk delete entries impossible

Version 1.5 introduced a change in importing entities. Instead of clearing the entire database and importing all files, the import entails an incremental update.

Existing records are updated and new records are added. However, there is no way to delete old records in bulk. You can only delete them manually via the interface.

=> Deleting, say, 50 items in a list of 12.500 items becomes a chore, since you have to filter them one by one by PID.
=> There is no way to clear the entire database of all records from the UI (use case: you made a serious mistake in your import and you need to start over).

  • Suggestion 1: Add a "purge" button that allows you to purge the entire database.
  • Suggestion 2: Add a form with a single text area element that allows you to input a list of PIDs - delimited by newlines or breaks - that should be purged from the database (see the sketch below).
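A minimal sketch of suggestion 2, assuming SQLAlchemy and an Entity model keyed by PID:

def purge_pids(session, pid_text):
    # Delete every entity whose PID appears in the newline-delimited
    # text area input; documents follow via ON DELETE CASCADE.
    pids = [p.strip() for p in pid_text.splitlines() if p.strip()]
    deleted = session.query(Entity).filter(Entity.id.in_(pids)) \
                     .delete(synchronize_session=False)
    session.commit()
    return deleted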

Document import script

We should document the usage of the import script.

Steps I had to figure out before I could use this:

  1. Log in as the resolver user (a user dedicated to managing the installation, or one with appropriate permissions)
  2. Make sure you have the import file somewhere on your server (SFTP or otherwise)
  3. Go to the resolver installation location (cd into it)
  4. Activate the venv: `. servername/bin/activate`
  5. Execute the command `python import_csv.py <absolute_path_to_csv_file>`
  6. Wait
  7. You should see a small summary when the import is finished.

Object ID collisions when dealing with data from multiple institutions?

Problem

Our resolver contains persistent URIs pointing to objects coming from multiple institutions. Each institution has its own format/scheme for identifying objects via a unique ID.

The ID is unique within the domain of that institution. However, if subsets originating from different domains use similar identification schemes (e.g. incremental numbering), then a collision between IDs is possible.

Example: Work A in institution Z is identified as 001 while Work B in institution Y is also identified as 001.

How to reproduce

  1. Create a CSV with 2 object entries
  2. Make sure each object has the same object identifier
  3. Import the CSV in the resolver

Observed behaviour
Notice how only 1 entry is created.

Expected behaviour
A correct import with 2 different entries, both containing active persistent URIs (data & representation).

Resolution
Introduce a "domain" or "namespace" property in the datamodel. This could be used to encapsulate subsets using similar identification schemes.

I'm proposing the generic "domain" label instead of "institution" to make this property as flexible as possible. This way, it remains easy to create subsets within an institution that use the same identification scheme (e.g. subcollections that use the same numbering format).

Impact
The persistent URI itself will need to be modified to include the "namespace" or "domain" property.

Settings page still says 1.5 although upgraded to 1.5.1

When you upgrade the resolver from version 1.5 to 1.5.1, the version tag in the "settings" page still says "1.5" instead of "1.5.1". So, it's unclear whether or not an upgrade was successful.

Cause: the VERSION constant in resolver/__init__.py has not been adjusted prior to packaging and releasing.

This is a manual action, so it is easily forgotten. Packaging and releasing is a process that can easily be automated - including updating this constant - avoiding these issues.
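A minimal sketch of what such an automated bump could look like (file layout assumed from the cause above):

import re
import sys

def bump_version(path, new_version):
    # Rewrite the VERSION constant as part of a release script, so the
    # update can't be forgotten.
    src = open(path).read()
    src = re.sub(r'VERSION\s*=\s*["\'][^"\']*["\']',
                 'VERSION = "%s"' % new_version, src)
    open(path, 'w').write(src)

if __name__ == '__main__':
    bump_version('resolver/__init__.py', sys.argv[1])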

Feature request: handling multilingualism?

In recent weeks we have prepared data for ingestion into the VKC resolver. I compiled a list in Open Refine of all works currently published on our thematic websites, based on the sitemap.xml files published on each website. For each object, we publish the HTML detail page as the data URL, and the thumbnail and zoom variant of each item as the representation URLs.

An open question remains how to handle multilingual published data for the same object within the same organisation.

We 'enrich' the Dutch-language descriptions in Collective Access by adding an English translation of the relevant fields. On our thematic websites, that information is used to display both the Dutch and the English description for each object:

e.g.
http://vlaamseprimitieven.vlaamsekunstcollectie.be/nl/collectie/feestmaal
http://vlaamseprimitieven.vlaamsekunstcollectie.be/en/collection/banquet

Since the resolver only lets you link 1 data URL per entity, we had to choose which variant we wanted to publish. We (VKC) chose the EN version, because the published data is then also usable outside our language area.

The question remains whether we could publish the NL variant as well.

So I went looking for an example where this is done:

A pilot project was carried out at the European level in which a resolver service was set up: http://uri.semic.eu/
Datasets from a number of entities within the European Commission were linked in the pilot resolver.

Data from the Publications Office of the EU is an example where they had to deal with publications issued in multiple languages:

http://uri.semic.eu/#PO

Concretely, extra contextual information, such as the language, is appended to the identifier. Examples:

http://publications.europa.eu/resource/oj/JOC_2013_004_R.ENG
http://publications.europa.eu/resource/oj/JOC_2013_004_R.NLD

There is also a reference to a Named Authority List for the usable set of language suffixes:

http://publications.europa.eu/mdr/authority/

More background information in this presentation: www.slideshare.net/stijngoedertier/towards-a-persistent-uri-service-for-eu-institutions-a-proofofconcpet

I was wondering: can we do something similar in our resolver service?

Technically, nothing stands in the way of creating such PURIs via the resolver today:

e.g.
http://vlaamsekunstcollectie.be/collection/work/id/0000_GRO0023_I.ENG
http://vlaamsekunstcollectie.be/collection/work/id/0000_GRO0023_I.NLD

(URIs not yet active!)

A few considerations, though:

=> Adding the contextual information should happen in a structured way.
=> Should we likewise use some kind of Named Authority List to fix the language suffix?
=> PURIs must eventually flow back into the collection management systems: the PURI fields in the object descriptions there should also support language-sensitive PURIs.

What is your view on publishing PURIs for multilingual descriptions of objects?

I also pulled up the final PIDS report: on page 22, an example is given of normalised data where the query "La table du jardin" returned multiple results in different languages from the demonstrator. Those results are, however, all published by different organisations (MSK, Lukas & CVG). The report makes no further mention of language-sensitive PURIs.

Kind regards,
Matthias Vandermaesen

Display version on administration pages

Currently, the UI doesn't tell you which version of the resolver is installed.

For update/upgrade purposes, it would be useful to display the installed version of the resolver on the administration pages and/or the statistics page.

If anything, let's not display the version on any public-facing page (login page, etc.) for security/safety reasons (never disclose the version number of your software package!)

Importing disabled URLs is impossible

Problem

I'm trying to import an object with 3 persistent URIs: 1 data URL + 2 representation URLs. The representation URLs should not be enabled in the resolver. When a visitor tries to access them, they should see the note as defined in the "notes" column of the import file.

However, when trying to generate a "disabled" persistent URI via the import file, the resolver still allows access to the referenced resource.

How to reproduce

  1. Create a new import file
  2. Add 1 object with 1 representation URL (jpg) that should be referenced.
  3. Set the "enabled" column to "0" for the resource URL
  4. Set the "notes" column to a custom note for the affected row
  5. Import the file in a resolver instance
  6. Navigate to http://yourresolver/collection/work/representation/OBJECT_ID

Actual result

You are redirected to the resource by the resolver.

Expected result

You should see a page served by the resolver with the message as defined in the "notes" column of the import file.
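The expected behaviour, sketched as a Flask view (function and template names are assumptions):

from flask import redirect, render_template

def resolve_representation(document):
    # Only redirect when the document is enabled; otherwise show the
    # note from the "notes" column of the import file.
    if document.enabled:
        return redirect(document.url)
    return render_template('notice.html', note=document.notes)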

Downloadable log after import

Problem:

I'm importing a file with 5046 rows (1682 objects). The import is completed successfully but the stats tell me that there are only 1666 objects in the resolver.

There is no way to tell what happened to the 24 missing objects.

How to reproduce:

I've checked the import file and discovered that there are 24 rows that have duplicate PIDs. The PID should be unique; however, sometimes an object can be published on multiple websites:

ex.
PID: 100:
http://vlaamseprimitieven.vlaamsekunstcollectie.be/en/collection/the-story-of-saint-didacus-of-alcala-0
http://barokinvlaanderen.vlaamsekunstcollectie.be/en/collection/the-story-of-saint-didacus-of-alcala-9

However, it's possible that this is not the only reason why an object wasn't imported.

Actual result:

On the surface, the importer fails to import all the rows in the file and fails to report why.

Expected result:

If you're importing a limited number of items, it's easy to spot duplicates. If you're importing 1682 objects, the resolver should give you a report of duplicate rows which were not imported.

Everyone can change each other's password

Referring to #43:

The password reset field allows you to change the password of a user. However, the form does not check if the user is changing their own password, or someone else's password. As a regular user, it is even possible to change the password of the "admin" account, locking the admin out of the system.

The password reset form should have an extra field: current password. Users have to enter the current password besides the new passwords of the account they want to update. This prevents them from changing other people's passwords.

Exception: the admin user does not see / does not have to fill out the "current password" field => they can change the passwords of all accounts.

Once #43 lands, this should be the next thing to tackle re: the reset password form.
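A minimal sketch of the proposed check, assuming WTForms and hypothetical model/helper names:

from wtforms import Form, PasswordField, validators

class ChangePasswordForm(Form):
    current_password = PasswordField('Current password')
    password = PasswordField('New password',
                             [validators.Length(min=8, max=64)])

def may_change_password(actor, target, form, check_password):
    # Admins may change any password; everyone else must be changing
    # their own account and prove they know its current password.
    # (Model attributes and check_password are assumptions.)
    if actor.is_admin:
        return True
    return actor.id == target.id and \
        check_password(target, form.current_password.data)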

Large dataset causes sluggish overview UI

Description

The overview page takes some time to load when a large set (e.g. 2200 objects) has been imported.

Steps to reproduce

  1. Create a large import file via Open Refine
  2. Navigate to the import interface
  3. Import the file in the resolver

Actual behavior

The page loads slowly. You have to wait before the pagination becomes active/clickable.

Expected behaviour

A usable interface.

Proposed resolution

TBD

Format column empty results in opaque error message

Problem:

If you try to import a CSV containing a record with an empty "format" property (null value), the resolver breaks and returns an opaque "Something went terribly wrong!" error message instead of a verbose message telling me what went wrong and why.

This makes it impossible for a data publisher to debug the CSV import file and correct the offending rows.

I could debug this by logging into the server via SSH, opening up application.log and finding this error:

[ERROR] 2015-08-04 15:38:31,853 -- Incorrect data format
Traceback (most recent call last):
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/resolver/resolver/resolver/controllers/user.py", line 18, in inner
    return func(*args, **kwargs)
  File "/home/resolver/resolver/resolver/controllers/csv.py", line 95, in admin_csv_import
    notes=record[6])
  File "<string>", line 4, in __init__
  File "/home/resolver/resolver/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 260, in _initialize_instance
    return manager.original_init(*mixed[1:], **kwargs)
  File "/home/resolver/resolver/resolver/model/data.py", line 27, in __init__
    raise Exception("Incorrect data format")
Exception: Incorrect data format

How to reproduce

  1. Create an import file with 1 entry
  2. Leave the format property of the record empty (no html, json, etc.)
  3. Notice how the import process breaks.

Cause:

The Data class in data.py does raise an exception with a message "Incorrect data format" (line 27), but the error is not reflected in the UI.

The exception also breaks the import process, meaning that only a partial import is executed.

Resolution:

  1. Skip/ignore the offending row and let the importer finish the rest of the import instead of breaking altogether
  2. Add rows with empty required fields or offending data types to a (downloadable) error log that is presented to the data publisher via the UI after the import ends (see the sketch below).
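A minimal sketch of both points combined, with hypothetical helper names:

def import_rows(rows, session, log):
    bad_rows = []
    for lineno, row in enumerate(rows, start=1):
        try:
            session.add(make_document(row))  # hypothetical helper; may raise
        except Exception as e:               # e.g. "Incorrect data format"
            # Skip the offending row and record it for the error report.
            bad_rows.append((lineno, row, str(e)))
            log.warning('Skipped row %d: %s', lineno, e)
    session.commit()
    return bad_rows  # feed this into a downloadable report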

I can't remove a representation from the interface

Problem

When an entity has 2 representations, I can't delete the non-reference representation.

How to reproduce

  1. Import an object CSV with 1 data URL & 2 representation URLs
  2. Navigate to the edit form for the newly created entity in the resolver
  3. Click on the non-reference representation URL
  4. Click on the "delete" button

Actual result

  1. I see an error message "Something went terribly wrong"
  2. When I go back to the edit form of the entity, the representation URL was not removed.

Expected result

The representation URL was successfully removed from the system.

Wrong resolving from inside the application to a strange syntax

When you click on created links from inside the Resolver application environment, like the entity overview page:

(screenshot: entity overview page)

or the landing page of the work PID http://domain/collection/work/id/identifier:

(screenshot: work PID landing page)

you are referred not to the previously assigned resource (an HTML page), but to a strange link with a syntax like http://domain/resolver/entity/domain/collection/identifier, and so you get a 'Page not found' as a result.

(screenshot: 'Page not found' error page)

The persistent URIs for Data (http://domain/collection/work/data/identifier) and Representation (http://domain/collection/work/representation/identifier) work perfectly on their own.

Issue when importing datasets

There is an issue when importing datasets (CSV files) when the same records already exist in the database.

The import code appends a new document to an existing entity without checking whether said document already exists.

Expected behaviour:
When a document with the same URL already exists for an entity, the resolver should not append the same document again.
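A sketch of the expected behaviour, assuming SQLAlchemy and the Document model implied by the schema:

def add_document(session, entity, url, doc_type):
    # Only append the document when the entity doesn't already have
    # one with the same URL.
    exists = session.query(Document) \
                    .filter_by(entity_id=entity.id, url=url) \
                    .first()
    if exists is None:
        session.add(Document(entity_id=entity.id, url=url, type=doc_type))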

heroku install instruction failure

Hi, I just followed the Heroku install instructions.
The line 'echo "\npsycopg2" >> requirements.txt' does not work - presumably because bash's echo does not interpret \n by default, so the literal characters \n end up in requirements.txt.
For me, the following did work:
echo "psycopg2==2.5.4" >> requirements.txt

Installation requires changes in code (start_server.sh)

Per the installation instructions:

Edit the start_server.sh script in the supervisor-directory and update the following configuration settings:

This means you have to alter the code of the application in order to configure it for your environment.

Such a setup is pretty much unmaintainable for application managers, more so if they are using automated tools to roll out updates or changes to the code. Chances are that this file will be overwritten by future updates.

Conceivably, a maintainer could exclude this file from receiving (automated) updates. But if a future update contains a code change / bugfix in this file, the maintainer is required to manually edit / integrate that change.

Suggestion: since it's a bash script, use environment variables instead and add a ~/.resolver/config file of sorts. It's up to the application manager to include the configuration in their dotfiles so the start_server.sh script picks it up automatically.
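The same idea seen from the application side, as a sketch (variable names taken from start_server.sh above; the defaults are the example values, not mandated ones): read settings from the environment instead of hardcoding them in the script:

import os

RESOLVER_USER = os.environ.get('RESOLVER_USER', 'resolver')
RESOLVER_NAME = os.environ.get('RESOLVER_NAME', 'resolver')
PROXY_NAME = os.environ.get('PROXY_NAME', '127.0.0.1')
PROXY_PORT = int(os.environ.get('PROXY_PORT', '8080'))
RESOLVER_DIR = os.environ.get('RESOLVER_DIR', '/home/resolver/resolver')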

API: session cookie only?

The API contains several data-altering calls (Create/Update/Delete) over HTTP. These calls need to be secured so that only authenticated / authorized users can access them. We don't want unauthorised users to alter the database.

The current implementation relies solely on session cookies to authenticate a user. This is a security hazard, as it leaves the application wide open to XSRF (Cross-Site Request Forgery) attacks.

See: https://miki.it/blog/2015/8/10/put-io-api-design-issues/

At this point, I could construct an HTML form that performs a "PUT" on the "/api/entity/1234" URL from another website. If I redirect a logged-in user to that website, they would automagically alter PID 1234 as soon as they submit the form.

It's worth considering cranking up security by using something akin to https://gregorynicholas.github.io/flask-xsrf/
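One possible shape for this, sketched with hypothetical names: require a per-user token in a request header for data-altering calls, since a cross-site form post cannot set custom headers:

from functools import wraps
from flask import request, abort

def require_api_token(view):
    @wraps(view)
    def wrapped(*args, **kwargs):
        # A forged cross-site request cannot supply this header.
        token = request.headers.get('X-API-Token')
        if not token or not token_is_valid(token):  # hypothetical check
            abort(403)
        return view(*args, **kwargs)
    return wrapped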

Feature request: Export all persistent URIs from the resolver

Problem

A related data publisher wants to import objects with data & representation URLs which reference resources created by our organisation (website). We don't want their persistent URLs to resolve directly to the website resources.

Instead, their resolver should point to URIs generated by our resolver.

To be able to do this, we need to be able to export a list of the relevant persistent URIs generated by our resolver.

Request

The resolver does not have the ability to export lists of persistent URIs. I would like to have a button which allows me to (a sketch follows the list):

  • Export a CSV list of all the persistent URIs in the system
  • Each record has these columns:
    • PID
    • entity type
    • persistent URI
    • enabled
    • notes
    • reference
    • order
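A minimal sketch of that export, assuming Python's csv module and hypothetical model attributes matching the columns above:

import csv
import sys

COLUMNS = ['PID', 'entity type', 'persistent URI',
           'enabled', 'notes', 'reference', 'order']

def export_puris(documents, fh=sys.stdout):
    writer = csv.writer(fh)
    writer.writerow(COLUMNS)
    for d in documents:
        # Attribute names are assumptions, not the resolver's actual model.
        writer.writerow([d.pid, d.entity_type, d.persistent_uri,
                         d.enabled, d.notes, d.reference, d.order])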

Provide an 'original id' column in the export

The CSV export has a column 'PID' which contains the transformed object IDs, so they can be easily used in the persistent URLs.

The exported data can be used for common household tasks outside the resolver, e.g. matching against a subset of, say, 20 items to check whether they were already added to the resolver.

The problem is that, because of the transformation, the PID does not match the object ID.
Ex. 0000.GRO1341.I vs 0000_GRO1341_I

You need to transform one of the IDs in your queries before you can match records. Most query languages don't support regex replace functionality, so you need to do a manual transform.

A solution would be to store the original IDs and return them in the export, perhaps side by side with the transformed version.

PID generation rules?

I'm trying to understand how the resolver is transforming an identification number into a valid PID. Or better: what rules the resolver is following to perform the transformation.

I decided to look at the resolver code directly, but lacking enough inline documentation, several parts are a bit of a mystery to me.

This function in resolver/util.py is responsible for the transformation:

import re
from unidecode import unidecode  # Python 2 code, per the unicode() call below

_clean_re = re.compile(r'[\t !"#$%&\'()*/<=>?@\[\\\]^`{|}]+')
def cleanID(ID):
    patterns = [
        # Exceptions
        ('- ','-'),(' -','-'),('\)+$',''),('\]+$', ''),('\°+$', ''),
        # Simple replacements
        ('\.','_'),(' ','_'),('\(','_'),('\)','_'),('\[','_'),('\]','_'),
        ('\/','_'),('\?','_'),(',','_'),('&','_'),('\+','_'),('°','_'),
        # Replace 1 or more underscores by a single underscore
        ('_+', '_')]
    partial = reduce(lambda str, t: re.sub(t[0], t[1], str),
                     patterns,
                     ID)
    # For safety, let's give it another scrub.
    result = []
    for word in _clean_re.split(partial):
        result.extend(unidecode(word).split())

    return unicode(''.join(result))

The first part is pretty much clear. The ID string is run through a reduce. On each iteration, a pattern is applied and any matches are replaced - mostly with an underscore (the "exceptions" substitute a hyphen or nothing).

Sidenote: I was wondering why you don't just do this instead of using a reduce. I assume the reduce approach applies sub once for each pattern in patterns, which seems far more expensive than just applying sub twice:

partial = re.sub(r'[.\s()\[\]/?,&+°]+', '_', ID)
partial = re.sub('_+', '_', partial)

The second part is a bit more mysterious:

    # For safety, let's give it another scrub.
    result = []
    for word in _clean_re.split(partial):
        result.extend(unidecode(word).split())

    return unicode(''.join(result))

It seems like the processed partial is first split with the pattern compiled into _clean_re, but that pattern contains characters like the (back)slash which have already been converted in the first part of the function, so why are we trying to match them again here?

Then each word is run through some weird unidecode and split operation, pushed onto a list, joined again, and finally munched by a unicode() call => why? What is the purpose of these 4 lines of code?

Export yields a file named badrecords.csv

I just tried to export a list of persistent URIs from a resolver. I got an export file titled "badrecords.csv". The file does contain all 5100 entries that exist within the resolver, so the content seems to be correct.

Suggestion: an export file should be called "export" and contain a timestamp of the export.

e.g. export_230920151534.csv (exported 23 September 2015 at 15:34)
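Generating such a name is a one-liner; a sketch matching the ddmmyyyyHHMM layout of the example above:

from datetime import datetime

# e.g. 'export_230920151534.csv' for 23 September 2015, 15:34
filename = 'export_%s.csv' % datetime.now().strftime('%d%m%Y%H%M')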

Version 1.5.0-1?

I noticed you just created a new version release with no documentation or release notes.

Per the semver (http://semver.org/) conventions:

A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92.

Is this a pre-release? What is the rationale behind doing a pre-release at this point?

Make 'Add entity' button available on the 'edit' entity form

Flow:

  1. Go to resolver.be/resolver/entity
  2. Click on 'Add entity'
  3. Fill out the 'ID' and 'Title'
  4. Click on 'Add' to add the entity
  5. You will be taken to resolver/entity/OBJECT_ID

Now if you want to add a new entity, you have to:

  1. Click on 'Entities'
  2. Wait for the entire list to load (if you have 5.000 entries, this could take a while)
  3. Pickup the previous flow at step 1.

Now imagine doing this for, say, 15 entities.

Error changing password

Hi Bert,

As promised, here's a quick note to report that I cannot change my password via the Users menu. When I try, I get the following message:

Bad Request: CSRF token missing or incorrect.

Regards,
Guenevere

UnicodeEncodeError during imports

Some kind of issue with the Unicode encoding/decoding during imports:

Error: Entity 1911-F: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 83: ordinal not in range(128)
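The error suggests a unicode value is being encoded implicitly with the ascii codec somewhere in the import path (Python 2 behaviour). A sketch of the difference, with a made-up value:

# -*- coding: utf-8 -*-
value = u'caf\xe8'  # a value with a non-ASCII character, as in the log

encoded = value.encode('utf-8')  # fine: explicit encoding
failing = str(value)             # raises UnicodeEncodeError under Python 2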

Can't remove entities from the entity overview list

How to reproduce:

1/ Log in to the resolver
2/ Go to /resolver/entity (overview of all entities)
3/ Try to click on the 'x' removal button next to an existing entity.
4/ Notice how nothing happens.

Tested in: Chrome & Safari.


Editing a user: form is undefined

When trying to edit an existing user, I get a "form is undefined" error:

1/ Log in to the resolver
2/ Go to /resolver/user
3/ Create a new user called 'MyUser'
4/ Click on the 'MyUser' user name in the list of users.
5/ You will land on a page with the URL 'resolver/user/MyUser'
6/ Notice the server error: "form is undefined".

Critical issue, since this prevents editing existing users, including the default "admin" user.

Typo in run_gunicorn.sh

exec gunicorn -w 4 -b 127.0.0.1:8080 resolver:wsgi_app --timout 900 --graceful-timeout 900

should be:

exec gunicorn -w 4 -b 127.0.0.1:8080 resolver:wsgi_app --timeout 900 --graceful-timeout 900

Otherwise you get a "timout parameter not recognised" error.

Debian package: location of initialise.py?

From the installation document:

Execute the following command to create the default administrator user
python /usr/share/resolver/bin/initialise.py

However, in the source code initialise.py is not in a bin/ folder but in the root folder of the project.

Why is there a difference?
