lakesuperior's People

Contributors

acoburn, mbklein, scossu, whikloj, ysuarez

lakesuperior's Issues

Errors with LDP-NR resources

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: 1.0.0-alpha12 from pypi

Steps to reproduce

Given a file in file.txt with the content:

this is a file.

And this request:

curl localhost:8000/ldp/ -XPOST -H"Content-Type: text/plain" -i --data-binary @file.txt

Then issue a GET request on the newly created resource.

curl -i <the resource>

Observed behavior

HTTP/1.1 500 INTERNAL SERVER ERROR
Server: gunicorn/19.7.1
Date: Mon, 09 Apr 2018 19:28:41 GMT
Connection: keep-alive
Content-Type: text/html
Content-Length: 291

Expected behavior

The file contents. Or at least logs describing the error (there was nothing of interest in the error logs or in the console of the running application).

Other notes worth mentioning

Requests to the metadata (e.g. /fcr:metadata) respond correctly with RDF.

It is also worth noting that the binary is present on the filesystem in ./data/ldpnr_store

Support PATCH on the root resource

Environment

Operating system: OS X

Python version: 3.6.4

LAKEsuperior release, branch, or commit #: 1.0.0-alpha12 from pypi

Steps to reproduce

  1. Start lakesuperior
  2. PATCH to the root container:

curl -XPATCH -i localhost:8000/ldp/ -H"Content-Type: application/sparql-update" --data-binary @sparql.txt

sparql.txt contains the following update (the same SPARQL update applied to a non-root resource succeeds):

INSERT {
    <> <http://purl.org/dc/terms/subject> [
       a <http://example.org/Subject> ;
       <http://www.w3.org/2000/01/rdf-schema#label> "A subject" ]
} WHERE {}

Observed behavior

Response:

HTTP/1.1 405 METHOD NOT ALLOWED
Server: gunicorn/19.7.1
Date: Mon, 09 Apr 2018 18:51:11 GMT
Connection: keep-alive
Content-Type: text/html
Allow: HEAD, POST, GET, OPTIONS
Content-Length: 178

Expected behavior

A successful response (e.g. 200 or 204)

Other notes worth mentioning

I notice that PATCH is not included in the Allow header, yet the root resource also advertises the Accept-Patch header, so the response is inconsistent.

Output reports for CLI operations

Key Requirements

  • Direct relevant output to a specific report file for CLI operations that expect it, e.g. migration or integrity checks

Implementation

  • Create configuration option to specify an output file
  • If no output file is specified, a file with a default prefix and a unique serial number is used
  • Direct specific output to this file (e.g. by using a custom log level and handler)
  • Use a machine-parsable format for the output
  • Make this feature available for CLI operations expecting some sort of reporting: check_fixity (not yet implemented), check_refint, migrate.
    • N.B. stats may not need a report file since its output is predictably contained and may be more likely used for piping console output directly
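
As an illustration of the "custom log level and handler" idea, here is a minimal sketch using only the Python standard library. The level number 25, the name REPORT, the `lsup_report_` prefix and the JSON-lines layout are all assumptions made for this example, not existing LAKEsuperior settings. The default file name gets a random unique suffix from `tempfile` rather than a true serial number.

```python
import json
import logging
import os
import tempfile
import time

# Hypothetical custom level, placed between INFO (20) and WARNING (30).
REPORT = 25
logging.addLevelName(REPORT, 'REPORT')


class ReportOnlyFilter(logging.Filter):
    """Let through only records emitted at the REPORT level."""

    def filter(self, record):
        return record.levelno == REPORT


class JsonLineFormatter(logging.Formatter):
    """Serialize each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            'time': time.strftime(
                '%Y-%m-%dT%H:%M:%SZ', time.gmtime(record.created)),
            'logger': record.name,
            'message': record.getMessage(),
        })


def get_report_logger(operation, report_path=None):
    """Return (logger, path) for an operation such as 'check_refint'.

    If no path is given, a uniquely named file with a default prefix
    is created in the system temporary directory.
    """
    if report_path is None:
        fd, report_path = tempfile.mkstemp(
            prefix='lsup_report_' + operation + '_', suffix='.jsonl')
        os.close(fd)
    handler = logging.FileHandler(report_path)
    handler.addFilter(ReportOnlyFilter())
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger('report.' + operation)
    logger.setLevel(REPORT)
    logger.addHandler(handler)
    return logger, report_path
```

A CLI command would then call `logger.log(REPORT, 'dangling reference: %s', uri)` for report lines, while ordinary warnings and errors emitted on the same logger stay out of the report file.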

Complete header handling

Alignment with Fedora4 is spotty in regard to header support. The following headers need to be addressed:

  • Accept
  • If-Match (BONUS)
  • If-None-Match (#75)
  • If-Modified-Since (BONUS)
  • If-Unmodified-Since
  • Range
  • Limit (will not be implemented; use Prefer headers to either show all children or none)
  • (PUT + POST) Digest

Migration tool

Implement a tool that can:

  • Move a Fedora 4 repository (either LAKEsuperior or not) to another one via HTTP
  • Optionally:
    • Copy all binaries
    • Create zero-byte placeholder files

Alpha 8 TODO

  • Slice keys
  • Complete messaging
  • Term query API
  • Better management of versions
  • Complete revert to version and resurrect
  • Write tests for LDP containers
  • Migration tool
  • Complete header handling
    • Accept
    • If-None-Match
    • If-Modified-Since
    • Range
    • Limit
    • (PUT) Digest
    • Others

Complete messaging

Implement messaging using ActivityStreams, which is halfway there.

Delta messaging optional for now if not too much of a hassle.

Occasional MDB_BAD_RSLOT error

Environment

Operating system: RHEL 6

Python version: 3.5.1

LAKEsuperior release, branch, or commit #: 1.0.0a16

Steps to reproduce

Perform a large batch of SPARQL queries on the server. The repository is quite large (~170M triples), but the queries are not very demanding (<1s response time).

Observed behavior

Occasionally the following error is raised:

2018-05-07 17:33:03,437 ERROR flask.app - Exception on /query/sparql [POST]
Traceback (most recent call last):
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/lakesuperior/endpoints/query.py", line 86, in sparql
    out_stream = query_api.sparql_query(qstr, fmt)
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/lakesuperior/api/query.py", line 128, in sparql_query
    with TxnManager(rdf_store) as txn:
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/lakesuperior/store/ldp_rs/lmdb_store.py", line 62, in __enter__
    self.store.begin(write=self.write)
  File "/data/local/lake/lsup_env/lib64/python3.5/site-packages/lakesuperior/store/ldp_rs/lmdb_store.py", line 341, in begin
    self.data_txn = self.data_env.begin(buffers=True, write=write)
lmdb.BadRslotError: mdb_txn_begin: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot

Expected behavior

No error should be raised.

Other notes worth mentioning

This exception comes from the LMDB store and seems to be related to thread handling. See https://www.openldap.org/lists/openldap-devel/201409/msg00001.html

Incorrect content-type in responses

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: 1.0.0-alpha12 from pypi

Steps to reproduce

curl -i <any resource>

Observed behavior

The response is Turtle, but the Content-Type header is "text/html"

Expected behavior

The Content-Type header should be "text/turtle"

Enable full provenance tracking

The underlying LAKEsuperior data model allows for fairly rich provenance tracking. More fine-grained information, such as per-statement provenance with the added and removed triples needed to build a full log-like provenance trail, should be delegated to a specific subsystem such as the messenger.

Some work has already been done to support full delta logging in the messenger but is not complete. This ticket is to complete that work and write tests for it.

Alpha 10 TODO

  • Full provenance tracking
  • AuthN/Z
    • Authentication
    • WebAC
  • Blank nodes

Term Query API + UI

Provide access to the triples method of the underlying storage to enable high-performance simple term queries.

Expose via Python API and UI.

UI should have some user-friendly facility to combine multiple term queries via AND.

Correction to instructions README.md

Line 7: Run ./lsup_admin bootstrap to initialize the binary and graph stores

should read:

Run ./lsup-admin bootstrap to initialize the binary and graph stores

Blank node support

Reportedly, some operations on blank nodes are already handled by the underlying RDFLib implementation.

This ticket is to assess how complete this support is and to write tests to verify compliance.

Revisit direct and indirect containment logic

Currently, direct containment is determined by the presence of ldp:membershipResource and ldp:hasMemberRelation predicates, and indirect containment by the additional ldp:insertedContentRelation predicate. In these cases, the ldp:DirectContainer or ldp:IndirectContainer RDF types are inferred.

The logic should work the other way around: the RDF types determine the type of container and must be explicitly stated, and the membership predicates are given a default object if one is not provided.

Also include support for inverse relationship predicate ldp:isMemberOfRelation.

Whether to allow changes to membership triples in an LDP-DC or LDP-IC is TBD, especially in regard to how to handle already established relationships.

See https://fedora.info/2018/11/22/spec/#ldpdc and https://fedora.info/2018/11/22/spec/#ldpic

Improve version management

Currently versions are quite wasteful because they back up a lot of server-managed properties.

Explore the possibility of only versioning parts of a resource that are meaningful.

As part of this ticket, the performance of fcr:versions, which is currently very slow even in 10K-resource repositories, should also be improved.

Error raised on homepage when installing from wheel

Environment

Operating system: N/A

Python version: N/A

LAKEsuperior release, branch, or commit #: 1.0.0a13 and later

Steps to reproduce

Observed behavior

500 Internal Server Error

Expected behavior

Home page should be displayed.

Root cause

The VERSION file is not included in the wheel distribution. The home page makes use of this to display the revision version. The application raises an error because it can't find the file containing the version number.

Notes

It's worth finding a better place for the version number (e.g. a module variable).

Add support for hash URIs as membership resources

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: alpha 14

Steps to reproduce

  1. Create a membership resource.
  2. Create a direct container with the ldp:membershipResource pointing to a hash URI on that membership resource.

For example: given a membership resource of http://localhost:8000/ldp/resource the DC would include the triple: <> ldp:membershipResource <http://localhost:8000/ldp/resource#members>.

Observed behavior

When adding child resources to the DC, there are no membership triples generated for the member resource.

Expected behavior

The member resource would contain triples with the DC child resources.

Other notes worth mentioning

You may want to take a look at some of the "bug tracker" examples in the LDP primer: https://www.w3.org/TR/ldp-primer/
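
One way to read the expected behavior: the fragment is stripped to find the resource that stores the membership triples, while the full hash URI stays the subject of those triples. A minimal sketch of that split, assuming this interpretation (the function name is hypothetical, not part of LAKEsuperior):

```python
from urllib.parse import urldefrag


def split_membership_resource(uri):
    """Return (storing_resource_uri, membership_subject_uri).

    The first element is the URI without its fragment, i.e. the LDP
    resource in which the triples would be stored; the second is the
    original URI, used as the subject of the membership triples.
    """
    resource_uri, _fragment = urldefrag(uri)
    return resource_uri, uri
```

For the example above, `http://localhost:8000/ldp/resource#members` would resolve to the resource `http://localhost:8000/ldp/resource`, with the hash URI kept as the triple subject.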

Calculate LDP-RS checksum

Calculate the checksum of each individual LDP-RS, store it and expose it on the LDP API.

The checksum should be used both for the Digest header (base64-encoded, as per RFC 3230) and for the ETag header.
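
A rough sketch of how one digest could feed both headers, using only the standard library. It assumes the graph is available as N-Triples lines and that sorting those lines is an acceptable canonical form, which does not hold for graphs with blank nodes. SHA-256 and the `SHA-256=` algorithm token are assumptions of this example, as are the function names.

```python
import base64
import hashlib


def rdf_checksum(ntriples_lines):
    """Return a SHA-256 digest over sorted, de-duplicated N-Triples lines."""
    lines = {line.strip() for line in ntriples_lines if line.strip()}
    canonical = '\n'.join(sorted(lines))
    return hashlib.sha256(canonical.encode('utf-8')).digest()


def digest_header(raw_digest):
    """Format the digest for a Digest header (base64-encoded value)."""
    return 'SHA-256=' + base64.b64encode(raw_digest).decode('ascii')


def etag_header(raw_digest):
    """Format the same digest as a quoted, hex-encoded ETag value."""
    return '"' + raw_digest.hex() + '"'
```

Because the lines are sorted before hashing, two serializations of the same set of triples in a different order yield the same checksum.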

If-None-Match header support (R+W)

Support If-None-Match HTTP header for GET/HEAD and PUT, as per RFC 7232, section 3.2.

With GET requests, the server will return a 304 Not Modified if the resource ETag corresponds to any of the client-provided ETags.

With PUT requests, the server will return a 412 Precondition Failed and the resource will not be updated if a match is found. This can be used in combination with the special * value to prevent an update of a location that contains an existing resource.
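
The two cases above could be evaluated along these lines. This is a simplified sketch, not the LAKEsuperior implementation: the function names are hypothetical, ETags are compared weakly (the `W/` prefix is ignored), and splitting on commas does not handle ETags that themselves contain a comma.

```python
def _opaque(tag):
    """Strip the weak indicator and the surrounding quotes from an ETag."""
    tag = tag.strip()
    if tag.startswith('W/'):
        tag = tag[2:]
    return tag.strip('"')


def check_if_none_match(method, header_value, current_etag):
    """Return a status code to short-circuit with, or None to proceed.

    ``current_etag`` is None when no resource exists at the target URI.
    """
    if header_value is None or current_etag is None:
        return None
    candidates = {_opaque(t) for t in header_value.split(',') if t.strip()}
    if '*' in candidates or _opaque(current_etag) in candidates:
        # A match: reads answer 304, writes are refused with 412.
        return 304 if method.upper() in ('GET', 'HEAD') else 412
    return None
```

With `If-None-Match: *`, a PUT to an existing resource returns 412, while a PUT to an empty location returns None and proceeds as a create.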

Add support for ldp:MemberSubject

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: alpha 14

Steps to reproduce

Create an Indirect Container with the triple:

<> ldp:insertedContentRelation ldp:MemberSubject .

Now add child resources to the indirect container via PUT or POST.

Observed behavior

500 Error Response

Expected behavior

200 Response. But more importantly, the membership resource should then be populated with member triples exactly as if the indirect container were a direct container.

Other notes worth mentioning

You may want to read up on the use of ldp:MemberSubject in the LDP spec.
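
For reference, a small sketch of the rule as I understand it from the LDP spec: with ldp:MemberSubject the member is the created child itself. Otherwise the member is the object of the child's triple whose predicate is the inserted content relation. Triples are modeled as plain tuples of strings here, and the function name is hypothetical.

```python
LDP_MEMBER_SUBJECT = 'http://www.w3.org/ns/ldp#MemberSubject'


def member_for_child(child_uri, child_triples, inserted_content_relation):
    """Return the URI to use as the member in the membership triple.

    ``child_triples`` is an iterable of (subject, predicate, object)
    string tuples describing the newly created child resource.
    """
    if inserted_content_relation == LDP_MEMBER_SUBJECT:
        # Behave like a direct container: the child itself is the member.
        return child_uri
    for subj, pred, obj in child_triples:
        if subj == child_uri and pred == inserted_content_relation:
            return obj
    return None  # No member can be derived from the child content.
```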

Reformat docstrings

Docstrings have been written in a format that does not lend itself to generating API docs automatically.

All docstrings should be reformatted to allow processing with Sphinx.
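
For illustration, a docstring in the reStructuredText field-list style that Sphinx autodoc can process. The function and its parameters are made up for this example and do not correspond to an actual LAKEsuperior API.

```python
def get_resource(uid, include_inbound=False):
    """
    Retrieve a resource by its identifier.

    :param str uid: Identifier of the resource, relative to the
        repository root.
    :param bool include_inbound: Whether to also collect triples that
        point to the resource.

    :rtype: dict
    :return: A mapping holding the ``uid`` and the ``include_inbound``
        flag that were passed in.
    """
    return {'uid': uid, 'include_inbound': include_inbound}
```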

Link header with anchor parameter

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: 1.0.0alpha12 from pypi

Steps to reproduce

Create an LDP-NR as with #47

Observed behavior

The response includes a response header such as:

Link: <http://localhost:8000/ldp/ee7d7990-24d5-41d3-bc3b-0a59cfd2b739/fcr:metadata>; rel="describedby"; anchor="<http://localhost:8000/ldp/ee7d7990-24d5-41d3-bc3b-0a59cfd2b739>"

Expected behavior

I would not expect to see the anchor parameter to be enclosed in <> characters.

That is, I would expect:

Link: <http://localhost:8000/ldp/ee7d7990-24d5-41d3-bc3b-0a59cfd2b739/fcr:metadata>; rel="describedby"; anchor="http://localhost:8000/ldp/ee7d7990-24d5-41d3-bc3b-0a59cfd2b739"

Other notes worth mentioning

https://tools.ietf.org/html/rfc8288 -- this is the specification for web linking (which defines the anchor param)
The ABNF for the anchor param is "URI Reference", as defined by https://tools.ietf.org/html/rfc3986#section-4.1
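
A minimal sketch of building the header value in the expected form, with the target URI in angle brackets and the anchor as a bare quoted URI reference. The helper name is hypothetical.

```python
def describedby_link(metadata_uri, anchor_uri):
    """Build a Link header value pointing to a resource's description.

    The link target goes in angle brackets; the anchor parameter is a
    quoted URI reference without angle brackets.
    """
    return '<{}>; rel="describedby"; anchor="{}"'.format(
        metadata_uri, anchor_uri)
```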

POSTing to root node omits slash

LAKEsuperior release, branch, or commit #: master

Steps to reproduce

POST http://localhost:8000/ldp/

Observed behavior

http://localhost:8000/ldp0383c3b1-62da-4f94-b9f3-f7c3a778340d

Expected behavior

http://localhost:8000/ldp/0383c3b1-62da-4f94-b9f3-f7c3a778340d
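
A possible way to make the join robust regardless of whether the parent URI ends with a slash. This is only a sketch; the helper name is hypothetical and the actual fix may belong elsewhere in the URI handling.

```python
def child_uri(parent_uri, child_id):
    """Join a parent URI and a child identifier with exactly one slash."""
    return parent_uri.rstrip('/') + '/' + child_id.lstrip('/')
```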

Referential integrity checks

Implement an API method and command-line utility for checking referential integrity of a repository.

This is critical to verify that a migration produces a consistent result.

Alpha 9 TODO

  • Separate read-only graph/resource generation from RW graph/res generation
  • Fixity checks

Gunicorn worker reboot does not close the LMDB store

When any of the Gunicorn workers reboots (after 256 requests in the default configuration), LMDB readers are not released, i.e. the store is not closed properly. These readers eventually accumulate and the application quits when the max readers limit (126 by default) is reached.
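
One possible direction is Gunicorn's `worker_exit` server hook, which runs in the worker process as it exits. The sketch below assumes the application keeps a registry of open environments (`open_envs`, a hypothetical name) whose items expose a `close()` method, as LMDB environments do. In a real deployment the registry would live in the application package and the hook in the Gunicorn configuration file would import it; both are shown in one block here for brevity.

```python
# Recycle workers after this many requests (value taken from the report).
max_requests = 256

# Hypothetical registry: the application would append every LMDB
# environment it opens in this worker process.
open_envs = []


def worker_exit(server, worker):
    """Gunicorn server hook: close all registered environments.

    Closing the environments releases the reader slots held by the
    exiting worker, so that they do not accumulate across reboots.
    """
    while open_envs:
        env = open_envs.pop()
        env.close()
```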

Complete revert from version and resurrect

Reverting to a version has very light test coverage and has apparently been broken by previous refactorings.

The "resurrect" feature, which is reverting a tombstone to the latest active version, should also be completed as part of this.

Fixity checks

Implement fixity checks in Python API, REST API and CLI.

AuthN/Z

  • Support basic authentication and sessions
  • Support "on behalf of" mode for Python API
  • Implement WebAC

Support content negotiation

Environment

Operating system: OS X

Python version: 3.6

LAKEsuperior release, branch, or commit #: 1.0.0-alpha12 from pypi

Steps to reproduce

curl -i <any resource> -H"Accept: application/ld+json"

Observed behavior

The response is Turtle

Expected behavior

The response should be JSON-LD

Other notes worth mentioning

LDP requires that servers support JSON-LD serializations (I personally see no reason to support RDF+XML). And if you are already supporting Turtle, N-Triples tends also to be really easy to support.
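
A simplified sketch of picking a serialization from the Accept header, using only the standard library. The list of supported types and the function names are assumptions for this example. Ranges are ordered by q-value only, without the specificity precedence rules of full HTTP content negotiation.

```python
# Assumed set of serializations; the first one is the default.
SUPPORTED = ['text/turtle', 'application/ld+json', 'application/n-triples']


def parse_accept(header_value):
    """Return (media_range, q) pairs sorted by descending q-value."""
    prefs = []
    for item in header_value.split(','):
        parts = [p.strip() for p in item.split(';')]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(header_value, supported=SUPPORTED):
    """Return the media type to serve, or None if nothing is acceptable."""
    if not header_value:
        return supported[0]
    for media_range, q in parse_accept(header_value):
        if q <= 0:
            continue
        if media_range == '*/*':
            return supported[0]
        if media_range.endswith('/*'):
            prefix = media_range[:-1]
            for media_type in supported:
                if media_type.startswith(prefix):
                    return media_type
        elif media_range in supported:
            return media_range
    return None
```

A None result would typically translate into a 406 Not Acceptable response, and the chosen media type would also be the value of the Content-Type response header.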

Cannot add children to direct containers

Environment

Operating system: Various

Python version: N/A

LAKEsuperior release, branch, or commit #: 1.0.0a13

Steps to reproduce

Create a direct container with PUT (currently POST is unavailable due to #56) and attempt to insert child resources.

Observed behavior

(from @acoburn 's post)

Once the direct container is successfully created, I am unable to create child resources in it (via either PUT or POST). This is the response:

"The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

But there is nothing in any of the logs, other than these lines in the access log:

127.0.0.1 - - [13/Apr/2018:11:24:50 -0400] "POST /ldp/dc4 HTTP/1.1" 500 291 "-" "curl/7.54.0"
127.0.0.1 - - [13/Apr/2018:11:35:11 -0400] "PUT /ldp/dc4/child1 HTTP/1.1" 500 291 "-" "curl/7.54.0"

Expected behavior

Resources should be created under the container with the correct relationships.

Other notes worth mentioning

Indirect containers might exhibit similar behavior. Needs testing.

./lsup-admin bootstrap fails

Following the README.md instructions at line 7, running the command:

./lsup-admin bootstrap

fails with the following error:

Reading configuration at /root/lake/lakesuperior/etc.defaults
Traceback (most recent call last):
  File "./lsup-admin", line 7, in <module>
    import lakesuperior.env_setup
  File "/root/lake/lakesuperior/lakesuperior/env_setup.py", line 2, in <module>
    from lakesuperior.globals import AppGlobals
  File "/root/lake/lakesuperior/lakesuperior/globals.py", line 6, in <module>
    from lakesuperior.dictionaries.namespaces import ns_collection as nsc
ImportError: No module named dictionaries.namespaces

LDP-RS creation with POST and Turtle payload results in an LDP-NR

Environment

Operating system: various

Python version: N/A

LAKEsuperior release, branch, or commit #: 1.0.0a13

Observed behavior

(from @acoburn 's email:)

[...] issue that I ran into related to trying to create a direct container via POST (PUT seems to work correctly). This is the file I'm using:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ldp: <http://www.w3.org/ns/ldp#>

<> dcterms:title "Direct Container" ;
    ldp:membershipResource <http://localhost:8000/ldp/member> ;
    ldp:hasMemberRelation dcterms:relation .

(assume the pre-existing presence of /ldp/member)

Whenever I try to create a new resource via POST, the new resource is stored as an LDP-NR. Here are the various commands I've used:

curl -i localhost:8000/ldp -XPOST --data-binary @dc.ttl
curl -i localhost:8000/ldp -XPOST --data-binary @dc.ttl -H"Content-Type: text/turtle"
curl -i localhost:8000/ldp -XPOST --data-binary @dc.ttl -H"Content-Type: text/turtle" -H"Slug: dc"
curl -i localhost:8000/ldp -XPOST --data-binary @dc.ttl -H"Content-Type: text/turtle" -H"Slug: dc" -H"Link: <http://www.w3.org/ns/ldp#DirectContainer>; rel=\"type\""

In every case, the new resource is an LDP-NR. Though when I use equivalent commands to create a resource via PUT, everything works correctly. This same issue seems to apply to regular RDF source/container resources as well: they are created correctly as an LDP-RS under PUT but not under POST.

Expected behavior

Containers should be created with POST the same way they are created with PUT.

Other notes worth mentioning

This may be the case with indirect containers as well. Needs testing.
