cygri / pubby

A Linked Data frontend for SPARQL endpoints

Home Page: http://www4.wiwiss.fu-berlin.de/pubby/

License: Apache License 2.0

Java 97.92% JavaScript 0.83% CSS 1.26%

pubby's Issues

Missing conf:projectHomepage causes NullPointerException

Reported on Pubby 0.3.3. This should be handled more gracefully, as there certainly can be datasets without a homepage.

Various issues with custom metadata

This is a catch-all issue for Pubby's custom metadata features.

There are four ways how metadata can end up in Pubby-served representations:

1. DataURLServlet and ValuesDataURLServlet each add some hardcoded metadata triples (primaryTopic, label); this ends up in the RDF variants only.
2. conf:rdfDocumentMetadata can be defined on each dataset to add some custom properties that will be asserted about the document in the RDF variants. It supports triples with fixed predicate and object only.
3. conf:metadataTemplate can be defined on each dataset to add a metadata graph based on a flexible template, where various “magic” IRIs in the template are replaced with values provided by the system. The generated triples show up in the RDF representations, and as a separate metadata table on the HTML representations.
4. The generated HTML pages contain some “metadata” that is coded in the header and footer templates: site title, page title, links to RDF variants, link to SPARQL endpoint, link to RDF browsers.

These are quite redundant. Ideally, there would be a single mechanism.

For 2. and 3., the specification of metadata happens on the dataset level. This decision was made because different data sources may have different metadata (provenance, creator, etc.). But there are some issues with this:

In the easiest case, one may want to simply specify metadata on the configuration level, for example a license triple. This is currently not supported. Simple things should be simple, hard things possible.
Pubby should only add metadata for a given dataset if that data source actually contributed to the result. Currently, all metadata from all datasets is always added to the response. Changing this is difficult because the distinction happens deep within some DataSource implementation, and at the point where we deal with metadata (in the servlets) it is no longer easily visible.

There are a number of other issues:

There is an ugly hack where the metadata code tries to get hold of the query that was used to describe the resource. This is not thread-safe and turns DataSource into a leaky abstraction. It is also broken now because we may use multiple queries to assemble a single response. Maybe DataSource needs an additional ProvenanceLog argument on some/all methods?
The only way to make additional metadata show up in the HTML pages is by using number 3 above, or by modifying the templates. I find 3 a bit heavyweight for things like stating a license. Why do the metadata tables look so different?
I find the use case for the metadata templates somewhat unclear. Who needs a detailed trace of the operations that were performed to create the representation? I understand the point of metadata on the document level (publisher, source, etc.), but on the representation level it seems like useless noise. Also, things that “peek under the hood”, like an account of the database queries performed, seem of limited value and potentially a security risk. Is the use case clearly articulated somewhere? What would be a template that most Pubby users would find useful?
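The ProvenanceLog idea mentioned above could look roughly like the following. This is only a sketch under assumptions: ProvenanceLog, LoggingDataSource and EchoDataSource are hypothetical names, not existing Pubby classes. The point is that the log is passed per request, so nothing needs to be stored in shared DataSource state, and several queries can be recorded for a single response.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-request log; not part of Pubby's actual API. */
final class ProvenanceLog {
    private final List<String> queries = new ArrayList<>();

    /** Records one query that contributed to the current response. */
    void recordQuery(String sparql) {
        queries.add(sparql);
    }

    /** Returns the recorded queries in the order they were executed. */
    List<String> getQueries() {
        return Collections.unmodifiableList(queries);
    }
}

/** Hypothetical data source method that receives the log explicitly. */
interface LoggingDataSource {
    String describe(String resourceIRI, ProvenanceLog log);
}

/** Toy implementation that only records the query it would send. */
final class EchoDataSource implements LoggingDataSource {
    @Override
    public String describe(String resourceIRI, ProvenanceLog log) {
        String query = "DESCRIBE <" + resourceIRI + ">";
        log.recordQuery(query);
        return query;
    }
}
```

A servlet would create one ProvenanceLog per request, pass it to every data source call, and read it back when generating metadata.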

Re-design the application's URL space

The current design of the URI space is pretty bad and really should be re-done. Examples of problems:

Can't have resources with /data/ in certain places in the URI.
Formats cannot be indicated through something that looks like a file extension, but has to be done via ?output=xxx and that's just ugly.
303 redirection and content negotiation are done in the same place, rather than 303-redirecting to a generic resource that then responds according to the Accept header.

New URI space structure

A resource hosted by Pubby 303-redirects to /!about/resource-id:

/resource-id

The information resource describing such a resource. First one does content negotiation, the others are format-specific variants:

/!about/resource-id
/!about.ttl/resource-id
/!about.html/resource-id

For everything except the 303-redirecting resource handler, instead of the relative URI resource-id we could have an absolute one, to support browsing resources not hosted by Pubby:

/!about/http://example/resource-id
/!about.ttl/http://example/resource-id
/!about.html/http://example/resource-id

Value pages, showing the values of a property on a certain resource, with possibility to invert the direction to show incoming arcs (/i/). Again, there is a content negotiated resource and then format specific variants.

/!values/ex:prefixed/resource-id
/!values/i/ex:prefixed/resource-id
/!values.ttl/ex:prefixed/resource-id
/!values.ttl/i/ex:prefixed/resource-id

Again, the resource can be identified via full URI:

/!values/ex:prefixed/http://example/resource-id

We need a way of doing these value pages even if no prefix is declared for ex. The challenge is to find a way of indicating where the property URI ends and the resource identifier part starts. This could be done in a number of ways. (1) encoding the length of the URI in the address; (2) %-encoding the entire property URI; (3) using a delimiter such as /// that is unlikely to occur in a property URI.

/!values/33/http://vocab.example.com/ns%23foo/resource-id
/!values/http://vocab.example.com/ns%23foo///resource-id
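Options (1) and (3) can be sketched as follows. ValuesPath and its method names are hypothetical; the input is assumed to be the part of the path after "/!values/", still in its %-encoded form (which is why the length in the example above is 33).

```java
/** Hypothetical helper that splits a value-page path into property IRI and resource id. */
final class ValuesPath {
    final String propertyIRI;
    final String resourceId;

    private ValuesPath(String propertyIRI, String resourceId) {
        this.propertyIRI = propertyIRI;
        this.resourceId = resourceId;
    }

    /** Option (1): "<length>/<property IRI>/<resource id>". */
    static ValuesPath parseLengthPrefixed(String rest) {
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Missing length prefix: " + rest);
        }
        int length = Integer.parseInt(rest.substring(0, slash));
        int start = slash + 1;
        int end = start + length;
        // The property IRI must be followed by a '/' and a non-empty resource id.
        if (length < 0 || end >= rest.length() || rest.charAt(end) != '/') {
            throw new IllegalArgumentException("Length does not match path: " + rest);
        }
        return new ValuesPath(rest.substring(start, end), rest.substring(end + 1));
    }

    /** Option (3): "<property IRI>///<resource id>". */
    static ValuesPath parseDelimited(String rest) {
        int i = rest.indexOf("///");
        if (i < 0) {
            throw new IllegalArgumentException("Missing /// delimiter: " + rest);
        }
        return new ValuesPath(rest.substring(0, i), rest.substring(i + 3));
    }
}
```

The length-prefixed form never misparses, but it is awkward to write by hand; the delimiter form is readable but relies on "///" not occurring in the property IRI.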

Finally, the “home” resource will be 302-redirected to conf:indexResource if that is defined:

Implementation

Do our own URL routing. Let RootServlet handle the entire URI space except for /static/. Then pull all the logic of creating or interpreting URIs from the various places (HypermediaControls, servlets, web.xml) into a single class. There could actually be multiple versions of that class, like a legacy URI router that implements the “old” behaviour.
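A minimal sketch of such a single routing class for the proposed URI space, written without any servlet dependencies. Route and its Kind values are hypothetical names; a real router would also need to build URIs and handle the legacy layout, which this sketch leaves out.

```java
/** Hypothetical result of interpreting a request path in the proposed URI space. */
final class Route {
    enum Kind { RESOURCE, ABOUT, VALUES, STATIC }

    final Kind kind;
    final String format; // e.g. "ttl" or "html"; null means content negotiation
    final String rest;   // resource id, or the remaining path for value pages

    private Route(Kind kind, String format, String rest) {
        this.kind = kind;
        this.format = format;
        this.rest = rest;
    }

    static Route parse(String path) {
        if (path.startsWith("/static/")) {
            return new Route(Kind.STATIC, null, path.substring("/static/".length()));
        }
        if (path.startsWith("/!")) {
            int slash = path.indexOf('/', 2);
            if (slash < 0) {
                throw new IllegalArgumentException("Missing resource part: " + path);
            }
            String handler = path.substring(2, slash); // e.g. "about" or "values.ttl"
            String rest = path.substring(slash + 1);
            String format = null;
            int dot = handler.indexOf('.');
            if (dot >= 0) {
                format = handler.substring(dot + 1);
                handler = handler.substring(0, dot);
            }
            switch (handler) {
                case "about":
                    return new Route(Kind.ABOUT, format, rest);
                case "values":
                    return new Route(Kind.VALUES, format, rest);
                default:
                    throw new IllegalArgumentException("Unknown handler: " + handler);
            }
        }
        // Everything else is a resource that 303-redirects to its /!about/ page.
        return new Route(Kind.RESOURCE, null, path.startsWith("/") ? path.substring(1) : path);
    }
}
```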

A plugin system

This might be premature, but here are some ideas.

The goal of a plugin system would be to do vocabulary-specific stuff in the generated HTML pages. For example, if a page has geo:lat and geo:long properties, a map could be shown. Or if a page is a void:Dataset with a void:sparqlEndpoint property, then a query form could be shown.

A plugin could consist of:

An ASK SPARQL query to be run against the fetched resource description. If the query matches, then the plugin will be active on the page.
One or more additional SPARQL queries that the plugin asks to run. The plugin can choose whether they are run against the fetched resource description (fast) or against the original data source (i.e. endpoint). The queries can include the ?__this__ variable to refer to the current resource. Results are included as JSON in a <script> block in the generated page.
One or more Javascript files to be included via <script> tags. Plugin scripts to be added at the end of the <script> sections of the core Pubby page; order of the resulting <script> tags needs to be maintained.
Zero or more CSS files that are included via <link rel="style"> tags.
A specific Javascript function that is called on page load, and that will modify the page's DOM structure, perhaps using the provided JSON objects.

This would mean no Java code in the plugin, making them much easier to deploy and share. The whole thing could be bundled as a .zip or .jar, with a Turtle file as manifest that describes the whole thing, and dropped somewhere in WEB-INF. At startup, Pubby could check that directory for plugins.

Make Pubby use IRIs

Pubby defaults to %-encoding everything. Toyofumi Fujiwara tried to fix that with the changes below. It didn't work completely, but that was most likely due to issues with the dataset, not due to issues with the Pubby code. The approach looks good to me.

MappedResource.java

  public MappedResource(String relativeWebURI, String datasetURI,
      Configuration config, Dataset dataset) {
    // Store the relative URI in decoded (IRI) form instead of %-encoded form.
    String decodedRelativeWebURI;
    try {
      decodedRelativeWebURI = URLDecoder.decode(relativeWebURI, "utf-8");
    } catch (UnsupportedEncodingException ex) {
      // Cannot happen in practice: every JVM is required to support UTF-8.
      throw new RuntimeException(ex);
    }
    this.relativeWebURI = decodedRelativeWebURI;
    //this.relativeWebURI = relativeWebURI;
    this.datasetURI = datasetURI;
    this.serverConfig = config;
    this.datasetConfig = dataset;
  }

  public String getPageURL() {
    // Re-encode the decoded URI when building the page URL.
    String encodedRelativeWebURI;
    try {
      encodedRelativeWebURI = URLEncoder.encode(relativeWebURI, "utf-8");
    } catch (UnsupportedEncodingException ex) {
      throw new RuntimeException(ex);
    }

    //return serverConfig.getWebApplicationBaseURI() + "page/" + relativeWebURI;
    return serverConfig.getWebApplicationBaseURI() + "page/" + encodedRelativeWebURI;
  }

RemoteSPARQLDataSource.java

  private Model execDescribeQuery(String query) {
    previousDescribeQuery = query;
    // Send the query in decoded form so that IRIs reach the endpoint unescaped.
    //QueryEngineHTTP endpoint = new QueryEngineHTTP(endpointURL, query);
    QueryEngineHTTP endpoint;
    try {
      endpoint = new QueryEngineHTTP(endpointURL, URLDecoder.decode(query, "utf-8"));
    } catch (UnsupportedEncodingException ex) {
      // Cannot happen in practice: every JVM is required to support UTF-8.
      throw new RuntimeException(ex);
    }

    if (defaultGraphName != null) {
      endpoint.setDefaultGraphURIs(Collections.singletonList(defaultGraphName));
    }

    return endpoint.execDescribe();
  }
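One thing worth checking in the approach above: java.net.URLEncoder implements form encoding, which turns spaces into '+' and '/' into %2F, so it may not be the right tool for building a URL path. The standalone sketch below contrasts it with the multi-argument java.net.URI constructor, which keeps '/' and %-encodes only characters that are illegal in a path. PathEncodingDemo is a hypothetical class for illustration, not Pubby code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

/** Hypothetical demo class contrasting form encoding with path encoding. */
final class PathEncodingDemo {

    /** Form-style encoding: ' ' becomes '+', and '/' becomes %2F. */
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Path-style encoding: '/' is kept; non-ASCII characters are %-encoded as UTF-8. */
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("page/a b"));
        System.out.println(pathEncode("/page/a b"));
    }
}
```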

Please add release tags

Hi,
I'd be interested in packaging Pubby for Debian. To do so, it would be very helpful if you added release tags to the GitHub repository. I noticed that you provide versioned releases on your home page. Please tag the corresponding commits here to enable easy downloads.
Thanks, Andreas.

Missing index.html ?

Hi,
I followed the installation instructions and unpackaged the download file into a tomcat8 root (on a Debian Jessie system). When trying to load http://localhost:8080/ I only get

HTTP Status 404 – Not Found
Type Status Report
Description The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
Apache Tomcat/8.5.14 (Debian)

I personally have no experience with Tomcat at all, but all the other apps I've looked at feature some index.html file. Is there any chance that this is simply missing, or what am I doing wrong?
Kind regards, Andreas.
PS: I'm afraid that would be a user question but I have not found any mailing list or forum to ask pubby questions.

Conflicting SLF4J jars

WEB-INF/lib has slf4j-log4j12-1.6.4.jar and slf4j-api-1.5.8.jar causing:

SLF4J: The requested version 1.6 by your slf4j binding is not compatible with [1.5.5, 1.5.6, 1.5.7, 1.5.8]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.

Allow Pubby to generate HTML pages for resources outside of the webBase/datasetBase

Use case

The RDF graph in my store may span multiple domains and may include loaded vocabularies and other third-party resources. Users want to browse the entire graph in an HTML view, and don't want to “fall out” of the Pubby universe when they click on a URI that's outside of our main namespace.

At the same time, the /data space should still reflect what's in the store, so that we can use it to verify how to write SPARQL queries against the data.

Design sketch

On the server configuration level, add a property conf:isProxyFor, where the values are resources that are treated as URI namespaces. Whenever a URI that starts with one of those namespaces is to be shown in an HTML page, the user-visible URI remains as in current Pubby, but it is hyperlinked to a proxy page that describes the original URI.

Let's say for example that conf:isProxyFor has the value http://example.com/. If http://example.com/widget/1234 occurs in the endpoint, it may be shown just like that in the HTML page, or perhaps as ex:widget/1234 if an appropriate namespace mapping was defined. However, when clicked, the hyperlink targets something like http://localhost:8080/proxypage/http://example.com/widget/1234, assuming Pubby runs on localhost:8080. That URL shows an HTML page with the properties of the widget. The HTML page describes itself as being a description of http://example.com/widget/1234

There would be a corresponding /proxydata URL that returns a Turtle description of http://example.com/widget/1234.

This feature would make it possible to display resource descriptions without needing to do all the weird and counter-intuitive base rewriting stuff. There is no longer a need for potentially squeezing multiple namespaces into the single Pubby namespace.

URIs in the endpoint that start with the conf:datasetBase (or conf:webBase if no dataset base is configured) would still be handled directly in the main Pubby namespace. So if conf:datasetBase were http://example.com/, then Pubby would rewrite the widget URI to http://localhost:8080/widget/1234, and that's what would show up in the HTML pages and in /data.

Implementation notes

HypermediaResource may need a boolean method that answers whether the IRI (getAbsoluteIRI()) is proxied by Pubby. Anything in the namespaces listed with conf:isProxyFor is assumed to be proxied by Pubby (that's why Pubby needs to put up a proxy for it). Those resources then need to use the /proxypage and /proxydata stuff in HypermediaResource.getPageURL() and getDataURL().

The HTML display logic (in ResourceDescription, PageURLServlet, and the Velocity templates) would need to hyperlink to the absolute IRI if the target resource is not proxied by Pubby, and display HypermediaResource.getPageURL() otherwise.
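The link-building part of this design can be sketched in isolation. ProxyLinks is a hypothetical stand-in for the relevant bits of HypermediaResource; it assumes the list of namespaces comes from conf:isProxyFor, and it deliberately ignores the conf:datasetBase rewriting described above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; models only the /proxypage and /proxydata link logic. */
final class ProxyLinks {
    private final String webBase;                 // e.g. "http://localhost:8080/"
    private final List<String> proxiedNamespaces; // values of conf:isProxyFor

    ProxyLinks(String webBase, List<String> proxiedNamespaces) {
        this.webBase = webBase;
        this.proxiedNamespaces = proxiedNamespaces;
    }

    /** True if the IRI falls into one of the proxied namespaces. */
    boolean isProxied(String iri) {
        for (String namespace : proxiedNamespaces) {
            if (iri.startsWith(namespace)) {
                return true;
            }
        }
        return false;
    }

    /** Link target for HTML pages: the proxy page if proxied, else the IRI itself. */
    String getPageURL(String iri) {
        return isProxied(iri) ? webBase + "proxypage/" + iri : iri;
    }

    /** Link target for the Turtle description of a proxied resource. */
    String getDataURL(String iri) {
        return isProxied(iri) ? webBase + "proxydata/" + iri : iri;
    }

    public static void main(String[] args) {
        ProxyLinks links = new ProxyLinks("http://localhost:8080/",
                Arrays.asList("http://example.com/"));
        System.out.println(links.getPageURL("http://example.com/widget/1234"));
    }
}
```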

Special characters in URL string return 404 Error

My dataset contains Scandinavian characters such as "æøå". When browsing to a resource containing one of these letters, Pubby returns a 404 Not Found error. I am browsing to --> http://localhost:8080/ontology/page/harAfhængighed but it returns --> http://localhost:8080/ontology/harAfh%C3%A6ngighed with the error "The requested resource does not exist at this server, or no information about it is available."

Is there any workaround?

Show better error screen when remote server responds with an error

Currently it throws an HttpException stack trace.

Allow use of CONSTRUCT instead of DESCRIBE

This could be a configurable choice. It would avoid the problem where some endpoints don't return the “incoming” arcs.
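As an illustration, a CONSTRUCT query that fetches both outgoing and incoming arcs could be built like this. ConstructQueries is a hypothetical helper, and the query shape is only one possible choice; a real implementation would probably want limits and blank node handling.

```java
/** Hypothetical helper that builds a CONSTRUCT query as a DESCRIBE replacement. */
final class ConstructQueries {

    /** Returns a query for all triples with the IRI as subject or as object. */
    static String bothDirections(String iri) {
        // Reject characters that would break out of the <...> IRI syntax.
        if (iri.indexOf('<') >= 0 || iri.indexOf('>') >= 0 || iri.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("Not a valid IRI: " + iri);
        }
        String r = "<" + iri + ">";
        return "CONSTRUCT { " + r + " ?p ?o . ?s ?q " + r + " } "
             + "WHERE { { " + r + " ?p ?o } UNION { ?s ?q " + r + " } }";
    }

    public static void main(String[] args) {
        System.out.println(bothDirections("http://example.org/resource/1"));
    }
}
```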

Distribute Pubby as a pre-packaged .war

Currently, Pubby is distributed as a “disassembled” web application directory. To install and run it, one has to copy that directory into the webapps directory of a pre-installed servlet container. To configure Pubby, one has to change files inside the web application. In particular, one has to set the location of the configuration file in /WEB-INF/web.xml (or modify the configuration file in the default location, /WEB-INF/config.ttl). This is bad because one cannot upgrade Pubby by simply replacing the Pubby directory, as this would overwrite the configuration changes. One needs to apply the configuration change in the new version, either by manually editing web.xml again, or by copy-pasting the file around.

It would be better if we could simply ship a pubby.war archive that one drops into the servlet container, where the configuration is made somewhere outside of the web application.

Unfortunately there is no particularly good way of doing this. This issue here captures the result of some research into different ways of achieving this goal.

Hardcoded configuration file location

One option would be to hardcode the location of the configuration file. It could be an absolute path (/etc/pubby-config.ttl), a path relative to the user home (~/.pubby-config.ttl), or a path relative to the current directory (./conf/pubby-config.ttl).

A number of problems with this approach:

Conventions for this kind of file differ between operating systems
Doesn't easily facilitate multiple instances of Pubby with different configurations (e.g., for multiple virtual hosts)
Whichever approach is chosen, it's likely to work poorly in some scenarios

This seems bad.

System-wide configuration

So if the location of the configuration file cannot be hardcoded, then it needs to be passed into the web application. This could be done through a system-wide setting. Options include:

Setting a system variable on servlet container startup, and reading it in the webapp
Adding a directory to the classpath, and putting a pubby.properties file there (or directly pubby-config.ttl) and reading it from within the webapp

Again there are a number of problems:

Doesn't easily facilitate multiple instances of Pubby with different configurations (e.g., for multiple virtual hosts)
Procedures differ between servlet containers
The classpath option involves fun with system classpath vs. app classpath on Tomcat, which is usually not fun
This is the right way of configuring a servlet container, but not really a webapp

Per-webapp configuration

So the right way of doing this should be based on per-webapp settings, because this allows running multiple instances of Pubby with different configurations. So this will involve making some settings in the servlet container configuration that are passed to the individual webapp. There seem to be two options here: JNDI and overriding context-param.

JNDI

This is a J2EE thing that seems rather complicated. It supports “environment parameters” that are simple strings that can be set in various ways, and then read from an application. But its real purpose is to provide factories for things to enable looser coupling of apps. So we'd be using a very complex system to achieve a rather simple task. Also, JNDI support is optional for servlet containers. Tomcat seems to have it out of the box, but Jetty requires extra jars and extra configuration. Altogether, this approach seems to be dragging in a whole lot of unpleasant complexities.

Setting `context-param` from outside of the webapp

Webapp configuration can be done using the context-param element inside the web.xml. Servlet containers generally provide a way of setting or overriding context parameters from outside of the web application. Architecturally, this seems to be the right approach. Downsides:

It works differently for each servlet container
It generally means that one doesn't deploy by simply dropping a war file into webapps, but by messing with XML files that sit in different locations; this is likely to be unfamiliar to many users
It seems to be overly complicated in Jetty, requiring two small XML files

Nevertheless, this may be the way to go. So let's assume we want to set the config-file parameter that Pubby already uses to find the config file from outside of the web application, so that we can ship the web application as a simple pubby.war.

Setting `context-params` on Tomcat

Put a pubby.xml in $CATALINA_BASE/conf/[engine_name]/[host_name] (e.g., conf/Catalina/localhost). It should look somewhat like this (not actually tested):

<Context docBase="/Path/to/pubby.war">
  <Parameter name="config-file" value="/Path/to/config.ttl" override="false"/>
</Context>

Setting @override to false prevents the value from web.xml (if any) from overriding the value provided here. If the @docBase is omitted, then webapps/pubby.war in the Tomcat directory is assumed.

Setting `context-params` in Jetty

Put a pubby.xml into Jetty's /contexts directory. It should look something like this (not tested; the paths don't make sense in this example):

<Configure class="org.mortbay.jetty.webapp.WebAppContext">
  <Set name="contextPath">/pubby</Set>
  <Set name="war"><SystemProperty name="jetty.home" default="."/>/webapps/pubby.war</Set>
  <Set name="overrideDescriptor"><SystemProperty name="jetty.home" default="."/>/my/path/to/override-web.xml</Set>
</Configure>

The override-web.xml looks like a normal web.xml, but is applied after the one in the web application, and can thus override stuff:

<web-app>
  <context-param>
    <param-name>config-file</param-name>
    <param-value>/Path/to/config.ttl</param-value>
  </context-param>
</web-app>

One deployment option here would be to have pubby.war and override-web.xml both located in some user directory outside of the servlet container. The general downside here is that one has to provide redundant absolute paths everywhere—twice in the context XML and once in the override-web.xml.

Setting `context-params` in the Jetty Maven plugin

This would use the same approach as for Jetty above, but instead of the context XML file, one only needs to point to the override-web.xml in the plugin configuration:

<project>
    ...
    <plugins>
        <plugin>
            ...
            <artifactId>jetty-maven-plugin</artifactId>
            <configuration>
                <webAppConfig>
                  ...
                  <overrideDescriptor>src/main/resources/override-web.xml</overrideDescriptor>
                </webAppConfig>
            </configuration>
        </plugin>
        ...
    </plugins>
    ...
</project>

Since the build should be location-independent, the actual location of the config file cannot be specified as an absolute path, which means that probably the config file has to be put onto the classpath.

Setting `context-param` for a command-line app with embedded Jetty

We may also want to provide a command-line app that starts up an embedded Jetty to run Pubby. That should be straightforward: use WebAppContext.setInitParams() to set the context param, or WebAppContext.setOverrideDescriptors() with a temporary file.

PREFIX not working in latest version of Pubby

When declaring a custom prefix in Pubby, e.g. "@Prefix dgi: http://localhost:8080/ontology/ .", the prefix is not shown in the HTML output of classes, properties or individuals. Pubby shows a question mark (?) before every ObjectProperty.

Is there any solution to this?

Cannot work with Sesame 2.7.9 endpoint due to x-binary-rdf?

Error parsing configuration file file:/blah-blah/apache-tomcat-8.0.0-RC3/webapps/pubby/WEB-INF/config.ttl: Endpoint http://blah-blah/openrdf-sesame/repositories/somedatabase returned Content Type: application/x-binary-rdf which is not a supported RDF graph syntax

Links to the blank node detail page sometimes not showing

There are quite a few cases where Pubby shows “[5 anonymous resources]” as the value of a property, but doesn't turn it into a link. I've identified a few:

If no prefix is defined for the property. (That's because the /pathpage and /pathdata URLs are built using a QName, and no prefix means no QName.)
If the prefix is returned by the endpoint but not defined in the config.ttl. (That's because when dereferencing a /pathpage, Pubby needs to expand the QName into a full property IRI, so that it can ask about that property in the endpoint. At this point, Pubby doesn't know anything about prefixes that are only defined in the endpoint, it needs to rely on those defined in config.ttl only.)
If the property is in the datasetBase namespace (and that namespace is different from webBase). (That's because the property QName in the path page/data URL expands to a URI in the webBase namespace, and is not translated back to an original form in the datasetBase namespace.)

Some of these issues can perhaps be fixed in code, others mentioned in the documentation.

Running as non-ROOT web application currently broken

The webapp needs to be the ROOT webapp or Pubby will end up at http://server/pubby/pubby/something.

Missing triples in data when long list of values for a property

(Using the latest code from Git)
When a property has too many values, it is displayed this way in the HTML:

ore:aggregates [57 values]

However, I would expect all the values (57 in this case) to be present in the RDF returned from the RDF view, but they are not. How is an application consuming this RDF supposed to know that these 57 values are to be fetched from some other URL?

Is there a way to deactivate this behavior that hides the list of values when it is too long? If yes, would that make them appear in the RDF?

Cheers

unit test failure: testSafariGetsHTML null expected:<[text/html]> but was:<[application/x-turtle]>

testSafariGetsHTML(de.fuberlin.wiwiss.pubby.negotiation.PubbyNegotiatorTest) Time elapsed: 0.001 sec <<< FAILURE!
junit.framework.ComparisonFailure: null expected:<[text/html]> but was:<[application/x-turtle]>
at junit.framework.Assert.assertEquals(Assert.java:81)
at junit.framework.Assert.assertEquals(Assert.java:87)
at de.fuberlin.wiwiss.pubby.negotiation.PubbyNegotiatorTest.testSafariGetsHTML(PubbyNegotiatorTest.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:79)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Bad conf:supportsIRIs in doc/config.ttl

In the master/doc/config.ttl file, on line 75, the parameter conf:supportsIRIs is specified in this way:

conf:supportsIRIs false;

But pubby crashes. Perhaps it should be so:

conf:supportsIRIs "false";

I am using pubby version 0.3.3

Provide a command-line app with embedded Jetty

Provide a command-line application that allows quickly starting up a Pubby server without the need for a pre-installed servlet container. This would be great for testing, development, etc.

Possible example invocations:

pubby --config config.ttl
pubby --load somefile.rdf
pubby --sparql http://dbpedia.org/sparql

From a packaging point of view, if we want to do this with Maven, it probably requires a separate Maven project, making everything quite a bit more complicated.

Update WebURIServlet

There's a limitation on pubby when you want to add it to existing Java web applications. If the application has complex servlet filters it will cause some trouble when accessing default pubby URLs (/resource, /page...).

I created a custom WebURL filter to replace the default WebURI filter in resource/ URLs, but now the dataset is not available to RDF browsers: it does not reach the WebURIServlet, hence no redirect... Any ideas?

Requesting Backend with Username/Password

When configuring Pubby with a SPARQL endpoint that needs authentication (e.g. http://user:[email protected]/sparql), Pubby answers with

,-----------------------------

HTTP Status 500 - JenaException: Only well-formed absolute URIrefs can
be included in RDF/XML output:
http://<username:@:8080/parliament/sparql> Error:
37/HAS_PASSWORD in slot 6

`-----------------------------

With Pubby's current logic, RDF output should actually have URIs that have the username and password in them for all resources... ;-) So it's almost a good thing that the Jena library refuses to serialize these URIs.

It would be great if Pubby could handle URL authentication to access also Sparql endpoints within an authentication realm.
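One way this could work is to split the credentials off the configured URL before it is used anywhere else, and send them as an HTTP Basic Authorization header instead. The sketch below shows only the splitting step; EndpointCredentials is a hypothetical class, the example URL is made up, and how the header would be passed to the query engine is left open.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical holder for an endpoint URL with its credentials split off. */
final class EndpointCredentials {
    final String endpointWithoutUserInfo;
    final String authorizationHeader; // null if the URL carried no credentials

    private EndpointCredentials(String endpoint, String header) {
        this.endpointWithoutUserInfo = endpoint;
        this.authorizationHeader = header;
    }

    static EndpointCredentials fromURL(String url) {
        try {
            URI uri = new URI(url);
            String userInfo = uri.getUserInfo(); // "user:password", or null
            if (userInfo == null) {
                return new EndpointCredentials(url, null);
            }
            // Rebuild the URI with the user-info component removed.
            URI stripped = new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment());
            String header = "Basic " + Base64.getEncoder()
                    .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
            return new EndpointCredentials(stripped.toString(), header);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this split, only the credential-free URL would ever appear in generated RDF.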

Documented Build Process Hangs after Starting Jetty

I am trying to build the latest sources from the Git repository based on the instructions found in the Source code and development section in the doc/index.html file.

Unfortunately, the step mvn jetty:run does not seem to terminate, it hangs at

[INFO] Started Jetty Server

After that output, nothing further seems to happen (or does it really take that long (more than 2 hours) for anything else to show up?).

I have tried this both on Windows 7 and on Fedora, same result.

As suggested in this StackOverflow question, switching the version of the Jetty Maven plugin is supposed to help, but didn't appear to change anything for me (rather than the default version 8.1.9, I tried versions 8.1.10 to 8.1.12).

Moreover, it may be noteworthy that a few of the last log messages before the starting of the Jetty server read as follows:

2014-10-06 10:57:20.224:INFO:oejpw.PlusConfiguration:No Transaction manager found - if your webapp requires one, please configure one.
2014-10-06 10:57:22.138:INFO:/:######## PUBBY CONFIGURATION ERROR ########
2014-10-06 10:57:22.138:INFO:/:Expected IRI object, found literal: [] conf:metadataTemplate 'metadata.ttl'.
2014-10-06 10:57:22.138:INFO:/:###########################################

At least the "Pubby configuration error" seems unrelated to the Jetty problem, however, as I commented out the respective line in the config.ttl file, the error went away, but the Jetty problem persists.

It is quite possible that I am missing something obvious; neither Maven nor Jetty is part of the software stack I am experienced with.

missing URL encoding ?

first: a happy new year !

I experienced some problems with Pubby, stable release 0.3.3.
When indexing some data into my triple store, a space in a URI is URL-encoded correctly into a '+', like this:

@prefix frbr: <http://purl.org/vocab/frbr/core#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://lobid.org/resource/DE-929:Freihand+5-8>
rdf:type frbr:Item .

This resource is linked with another resource in Pubby. When I click in Pubby to follow that link, I get a Pubby-produced "404 Not Found".
But it works when querying 4store directly with SPARQL.

Something similar happens with a URL-encoded '/' (which is then %2F). Following that link in Pubby brings up an HTTP server "page not found", which seems to be because Pubby re-interprets the %2F as a '/', so the resulting link is obviously wrong.
(see http://lobid.org/resource/HT002511874 , under frbr:exemplar http://test.lobid.org/resource/DE-290:630%2FSchn )
With 4store as the backend, querying this resource in 4store directly using SPARQL does work (naturally only if I "curl --data-urlencode" the query).
Again, it works when querying 4store directly.

Another problem is with ':': if I URL-encode it, Pubby does not produce correct links; if the colon is left unencoded, Pubby works.

Could it be that pubby follows (internal) URIs without --data-urlencode it , or am I doing something wrong?
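One possible explanation, offered only as an assumption about what happens somewhere between the servlet container and Pubby: if a URI is passed through form-style decoding such as java.net.URLDecoder, the '+' turns into a space and the %2F into a '/', so the decoded string no longer matches the IRI stored in the triple store. The standalone sketch below (DecodeDemo is a hypothetical class) shows that effect on the two identifiers from this report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical demo of what form-style decoding does to the reported identifiers. */
final class DecodeDemo {

    /** Decodes like an HTML form would: '+' becomes ' ', %XX becomes the raw character. */
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(formDecode("DE-929:Freihand+5-8"));
        System.out.println(formDecode("DE-290:630%2FSchn"));
    }
}
```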

-o

Pubby deploy hangs forever in Tomcat 8

I am using Pubby with OpenRDF Sesame running on the same Tomcat server. The problem appears to be that if the Sesame repository is not already loaded when Pubby starts, Pubby hangs forever, which means that the Tomcat manager never starts and the only way to get things working again is to manually remove Pubby from the Tomcat webapp directory.

It would be great if Pubby could just fail and not hang up things forever if it can't find the back-end server. (Maybe setting a connect timeout on a URLConnection somewhere would fix this?)
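
The timeout idea suggested above could look roughly like the following sketch. It uses only the standard `java.net.URLConnection` timeout setters; where Pubby actually opens its connection to the back end is not shown in this report, so the class name, method name, and endpoint URL are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper; not part of Pubby.
final class TimeoutSketch {

    // Prepares a connection that gives up instead of blocking forever.
    // openConnection() does not contact the server yet; the timeouts
    // take effect once connect() or getInputStream() is called.
    static URLConnection openWithTimeouts(String url, int connectMillis, int readMillis) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMillis); // limit on establishing the connection
            conn.setReadTimeout(readMillis);       // limit on waiting for response data
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed endpoint URL, for illustration only.
        URLConnection conn = openWithTimeouts(
                "http://localhost:8080/openrdf-sesame/repositories/example", 5000, 10000);
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```

With both timeouts set, a back end that is not yet up would lead to a `SocketTimeoutException` on connect or read, which the startup code could report as a configuration error rather than hanging.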

Unable to make a query and display information ("Bad request") with GraphDB triple store

Note: this only happens with current versions, not with the releases found at http://wifo5-03.informatik.uni-mannheim.de/pubby/

I have a working SPARQL endpoint managed by GraphDB, deployed locally. I want to use Pubby to display all the linked data available in this endpoint. I used Pubby 0.3.3, and it was working fine (using this configuration file: http://pastebin.com/s0qYPptQ ), but I was bothered by the fact that anonymous nodes were not displayed, only their count. Since this seemed to be fixed in later versions, I decided to build Pubby 0.4 from the source code available here. After fixing some deployment errors (such as deleting non-working tests and non-working Javadoc generation), I managed to get a working deployment... only to see this message displayed on Pubby's web page:

Configuration error

Pubby failed to properly initialize. This is probably due to an error in the configuration file. The message was:

Error parsing configuration file <file:/var/lib/tomcat8/webapps/pubby_seecr/WEB-INF/config.ttl>: Error making the query, see cause for details

I used the same configuration file, since I could not see any difference between the config files of 0.3.3 and 0.4. So I checked GraphDB's log, which says this:

[INFO ] 2016-07-15 14:21:10,137 [repositories/BIO | o.o.h.s.r.s.StatementsController] GET statements
[INFO ] 2016-07-15 14:21:10,139 [repositories/BIO | o.o.h.s.ProtocolExceptionResolver] Client sent bad request ( 406)
org.openrdf.http.server.ClientHTTPException: No acceptable file format found.
        at org.openrdf.http.server.ProtocolUtil.getAcceptableService(ProtocolUtil.java:190)
        at org.openrdf.http.server.repository.statements.StatementsController.getExportStatementsResult(StatementsController.java:348)
        at org.openrdf.http.server.repository.statements.StatementsController.handleRequestInternal(StatementsController.java:113)
[...]

This error usually happens if the types described in the "Accept:" field of the GET request are wrong or empty, but checking with tcpdump I noticed that they weren't wrong at all.

So I'm not sure what to do and especially why this only happens with the most recent versions.

Any clue? Thanks :)

Better detection of configuration errors

Pubby should check for misspelled terms in the conf: namespace in the config file, and for terms defined on the wrong kind of entity (on a dataset resource instead of a configuration resource, or vice versa).

Errors should be thrown as the kind of exception that's caught in the ServletContextInitializer.
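
A check for misspelled terms could be sketched as below. This is only an illustration under several assumptions: the namespace IRI is assumed, the list of known terms contains only the three properties mentioned on this page and is far from complete, and the class and method names are hypothetical. The check for terms used on the wrong kind of entity is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical validator; not part of Pubby.
final class ConfTermCheck {

    // Assumed IRI of the conf: namespace.
    static final String CONF_NS = "http://richard.cyganiak.de/2007/pubby/config.rdf#";

    // Incomplete on purpose: only terms mentioned in the issues on this page.
    static final Set<String> KNOWN_TERMS =
            Set.of("projectHomepage", "rdfDocumentMetadata", "metadataTemplate");

    // Returns every predicate IRI that lies in the conf: namespace
    // but whose local name is not a known term.
    static List<String> unknownTerms(List<String> predicateIris) {
        List<String> unknown = new ArrayList<>();
        for (String iri : predicateIris) {
            if (iri.startsWith(CONF_NS)
                    && !KNOWN_TERMS.contains(iri.substring(CONF_NS.length()))) {
                unknown.add(iri);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        List<String> predicates = List.of(
                CONF_NS + "projectHomepage",
                CONF_NS + "projectHomepag",               // misspelled on purpose
                "http://www.w3.org/2000/01/rdf-schema#label"); // other namespaces are ignored
        System.out.println(unknownTerms(predicates));
    }
}
```

A non-empty result would then be turned into whichever exception type the ServletContextInitializer already catches.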

%-encoded versions of # and ? in IRIs in the original data don't work

That's because we %-encode the unencoded versions of these characters when rewriting IRIs.

Original IRI in the data => IRI where Pubby makes that data accessible:

http://dataset-base/foo?bar#baz => http://pubby-base/foo%3Fbar%23baz
http://dataset-base/foo%3Fbar%23baz => http://pubby-base/foo%3Fbar%23baz

Requested IRI in the web application => IRI that Pubby looks for in the dataset

http://pubby-base/foo%3Fbar%23baz => http://dataset-base/foo?bar#baz

So, if we have %23 or %3F in the original IRI, Pubby will not round-trip them correctly.
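
The failure can be reproduced with a small sketch of the two mappings described above. This is a simplified model, not Pubby's actual rewriting code; the class and method names are hypothetical, and the base IRIs are the placeholder ones used in the examples.

```java
// Simplified model of the IRI rewriting described above; not Pubby's code.
final class IriRoundTrip {

    static final String DATASET_BASE = "http://dataset-base/";
    static final String PUBBY_BASE = "http://pubby-base/";

    // Dataset IRI -> Pubby IRI: %-encode the unencoded '?' and '#'.
    static String toPubby(String datasetIri) {
        String local = datasetIri.substring(DATASET_BASE.length());
        return PUBBY_BASE + local.replace("?", "%3F").replace("#", "%23");
    }

    // Pubby IRI -> dataset IRI: decode %3F and %23 again.
    static String toDataset(String pubbyIri) {
        String local = pubbyIri.substring(PUBBY_BASE.length());
        return DATASET_BASE + local.replace("%3F", "?").replace("%23", "#");
    }

    public static void main(String[] args) {
        String plain = "http://dataset-base/foo?bar#baz";
        String encoded = "http://dataset-base/foo%3Fbar%23baz";
        // Both originals collapse onto the same Pubby IRI ...
        System.out.println(toPubby(plain));
        System.out.println(toPubby(encoded));
        // ... so the reverse mapping can only recover one of them.
        System.out.println(toDataset(toPubby(encoded)));
    }
}
```

Because the forward mapping is not injective, no reverse mapping can recover both originals; any fix has to change the forward mapping or avoid the escaping altogether.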

The solution of %-encoding the percent sign as %25 (so %23 becomes %2523) isn't nice, as it would only work if we %-encode all percent signs in any original data IRI. This means that %20 and other common %-sequences will now become really ugly. We want to keep Pubby's workings predictable and rewrite as little as possible, so this is bad.

A better solution is perhaps to think hard about ways of not requiring the escaping of # and ? in the first place. The former is escaped because of its special role in IRIs (the part after the hash is not sent to the server when an HTTP request is made). The latter is, I believe, treated specially because of the ?output=xxx parameter we support, and perhaps because of uncertainty about whether the exact original IRI can still be recovered after the servlet container has chopped it into request parameters.


New URI space structure

Implementation

Use case

Design sketch

Implementation notes

Hardcoded configuration file location

System-wide configuration

Per-webapp configuration

JNDI

Setting context-param from outside of the webapp

Setting context-params on Tomcat

Setting context-params in Jetty

Setting context-params in the Jetty Maven plugin

Setting context-param for a command-line app with embedded Jetty

