iay / mdq-server



Metadata Query Protocol server implementation using Shibboleth MDA

Java 84.64% XSLT 9.70% Smarty 3.36% CSS 0.20% Dockerfile 1.57% Shell 0.54%

mdq-server's Introduction

If you want to know more about me, the about page on my web site is the best place to go.

mdq-server's People

Contributors

khazelton philsmart mmoayyed geoffroya smstong mzarifa

mdq-server's Issues

adding Last-Modified response header to mdq-server

I know the spec doesn't require a Last-Modified header but the response would be more readable (by humans) if it had one. Can one be added?

exception in refresh does not trigger degraded health

We had a problem where (many) refresh operations were hitting Java out-of-heap errors (reported via the "uncaught exception" handler). I would have expected this to cause the health endpoint to report a DEGRADED status, but that doesn't seem to have been the case. We did see some timeouts.

A failure in the refresh operation should always result in degraded health, at least until the next successful refresh.

The reason for this isn't obvious, but it's probably related to the fact that as currently coded the degraded status is only signalled if the last successful refresh is more than two refresh intervals ago. So if, for example, every second refresh succeeds, you'll never notice.
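One way to make any failed refresh count, sketched under assumptions: RefreshHealth and its methods are hypothetical (not the existing mdq-server health indicator), and the caller is assumed to invoke recordAttempt from a catch (Throwable) or finally block so that out-of-heap errors are recorded as well. Health is DEGRADED whenever the most recent attempt failed, in addition to the existing "more than two refresh intervals" rule, so an alternating success/failure pattern no longer goes unnoticed.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; not the mdq-server health indicator.
final class RefreshHealth {
    enum Status { UP, DEGRADED }

    private final Duration refreshInterval;
    private Instant lastSuccess;       // null until the first successful refresh
    private boolean lastAttemptFailed; // true from a failure until the next success

    RefreshHealth(Duration refreshInterval) {
        this.refreshInterval = refreshInterval;
    }

    /** Call for every attempt, including from a catch (Throwable) block. */
    synchronized void recordAttempt(Instant when, boolean succeeded) {
        lastAttemptFailed = !succeeded;
        if (succeeded) {
            lastSuccess = when;
        }
    }

    synchronized Status status(Instant now) {
        // Design choice: also DEGRADED before any refresh has succeeded.
        if (lastAttemptFailed || lastSuccess == null) {
            return Status.DEGRADED;
        }
        // Existing staleness rule: more than two intervals since the last success.
        Duration sinceSuccess = Duration.between(lastSuccess, now);
        return sinceSuccess.compareTo(refreshInterval.multipliedBy(2)) > 0
                ? Status.DEGRADED
                : Status.UP;
    }
}
```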

response from mdq-server needs a Date header

A typical response from the mdq-server doesn't have a Date header. Can one be added?
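Both this header and the Last-Modified header requested above use the HTTP-date format, normally the fixed-length IMF-fixdate form from RFC 7231 (for example Sun, 06 Nov 1994 08:49:37 GMT). A minimal sketch of producing those values with java.time; HttpDates is a hypothetical helper, and using the time of the last successful refresh for Last-Modified is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class HttpDates {
    // IMF-fixdate: two-digit day, English names, always expressed in GMT.
    private static final DateTimeFormatter IMF_FIXDATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    private HttpDates() {
    }

    /** Formats an instant as an HTTP-date suitable for Date or Last-Modified. */
    static String format(Instant instant) {
        return IMF_FIXDATE.format(instant);
    }

    public static void main(String[] args) {
        // Date: when the response is generated.
        System.out.println("Date: " + format(Instant.now()));
        // Last-Modified: hypothetically, when the metadata was last refreshed.
        Instant lastRefresh = Instant.parse("1994-11-06T08:49:37Z");
        System.out.println("Last-Modified: " + format(lastRefresh));
    }
}
```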

AD FS cannot load metadata for entities with forward slashes in their IDs

Given an mdq-server deployment accessible over HTTPS, Active Directory Federation Services versions 2.0 and 3.0 cannot download metadata for entities with the / (forward slash) character in their entity IDs. This happens because AD FS converts all occurrences of %2F in the URL-encoded (a/k/a percent-encoded) entity ID back into the / character, causing mdq-server to reject the entity metadata query with an HTTP 404 (not found) response.

To give a concrete example, assume a base URL of https://mdq.example.com/entities/ and a URL-encoded entity ID of https%3a%2f%2fidp.example.com%2fshibboleth. Configure a new claims provider trust in AD FS using the metadata URL of https://mdq.example.com/entities/https%3a%2f%2fidp.example.com%2fshibboleth. When it attempts to load the metadata, it will issue the following HTTP request: GET /entities/https%3a//idp.example.com/shibboleth.
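One possible server-side mitigation, sketched under assumptions: treat everything after the /entities/ prefix in the raw, undecoded request path as the identifier and percent-decode it once, so that the properly encoded form and the form AD FS sends yield the same entity ID. EntityIdPaths and its methods are hypothetical, and whether the servlet container passes a path containing // through unmodified depends on its configuration (not checked here).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EntityIdPaths {
    private static final String PREFIX = "/entities/";

    private EntityIdPaths() {
    }

    /** The query path a well-behaved client would send for an entity ID. */
    static String encodedPath(String entityId) {
        return PREFIX + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    /**
     * Recovers the entity ID from the raw request path, accepting both
     * "https%3A%2F%2F..." and the partially decoded "https%3a//..." form.
     */
    static String entityIdFromRawPath(String rawPath) {
        if (!rawPath.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an entity query: " + rawPath);
        }
        String tail = rawPath.substring(PREFIX.length());
        // Caveat: URLDecoder uses form decoding, so a literal '+' becomes a space.
        return URLDecoder.decode(tail, StandardCharsets.UTF_8);
    }
}
```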

use different HTML template for /entities endpoint

At the moment, when rendering results for a browser, we use the same template (queryResult.tpl) for every endpoint. This means that, for example, the full signed XML document is included when returning the "everything" aggregate.

It would make more sense to use different HTML templates for each endpoint, and make the one for the "everything" aggregate not include the rendered metadata (perhaps always, perhaps if it exceeds a certain length).

It should also be possible to override each of the HTML templates from outside the executable bundle.

remove stale rendered results

The new cache system in #18 will replace old rendered results when they are queried for and the generation no longer matches, but there are a couple of situations where this will not happen:

an entity is referenced and then disappears from the source
an entity is referenced and then not queried for again

In each of these cases, the rendered result will be retained indefinitely. This probably isn't a major issue but it would make sense to sweep the cache from time to time to remove stale entries.

An alternative would be some kind of positive invalidation signal from the lower layers, or an assumption that once the generation had changed on any result then all cached results should be discarded.
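A minimal sketch of the periodic sweep, assuming each cache entry records the generation tag it was rendered from and that the current generation for an identifier can be looked up, returning null once the entity has gone from the source. SweepingCache and its methods are hypothetical. The blunter alternative, discarding everything once any generation changes, would reduce to clearing the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SweepingCache<V> {
    private static final class Entry<T> {
        final String generation;
        final T result;

        Entry(String generation, T result) {
            this.generation = generation;
            this.result = result;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();

    void put(String identifier, String generation, V result) {
        entries.put(identifier, new Entry<>(generation, result));
    }

    /** Returns the cached result if it was rendered from the current generation, else null. */
    V get(String identifier, String currentGeneration) {
        Entry<V> e = entries.get(identifier);
        return e != null && e.generation.equals(currentGeneration) ? e.result : null;
    }

    /**
     * Periodic sweep. currentGenerationOf returns null for identifiers that have
     * disappeared from the source, so those entries are removed along with any
     * entry whose generation has moved on.
     */
    void sweep(Function<String, String> currentGenerationOf) {
        entries.entrySet().removeIf(en ->
                !en.getValue().generation.equals(currentGenerationOf.apply(en.getKey())));
    }

    int size() {
        return entries.size();
    }
}
```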

add Spring Boot info endpoint

See http://blog.frankel.ch/more-devops-for-spring-boot.

Workstation/developer specific resources should not be present in git

Hi @iay
Shouldn't specific resources linked to the IDE/developer workstation be stripped from the repo?

set content length when appropriate on responses

display names of tagged collection identifiers

It would be useful to include the identifiers associated with tagged collections along with the number of such collections.

ItemCollectionLibrary stops refreshing after a while

We've observed a case where the ItemCollectionLibrary refresh process, which is supposed to be executed periodically, stops happening after something like a month.

The only clue in the logs is that an "executing source pipeline" log message appears but that the next expected message ("source pipeline executed; n results") never does:

        log.debug("executing source pipeline");
        try {
            sourcePipeline.execute(newItemCollection);
        } catch (PipelineProcessingException e) {
            log.warn("source pipeline execution error", e);
            return;
        }
        log.debug("source pipeline executed; {} results", newItemCollection.size());

The rest of the application still responds to requests using previously fetched metadata.

One obvious possibility is that some stage in the source pipeline is blocked or looping. However, I would have expected blocking from the network to be subject to timeouts, and there was no evidence that a thread was hogging the CPU.

A second possibility is that the execution of the source pipeline encountered an exception which wasn't logged, and the periodic execution was cancelled as a result. The Runnable submitted looks like this:

                    new Runnable() {

                        public void run() {
                            refresh();
                            log.debug("next refresh estimated at {}", new DateTime().plus(refreshInterval));
                        }

                    }

The doRefresh method does include a try...catch but only for PipelineProcessingException. This means that something like a NullPointerException would just fall out of the top of run. My reading of the API for ScheduledThreadPoolExecutor and ThreadPoolExecutor indicates that such exceptions are ignored unless afterExecute is overridden in a subclass, and will also cause recurrences to be cancelled.

http://stackoverflow.com/questions/2248131/handling-exceptions-from-java-executorservice-tasks includes some discussion around this and several approaches to the issue. Perhaps the simplest is to wrap the refresh call in run with a try...catch (Throwable).
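The ScheduledExecutorService Javadoc states that if any execution of a periodic task encounters an exception, subsequent executions are suppressed, which matches the symptom. A small self-contained demonstration of the try...catch (Throwable) approach; the class and method names are made up for illustration, scheduleWithFixedDelay is an assumption about how the Runnable is scheduled, and the task simulates a pipeline stage throwing a NullPointerException on every other run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SafeRefresh {
    private SafeRefresh() {
    }

    /** Wraps a periodic task so that nothing it throws can cancel later executions. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // In the server this would be something like log.error("refresh failed", t).
                System.err.println("refresh failed: " + t);
            }
        };
    }

    /** Schedules a task that throws on every other run; returns how many runs happened. */
    static int countRuns(boolean useGuard, int wanted) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch reached = new CountDownLatch(wanted);
        Runnable refresh = () -> {
            int n = runs.incrementAndGet();
            reached.countDown();
            if (n % 2 == 1) {
                throw new NullPointerException("simulated bug in a pipeline stage");
            }
        };
        exec.scheduleWithFixedDelay(useGuard ? guarded(refresh) : refresh,
                0, 10, TimeUnit.MILLISECONDS);
        try {
            reached.await(500, TimeUnit.MILLISECONDS); // times out if runs stop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            exec.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false, 3)); // stops after the first
        System.out.println("guarded runs: " + countRuns(true, 3));    // keeps going
    }
}
```

Catching Throwable also swallows OutOfMemoryError, which keeps the schedule alive but may leave the JVM in a poor state; recording the failure for health reporting at the same point would make that visible.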

implement content negotiation with multiple renderers

The current implementation of EntitiesController is wired up to a single MetadataService from which it queries for rendered metadata. Different presentations of this are made by the view, either queryAllResult or queryResult. This allows for limited content negotiation, for example to present a human-readable page if a browser makes the request, vs. SAML metadata otherwise.

However, this does not allow for content negotiation which would result in different renderings being performed, for example using either SAML or JSON formats depending on requested type.

One way to address this would be to change EntitiesController to be aware of multiple MetadataService instances and their resulting content types, and query an appropriate one depending on the requested MIME types.

It might be possible to just gather up all the MetadataService beans available in the context, allow each of them to specify their supported content types and potentially make them implement Ordered as well.
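A sketch of the "gather all the beans and pick by type" idea in plain Java rather than Spring. The Renderer interface is a hypothetical stand-in for MetadataService, and the Accept header is assumed to have been parsed and sorted by client preference already. A real implementation would more likely build on Spring's own media type handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class RendererSelector {
    /** Hypothetical stand-in for a MetadataService that declares the types it can produce. */
    interface Renderer {
        String name();
        int order(); // lower is preferred, in the spirit of Spring's Ordered
        List<String> supportedTypes();
    }

    static Renderer of(String name, int order, String... types) {
        List<String> typeList = List.of(types);
        return new Renderer() {
            public String name() { return name; }
            public int order() { return order; }
            public List<String> supportedTypes() { return typeList; }
        };
    }

    private final List<Renderer> renderers;

    RendererSelector(List<Renderer> candidates) {
        renderers = new ArrayList<>(candidates);
        renderers.sort(Comparator.comparingInt(Renderer::order));
    }

    /**
     * For each accepted type in client preference order, returns the first renderer
     * (by order) that supports it. Wildcard subtypes are not handled in this sketch.
     */
    Optional<Renderer> select(List<String> acceptedTypes) {
        for (String accepted : acceptedTypes) {
            for (Renderer r : renderers) {
                if (accepted.equals("*/*") || r.supportedTypes().contains(accepted)) {
                    return Optional.of(r);
                }
            }
        }
        return Optional.empty();
    }
}
```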

implement timed metadata refresh

The MetadataService should refresh its metadata from the source pipeline every so often, after the initial refresh it performs on startup. Most of the code to do this is already in place (in particular the locking should be correct) except for the actual timer operation, which we can probably pull mostly intact from Chad's original work.

Document how the components interact

There is no overall documentation for the implementation, although there is some JavaDoc for the individual classes.

ItemCollectionLibrary uses unchecked or unsafe operations

Noticed during compilation. Probably worth digging into this sometime, but not a high priority.

include ETag value in response headers

rework cache invalidation using generation tags

At present, the MetadataService layer maintains a cache of Results which it invalidates whenever it (periodically) asks the ItemCollectionLibrary to refresh the collection. This is suboptimal in several ways, and will tend to cause cached results to be discarded more frequently than they need to be if the ItemCollectionLibrary gets more intelligent about refresh, for example by pre-checking the source document and retaining previous results if the source document has not changed (aside: this probably requires reworking the single source pipeline into a document fetch and processing pipeline). It also doesn't work at all if several MetadataServices are clients of the same ItemCollectionLibrary.

A better solution would be to attach a generation tag (e.g., a UUID) to each IdentifiedItemCollection and have the MetadataService remember that in its cache along with the corresponding Result. The MetadataService would then revalidate cache contents by making an IdentifiedItemCollection query and comparing generation tags. If the tag matched, the cached Result would still be valid; if not, the cached Result would be discarded.

This would also move the control of refresh firmly back into ItemCollectionLibrary, which I think is preferable to the current hack.
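A minimal sketch of the generation-tag scheme with hypothetical Library and Service classes standing in for ItemCollectionLibrary and MetadataService. The library here only issues a new UUID when a collection's content actually changes, and the service revalidates its cached result by comparing tags. Because the tag lives with the collection rather than in any one service, several services can share the same library.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class GenerationTags {
    /** A payload plus the generation tag of the source data it was derived from. */
    static final class Tagged {
        final String generation;
        final String payload;

        Tagged(String generation, String payload) {
            this.generation = generation;
            this.payload = payload;
        }
    }

    /** Hypothetical library: a changed collection gets a fresh UUID generation tag. */
    static final class Library {
        private final Map<String, Tagged> collections = new ConcurrentHashMap<>();

        void refresh(String identifier, String items) {
            Tagged old = collections.get(identifier);
            if (old == null || !old.payload.equals(items)) {
                collections.put(identifier, new Tagged(UUID.randomUUID().toString(), items));
            }
        }

        Tagged query(String identifier) {
            return collections.get(identifier);
        }
    }

    /** Hypothetical service: revalidates cached results by comparing generation tags. */
    static final class Service {
        private final Library library;
        private final Map<String, Tagged> cache = new ConcurrentHashMap<>();
        final AtomicInteger renders = new AtomicInteger();

        Service(Library library) {
            this.library = library;
        }

        String result(String identifier) {
            Tagged source = library.query(identifier);
            if (source == null) {
                cache.remove(identifier); // identifier no longer known
                return null;
            }
            Tagged cached = cache.get(identifier);
            if (cached != null && cached.generation.equals(source.generation)) {
                return cached.payload; // tag matches: cached result is still valid
            }
            renders.incrementAndGet();
            String rendered = "<rendered>" + source.payload + "</rendered>"; // render stand-in
            cache.put(identifier, new Tagged(source.generation, rendered));
            return rendered;
        }
    }
}
```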

separate out deployment profiles

Other than for purely development purposes, details of particular deployments should not appear in this project. In particular, the incommon profile should be moved out into the mdq-server-incommon project.

This will mean refactoring the bean definition files and making it possible to override some bean definitions using a file external to the executable bundle.

implement conditional GET

This is dependent on #1, as without the cache the ETag will always be different due to changes in the rendered result's validUntil and ID attributes.
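Once rendered results are cached, an ETag can be derived from the exact bytes sent. A minimal sketch, with hypothetical names, of computing a strong ETag from a SHA-256 digest and deciding whether a request carrying If-None-Match can be answered with 304 Not Modified, using the weak comparison that If-None-Match calls for.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ConditionalGet {
    private ConditionalGet() {
    }

    /** Strong ETag derived from the exact bytes of the rendered result. */
    static String etagFor(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** True if the If-None-Match header value allows a 304 Not Modified response. */
    static boolean notModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false;
        }
        // Naive comma split: adequate for hex ETags like ours, not for arbitrary tags.
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2); // weak comparison ignores the W/ prefix
            }
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return true;
            }
        }
        return false;
    }
}
```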

Missing docs?

Are there any docs that would help one build and deploy this module?

ItemCollectionLibrary should be able to suppress refresh

At present, a #refresh() on ItemCollectionLibrary unconditionally runs the source pipeline. This means that the library will be updated whether anything has changed or not, which in turn means that it may change more often than necessary.

There are a couple of ways to approach this. One is to discard the results of the source pipeline if it exactly matches the results of the previous run.

Another is to suppress running the source pipeline if we know (through evaluation of a configurable predicate) that the result will be the same. In the usual case, the predicate could be made up from checks on the last modified date or ETag of each of the documents which will be accessed by the source pipeline.
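A sketch of the predicate approach, assuming the validators (the ETag, or failing that the Last-Modified value) of each source document have already been fetched, for example with a HEAD request. RefreshGate and its methods are hypothetical. Validators are only recorded after a successful run, so a failed pipeline execution is retried next time rather than suppressed.

```java
import java.util.HashMap;
import java.util.Map;

final class RefreshGate {
    // Validator recorded for each source document URL at the last successful run.
    private Map<String, String> lastValidators = new HashMap<>();

    /** True if any document is new, changed, removed, or has no validator to compare. */
    synchronized boolean needsRun(Map<String, String> current) {
        for (String validator : current.values()) {
            if (validator == null) {
                return true;
            }
        }
        return !current.equals(lastValidators);
    }

    /** Only remember validators once the pipeline has actually succeeded. */
    synchronized void recordSuccess(Map<String, String> current) {
        lastValidators = new HashMap<>(current);
    }
}
```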

look into Metrics for instrumentation

There may be some clashes with Spring Boot's annotations, but Metrics looks like a nice instrumentation framework, and using it would save reinventing that wheel.

Add the <mdrpi:PublicationInfo> extension element

Add an <mdrpi:PublicationInfo> extension element to the /entities aggregate as well as the /entities/collection:idps aggregate. Likewise add this extension element to each signed entity descriptor. In all cases, set the publisher XML attribute to the location of the aggregate or entity metadata.

begin unit testing

There are no unit tests in this code at the moment. As things get more developed, we need to be able to avoid regressions by pinning down some of the behaviour.

add health monitoring for ItemCollectionLibrary refresh

Spring Boot has a nice facility (part of spring-boot-actuator) for adding health monitoring across an application by having components implement a HealthIndicator interface. It would make sense to implement this for ItemCollectionLibrary to include details of the current state and the result of the last refresh, if any.

The downside would be that the application would require Spring Boot, which at the moment it does not except for some details around configuration.

use @Duration and converters on period configuration parameters

At present, there are a couple of duration parameters expressed in terms of longs representing milliseconds, for example MetadataService.refreshInterval, which might be initialised with "3600000". This is rather inconvenient and it would be much nicer to be able to say "PT1H" in ISO 8601 notation.

We can do this if we use the Shibboleth @Duration annotation on the bean definition and an appropriate conversionService definition imported from idp-conf.
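The difference between the two notations can be seen with java.time.Duration, which parses ISO 8601 durations. The sketch below is hypothetical and does not show how the Shibboleth @Duration annotation or the idp-conf conversionService behave; it simply accepts either a legacy millisecond count or an ISO 8601 duration. Note that Duration.parse requires the time designator T, so one hour is written PT1H.

```java
import java.time.Duration;

final class DurationProperty {
    private DurationProperty() {
    }

    /** Accepts a legacy millisecond count ("3600000") or an ISO 8601 duration ("PT1H"). */
    static long toMillis(String value) {
        String v = value.trim();
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException notANumber) {
            // Throws java.time.format.DateTimeParseException for malformed input.
            return Duration.parse(v).toMillis();
        }
    }
}
```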

Upgrade Spring Boot dependency to something more current

We should upgrade our Spring Boot dependency, currently in the 1.5 range, to something much more current. The latest 2.3.x has a number of interesting features relating to containerisation, so would be a good aspiration:

https://www.youtube.com/watch?v=WL7U-yGfUXA

This may require other dependency updates, of course. It may also affect the versions of Java we run under, although I think Java 8 is still supported by Spring Boot.

client address logging doesn't work behind a proxy

It would be nice if the client IP address logging could be made to work if the mdq-server is running behind a proxy. There's usually some kind of header for this, and an optional ability to make use of that would be easy to do.
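The usual header is X-Forwarded-For, to which each proxy appends the address it received the request from. Because clients can forge it, a sketch that only believes the header when the request arrives from a configured proxy; ClientAddress, resolve and the trusted-proxy set are hypothetical. Spring Boot and its embedded containers also have their own support for forwarded headers, which may be the simpler option.

```java
import java.util.Set;

final class ClientAddress {
    private ClientAddress() {
    }

    /**
     * Returns the address to log: the rightmost X-Forwarded-For entry that is not a
     * trusted proxy, or the socket peer address if the header cannot be trusted.
     */
    static String resolve(String remoteAddr, String xForwardedFor, Set<String> trustedProxies) {
        if (xForwardedFor == null || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr; // header only counts if it came from our own proxy
        }
        String[] hops = xForwardedFor.split(",");
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (!hop.isEmpty() && !trustedProxies.contains(hop)) {
                return hop;
            }
        }
        return remoteAddr;
    }
}
```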

expose signing certificate on an endpoint

It might be useful to expose the certificate used by the signing profile on an endpoint, to simplify access to it.

Build failure due to missing ukf-mda jar/pom files

Maven returned the following error:

[ERROR] Failed to execute goal on project mdq-server: Could not resolve dependencies for project uk.org.iay.mdq:mdq-server:jar:0.0.1-SNAPSHOT: Could not find artifact uk.org.ukfederation:ukf-mda:jar:0.9.0 in shib-release (https://build.shibboleth.net/nexus/content/groups/public) -> [Help 1]

As a workaround I downloaded the missing files from https://github.com/ukf/ukf-mda/releases/download/v0.9.0/ukf-mda-0.9.0.jar (saved as ~/.m2/repository/uk/org/ukfederation/ukf-mda/0.9.0/ukf-mda-0.9.0.jar) and https://raw.githubusercontent.com/ukf/ukf-mda/v0.9.0/pom.xml (saved as ~/.m2/repository/uk/org/ukfederation/ukf-mda/0.9.0/ukf-mda-0.9.0.pom).

(FreeBSD/amd64 10.2-RELEASE, openjdk8-8.66.17_3, maven31-3.1.1_1)

experimental: implement JSON entity listing endpoint

One prerequisite to this would be #18 to avoid cache invalidation issues when using a shared item collection library.

It may or may not be worth putting part of the implementation for this in a Pipeline so that we can migrate over to a multi-MetadataService and content negotiation approach if this proves to be interesting in the long term. The job being done is probably straightforward enough not to require this initially, though.

I'm also in two minds as to whether it's worth looking at implementing this using MDA-77 as a generic way to generate arbitrary JSON without needing to tweak Java to change the schema. The simpler approach of just directly coding to the javax.json API is probably good enough to start with, though.

use ItemTag item metadata to create named collections

The MetadataService should interpret ItemTag objects in each Item's metadata as representing a named collection into which the Item should be added.

Combined with stages to conditionally add tags to Items, this can be used to add arbitrary named collections to the identifier space responded to by the service.
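A rough sketch of the grouping rule, with a hypothetical Item class standing in for an MDA Item whose ItemTag metadata has been reduced to a list of tag names. Each tag names a collection, and an item can belong to several.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TaggedCollections {
    /** Hypothetical stand-in for an Item carrying zero or more ItemTag names. */
    static final class Item {
        final String id;
        final List<String> tags;

        Item(String id, List<String> tags) {
            this.id = id;
            this.tags = tags;
        }
    }

    private TaggedCollections() {
    }

    /** Maps each tag (collection name) to the IDs of the items carrying it. */
    static Map<String, List<String>> collect(List<Item> items) {
        Map<String, List<String>> collections = new TreeMap<>();
        for (Item item : items) {
            for (String tag : item.tags) {
                collections.computeIfAbsent(tag, k -> new ArrayList<>()).add(item.id);
            }
        }
        return collections;
    }
}
```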

Upgrade to MDA 0.10.0-SNAPSHOT dependency

In its role as a vehicle to help test and debug the MDA product, this project should currently be dependent on the latest development snapshot.

clarify the response for an "all entities" request

When I type URL

http://mdq-beta.incommon.org/global/x-entity-list

into a browser window, the identifier is shown as "null (ID_ALL)," which is a less than obvious response. Can this be clarified?

the Content-Language response header from mdq-server isn't always relevant

The Content-Language response header might be useful when returning HTML but when the content type is application/samlmetadata+xml, a Content-Language response header is not necessary (and in fact, confusing).

disable Spring Boot diagnostic endpoints by default

The POM file includes spring-boot-starter-actuator as a dependency to gain access to various "production-ready features" such as health endpoints. However, a number of these are potentially security sensitive and would be better enabled only in a development environment.

A reasonable default would appear to be:

endpoints.enabled=false
endpoints.health.enabled=true

See http://docs.spring.io/spring-boot/docs/current-SNAPSHOT/reference/htmlsingle/#production-ready.

Explicit <mdrpi:RegistrationInfo> elements will be added to InCommon metadata

This is just a heads up! Not sure if this requires any further action.

InCommon Operations will add explicit <mdrpi:RegistrationInfo> elements to production metadata according to the following schedule:

preview aggregate: Wed, Feb 4, 2015
main aggregate: Wed, Feb 11, 2015
fallback aggregate: Wed, Feb 18, 2015

I assume the mdq-server instance at mdq-beta.incommon.org is consuming the main (production) aggregate, in which case the middle date above is most relevant.

disable requests from crawlers

At least one current deployment is seeing requests from crawlers such as Google's. This is presumably because they have found the site through links from other sites.

It would make sense to at least have the ability to block these through a robots.txt file. That's not completely trivial, however, if the context root isn't the site root. It might be worth either making those independent or looking into how to get two application contexts (or static content) into the Jetty instance that Spring Boot includes.

add a controller to force refresh

For some applications, it would be nice to have an endpoint to poke to force a refresh.

Maven throws SunCertPathBuilderException when building mdq-server on Mac OS X

On Mac OS X 10.11.6 with Oracle JDK 1.8.0_92 and Apache Maven 3.3.9 (the maven3 package from MacPorts), Maven reports the following error when it tries to download the uk.org.ukfederation:ukf-mda:pom:0.9.4 artifact:

[ERROR] Failed to execute goal on project mdq-server: Could not resolve dependencies for project uk.org.iay.mdq:mdq-server:jar:0.0.1-SNAPSHOT: Failed to collect dependencies at uk.org.ukfederation:ukf-mda:jar:0.9.4: Failed to read artifact descriptor for uk.org.ukfederation:ukf-mda:jar:0.9.4: Could not transfer artifact uk.org.ukfederation:ukf-mda:pom:0.9.4 from/to ukf (https://apps.iay.org.uk/nexus/content/repositories/ukf): sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -> [Help 1]

I'd thought that perhaps the JVM didn't trust the apps.iay.org.uk certificate, but when I tried to add the Let's Encrypt X3 certificate to the cacerts keystore, keytool said that the certificate was already trusted (alias ... already exists).

use DEGRADED health status if multiple refreshes are missed

The health indicator for ItemCollectionLibrary always gives an UP status if the bean is operational. It would make sense to introduce a test for the last successful refresh being more than a couple of refresh periods ago, and to use a custom DEGRADED status for that.

The status ordering would also have to be set so that DEGRADED superseded UP for health aggregation purposes.
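Spring Boot's default status order is DOWN, OUT_OF_SERVICE, UP, UNKNOWN, most severe first, and the aggregate status is the most severe component status. A plain-Java sketch of that rule with DEGRADED slotted in ahead of UP; HealthAggregation is hypothetical and only illustrates the ordering, not Spring Boot's aggregator API, and in a deployment the order would be set through Spring Boot's health status configuration.

```java
import java.util.Collection;
import java.util.List;

final class HealthAggregation {
    // Most severe first; DEGRADED sits ahead of UP so that it wins aggregation.
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "DEGRADED", "UP", "UNKNOWN");

    private HealthAggregation() {
    }

    /** Overall status: the component status appearing earliest in ORDER (unknown codes ignored). */
    static String aggregate(Collection<String> statuses) {
        String worst = "UNKNOWN";
        for (String s : statuses) {
            int idx = ORDER.indexOf(s);
            if (idx >= 0 && idx < ORDER.indexOf(worst)) {
                worst = s;
            }
        }
        return worst;
    }
}
```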

sort aggregate in render pipeline

It might be easier to debug some things if we used a sort order when creating the EntitiesDescriptor in the render pipeline. At the moment the default ordering is used, which ends up just being whichever order things were in the source.

validUntil window and cacheDuration should be settable via properties

At the moment, the validUntil window and the cacheDuration interval are part of beans.xml, which is to say they aren't easily tweakable by deployers. They should be settable via properties.

discard invalidated cache data before re-rendering

In the case where MetadataService re-renders a previously cached result, it does not discard the invalidated result until it is overwritten by the new one. This means that the Java garbage collector cannot reclaim the memory in use by that result for use by the new rendering operation, so more heap is required than necessary.

The cache entry could be deleted in this situation to reduce peak memory usage.

This isn't a perfect solution, though: if additional requests for the same result arrive during the rendering operation, all of those rendering operations will happen in parallel, each using additional memory. It might be worth thinking about serialising this using Future<> somehow.
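A minimal sketch of both ideas, assuming results are keyed by identifier and checked against a generation tag; FutureCache and its render function are hypothetical. A stale entry is replaced by a placeholder CompletableFuture (which implements Future) before rendering starts, so the cache no longer holds the old result, and concurrent requests for the same generation wait on that future instead of rendering in parallel.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class FutureCache {
    private static final class Entry {
        final String generation;
        final CompletableFuture<String> result;

        Entry(String generation, CompletableFuture<String> result) {
            this.generation = generation;
            this.result = result;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, String> render; // identifier -> rendered metadata

    FutureCache(Function<String, String> render) {
        this.render = render;
    }

    String get(String identifier, String currentGeneration) {
        CompletableFuture<String> fresh = new CompletableFuture<>();
        // Atomically keep a still-valid entry, or swap a stale one for a placeholder;
        // the swap drops the cache's reference to the old result before rendering.
        Entry entry = entries.compute(identifier, (id, old) ->
                old != null && old.generation.equals(currentGeneration)
                        ? old
                        : new Entry(currentGeneration, fresh));
        if (entry.result == fresh) {
            try {
                fresh.complete(render.apply(identifier));
            } catch (RuntimeException e) {
                fresh.completeExceptionally(e);
                entries.remove(identifier, entry); // let a later request retry
            }
        }
        // Concurrent requests for the same generation block here rather than re-rendering.
        return entry.result.join();
    }
}
```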

use profiles to switch between development and production

We need a couple of different profiles, with the following varying between them:

source of metadata (internal XML file vs. eduGAIN vs. some production source)
signature credentials (internal resources vs. external resources vs. PKCS11)

mvn clean install fails out of the box

Tests fail during clean install.
After updating the parent version (11.1.0-SNAPSHOT is no longer available), I tried the following without success:
mvn clean install
mvn clean install -Prelease

[ERROR] Cannot instantiate class uk.org.iay.mdq.server.EntitiesControllerTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: There was an error in the forked process

Any clue?

cache rendered results

The MetadataService should cache results before forwarding them, so that the rendering pipeline doesn't need to be run on multiple requests for the same identifier.

add signing to render pipeline

Use GitHub Actions to do per-push testing

move from annotations to XML configuration where appropriate

Some of the current configuration is performed using annotations such as @RequestMapping. This is very convenient when building a fixed configuration but isn't flexible, particularly in situations where the controller bean is instantiated more than once.

Examine all uses of annotations to see whether future flexibility requires them to be moved into the equivalent XML configuration.