Comments (8)

zerocrates commented on July 30, 2024

Can you reproduce the messages (or some of them) here?

These things often only show up for particular combinations of data and can sometimes be hard to get "from scratch."

from oaipmhrepository.

zerocrates commented on July 30, 2024

As a concrete example, I ran my own instance of the Repository against the first validator you mentioned, from oaipmh.com (which is much nicer-looking than any of the validators that existed when I first wrote this plugin), and I didn't get any errors from any of the "available commands" options on the left.

re.cs.uct.ac.za (the same old ugly validator that I did have when first writing this) has been churning away for quite some time with no results.

I do get quite a few notices and a few errors from the base-search validator.

The notices I don't consider relevant or valid:

  • Prodding me to implement deletion tracking
  • Saying more records should be returned at once (this is configurable)
  • Saying resumption tokens should last a day or more (I'm not sure I agree, but it's configurable anyway)
  • Complaining about the dc:language value (this plugin won't, and shouldn't, I think, do anything to the values the user feels like inputting themselves)

The two errors are interesting, and don't show up from other sources.

One complains about the "toolkit" metadata in Identify. As far as I can tell, this one is happening because the schema document for the toolkit namespace isn't where it used to be anymore. Virginia Tech (where it's supposed to be) doesn't seem to have much to do with OAI-PMH anymore, so I wouldn't expect that to change. I'm not sure this really matters all that much, but it could be resolved by simply removing that metadata section altogether.

The one really valid one I see there is a complaint about day-granularity harvesting not working correctly. There does seem to be a problem with "until" that's making it wrongly exclude records with datestamps exactly the same as the date requested. The spec's very clear that both sides should be inclusive.

zerocrates commented on July 30, 2024

I've fixed several issues with the date processing: day-granularity dates weren't interpreted as UTC dates, and they weren't correctly "inclusive" of the whole day. The new code forces UTC interpretation of standalone days, and converts the "until" handling to tack one "granularity unit" (a day or a second) onto the specified date and then uses an exclusive < operator in the SQL.
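A minimal Python sketch of the date handling described above, assuming the OAI-PMH `from`/`until` arguments arrive as either `YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ssZ`. The function names (`parse_oai_date`, `date_bounds`, `in_range`) are hypothetical and not the plugin's code: a day-only value is pinned to UTC midnight, and `until` is pushed forward by one granularity unit so that an exclusive `<` comparison still includes the whole requested day or second.

```python
from datetime import datetime, timedelta, timezone

def parse_oai_date(value: str) -> tuple[datetime, timedelta]:
    """Parse an OAI-PMH date argument; return (moment in UTC, granularity unit).

    Hypothetical helper: day values (YYYY-MM-DD) are forced to UTC midnight,
    second values must use the YYYY-MM-DDThh:mm:ssZ form.
    """
    if len(value) == 10:
        moment = datetime.strptime(value, "%Y-%m-%d")
        unit = timedelta(days=1)
    else:
        moment = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        unit = timedelta(seconds=1)
    # Attach UTC explicitly so the server's local time zone never leaks in.
    return moment.replace(tzinfo=timezone.utc), unit

def date_bounds(from_arg=None, until_arg=None):
    """Return (lower, upper) for a `lower <= stamp < upper` comparison."""
    lower = parse_oai_date(from_arg)[0] if from_arg else None
    upper = None
    if until_arg:
        moment, unit = parse_oai_date(until_arg)
        # Tack one granularity unit onto "until", then compare with <.
        upper = moment + unit
    return lower, upper

def in_range(stamp, lower, upper):
    """Inclusive lower bound, exclusive (already extended) upper bound."""
    return (lower is None or stamp >= lower) and (upper is None or stamp < upper)
```

For `from=2014-09-30&until=2014-09-30`, the bounds become 2014-09-30T00:00:00Z and 2014-10-01T00:00:00Z, so a record stamped 2014-09-30T18:45:00Z is included. The sketch does not check that `from` and `until` share the same granularity, which the protocol also requires.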

Additionally, there's an update to the date handling to actually handle the checks against added and modified separately. The old code only worked right in the cases where both dates were inside, before, or after the requested range. If modified was on one side of the range and added on the other, each could pass one of the from/until sides of the check and give a false result.
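The thread doesn't show the revised query, so the following is only a Python sketch of the difference, assuming the old condition let either date column satisfy each side of the range independently. The function names `old_check` and `fixed_check` are hypothetical; `lower` is inclusive and `upper` exclusive, as in the `until` handling described above.

```python
from datetime import datetime, timezone

def old_check(added, modified, lower, upper):
    # Assumed old behavior: each side of the range may be satisfied
    # by a *different* date column, so a record can slip through.
    return (added >= lower or modified >= lower) and (added < upper or modified < upper)

def fixed_check(added, modified, lower, upper):
    # Each date column is tested against the whole range on its own.
    def inside(stamp):
        return lower <= stamp < upper
    return inside(added) or inside(modified)
```

With a range covering only 2014-09-30, an item added on 2014-09-01 and modified on 2014-10-13 passes `old_check` (added satisfies `until`, modified satisfies `from`) but fails `fixed_check`, because neither date is actually inside the range.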

The OVAL validator still complains about the selective harvesting after these changes, but I believe that error is actually the result of a bug in the validator.

Daniel-KM commented on July 30, 2024

Hi,

I'm sometimes busy, and with the time difference, you reply faster than I do...

I'm trying your update.

For the first validator, I didn't get results for ListRecords with CDWALITE / MODS / OMEKA-XML (only OAI_DC), but this is due to a slow server response (more than 30 seconds). It doesn't happen on the other sites I checked. I'm looking into why the response takes so long.

For the Repository Explorer of the University of Cape Town, the problem is that there is no complete response (testing a non-Omeka OAI repository works, even if it takes a few minutes). It seems to be related to the response speed too.

For OVAL: OK for the toolkit. I get no alert about language, because my documents have one. For batch size, you could set the default to 100 instead of 50. Same for token expiration (1440). And why are some settings in config.ini and others in the config form?

The last error was the one you mentioned (No incremental (day granularity) harvesting of ListRecords. Harvest for reference date 2014-09-30 returned record with date 2014-10-13.), and it is an error in the validator. The validator doesn't understand that harvesting is done against both the added and modified dates, while a record's datestamp is only the newer of the two.
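To illustrate why the validator objects, here is a hypothetical record (times invented for the example) checked two ways in Python: by the repository, which tests the added and modified dates separately, and by a validator that, by assumption, can only compare the exposed datestamp (the newer of the two dates) against the requested day. All function names are hypothetical.

```python
from datetime import datetime, timezone

def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)

def record_datestamp(added, modified):
    # The exposed <datestamp> is only the newer of the two dates.
    return max(added, modified)

def repository_matches(added, modified, lower, upper):
    # The repository selects on added and modified separately.
    return any(lower <= stamp < upper for stamp in (added, modified))

def validator_accepts(added, modified, lower, upper):
    # Assumption: the validator only sees the datestamp and expects it in range.
    return lower <= record_datestamp(added, modified) < upper

# Day-granularity harvest for 2014-09-30 (exclusive upper bound).
lower, upper = utc(2014, 9, 30), utc(2014, 10, 1)
# Hypothetical record: added on the requested day, modified later.
added, modified = utc(2014, 9, 30, 10, 0), utc(2014, 10, 13, 9, 0)
```

The record is legitimately returned because it was added on 2014-09-30, yet its datestamp reads 2014-10-13, which matches the shape of the OVAL message quoted above.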

Also, could you set the earliest datestamp in Identify (<earliestDatestamp>1970-01-01T00:00:00Z</earliestDatestamp>)? See #6.

Thanks,

Sincerely,

Daniel Berthereau
Infodoc & Knowledge management

Daniel-KM commented on July 30, 2024

Hi,

I just thought of another issue related to added / modified, but it's hard to resolve. The protocol says the datestamp should change only when the record's metadata changes. But records are often edited by users or contributors who save them without any change, or with a change to non-exposed metadata only (public/reserved, featured, etc.). So the modified date is updated even though all the exposed metadata is identical.

Sincerely,

Daniel Berthereau
Infodoc & Knowledge management

Daniel-KM commented on July 30, 2024

Hi,

The slow responses for METS and omeka-xml record lists happen because files are always added, even when the option oaipmh_repository_expose_files is not checked. In my case, there may be more than a thousand files attached to a single item (pages of digitized books), so the request times out. See #9.

Sincerely,

Daniel Berthereau
Infodoc & Knowledge management

zerocrates commented on July 30, 2024

Wow, that is quite a large number of files per item. I could easily see a memory issue, as METS and (especially) omeka-xml are very verbose formats, so the sheer size of the XML response could get unwieldy. A timeout is an interesting result, though: even many thousands more files than expected shouldn't easily cause a timeout here. At any rate, the expose flag should be applied consistently.

I think for at least one of the validators there were (and may still be) timeouts when trying to load that "toolkit" XSD. I reached out to the organization that once hosted that schema to see if it could be restored or the timeout fixed, but I haven't heard back.

Daniel-KM commented on July 30, 2024

Hi,

There are many files, but they are displayed with the Internet Archive BookReader (https://github.com/Daniel-KM/BookReader) (example: https://patrimoine.mines-paristech.fr/document/Combes_Traite_1844). And PDFs can't be embedded, because they are too heavy.

Sincerely,

Daniel Berthereau
Infodoc & Knowledge management