cul-it / qa_server
A rails app with questioning authority gem installed to serve as a QA server.
License: Apache License 2.0
Observing a cataloger at work with existing tools showed that one way they search for a term is by browsing an alphabetically sorted list of terms and jumping to the term they want to explore.
This can be replicated using one of the following approaches...
Perhaps support both 2 and 3, with an option that can be passed to the alpha JSP and a UI that lets the user choose stemmed or unstemmed.
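The alphabetical browse described above can be approximated with a binary search over a sorted term list, returning a window of neighbors around the point where the query would sort. This is a minimal sketch under stated assumptions: the term list, the window size, and the `stem` function are hypothetical stand-ins (a real version would reuse whatever stemmer the search index uses), and the `stemmed` flag only mirrors the stemmed/unstemmed option suggested above.

```python
from bisect import bisect_left


def stem(term: str) -> str:
    """Hypothetical, deliberately naive stemmer: lowercase and drop a trailing 's'."""
    t = term.lower()
    return t[:-1] if t.endswith("s") and len(t) > 3 else t


def alpha_browse(terms, query, window=2, stemmed=False):
    """Return up to `window` terms before the jump point and up to `window` from it onward."""
    key = stem if stemmed else str.lower
    ordered = sorted(terms, key=key)          # alphabetical order under the chosen key
    keys = [key(t) for t in ordered]
    pos = bisect_left(keys, key(query))       # binary search for where the query sorts
    start = max(0, pos - window)
    return ordered[start:pos + window]
```

For example, browsing `["Apples", "Bananas", "Cats", "Dogs", "Eagles", "Foxes"]` for "Dog" jumps to "Dogs" and shows two terms on either side of the jump point.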
Expand context to include gn:parentADM1, gn:parentADM2, and gn:parentCountry
Request by @hudajkhan
Searching Geonames for something like 'Ithaca' returns results that include hotels and other establishments in and around Ithaca, but not the city of Ithaca itself. Is it possible to use subauthorities to make a query that only returns results that are country, state, or region?
http://lookup.ld4l.org/qa/search/linked_data/geonames_ld4l_cache?q=Ithaca&maxRecords=4
[
{
"uri":"http://sws.geonames.org/261707/",
"id":"http://sws.geonames.org/261707/",
"label":"Ithaca (GR)"
},
{
"uri":"http://sws.geonames.org/5122432/",
"id":"http://sws.geonames.org/5122432/",
"label":"Ithaca (US)"
},
{
"uri":"http://sws.geonames.org/8133849/",
"id":"http://sws.geonames.org/8133849/",
"label":"Ithaca (GR)"
},
{
"uri":"http://sws.geonames.org/261708/",
"id":"http://sws.geonames.org/261708/",
"label":"Itháki (GR)"
}
]
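GeoNames tags every feature with a one-letter feature class (A: country, state, region; P: city, village; S: spot, building, farm, which is where hotels fall), so a subauthority could map to a set of allowed classes. The cached results above carry only uri, id, and label, so this sketch assumes a hypothetical `feature_class` field would be added to each result (for example from the gn:featureClass value in the GeoNames RDF). The subauthority names and sample results are also hypothetical.

```python
# GeoNames feature classes: A = country, state, region; P = city, village;
# S = spot, building, farm (hotels). Subauthority names below are hypothetical.
SUBAUTHORITY_CLASSES = {
    "admin_area": {"A"},
    "populated_place": {"P"},
}


def filter_by_subauthority(results, subauthority):
    """Keep only results whose (assumed) 'feature_class' is allowed for the subauthority."""
    allowed = SUBAUTHORITY_CLASSES[subauthority]
    return [r for r in results if r.get("feature_class") in allowed]


sample = [
    {"label": "Ithaca (US)", "feature_class": "P"},
    {"label": "Example Hotel (US)", "feature_class": "S"},
    {"label": "Example Region (US)", "feature_class": "A"},
]
```

With this mapping, `filter_by_subauthority(sample, "admin_area")` keeps only the region, and a `populated_place` subauthority would return the city of Ithaca without the hotel.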
Request from @hudajkhan...
With respect to OCLC FAST, is there a way to get just events? Is that one of the options?
@eichmann asked which rdf:type identifies events.
M. Futornick responded that http://schema.org/Event identifies events.
OCLC FAST triplestores and indices have been rebuilt with all 8 categories. The entity types now valid for those services are: Concept, Event, Intangible, Organization, Person, Place, and Work.
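Selecting only events comes down to keeping subjects whose rdf:type is http://schema.org/Event, as identified above. A minimal sketch, assuming the results are already available as (subject, predicate, object) string tuples; the example.org subjects are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_EVENT = "http://schema.org/Event"


def subjects_of_type(triples, rdf_class=SCHEMA_EVENT):
    """Return subjects (first-seen order, no duplicates) that have rdf:type `rdf_class`."""
    seen = []
    for s, p, o in triples:
        if p == RDF_TYPE and o == rdf_class and s not in seen:
            seen.append(s)
    return seen


triples = [
    ("http://example.org/fast/1", RDF_TYPE, SCHEMA_EVENT),
    ("http://example.org/fast/2", RDF_TYPE, "http://schema.org/Person"),
    ("http://example.org/fast/1", RDF_TYPE, SCHEMA_EVENT),
]
```

Passing a different class (e.g. http://schema.org/Person) selects the other entity types the same way.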
Need to explore why. It was working last week.
To get RDF back from MeSH's APIs, it seems we need to use their SPARQL API, but SPARQL isn't well suited to generic keyword searching.
The Samvera code base has MeSH lookups; maybe this is something we can turn on. https://github.com/samvera/questioning_authority/tree/main/lib/qa/authorities
We've discussed the possibility of converting the JSON response into linked data that QA can recognize and then using the linked data module to translate it.
See https://id.nlm.nih.gov/mesh/swagger/ui#/lookup/lookupTerms for API documentation.
It is important to get early feedback if an authority becomes unavailable. The validation that confirms authority availability is slow because it runs a query for every subauthority of every authority. As a result, the monitor can't simply initiate the check, wait, and confirm pass/fail: the check would likely not finish before the monitor decided on pass/fail, causing many false failures.
Work to complete...
Working with sysops on the last 2 items. https://culibrary.atlassian.net/browse/DLITSYS-2779
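One way to decouple the slow availability validation from the monitor's timeout is to run the full check on a schedule and have the monitor read only the stored result. A minimal sketch, not the QaServer implementation: `run_checks` executes hypothetical (name, callable) checks and records failures with a timestamp, and `monitor_status` answers immediately from the stored result, returning 500 if anything failed or if the result is too old. The 26-hour staleness threshold is an assumption.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    finished_at: float
    failures: list = field(default_factory=list)


def run_checks(checks):
    """Run every (name, callable) check; slow, so meant for a scheduled background job."""
    failures = []
    for name, check in checks:
        try:
            if not check():
                failures.append(name)
        except Exception:  # a check that raises counts as a failure
            failures.append(name)
    return CheckResult(finished_at=time.time(), failures=failures)


def monitor_status(latest, max_age_seconds=26 * 3600, now=None):
    """Fast path for the monitor: inspect the stored result only, never rerun the checks."""
    now = time.time() if now is None else now
    if latest is None or now - latest.finished_at > max_age_seconds:
        return 500, "stale: no recent check results"
    if latest.failures:
        return 500, "failing: " + ", ".join(latest.failures)
    return 200, "all checks passed"
```

The monitor's request then costs one read of the stored result, so it can never time out waiting on the per-subauthority queries.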
Create direct lookups for the following LC Demographics.
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=612799256
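A direct lookup has to translate the id.loc.gov response into the QA result shape shown in the GeoNames example (uri, id, label). This sketch assumes the `suggest2` service returns a JSON object with a `hits` list whose items carry `uri` and `suggestLabel` (falling back to `aLabel`); those field names are assumptions to verify against the techcenter documentation, and the sample URIs are made up.

```python
import json


def lc_hits_to_qa(payload: str, max_records: int = 10):
    """Translate an (assumed) id.loc.gov suggest2 JSON payload into QA-style results."""
    data = json.loads(payload)
    results = []
    for hit in data.get("hits", [])[:max_records]:
        uri = hit.get("uri")
        label = hit.get("suggestLabel") or hit.get("aLabel")
        if uri and label:  # skip hits missing either field
            results.append({"uri": uri, "id": uri, "label": label})
    return results
```

Keeping `id` equal to `uri` matches the existing cached results, so clients would not need to change.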
Dumb down this app so that LD4P/qa_server engine is driving all the functionality.
Suggested by Greg
One search query is needed for each of the following hierarchy levels...
Hierarchy Level | Query |
---|---|
"Activities" | |
"Activities__Disciplines" | |
"Activities__Events" | |
"Activities__Functions" | |
"Activities__Physical_and_Mental" | |
"Activities__Processes_and_Techniques" | |
"Activities__activities" | |
"Agents" | domestic |
"Agents__Living_Organisms" | domestic |
"Agents__Organizations" | |
"Agents__People" | domestic |
"Agents__agents" | |
"Associated_Concepts" | |
"Associated_Concepts__Associated_Concepts" | |
"Brand_Names" | |
"Brand_Names__Brand_Names" | |
"Materials" | |
"Materials__Materials" | |
"Objects" | |
"Objects__Built_Environment" | |
"Objects__Components" | |
"Objects__Furnishings_and_Equipment" | |
"Objects__Object_Genres" | |
"Objects__Object_Groupings and Systems" | |
"Objects__Visual_and_Verbal_Communication" | |
"Physical_Attributes" | |
"Physical_Attributes__Attributes_and_Properties" | |
"Physical_Attributes__Color" | |
"Physical_Attributes__Conditions_and_Effects" | |
"Physical_Attributes__Design_Elements" | |
"Styles_and_Periods" | |
"Styles_and_Periods__Styles_and_Periods" | |
@sfolsom Can you help fill these in?
Create direct lookups for the following LC MARC Countries.
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=1202667279
Note that the URI is encoded. When Chrome executes the URL, it decodes the %2E back into a ".", which Rails interprets as a format separator. Firefox and Safari on Mac did not have this problem.
Including the term identifier in the URL path follows the precedent set by QA before linked data processing was added, so I am hesitant to change it. But I could add a route, in addition to the current one, that allows for the format...
Chrome and Rails do not seem to have a problem when the URI is a parameter to the URL.
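The difference between the two URL styles can be seen by building both. Percent-encoding alone cannot protect the dot: Python's `quote` leaves "." unescaped because RFC 3986 treats it as an unreserved character, so the URI's dot survives into the path segment. Passed as a query parameter, the dot sits in the query string, where it has no routing meaning. The `/qa/fetch/linked_data/<authority>?uri=...` route shape is an assumption modeled on the parameter style described above; confirm it against the QA routes.

```python
from urllib.parse import quote, urlencode


def show_url(base, authority, term_uri):
    """Term URI embedded in the path (the style that breaks when the URI contains '.')."""
    return f"{base}/qa/show/linked_data/{authority}/{quote(term_uri, safe='')}"


def fetch_url(base, authority, term_uri):
    """Term URI passed as a query parameter (assumed route shape)."""
    return f"{base}/qa/fetch/linked_data/{authority}?{urlencode({'uri': term_uri})}"
```

For `http://example.org/a.b`, the path style ends in `http%3A%2F%2Fexample.org%2Fa.b`, whose dot a Rails dynamic segment would treat as the start of a format.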
Create direct lookups for the following LOC smaller cataloging vocabs.
See https://id.loc.gov/techcenter/searching.html for API documentation.
The Samvera QA code seems to offer search for some LC authorities: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
Create direct lookups for the following LOC smaller cataloging vocabs.
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=1379361528
@elrayle has already kicked this off with the systems group: https://culibrary.atlassian.net/browse/DLITSYS-2748
I assume we'll also need to update the documentation to change the URLs.
Catalogers have identified search queries and expected results for testing. Tests need to be created that run the test queries and analyze the results.
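A test built from the cataloger-identified queries could check that each expected URI appears within the top N results and report its position. A minimal sketch: `search` is a hypothetical callable standing in for the QA search request, and the queries and URIs used with it are made up.

```python
def evaluate_queries(search, expectations, top_n=10):
    """Return {query: (passed, position)}; position is 1-based, or None if not in the top N."""
    report = {}
    for query, expected_uri in expectations.items():
        uris = [r["uri"] for r in search(query)[:top_n]]
        position = uris.index(expected_uri) + 1 if expected_uri in uris else None
        report[query] = (position is not None, position)
    return report


def fake_search(query):
    """Hypothetical stand-in for a QA search call."""
    return [{"uri": "u1"}, {"uri": "u2"}, {"uri": "u3"}]
```

Recording the position, not just pass/fail, makes it possible to see whether ranking changes move expected terms up or down over time.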
Create direct lookups for the following LC Classification Schemes.
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
To implement this lookup, we'd have to use the free SRU API described here: https://isni.oclc.org:2443/isni/docs/isni-sru-search-api-guidelines.pdf (searching the name keyword), and then map the XML to the QA response:
<mainName> values = "label": "Value"
<forename> and <surname> values concatenated = "label": "Concatenated string"
results with <personalName> = "type": "Person"
<organisationType> values = "type": "Value"
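The mapping rules above can be applied per result record with the standard library XML parser. This sketch makes assumptions about the record layout: it searches descendants by local tag name (ignoring any namespace) rather than following the exact ISNI response paths, and the sample XML is simplified, made-up data rather than a real SRU response.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip any '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def _first(elem, name):
    """Return the first element (elem itself or a descendant) with local tag `name`."""
    for child in elem.iter():
        if _local(child.tag) == name:
            return child
    return None


def map_isni_record(record) -> dict:
    """Map one result element to a QA-style dict using the label/type rules above."""
    result = {}
    main_name = _first(record, "mainName")
    personal = _first(record, "personalName")
    if main_name is not None and main_name.text:
        result["label"] = main_name.text.strip()
    elif personal is not None:
        parts = [el.text.strip() for el in (_first(personal, "forename"), _first(personal, "surname"))
                 if el is not None and el.text]
        if parts:
            result["label"] = " ".join(parts)  # forename + surname concatenated
    if personal is not None:
        result["type"] = "Person"
    else:
        org_type = _first(record, "organisationType")
        if org_type is not None and org_type.text:
            result["type"] = org_type.text.strip()
    return result


SAMPLE = """<records>
  <record><personalName><forename>Ada</forename><surname>Lovelace</surname></personalName></record>
  <record><organisationName><mainName>Cornell University</mainName></organisationName>
          <organisationType>Academic</organisationType></record>
</records>"""
```

Parsing `SAMPLE` with `ET.fromstring` and mapping each child gives a Person labeled "Ada Lovelace" and an organization typed "Academic".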
Describe repos that are artifacts from the grant and point to repos being used for production going forward.
Disable those that we aren't replacing (e.g. SVDE) and those we successfully replace with direct lookups.
QA does not include performance data for jsonld or n3 format even if it is requested. This causes an exception when QaServer attempts to access that data.
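The fix is to treat performance data as optional rather than assuming it is present. A Python sketch of the guard (the actual change would be in QaServer's Ruby code), assuming, as this issue implies, that JSON responses carry a `performance` key alongside `results` and that jsonld and n3 responses have no such key at all.

```python
def extract_performance(response, requested_format):
    """Return performance data if QA supplied it, else None instead of raising."""
    if requested_format in ("jsonld", "n3"):
        return None  # QA does not include performance data for these formats
    if isinstance(response, dict):
        return response.get("performance")  # absent key -> None, no exception
    return None
```

Callers then check for `None` and skip recording performance for that request.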
The Samvera QA code has a sample yml file for adding a controlled list locally to a QA instance: https://github.com/samvera/questioning_authority/blob/main/config/authorities/states.yml
RDA Registry doesn't seem to have a search API that we can use to create a direct lookup, so we need to add a file for each of the RDA Reference value vocabularies found here: https://www.rdaregistry.info/termList/. The ids in the yml will need to be the URI for the term, and the term in the yml should be the preferred term in English.
I'm not sure if our instance is set up for these types of lookups, so we'll have to do some testing.
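Generating one local-authority file per RDA vocabulary could be scripted from (URI, preferred English label) pairs, following the rule above that the id is the term URI and the term is the English label. This sketch assumes the `terms:` list with `id`, `term`, and `active` keys used by the Samvera sample file; check the exact key spelling (plain or `:`-prefixed) against states.yml before relying on it. Values are written with `json.dumps`, since JSON strings are valid YAML double-quoted scalars. The example URI is hypothetical.

```python
import json


def to_qa_local_yaml(terms):
    """Render (uri, english_label) pairs as a QA local authority YAML document."""
    lines = ["terms:"]
    for uri, label in terms:
        lines.append(f"  - id: {json.dumps(uri)}")
        lines.append(f"    term: {json.dumps(label, ensure_ascii=False)}")
        lines.append("    active: true")
    return "\n".join(lines) + "\n"
```

For instance, `to_qa_local_yaml([("http://rdaregistry.info/termList/ExampleList/1001", "example term")])` produces a single-entry file.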
Work with metadata team and Dave to identify fields for searching and weighting of search results.
See @sfolsom's document describing what context should be searched and what should be displayed to the user... Indexing of External Data for Look-ups
General notes about indexing:
As we explore the effects of indexing, the weights for the last group of other fields may be tweaked to find the best weighting.
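Tuning the weights is easier with a small harness that scores records per field. A minimal sketch of field-weighted matching, assuming records map field names to lists of strings; the field names and weights are hypothetical placeholders for whatever the metadata team settles on, and simple substring matching stands in for the real search engine's scoring.

```python
FIELD_WEIGHTS = {"label": 5.0, "alt_label": 3.0, "other": 1.0}  # hypothetical weights


def score(record, query, weights=FIELD_WEIGHTS):
    """Sum the weights of fields whose text contains the query (case-insensitive)."""
    q = query.lower()
    total = 0.0
    for field_name, weight in weights.items():
        if any(q in value.lower() for value in record.get(field_name, [])):
            total += weight
    return total


def rank(records, query, weights=FIELD_WEIGHTS):
    """Order records by descending score; ties keep their original order (stable sort)."""
    return sorted(records, key=lambda r: score(r, query, weights), reverse=True)
```

Rerunning `rank` with different `weights` on the same records shows directly how a weighting change reorders results.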
Create direct lookups for the following LC RBMS Controlled Vocabulary.
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=686515295
Create direct lookups for the following LC Subjects.
The Samvera QA Code seems to offer search for LC Subjects: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=221282005
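The search request pattern is the same across these LC direct-lookup issues; only the scheme path changes. This sketch builds `suggest2` URLs from a table of scheme paths. The paths are ones believed to exist on id.loc.gov, and the `searchtype` and `count` parameter names are assumptions; verify both against the techcenter documentation. Schemes whose paths are less certain are left out.

```python
from urllib.parse import urlencode

BASE = "https://id.loc.gov"

# Scheme paths believed to exist on id.loc.gov; verify each before use.
SCHEME_PATHS = {
    "subjects": "/authorities/subjects",
    "names": "/authorities/names",
    "genre_forms": "/authorities/genreForms",
    "demographics": "/authorities/demographicTerms",
    "performance_mediums": "/authorities/performanceMediums",
    "tgm": "/vocabulary/graphicMaterials",
    "countries": "/vocabulary/countries",
}


def suggest2_url(scheme, query, search_type="keyword", count=10):
    """Build a suggest2 search URL (parameter names are assumptions to verify)."""
    params = urlencode({"q": query, "searchtype": search_type, "count": count})
    return f"{BASE}{SCHEME_PATHS[scheme]}/suggest2?{params}"
```

Adding another LC vocabulary would then mean adding one entry to the table rather than a new config pattern.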
I discussed with Dean and we think it should stay at lookup.ld4l.org and be branded "LD4L Authority Lookup Service"
Underneath the title we should explain:
This service provides authority lookup for single terms and search for terms matching a query. It has been created as part of the LD4L Labs project, and is operated by Cornell University Library in collaboration with the School of Library and Information Science, University of Iowa. This work was funded through a grant from the Andrew W. Mellon Foundation.
I'm guessing at how Dave would want the Iowa contribution acknowledged so you might ping him to see whether that is OK.
@eichmann, Can you confirm that NALT and AgroVoc on your server do not support any entities? I believe they are each single vocabularies without division points.
There is an additional request to facet AAT on its top-level hierarchy concepts. Dave has implemented the subset at URL: http://deep-thought.slis.uiowa.edu:8081/ld4l_services/getty_aat.jsp
The QA config needs to be updated to support the faceted hierarchy concepts.
This will be addressed in QA either as a sub-authority or as a simple additional param for aat. More exploration is needed.
Issue #7 Add Getty vocabularies (PR #22)
Issue #24 Add Getty configs to linked_data_authorities project
Create direct lookups for the following LC Thesaurus for Graphic Materials (TGM).
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=1559658996
Create direct lookups for the following Cultural Heritage Organizations.
The Samvera QA Code seems to offer search for some LC names: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=71593845
Currently, the Monitor Status page shows a summary count of the tests that failed and passed at midnight last night. It also lists the tests that are failing.
This proposes adding a section that shows historical failures and statistics for failures to answer questions like...
The Dashboard currently only shows a count of the number of tests failing. This proposes extending the Dashboard controller to store the names of the tests that are failing and to keep this information over time to enable long term trend analysis.
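Once the names of failing tests are stored per run, the historical statistics reduce to counting. A minimal sketch, assuming history is kept as a mapping from ISO date strings to the list of failing test names for that day; the dates and test names are hypothetical.

```python
from collections import Counter


def failure_stats(history):
    """Summarize {iso_date: [failing test names]} into per-test failure counts and last failure."""
    counts = Counter()
    last_failed = {}
    for day in sorted(history):  # ISO dates sort chronologically as strings
        for test in history[day]:
            counts[test] += 1
            last_failed[test] = day
    return {test: {"failures": n, "last_failed": last_failed[test]}
            for test, n in counts.most_common()}


history = {
    "2021-06-01": ["geonames_ld4l_cache", "locnames_ld4l_cache"],
    "2021-06-02": ["geonames_ld4l_cache"],
}
```

Ordering by `most_common()` puts the most frequently failing authorities first, which is usually what a trend view wants.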
Remove cached lookups we're not replacing and those we successfully create direct lookups for.
It is normal for the Monitor Status page to return a 500; this allows other processes to be triggered to show that at least one authority is down.
Monitor Status page shows full HTML of the page when response code = 500.
Monitor Status page shows Something Went Wrong modal when response code = 500.
#33 Show history of failures on Monitor Status screen
#18 Setup monitoring of status of the authority configs
Lookup is currently not returning results. Determine if this lookup is still needed before trying to fix it. If it isn't needed, disable the lookup in both QA and Sinopia.
We're currently using Ruby 2.7.3 and Rails 5.2, both of which have reached end of life:
https://endoflife.date/rails
https://endoflife.date/ruby
As part of this, we may need to update the qa gem from 5.8.0 to 5.9.0 for Ruby 3.1 support: https://github.com/samvera/questioning_authority/releases/tag/v5.9.0. Worryingly, the release notes for 5.9.0 say it supports Rails 6.1, so we may need to do some testing to see whether Rails 7 can also be supported.
What do we need to do to update ruby? Is this set in https://github.com/LD4P/qa_server_container?
Create direct lookups for the following LC Names.
The Samvera QA Code seems to offer search for LC names: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=745128104
Authorities/sub-authorities:
Currently, dbpedia_ld4l_cache is configured to have the same subauthorities (aka entities) as the LOC configs. I'm not sure this is correct.
@eichmann, can you confirm the entities supported by http://services.ld4l.org/ld4l_services/dbpedia_name_batch.jsp?entity=XXX&query=YYY
And do you have sample queries I can use to test each entity?
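Once the supported entities and sample queries are known, one request URL per entity can be generated for checking. A minimal sketch using the endpoint and `entity`/`query` parameters shown above; the entity name and sample query are hypothetical placeholders until the list is confirmed.

```python
from urllib.parse import urlencode

SERVICE = "http://services.ld4l.org/ld4l_services/dbpedia_name_batch.jsp"


def entity_test_urls(samples):
    """Build one request URL per {entity: sample_query} pair for manual or scripted checks."""
    return {entity: f"{SERVICE}?{urlencode({'entity': entity, 'query': query})}"
            for entity, query in samples.items()}
```

The resulting URLs could also feed the existing authority connection tests.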
DBpedia's SPARQL endpoint: https://dbpedia.org/sparql.
This might be too complicated: https://github.com/dbpedia/ontology-driven-api.
We likely want to search on a combination of rdfs:label and foaf:name. The rest of the modeling is too uneven across the different entity types to support anything general. I considered dbo:abstract as a possible display value, but the abstracts are too long to present to a cataloger in a lookup.
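A keyword search over rdfs:label and foaf:name can be written as a SPARQL UNION with a case-insensitive CONTAINS filter. This is a sketch in standard SPARQL 1.1: the English-or-untagged language filter and the `format` parameter for JSON results are assumptions, and on DBpedia's Virtuoso backend a full-text `bif:contains` filter would likely be much faster than CONTAINS. The code only builds the request URL; it does not call the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"


def _sparql_string(text: str) -> str:
    """Quote user input as a SPARQL string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def keyword_query(keyword: str, limit: int = 10) -> str:
    """Match the keyword against rdfs:label or foaf:name (English or untagged literals)."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?s ?name WHERE {{
  {{ ?s rdfs:label ?name }} UNION {{ ?s foaf:name ?name }}
  FILTER ((LANG(?name) = "" || LANGMATCHES(LANG(?name), "en"))
          && CONTAINS(LCASE(STR(?name)), LCASE({_sparql_string(keyword)})))
}}
LIMIT {int(limit)}"""


def request_url(keyword: str, limit: int = 10) -> str:
    """GET URL using the SPARQL protocol's `query` parameter, asking for JSON results."""
    return ENDPOINT + "?" + urlencode({
        "query": keyword_query(keyword, limit),
        "format": "application/sparql-results+json",
    })
```

Escaping the keyword before embedding it keeps a stray quote in user input from breaking the query.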
To get RDF back from Getty's APIs, it seems we need to use their SPARQL API, but SPARQL may not be well suited to generic keyword searching.
The Samvera code base has Getty lookups; maybe this is something we can turn on:
Add yaml term connection test for new direct lookups
NOTE: We cannot use Google Analytics because the majority of the usage will be through curl, which does not run the page JavaScript that Google Analytics relies on.
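Since usage has to be measured on the server side, one option is counting requests per authority from access logs. A minimal sketch, assuming request paths have already been extracted from the Rails or web server logs (log parsing is not shown); the path shapes come from the example URLs elsewhere in these issues, and including `fetch` is an assumption.

```python
import re
from collections import Counter

# Matches paths like /qa/search/linked_data/<authority>[/<subauthority>][?query]
QA_PATH = re.compile(r"^/qa/(search|show|fetch)/linked_data/([^/?]+)")


def usage_by_authority(request_paths):
    """Count requests per (action, authority) from server-side request paths."""
    counts = Counter()
    for path in request_paths:
        match = QA_PATH.match(path)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


paths = [
    "/qa/search/linked_data/geonames_ld4l_cache?q=Ithaca&maxRecords=4",
    "/qa/search/linked_data/locnames_ld4l_cache/person?q=frankie%20valli&maxRecords=10",
    "/qa/search/linked_data/geonames_ld4l_cache?q=Paris",
    "/assets/application.css",
]
```

Non-QA paths such as assets are simply ignored, so the counts reflect lookup traffic regardless of which client made the request.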
There is a "Search" link in the top menu of homosaurus.org. A search returns results with links to other formats at the bottom of the page. Taking the URL for one of those formats and replacing its "q=" parameter gives machine-readable results. For example, the Turtle format for "ze" would be: https://homosaurus.org/search/v3.ttl?q=ze
Because this returns RDF, we may be able to use the linked data module, https://github.com/cul-it/qa_server/blob/dev/config/authorities/linked_data/homosaurus_ld4l_cache.json (the config would need to point to Homosaurus instead of http://ld4l.org/ld4l_services/cache).
NB: when RDF results lack a ranking predicate, QA sorts results alphabetically.
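The ordering rule in the note above can be approximated to preview how Homosaurus results would be presented. This is not QA's code, only a sketch of the described behavior under assumptions: results are dicts with a `label` and an optional numeric `rank`, and a higher rank is treated as more relevant.

```python
def order_results(results):
    """Sort by rank when every result has one (higher first, assumed); else alphabetically by label."""
    if results and all(r.get("rank") is not None for r in results):
        return sorted(results, key=lambda r: r["rank"], reverse=True)
    return sorted(results, key=lambda r: r["label"].lower())
```

Because Homosaurus Turtle results would carry no ranking predicate, they would fall into the alphabetical branch, so the best keyword match may not come first.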
Create direct lookups for the following LC Medium of Performance Thesaurus for Music.
The Samvera QA Code seems to offer search for some LC vocabs: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=239265704
Create direct lookups for the following LC Genre.
The Samvera QA Code seems to offer search for LC Genres: https://github.com/samvera/questioning_authority/wiki/Connecting-to-Library-of-Congress-%28LOC%29.
Related links:
See https://id.loc.gov/techcenter/searching.html for API documentation.
Example of a direct search config to base new config off:
Data to bring in and translate, if available in the API: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=696491968
Request from @sfolsom to add the Getty vocabularies. An analysis of the vocabularies is at...
https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit?usp=sharing
This query has a blank node in the results. It seems like we should always suppress blank nodes. I'm not completely sure whether this is happening in the processing in qa_server or in services.ld4l.org. The blank node is definitely coming in the services.ld4l.org results, but the qa_server post-processing of the graph could also be letting it slip through.
You can see this at...
http://elr37-dev.library.cornell.edu/qa/search/linked_data/locnames_ld4l_cache/person?q=frankie%20valli&maxRecords=10
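Whichever side introduces the blank node, qa_server could defensively drop blank-node results during post-processing. A minimal sketch, assuming blank nodes surface in the result's `uri` or `id` with the `_:` prefix used by N-Triples and Turtle; the sample results are made up.

```python
def drop_blank_nodes(results):
    """Remove results whose uri or id is a blank node identifier such as '_:b0'."""
    return [r for r in results
            if not str(r.get("uri", "")).startswith("_:")
            and not str(r.get("id", "")).startswith("_:")]


results = [
    {"uri": "http://id.loc.gov/example/1", "id": "http://id.loc.gov/example/1", "label": "Example"},
    {"uri": "_:b0", "id": "_:b0", "label": ""},
]
```

Filtering on the QA side keeps the output clean even if the upstream service is not changed.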
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Something interesting about the web. A new door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Something interesting about visualization, using data art.
Something interesting about games, making everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.