Comments (13)

ludmilamarian commented on September 25, 2024

Total number of records (23-10-2018 16:43): 161'952

from cds-migrator-kit.

ludmilamarian commented on September 25, 2024

I am trying to identify the correct query that will give us all the records that we need to move during the migration.

These are the queries that the above collections have:

Customary law (('internalnote:"legal service library" 080__a:"346.5*"',),)
Highway Code (('internalnote:"legal service library" 080__a:"351.81"',),)
Environmental Law (('internalnote:"legal service library" 080__a:"349.6"',),)
Law of research (('internalnote:"legal service library" 080__a:"349.77"',),)
Criminal Law (('internalnote:"legal service library" 080__a:"343*"',),)
Nuclear Law (('internalnote:"legal service library" 080__a:"349.7"',),)
Fiscal Law (('internalnote:"legal service library" 080__a:"351.7*"',),)
Social Security & Public Health (('internalnote:"legal service library" 080__a:"349.3"',),)
Building Law (('internalnote:"legal service library" 080__a:"349.44"',),)
Legal Research (('internalnote:"legal service library" 080__a:"340*"',),)
Labour Law (('internalnote:"legal service library" 080__a:"349.2*"',),)
Public & Administrative Law (('internalnote:"legal service library" 080__a:"342*"',),)
Civil Law (('internalnote:"legal service library" 080__a:"347*"',),)
International Law (('internalnote:"legal service library" 080__a:"341*"',),)
Legal Service Library (('internalnote:"legal service library"',),)

CERN Bookshop (('indicator:BOOKSHOP',),)
CERN Computing Bookshop (('indicator:BOOKSHOP',),)
Book proposals (('697C:BOOKSUGGESTION',),)

Books held by LHC (('collection:BOOK indicator:virLHCiascpp',),)
Books held by PS-PO (('collection:BOOK indicator:"virPSpo*"',),)
Books held by EST (('collection:BOOK indicator:"virEST*"',),)
Books held by TIS (('collection:BOOK indicator:"virTIS*"',),)

Pauli's scientific book collection (('collection:PAULISCIENTIFICBOOK',),)
Periodicals (('collection:PERI',),)
English Book Club (('indicator:"English Book Club"',),)
UDC (('collection:UDC',),)
Standards (('collection:STANDARD or indicator:STANDARD',),)
Proceedings (('collection:PROCEEDINGS',),)
eBooks (('media:ebook',),)
Books (('collection:BOOK indicator:BOOK -690C:BOOKSUGGESTION',),)
  • It looks like we are missing the book suggestions collection.
  • I found 2 records, https://cds.cern.ch/record/1478620 and https://cds.cern.ch/record/2145348, that are not part of any of the above collections but have the Book indicator. What shall we do with them?
  • There are several records, like https://cds.cern.ch/record/2644585, which are not part of the Books collection because they do not have the Book indicator. Should they be migrated?
  • The Legal Service Library currently has ~1000 records, but not all of them have the Book indicator (for example: https://cds.cern.ch/record/1165746). Is it OK for them not to have it? (They are articles, so it looks OK, but I just wanted to check.)

ludmilamarian commented on September 25, 2024

As discussed, the UDC collection should not be migrated. @agentilb, shall we mark it as hidden?
This brings the total number of records to 156'285.

ludmilamarian commented on September 25, 2024

The best query I could find is:

search_pattern(p='indicator:"english book club" OR indicator:BOOK OR indicator:STANDARD OR collection:PROCEEDINGS OR collection:PERI OR collection:PAULISCIENTIFICBOOK OR internalnote:"legal service library" OR media:ebook -980:deleted')

This query, though, is not a perfect match with the union of all collections, I think because of some of the reports: it returns ~500 more results than it should (some example record IDs: 1285807, 1286045, 1286047, 1478620, 2036686, 2270978).

Also, I think we should find a way not to rely on media:ebook (every record should already have one of the other indicators or collection tags). I think this is also because of some reports. There are currently 96 records that are tagged only with media:ebook and with none of the other indicators or collection tags (some example record IDs: 1988457, 1988461, 2066376, 2255181, 2255182, 2255183, 2255184, 2255185, 2255211).
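A minimal sketch of the two checks described above (combined query vs. the union of all collections, and records matched only by media:ebook), assuming each collection's or criterion's result has already been turned into a Python set of record IDs, e.g. from the result set returned by `search_pattern`. The function names and the toy IDs in the usage are hypothetical, not real CDS data.

```python
def reconcile(query_ids, collections):
    """Compare one combined query with the per-collection results.

    query_ids: set of record IDs returned by the combined query.
    collections: dict mapping collection name -> set of record IDs.
    Returns (extra, missing): IDs only in the query, IDs only in the collections.
    """
    union = set().union(*collections.values())
    return query_ids - union, union - query_ids


def matched_only_by(criterion, criteria):
    """IDs matched by `criterion` but by none of the other criteria."""
    others = set().union(
        *(ids for name, ids in criteria.items() if name != criterion)
    )
    return criteria[criterion] - others


if __name__ == "__main__":
    # Toy data, not real CDS record IDs.
    criteria = {
        "indicator:BOOK": {1, 2, 3},
        "collection:PROCEEDINGS": {4},
        "media:ebook": {3, 5},
    }
    print(sorted(matched_only_by("media:ebook", criteria)))  # [5]
    extra, missing = reconcile(
        {1, 2, 3, 4, 5, 6}, {"Books": {1, 2, 3}, "Proceedings": {4, 7}}
    )
    print(sorted(extra), sorted(missing))  # [5, 6] [7]
```

The "extra" set corresponds to the ~500 surplus records, and `matched_only_by("media:ebook", ...)` to the 96 ebook-only records, once the real ID sets are plugged in.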

from cds-migrator-kit.

ludmilamarian avatar ludmilamarian commented on September 25, 2024

And another question:

agentilb commented on September 25, 2024

Just one comment about the collections: some are obsolete and no longer necessary:
Books held by LHC
Books held by PS-PO
Books held by EST
Books held by TIS
CERN Computing Bookshop (which is now CERN Bookshop)
I'm waiting for an answer about all the collections within the LSL, but I believe they will have to be removed as well.

For the periodicals, I need to update the data model a bit: the data model as I drew it was not meant for periodicals and therefore doesn't cover all the fields we use for them.

Yes, UDC can be marked as hidden.

The Pauli Scientific Books could be migrated with the other books, although they are a specific collection (they should not be searchable within the normal book collection). They would have document type: book and collection: Pauli's scientific book collection.

Indeed, we should add the Book Suggestions. The search is 697C:BOOKSUGGESTION.

Regarding the search you want to identify, do we absolutely need a single search? Couldn't we go collection by collection to make sure we do not miss anything?

Regarding the REPORTS, it is an absolute mess there, but thanks to the list of items, I can identify the ones that should be books or proceedings. Most of them should actually be preprints and won't be part of this migration step. I'll do some cleaning next week and let you know about the records you mentioned.

I'm not sure why the Standard collection is marked as Not for circulation; there are many items there.
I think the last time we met, we agreed that all items should have a '4 weeks loan' status.

ludmilamarian commented on September 25, 2024

For the Books held by XX collections: are they already part of another collection, or will we just mark them as obsolete (like UDC) and never migrate them?

For the Pauli Scientific Books: do they have items attached? If not, would it be better to make them part of the archives?

For the standards: I do not remember exactly, but maybe we discussed that everything is for reference and nothing is for loan? Or will there be items for loan as well?
Yes, all items will have a 4 weeks loan period, unless they are for reference.
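The loan rule stated here (4 weeks unless the item is for reference) fits in a single function. This is only a sketch: the status label `"for reference"` and the function name are hypothetical, not the actual circulation vocabulary.

```python
from typing import Optional

DEFAULT_LOAN_WEEKS = 4
REFERENCE_STATUS = "for reference"  # hypothetical status label


def loan_period_weeks(item_status: str) -> Optional[int]:
    """Loan period in weeks, or None for reference-only items."""
    if item_status.strip().lower() == REFERENCE_STATUS:
        return None  # reference items are not lent out
    return DEFAULT_LOAN_WEEKS


print(loan_period_weeks("For reference"))  # None
print(loan_period_weeks("on shelf"))  # 4
```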

For the data model: I think it will be cleaner to have a separate data model for periodicals and another one for standards, since their display is also going to be different. Basically, they can share a lot with the books, and then each individual type (book, standard, periodical, other?) can have its own "private" fields.
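One way to picture "shared fields plus private fields per type" is class inheritance. A minimal sketch with Python dataclasses; apart from the standard number (mentioned later in this thread), the field names are hypothetical placeholders, not the actual data model.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Fields shared by every document type (illustrative subset)."""
    recid: int
    title: str
    authors: List[str] = field(default_factory=list)
    document_type: str = "BOOK"


@dataclass
class Book(Document):
    isbn: Optional[str] = None  # hypothetical book-only field


@dataclass
class Standard(Document):
    standard_number: Optional[str] = None  # the one field that differs from books
    document_type: str = "STANDARD"


@dataclass
class Periodical(Document):
    issn: Optional[str] = None  # hypothetical periodical-only field
    document_type: str = "PERIODICAL"


std = Standard(recid=1, title="Example standard", standard_number="XYZ-123")
print(std.document_type)  # STANDARD
print(asdict(std))
```

Each subclass only declares its "private" fields; everything else comes from `Document`, so a shared display or export step can work on the base fields alone.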

For the search query: we will need to identify several queries (one per schema type: books, standards, periodicals), and by combining them we can have one query for all records that need to be migrated. The goal is a query that joins together all collections. While trying to build it, I noticed that the data is sometimes not perfectly aligned with the collections: I have already pointed out some records that have some of the indicators or collection tags but not others, which means they are not part of any collection. The final goal is to make sure that we migrate every record in the mentioned collections, and also that we do not leave behind records that are not in any of those collections but should have been migrated.
In any case, at this point, having one query (even if not completely accurate) is useful for running data cleaning on all the records (for example, determining 035__9 tags) and for spotting outliers.
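A sketch of combining one query per schema type into a single query string. The clauses are copied from the queries in this thread, but grouping them by type is an assumption. It is also worth checking how the search engine groups the trailing exclusion with the `OR`s, since the intent is (any clause) AND NOT deleted.

```python
# Clauses taken from the queries discussed in this ticket; the grouping is illustrative.
TYPE_QUERIES = {
    "books": [
        'indicator:BOOK',
        'indicator:"english book club"',
        'internalnote:"legal service library"',
    ],
    "standards": ['indicator:STANDARD'],
    "periodicals": ['collection:PERI'],
    "proceedings": ['collection:PROCEEDINGS'],
}


def combined_query(type_queries, exclude="-980:deleted"):
    """OR together every clause (dropping duplicates, keeping order), then append the exclusion."""
    clauses = list(dict.fromkeys(
        clause for group in type_queries.values() for clause in group
    ))
    return " OR ".join(clauses) + " " + exclude


print(combined_query(TYPE_QUERIES))
```

Keeping the per-type lists separate means each type's query can still be run on its own (collection by collection), while the combined string is used for global cleaning.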

agentilb commented on September 25, 2024

For the Books held by XX: I have cleaned them, so those collections should hopefully be empty (those records should be in Books or Proceedings anyway).

For the Pauli Scientific Books: indeed, they do not have items attached. We can put them in the Archive collection, but the data model will be the same as for the books.

For the standards, they all should be 4 weeks loan.

OK to have a separate data model for the periodicals; it makes sense indeed. I already have one prepared. Do you need it now?
For the standards, the only field that differs from the books is the standard number (already included in the data model). But it's OK to have a different model if it is easier for you.

Defining the records to be migrated is indeed not easy, and like you, I am discovering many tricky cases. I'm now working on the cases you mentioned.

agentilb commented on September 25, 2024

I got the confirmation that sub collections of LSL are not needed anymore (all books are included in Legal Service Library).

Customary law | 2
Highway Code | 6
Environmental Law | 7
Law of research | 12
Criminal Law | 21
Nuclear Law | 22
Fiscal Law | 24
Social Security & Public Health | 24
Building Law | 36
Legal Research
Labour Law
Public & Administrative Law | 122
Civil Law
International Law

ludmilamarian commented on September 25, 2024

Updated query:

search_pattern(p='indicator:"english book club" OR indicator:BOOK OR indicator:STANDARD OR collection:PROCEEDINGS OR collection:PERI OR internalnote:"legal service library" OR media:ebook -980:deleted')

@agentilb I understand that you are checking which reports need to be added, which might have an impact on the query, so I will leave this ticket open until we figure out the final configuration.

agentilb commented on September 25, 2024

After checking the list of items included in the report collection, some 900 records have been moved to the Book collection (i.e. 690C_ $$aBOOK was added), so they will be migrated together with the others.
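In the `690C_ $$aBOOK` notation, 690 is the tag, `C` the first indicator, `_` a blank second indicator, and `$$a` the subfield code. A sketch of the corresponding MARCXML built with the standard library; the helper names are hypothetical and the record ID in the usage is a placeholder.

```python
import xml.etree.ElementTree as ET


def datafield(tag, ind1, ind2, code, value):
    """Build <datafield tag ind1 ind2><subfield code>value</subfield></datafield>."""
    # Attributes are passed as a dict: Element's first parameter is itself named "tag".
    field = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


def book_tag_record(recid):
    """A minimal MARCXML record carrying 690C_ $$aBOOK for record `recid`."""
    record = ET.Element("record")
    ET.SubElement(record, "controlfield", {"tag": "001"}).text = str(recid)
    record.append(datafield("690", "C", " ", "a", "BOOK"))  # "_" -> blank ind2
    return ET.tostring(record, encoding="unicode")


print(book_tag_record(123456))  # 123456 is a placeholder record ID
```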

Many items remain in the report collection, but they can be migrated in the second step.

The query should include the collection: YELLOW REPORT.
After some cleaning done on the collection, the criteria: media:ebook is not necessary in the query.

ludmilamarian commented on September 25, 2024

Updated query:

search_pattern(p='indicator:"english book club" OR indicator:BOOK OR indicator:STANDARD OR indicator:"YELLOW REPORT" OR collection:PROCEEDINGS OR collection:PERI OR internalnote:"legal service library" -980:deleted')

ludmilamarian commented on September 25, 2024

Final query:

search_pattern(p='690C_:BOOK OR 690C_:STANDARD OR 690C_:"YELLOW REPORT" OR 690C_:BOOKSUGGESTION OR 980__:PROCEEDINGS OR 980__:PERI OR 697C_:LEGSERLIB OR 697C_:"ENGLISH BOOK CLUB" -980__:DELETED')
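The final query can also be mirrored as a local check, which may help when spotting outliers in exported records. This is a sketch only: it assumes a record is a dict mapping a MARC tag (e.g. `690C_`) to its `$$a` values, that values must match exactly (ignoring case), and that the query means "any of the included values AND NOT deleted". Invenio's own phrase and word matching may behave differently.

```python
# (MARC tag, value) pairs taken from the final query.
INCLUDE = [
    ("690C_", "BOOK"), ("690C_", "STANDARD"), ("690C_", "YELLOW REPORT"),
    ("690C_", "BOOKSUGGESTION"), ("980__", "PROCEEDINGS"), ("980__", "PERI"),
    ("697C_", "LEGSERLIB"), ("697C_", "ENGLISH BOOK CLUB"),
]
EXCLUDE = [("980__", "DELETED")]


def matches(record, criteria):
    """True if any (tag, value) pair appears in the record, ignoring case."""
    return any(
        value.upper() in (v.upper() for v in record.get(tag, []))
        for tag, value in criteria
    )


def should_migrate(record):
    """Included by at least one criterion and not marked as deleted."""
    return matches(record, INCLUDE) and not matches(record, EXCLUDE)


print(should_migrate({"690C_": ["BOOK"]}))  # True
print(should_migrate({"690C_": ["BOOK"], "980__": ["DELETED"]}))  # False
```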
