cerndocumentserver / cds-migrator-kit
License: MIT License
Borrowing requests were formerly known as ILLs of type "book".
Migrate the access rights for books. In principle, all book records on CDS are public, but maybe there are some corner cases.
Based on the requirements, the following collections need to be migrated. Compute a query (or generate only one collection) in order to identify all records that need to be moved as part of the Books satellite.
Collection | # of records |
---|---|
Book Proposals | 633 |
CERN Bookshop | 1049 |
Legal Service Library | 1160 |
CERN Yellow Reports | 1183 |
Periodicals-> different display, no individual items per volume | 2290 |
English Book Club | 4705 |
Standards | 12959 |
Proceedings | 22973 |
eBooks | 81184 |
Books | 105341 |
962__l:PHOPHO-> it seems to correspond to records (mostly Bulletin articles) linked with photos.
In this case, the corresponding record has a 035__ with $$9PHOPHO
Ex: https://cds.cern.ch/record/46124 linked to https://cds.cern.ch/record/43022?ln=en
962__l:MMD it seems to correspond to records (mostly Bulletin articles) linked with photos.
In this case, the corresponding record has 970__a:'MMD' (and/or 035__9:'MMD')
Ex: https://cds.cern.ch/record/749053 linked to https://cds.cern.ch/record/615876
962__l:ADMBUL it seems to correspond to photo records linked with Bulletin issues (all are from the years 2000-2001).
In this case, the corresponding record has 035:'ADMBUL'
Ex: https://cds.cern.ch/record/41801 linked to https://cds.cern.ch/record/44476?ln=en
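The three link types above can be summarised as a small lookup table. This is only a sketch: the name `LINK_MARKERS` and the helper `marker_fields` are hypothetical, and the field names are copied from the observations above (the subfield for `ADMBUL` is not specified there, so it stays as plain `035`).

```python
# Hypothetical lookup: 962__l value -> fields on the linked record that
# carry the same marker, as observed in the examples above.
LINK_MARKERS = {
    "PHOPHO": ["035__9"],
    "MMD": ["970__a", "035__9"],
    "ADMBUL": ["035"],  # subfield not specified in the notes
}


def marker_fields(link_type):
    """Return the fields to inspect on the linked record ([] if unknown)."""
    return LINK_MARKERS.get(link_type, [])
```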
Related to #15
as stated in
https://codimd.web.cern.ch/zsbfBMrYTE60GKz3ttWo5g
check for required fields etc
Libraries
Some libraries need to be cleaned. This is the list of libraries to keep:
'163322','CERN Central Library','CH-1211 Geneva 23'
'1046','CERN LSL Library','CH-1211 Geneva 23'
'38875','CERN Depot 1, bldg. 2 (DE1)','CH-1211 Geneva 23, basement, building 2'
'49101','CERN Depot 2, bldg. 2 (DE2)','Basement, building 2'
'8070','CERN Depot 3, bldg. 2 (DE3)','Basement, building 2'
'15732','CERN Depot 4, bldg. 500-S-023 (DE4)',''
'15156','CERN ARC Library','CH-1211 Geneva 23'
'167','CERN Didactic Library',''
'4886','English Book Club','English Book Club'
- CERN TE-VSC Library
For these libraries, delete them but the attached items should be moved to CERN Central Library
- CERN Central Library BIB
- Same for Press Office
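A minimal sketch of how the clean-up rule could be applied to an item's library reference. The helper `resolve_library` and the `merged_into_central` parameter are hypothetical. Because the legacy ids of the libraries to be merged are not given above, the caller has to supply them.

```python
# Ids of the libraries to keep, copied from the list above.
CENTRAL_LIBRARY_ID = "163322"
LIBRARIES_TO_KEEP = {
    "163322", "1046", "38875", "49101", "8070",
    "15732", "15156", "167", "4886",
}


def resolve_library(legacy_id, merged_into_central=frozenset()):
    """Return the library id an item should point to after the clean-up.

    - kept libraries are returned unchanged
    - libraries listed in merged_into_central map to CERN Central Library
    - anything else returns None, meaning "needs a manual decision"
    """
    if legacy_id in LIBRARIES_TO_KEEP:
        return legacy_id
    if legacy_id in merged_into_central:
        return CENTRAL_LIBRARY_ID
    return None
```

Returning `None` rather than silently dropping the item keeps unexpected libraries visible in a report.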
Items
In legacy, items with status:
- scanning: there will be a new list of items to bulk update items and set the status to scanning.
- cancelled, not arrived, untraceable: there should be 0 items with these statuses, so they should disappear
- not published, out of print: to be kept

Ensure that the item without barcode has been fixed:
(('', 810479L, 12L, 'Periodical', '', '2014 vol.39 no.10', 'Reference', 'on shelf', datetime.datetime(2014, 9, 28, 14, 35, 35), datetime.datetime(2015, 3, 5, 10, 42, 27), 0L, ''),)
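The two checks above (no items left in the statuses that should disappear, and no item without a barcode) could be verified with something like the following sketch. The function names are hypothetical, and it assumes that the barcode is the first column of a legacy row, as the example tuple suggests.

```python
from collections import Counter

# Statuses that should no longer be used by any item after the clean-up.
EXPECTED_EMPTY = {"cancelled", "not arrived", "untraceable"}


def leftover_statuses(statuses):
    """Count items still using a status that is expected to have 0 items."""
    counts = Counter(statuses)
    return {s: n for s, n in counts.items() if s in EXPECTED_EMPTY}


def rows_without_barcode(rows):
    """Return legacy rows whose first column (assumed barcode) is empty."""
    return [row for row in rows if not row[0]]
```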
Migrate collection related data
cds-dojson books branch not installed through travis requirements
migrator report dryrun <file.json> fails if a KeyError exception is raised.
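One possible way to make a dry run survive missing keys is to collect the errors per record instead of aborting. The sketch below is generic and is not the code of the `migrator` command; `dry_run` and `transform` are hypothetical names.

```python
def dry_run(records, transform):
    """Apply transform to every record, collecting KeyErrors instead of aborting.

    Returns (migrated, errors), where errors lists the index of the failing
    record and the key that was missing.
    """
    migrated, errors = [], []
    for index, record in enumerate(records):
        try:
            migrated.append(transform(record))
        except KeyError as exc:
            errors.append({"index": index, "missing_key": exc.args[0]})
    return migrated, errors
```

The collected errors can then be written to the report so that the data can be cleaned before the real migration.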
Find out about migration of that data.
Create OpenShift config to be able to re-deploy the project in another environment.
Orders were formerly known as ILLs with types ("acq-book", "acq-standard", "proposal-book", "article").
parent of #16
parent of #17 [CLOSED]
parent of #18
Items migration consists of 2 main objects: libraries and items (~300K items).
- id (barcode), primary key, unique, not null
- type, possible values:
  - external: 43 libraries (ex. RERO, Nebis, Springer, ...), 0 items referencing them
  - internal: 24 libraries (ex. CERN LHC Library, CERN EP Library, CERN ARC Library, CERN Courier, ...)
  - main: 2 libraries, CERN Central Library and Other, but 0 items referencing this last one
  - hidden: 1 library, CERN Central Storage, 0 items referencing it
- name, address, email, phone, notes

Comparing this legacy data model with the new data model, libraries will correspond to different rooms or buildings, belonging to the unique Location. It means that Items will have a reference to a room and not to a location.
The proposed schemas for libraries are the following:
location:
{
  "location": {
    "locid": "1", # id format to be defined
    "name": "CERN Central Library"
  }
}
internal locations:
{
  "internal_locations": [
    {
      "phone": "",
      "ilocid": 0, # id format to be defined
      "name": "Legal Service Library",
      "legacy_id": 3,
      "address": "",
      "notes": "",
      "locid": "1",
      "email": ""
    },
    {
      "phone": "",
      "ilocid": 1,
      "name": "CERN Central Library",
      "legacy_id": 6,
      "address": "CH-1211 Geneva 23",
      "notes": "",
      "locid": "1",
      "email": "[email protected]"
    },
    ...
  ]
}
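As an illustration of the internal location schema, a legacy library row could be converted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the legacy row is assumed to be a dict keyed by the legacy field names (`id`, `name`, `address`, `email`, `phone`, `notes`), and the id formats are still to be defined.

```python
def to_internal_location(legacy_library, ilocid, locid="1"):
    """Build an internal location dict from a legacy library row."""
    return {
        "ilocid": ilocid,  # id format to be defined
        "locid": locid,    # the unique Location all rooms belong to
        "legacy_id": legacy_library["id"],
        "name": legacy_library["name"],
        "address": legacy_library.get("address", ""),
        "email": legacy_library.get("email", ""),
        "phone": legacy_library.get("phone", ""),
        "notes": legacy_library.get("notes", ""),
    }
```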
The location -> internal location structure is needed because Invenio ILS, in case of multiple locations, will change the loan workflow to "transit" books between different locations.
- barcode, primary key, unique, not null
- id_bibrec, reference to the document
- id_crcLIBRARY, reference to the library
- collection, possible values Monograph, Reference, Archives, Library, Conference, LSL Depot, Oversize, Official, Pamphlet, CDROM, Standards, Video & Trainings, Periodical. To be deleted #3.
- location, a string for the UDC classification (ex. R 614(02) PAN blue, Blacksburg 1981, 530.145.29 MAN, 539.125 WOR, 621.3.02 NAI, ...)
- description, Volumes/Series identifier, see below
- loan_period, 4 weeks, 1 week, Reference
- status: see below
- number_of_requests: int, number of requests while item on loan
- expected_arrival_date: related to acquisition
- creation_date, modification_date

One item does not have the barcode, to be fixed.
Currently, volume or series definition is in the description field. A few numbers:
- with v. or vol.: ~39K
- without v. or vol.: ~5K (ex. "volume", "missing", "part. 1", "1978", "Part 2", "v 1")

The proposed schema for item is the following:
{
  "itemid": "1", # id format to be defined
  "docid": "1", # id format to be defined
  "ilocid": "1", # id format to be defined
  "legacy_bibrec_id": "111111",
  "legacy_library_id": "3",
  "barcode": "83-0384-4",
  "classification": "530.24 SEI",
  "description": "",
  "status": "LOANABLE",
  "circulation_restriction": "Reference",
  "medium": "",
  "created": "date",
  "updated": "date"
}
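To make the mapping between the legacy item fields and the proposed schema concrete, here is a rough sketch. Several points are assumptions rather than decisions: that legacy `location` becomes `classification`, that `loan_period` becomes `circulation_restriction`, and that `status` is copied as-is until a mapping to the new values is agreed. The ids (`itemid`, `docid`, `ilocid`) are left out because their format is still to be defined.

```python
def to_new_item(legacy):
    """Map a legacy item (dict keyed by legacy field names) to the new schema."""
    return {
        "legacy_bibrec_id": str(legacy["id_bibrec"]),
        "legacy_library_id": str(legacy["id_crcLIBRARY"]),
        "barcode": legacy["barcode"],
        "classification": legacy["location"],  # UDC string (assumed mapping)
        "description": legacy["description"],
        "status": legacy["status"],  # mapping to new values still to be defined
        "circulation_restriction": legacy["loan_period"],  # assumed mapping
        "medium": "",  # does not exist in legacy
        "created": legacy["creation_date"],
        "updated": legacy["modification_date"],
    }
```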
Library
- What is the type external of libraries used for? Looks like ILL.
- What is the type hidden of libraries used for?
- type = main, name = Other (0 items)?

Items
- What is collection for?
- Periodical, there are 47K items, but they are not displayed in the interface to the user. For example: https://cds.cern.ch/record/229779. Take into account that we wanted to delete collections, see #3.
- What is location for? UDC classification?
- loan_period: 4 weeks as default and then restrict for Reference for some items? Is 1 week really needed? (There are 19K items with 1 week.)
- status.
- Medium field wanted: how does this affect loans? Possible values: --, online, paper, CD-ROM, DVD, VHS. Currently it does not exist, are you going to update 300K items?
- number_of_requests: how do you use this today?
- expected_arrival_date: related to acquisition?

In our system, to model Series/Volumes/Items, we will have the following:
- Series: a Document referencing a list of Volumes
- Volume: a Document referencing a list of Items
- Edition: it is just the name of the Document (of a Series, of a Volume, of a simple Document)

                                    - Item 1 (item)
                 - Volume 1 (doc) <
                                    - Item 2 (item)
    Series (doc) <
                                    - Item 1 (item)
                 - Volume 2 (doc) <
                                    - Item 2 (item)
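Since the legacy volume information lives in the free-text item description, building this hierarchy will need some parsing. The sketch below only covers descriptions using `v.` or `vol.`. The regular expression and the name `parse_volume` are illustrative, and the other formats ("Part 2", "v 1", ...) would need their own rules or manual fixing.

```python
import re

# Matches "v." or "vol." (any case) followed by the volume identifier.
VOLUME_RE = re.compile(r"\bv(?:ol)?\.\s*(\S+)", re.IGNORECASE)


def parse_volume(description):
    """Return the volume identifier found in an item description, or None."""
    match = VOLUME_RE.search(description)
    return match.group(1) if match else None
```

For example, `parse_volume("2014 vol.39 no.10")` returns `"39"`, while `parse_volume("Part 2")` returns `None`.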
BibRecs have information about Volumes in fields:
- 246__v: volume
- 300__a: will contain the number of volumes

In legacy, Series are BibRecs and Volumes are Items where Item description states the volume name, for example:
- Series 3 volumes (246__v/300__a)
- description field that contains the volume name (several different formats, tricky...). The number of Items attached to that BibRec should correspond to the 300__a number...
- Series by fixing metadata and attaching all the Volumes BibRecs

More or less 25,000 records need to be corrected.
One way to see if the value is an Aleph number is that the number starts with 000s:
https://cds.cern.ch/search?ln=en&sc=1&p=962__b%3A%22000*%22+or+785%3A%22000*%22+or+770%3A%22000*%22+or+780%3A%22000*%22+or+787%3A%22000*%22+or+772%3A%22000*%22&action_search=Search&op1=a&m1=a&p1=&f1=&c=Articles+%26+Preprints&c=Books+%26+Proceedings&c=Presentations+%26+Talks&c=Periodicals+%26+Progress+Reports&c=Multimedia+%26+Outreach
(The search is maybe not 100% accurate).
The fields that need to be checked are:
Additionally:
The matching needs to be done against 970__a where there is ‘CER’ and one needs to replace this value with the corresponding CDS record number.
Here is an example:
https://cds.cern.ch/record/1163043?ln=en
As far as I know, there is also the field 035 that contains Aleph numbers when $$9CERCER:
Ex: https://cds.cern.ch/record/1220684?ln=en -> to be checked if it is still in use for something
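The replacement described above could be sketched as follows. Assumptions: the related fields hold the same digits as 970__a without the trailing 'CER', the "starts with 000" test is the heuristic mentioned above (so it is not 100% accurate), and all function names are hypothetical.

```python
import re

# Heuristic from the notes: Aleph numbers start with "000".
ALEPH_RE = re.compile(r"^000\d+$")


def looks_like_aleph(value):
    """Return True if the value looks like an Aleph number."""
    return bool(ALEPH_RE.match(value))


def build_index(records):
    """Map Aleph number (970__a without the 'CER' suffix) -> CDS record id.

    records is an iterable of (recid, value_of_970__a) pairs.
    """
    index = {}
    for recid, sysno in records:
        if sysno.endswith("CER"):
            index[sysno[:-3]] = recid
    return index


def resolve(value, index):
    """Replace an Aleph number by the CDS record id when a match is known."""
    if looks_like_aleph(value):
        return index.get(value, value)
    return value
```

Values that look like Aleph numbers but have no match are returned unchanged, so they can be listed for manual checking.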
( requires #21 )
Make sure that we do not migrate 506 tags for public records, as this would mean the records will be restricted in the new system. Probably the best way would be to clean the data before the migration (remove 506 for public records)
command line to load dry run of records from legacy
log report with data to be cleaned
There are 58 records (in addition to the 13 mentioned in #15 ) that have different Aleph numbers in 035 and 970. These records need to be fixed manually (they concern future migrations of CERN Research Output).
Record: http://cds.cern.ch/record/194091 035$$9CERCER$$a0104950 970$$a000105690CER
Record: http://cds.cern.ch/record/195686 035$$9CERCER$$a0099711 970$$a000107310CER
Record: http://cds.cern.ch/record/202170 035$$9CERCER$$a0117910 970$$a000113835CER
Record: http://cds.cern.ch/record/209171 035$$9CERCER$$a0127446 970$$a000121097CER
Record: http://cds.cern.ch/record/209860 035$$9CERCER$$a0162005 970$$a000121816CER
Record: http://cds.cern.ch/record/213210 035$$9CERCER$$a2209846 970$$a000125367CER
Record: http://cds.cern.ch/record/243637 035$$9CERCER$$a0269426 970$$a000159398CER
Record: http://cds.cern.ch/record/269418 035$$9CERCER$$a2226562 970$$a000188427CER
Record: http://cds.cern.ch/record/271584 035$$9CERCER$$a2226566 970$$a000190935CER
Record: http://cds.cern.ch/record/284699 035$$9CERCER$$a2226573 970$$a000205010CER
Record: http://cds.cern.ch/record/288412 035$$9CERCER$$a0219865 970$$a000209164CER
Record: http://cds.cern.ch/record/309561 035$$9CERCER$$a0222517 970$$a000232098CER
Record: http://cds.cern.ch/record/326783 035$$9CERCER$$a0251850 970$$a000250165CER
Record: http://cds.cern.ch/record/388496 035$$9CERCER$$a2187195 970$$a000314599CER
Record: http://cds.cern.ch/record/392830 035$$9CERCER$$a0319049 NO 970
Record: http://cds.cern.ch/record/410746 035$$9CERCER$$a0338008 NO 970
Record: http://cds.cern.ch/record/426599 035$$9CERCER$$a2175829 NO 970
Record: http://cds.cern.ch/record/426600 035$$9CERCER$$a2175830 NO 970
Record: http://cds.cern.ch/record/430066 035$$9CERCER$$a2188979 970$$a002179441CER
Record: http://cds.cern.ch/record/433058 035$$9CERCER$$a2271797 970$$a002182579CER
Record: http://cds.cern.ch/record/448286 035$$9CERCER$$a2199243 NO 970
Record: http://cds.cern.ch/record/476281 035$$9CERCER$$a2229795 NO 970
Record: http://cds.cern.ch/record/501204 035$$9CERCER$$a2256379 NO 970
Record: http://cds.cern.ch/record/504321 035$$9CERCER$$a2259580 NO 970
Record: http://cds.cern.ch/record/504326 035$$9CERCER$$a2259586 NO 970
Record: http://cds.cern.ch/record/504759 035$$9CERCER$$a2266321 970$$a002260033CER
Record: http://cds.cern.ch/record/506607 035$$9CERCER$$a2176523 970$$a002261933CER
Record: http://cds.cern.ch/record/519146 035$$9CERCER$$a2275568 NO 970
Record: http://cds.cern.ch/record/532655 035$$9CERCER$$a2173483 970$$a002289647CER
Record: http://cds.cern.ch/record/535960 035$$9CERCER$$a2293011 NO 970
Record: http://cds.cern.ch/record/545306 035$$9CERCER$$a2302560 NO 970
Record: http://cds.cern.ch/record/553396 035$$9CERCER$$a2194463 970$$a002310926CER
Record: http://cds.cern.ch/record/566853 035$$9CERCER$$a2343329 970$$a002324720CER
Record: http://cds.cern.ch/record/585796 035$$9CERCER$$a2343691 970$$a002344568CER
Record: http://cds.cern.ch/record/599531 035$$9CERCER$$a2358567 NO 970
Record: http://cds.cern.ch/record/610249 035$$9CERCER$$a2369623 NO 970
Record: http://cds.cern.ch/record/621944 035$$9CERCER$$a2348993 970$$a002382392CER
Record: http://cds.cern.ch/record/682499 035$$9CERCER$$a0271495 970$$a002408955CER
Record: http://cds.cern.ch/record/684098 035$$9CERCER$$a2212346 970$$a002410511CER
Record: http://cds.cern.ch/record/684138 035$$9CERCER$$a2192143 970$$a002410551CER
Record: http://cds.cern.ch/record/684225 035$$9CERCER$$a2196586 970$$a002410638CER
Record: http://cds.cern.ch/record/685431 035$$9CERCER$$a2350522 970$$a002411814CER
Record: http://cds.cern.ch/record/685675 035$$9CERCER$$a0218878 970$$a002412058CER
Record: http://cds.cern.ch/record/686348 035$$9CERCER$$a2371741 970$$a002412713CER
Record: http://cds.cern.ch/record/688730 035$$9CERCER$$a2284468 970$$a002415020CER
Record: http://cds.cern.ch/record/689235 035$$9CERCER$$a2361267 970$$a002415471CER
Record: http://cds.cern.ch/record/698612 035$$9CERCER$$a0113323 970$$a000410321CER
Record: http://cds.cern.ch/record/700006 035$$9CERCER$$a0113318 970$$a000411845CER
Record: http://cds.cern.ch/record/781363 035$$9CERCER$$a2194481 970$$a002471749CER
Record: http://cds.cern.ch/record/798281 035$$9CERCER$$a4091040 970$$a002487763CER
Record: http://cds.cern.ch/record/879171 035$$9CERCER$$a0043327 970$$a002552298CER
Record: http://cds.cern.ch/record/978549 035$$9CERCER$$a0236412 970$$a002642021CER
Record: http://cds.cern.ch/record/1015056 035$$9CERCER$$a0197554 970$$a002675719CER
Record: http://cds.cern.ch/record/1023569 035$$9CERCER$$a0232429 970$$a002682666CER
Record: http://cds.cern.ch/record/1330744 035$$9CERCER$$a0068296 970$$a002949798CER
Record: http://cds.cern.ch/record/2026616 035$$9CERCER$$a0016553 NO 970
Record: http://cds.cern.ch/record/2306622 035$$9CERCER$$a0245770 NO 970
Record: http://cds.cern.ch/record/2306623 035$$9CERCER$$a0245770 NO 970
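A list like the one above could be produced, or re-checked after manual fixes, with a small comparison helper. This sketch parses lines in the format used above and assumes that the two numbers should be compared ignoring leading zeros, and that a missing 970 counts as a mismatch. Both are assumptions and the names are hypothetical.

```python
import re

# Captures the 035 CERCER number and, when present, the 970 number before "CER".
LINE_RE = re.compile(r"035\$\$9CERCER\$\$a(\d+)(?:\s+970\$\$a(\d+)CER)?")


def aleph_pair(line):
    """Return (number_in_035, number_in_970 or None), or None if no 035 is found."""
    match = LINE_RE.search(line)
    if not match:
        return None
    return match.group(1), match.group(2)


def is_mismatch(sysno_035, sysno_970):
    """Compare the two Aleph numbers, ignoring leading zeros."""
    if sysno_970 is None:
        return True
    return int(sysno_035) != int(sysno_970)
```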
Migrate the CDS Books records.
After changes on the main branch the web app is not working on OpenShift. It needs fixing, and it should be deployed to books-migrator-qa.
subjects = { "scheme": "ICS", "value": 084__c }
962__b: record id of related field
962__k: relation "other" description: "is chapter of the book"
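The subjects rule could look like this in code. It is only an illustration: the function name is hypothetical and the input is assumed to be the list of 084__c values already extracted from the record.

```python
def subjects_from_084c(values):
    """Build the subjects list from the values of field 084__c."""
    return [{"scheme": "ICS", "value": value} for value in values]
```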
Find and implement solution of
1 case: charts of the articles from inspire (?) to discuss
2 case: compressed full text attached to the record - EItem
3 case: icons of subformat - they should be dropped