marcusbarnes / islandora_compound_batch
Provides the basic ability to batch import compound objects into Islandora.
License: GNU General Public License v3.0
https://github.com/MarcusBarnes/islandora_compound_batch/blob/master/includes/object.inc#L172 is not used; you can comment out the code in this function and still load compound objects. The parent compound object's content model is assigned at https://github.com/MarcusBarnes/islandora_compound_batch/blob/master/includes/object.inc#L113.
All Islandora modules use 7.x as their default branch. We should replace this repo's master
with 7.x to be consistent with that convention.
I was just watching a demo where islandora_compound_batch was used as part of a CircleCI script. It would be good to start creating tagged releases that can be used within devops scripts like that one. This would avoid situations where merged pull requests introduce changes that break these scripts.
This feature is listed in the README's 'todo'. I propose adding a drush option --content_models (for consistency with Islandora Batch) that would take as its value a semicolon-separated list of extension => content model pairs, with a double colon (::) as the delimiter within pairs, e.g.,
--content_models=pdf::islandora:foo
--content_models=pdf::islandora:foo;jpg::islandora:barCModel
If the --content_models parameter is present, its mappings would have precedence over the mappings defined in $this->extensionToContentModelMap.
If this sounds like a viable plan, I'd be happy to take a stab at it.
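To make the proposed syntax concrete, here is a minimal sketch of the parsing and precedence rules described above. It is written in Python purely as an illustration of the logic, not as module code; the function names `parse_content_models` and `effective_map` are hypothetical, and the default mappings in the usage note are placeholders.

```python
def parse_content_models(option_value):
    """Parse 'ext::cmodel;ext::cmodel' into a dict of extension -> content model."""
    mappings = {}
    for pair in option_value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing semicolon
        # Split on the first '::' only, so the single colon inside a
        # content model PID such as 'islandora:foo' is left untouched.
        extension, sep, cmodel = pair.partition("::")
        if not sep or not extension or not cmodel:
            raise ValueError(f"Malformed extension::content model pair: {pair!r}")
        mappings[extension.lower()] = cmodel
    return mappings


def effective_map(default_map, option_value=None):
    """Return the default map with any --content_models mappings taking precedence."""
    merged = dict(default_map)  # copy so the defaults are not mutated
    if option_value:
        merged.update(parse_content_models(option_value))
    return merged
```

For example, with placeholder defaults of `{"pdf": "islandora:sp_pdf"}`, calling `effective_map(defaults, "pdf::islandora:foo")` would return a map in which `pdf` points at `islandora:foo`, while extensions not named in the option keep their default content model.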
We already perform validation on --parent but we should add validation on --target and other applicable options as well. The ones introduced in Islandora/islandora_batch#94 and Islandora/islandora_batch#95 seem pretty good, so we could just port those over. There's no reason we couldn't also incorporate the change discussed in #19 into the same PR.
One of this module's UX characteristics is that it requires the user to generate structure.xml
files. Is there any reason why we can't make islandora_compound_batch_preprocess
do that for the user? The user would only need to create structure.xml
files where the default sort order for children needed to be overridden.
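As a rough illustration of what generating a default structure file could involve, the sketch below builds the XML for one compound object directory from its subdirectories. It is a Python illustration of the idea only, not the module's preprocessor; the function name `build_structure` is hypothetical, and the element and attribute names are taken from the structure.xml examples shown elsewhere in these issues.

```python
import os
import xml.etree.ElementTree as ET


def build_structure(compound_dir):
    """Return default structure XML for one compound object directory.

    Every immediate subdirectory becomes a <child> element, in plain
    alphabetical order; loose files such as MODS.xml are ignored.
    """
    name = os.path.basename(os.path.normpath(compound_dir))
    root = ET.Element("islandora_compound_object", {"title": name})
    subdirs = sorted(
        entry
        for entry in os.listdir(compound_dir)
        if os.path.isdir(os.path.join(compound_dir, entry))
    )
    for sub in subdirs:
        ET.SubElement(root, "child", {"content": f"{name}/{sub}"})
    return ET.tostring(root, encoding="unicode")
```

A preprocessor following this approach would only read a structure.xml from disk when one exists, and fall back to the generated default otherwise.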
Coming out of #14, we should support the creation of child objects that only have a MODS.xml or MARC.xml file and no payload file (PDF, TIFF, etc.). Currently, islandora_compound_batch determines the child's content model based on the extension of its payload file; additionally, if the payload file is absent, the child is not created. We will need to provide a drush option to let users indicate which content model to assign to metadata-only child objects.
Tagging @egesu to make sure this issue represents the intended use case.
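One way to picture the proposed behaviour is the sketch below, written in Python as an illustration rather than as module code. It assumes the payload file is the one named OBJ.<extension>, as in the directory listings elsewhere in these issues; the function name `resolve_child_cmodel` and the `metadata_only_cmodel` parameter (standing in for the new drush option) are hypothetical.

```python
import os


def resolve_child_cmodel(filenames, extension_map, metadata_only_cmodel=None):
    """Pick a content model for a child directory.

    If a payload file (OBJ.<ext>) is present, its extension decides the
    content model. If there is no payload, fall back to the content model
    supplied for metadata-only children. A return value of None means the
    child cannot be assigned a content model.
    """
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        if stem == "OBJ":
            return extension_map.get(ext.lstrip(".").lower())
    return metadata_only_cmodel
```

With no fallback supplied, a child directory containing only MODS.xml resolves to None, which corresponds to the current behaviour of not creating the child.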
I'm building some compounds and trying this out again. I noticed that because I use a numbering scheme to generate my sub-directories, they fall out of order if there are more than 9 items.
i.e.
<islandora_compound_object title="pa_aar">
<child content="pa_aar/1"/>
<child content="pa_aar/10"/>
<child content="pa_aar/2"/>
<child content="pa_aar/3"/>
<child content="pa_aar/4"/>
<child content="pa_aar/5"/>
<child content="pa_aar/6"/>
<child content="pa_aar/7"/>
<child content="pa_aar/8"/>
<child content="pa_aar/9"/>
</islandora_compound_object>
Adding a natural language sort (or perhaps a configurable sort option) would help.
I just added this line
sort($stuffindirectory, SORT_NATURAL);
below here which results in my desired ordering.
<islandora_compound_object title="pa_aar">
<child content="pa_aar/1"/>
<child content="pa_aar/2"/>
<child content="pa_aar/3"/>
<child content="pa_aar/4"/>
<child content="pa_aar/5"/>
<child content="pa_aar/6"/>
<child content="pa_aar/7"/>
<child content="pa_aar/8"/>
<child content="pa_aar/9"/>
<child content="pa_aar/10"/>
</islandora_compound_object>
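For readers unfamiliar with natural ordering, the following sketch shows the same idea outside PHP. It is a Python illustration only (the fix above uses PHP's `SORT_NATURAL` flag); the helper name `natural_key` is hypothetical.

```python
import re


def natural_key(path):
    """Split a path into text and number chunks so digits compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd-indexed tokens are always the digit runs.
    tokens = re.split(r"(\d+)", path)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(tokens)]


children = ["pa_aar/1", "pa_aar/10", "pa_aar/2", "pa_aar/9"]
ordered = sorted(children, key=natural_key)
# ordered is ['pa_aar/1', 'pa_aar/2', 'pa_aar/9', 'pa_aar/10']
```

A plain string sort compares character by character, which is why "10" lands before "2"; comparing the digit runs as integers restores the intended order.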
Luke Taylor pointed out https://github.com/discoverygarden/limerick_ingest during the July 13, 2016 Islandora DevOps Interest Group call. Review the code in limerick_ingest for ideas on handling compound objects.
If you uncheck the option "Only allow compound objects to have child objects associated with them" at admin/islandora/solution_pack_config/compound_object (Administration > Islandora > Solution Pack Configuration > Compound Object Solution Pack), any object, regardless of its content model, can have children. Currently, the content model of the parent objects created by this module is hard coded at https://github.com/MarcusBarnes/islandora_compound_batch/blob/master/includes/object.inc#L175. It would be useful to allow the user to pass in a --parent_content_model parameter to override this.
Other batch modules such as Islandora Book Batch have an option to produce OCR on appropriate datastreams (files) on batch ingest via islandora_ocr. Based on how you anticipate using Islandora Compound Batch in the near future, would this be a useful feature to add to the to-do list? Please comment or give your reaction in order to vote and chime in. Thank you.
The note "Note: --target applies to drush 6 and below, while --scan_target replaces this keyword in drush 7 and above." in the examples in the "OBJ extension to content model mappings" section is out of place and somewhat confusing. Also, we should indicate that the relationship assigned to children is the one that is configured in the Compound SP's "Child relationship predicate" (added in #29).
Is tree_to_compound_object.xsl still useful or has it been supplanted by create_structure_files.php?
Fatal error: Class 'IslandoraCompoundBatch' not found in /opt/mounts/drupal/ldl/sites/all/modules/islandora_compound_batch/includes/batch.form.inc on line 75
$preprocessor = new IslandoraCompoundBatch($connection, $parameters);
The class IslandoraCompoundBatch is not defined. I'm seeing the previous commit of this line was:
$preprocessor = new IslandoraNewspaperBatch($connection, $parameters);
which referenced islandora_compound_batch.inc:
class IslandoraNewspaperBatch extends IslandoraScanBatch .....
The file islandora_compound_batch.inc was removed in later commits. Along with it, the class IslandoraNewspaperBatch (or IslandoraCompoundBatch) was eliminated.
The GUI won't work until there is an IslandoraCompoundBatch class defined.
Any advice on your reasoning before I try to craft my own IslandoraCompoundBatch class?
Use case:
I'd like to do all of my derivative generation before I submit my batch to Islandora so that I can accelerate ingest rates.
Problem:
When I check "Defer derivative generation during ingest" on /admin/islandora/configure and create a batch set using islandora_compound_batch, the resulting batch, when ingested, contains empty objects holding only MODS, DC, and RELS-EXT. The objects don't even contain the TIF OBJ that was submitted!
If, however, I uncheck "Defer derivative generation during ingest" on /admin/islandora/configure, the resulting objects, when ingested, contain all of the appropriate datastreams, including the OBJ. My sense, though, is that those datastreams have been generated by Islandora and thus the versions that I pregenerated are not actually used.
For both example cases above islandora_compound_batch was pointing at a directory full of object folders containing the appropriate datastreams with respective file names. Example:
./smith_ssc_324_digital_object_323
./smith_ssc_324_digital_object_323/structure.xml
./smith_ssc_324_digital_object_323/MODS.xml
./smith_ssc_324_digital_object_323/OCR.txt
./smith_ssc_324_digital_object_323/TN.jpg
./smith_ssc_324_digital_object_323/00001
./smith_ssc_324_digital_object_323/00001/JPG.jpg
./smith_ssc_324_digital_object_323/00001/JP2.jp2
./smith_ssc_324_digital_object_323/00001/MODS.xml
./smith_ssc_324_digital_object_323/00001/TN.jpg
./smith_ssc_324_digital_object_323/00001/OBJ.tif
./smith_ssc_324_digital_object_323/00002
./smith_ssc_324_digital_object_323/00002/JPG.jpg
./smith_ssc_324_digital_object_323/00002/JP2.jp2
./smith_ssc_324_digital_object_323/00002/MODS.xml
./smith_ssc_324_digital_object_323/00002/TN.jpg
./smith_ssc_324_digital_object_323/00002/OBJ.tif
./smith_ssc_324_digital_object_323/00003
./smith_ssc_324_digital_object_323/00003/JPG.jpg
./smith_ssc_324_digital_object_323/00003/JP2.jp2
./smith_ssc_324_digital_object_323/00003/MODS.xml
./smith_ssc_324_digital_object_323/00003/TN.jpg
./smith_ssc_324_digital_object_323/00003/OBJ.tif
...
Here are my exact commands:
drush -v --user=compass_admin islandora_compound_batch_preprocess --scan_target=/mnt/ingest/smith/compound-large-image-sample --namespace=test --parent=smith:test
drush -v --user=1 islandora_batch_ingest --ingest_set=774
I can send you a sample ingest directory if needed.
Possible desired outcomes:
Assumptions:
I've only tried this with large image objects using TIF files as the OBJ. I'm assuming that this is also an issue for other child object types.
Run sanity tests to confirm that islandora_compound_batch works as expected with the recently released Islandora 7.x-1.9, creating tickets should any issues arise.
I found that I was unable to run drush icbp using later versions of drush; I'm on a pretty recent version, 9.0-dev.
Anyway, it turned out that using an option name other than target to indicate the source directory made my errors go away. Unfortunately, I made this fix long ago on our fork (which is somehow no longer linked), so I don't have error output or anything to help explain why I've made this change.
Maybe I'll just leave this here in case it can be useful: lsulibraries@422e2e0
We have been assuming that the Child relationship predicate is the default 'isConstituentOf' in the addRelationshipsForChild method
islandora_compound_batch/includes/object.inc
Line 181 in 0f6d81f
Thanks to @bseeger for spotting this (see #27 (comment)).
Update the README to reflect the change in output from create_structure_files.php
I'm using Compound Batch for the first time since the resolution of #2. Even though I've pulled in the latest code (I'm running at fde7203) and run drush devel-reinstall islandora_compound_batch so that the db gets updated, none of the children in my batch are being ingested. Only the parents are.
Here's the structure of the islandora_compound_batch table:
mysql> describe islandora_compound_batch;
+---------------------+---------------------+------+-----+---------+----------------+
| Field               | Type                | Null | Key | Default | Extra          |
+---------------------+---------------------+------+-----+---------+----------------+
| id                  | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| child_content_value | text                | NO   |     | NULL    |                |
| object_id           | bigint(20) unsigned | NO   |     | 0       |                |
| object_xpath        | text                | NO   |     | NULL    |                |
| parent_xpath        | text                | NO   |     | NULL    |                |
| object_pid          | bigint(20) unsigned | NO   |     | 0       |                |
| parent_pid          | varchar(255)        | NO   |     |         |                |
| batch_id            | bigint(20) unsigned | NO   |     | 0       |                |
+---------------------+---------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
Here's one of the compound object subdirectories in the input directory:
999/
├── 991
│ ├── MODS.xml
│ └── OBJ.jpg
├── 992
│ ├── MODS.xml
│ └── OBJ.jpg
├── 993
│ ├── MODS.xml
│ └── OBJ.jpg
├── 994
│ ├── MODS.xml
│ └── OBJ.jpg
├── 995
│ ├── MODS.xml
│ └── OBJ.jpg
├── 996
│ ├── MODS.xml
│ └── OBJ.jpg
├── 997
│ ├── MODS.xml
│ └── OBJ.jpg
├── 998
│ ├── MODS.xml
│ └── OBJ.jpg
├── MODS.xml
└── structure.xml
8 directories, 18 files
I've run create_structure_files.php over my input directory. Here's the structure.xml file for the input directory above:
<?xml version="1.0" encoding="utf-8"?>
<!--Islandora compound structure file used by the Compound Batch module. On batch ingest,
'islandora_compound_object' elements become compound objects, and 'child' elements become their
children. Files in directories named in child elements' 'content' attribute will be added as their
datastreams. If 'islandora_compound_object' elements do not contain a MODS.xml file, the value of
the 'title' attribute will be used as the parent's title/label.-->
<islandora_compound_object title="999">
<child content="999/991"/>
<child content="999/992"/>
<child content="999/993"/>
<child content="999/994"/>
<child content="999/995"/>
<child content="999/996"/>
<child content="999/997"/>
<child content="999/998"/>
</islandora_compound_object>
Anyone have any suggestions as to what I'm doing wrong? Won't somebody please think of the children? (Sorry, I couldn't resist that 😆)
It would be difficult to make this module 100% useful for all content models. For example, in this branch I added a mapping for 'txt' => 'islandora:sp_remoteMediaCModel' for my custom Remote Media content model. But it is conceivable that someone else might use txt for a different CModel.
I suggest adding an admin form where these mappings can be configured by the user and customized as needed, so that more filetypes and CModels can be added without touching the code.
(Or if you're OK with my Remote Media update, I could just make a PR for it.)
My data directory has the following structure:
export/
└── S01E01
    ├── MODS.xml
    ├── TN.jpeg
    ├── episode
    │   ├── MODS.xml
    │   └── OBJ.mp3
    ├── structure.xml
    └── transcript
        ├── MODS.xml
        └── OBJ.pdf
I've provided a thumbnail in the S01E01 directory, but it is being completely ignored by the import process. I can add the thumbnail manually and things work as expected.
I'm using a stock Islandora 7.x-1.13 VM. The only additional module I've installed is islandora_compound_batch.
@MarcusBarnes I was trying this out and I'm not sure what I am doing wrong, but I ended up with 200 objects instead of a compound with 200 children.
I have run it twice, first using the default --parent_relationship_pred and then setting it to --parent_relationship_pred=isConstituentOf. Same result after both ingests.
I have a directory structure like
/vagrant/test_compounds
  /compound_1
    /1
      OBJ.jpg
    /2
      OBJ.jpg
    ....
    /200
      OBJ.jpg
    MODS.xml
    structure.xml
I created the structure.xml by running
php create_structure_files.php /vagrant/test_compounds/
It looks like this
<?xml version="1.0" encoding="utf-8"?>
<!--Islandora compound structure file used by the Compound Batch module. On batch ingest,
'islandora_compound_object' elements become compound objects, and 'child' elements become their
children. Files in directories named in child elements' 'content' attribute will be added as their
datastreams. If 'islandora_compound_object' elements do not contain a MODS.xml file, the value of
the 'title' attribute will be used as the parent's title/label.-->
<islandora_compound_object title="compound_1">
<parent title="compound_1/1"/>
<parent title="compound_1/10"/>
<parent title="compound_1/100"/>
<parent title="compound_1/101"/>
<parent title="compound_1/102"/>
<parent title="compound_1/103"/>
....
Then I ran
drush -u 1 islandora_compound_batch_preprocess --namespace=islandora --parent='islandora:compound_collection' --target=/vagrant/test_compounds
and drush -u 1 ibi --ingest_set=<set id>
Then I tried
drush -u 1 islandora_compound_batch_preprocess --namespace=islandora --parent='islandora:compound_collection' --parent_relationship_pred=isConstituentOf --target=/vagrant/test_compounds
Same result, when I go into islandora:compound_collection there are two objects named MODS and 400 named OBJ. They are all compound objects.
This is obviously not the expected behaviour, what did I mess up?
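A quick sanity check of a structure file can help narrow down problems like this one. The sketch below is a Python illustration, not part of the module; it assumes the expected format is the one shown in the other structure.xml examples in these issues (an `islandora_compound_object` root whose children are `child` elements with a `content` attribute), and the function name `check_structure` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_structure(xml_text):
    """Return a list of human-readable problems found in a structure file."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "islandora_compound_object":
        problems.append(f"unexpected root element <{root.tag}>")
    for element in root:
        if element.tag != "child":
            problems.append(f"unexpected element <{element.tag}>, expected <child>")
        elif not element.get("content"):
            problems.append("<child> element is missing its 'content' attribute")
    return problems
```

Run against a file whose entries are `<parent title="..."/>` elements, this check would report one problem per entry, which points at the structure file rather than the drush options as the place to look first.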
We should define a drupal_alter() to allow modules to add to or override the extension -> content model map at https://github.com/MarcusBarnes/islandora_compound_batch/blob/master/includes/utilities.inc#L20. Potential use cases include ingesting custom content models and defining which extensions should be assigned the binary compound model.
Possible related issue is #16.
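The alter pattern itself is simple: build the default map, then hand it to each registered callback so it can add or replace entries. The sketch below shows that flow in Python as an illustration of the pattern only, not of Drupal's hook API; the function names and the default mappings are placeholders.

```python
def default_extension_map():
    """Placeholder defaults; the module's real map lives in utilities.inc."""
    return {"jpg": "islandora:sp_basic_image", "pdf": "islandora:sp_pdf"}


def build_extension_map(alter_callbacks):
    """Apply each alter callback, in order, to a fresh copy of the defaults."""
    mapping = default_extension_map()
    for alter in alter_callbacks:
        alter(mapping)  # each callback mutates the map in place
    return mapping


def add_remote_media(mapping):
    """Example callback: a site-specific module registers its own extension."""
    mapping["txt"] = "islandora:sp_remoteMediaCModel"
```

Calling `build_extension_map([add_remote_media])` returns the defaults plus the `txt` entry, and later callbacks can override earlier ones, which is the behaviour an alter hook would give site-specific modules.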
Hi,
I'm using Islandora 7.x-1.10 and trying out this module. After ingest, I see the two parent level compound objects, but when I click on them, all I see is the MODS metadata - the image objects have not been associated with the compound objects. The images are not part of any collection, though they look like they were created correctly.
Have folks tried this module with v1.10?
There's a very good chance it's my lack of knowledge about how this works, but since I saw a github issue about whether it worked in v1.9 I thought I'd ask.
Thanks,
Bethany
By convention, Islandora modules use 7.x as their default branch. Any reason Islandora Compound Batch should use master?
This looks like a typo. Should we change it to "Utilities"?
@CadenArmstrong has provided me with a sample batch set of compound objects that are not being ingested as expected. The sample adheres to the documentation in the README. The drush commands appear to work as expected, but when you visit the --parent object, no items appear. Viewing the object PIDs, you see all the items in the batch, rather than separate compound objects.
Possible source of problem to investigate:
Double check the Islandora versions where islandora_compound_batch is being used without issue against the versions of modules in islandora_vagrant. Use test compound objects in islandora_vagrant to see if the issue is now appearing for batch sets that Marcus knows used to function as expected.