I am running Bulk Import Tool 2.0.1 on Alfresco Community Edition 5.1.0 (r127059-b7) with PostgreSQL on Ubuntu 16.04 (8 GB RAM, 8 vCPUs).
I am attempting to import 565921 PDF files that were exported from Alfresco Community v4.2.0 (r63893-b12) using version 0.0.6 of this bulk export tool: https://github.com/gsdenys/alfresco-bulk-export
The export generated a total of 195614 folders, 565921 PDF files and 772691 properties.xml files (about 40 GB).
Export results
The export was performed with the following parameters:
Export folder: /export
Node to export: workspace://SpacesStore/8ac68ee8-31b2-47ed-8c3f-eac8020a5935
Ignore exported: false
Export versions: false
Bulk import revision scheme: true
Export elapsed time: 34131 seconds (568 minutes)
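To double-check these totals, the export can be recounted on disk. The following is a minimal sketch, not part of either tool: the class name ExportCounter and the default path /export are illustrative, and it assumes the layout shown in the sample tree further down, where each folder and each PDF has a sidecar file ending in .metadata.properties.xml.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not part of the export or import tools): recounts what
// the exporter wrote to disk so the totals can be compared with the report.
class ExportCounter {
    // Sidecar naming convention as seen in the sample tree (assumption).
    static final String METADATA_SUFFIX = ".metadata.properties.xml";

    /** Returns {folders, pdfFiles, metadataFiles} under root; root itself is not counted. */
    static long[] count(Path root) throws IOException {
        long[] totals = new long[3];
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> !p.equals(root)).forEach(p -> {
                String name = p.getFileName().toString();
                if (Files.isDirectory(p)) {
                    totals[0]++;
                } else if (name.endsWith(METADATA_SUFFIX)) {
                    // Checked before ".pdf" so "x.pdf.metadata.properties.xml" counts as metadata.
                    totals[2]++;
                } else if (name.toLowerCase().endsWith(".pdf")) {
                    totals[1]++;
                }
                // Any other file is ignored.
            });
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/export");
        long[] t = count(root);
        System.out.printf("folders=%d pdfs=%d metadata=%d%n", t[0], t[1], t[2]);
    }
}
```

If every folder and every PDF had exactly one sidecar, the metadata count would equal folders plus PDFs. For the reported figures, 195614 + 565921 = 761535, which differs from 772691 by 11156; a recount on disk would show whether that gap is real or comes from how the exporter counts.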
During import, approximately 221599 PDF files are imported or skipped (counting both the initial import and later re-imports with replace = false), and then the import process fails with an exception.
The import process was run with these options:
Source: Default
Source directory: /import/Company Home/User Homes/
Target space: /User Homes
Replace existing: not checked
Sample tree output from the root of the source directory (/import).
Most folders contain 1 PDF file and a few contain 2; no folder contains more than 3 PDF files.
+-- 8ac68ee8-31b2-47ed-8c3f-eac8020a5935.cache
+-- Company Home
| +-- User Homes
| +-- general-user
| | +-- 000000
| | | +-- 1308
| | | | +-- entry
| | | | | +-- 12112005200321
| | | | | | +-- 12112005200321_cdc_20130827162012.pdf
| | | | | | +-- 12112005200321_cdc_20130827162012.pdf.metadata.properties.xml
| | | | | +-- 12112005200321.metadata.properties.xml
| | | | | +-- 12112005200478
| | | | | | +-- 12112005200478_cdc_20130827155542.pdf
| | | | | | +-- 12112005200478_cdc_20130827155542.pdf.metadata.properties.xml
| | | | | +-- 12112005200478.metadata.properties.xml
| | | | | +-- 12112005200537
| | | | | | +-- 12112005200537_cdc_20130826164512.pdf
| | | | | | +-- 12112005200537_cdc_20130826164512.pdf.metadata.properties.xml
| | | | | +-- 12112005200537.metadata.properties.xml
| | | | | +-- 12112005226138
| | | | | | +-- 12112005226138_cdc_20130827155609.pdf
| | | | | | +-- 12112005226138_cdc_20130827155609.pdf.metadata.properties.xml
| | | | | +-- 12112005226138.metadata.properties.xml
| | | | | +-- 12112005226241
| | | | | | +-- 12112005226241_cdc_20130830082522.pdf
| | | | | | +-- 12112005226241_cdc_20130830082522.pdf.metadata.properties.xml
| | | | | +-- 12112005226241.metadata.properties.xml
| | | | | +-- 12112005226285
| | | | | | +-- 12112005226285_cdc_20130827155619.pdf
| | | | | | +-- 12112005226285_cdc_20130827155619.pdf.metadata.properties.xml
| | | | | +-- 12112005226285.metadata.properties.xml
| | | | | +-- 12112005226398
| | | | | | +-- 12112005226398_cdc_20130830090013.pdf
| | | | | | +-- 12112005226398_cdc_20130830090013.pdf.metadata.properties.xml
| | | | | +-- 12112005226398.metadata.properties.xml
| | | | | +-- 12112005226489
| | | | | | +-- 12112005226489_cdc_20130830154047.pdf
| | | | | | +-- 12112005226489_cdc_20130830154047.pdf.metadata.properties.xml
| | | | | +-- 12112005226489.metadata.properties.xml
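The folder statistics given before the tree (most folders hold 1 PDF, a few hold 2, none more than 3) can be confirmed with a short walk over the source directory. This is a hypothetical sketch: PdfsPerFolder is an illustrative name and /import is only a default. It counts regular files whose names end in .pdf, so the .pdf.metadata.properties.xml sidecars are not counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: histogram of "PDFs per folder" for a tree laid out like the one above.
class PdfsPerFolder {
    /** Maps "number of PDFs in a folder" to "number of folders holding that many PDFs". */
    static Map<Long, Long> histogram(Path root) throws IOException {
        Map<Path, Long> perFolder;
        try (Stream<Path> paths = Files.walk(root)) {
            perFolder = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                .collect(Collectors.groupingBy(Path::getParent, Collectors.counting()));
        }
        // Folders with no PDFs at all do not appear in the result.
        return perFolder.values().stream()
            .collect(Collectors.groupingBy(n -> n, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/import");
        histogram(root).forEach((pdfs, folders) ->
            System.out.println(folders + " folder(s) contain " + pdfs + " PDF(s)"));
    }
}
```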
Most of the PDF files were created by scanning documents on a Dell scanner.
Sorry, I cannot provide any sample content.
I re-ran the import process with TRACE logging enabled and sifted through the resulting 40 GB catalina.out file. This is the relevant part:
2016-06-27 00:42:42,559 User:admin INFO [bulkimport.impl.BatchImporterImpl] [BulkImport-Importer-0011] BULKIMPORT: Skipping '12112005832290-invoice.pdf' as it already exists in the repository and 'replace existing' is false.
2016-06-27 00:42:42,559 User:admin TRACE [util.transaction.SpringAwareUserTransaction] [BulkImport-Importer-0011] Completing transaction for [UserTransaction]
2016-06-27 00:42:42,559 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Before commit TransactionSychronizationImpl[ txnId=fd1fce7f-b675-4d89-8c0f-07343c031713]
2016-06-27 00:42:42,559 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Before Prepare - level 0
2016-06-27 00:42:42,559 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Before Prepare priorities:[4]
2016-06-27 00:42:42,559 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Prepared
2016-06-27 00:42:42,560 User:admin DEBUG [mybatis.spring.SqlSessionUtils] [BulkImport-Importer-0011] Transaction synchronization committing SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@4057dea7]
2016-06-27 00:42:42,560 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Before completion: TransactionSychronizationImpl[ txnId=fd1fce7f-b675-4d89-8c0f-07343c031713]
2016-06-27 00:42:42,560 User:admin DEBUG [mybatis.spring.SqlSessionUtils] [BulkImport-Importer-0011] Transaction synchronization deregistering SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@4057dea7]
2016-06-27 00:42:42,560 User:admin DEBUG [mybatis.spring.SqlSessionUtils] [BulkImport-Importer-0011] Transaction synchronization closing SqlSession [org.apache.ibatis.session.defaults.DefaultSqlSession@4057dea7]
2016-06-27 00:42:42,560 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] After completion (committed): TransactionSychronizationImpl[ txnId=fd1fce7f-b675-4d89-8c0f-07343c031713]
2016-06-27 00:42:42,560 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Bound resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,560 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,561 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,562 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,562 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,562 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,563 User:admin TRACE [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Fetched resource:
2016-06-27 00:42:42,563 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] After Completion: DONE
2016-06-27 00:42:42,563 User:admin DEBUG [util.transaction.TransactionSupportUtil] [BulkImport-Importer-0011] Unbound txn synch:TransactionSychronizationImpl[ txnId=fd1fce7f-b675-4d89-8c0f-07343c031713]
2016-06-27 00:42:42,563 User:admin DEBUG [util.transaction.SpringAwareUserTransaction] [BulkImport-Importer-0011] Committed user transaction: UserTransaction[object=org.alfresco.util.transaction.SpringAwareUserTransaction@1fcb8253, status=3]
2016-06-27 00:42:42,564 User:admin DEBUG [security.authentication.AuthenticationUtil] [BulkImport-Importer-0011] Removing the current security information.
2016-06-27 00:42:42,667 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
2016-06-27 00:42:42,667 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
(at least 10,000 more lines identical to these)
2016-06-27 00:45:33,871 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
2016-06-27 00:45:33,871 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
2016-06-27 00:45:33,871 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
2016-06-27 00:45:33,871 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
2016-06-27 00:45:33,871 TRACE [util.cache.AbstractAsynchronouslyRefreshedCache] [BulkImport-Scanner] get() from cache for key on AbstractAsynchronouslyRefreshedCache [cacheId=compiledModelsCache]
Then:
2016-06-27 00:45:33,968 ERROR [bulkimport.impl.Scanner] [BulkImport-Scanner] BULKIMPORT: Bulk import from 'Default' failed.
java.util.concurrent.RejectedExecutionException: Task org.alfresco.extension.bulkimport.impl.Scanner$BatchImportJob@36b0e96c rejected from
org.alfresco.extension.bulkimport.impl.BulkImportThreadPoolExecutor@4615da6f[Running, pool size = 16, active threads = 16, queued tasks = 100, completed tasks = 269]
at org.alfresco.extension.bulkimport.impl.Scanner.submitBatch(Scanner.java:370)
at org.alfresco.extension.bulkimport.impl.Scanner.submitCurrentBatch(Scanner.java:329)
at org.alfresco.extension.bulkimport.impl.Scanner.submit(Scanner.java:298)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:223)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:245)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:245)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:245)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:245)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanDirectory(FilesystemBulkImportSource.java:245)
at org.alfresco.extension.bulkimport.source.fs.FilesystemBulkImportSource.scanFiles(FilesystemBulkImportSource.java:172)
at org.alfresco.extension.bulkimport.impl.Scanner.run(Scanner.java:188)
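The exception message describes a saturated executor: all 16 threads are busy and 100 tasks are queued, so the scanner's next batch is refused. The sketch below reproduces that general failure mode with a plain JDK ThreadPoolExecutor, a bounded ArrayBlockingQueue and the default AbortPolicy. It is an illustration under that assumption only: it does not show how BulkImportThreadPoolExecutor is actually configured, and the class name SaturatedPoolDemo is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustration only: a plain JDK executor, not the Bulk Import Tool's own class.
class SaturatedPoolDemo {
    /** Submits `tasks` blocking tasks to a fixed pool with a bounded queue; returns the rejection count. */
    static int countRejections(int threads, int queueCapacity, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity)); // default handler is AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep every worker busy, like slow import batches
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // every thread busy and the queue full
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same shape as the log message: 16 threads, 100 queued tasks, then one more submission.
        System.out.println(countRejections(16, 100, 117)); // prints 1
    }
}
```

With 16 threads and a queue of 100, the first 116 submissions are accepted and the 117th is rejected. In plain JDK code, a producer that must not drop work usually either blocks until queue space frees up or uses ThreadPoolExecutor.CallerRunsPolicy so the submitting thread runs the task itself; whether the Bulk Import Tool offers anything equivalent is not established by this report.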
Example import status results (I clicked Stop an hour after the import process failed):
Status: Stopped
Initiating User: admin
Source Name: Default
Source directory /import/Company Home/User Homes
Target Space: /Company Home/User Homes
Import Type: Streaming
Dry run: No
Batch Weight: 100
Threads: 0 active of 0 total
Start Date: 2016-06-27T02:20:19Z
Scan End Date: n/a
End Date: 2016-06-27T09:51:11Z
Scan Duration: 07h 30m 52s 457.031ms
Duration: 07h 30m 51s 990.186ms
Currently Importing:
Source (read) Statistics
Directories scanned: 217557 8.042 / sec
Files scanned: 1449099 53.566 / sec
Target (write) Statistics
Aspects associated: 1043712 38.582 / sec
Batches completed: 2226 0.082 / sec
Batches submitted: 2343 0.087 / sec
Bytes imported: 2530418309 93539.094 / sec
Content streamed: 0 0 / sec
In place content linked: 0 0 / sec
Metadata properties imported: 2915017 107.756 / sec
Nodes imported: 222599 8.229 / sec
Nodes skipped: 222599 8.229 / sec
Out-of-order retries: 0 0 / sec
Versions imported: 0 0 / sec
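The per-second figures on the status page appear to be each counter divided by the elapsed time. The sketch below assumes that interpretation (StatusRates is a hypothetical name): it recomputes two of the rates from the reported Duration of 07h 30m 51s 990.186ms and prints the gap between batches submitted and completed.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical check, assuming rate = counter / elapsed duration.
class StatusRates {
    // "Duration: 07h 30m 51s 990.186ms" from the status output above.
    static final Duration IMPORT_DURATION =
        Duration.ofHours(7).plusMinutes(30).plusSeconds(51).plusNanos(990_186_000L);

    static double perSecond(long count, Duration elapsed) {
        return count / (elapsed.toNanos() / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "Nodes imported/sec:    %.3f%n", perSecond(222_599, IMPORT_DURATION));
        System.out.printf(Locale.ROOT, "Batches completed/sec: %.3f%n", perSecond(2_226, IMPORT_DURATION));
        // Batches submitted but not completed when the import was stopped.
        System.out.println("Batches outstanding: " + (2_343 - 2_226));
    }
}
```

Under that assumption it prints 8.229 and 0.082, matching the status output, and 117 outstanding batches, which can be compared with the 16 active threads and 100 queued tasks reported in the exception message.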