occrp-attic / convert-document
A docker container for LibreOffice and unoconv, used to generate PDF files from office-type documents.
License: MIT License
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
After converting a Word document to PDF, the document name is displayed as 'Ahmed Yousry'.
How can I change it?
I've set up a test environment with Docker and tried to crawl a folder of documents into it. The analysing step fails because the convert-document process crashes.
Any help/ideas are appreciated.
```
convert-document_1 | 2020-02-12T10:44:01.162750684Z convert.converter.SystemFailure: Conversion timed out.
convert-document_1 | 2020-02-12T10:44:01.162753154Z [2020-02-12 10:44:01 +0000] [8] [INFO] Worker exiting (pid: 8)
convert-document_1 | 2020-02-12T10:44:01.223130495Z [2020-02-12 10:44:01 +0000] [1] [INFO] Shutting down: Master
convert-document_1 | 2020-02-12T10:44:01.223308822Z [2020-02-12 10:44:01 +0000] [1] [INFO] Reason: Worker failed to boot.
convert-document_1 | 2020-02-12T10:44:03.065614689Z [2020-02-12 10:44:03 +0000] [1] [INFO] Starting gunicorn 20.0.4
convert-document_1 | 2020-02-12T10:44:03.066047113Z [2020-02-12 10:44:03 +0000] [1] [INFO] Listening at: http://0.0.0.0:3000 (1)
convert-document_1 | 2020-02-12T10:44:03.066297541Z [2020-02-12 10:44:03 +0000] [1] [INFO] Using worker: threads
convert-document_1 | 2020-02-12T10:44:03.068333102Z [2020-02-12 10:44:03 +0000] [8] [INFO] Booting worker with pid: 8
convert-document_1 | 2020-02-12T10:44:03.282516066Z INFO:convert.converter:Starting headless LibreOffice...
convert-document_1 | 2020-02-12T10:44:03.376447345Z javaldx failed!
convert-document_1 | 2020-02-12T10:44:03.377155774Z Warning: failed to read path from javaldx
convert-document_1 | 2020-02-12T10:44:03.519455381Z LibreOffice 6.3 - Fatal Error: The application cannot be started.
convert-document_1 | 2020-02-12T10:44:03.519652376Z User installation could not be completed.
convert-document_1 | 2020-02-12T10:44:15.373479270Z [2020-02-12 10:44:15 +0000] [8] [ERROR] Exception in worker process
convert-document_1 | 2020-02-12T10:44:15.373516996Z Traceback (most recent call last):
convert-document_1 | 2020-02-12T10:44:15.373522555Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 583, in spawn_worker
convert-document_1 | 2020-02-12T10:44:15.373525755Z worker.init_process()
convert-document_1 | 2020-02-12T10:44:15.373528395Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/gthread.py", line 92, in init_process
convert-document_1 | 2020-02-12T10:44:15.373536974Z super().init_process()
convert-document_1 | 2020-02-12T10:44:15.373540582Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/base.py", line 119, in init_process
convert-document_1 | 2020-02-12T10:44:15.373543293Z self.load_wsgi()
convert-document_1 | 2020-02-12T10:44:15.373545881Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/base.py", line 144, in load_wsgi
convert-document_1 | 2020-02-12T10:44:15.373558480Z self.wsgi = self.app.wsgi()
convert-document_1 | 2020-02-12T10:44:15.373560720Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/base.py", line 67, in wsgi
convert-document_1 | 2020-02-12T10:44:15.373562910Z self.callable = self.load()
convert-document_1 | 2020-02-12T10:44:15.373564970Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/wsgiapp.py", line 49, in load
convert-document_1 | 2020-02-12T10:44:15.373567139Z return self.load_wsgiapp()
convert-document_1 | 2020-02-12T10:44:15.373569169Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
convert-document_1 | 2020-02-12T10:44:15.373571369Z return util.import_app(self.app_uri)
convert-document_1 | 2020-02-12T10:44:15.373573398Z File "/usr/local/lib/python3.7/dist-packages/gunicorn/util.py", line 358, in import_app
convert-document_1 | 2020-02-12T10:44:15.373575518Z mod = importlib.import_module(module)
convert-document_1 | 2020-02-12T10:44:15.373577537Z File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
convert-document_1 | 2020-02-12T10:44:15.373579688Z return _bootstrap._gcd_import(name[level:], package, level)
convert-document_1 | 2020-02-12T10:44:15.373581737Z File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
convert-document_1 | 2020-02-12T10:44:15.373584248Z File "<frozen importlib._bootstrap>", line 983, in _find_and_load
convert-document_1 | 2020-02-12T10:44:15.373586467Z File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
convert-document_1 | 2020-02-12T10:44:15.373588696Z File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
convert-document_1 | 2020-02-12T10:44:15.373590947Z File "<frozen importlib._bootstrap_external>", line 728, in exec_module
convert-document_1 | 2020-02-12T10:44:15.373593155Z File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
convert-document_1 | 2020-02-12T10:44:15.373595326Z File "/convert/convert/app.py", line 18, in <module>
convert-document_1 | 2020-02-12T10:44:15.373597595Z converter = Converter()
convert-document_1 | 2020-02-12T10:44:15.373599606Z File "/convert/convert/converter.py", line 42, in __init__
convert-document_1 | 2020-02-12T10:44:15.373601734Z self.connect()
convert-document_1 | 2020-02-12T10:44:15.373603745Z File "/convert/convert/converter.py", line 71, in connect
convert-document_1 | 2020-02-12T10:44:15.373605834Z raise SystemFailure('Conversion timed out.')
convert-document_1 | 2020-02-12T10:44:15.373607904Z convert.converter.SystemFailure: Conversion timed out.
convert-document_1 | 2020-02-12T10:44:15.375962850Z [2020-02-12 10:44:15 +0000] [8] [INFO] Worker exiting (pid: 8)
```
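The log shows LibreOffice aborting with "User installation could not be completed" before the worker times out. One cause worth ruling out (an assumption, not something the log confirms) is that the directory LibreOffice uses for its user profile cannot be written to inside the container. A minimal sketch of such a check, where `is_writable_dir` is a hypothetical helper and `HOME` is assumed to be the relevant location:

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can actually be created inside `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file is more reliable than os.access(),
        # which can be misleading on some mounted volumes.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the profile is created under HOME.
    home = os.environ.get("HOME", "")
    print(f"HOME={home!r} writable: {is_writable_dir(home)}")
```

Running this inside the container as the same user as the service would show whether the profile location is usable.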
Just for information: this fails when making multiple requests in parallel.
It shows "PDF export not supported." when I try to curl an Excel file, but a Word file works fine.
Thank you.
Have a nice day.
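The message suggests that no PDF export filter was found for the loaded document. LibreOffice ships separate PDF export filters per document family, for example `writer_pdf_Export`, `calc_pdf_Export` and `impress_pdf_Export`. The sketch below picks a filter by file extension. The mapping and the `pdf_filter_for` helper are illustrative assumptions, not the project's actual selection logic.

```python
import os

# LibreOffice PDF export filter names, keyed by document family.
PDF_EXPORT_FILTERS = {
    "writer": "writer_pdf_Export",
    "calc": "calc_pdf_Export",
    "impress": "impress_pdf_Export",
}

# Assumed (incomplete) mapping from file extension to document family.
EXTENSION_FAMILIES = {
    ".doc": "writer", ".docx": "writer", ".odt": "writer",
    ".xls": "calc", ".xlsx": "calc", ".ods": "calc",
    ".ppt": "impress", ".pptx": "impress", ".odp": "impress",
}


def pdf_filter_for(filename):
    """Return the PDF export filter name for `filename`, or None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    family = EXTENSION_FAMILIES.get(ext)
    if family is None:
        return None
    return PDF_EXPORT_FILTERS[family]
```

A `None` result corresponds to the "not supported" case: the document family could not be determined, so no filter can be chosen.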
Let's say we provide an `outputFormat` param. `outputFormat` could be `png` or `jpg`, generated from the .pdf file.
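A minimal sketch of how such a parameter might be validated on the server side. The function name, the `pdf` default, and the treatment of `jpeg` as an alias for `jpg` are all assumptions made for illustration; the service does not necessarily accept this parameter today.

```python
# Formats the hypothetical outputFormat parameter would accept.
ALLOWED_OUTPUT_FORMATS = {"pdf", "png", "jpg"}


def parse_output_format(value, default="pdf"):
    """Normalize an outputFormat value or raise ValueError if unsupported."""
    if value is None or not value.strip():
        return default
    # Accept "PNG", ".png", " png " and similar spellings.
    fmt = value.strip().lower().lstrip(".")
    if fmt == "jpeg":
        fmt = "jpg"
    if fmt not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported outputFormat: {value!r}")
    return fmt
```

Rejecting unknown values early keeps the conversion step from having to guess what the caller meant.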