unoconv / unoserver

License: MIT License

Python 98.29% Makefile 1.71%

unoserver's Introduction

Automated conversion and styling using LibreOffice

Unoconv is deprecated

Please note that there is a rewrite of Unoconv called "Unoserver": https://github.com/unoconv/unoserver/

We are running Unoserver successfully in production, and it’s now the recommended solution.

Unoserver does not have all the features of Unoconv. Which features it gets depends on what people want and whether someone is willing to implement them.

Until Unoserver has all the major features people need, Unoconv is in bugfix mode: there will be no major changes. Once Unoserver has the major features of Unoconv, Unoconv will become unsupported.

Unoconv

Universal Office Converter (unoconv) is a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export. It uses LibreOffice's UNO bindings for non-interactive conversion of documents.

For practical reasons we mention LibreOffice, but OpenOffice is supported by unoconv as well.

Installing unoconv

unoconv can be installed using packages coming from your distribution, or simply by copying the unoconv python script to your system.

If you installed unoconv by hand, make sure you have the required LibreOffice or OpenOffice packages installed. A hard requirement is the UNO python bindings which are often inside a subpackage named libreoffice-pyuno or libobasis4.4-pyuno.

Various sub-packages are needed for specific import or export filters, e.g. XML-based filters require the xsltfilter subpackage, e.g. libobasis4.4-xsltfilter.

Important

Neglecting these requirements will cause unoconv to fail with unhelpful and confusing error messages.

To find which Python installation to use to run unoconv, you can try this script:

cd /tmp
wget -O find_uno.py https://gist.githubusercontent.com/regebro/036da022dc7d5241a0ee97efdf1458eb/raw/1bc0655423d196acd79a5d9fa60d2baada8dd534/find_uno.py
python3 find_uno.py

It should list all Pythons that have Libreoffice libraries installed.
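The underlying idea can be sketched in a few lines of Python. This is not the code of the gist above, only a minimal illustration under the assumption that "usable" means the interpreter can import the uno and unohelper modules; the helper name can_import is hypothetical.

```python
import subprocess


def can_import(interpreter, modules=("uno", "unohelper"), timeout=30.0):
    """Return True if `interpreter` can import every module in `modules`."""
    code = "import " + ", ".join(modules)
    try:
        result = subprocess.run(
            [interpreter, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter missing, not executable, or hanging.
        return False
    return result.returncode == 0
```

Calling can_import on each candidate interpreter (for example /usr/bin/python3 or the python binary in the LibreOffice 'program' directory) shows which ones are able to load the UNO bindings.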

How does unoconv work?

unoconv starts its own office instance (if it cannot find an existing listener) that it then uses. There are some challenges to do this correctly, but in general this works fine.

Typically you would convert an ODT document to PDF by running:

unoconv -f pdf some-file.odt

Start your own unoconv listener

However, you can always start an instance yourself at the default port 2002 (or specify another port with -p/--port) and after use you can tear it down:

unoconv --listener &
sleep 20
unoconv -f pdf *.odt
unoconv -f doc *.odt
unoconv -f html *.odt
kill -15 %-
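The fixed `sleep 20` above is a guess at how long the listener needs to start. A possible alternative, sketched here in Python, is to poll the listener port until it accepts TCP connections. The helper name wait_for_port is hypothetical, and an open port only suggests that the office instance is up, it does not prove that it is ready to convert.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=2002, timeout=60.0, interval=0.5):
    """Poll until host:port accepts a TCP connection; False if timeout passes first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the host is unreachable); retry.
            time.sleep(interval)
    return False
```

A wrapper script could start `unoconv --listener`, call wait_for_port(), and only then run the conversions.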

It is also possible to use a listener or LibreOffice instance that accepts connections on another system and use it from unoconv remotely. This way the conversion tasks are performed on a dedicated system instead of on the client system. This works only if you have a shared filesystem mounted at the same location.

Python and pyuno incompatibilities

Beware that the pyuno python module needs to be compiled with the exact same version of python that you are using to load it. A lot of people that run into problems loading pyuno are actually using a precompiled LibreOffice that they downloaded somewhere and is incompatible with the python version on their system.

To solve this issue, the office suite ships with its own python interpreter, located in the 'program' directory. This one should work flawlessly.

The most recent unoconv works around this issue by automatically detecting incompatibilities, and restarting itself using a compatible python (the same one that ships with LibreOffice).

You can influence the automatic detection by setting the UNO_PATH environment variable to point to an alternative LibreOffice installation, e.g.:

UNO_PATH=/opt/libreoffice4.4 unoconv -f pdf some-file.odt

But you can also force another python by using it to execute unoconv, e.g.:

/opt/libreoffice4.4/program/python.bin unoconv -f pdf some-file.odt

or on macOS:

/Applications/LibreOffice.app/Contents/MacOS/python unoconv -f pdf some-file.odt

or on Windows:

"C:\Program Files (x86)\LibreOffice 4.4\program\python.exe" unoconv -f pdf some-file.odt

Tip: If you plan to use unoconv extensively (or in an automated fashion), it is more efficient to use the correct python interpreter directly, or even to put it in the shebang (the first line) of the unoconv script.

Using unoconv with no X display

Since OpenOffice 2.3 you do not need an X display for starting ooffice. However, you may need the openoffice.org-headless package from your distribution. Since OpenOffice 2.4 nothing special is needed; running in headless mode does not require X.

For any older OpenOffice releases, remember that ooffice requires an X display, even when using it in headless mode. One solution is to use Xvfb to create a headless X display for ooffice.

Using unoconv with macOS

LibreOffice 3.6.0.1 or later is required to use unoconv under macOS. This is the first version distributed with an internal python script that works. No version of OpenOffice for macOS (3.4 is the current version) works because the necessary internal files are not included inside the application.

Problems running unoconv from Nginx/Apache/PHP

Some people have had difficulties using unoconv through webservices. Here is a list of probable causes and recommendations:

Use the latest version of unoconv (or GitHub master branch)
Use the most recent stable release of LibreOffice (less memory, more stable, fewer crashes)
Use the native LibreOffice python binary to run unoconv
Hardcode this native python path in the unoconv script shebang (or ensure PATH is set)
Ensure that the user running unoconv has write access to its HOME directory (ensure HOME is set)
Test with SELinux in permissive mode

It is recommended to open the unoconv script and modify the very first line to point directly to your installed LibreOffice python binary, so replace this:

#!/usr/bin/env python

with something like this:

#!/opt/libreoffice4.4/program/python
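If the edit has to be repeated (for example after every upgrade), it can be scripted. The following is a minimal sketch, not part of unoconv; the function name replace_shebang is hypothetical, and the interpreter path is simply the example path used above.

```python
def replace_shebang(script_text, interpreter):
    """Return script_text with its first line set to a shebang for `interpreter`."""
    new_first_line = "#!" + interpreter
    first_line, separator, rest = script_text.partition("\n")
    if first_line.startswith("#!"):
        # Swap the existing shebang, keep everything else untouched.
        return new_first_line + separator + rest
    # No shebang present: prepend one.
    return new_first_line + "\n" + script_text
```

To apply it, read the unoconv script into a string, pass it through replace_shebang(text, "/opt/libreoffice4.4/program/python"), and write the result back.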

Conversion problems

If you encounter problems converting files, it often helps to try again. If you are using a listener, restarting the listener may help as well.

The reasons for conversion failures are unclear, and the failures are not deterministic. unoconv is not the only project to have noticed problems with import and export filters using PyUNO. We assume these are related to internal state or timing issues that cause failures under certain conditions.

If you can reproduce the problem on a specific file, please take the time to open the file in LibreOffice directly and export it to the desired format. If this fails, it needs to be reported to the LibreOffice project directly. If that works, we need to know!

We are looking into this with the LibreOffice developers to:

Collaborate closer to find, report and fix unexpected failures
Allow end-users to increase debugging and improve reporting to the project

Troubleshooting instructions

If you encounter a problem with converting documents using unoconv, please consider that this could be caused by a number of things:

incomplete LibreOffice installation
LibreOffice bug or regression specific to your version/distribution
LibreOffice import or export filter issue
problem related to stale lock files
problem related to the source document
problem related to permissions or SELinux
problem related to the python UNO bindings
problem related to the unoconv python script

It is recommended to follow all of the below steps to pinpoint the problem:

if this is the first time you are using LibreOffice/OpenOffice, make sure you have all the required sub-packages installed, depending on the distribution this could be the xsltfilter, headless, writer, calc, impress or draw sub-packages.
check if there is no existing LibreOffice process running on the system that could interfere with proper functioning
```
# pgrep -l 'office|writer|calc'
```
check that there are no stale lock files present, e.g. '.~lock.file.pdf#' or '.~lock.index.html#'
check that the LibreOffice instance handling UNO requests is not handling multiple requests at the same time
try using the latest unoconv release, or the latest version on Github at: https://github.com/dagwieers/unoconv/downloads
try the conversion by opening the file in LibreOffice and exporting it through LibreOffice directly
try unoconv with a different minor or major LibreOffice version to test whether it is a regression in LibreOffice
try to load the UNO bindings in python manually:
- do this with the python executable that ships with the LibreOffice package/installer
  # /opt/libreoffice4.4/program/python.bin -c 'import uno, unohelper'
- or alternatively, run the distribution python (with the distribution LibreOffice)
  # python -c 'import uno, unohelper'

try unoconv with a different python interpreter manually:

# /opt/libreoffice4.4/program/python.bin unoconv -f pdf test-file.odt
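The lock file check in the steps above can be sketched in Python as follows. The pattern is derived from the two example names given ('.~lock.file.pdf#' and '.~lock.index.html#'); the function name find_lock_files is hypothetical. The function only lists candidates: it cannot tell whether a lock is stale or belongs to a running LibreOffice instance, so check for running processes before deleting anything.

```python
from pathlib import Path


def find_lock_files(directory):
    """Return LibreOffice-style lock files ('.~lock.<name>#') directly inside `directory`."""
    return sorted(Path(directory).glob(".~lock.*#"))
```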

If you tried all of the above, and the issue still remains, the issue might still be related to import/export filters, LibreOffice or unoconv, so please report any information to reproduce the problem on the Github issue-tracker at: https://github.com/dagwieers/unoconv/issues

And do mention that you already tried the above hints to troubleshoot the issue.

Interesting information

If you’re interested in helping out with development, here are some pointers to interesting sources:

[Tutorial] Import uno module to a different Python install http://user.services.openoffice.org/en/forum/viewtopic.php?f=45&t=36370&p=166783
UDK: UNO Development Kit http://udk.openoffice.org/
Python-UNO bridge http://www.openoffice.org/udk/python/python-bridge.html
Python and OpenOffice.org http://wiki.services.openoffice.org/wiki/Python
OpenOffice.org developer manual http://api.openoffice.org/DevelopersGuide/DevelopersGuide.html
Framework/Article/Filter/FilterList OOo 2 1 http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_2_1
Framework/Article/Filter/FilterList OOo 3 0 http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0

Other implementations

Other implementations using python and UNO:

unoserver https://github.com/unoconv/unoserver/
convwatch http://cgit.freedesktop.org/libreoffice/core/tree/bin/convwatch.py
oooconv https://svn.infrae.com/oooconv/trunk/src/oooconv/filters.py
officeshots.org http://code.officeshots.org/trac/officeshots/browser/trunk/factory/src/backends/oooserver.py
cloudooo http://svn.erp5.org/erp5/trunk/utils/cloudooo.handler/ooo/cloudooo/handler/ooo/

Other tools that are useful or similar in operation:

Text based document generation: http://www.methods.co.nz/asciidoc/
DocBook to OpenDocument XSLT: http://open.comsultia.com/docbook2odf/
Simple (and stupid) converter from OpenDocument Text to plain text: http://stosberg.net/odt2txt/
Another python tool to aid in converting files using UNO: http://www.artofsolving.com/files/DocumentConverter.py http://www.artofsolving.com/opensource/pyodconverter

unoserver's Issues

xlsx to pdf convert - consider the local date format

I managed to convert an xlsx file to pdf on my remote Ubuntu machine (no UI). (Great tool btw!)

But one minor issue remains:
In the xlsx file I defined a border that contains the current date (&d).
When I open the file in LibreOffice the date is correctly displayed in the local format (German, 17.03.2023).
However, on the server where I'm running unoconvert, the date in the pdf file is rendered in the American format (03/17/2023).
I checked the timezone settings on the server with timedatectl and got Europe/Berlin, which is correct.

So how can I tell LibreOffice to use the local setting (headless, via command line/PuTTY)?
Or how can I tell unoconvert to use this timezone?

Thank you very much in advance!

Convert works fine in LibreOffice but unoconvert throws errors

pi@pibooks: ~/convert/in $ unoconvert --convert-to docx /home/pi/convert/in/John\ Galsworthy\ -\ [Sfarsit\ de\ capitol]\ 01\ In\ asteptare\ #2.0~5.doc /home/pi/convert/out/
INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening /home/pi/convert/in/John Galsworthy - [Sfarsit de capitol] 01 In asteptare #2.0~5.doc
INFO:unoserver:Exporting to /home/pi/convert/out/
INFO:unoserver:Using MS Word 2007 XML export filter
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/unoserver/converter.py", line 247, in main
    result = converter.convert(
  File "/usr/local/lib/python3.9/dist-packages/unoserver/converter.py", line 201, in convert
    document.storeToURL(export_path, output_props)
unoserver.converter.IOException: SfxBaseModel::impl_store file:///home/pi/convert/out failed: 0x11b(Error Area:Io Class:Abort Code:27) ./sfx2/source/doc/sfxbasemodel.cxx:3153 ./sfx2/source/doc/sfxbasemodel.cxx:1735

Am I doing something wrong?

Using the UserInstallation env breaks soffice.bin execution

As in the title.
I'm running unoserver against a custom soffice.bin (cp-22.05.06).
It would be nice to be able to pass or customize the default command line or, at least, to remove the UserInstallation env.
Without it, a USER/.config/core/4 folder (not tied to /tmp) is created and soffice.bin continues its execution flawlessly.

Regards

Possibility to get page count and page sizes of a document?

Is there any way to obtain the number of pages + page sizes a LibreOffice-compatible document has, before converting it?

For some background, I am working on a private Python PDF tool that follows a concept of importing documents into a custom data model (currently PDFs or images), applying transformations and then crafting a new PDF according to this model.
I would like to use unoserver to integrate importing/conversion support for LO-compatible documents. However, a key requirement for my data model is that information on page count and page sizes is available before actually performing the conversion.

Multiple processes use 100% CPU

I'm using unoserver in pandora to convert office docs into PDFs and it all works very well, thank you for the tool!
The issue I have is that after a while, I have a whole bunch of soffice.bin processes using 100% CPU, even if no conversion is happening (but I can still convert files as expected). I need to kill them manually; simply stopping unoserver doesn't kill them.

Do you have any idea how I could debug that? I noticed it first on a machine where AppArmor is enabled, but it seems to also happen without the profile.

I just noted that the problem mostly occurs for me on an ubuntu 20.04 server, but not on my dev box running Ubuntu 21.10. Could it be that the version of libreoffice on ubuntu 20.04 is not handling the signal properly?

And I'm thinking of another way to handle the case: right now, I'm just launching unoserver with Popen, but I should use the UnoServer class instead and launch it with start, so that I have the libreoffice process object and can check it and kill it if needed.
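Independently of the UnoServer class, one generic way to make sure that child processes do not outlive the launcher is to start the command in its own process group and signal the whole group on shutdown. The sketch below is POSIX-only and uses hypothetical helper names. It assumes soffice.bin stays in the process group of the process that started it, which has not been verified here.

```python
import os
import signal
import subprocess


def start_in_own_group(command):
    """Start `command` as the leader of a new session and process group."""
    return subprocess.Popen(command, start_new_session=True)


def stop_group(process, grace=10.0):
    """Send SIGTERM to the whole group, then SIGKILL if it is still alive."""
    # With start_new_session=True the group id equals the leader's pid.
    try:
        os.killpg(process.pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # Nothing left in the group.
    try:
        process.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(process.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        process.wait()
```

For example, start_in_own_group(["unoserver"]) returns the Popen object, and stop_group() on that object later signals unoserver together with any children still in its group.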

Originally posted by @Rafiot in #19 (comment)

install issue

Hi there, great project thank you, one small issue here. When testing install and usage on Linux Mint 20.3 by following the install instructions in the readme.md, I found that the two installed scripts ended up with a top line shebang that looks like:

#!/opt/libreoffice7.2/program/python.bin

..and then found that the two installed scripts refuse to work unless changed to:

#!/opt/libreoffice7.2/program/python

I'm testing with Libreoffice v7.2.7 stable.

Not a showstopper but any idea what might be happening to cause this?

Ship systemd unit file with unoserver

It's likely that the most common use case would be to run unoserver in the background as a daemon. Shipping a systemd file (either packaged or in the README) would allow users to easily implement this.

Adding systemd support to the code itself (such as sending READY=1 at start time) would allow for using systemd's full potential, i.e. by using the 'Notify' type.
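For illustration, the readiness notification mentioned above is a small datagram protocol: systemd passes a socket address in the NOTIFY_SOCKET environment variable and the service sends it a message such as READY=1. The following is a minimal sketch of that protocol, not code from unoserver; the function name sd_notify and the decision about when to call it (for instance once the listener port accepts connections) are assumptions.

```python
import os
import socket


def sd_notify(state="READY=1"):
    """Send `state` to systemd's notification socket; False if not run under systemd."""
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(address)
        sock.sendall(state.encode("utf-8"))
    return True
```

When the process is started outside systemd, NOTIFY_SOCKET is unset and the function is a no-op, so the same code path works in both cases.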

desktop.loadComponentFromURL works slow

Hi all.
I'm trying to convert an MS Word (.doc) file to txt. The output is fine but it takes ages. After some digging (adding timing to the uno-related calls), it seems self.desktop.loadComponentFromURL is very slow. I import unoserver.converter and run convert several times; the first five self.desktop.loadComponentFromURL calls took 19.27s, 50.96s, 83.38s, 114.64s and 144.44s. Am I doing something wrong?

Parallel Unoserver with different ports hangs/gets stuck

Hi, not sure if appropriate to post here or not, please close/remove if so.

I'm trying to parallelize the conversion of files to PDF using unoserver based on this on the bottom of the page:
You should be able to on a multi-core machine run several unoservers with different ports. There is however no support for any form of load balancing in unoserver, you would have to implement that yourself in your usage of unoconverter.

What I did:
I'm using joblib to parallelize using cores and assigning each instance an id value. Each instance does an
os.system("unoserver --port " + str(port))
I see this as the output but it hangs there and doesn't proceed to the rest of the code where it'd start the actual conversion.

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: libreoffice --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmp35cy3xi9 --accept=socket,host=127.0.0.1,port=2000,tcpNoDelay=1;urp;StarOffice.ComponentContext
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: libreoffice --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmp9n0aw9aj --accept=socket,host=127.0.0.1,port=2001,tcpNoDelay=1;urp;StarOffice.ComponentContext

^You can see I'm using different ports - 2000 and 2001
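A likely explanation for the hang is that os.system() blocks until the command exits, and unoserver keeps running in the foreground, so the code after the call is never reached. A minimal sketch of a non-blocking start with subprocess.Popen follows. The function name start_servers and its base_command parameter are hypothetical; only the --port flag is taken from the command above.

```python
import subprocess


def start_servers(ports, base_command=("unoserver",)):
    """Start one server process per port without waiting for any of them to exit."""
    processes = {}
    for port in ports:
        command = [*base_command, "--port", str(port)]
        # Popen returns immediately; the server keeps running in the background.
        processes[port] = subprocess.Popen(command)
    return processes
```

The returned mapping of port to Popen object can then be used to hand conversions to the matching port and to terminate the servers when the batch is done.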

How to go about parallelizing unoserver conversions?

dev environment: AWS Ubuntu instance, Python 3

Thanks

REST API implementation

I am planning to use unoserver with FastAPI to create a document conversion service.
I do not want to call unoconvert as a shell command. Can I get an example of how to use unoconvert natively in Python, since FastAPI is written in Python?

Can I simply instantiate the UnoConverter class from converter.py?

Cancel ongoing conversion

Hello 👋

Thanks for maintaining unoconv and working on this new tool 👍

Q: How to cancel an ongoing conversion with unoserver?

Currently, in Gotenberg, I kill the unoconv process, which also kills the child processes. Would killing the unoconvert process be enough?

Concurrent conversions

Hello 👋

Currently, with unoconv, I'm using the --user-profile and --port flags in order to handle concurrent conversions.

With unoserver, I guess I'll need to implement a sort of lock mechanism to prevent concurrent calls to unoserver.

Yet, I would still need to specify the user profile directory so that I'll be able to clean it on "force kill" (see also #8).

Would it be possible to add this flag to unoserver?
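The lock mechanism mentioned above could, for example, be an advisory file lock that serializes callers across processes. This is a POSIX-only sketch with a hypothetical helper name and lock path; it is not something unoserver provides.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on `lock_path` for the duration of the block."""
    with open(lock_path, "w") as handle:
        # Blocks until no other process (or open file) holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Each caller would wrap its unoconvert invocation in `with exclusive_lock("/tmp/unoserver.lock"):` so that only one conversion talks to the server at a time.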

Stacktrace when converting some files

It's not directly a problem with unoserver, but I have no idea how to efficiently report that to the libreoffice folks/people doing packaging for ubuntu.

Some files (I can mostly reproduce it with XLS files) cause a stack trace that looks like this:

Fatal exception: Signal 6
Stack:
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x3ffc3)[0x7f80bb86ffc3]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x4013a)[0x7f80bb87013a]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f80bb675090]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f80bb67500b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f80bb654859]
/usr/lib/libreoffice/program/libmergedlo.so(+0x1219b92)[0x7f80bcab2b92]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN11Application5AbortERKN3rtl8OUStringE+0x98)[0x7f80bea12ed8]
/usr/lib/libreoffice/program/libmergedlo.so(+0x21c6026)[0x7f80bda5f026]
/usr/lib/libreoffice/program/libmergedlo.so(+0x3181ec1)[0x7f80bea1aec1]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x18832)[0x7f80bb848832]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x400a7)[0x7f80bb8700a7]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x7f80bb675090]
/usr/lib/libreoffice/program/libmergedlo.so(_ZNK3vcl6Window9GetCursorEv+0x4)[0x7f80be7473a4]
/usr/lib/libreoffice/program/libmergedlo.so(+0x276cfba)[0x7f80be005fba]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN9Scheduler22CallbackTaskSchedulingEv+0x2fb)[0x7f80bea0372b]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN14SvpSalInstance12CheckTimeoutEb+0x10e)[0x7f80beb835ce]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN14SvpSalInstance7DoYieldEbb+0x8b)[0x7f80beb836db]
/usr/lib/libreoffice/program/libmergedlo.so(+0x3179872)[0x7f80bea12872]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN11Application7ExecuteEv+0x45)[0x7f80bea14d35]
/usr/lib/libreoffice/program/libmergedlo.so(+0x21cdc2b)[0x7f80bda66c2b]
/usr/lib/libreoffice/program/libmergedlo.so(_Z10ImplSVMainv+0x51)[0x7f80bea1c731]
/usr/lib/libreoffice/program/libmergedlo.so(soffice_main+0xa3)[0x7f80bda80523]
/usr/lib/libreoffice/program/soffice.bin(+0x10b0)[0x55edfc86e0b0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f80bb656083]
/usr/lib/libreoffice/program/soffice.bin(+0x10ee)[0x55edfc86e0ee]

I think I can mostly reproduce it with libreoffice 7.3, and it is consistently crashing with the version in PPA right now (1:7.3.3~rc2-0ubuntu0.22.04.1~lo1), if I install the -nogui packages, and only in that case.

The solution is to install the full version with GUI support.

documentation: Unclear usage for unoserver to convert documents

It is not currently clear from the README how one uses the unoserver script to actively convert office documents. Usage only covers starting the server from the command line but not how one passes files for conversion to the server. This is more obvious in unoconvert, where infile and outfile are clear arguments.

It would be helpful for users if that usage information were included in the documentation.

RuntimeError: The input document is of an unknown document type.

I'm currently trying to convert an HTML file to PDF, but I get this error:

unoconvert --convert-to pdf "hra-html.html" "hra.pdf"
INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening hra-html.html
Traceback (most recent call last):
  File "/home/www/api.to/venv/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/home/www/api.to/venv/lib/python3.8/site-packages/unoserver/converter.py", line 242, in main
    result = converter.convert(
  File "/home/www/api.to/venv/lib/python3.8/site-packages/unoserver/converter.py", line 160, in convert
    import_type = get_doc_type(document)
  File "/home/www/api.to/venv/lib/python3.8/site-packages/unoserver/converter.py", line 49, in get_doc_type
    raise RuntimeError(
RuntimeError: The input document is of an unknown document type. This is probably a bug.
Please create an issue at https://github.com/unoconv/unoserver

Only support for python3.7+?

Hi, I just ported from unoconv to unoserver. Thank you.
A few things: I had an issue trying to install it in a venv with Python 3.6, and it told me that was not allowed. I checked your install file and it shows 3.7+, but the documentation doesn't mention that.

Also, I don't think sudo is necessary to install the package: if users create a virtual env with --system-site-packages, as you show in your tests, it uses that Python.

Possibility to get a list of supported input mime types or file extensions?

I need some way to check whether a file is of an input type supported by UnoConverter.convert() or not, for there are multiple different importers in my library. Thus, it would be useful to have a list of mime types (or file extensions) that unoserver accepts.
It seems that this functionality was present in unoconv with the Fmt/FmtList classes. Would it be possible to add this back to unoserver?

Unoserver ran once and don't restart

Hello,
I succeeded in running unoserver/unoconverter once, but I have never been able to start the server again after rebooting the machine:

$ sudo unoserver --daemon --executable /usr/lib/libreoffice/program/soffice.bin
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: /usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmpznzt5utn --accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext
<subprocess.Popen object at 0x7fd1c8c1ad60>

$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 8940 160 ? Ssl 08:04 0:00 /init
root 9 0.0 0.0 8948 140 tty1 Ss 08:04 0:00 /init
olivier 10 0.0 0.0 18228 2772 tty1 S 08:04 0:00 -bash
root 76 0.0 0.0 8948 140 tty2 Ss 08:06 0:00 /init
olivier 77 0.0 0.0 18480 3324 tty2 S 08:06 0:01 -bash
olivier 805 0.0 0.0 18664 1888 tty2 R 10:45 0:00 ps aux

$ unoconvert --convert-to pdf test.fodt test.pdf
INFO:unoserver:Starting unoconverter.
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 231, in main
    converter = UnoConverter(args.interface, args.port)
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 74, in __init__
    self.context = self.resolver.resolve(
unoserver.converter.NoConnectException: Connector : couldn't connect to socket (Connection refused) /build/libreoffice-crGbO6/libreoffice-7.3.0~rc3/io/source/connector/connector.cxx:117

This happened in both Ubuntu 20.04.3 WSL and Virtualbox.

Is there anything that could prevent unoserver from restarting?

Thanks for this tool and your help!

xlsx to pdf convert - where to set the quality?

My xlsx file contains a PNG image, which is rendered in poor quality in the PDF.
Where can I set the print quality for the PDF?

Supporting math

Does Unoserver convert HTML with equations (MathML or LaTeX) to docx with editable equations?
Would this be possible?

Can't convert doc to html

Hi, I tested unoconvert with LibreOffice 7.1.7:

unoconvert 97html转换文档.doc 977.html

This causes an error:

INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening 97html转换文档.doc
Traceback (most recent call last):
  File "/root/miniconda3/envs/docvert/bin/unoconvert", line 9, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/docvert/lib/python3.8/site-packages/unoserver/converter.py", line 246, in main
    result = converter.convert(
  File "/root/miniconda3/envs/docvert/lib/python3.8/site-packages/unoserver/converter.py", line 186, in convert
    raise RuntimeError(
RuntimeError: Could not find an export filter from com.sun.star.text.TextDocument to graphic_HTML

But running libreoffice --headless --convert-to html html转换文档.doc succeeds.

separate unoconvert so it can be run from user virtualenv without uno or system-site-packages

This would completely isolate unoconvert and make use of it in virtualenvs safer

unoconvert is already a remote process that doesn't in theory need access to uno directly (via import, etc), and using system-site-packages is a bit fragile - if your virtualenv is not setup correctly you might accidentally import a system package instead of one you thought you had in your virtualenv.

Thanks for great package!

Fail to convert some doc files to pdf files

Hi, thanks for developing and sharing this project.

I recently came across an issue while converting a .doc file to a .pdf file (unfortunately, I can't share doc file due to IP reasons).

The error was: RuntimeError: The input document is of an unknown document type. This is probably a bug.

Though I'm not familiar with the OpenOffice APIs, when I checked document.SupportedServiceNames for that particular file, the result is ('com.sun.star.document.OfficeDocument', 'com.sun.star.text.GenericTextDocument', 'com.sun.star.text.WebDocument').
None of these is listed in DocTypes. When I added com.sun.star.document.OfficeDocument to DocTypes, the new error was RuntimeError: Could not find an export filter from com.sun.star.document.OfficeDocument to pdf_Portable_Document_Format.

When I then checked the DocumentService of the export filters, there was no entry for com.sun.star.document.OfficeDocument or com.sun.star.text.GenericTextDocument, but there was one for com.sun.star.text.WebDocument. So I replaced com.sun.star.document.OfficeDocument in DocTypes with com.sun.star.text.WebDocument, but later found that com.sun.star.text.WebDocument is deprecated.

Fortunately, it worked.

But my concern is: can this project be made to support conversion of any kind of document (.doc, .docx, .odt, .rtf, etc.) to pdf?
The changes I made worked for the documents I have, but I am concerned because:

com.sun.star.text.WebDocument is deprecated
Some other type of .doc file might come along that fails during pdf conversion.

I guess some change in the export_filters query might include DocumentService for other DocTypes.

Thanks!

Conversion from html to rtf not possible

Conversion with libreoffice headless works:

> echo "<p>This is a test</p>" > sample.html
> libreoffice --nofirststartwizard --invisible --headless --convert-to 'RTF:Rich Text Format' sample.html --outdir /tmp/
convert /home/benbss/dev/baessler/PrefUsable/prefusable/sample.html -> /tmp/sample.RTF using filter : Rich Text Format

unoserver using unoconvert gives me:

> echo "<p>This is a test</p>" > sample.html 
> unoconvert --port 12345 --convert-to rtf sample.html /tmp/sample.rtf
INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening sample.html
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 247, in main
    result = converter.convert(
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 185, in convert
    raise RuntimeError(
RuntimeError: Could not find an export filter from com.sun.star.text.WebDocument to writer_Rich_Text_Format

I figured out that there is no such filter defined in com.sun.star.document.FilterFactory.
Can anyone give me a hint on how to get support for this conversion?

How can i use custom filter in convertation?

Hello! Nice job!

I want to use the HTML (StarWriter):EmbedImages filter for conversion, like this:
libreoffice --convert-to "html:HTML (StarWriter):EmbedImages" --outdir outdir_path file.docx

How can I use this filter with unoconvert?

My attempt:
unoconvert --convert-to html:HTML (StarWriter):EmbedImages file.docx outdir_path
doesn't work:
RuntimeError: Unknown export file type, unknown extension 'html:HTML (StarWriter):EmbedImages'

RAM usage & process is killed

We've recently migrated from unoconv to unoserver/unoconvert. Directly after migration we're seeing a spike in RAM usage on the soffice.bin process, after a while using 14GB RAM, and the unoserver process is killed.

What kind of information is needed to debug this further?

We're running unoserver 1.2

Fatal exception: Signal 6

I mass convert docx to pdfs using find terminal command and unoserver+unoconvert. I've installed using pip in virtualenv with the commands provided in the Readme.

It started well and converted a few documents, but then the following error occurred. Is it possible that find is pushing the files to unoserver too quickly?

E: lt_string_value: assertion `string != ((void *)0)' failed
E: lt_string_value: assertion `string != ((void *)0)' failed
E: lt_string_value: assertion `string != ((void *)0)' failed
E: lt_string_value: assertion `string != ((void *)0)' failed
Application Error


Fatal exception: Signal 6
Stack:
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x3d523)[0x7f0b020d9523]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x3d733)[0x7f0b020d9733]
/lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7f0b01ef7840]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f0b01ef77bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f0b01ee2535]
/usr/lib/libreoffice/program/libmergedlo.so(+0x11ed84c)[0x7f0b032e784c]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN11Application5AbortERKN3rtl8OUStringE+0x90)[0x7f0b04f4bc90]
/usr/lib/libreoffice/program/libmergedlo.so(+0x1f078a7)[0x7f0b040018a7]
/usr/lib/libreoffice/program/libmergedlo.so(+0x2e5725b)[0x7f0b04f5125b]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x17762)[0x7f0b020b3762]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x3d5ff)[0x7f0b020d95ff]
/lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7f0b01ef7840]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN19LogicalFontInstance7AcquireEv+0x3)[0x7f0b04f95913]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN16GenericSalLayoutC1ER19LogicalFontInstance+0x8b)[0x7f0b04ed784b]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN15CairoTextRender13GetTextLayoutER14ImplLayoutArgsi+0x30)[0x7f0b04fe6e20]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN14SvpSalGraphics13GetTextLayoutER14ImplLayoutArgsi+0x21)[0x7f0b04fe6a81]
/usr/lib/libreoffice/program/libmergedlo.so(+0x2cd606b)[0x7f0b04dd006b]
/usr/lib/libreoffice/program/libmergedlo.so(_ZNK12OutputDevice12GetTextArrayERKN3rtl8OUStringEPliiPKN3vcl15TextLayoutCacheEPK9SalLayout+0x1ae)[0x7f0b04dd04fe]
/usr/lib/libreoffice/program/libmergedlo.so(_ZNK12OutputDevice12GetTextWidthERKN3rtl8OUStringEiiPKN3vcl15TextLayoutCacheEPK9SalLayout+0x15)[0x7f0b04dd05c5]
/usr/lib/libreoffice/program/libswlo.so(+0x81c48d)[0x7f0ab972648d]
/usr/lib/libreoffice/program/libswlo.so(+0x837e8a)[0x7f0ab9741e8a]
/usr/lib/libreoffice/program/libswlo.so(+0x7a8369)[0x7f0ab96b2369]
/usr/lib/libreoffice/program/libswlo.so(+0x7a6155)[0x7f0ab96b0155]
/usr/lib/libreoffice/program/libswlo.so(+0x7df06f)[0x7f0ab96e906f]
/usr/lib/libreoffice/program/libswlo.so(+0x7c05af)[0x7f0ab96ca5af]
/usr/lib/libreoffice/program/libswlo.so(+0x7c1759)[0x7f0ab96cb759]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwTextFrame10FormatLineER15SwTextFormatterb+0xa6)[0x7f0ab96a4a16]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwTextFrame7Format_ER15SwTextFormatterR16SwTextFormatInfob+0x43f)[0x7f0ab96a829f]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwTextFrame7Format_EP12OutputDeviceP13SwParaPortion+0x363)[0x7f0ab96a9103]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwTextFrame6FormatEP12OutputDevicePK13SwBorderAttrs+0x676)[0x7f0ab96a9a26]
/usr/lib/libreoffice/program/libswlo.so(+0x6be669)[0x7f0ab95c8669]
/usr/lib/libreoffice/program/libswlo.so(_ZN7SwFrame11PrepareMakeEP12OutputDevice+0x326)[0x7f0ab95c5dd6]
/usr/lib/libreoffice/program/libswlo.so(+0x6f2a11)[0x7f0ab95fca11]
/usr/lib/libreoffice/program/libswlo.so(+0x6f32ef)[0x7f0ab95fd2ef]
/usr/lib/libreoffice/program/libswlo.so(+0x6bcdf5)[0x7f0ab95c6df5]
/usr/lib/libreoffice/program/libswlo.so(_ZN7SwFrame11PrepareMakeEP12OutputDevice+0x326)[0x7f0ab95c5dd6]
/usr/lib/libreoffice/program/libswlo.so(+0x7023f0)[0x7f0ab960c3f0]
/usr/lib/libreoffice/program/libswlo.so(+0x706ce1)[0x7f0ab9610ce1]
/usr/lib/libreoffice/program/libswlo.so(+0x6e93d7)[0x7f0ab95f33d7]
/usr/lib/libreoffice/program/libswlo.so(+0x6ff839)[0x7f0ab9609839]
/usr/lib/libreoffice/program/libswlo.so(+0xa4479d)[0x7f0ab994e79d]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwViewShellC2ER5SwDocPN3vcl6WindowEPK12SwViewOptionP12OutputDevicel+0x28c)[0x7f0ab994ebdc]
/usr/lib/libreoffice/program/libswlo.so(_ZN13SwCursorShellC1ER5SwDocPN3vcl6WindowEPK12SwViewOption+0x35)[0x7f0ab936c055]
/usr/lib/libreoffice/program/libswlo.so(_ZN11SwEditShellC1ER5SwDocPN3vcl6WindowEPK12SwViewOption+0x22)[0x7f0ab954b5a2]
/usr/lib/libreoffice/program/libswlo.so(_ZN9SwFEShellC1ER5SwDocPN3vcl6WindowEPK12SwViewOption+0x9)[0x7f0ab95a0739]
/usr/lib/libreoffice/program/libswlo.so(_ZN10SwWrtShellC1ER5SwDocPN3vcl6WindowER6SwViewPK12SwViewOption+0x2f)[0x7f0ab9c93b0f]
/usr/lib/libreoffice/program/libswlo.so(_ZN6SwViewC1EP12SfxViewFrameP12SfxViewShell+0xed0)[0x7f0ab9bd2840]
/usr/lib/libreoffice/program/libswlo.so(_ZN6SwView14CreateInstanceEP12SfxViewFrameP12SfxViewShell+0x25)[0x7f0ab9bd4855]
/usr/lib/libreoffice/program/libmergedlo.so(_ZN12SfxBaseModel20createViewControllerERKN3rtl8OUStringERKN3com3sun4star3uno8SequenceINS6_5beans13PropertyValueEEERKNS7_9ReferenceINS6_5frame6XFrameEEE+0x141)[0x7f0b03f2dbd1]
/usr/lib/libreoffice/program/libmergedlo.so(+0x1ec983a)[0x7f0b03fc383a]
/usr/lib/libreoffice/program/libmergedlo.so(+0x18a49ca)[0x7f0b0399e9ca]
/usr/lib/libreoffice/program/libmergedlo.so(+0x18a62d6)[0x7f0b039a02d6]
/usr/lib/libreoffice/program/libmergedlo.so(+0x18a66de)[0x7f0b039a06de]
/usr/lib/libreoffice/program/libmergedlo.so(+0x18c0b15)[0x7f0b039bab15]
/usr/lib/libreoffice/program/libgcc3_uno.so(+0x8980)[0x7f0afa92b980]
/usr/lib/libreoffice/program/libgcc3_uno.so(+0x7e06)[0x7f0afa92ae06]
/usr/lib/libreoffice/program/libgcc3_uno.so(+0x82ee)[0x7f0afa92b2ee]
/usr/lib/libreoffice/program/libbinaryurplo.so(+0x15abe)[0x7f0af976eabe]
/usr/lib/libreoffice/program/libbinaryurplo.so(+0x1626e)[0x7f0af976f26e]
/usr/lib/libreoffice/program/libbinaryurplo.so(+0x1a13e)[0x7f0af977313e]
/usr/lib/libreoffice/program/libuno_cppu.so.3(+0x7eff)[0x7f0aff506eff]
/usr/lib/libreoffice/program/libuno_cppu.so.3(+0x8390)[0x7f0aff507390]
/usr/lib/libreoffice/program/libuno_cppu.so.3(+0x8fca)[0x7f0aff507fca]
/usr/lib/libreoffice/program/libuno_sal.so.3(+0x402a8)[0x7f0b020dc2a8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f0b01382fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f0b01fb94cf]

And here is the error in the converter:

Traceback (most recent call last):
  File "/home/xxxx/.local/lib/python3.7/site-packages/unoserver/converter.py", line 201, in convert
    document.storeToURL(export_path, output_props)
unoserver.converter.DisposedException: Binary URP bridge disposed during call

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxx/.local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/home/xxxxx/.local/lib/python3.7/site-packages/unoserver/converter.py", line 248, in main
    inpath=args.infile, outpath=args.outfile, convert_to=args.convert_to
  File "/home/xxxxx/.local/lib/python3.7/site-packages/unoserver/converter.py", line 204, in convert
    document.close(True)
uno.RuntimeException: illegal object given!

Unoconvert & unoserver hanging/freezes

I ran into the following problem:
I have a file that causes unoconvert to hang (file attached).

unoconvert --convert-to pdf '/root/4e9e29569d0b4ae09f5f52b762a83e12.jpg' '/root/4e9e29569d0b4ae09f5f52b762a83e12.pdf'

This is generally not a problem, since I have written a Python wrapper that interrupts the unoconvert process after a timer of ~N seconds.
The main problem is that interrupting the unoconvert command does not interrupt the unoserver process itself (the frozen conversion keeps running), and all attempts to send another file hang too.

Debug data:

LibreOffice 6.4.7.2 40(Build:2)
Python 3.8.10
unoserver==1.2

Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal
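
One possible workaround for the situation described above is to let the wrapper own the server process too, and replace it whenever a conversion times out. The following is only a sketch under assumptions: the class name ServerSupervisor is hypothetical, it is POSIX-only (process groups), and it assumes that killing the server's whole process group also removes the LibreOffice process it started.

```python
import os
import signal
import subprocess


class ServerSupervisor:
    """Owns a long-running server process and replaces it when a job hangs."""

    def __init__(self, server_cmd):
        self.server_cmd = list(server_cmd)
        self.proc = None
        self.restarts = 0

    def start(self):
        # A new session gives the server its own process group, so helper
        # processes it spawns can be signalled together with it.
        self.proc = subprocess.Popen(self.server_cmd, start_new_session=True)

    def stop(self):
        if self.proc is not None and self.proc.poll() is None:
            try:
                os.killpg(os.getpgid(self.proc.pid), signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group vanished between poll() and killpg()
            self.proc.wait()
        self.proc = None

    def run_job(self, job_cmd, timeout):
        """Run one client command; restart the server if it times out.

        Returns the client's exit code, or None after a timeout.
        """
        try:
            return subprocess.run(job_cmd, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed the client at this point.
            self.stop()
            self.start()
            self.restarts += 1
            return None
```

In the setting of this issue, the supervisor would be created with ["unoserver"] and each job would be an argument list such as ["unoconvert", "--convert-to", "pdf", infile, outfile]. A real deployment would also need to wait until the restarted server accepts connections before sending the next file.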

feature req: restore wildcards, empty [outfile]= reuse filename with new extension

Hi, thanks for modernizing unoconv. I have a script that I need for my work that I used to be able to run on unoconv. unoconv no longer works because of the deprecation of distutils. unoconvert does not yet have the features I need to keep my work up to date.

I can get unoconvert to run, but I lose the ability to input wildcards for infile and output the same names to outfile with a new extension. The command in unoconv:

unoconv -f pdf -e SelectPdfVersion=1 -e ExportNotes=false '*.odt'

gives error

/usr/bin/unoconv:860: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if product.ooName not in ('LibreOffice', 'LOdev') or LooseVersion(product.ooSetupVersion) <= LooseVersion('3.3'):

Thanks for your consideration.
Using Manjaro 21.3 Cinnamon, 5.42, libreoffice fresh 7.4.2-2
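
Until something like this exists in unoconvert itself, the wildcard and "same name, new extension" behaviour can be approximated in a small wrapper. This is a sketch only; plan_conversions is a hypothetical helper name, and it does not cover the -e export options from the unoconv command above.

```python
from pathlib import Path


def plan_conversions(directory, pattern, new_suffix):
    """Pair every file matching `pattern` with an output path that keeps the
    file name and swaps the extension (e.g. report.odt -> report.pdf)."""
    return [
        (src, src.with_suffix(new_suffix))
        for src in sorted(Path(directory).glob(pattern))
    ]


# Possible use, one unoconvert call per file:
# for src, dst in plan_conversions(".", "*.odt", ".pdf"):
#     subprocess.run(
#         ["unoconvert", "--convert-to", "pdf", str(src), str(dst)],
#         check=True,
#     )
```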

Dockerizing Unoserver

Much like the scenario described in #17 , I am planning a service that hooks to our platform for document conversion.

Ideally, we provide these services dockerized, in order to minimize the complexity of installation for those users that self-host our platform (which is open source).

Do you have any hints / tips / procedures that could help in the process of dockerizing Unoserver (including Libreoffice in the way Unoserver requires it)?

Thanks for any insight.

Problems with Unoserver

We are new to Python and are currently using unoconv to convert .odt to PDF.

We installed the latest new unoserver/unoconvert packages.

We set $PYTHONPATH to /usr/local/lib/python3.9/site-packages/uno

When we run the following command:
unoconvert -h pdf CONVERT_TO /docdata/tex/templates/generatetmp/2v5bolq.odt /docdata/tex/templates/generatetmp/2v5bolq.pdf

we get lots of errors from the print statements in base.py. The print statements looked like this:

print 'new attri:' and shouldn't they be print('new attrib:') (with parentheses)?

I fixed the print errors and now I am getting:

/usr/local/lib/python3.9/site-packages/uno$ unoconvert -h pdf CONVERT_TO /docdata/tex/templates/generatetmp/2v5bolq.odt /docdata/tex/templates/generatetmp/2v5bolq.pdf
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 5, in <module>
    from unoserver.converter import main
  File "/usr/local/lib/python3.9/site-packages/unoserver/converter.py", line 2, in <module>
    import uno
  File "/usr/local/lib/python3.9/site-packages/uno/__init__.py", line 4, in <module>
    from base import Element, Css, Payload, UnoBaseFeature, UnoBaseField
  File "/usr/local/lib/python3.9/site-packages/uno/base.py", line 11, in <module>
    PAYLOAD_TAGS = helpers.minus(NORMAL_TAGS, ABNORMAL_TAGS)
  File "/usr/local/lib/python3.9/site-packages/uno/helpers.py", line 16, in wrapper
    if kwargs.has_key('return_type'):
AttributeError: 'dict' object has no attribute 'has_key'

Is there something that we are missing as to why this is not working?
Is this a product that can be used on our website?

Barb Ward
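
The traceback above shows import uno resolving to a package directory under site-packages/uno that contains Python 2 code (print statements, dict.has_key). That suggests, though the post does not confirm it, that an unrelated package named uno is shadowing the UNO binding that ships with LibreOffice. A quick way to see which file a module name resolves to is sketched below; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None
    if the module cannot be found on the current sys.path."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print("uno resolves to:", module_origin("uno"))
```

If the printed path is not inside the LibreOffice installation or a distribution package for the UNO bindings, the wrong module is probably being picked up.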

Is it possible to specify additional export filters with unoserver?

Hello!

My goal is to create tagged PDF documents, and right now I'm using unoconv to convert programmatically created fodt documents to PDF. With unoconv I can specify export filters like UseTaggedPDF=true and SelectPdfVersion=2, but I can't find how to supply the same filters to unoconvert. Is it currently possible? If not, do you have plans to implement it? I'm not really familiar with Python and UNO, but I'm willing to contribute if you provide some guidance.

pdf/A

With unoconv it was possible to make different kinds of PDF/A by passing -eUseTaggedPDF=1 -eSelectPdfVersion=1
Would this be possible with unoserver/unoconvert?

Free up CPU resources

Hi,
Firstly, I'd like to thank you for this amazing module.
I am trying to use it in a program that converts any kind of document whose extension matches the following regex: r'(?i)^\.(odt|odp|ods|doc|ppt|xls)(x)?$'.
My program works, but after a certain time it becomes slower and slower.
So I'm trying to set a timeout to prevent overly long conversions, but it doesn't seem to work.
Could you tell me whether there is a way to detect a conversion that takes too long and reset the server in that case, to prevent it from consuming all CPU resources?

Thanks in advance.

Jeff

This is my code :

try:
  pdf_target = Path(out_dir, '{}.pdf'.format(src_file.stem))
  cmd = [shutil.which('unoconvert'), '--convert-to', 'pdf',
         src_file, pdf_target]

  # useful documentation about timeout
  # https://alexandra-zaharia.github.io/posts/kill-subprocess-and-its-children-on-timeout-python/
  p = subprocess.Popen(cmd, start_new_session=True,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,)
  exit_code = p.wait(timeout=self.TIMEOUT)
  self.LOGGER.debug('Exit code: {}'.format(exit_code))
  if pdf_target.is_file():
      return pdf_target
  else:
      self.LOGGER.warning('Error occurred while trying '
                          'to convert file {} into '
                          'PDF format. '
                          'Raw file will be uploaded to '
                          'the pdf directory.'
                          .format(src_file))

except (TimeoutError, subprocess.TimeoutExpired):
  self.LOGGER.warning('TimeoutExpired while trying to convert '
                      'file {} into PDF format. '
                      'Raw file will be uploaded to '
                      'the pdf directory. '
                      .format(src_file))
  # maybe not enough...
  os.killpg(os.getpgid(p.pid), signal.SIGTERM)
  # brute kill
  os.killpg(os.getpgid(p.pid), signal.SIGKILL)
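
A slightly more defensive variant of the timeout handling above is sketched here. It is not a fix for the slowdown itself: it only stops the unoconvert client, not the unoserver process, and the function name run_with_deadline and the grace parameter are made up for illustration. Two differences from the snippet above: SIGKILL is sent only if SIGTERM did not work within a grace period, and no pipes are attached, since waiting on a process whose stdout and stderr pipes are never read can block once the pipe buffers fill. POSIX only.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout, grace=2.0):
    """Run cmd in its own process group.

    Returns the exit code, or None if the command had to be stopped
    after `timeout` seconds.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pgid = os.getpgid(proc.pid)
        os.killpg(pgid, signal.SIGTERM)        # polite request first
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            os.killpg(pgid, signal.SIGKILL)    # escalate only if needed
            proc.wait()
        return None
```

A caller would treat a None result the same way the except branch above does, for example by logging a warning and uploading the raw file.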

type detection failed while running unoconvert

Hi,
please check and let me know what I'm missing:

unoserver.converter.IllegalArgumentException: Unsupported URL file:///files_new/libretest/simple.xlsx: "type detection failed"

$ unoserver --executable /usr/bin/libreoffice
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: /usr/bin/libreoffice --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmpovte7i1v --accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext

$ sudo netstat -anp | grep 2002
tcp        0      0 127.0.0.1:2002          0.0.0.0:*               LISTEN      804395/soffice.bin

$ ls
simple.xlsx

$ unoconvert --convert-to pdf ./simple.xlsx s1.pdf
INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening ./simple.xlsx
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 247, in main
    result = converter.convert(
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 152, in convert
    document = self.desktop.loadComponentFromURL(
unoserver.converter.IllegalArgumentException: Unsupported URL <file:///files_new/libretest/simple.xlsx>: "type detection failed"

Integration tests for conversion to PDF do not accept the PDF-1.6 format

I'm running LibreOffice 7.4.3.2 40(Build:2), which outputs PDF-1.6 when converting to PDF.
The integration test assertions at lines 29 and 56 explicitly check for PDF-1.5 (assert start == b"%PDF-1.5\n") and therefore fail.
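
One way such a check could be made independent of the PDF minor version is to match the header with a pattern instead of comparing against a fixed string. This is a sketch of the idea, not the project's actual test code; looks_like_pdf is a hypothetical helper name.

```python
import re

# "%PDF-" followed by major.minor and a newline, e.g. %PDF-1.5 or %PDF-1.6
PDF_HEADER = re.compile(rb"%PDF-\d\.\d\n")


def looks_like_pdf(data):
    """True if data starts with a PDF header, whatever the version."""
    return PDF_HEADER.match(data) is not None
```

In a test, assert looks_like_pdf(start) would then pass for both LibreOffice versions mentioned above.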

add html, rdf types to DOC_TYPES (conversion fails for them because of type filter)

text/plain
text/html
text/rdf

should be added to the converter's DOC_TYPES type filter.

Failing to convert docx to pdf

Hello,

I have some trouble understanding an error message. It occurs when my app tries to convert a docx file to pdf.

    cmd = 'unoconvert --convert-to pdf {} {}'.format(document_name, output_name).split()

    p = subprocess.Popen(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    output, error = p.communicate()

I get the following:

INFO:unoserver:Starting unoconverter. 
INFO:unoserver:Opening /tmp/3224c812-4046-418b-b2f7-52e122fc1ab4.docx 
Traceback (most recent call last): 
File "/usr/local/bin/unoconvert", line 8, in <module> 
    sys.exit(main()) 
File "/usr/local/lib/python3.9/dist-packages/unoserver/converter.py", line 245, in main 
    result = converter.convert( 
File "/usr/local/lib/python3.9/dist-packages/unoserver/converter.py", line 153, in convert 
    document = self.desktop.loadComponentFromURL( 
unoserver.converter.DisposedException: ./sfx2/source/doc/sfxbasemodel.cxx:2926

The workflow is: for each docx template, I spawn a thread that builds the final docx document from the template, and then the PDF.
So I call unoconvert from within a Python thread.

(I use https://docxtpl.readthedocs.io/en/latest/)
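
Since unoconvert is called from several threads here, one thing that could be tried is letting only one conversion run at a time. Whether concurrent requests are actually related to the DisposedException is an assumption, not something the report establishes. The sketch below uses a lock around the subprocess call; run_serialized is a hypothetical name. It also takes an argument list rather than a formatted string passed through .split(), which would break on paths containing spaces.

```python
import subprocess
import threading

# One lock shared by all worker threads in the process.
_conversion_lock = threading.Lock()


def run_serialized(argv):
    """Run one external command at a time, even when called from many threads."""
    with _conversion_lock:
        return subprocess.run(argv, capture_output=True, text=True)
```

With the names from the snippet above, a call would look like run_serialized(["unoconvert", "--convert-to", "pdf", document_name, output_name]); the returned object exposes returncode, stdout and stderr.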

Runtime error with unoconvert: "Could not find an export filter ... to generic_Text"

Hello and thank you for all unoconv projects.

I am trying to run unoserver in docker and getting some issues with unoconvert.

The Dockerfile content is the following:

FROM eclipse-temurin:8u332-b09-jre-jammy

WORKDIR /var/unoserver

RUN apt-get update \
    && apt-get install -y \
        wget \
        libreoffice \
        libreoffice-writer \
        ure \
        libreoffice-java-common \
        libreoffice-core \
        libreoffice-common \
    && apt-get remove -y libreoffice-gnome \
    && apt-get autoremove -y

RUN wget https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py \
    && python3 -m pip install unoserver

In the container I try to test connection between unoserver and unoconvert

unoserver &
wget http://www.africau.edu/images/default/sample.pdf
unoconvert sample.pdf smaple.txt

Unfortunately it doesn't work; the error output is:

INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening sample.pdf
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/unoserver/converter.py", line 247, in main
    result = converter.convert(
  File "/usr/local/lib/python3.10/dist-packages/unoserver/converter.py", line 185, in convert
    raise RuntimeError(
RuntimeError: Could not find an export filter from com.sun.star.drawing.DrawingDocument to generic_Text

unoconvert error when converting docx to pdf

When I ran unoconvert, the following error happened.
How can I resolve the problem? Can you give me some help, @regebro?

Traceback (most recent call last):
  File "/usr/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.7/site-packages/unoserver/converter.py", line 248, in main
    inpath=args.infile, outpath=args.outfile, convert_to=args.convert_to
  File "/usr/lib/python3.7/site-packages/unoserver/converter.py", line 201, in convert
    document.storeToURL(export_path, output_props)
unoserver.converter.IOException: SfxBaseModel::impl_store file:///gsyf/app/tmp/1-%E4%B8%8A%E4%BC%A0%E8%B5%84%E6%96%99-2022%E5%B9%B4%E5%BA%A6%E5%B7%A5%E4%BC%A4%E9%A2%84%E9%98%B2%E5%9F%B9%E8%AE%AD%E9%A1%B9%E7%9B%AE%E5%91%8A%E7%9F%A5%E4%B9%A6.pdf failed: 0x11b(Error Area:Io Class:Abort Code:27)

Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/unoserver/converter.py", line 201, in convert
document.storeToURL(export_path, output_props)
unoserver.converter.DisposedException: Binary URP bridge disposed during call

During handling of the above exception, another exception occurred:

Can't convert pdf to word

I receive the error:
could not find an export from com.sun.star.drawing.DrawingDocument to writer_MS_Word_2007

I tested with various PDF documents. The libreoffice command works perfectly.

AttributeError: 'NoneType' object has no attribute 'supportsService'

When converting a document from DOCX to PDF, we get the following error. With lower versions of LibreOffice, we haven't run into this.

Versions:
LibreOffice 7.4.5.1 40(Build:1) (tested on 7.0.4.2 as well)
unoserver 1.3 (tested 1.2 as well)

Input file mimetype: application/vnd.openxmlformats-officedocument.wordprocessingml.document

$ unoconvert input.docx output.pdf

INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening input.docx
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 173, in convert
    import_type = get_doc_type(document)
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 43, in get_doc_type
    if doc.supportsService(t):
AttributeError: 'NoneType' object has no attribute 'supportsService'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 276, in main
    result = converter.convert(
  File "/usr/local/lib/python3.8/dist-packages/unoserver/converter.py", line 225, in convert
    document.close(True)
AttributeError: 'NoneType' object has no attribute 'close'

REST API implementation

Hi, guys.
When I call converter.convert from a REST API endpoint, it always returns an error:
  File "/home/liubo/project/python/xulan-server/app/api/api_v1/printer/printer.py", line 52, in max_upload
    conv = converter.UnoConverter()
  File "/home/liubo/project/python/xulan-server/app/extends/unoserver/converter.py", line 69, in __init__
    self.resolver = self.local_context.ServiceManager.createInstanceWithContext(
SystemError: pyuno runtime is not initialized, (the pyuno.bootstrap needs to be called before using any uno classes)

However, running unoconvert on the command line works normally.

warning : The Location of LibreOffice Python has changed!

I just found that the LibreOffice Python has moved to /Applications/LibreOffice.app/Contents/Resources in the latest version (7.4.3.2), FYI.

Bring back the index refresh support

One of the features I miss from the old unoconv, is the automatic update of indexes and ToC before the conversion.
I created a branch where I re-added the for loop needed to refresh the indexes, and it just works!

I hope you can bring back this feature in a future release!

No module named 'uno'

I have installed python3 and libreoffice.
There is no error when I run libreoffice or unoserver,
but I get the following error when I run unoconvert.
Can you give me some help, @regebro?

Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/unoserver/converter.py", line 2, in
import uno
ModuleNotFoundError: No module named 'uno'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/bin/unoconvert", line 5, in
from unoserver.converter import main
File "/usr/lib/python3.7/site-packages/unoserver/converter.py", line 5, in
"Could not find the 'uno' library. This package must be installed with a Python "
ImportError: Could not find the 'uno' library. This package must be installed with a Python installation that has a 'uno' library. This typically means you should installit with the same Python executable as your Libreoffice installation uses.

 at ChildProcess.exithandler (child_process.js:303:12)
 at ChildProcess.emit (events.js:310:20)
 at maybeClose (internal/child_process.js:1021:16)
 at Process.ChildProcess._handle.onexit (internal/child_process.js:286:5) {

killed: false,
code: 1,
signal: null,

[Feature Request] Add as formula to HomeBrew

Add as formula to HomeBrew

Can also probably add LibreOffice as a dependency to be automatically installed when installing the unoserver package.
Could also add support for brew-service as an option to automatically set up a launchd plist and start and stop as a daemon.

Considerations regarding output buffers

While studying the converter code, I wanted to share some thoughts on output buffers:
Currently, if one does not want to write to a file, it's possible to set outpath to None. In this case, the data will be written into a new, internally created io.BytesIO object, and then the value of the buffer is returned as bytes.

However, that means the entire data will be in memory at the same time, which increases resource usage.
Looking at the uno outputstream interface, it seems like the data is actually provided incrementally (presuming that writeBytes() is called multiple times with smaller parts rather than one large sequence):

class OutputStream(unohelper.Base, XOutputStream):
    def __init__(self):
        self.buffer = io.BytesIO()

    def closeOutput(self):
        pass

    def writeBytes(self, seq):
        self.buffer.write(seq.value)

If unoserver would accept a caller-provided output buffer to write into (e. g. a file handle acquired by open(..., "wb"), or sys.stdout), the data wouldn't necessarily have to be in memory at once.

For a possible backwards-compatible implementation, outpath could just be adapted to accept a byte buffer (i. e. anything that implements write(), read(), and seek()), and an init parameter could be added to OutputStream to take it over.

I'm not certain how useful this would be, given that uno can already write to files on its own, and if you intend to post-process the output, it probably needs to be in memory as a whole anyway. Nevertheless, it seems a bit more elegant (e. g. in case callers want to handle file writing on their own for some reason). Would you be interested in a Pull Request?
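
To make the proposal concrete, here is a minimal sketch of an output stream that forwards each chunk to a caller-provided sink. It is an illustration of the idea, not unoserver code: the class name SinkOutputStream is made up, and the unohelper.Base and XOutputStream base classes from the snippet above are left out so that the example runs without LibreOffice.

```python
import io


class SinkOutputStream:
    """Forwards each chunk to a caller-provided binary sink.

    In the converter this would keep the same base classes as the
    OutputStream class quoted above; they are omitted here on purpose.
    """

    def __init__(self, sink=None):
        # Fall back to an in-memory buffer, matching the current behaviour.
        self.sink = sink if sink is not None else io.BytesIO()

    def closeOutput(self):
        pass

    def writeBytes(self, seq):
        # `seq` is expected to expose the raw bytes as `.value`,
        # as in the existing writeBytes() implementation.
        self.sink.write(seq.value)
```

With a file handle from open(..., "wb") as the sink, each chunk goes straight to disk; with no sink, the behaviour stays as it is today.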

Memory Leak in LibreOffice

I'm using unoserver/ unoconvert to convert bulk files to JPG format using Docker.
I have a large number of files that need processing so my server is always up and running, converting each file one by one sequentially. However, after 2-4 hours of work, the conversion processes start getting stuck, and then soon the whole docker container is down, and I have to manually restart it.
When checking the memory while it was stuck I found that the process soffice.bin holds 10GB of memory.
The unoserver starts automatically when the container starts; after that I only call the unoconvert command for each file.
I'm thinking that maybe LibreOffice is not releasing the files after loading and converting them.
I looked through the README again to see if there's a configuration I can set that will allow me to free up memory after each conversion, but I didn't find any.
Is there a way to implement that?
Thanks and regards.
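
A common workaround for long-running workers whose memory keeps growing is to recycle the process after a fixed number of jobs. The sketch below only shows the counting part; RecyclingCounter is a hypothetical name, the restart callback is whatever stops and starts unoserver in a given setup, and the approach works around the growth without explaining or fixing it.

```python
class RecyclingCounter:
    """Calls `restart` after every `limit` completed jobs."""

    def __init__(self, limit, restart):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.restart = restart
        self.done = 0

    def job_finished(self):
        self.done += 1
        if self.done >= self.limit:
            self.restart()
            self.done = 0
```

The conversion loop would call job_finished() after each unoconvert run; a suitable limit depends on how fast memory grows for the files being converted.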