shelfio / aws-lambda-libreoffice

Utility to work with Docker version of LibreOffice in Lambda

License: MIT License

JavaScript 6.22% TypeScript 68.05% Shell 23.54% Dockerfile 2.19%

aws-lambda serverless libreoffice npm-package node-module nodejs pdf-converter pdf-generation

aws-lambda-libreoffice's Issues

/tmp/instdir/program/soffice.bin: error while loading shared libraries: libmergedlo.so: cannot open shared object file: No such file or directory

Hello, I am having problems when converting files to pdf in lambda using the eu-west-1 layer (https://github.com/shelfio/libreoffice-lambda-layer, tried both gzip and brotli options). I am using the 3.0.0 version of @shelf/aws-lambda-libreoffice.
I use AWS SAM to deploy the lambda to AWS and the strange thing is that there are no problems locally.
This is the error that I get when trying to convert a file:

{
  "errorType": "Error",
  "errorMessage": "Command failed: cd /tmp && /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --nofirststartwizard --convert-to pdf --outdir /tmp /tmp/dog.png\n/tmp/instdir/program/soffice.bin: error while loading shared libraries: libmergedlo.so: cannot open shared object file: No such file or directory\n",
  "trace": [
    "Error: Command failed: cd /tmp && /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --nofirststartwizard --convert-to pdf --outdir /tmp /tmp/dog.png",
    "/tmp/instdir/program/soffice.bin: error while loading shared libraries: libmergedlo.so: cannot open shared object file: No such file or directory",
    "",
    "    at checkExecSyncError (child_process.js:629:11)",
    "    at execSync (child_process.js:666:13)",
    "    at convertTo (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/convert.js:39:40)",
    "    at process._tickCallback (internal/process/next_tick.js:68:7)"
  ]
}

How to troubleshoot if not converting?

Thanks for putting this together!

I'm having an issue where the call to convertFileToPDF is returning success, but no file is created. This is using AWS Lambda with Node.js 8.10. I have tried with both a .docx and a .pptx. I have verified that my local LibreOffice can read both files.

From the logs it appears it spent about 20 milliseconds on the conversion (which seems way too fast even for this small file):

2019-01-09T14:47:05.311Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (pptx_to_pdf) Checking for existence of: /tmp/test_document.docx
2019-01-09T14:47:05.311Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (found_local) The file /tmp/test_document.docx exists... Converting to PDF
2019-01-09T14:47:05.311Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (convertToPdf) Attempting conversion of /tmp/test_document.docx to PDF
2019-01-09T14:47:05.331Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (exists) Conversion supposedly successful.
2019-01-09T14:47:05.331Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (exists) Checking for existence of /tmp/test_document.pdf
2019-01-09T14:47:05.332Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (error) ERROR in Catch block
2019-01-09T14:47:05.332Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (error) Catch block error:Error: ENOENT: no such file or directory, access '/tmp/test_document.pdf'
2019-01-09T14:47:05.332Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (error) Enumerating files in /tmp
2019-01-09T14:47:05.347Z 6c5bbff7-141d-11e9-b165-77d9c1425b46 (error) Files in /tmp: test_document.docx

Any ideas on how I can troubleshoot this? Is there a log file or a verbose mode?
Here is the code that is actually attempting the conversion:

  const convertToPdf = (local_source) => {
    return new Promise((resolve, reject) => {
      console.log("(convertToPdf) Attempting conversion of " + local_source + " to PDF");
      // should create /tmp/<local_source>.pdf
      const success = convertFileToPDF(local_source); // declared locally instead of leaking an implicit global
      if (success) {
        resolve(true);
      } else {
        reject(false)
      }
    })
  };

Thanks!!
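
One way to narrow down a "success but no file" case like the one above is to check for the output file immediately after the conversion call and fail loudly when it is missing. The following is only a minimal sketch: `convertAndVerify` and its `convert` argument are hypothetical names introduced for illustration and are not part of @shelf/aws-lambda-libreoffice.

```javascript
const fs = require('fs');

// `convert` is a hypothetical stand-in for whatever performs the conversion
// (for example a wrapper around convertFileToPDF); it is not a library API.
function convertAndVerify(convert, inputPath, expectedOutputPath) {
  const result = convert(inputPath);
  // Do not trust the return value alone: confirm the output file really exists.
  if (!fs.existsSync(expectedOutputPath)) {
    throw new Error(
      `Conversion returned ${JSON.stringify(result)} but ${expectedOutputPath} was not created`
    );
  }
  return expectedOutputPath;
}
```

Throwing here turns a silent failure into an error that names the missing file, which is easier to spot in the Lambda logs than a later ENOENT.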

Can't convert any msg and some docx files to pdf

Lambda nodejs10 + lambda layer libre_office with Brotli.
When I try to convert a docx document that contains a "Table of Contents" section, I get the error "Please verify input parameters... (SfxBaseModel::impl_store file:///tmp/file.pdf failed: 0xc10(Error Area:Io Class:Write Code:16))".
When converting *.msg files, I get unreadable pdf files with a mix of ???? symbols and the original text.

Changing LibreOffice settings?

Hi, big fan of your work here.

Do you have any ideas how I could set LibreOffice settings? I want to make some changes that make Excel files calculate their formulas when converting xlsx to PDF.

On my Fedora system, libreoffice config gets saved in ~/.config/libreoffice.

I'm poking around on my Lambda instance and I can't find anything there.

local testing

I am rather clueless regarding the functionality of this repo, but I would suggest replacing all hard-coded /tmp paths with os.tmpdir() to enable local testing as well.
OK, rephrasing: I'd love documentation on local testing on non-Linux machines.
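
As an illustration of the os.tmpdir() suggestion, here is a minimal sketch of resolving working paths against the platform's temp directory. The helper name `tmpPath` is hypothetical and is not something the library provides.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: build a working path under the platform temp directory
// instead of a hard-coded /tmp.
function tmpPath(filename) {
  // path.basename drops any directory components in the supplied name
  return path.join(os.tmpdir(), path.basename(filename));
}
```

On Lambda os.tmpdir() normally resolves to /tmp, so a change like this would mainly affect local runs on other platforms.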

Breaks intermittently due to Thread::create failed error

I have created a custom Docker image that uses shelf/lambda-libreoffice-base:7.4-node16-x86_64 as its base image, following the instructions in the README. It works fine most of the time, but it breaks intermittently.

Here's the code of my docker image:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64

COPY ./fonts/* /usr/local/share/fonts/
COPY ./ ./
RUN yum install java-1.8.0-openjdk-devel -y
CMD [ "app.handler" ]

And here's my handler function:

const handler = async (event) => {
  try {
    const {fileUrl, isExcel} = event;

    const inputFileExtension = ".docx";
    const fileName = `template-${uuidv4()}${inputFileExtension}`;
    await download(fileUrl, `/tmp/${fileName}`);
    const pdfFilePath = convertTo(fileName, 'pdf');
    const stats = fs.statSync(pdfFilePath);
    const pdfFileData = fs.readFileSync(pdfFilePath);
  
    await clearTmpDirectory();
    const response = {
      statusCode: 200,
      body: pdfFileData
    };
    return response;
  } catch (err) {
    console.error(err);
    throw err;
  }
}

Here, the download function simply downloads the file from the URL passed to it, and the clearTmpDirectory function simply removes all files from the tmp directory.

Here's the error that I am getting intermittently:

javaldx: Could not find a Java Runtime Environment!

what():  osl::Thread::create failed

terminate called after throwing an instance of 'std::runtime_error'

Error: Command failed: cd /tmp && libreoffice7.4 --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp /tmp/template-bf853fb4-95d9-4fef-87b1-59c54cf13c58.docx
javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx
terminate called after throwing an instance of 'std::runtime_error'
  what():  osl::Thread::create failed

    at checkExecSyncError (node:child_process:861:11)
    at execSync (node:child_process:932:15)
    at convertTo (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/convert.js:29:40)
    at Runtime.handler (/var/task/app.js:66:25) {
  status: 134,
  signal: null,
  output: [
    null,
    <Buffer >,
    <Buffer 6a 61 76 61 6c 64 78 3a 20 43 6f 75 6c 64 20 6e 6f 74 20 66 69 6e 64 20 61 20 4a 61 76 61 20 52 75 6e 74 69 6d 65 20 45 6e 76 69 72 6f 6e 6d 65 6e 74 ... 150 more bytes>
  ],
  pid: 12370,
  stdout: <Buffer >,
  stderr: <Buffer 6a 61 76 61 6c 64 78 3a 20 43 6f 75 6c 64 20 6e 6f 74 20 66 69 6e 64 20 61 20 4a 61 76 61 20 52 75 6e 74 69 6d 65 20 45 6e 76 69 72 6f 6e 6d 65 6e 74 ... 150 more bytes>
}

Note: If I redeploy the same image to Lambda, it works perfectly for a while, but then it starts failing intermittently again with the error above.

File not being created, failing silently

I have experienced an issue similar to this - #39

It happens periodically: soffice.bin prints the expected "convert /tmp/whatever.docx -> /tmp/whatever.pdf using filter : writer_pdf_Export" message, but the file is not created.
I tried following a similar path as in issue #39, but it appears those changes have already been implemented in the library.

The file converts as expected using a similar conversion process with the macOS version of the soffice command-line tool; it only fails when using this layer.

If a file fails in this manner, it will always fail in this manner. Most files convert fine.

Has anyone got any idea how to debug this? Has anyone experienced anything similar?

Converting docx (downloaded from s3) to pdf (uploaded to s3)

Hello guys,

I'm trying to convert a MS Word .docx file (that is being downloaded from S3 bucket) to a pdf one (and upload it to the same location in that S3 bucket).

NOTE: I've tried the layer solution and it did work for me.

I've followed all the mentioned steps:

1- Created my own docker image:

Where my Dockerfile looks like this:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64
COPY app.js package.json ${LAMBDA_TASK_ROOT}/
RUN npm install
CMD [ "app.handler" ]

and app.js looks like this:

const {convertTo, canBeConvertedToPDF} = require('@shelf/aws-lambda-libreoffice');
const fs = require("fs");
const AWS = require("aws-sdk");

const S3_BUCKET = process.env.S3_BUCKET
const WORD_FILE_KEY = process.env.WORD_FILE_KEY
const TMP_DIR = "/tmp"

const s3 = new AWS.S3();


module.exports.handler = async () => {

    console.log(`# Bucket Name: ${S3_BUCKET}`)
    console.log(`# File Name: ${WORD_FILE_KEY}`)
    console.log(`[Initial]: ${getTmpDirContent()}`)

    // Download .docx file from S3 to /tmp
    console.log("# Downloading the .docx file from S3 ...")
    await downloadFileFromS3(WORD_FILE_KEY)
    console.log(`[After Download]: ${getTmpDirContent()}`)

    if (canBeConvertedToPDF(WORD_FILE_KEY)) {
        // Convert .docx to .pdf
        console.log("# Converting .docx file to .pdf file ...")
        const convertedFilePath = convertTo(WORD_FILE_KEY, "pdf");
        console.log(convertedFilePath)
        console.log(`[After Convert]: ${getTmpDirContent()}`)

        // Upload .pdf file to S3
        console.log("# Uploading the .pdf file to S3 ...")
        await uploadFileToS3(convertedFilePath)

        console.log("# Done")
    } else {
        console.log("# Can't be converted to pdf!")
    }
}

const downloadFileFromS3 = async (fileKey) => {
    try {
        const filePath = `${TMP_DIR}/${fileKey}`
        const params = {
            Bucket: S3_BUCKET,
            Key: fileKey
        }
        const objData = await s3.getObject(params).promise();
        fs.writeFileSync(filePath, objData.Body); // write the raw Buffer; toString() would corrupt the binary .docx
        console.log(`- File ${fileKey} has been downloaded to ${filePath} successfully`);
    } catch (e) {
        throw new Error(`- Download Error: ${e.message}`)
    }
}

const uploadFileToS3 = async (filePath) => {
    try {
        const fileKey = filePath.split(`${TMP_DIR}/`)[1]
        const fileData = fs.readFileSync(filePath)
        const params = {
            Bucket: S3_BUCKET,
            Key: fileKey,
            Body: fileData
        }
        const data = await s3.upload(params).promise();
        console.log(`- Upload successfully at ${data.Location}`);
    } catch (e) {
        throw new Error(`- Upload Error: ${e.message}`)
    }
};

const getTmpDirContent = () => {
    return fs.readdirSync(TMP_DIR)
}

and package.json is the following:

{
  "name": "libreoffice-lambda-container-image",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "@shelf/aws-lambda-libreoffice": "^5.0.1",
    "aws-sdk": "^2.1293.0"
  }
}

2- Pushed it to ECR.

3- Created a Lambda Function:

With the following configurations:

AND I'm getting the following error:

Any suggestions?!

Error: Cannot find module '/var/task/node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb'

I'm getting this error when I call convertTo, I've installed the package from npm. Any clue?

Is this no longer supported?

NPM says the package is deprecated. This seems like a more elegant solution for Lambda than others.

Convert only one page of the document

Hello, first of all, great job on this, it works like a charm. This is not really an issue but more of a question/feature request: is it possible to convert only one page of the document instead of the whole document?

For example: a pages=X option that lets you limit which pages of the document are converted.

If the answer is "no", is there a possibility that this will be implemented in the future?

Thanks again for taking the time to read this and have a nice day :)

@shelf/aws-lambda-brotli-unpacker compiled against a different Node.js version

I'm getting the following error when trying to run this in a Node 10.x Lambda runtime. I assume it's related to the brotli unpacker not having been updated for Node 10.

{
  "errorType": "Error",
  "errorMessage": "The module '/var/task/node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb/build/bindings/iltorb.node'\nwas compiled against a different Node.js version using\nNODE_MODULE_VERSION 57. This version of Node.js requires\nNODE_MODULE_VERSION 64. Please try re-compiling or re-installing\nthe module (for instance, using `npm rebuild` or `npm install`).",
  "trace": [
    "Error: The module '/var/task/node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb/build/bindings/iltorb.node'",
    "was compiled against a different Node.js version using",
    "NODE_MODULE_VERSION 57. This version of Node.js requires",
    "NODE_MODULE_VERSION 64. Please try re-compiling or re-installing",
    "the module (for instance, using `npm rebuild` or `npm install`).",
    "    at Object.Module._extensions..node (internal/modules/cjs/loader.js:807:18)",
    "    at Module.load (internal/modules/cjs/loader.js:653:32)",
    "    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)",
    "    at Function.Module._load (internal/modules/cjs/loader.js:585:3)",
    "    at Module.require (internal/modules/cjs/loader.js:692:17)",
    "    at require (internal/modules/cjs/helpers.js:25:18)",
    "    at Object.<anonymous> (/var/task/node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb/index.js:8:16)",
    "    at Module._compile (internal/modules/cjs/loader.js:778:30)",
    "    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)",
    "    at Module.load (internal/modules/cjs/loader.js:653:32)"
  ]
}

Full disclosure: I haven't gotten a successful run on an 8.10 runtime yet, so I'm not 100% sure that's the cause, but it seems likely.

If node 10.16 has native brotli support, is it possible to just lose the aws-lambda-brotli-unpacker dependency?
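
For reference, Node's built-in zlib module exposes Brotli functions in the versions that ship it (10.16 on the 10.x line), with no native addon to compile. The snippet below is a small round-trip sketch of that built-in API only; it does not show how the library itself unpacks lo.tar.br.

```javascript
const zlib = require('zlib');

// Round-trip a buffer through Node's built-in Brotli support.
const original = Buffer.from('hello from lambda');
const compressed = zlib.brotliCompressSync(original);
const restored = zlib.brotliDecompressSync(compressed);

console.log(restored.toString());
```

For a large archive, the streaming variant zlib.createBrotliDecompress() would avoid holding the whole file in memory.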

Issues unpacking lo.tar.br

Hi @vladgolubev,

I have been trying to use this package instead, but continue to have trouble. I saw that abdulbarik was able to get it working so I assume the issue is on my end. The function seems to hang on the getExecutablePath() function, and specifically the last line where it's piping the decompressStream. I've pasted my lambda function below, do you mind seeing if it's something in my implementation?

const {writeFileSync, readFileSync} = require('fs');
const {execSync} = require('child_process');
const {parse} = require('path');
const {S3} = require('aws-sdk');
const fs = require('fs');

const s3 = new S3({params: {Bucket: 'convert-doc-to-pdf'}});

exports.handler = async ({filename}) => {
  console.log(filename);
  const {Body: inputFileBuffer} = await s3.getObject({Key: filename}).promise();
  writeFileSync(`/tmp/${filename}`, inputFileBuffer);

  fs.readdirSync('/tmp').forEach(file => {
    console.log(file);
  });

  const {getExecutablePath, defaultArgs} = require('aws-lambda-libreoffice');

  const loBinary = await getExecutablePath(); // /tmp/instdir/program/soffice

  execSync(`${loBinary} ${defaultArgs.join(' ')} --convert-to pdf /tmp/${filename} --outdir /tmp`);

  const outputFilename = `${parse(filename).name}.pdf`;
  const outputFileBuffer = readFileSync(`/tmp/${outputFilename}`);

  await s3
    .upload({
      Key: outputFilename, Body: outputFileBuffer,
      ACL: 'public-read', ContentType: 'application/pdf'
    })
    .promise();

  return `https://s3.amazonaws.com/convert-doc-to-pdf/${outputFilename}`;
};

Error while loading shared libraries

Lambda says

/tmp/instdir/program/soffice.bin: error while loading shared libraries: libmergedlo.so: cannot open shared object file: No such file or directory

Cannot read property "1" of null

I'm running this package inside of lambda to process many documents simultaneously. It's working great.

However, I'm getting this issue when lambda runs one of my documents.

Here's the error:
{ "name": "DocumentConversionFailed", "input": { "Error": "TypeError", "Cause": "{\"errorType\":\"TypeError\",\"errorMessage\":\"Cannot read property '1' of null\",\"trace\":[\"TypeError: Cannot read property '1' of null\",\" at getConvertedFilePath (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/logs.js:9:54)\",\" at convertTo (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/convert.js:44:41)\",\" at processTicksAndRejections (internal/process/task_queues.js:93:5)\",\" at async ConvertDocumentToPDF (/var/task/index.js:129:26)\",\" at async Runtime.exports.handler (/var/task/index.js:92:21)\"]}" } }

You can see the line numbers in there to possibly debug.

I can send the file I'm trying to convert if needed. I have recreated this 3-4 times with the same file. Not sure if it's the file or something wrong with "warm" lambdas.

I'm debugging this and will post back here if I find any more information.

Thanks
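
The trace points at a log-parsing step (getConvertedFilePath in logs.js) reading index 1 of a null value, which is what happens when a regular-expression match finds nothing. The sketch below shows a defensive version of that kind of parsing. It is an assumption-laden illustration: `parseConvertedPath` is a hypothetical name, the pattern is modelled on the "convert ... -> ... using filter" line quoted in another issue on this page, and the library's actual implementation may differ.

```javascript
// Hypothetical helper: extract the output path from LibreOffice's
// "convert <input> -> <output> using filter : <name>" log line.
function parseConvertedPath(logs) {
  const match = /convert .+ -> (.+?) using filter/.exec(logs);
  if (match === null) {
    // Surface the raw output instead of failing with "Cannot read property '1' of null"
    throw new Error(`No conversion line found in LibreOffice output: ${JSON.stringify(logs)}`);
  }
  return match[1];
}
```

Including the raw output in the error makes it visible what LibreOffice actually printed for the failing document.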

no such file or directory, open '/opt/lo.tar.br'

Hello,

I'm testing this using the AWS SAM cli and keep getting these errors. I'm using the node 8.10 runtime and bumped the memory up to 3008 just to make sure.

2019-07-08T23:20:44.910Z	e3170015-1de5-137d-7c8d-62be684ad191	CAN BE CONVERTED!
2019-07-08T23:20:44.911Z	e3170015-1de5-137d-7c8d-62be684ad191	test.docx
2019-07-08T23:20:44.918Z	e3170015-1de5-137d-7c8d-62be684ad191	{ Error: ENOENT: no such file or directory, open '/opt/lo.tar.br'
  errno: -2,
  code: 'ENOENT',
  syscall: 'open',
  path: '/opt/lo.tar.br' }
END RequestId: e3170015-1de5-137d-7c8d-62be684ad191
REPORT RequestId: e3170015-1de5-137d-7c8d-62be684ad191	Duration: 2750.91 ms	Billed Duration: 2800 ms	Memory Size: 3008 MB	Max Memory Used: 84 MB	

{"errno":-2,"code":"ENOENT","syscall":"open","path":"/opt/lo.tar.br"}

Here is the relevant code:

if (canBeConvertedToPDF(fileName)) {
    console.log("CAN BE CONVERTED!");
    fs.readdirSync('/tmp').forEach(file => {
        console.log(file);
    });
    convertTo("test.docx", 'pdf');
} else {
    console.log("CANNOT BE CONVERTED!");
}

Any assistance would be appreciated. Thank you!

Error 403 while building docker image

This is my Dockerfile:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64

COPY ./ ${LAMBDA_TASK_ROOT}/

RUN npm install

CMD [ "dist/index.handler" ]

and this is the error I get when I run the docker build command:

ERROR: failed to solve: public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64: unexpected status from HEAD request to https://public.ecr.aws/v2/shelf/lambda-libreoffice-base/manifests/7.4-node16-x86_64: 403 Forbidden

Any ideas why?

convertTo docx generates blank document

Hello! When I use the convertTo method to convert from odt to pdf, it works fine, but if I change pdf to docx, the result is a blank document.

As an extra test, I ran the --convert-to docx command on my local machine (Ubuntu), extracted both zip files, and compared the contents.
I noticed that [Content_Types].xml differs. The final part in the local test is:
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"

while in aws-lambda-libreoffice it is:
ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"

As I understand it, there are two different docx formats (the ones you can choose between when exporting from the LibreOffice UI). How can I specify which one to use in the convertTo method?

Suggestion: increase memory recommendation

Hi,

I'm using your project and it works very nicely.

However, when documents reach a certain complexity, LibreOffice fails to start and is therefore unable to render. We use carbone.js to replace parameters in template files and create PDFs. We used the minimum recommended memory setting of 1536 MB.

On the other hand, the AWS docs state that one full core is allocated at 1769 MB. That means some kind of throttling applies below that. As Lambda CPU scales linearly with memory, 1536 MB should give about 86.8% of a vCPU.

When running LO locally using lambci/lambda:nodejs12.x, no problem has occurred so far. To test the theory, the parameter --cpus="0.868" was added to docker run. Guess what? LO failed to start, just as it does when deployed as a Lambda. I have no knowledge of the inner workings of LO, but to me it looks like one of the LO components does not like to be interrupted. Once the failure was at least reproducible, increasing the Lambda memory to 1769 MB (1 vCPU) solved the problem.
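
The 86.8% figure follows directly from the proportional model described above (CPU share = configured memory / 1769 MB). A small worked sketch, assuming that linear relationship holds; `estimatedVcpuShare` is a name made up for this example.

```javascript
// Memory size (in MB) at which, per the figures above, Lambda allocates one full vCPU.
const MB_PER_VCPU = 1769;

// Estimated share of a vCPU for a given memory setting, assuming linear scaling.
function estimatedVcpuShare(memoryMb) {
  return memoryMb / MB_PER_VCPU;
}

console.log(estimatedVcpuShare(1536).toFixed(3)); // 0.868
```

The same value is what was passed to docker run via --cpus to reproduce the throttled environment locally.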

The stack trace we got until that change looks like this:

Fatal exception: Signal 6                                     
Stack:                                                        
/tmp/instdir/program/libuno_sal.so.3(+0x13deb)[0x7f56a386fdeb]
/tmp/instdir/program/libuno_sal.so.3(+0x3ad13)[0x7f56a3896d13]
/lib64/libpthread.so.0(+0x117e0)[0x7f56a2d777e0]              
/lib64/libc.so.6(gsignal+0x110)[0x7f56a29eeb20]               
/lib64/libc.so.6(abort+0x148)[0x7f56a29effc8]

Thank you for your work
Matthias

Getting Error While Building Docker Image

While building the Docker image, I am getting an error.

Big CSV files(100+ MB) timing out at max resources

I wonder how you deal with big files?

Lambda Settings

3000 MB Ram
Timeout is 15 minutes

The conversion itself takes up all of the available time. There are no other heavy operations; everything else accounts for less than 500 ms.

One solution is to split the files into manageable pieces, process them independently, and join the results together once processing is complete. Is that a good solution?

@vladgolubev I'm curious how you guys are dealing with the problem?

PS: This is not an issue with the library, but a general question. Feel free to close it.
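
To make the splitting idea concrete, here is a minimal sketch that cuts CSV text into chunks and repeats the header row in each one. It rests on two labeled assumptions: the first line is a header, and no quoted field contains a line break. `splitCsv` is a hypothetical helper, and joining the converted outputs afterwards is not covered.

```javascript
// Split CSV text into chunks of at most `rowsPerChunk` data rows,
// repeating the header line at the top of every chunk.
// Assumption: no quoted field contains an embedded line break.
function splitCsv(csvText, rowsPerChunk) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.length > 0);
  const [header, ...rows] = lines;
  const chunks = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push([header, ...rows.slice(i, i + rowsPerChunk)].join('\n'));
  }
  return chunks;
}
```

For a 100+ MB input, a streaming line reader would be preferable to loading the whole file into a string; the in-memory version is used here only to keep the example short.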

version of LibreOffice

Which version of LibreOffice do you use?

Cannot destructure property `inputPath` of 'undefined' or 'null'

When using

const {unpack, defaultArgs} = require('@shelf/aws-lambda-libreoffice');

await unpack(); // default path /tmp/instdir/program/soffice.bin

This fails with the error: Cannot destructure property `inputPath` of 'undefined' or 'null'

It seems that the input path needs to be passed in manually.

LibreOffice can support image conversion

Is there any reason why canBeConvertedToPDF returns false for images, even though LibreOffice can convert the most popular image types to pdf?

For example, PNGs can be converted to pdf fine with LibreOffice.

PDF conversion for the password protected doc file

Hello,

I am trying to convert a password-protected .docx file to pdf using this Node.js package,
but I have not been able to figure out how to do so. Can you please let me know the best way to achieve this?

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

circleci

.circleci/config.yml

node 5.0.3

npm

package.json

@shelf/is-audio-filepath 2.0.0

del 5.1.0

is-image 3.1.0

is-video 1.0.1

@babel/cli 7.20.7

@babel/core 7.20.12

@shelf/babel-config 1.2.0

@shelf/eslint-config 2.27.1

@shelf/prettier-config 1.0.0

@types/jest 29.4.0

babel-jest 29.4.1

eslint 8.33.0

husky 8.0.3

jest 29.4.1

lint-staged 13.1.0

prettier 2.8.3

typescript 4.9.4

node >=16

Check this box to trigger a request for Renovate to run again on this repository

update libreoffice version

It would be great if the package could be upgraded to LibreOffice 6.1.4 or another stable version. It is currently using an alpha build.

question: how to convert s3 bucket files

In the example, it assumes that the file is already in the /tmp folder.
Is there a way to convert files inside s3 buckets?

I'm thinking of creating a trigger for the lambda whenever there is a file uploaded in s3.
Is that possible?
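
For the S3-trigger idea, the first step in the handler would be reading the bucket and key from the event notification and deciding where the file goes under /tmp. The sketch below covers only that step; `describeS3Upload` is a hypothetical helper, and the download, conversion, and upload calls are left out.

```javascript
const path = require('path');

// Hypothetical helper: extract bucket, key, and a local /tmp path from the
// first record of an S3 event notification.
function describeS3Upload(event) {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  // Object keys arrive URL-encoded in the event, with spaces encoded as "+"
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  return {bucket, key, localPath: path.join('/tmp', path.basename(key))};
}
```

The handler could then download the object to localPath and pass the file name to the conversion call, as in the other examples on this page that assume the file is already in /tmp.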

LibreOffice 7.0.6.2

I've updated the dockerfile, but I'm getting a build error:
Any ideas what I'm missing?

#28 202.5 g++: internal compiler error: Killed (program cc1plus)
#28 202.5 Please submit a full bug report,
#28 202.5 with preprocessed source if appropriate.
#28 202.5 See <http://bugzilla.redhat.com/bugzilla> for instructions.
#28 202.5 make[1]: *** [/tmp/libreoffice/workdir/GenCxxObject/UnpackedTarball/libcmis/src/libcmis/atom-document.o] Error 4
#28 202.5 make[1]: *** Deleting file `/tmp/libreoffice/workdir/GenCxxObject/UnpackedTarball/libcmis/src/libcmis/atom-document.o'
#28 227.3 make: *** [build] Error 2

FROM amazonlinux:latest

# see https://stackoverflow.com/questions/2499794/how-to-fix-a-locale-setting-warning-from-perl
ENV LC_CTYPE=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8

ENV LIBREOFFICE_VERSION=7.0.6.2

# install basic stuff required for compilation
RUN yum install -y yum-utils \
    && yum-config-manager --enable epel \
    && yum install -y \
    autoconf \
    ccache \
    expat-devel \
    expat-devel.x86_64 \
    fontconfig-devel \
    git \
    gmp-devel \
    google-crosextra-caladea-fonts \
    google-crosextra-carlito-fonts \
    icu \
    libcurl-devel \
    liberation-sans-fonts \
    liberation-serif-fonts \
    libffi-devel \
    libICE-devel \
    libicu-devel \
    libmpc-devel \
    libpng-devel \
    libSM-devel \
    libX11-devel \
    libXext-devel \
    libXrender-devel \
    libxslt-devel \
    mesa-libGL-devel \
    mesa-libGLU-devel \
    mpfr-devel \
    nasm \
    nspr-devel \
    nss-devel \
    openssl-devel \
    perl-Digest-MD5 \
    which

# install python3
RUN amazon-linux-extras install python3

RUN yum groupinstall -y "Development Tools"

# install gperf
ADD http://ftp.gnu.org/pub/gnu/gperf/gperf-3.1.tar.gz /usr
RUN cd /usr && \
    tar -xzvf gperf-3.1.tar.gz && \
    cd gperf-3.1 && \
    ./configure --prefix=/usr --docdir=/usr/share/doc/gperf-3.1  && \
    make && \
    make -j1 check && \
    make install  && \
    gperf --version

# install flex
ADD https://github.com/westes/flex/files/981163/flex-2.6.4.tar.gz /usr
RUN cd /usr && \
    tar -xzvf flex-2.6.4.tar.gz && \
    cd flex-2.6.4 && \
    ./autogen.sh && \
    ./configure && \
    make && \
    make install && \
    flex --version

# install doxygen
#ADD https://doxygen.nl/files/doxygen-1.9.1.linux.bin.tar.gz /usr
#RUN cd /usr && \
#    tar -xzvf doxygen-1.9.1.linux.bin.tar.gz && \
#    mv doxygen-1.9.1/bin/doxygen /usr/bin/doxygen

# install libpng
#ADD https://downloads.sourceforge.net/libpng/libpng-1.6.37.tar.xz /usr
#RUN cd /usr && \
#    tar xf libpng-1.6.37.tar.xz && \
#    cd libpng-1.6.37 && \
#    ./configure --prefix=/usr --disable-static && \
#    make && \
#    make install

# fetch the LibreOffice source
ADD https://github.com/LibreOffice/core/archive/libreoffice-${LIBREOFFICE_VERSION}.tar.gz /tmp
RUN cd /tmp \
    && tar -xzf libreoffice-${LIBREOFFICE_VERSION}.tar.gz \
    && mv core-libreoffice-${LIBREOFFICE_VERSION} libreoffice

WORKDIR /tmp/libreoffice

# see https://ask.libreoffice.org/en/question/72766/sourcesver-missing-while-compiling-from-source/
RUN echo "lo_sources_ver=${LIBREOFFICE_VERSION}" >> sources.ver

# install liblangtag (not available in Amazon Linux or EPEL repos)
# paste repo info from https://unix.stackexchange.com/questions/433046/how-do-i-enable-centos-repositories-on-rhel-red-hat
COPY config/centos.repo /etc/yum.repos.d/
RUN yum repolist && yum install -y liblangtag && cp -r /usr/share/liblangtag /usr/local/share/liblangtag/

RUN ./autogen.sh \
    --disable-avahi \
    --disable-cairo-canvas \
    --disable-coinmp \
    --disable-cups \
    --disable-cve-tests \
    --disable-dbus \
    --disable-dconf \
    --disable-dependency-tracking \
    --disable-evolution2 \
    --disable-dbgutil \
    --disable-extension-integration \
    --disable-extension-update \
    --disable-firebird-sdbc \
    --disable-gio \
    --disable-gstreamer-1-0 \
    #--disable-gtk \
    --disable-gtk3 \
    --disable-introspection \
    #--disable-kde4 \
    --disable-gtk3-kde5 \
    --disable-largefile \
    --disable-lotuswordpro \
    --disable-lpsolve \
    --disable-odk \
    --disable-ooenv \
    --disable-pch \
    --disable-postgresql-sdbc \
    --disable-python \
    --disable-randr \
    --disable-report-builder \
    --disable-scripting-beanshell \
    --disable-scripting-javascript \
    --disable-sdremote \
    --disable-sdremote-bluetooth \
    --enable-mergelibs \
    --with-galleries="no" \
    --with-system-curl \
    --with-system-expat \
    --with-system-libxml \
    --with-system-nss \
    --with-system-openssl \
    --with-theme="no" \
    --without-export-validation \
    --without-fonts \
    --without-helppack-integration \
    --without-java \
    --without-junit \
    --without-krb5 \
    --without-myspell-dicts \
    --without-system-dicts

# Disable flaky unit test failing on macos (and for some reason on Amazon Linux as well)
# find the line "void PdfExportTest::testSofthyphenPos()" (around 600)
# and replace "#if !defined MACOSX && !defined _WIN32" with "#if defined MACOSX && !defined _WIN32"
RUN sed -i '647s/#if !defined MACOSX && !defined _WIN32/#if defined MACOSX \&\& !defined _WIN32/' vcl/qa/cppunit/pdfexport/pdfexport.cxx

# this will take 30 minutes to 2 hours to compile, depends on your machine
RUN make

# this will remove ~100 MB of symbols from shared objects
# strip will always return exit code 1 as it generates file warnings when hitting directories
RUN strip ./instdir/**/* || true

# remove unneeded stuff for headless mode
RUN rm -rf ./instdir/share/gallery \
    ./instdir/share/config/images_*.zip \
    ./instdir/readmes \
    ./instdir/CREDITS.fodt \
    ./instdir/LICENSE* \
    ./instdir/NOTICE

# test if compilation was successful
RUN echo "hello world" > a.txt \
    && ./instdir/program/soffice --headless --invisible --nodefault --nofirststartwizard \
    --nolockcheck --nologo --norestore --convert-to pdf --outdir $(pwd) a.txt

RUN tar -cvf /tmp/lo.tar instdir/


# Brotli
ENV BROTLI_VERSION=1.0.9

WORKDIR /tmp

# Compile Brotli
ADD https://github.com/google/brotli/archive/v${BROTLI_VERSION}.zip /usr
RUN cd /usr \
    && yum install -y make zip unzip bc autoconf automake libtool \
    && unzip v${BROTLI_VERSION}.zip \
    && cd brotli-${BROTLI_VERSION} \
    && ./bootstrap \
    && ./configure \
    && make \
    && make install

RUN brotli --best /tmp/lo.tar && zip -r layers.zip lo.tar.br

Convert html to pdf

Hello
A couple of problems when converting html -> pdf:
I need to convert msg files to pdf. As LibreOffice can't convert msg to pdf directly, I first convert msg -> html with another converter app, and AWS Lambda with LibreOffice then converts this html to pdf.
The resulting pdf contains an extra blank first page and doesn't contain the "From" part of the email. I've tried the desktop LibreOffice app, and it converts the same html correctly.

Hyperlinks Missing in PDF

I am seeing an issue where my .docx file has a working HTTP link, but the PDF does not. In the PDF, it looks like a hyperlink, but neither mouse-overs nor mouse-clicks give any response.

Is this a supported feature? Or perhaps there is a configuration setting I'm missing?

I'm using aws-lambda-libreoffice v5.0.0 and libreoffice-lambda-base-image v7.4

Thanks for your feedback.

Seth

com::sun::star::container::NoSuchElementException - On Amazon Linux 2 (Karoo) with Nodejs 10

I am using arn:aws:lambda:us-east-1:764866452798:layer:libreoffice-gzip:1 and trying to test the function locally (MacOS) using aws-sam-cli (SAM CLI, version 0.40.0)

I get the following error:
ERROR Invoke Error {"errorType":"Error","errorMessage":"Command failed: cd /tmp && /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp test.txt\nterminate called after throwing an instance of 'com::sun::star::container::NoSuchElementException'\n/bin/sh: line 1: 28 Aborted /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp test.txt\n","status":134,"signal":null,"output":[null,{"type":"Buffer","data":[]},{"type":"Buffer","data":[116,101,114,109,105,110,97,116,101,32,99,97,108,108,101,100,32,97,102,116,101,114,32,116,104,114,111,119,105,110,103,32,97,110,32,105,110,115,116,97,110,99,101,32,111,102,32,39,99,111,109,58,58,115,117,110,58,58,115,116,97,114,58,58,99,111,110,116,97,105,110,101,114,58,58,78,111,83,117,99,104,69,108,101,109,101,110,116,69,120,99,101,112,116,105,111,110,39,10,47,98,105,110,47,115,104,58,32,108,105,110,101,32,49,58,32,32,32,32,50,56,32,65,98,111,114,116,101,100,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,47,116,109,112,47,105,110,115,116,100,105,114,47,112,114,111,103,114,97,109,47,115,111,102,102,105,99,101,46,98,105,110,32,45,45,104,101,97,100,108,101,115,115,32,45,45,105,110,118,105,115,105,98,108,101,32,45,45,110,111,100,101,102,97,117,108,116,32,45,45,118,105,101,119,32,45,45,110,111,108,111,99,107,99,104,101,99,107,32,45,45,110,111,108,111,103,111,32,45,45,110,111,114,101,115,116,111,114,101,32,45,45,99,111,110,118,101,114,116,45,116,111,32,112,100,102,32,45,45,111,117,116,100,105,114,32,47,116,109,112,32,116,101,115,116,46,116,120,116,10]}],"pid":27,"stdout":{"type":"Buffer","data":[]},"stderr":{"type":"Buffer","data":[116,101,114,109,105,110,97,116,101,32,99,97,108,108,101,100,32,97,102,116,101,114,32,116,104,114,111,119,105,110,103,32,97,110,32,105,110,115,116,97,110,99,101,32,111,102,32,39,99,111,109,58,58,115,117,110,58,58,115,116,97,114,58,58,99,111,1
10,116,97,105,110,101,114,58,58,78,111,83,117,99,104,69,108,101,109,101,110,116,69,120,99,101,112,116,105,111,110,39,10,47,98,105,110,47,115,104,58,32,108,105,110,101,32,49,58,32,32,32,32,50,56,32,65,98,111,114,116,101,100,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,47,116,109,112,47,105,110,115,116,100,105,114,47,112,114,111,103,114,97,109,47,115,111,102,102,105,99,101,46,98,105,110,32,45,45,104,101,97,100,108,101,115,115,32,45,45,105,110,118,105,115,105,98,108,101,32,45,45,110,111,100,101,102,97,117,108,116,32,45,45,118,105,101,119,32,45,45,110,111,108,111,99,107,99,104,101,99,107,32,45,45,110,111,108,111,103,111,32,45,45,110,111,114,101,115,116,111,114,101,32,45,45,99,111,110,118,101,114,116,45,116,111,32,112,100,102,32,45,45,111,117,116,100,105,114,32,47,116,109,112,32,116,101,115,116,46,116,120,116,10]},"stack":["Error: Command failed: cd /tmp && /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp test.txt","terminate called after throwing an instance of 'com::sun::star::container::NoSuchElementException'","/bin/sh: line 1: 28 Aborted /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp test.txt",""," at checkExecSyncError (child_process.js:629:11)"," at execSync (child_process.js:666:13)"," at exports.handler.Promise.all.then (/var/task/app.js:154:13)"," at process._tickCallback (internal/process/next_tick.js:68:7)"]}

Error while running the service in kubernetes

{ 
  Error: ENOENT: no such file or directory, open '/opt/lo.tar.br'
  errno: -2,
  code: 'ENOENT',
  syscall: 'open',
  path: '/opt/lo.tar.br' 
}

I am getting the above error while running the service in kubernetes/docker.
According to the documentation, since version 2.0.0 the npm package no longer ships the 85 MB LibreOffice. So how do we add LibreOffice to the Docker container?

Sometimes conversion exceeds RAM limits

First of all I want to thank you for such an amazing tool!

The issue

In my scenario, I have a Lambda function (memory limit is set to 3008 MB) that is constantly invoked to convert different PPT(x)/DOC(x) documents to PDFs. Converting moderately big files (~30 MB) works fine, but converting large files of ~80 MB sometimes fails.

I noticed that it doesn't fail when the lambda is executed after some time of not running at all (so when it performs a cold start). My hypothesis is that after cold start the RAM is completely clean, and there are no issues when converting. But after a couple of executions, the RAM somehow is dirty with results of previous executions, and the new conversion fails to load the appropriate data since there is not enough space.

Could this issue be because the LibreOffice process is not exiting properly after a conversion is done?

Some technical details:

- The Lambda is running Amazon Linux 2
- The runtime is nodejs12.x
- As stated above, the memory size is 3008 MB
- Timeout is 800 seconds (although no timeout issues have happened)
- Using version 3.0.1 of @shelf/aws-lambda-libreoffice
- LibreOffice is loaded via the external layer arn:aws:lambda:us-west-2:764866452798:layer:libreoffice-brotli:1
  - Which is the latest version provided by libreoffice-lambda-layer
- For conversion of the files I use convertTo(filename, "pdf");

Is there a way to run this locally?

I'm using serverless-offline on MacOS, when I try to run the converter locally it says

no suitable image found.  Did find:
     node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb/build/bindings/iltorb.node: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
       node_modules/@shelf/aws-lambda-brotli-unpacker/src/iltorb/build/bindings/iltorb.node: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00

Is this meant only for running in a deployed environment, or is there another way to make it work locally?

Thanks
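
The eight bytes quoted in the error begin with 0x7F 0x45 0x4C 0x46, which is the ELF magic number ("\x7FELF"). In other words, the bundled iltorb.node is a Linux binary, and the macOS loader rejects it. The sketch below shows how those leading bytes can be classified. `detectBinaryFormat` is a hypothetical helper written for illustration and is not part of @shelf/aws-lambda-libreoffice or @shelf/aws-lambda-brotli-unpacker.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// classify a native binary by its first four "magic" bytes.
function detectBinaryFormat(bytes) {
  const head = Buffer.from(bytes).subarray(0, 4).toString('hex');
  if (head === '7f454c46') return 'ELF (Linux)';
  // 32-bit and 64-bit little-endian Mach-O headers as they appear on disk.
  if (head === 'cefaedfe' || head === 'cffaedfe') return 'Mach-O (macOS)';
  return 'unknown';
}

// The first eight bytes reported in the error message above.
const reported = [0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00];
console.log(detectBinaryFormat(reported));
```

To check a real file, the same function could be fed the first bytes read with `fs.readFileSync(path).subarray(0, 4)`. Running the converter inside a Linux container is one way around the mismatch, although this thread does not confirm a supported local setup.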

Publish as a Lambda Layer

Due to such a huge size this dependency is heavy to install locally. Also it takes time to bundle and deploy. Ideal fit for Lambda Layers!

We should fork this for Google Cloud Functions too.

We should create a fork of this to support google/firebase cloud functions.

https://github.com/vially/cloud-functions-libreoffice
this fork was created but is unmaintained.

Is anyone willing to collab for the same?

converting pdf to pdf

Hi,

Thank you for libreoffice lambda

I tried converting pdf to pdf

wowza.pdf

wowza-converted.pdf

Do you have a solution for the edges that have been cut off?

does not work with node6

It works as expected with node8, but we have all our Lambdas locked to node6.
There it throws the error "Error: Module version mismatch. Expected 48, got 57."

Are there instructions for how to generate /src/iltorb/build/bindings/iltorb.node for node 6?
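
The two numbers in that error are NODE_MODULE_VERSION (ABI) values: 48 belongs to Node.js 6 and 57 to Node.js 8, so the prebuilt iltorb.node was compiled against Node.js 8. A small sketch that spells this out follows. `explainMismatch` is a hypothetical helper, and the lookup table covers only the two versions mentioned in this issue.

```javascript
// ABI numbers for the two runtimes named in this issue only.
const KNOWN_ABI = {48: 'Node.js 6', 57: 'Node.js 8'};

// Hypothetical helper (illustration only): turn the two numbers from a
// "Module version mismatch. Expected X, got Y." error into a readable sentence.
function explainMismatch(expected, got) {
  const runtime = KNOWN_ABI[expected] || `ABI ${expected}`;
  const builtFor = KNOWN_ABI[got] || `ABI ${got}`;
  return `running on ${runtime}, but the addon was compiled for ${builtFor}`;
}

console.log(explainMismatch(48, 57));
// The ABI of the Node.js process executing this script:
console.log(`current process ABI: ${process.versions.modules}`);
```

A native addon generally has to be rebuilt against the target runtime. Whether this package supports such a rebuild for Node.js 6 is not stated in the thread.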

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>shelfio/renovate-config)

Fails to convert if the Lambda instance is re-used

Hello,

Please try the Lambda Function source code attached.

The issue is reproduced when running two requests, one after another. In this case, the Lambda Function instance is reused and all subsequent calls fail.

If you wait a couple of minutes for the Lambda instance to shut down, it runs normally again.

LibreOfficeNPM.zip

Blank PDFs generated for subsequently uploaded documents

@shelfio/aws-lambda-libreoffice Version: 3.0.1

Problem: The first time the libre Lambda function is invoked, it works well and produces the desired PDF. However, if I submit multiple documents for conversion, it generates blank PDFs.

Code snippet: (handler.js)

module.exports.libre = async event => {
 
   console.log("Incoming Event: ", event);
  const bucket = event.Records[0].s3.bucket.name;
  const filename = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
  const message = `File is uploaded in - ${bucket} -> ${filename}`;
  console.log(message);

  var params = {
    Bucket: process.env.SOURCE_BUCKET,
    //Key: data.document.s3Key
    Key : event.Records[0].s3.object.key
  };

  const tempFileName = `/tmp/${filename}`;
  console.log(tempFileName)
  var tempFile = fs.createWriteStream(tempFileName);

const s3 = new S3();
  var s3Stream = s3.getObject(params).createReadStream().pipe(tempFile);

  // Listen for errors returned by the service
  s3Stream.on('error', function(err) {
  // NoSuchKey: The specified key does not exist
  console.error(err);
});

s3Stream.pipe(tempFile).on('error', function(err) {
  // capture any errors that occur when writing data to the file
  console.error('File Stream:', err);
}).on('close', function() {
  console.log('Done.');
});


   console.log(filename)
  if (!canBeConvertedToPDF(filename)) {
    console.log('In false method')
    return false;
  }

  let convertedPath = await convertTo(filename, 'pdf')
  console.log('file converted')
  console.log(convertedPath)


  const outputFilename = `${parse(filename).name}.pdf`;
  const outputFileBuffer = readFileSync(`/tmp/${outputFilename}`);

  await s3
  .upload({
    Bucket: process.env.DESTINATION_BUCKET, Key: outputFilename, Body: outputFileBuffer,
    ACL: 'public-read', ContentType: 'application/pdf'
  })
  .promise();


  return `https://s3.amazonaws.com/${bucket}/${outputFilename}`;
};

Observation :
The Lambda timeout period is 15 minutes. If I invoke the Lambda again after 15 minutes, it works and generates the correct PDF. The next document job then produces blank PDFs again.

@vladgolubev @KnupMan Any pointers ?
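
One thing worth ruling out, offered as an assumption rather than a confirmed diagnosis: the handler above calls convertTo() without waiting for the S3 download to finish writing to /tmp, so LibreOffice may be handed an empty or partial file. The sketch below shows one way to wait for the write to complete, using `stream.pipeline` from Node.js. `saveStreamToFile` is a hypothetical helper, and an in-memory stream stands in for `s3.getObject(params).createReadStream()` so the example runs without AWS.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const {pipeline, Readable} = require('stream');
const {promisify} = require('util');

const pipelineAsync = promisify(pipeline);

// Hypothetical helper: resolves only after the whole stream has been
// written to destPath, and rejects if either side of the pipe errors.
async function saveStreamToFile(readable, destPath) {
  await pipelineAsync(readable, fs.createWriteStream(destPath));
  return destPath;
}

// Stand-in for the S3 read stream (assumption, so the sketch is runnable).
const fakeS3Stream = Readable.from([Buffer.from('hello '), Buffer.from('world')]);
const target = path.join(os.tmpdir(), 'download-sketch.txt');

// Only after this promise settles is the file complete on disk;
// in the handler, this is where the conversion would start.
const done = saveStreamToFile(fakeS3Stream, target).then(file =>
  fs.readFileSync(file, 'utf8')
);
```

In the handler this would read roughly as `await saveStreamToFile(s3.getObject(params).createReadStream(), tempFileName)` before the conversion step. Whether this explains the blank PDFs is not confirmed anywhere in the thread.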

default input / output path

Is there a way to change the default path for convertTo()? It takes the file from /tmp and writes to the same directory. Is there a way to change this to a different path?

Password protected documents are not handled successfully

In our use case, password-protected documents are quite common. People try to preview them, and the Lambda conversion function fails because the files are encrypted. I do believe it's expected behavior for these conversions to fail, but they should fail in a user-friendly manner.

Currently, this case ends up in a JS error:
Cannot read property '1' of null
It's thrown by this line of code

aws-lambda-libreoffice/src/logs.ts

Line 2 in ab4dde6

return logs.match(/\/tmp\/.+->\s(\/tmp\/.+) using/)[1];

This happens because the soffice --convert-to command prints an error instead of the converted file path.

I might provide a pull request to avoid this in the near future. I think the easiest solution would be to find a simple way of detecting whether a file is encrypted, but a quick Google search did not help me. Perhaps somebody else has a better solution?

EDIT: I noticed there has already been an issue regarding this problem, but hopefully I or somebody else will come up with a solution this time. #28
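
Independently of detecting encryption, the `Cannot read property '1' of null` error can be avoided by checking the result of `match()` before indexing into it. The sketch below reuses the regular expression quoted above. The function body, the error message, and the sample log line are assumptions for illustration and are not the project's actual fix.

```javascript
// Same pattern as the line quoted from src/logs.ts above.
const CONVERTED_PATH = /\/tmp\/.+->\s(\/tmp\/.+) using/;

// Hypothetical null-safe variant: throws a descriptive error instead of
// a TypeError when soffice did not report an output file.
function getConvertedFilePath(logs) {
  const match = String(logs).match(CONVERTED_PATH);
  if (!match) {
    throw new Error(`LibreOffice did not report an output file. Logs: ${logs}`);
  }
  return match[1];
}

// Illustrative log line (an assumption about the shape of soffice output).
const sample = 'convert /tmp/report.docx -> /tmp/report.pdf using filter : writer_pdf_Export';
console.log(getConvertedFilePath(sample));
```

A caller can then catch this error and show a "file could not be converted" message, which covers password-protected files as well as any other input that soffice rejects.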

javaldx: Could not find a Java Runtime Environment!

So I believe I followed the instructions, not sure if I missed something though.

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64

COPY index.js package.json ${LAMBDA_TASK_ROOT}

RUN npm install

CMD [ "index.handler" ]

This is my Dockerfile; my app code is in index.js.
I'm running docker build --platform linux/x86_64 -t convert-to-pdf . and then deploying that image to my ECR registry and deploying that on my lambda function.
It's all working, except when I call the convertTo method I end up getting this chunk of errors:

javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///tmp/SampleResume.pdf> failed: 0xc10(Error Area:Io Class:Write Code:16) at /home/buildslave/source/libo-core/sfx2/source/doc/sfxbasemodel.cxx:3207 at /home/buildslave/source/libo-core/sfx2/source/doc/sfxbasemodel.cxx:1783)

I haven't seen anyone else having this issue, so I'm unsure what the problem could be.

Question about unpack()

Hi, I want to call soffice.bin directly.
However, it seems that unpack() does not exist above v4 (?)

aws-lambda-libreoffice/readme.md

Lines 86 to 98 in 47e61fc

Or if you want more control:

```js
const {unpack, defaultArgs} = require('@shelf/aws-lambda-libreoffice');

await unpack(); // default path /tmp/instdir/program/soffice.bin

execSync(
  `/tmp/instdir/program/soffice.bin ${defaultArgs.join(' ')} --convert-to pdf file.docx --outdir /tmp`
);
```

Is there a way to do the same?

It seems to crash with docx larger than a certain size

It works perfectly most of the time but certain documents make it crash. I suspect it is document size as the document I tried was 5MB. Desktop Libreoffice is able to convert this docx Real Property Casebook _ text.docx to pdf.

I am not sure if there are special characters in the document that make converting fail. And, subsequent requests that are sent to this same lambda instance would not work (see the error message at the bottom). It seems that the libreoffice soffice.bin file is removed when this error occurs.

This is the error log I got when I tried to convert the attached docx

/bin/sh: line 1: 221 Aborted (core dumped) /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --nofirststartwizard --convert-to pdf --outdir /tmp /tmp/RealPropertyCasebook_text.docx
/tmp/instdir/program/soffice.bin[0x40068a]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f780372413a]
/tmp/instdir/program/soffice.bin[0x40064b]
/tmp/instdir/program/libmergedlo.so(soffice_main+0x105)[0x7f7805999cc5]
/tmp/instdir/program/libmergedlo.so(_Z10ImplSVMainv+0x62)[0x7f7806942642]
/tmp/instdir/program/libmergedlo.so(+0x1c74755)[0x7f7805979755]
/tmp/instdir/program/libmergedlo.so(_ZN11Application7ExecuteEv+0x3e)[0x7f780693bc8e]
/tmp/instdir/program/libmergedlo.so(+0x2c34f62)[0x7f7806939f62]
/tmp/instdir/program/libmergedlo.so(_ZN14SvpSalInstance7DoYieldEbb+0x39)[0x7f78069d3f99]
/tmp/instdir/program/libmergedlo.so(_ZN16SalUserEventList18DispatchUserEventsEb+0x185)[0x7f7806914b65]
/tmp/instdir/program/libmergedlo.so(_ZN14SvpSalInstance12ProcessEventEN16SalUserEventList12SalUserEventE+0x26)[0x7f78069d3626]
/tmp/instdir/program/libmergedlo.so(+0x29d1351)[0x7f78066d6351]
/tmp/instdir/program/libmergedlo.so(+0x1c7366a)[0x7f780597866a]
/tmp/instdir/program/libmergedlo.so(+0x1c71f82)[0x7f7805976f82]
/tmp/instdir/program/libmergedlo.so(+0x1c8d752)[0x7f7805992752]
/tmp/instdir/program/libmergedlo.so(+0x1c85255)[0x7f780598a255]
/tmp/instdir/program/libmergedlo.so(_ZN10comphelper19SynchronousDispatch8dispatchERKN3com3sun4star3uno9ReferenceINS4_10XInterfaceEEERKN3rtl8OUStringESD_RKNS4_8SequenceINS3_5beans13PropertyValueEEE+0x3b0)[0x7f7804d99a00]
/tmp/instdir/program/libmergedlo.so(+0x1567ad8)[0x7f780526cad8]
/tmp/instdir/program/libmergedlo.so(+0x1566e14)[0x7f780526be14]
/tmp/instdir/program/libmergedlo.so(+0x15cbb96)[0x7f78052d0b96]
/tmp/instdir/program/libmergedlo.so(+0x15ca4fa)[0x7f78052cf4fa]
/tmp/instdir/program/libmergedlo.so(+0x1c2da10)[0x7f7805932a10]
/tmp/instdir/program/libmergedlo.so(_ZN12SfxBaseModel4loadERKN3com3sun4star3uno8SequenceINS2_5beans13PropertyValueEEE+0x1bb)[0x7f78058954cb]
/tmp/instdir/program/libmergedlo.so(_ZN14SfxObjectShell6DoLoadEP9SfxMedium+0x1124)[0x7f780586cf44]
/tmp/instdir/program/libmergedlo.so(_ZN14SfxObjectShell10ImportFromER9SfxMediumRKN3com3sun4star3uno9ReferenceINS4_4text10XTextRangeEEE+0x1ac3)[0x7f7805864143]
/lib64/libc.so.6(abort+0x148)[0x7f7803738148]
/lib64/libc.so.6(gsignal+0x110)[0x7f7803736ca0]
/lib64/libc.so.6(+0x33d10)[0x7f7803736d10]
/tmp/instdir/program/libuno_sal.so.3(+0x394ae)[0x7f7803ae74ae]
/tmp/instdir/program/libuno_sal.so.3(+0x16b09)[0x7f7803ac4b09]
/tmp/instdir/program/libmergedlo.so(+0x2c3bb04)[0x7f7806940b04]
/tmp/instdir/program/libmergedlo.so(+0x1c6ea50)[0x7f7805973a50]
/tmp/instdir/program/libmergedlo.so(_ZN11Application5AbortERKN3rtl8OUStringE+0x95)[0x7f780693a455]
/tmp/instdir/program/libmergedlo.so(+0x2cb87c2)[0x7f78069bd7c2]
/lib64/libc.so.6(abort+0x148)[0x7f7803738148]
/lib64/libc.so.6(gsignal+0x110)[0x7f7803736ca0]
/lib64/libc.so.6(+0x33d10)[0x7f7803736d10]
/tmp/instdir/program/libuno_sal.so.3(+0x395b3)[0x7f7803ae75b3]
/tmp/instdir/program/libuno_sal.so.3(+0x13e62)[0x7f7803ac1e62]
Stack:
Fatal exception: Signal 6
Application Error
Error: source file could not be loaded
rm: cannot remove ‘/tmp/RealPropertyCasebook_text.docx’: No such file or directory

Error: Command failed: rm /tmp/RealPropertyCasebook_text.docx rm: cannot remove ‘/tmp/RealPropertyCasebook_text.docx’: No such file or directory at checkExecSyncError (child_process.js:635:11) at execSync (child_process.js:671:15) at convertTo (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/convert.js:58:31) at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:97:5) at async LibreOfficeService.convertToPdf (/var/task/services/libreoffice-service.js:173:7) at async Runtime.exports.handler (/var/task/app.js:102:15) { status: 1, signal: null, output: ...

This is the error log for the subsequent requests

/bin/sh: /tmp/instdir/program/soffice.bin: No such file or directory

Error: Command failed: cd /tmp && /tmp/instdir/program/soffice.bin --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --nofirststartwizard --convert-to pdf --outdir /tmp /tmp/abc.docx 
/bin/sh: /tmp/instdir/program/soffice.bin: No such file or directory at checkExecSyncError (child_process.js:635:11) 
at execSync (child_process.js:671:15) at convertTo (/var/task/node_modules/@shelf/aws-lambda-libreoffice/lib/convert.js:55:40) 
at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:97:5) 
at async LibreOfficeService.convertToPdf (/var/task/services/libreoffice-service.js:182:7) 
at async Runtime.exports.handler (/var/task/app.js:102:15) ...

Certain characters in filename cause errors

first off, I just wanted to thank you for this amazing library and the work you've put into it!

however, I did notice a small issue while using it - canBeConvertedToPDF returns true for files with certain characters (parentheses in my case) in the name, however the convertTo function throws the following error for the same file:

/bin/sh: -c: line 0: syntax error near unexpected token `('

for anyone experiencing this issue, my workaround is to escape() the filename before using this tool.

not a big issue, just wanted to save some hassle for anyone who runs into this : )
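
The error comes from /bin/sh interpreting the parentheses, because the file path is interpolated into a shell command string. One alternative to rewriting the filename is to quote it for the shell. The sketch below is a minimal POSIX single-quote helper. `shellQuote` is a hypothetical name and is not something the package exports.

```javascript
// Hypothetical helper: wrap a value in single quotes for POSIX sh and
// escape any embedded single quote as '\'' so the shell sees one literal word.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

console.log(shellQuote('/tmp/report (final).docx'));
console.log(shellQuote("/tmp/it's.docx"));
```

Another option, when calling soffice.bin yourself, is Node's `child_process.execFileSync(file, args)`, which passes arguments as an array and by default does not involve a shell at all.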

Fonts are not getting transferred

I am using this to convert a word document to a PDF document in AWS lambda and I have the word document in Times New Roman, but the PDF that is created does not use the Times New Roman font. I am not sure if this is an issue with the packaged libre office or this npm package though.

errors if filename doesn't have an extension

If I have a filename without an extension, getConvertedFilePath fails.

