function-pdf-to-jpeg's Introduction

PDF to JPEG Azure Function (Custom docker runtime)

This Azure function uses the pdf2image library to convert PDF files to JPEG format.

There are multiple ways this project can be used:

  • The master branch will convert every page to JPEG and concatenate them together into a single .jblob file. The .jblob file can then be converted back into multiple JPEGs by parsing it (example parser).
  • The first_page_only branch will only convert the first page to JPEG and save it as a .jpg file.
  • The one_file_per_page branch will convert every page of the PDF input into its own JPEG file. This version is a bit special and doesn't use the built-in Azure Function set() method to create the BLOBs, as it doesn't seem to support a variable number of BLOB outputs yet (with Python at least). Thus, another library is used and the name pattern in the output blob binding is ignored. The resulting JPEG files will be stored at the root of the destination container with the pattern {filename}-{pageNumber}.png.
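As an illustration of how a .jblob file from the master branch might be split back into individual images, here is a minimal sketch. It assumes the file is nothing more than complete JPEG streams written back to back with no separator or header; that layout is an assumption, and the project's own example parser remains the reference. The function name `split_concatenated_jpegs` is hypothetical.

```python
from typing import List

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_concatenated_jpegs(blob: bytes) -> List[bytes]:
    """Split back-to-back JPEG streams into a list of individual images.

    Heuristic: a page boundary is an EOI marker immediately followed by
    an SOI marker. This assumes no padding between pages.
    """
    images = []
    start = blob.find(SOI)
    while start != -1:
        boundary = blob.find(EOI + SOI, start)
        if boundary == -1:
            # Last image: it runs up to the final EOI marker in the blob.
            end = blob.rfind(EOI)
            if end > start:
                images.append(blob[start:end + len(EOI)])
            break
        images.append(blob[start:boundary + len(EOI)])
        start = boundary + len(EOI)
    return images
```

Each element of the returned list can then be written to its own file, for example `page-1.jpg`, `page-2.jpg`, and so on.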

The function is triggered each time a file is dropped in the pdfs BLOB container and puts the resulting .jblob in the images BLOB container (see Configuration).

Usage

To use this function, you must create an Azure Function App and select Docker Container in the "Publish" section.

Once the Function App is created, select "Container settings" in the app dashboard and enter one of these:

  • dockerutils/pypdftojpg:latest is the master branch, with the jblob file containing all the images concatenated.
  • dockerutils/pypdftojpg:first_page_only is the first_page_only branch.
  • dockerutils/pypdftojpg:one_file_per_page is the one_file_per_page branch.

See the section above for more details on these variants.

Configuration

This function uses two variables called InputStorage and OutputStorage. In the app dashboard, select "Configuration" and provide a value for both InputStorage and OutputStorage (the Storage Account's access key). Both variables can designate the same Azure Data Storage.

See the application settings documentation.

{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "myBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "pdfs/{name}",
            "connection": "InputStorage"
        },
        {
            "name": "myOutputBlob",
            "type": "blob",
            "path": "images/{name}.jblob",
            "connection": "OutputStorage",
            "direction": "out"
        }
    ]
}
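To show how the two bindings above could be consumed, here is a rough sketch of a handler in the spirit of the master branch. It is not the project's actual `__init__.py`: the helper `pages_to_jblob` is hypothetical, the type annotations from `azure.functions` are left out to keep the sketch dependency-free, and the back-to-back concatenation is an assumption about the .jblob layout. The parameter names `myBlob` and `myOutputBlob` match the `name` fields in the bindings.

```python
import io
from typing import Iterable


def pages_to_jblob(pages: Iterable) -> bytes:
    """Encode each page as JPEG and concatenate the encoded bytes.

    `pages` is expected to yield PIL images (anything with a
    `save(fp, format=...)` method works).
    """
    chunks = []
    for page in pages:
        buf = io.BytesIO()
        page.save(buf, format="JPEG")
        chunks.append(buf.getvalue())
    return b"".join(chunks)


def main(myBlob, myOutputBlob) -> None:
    # Imported lazily: pdf2image needs poppler installed on the host.
    from pdf2image import convert_from_bytes

    # Read the triggering PDF, rasterize every page, write one output blob.
    pages = convert_from_bytes(myBlob.read())
    myOutputBlob.set(pages_to_jblob(pages))
```

Keeping the encoding step in a separate helper makes it possible to exercise it locally with plain PIL images, without the Azure runtime or poppler.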

Why Docker

The pdf2image library requires the poppler tool to be installed on the host. This tool is not included in the default Microsoft Python runtime, so this runtime needs to be extended.
