Giter VIP home page Giter VIP logo

html-to-docx's Introduction

html-to-docx

NPM Version

html-to-docx is a js library for converting HTML documents to DOCX format supported by Microsoft Word 2007+, LibreOffice Writer, Google Docs, WPS Writer etc.

It was inspired by html-docx-js project but mitigates the problem of documents generated being non-compatiable with word processors like Google Docs and libreOffice Writer that doesn't support altchunks feature.

html-to-docx earlier used to use libtidy to clean up the html before parsing, but had to remove it since it was causing so many dependency issues due to node-gyp.

Disclaimer

Even though there is an instance of html-to-docx running in production, please ensure that it covers all the cases that you might be encountering usually, since this is not a complete solution.

Currently it doesn't work with browser directly, but it was tested against React.

Installation

Use the npm to install foobar.

npm install html-to-docx

Usage

await HTMLtoDOCX(htmlString, headerHTMLString, documentOptions, footerHTMLString)

full fledged examples can be found under example/

Parameters

  • htmlString <String> clean html string equivalent of document content.
  • headerHTMLString <String> clean html string equivalent of header. Defaults to <p></p> if header flag is true.
  • documentOptions <?Object>
    • orientation <"portrait"|"landscape"> defines the general orientation of the document. Defaults to portrait.
    • margins <?Object>
      • top <Number> distance between the top of the text margins for the main document and the top of the page for all pages in this section in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • right <Number> distance between the right edge of the page and the right edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • bottom <Number> distance between the bottom of text margins for the document and the bottom of the page in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • left <Number> distance between the left edge of the page and the left edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • header <Number> distance from the top edge of the page to the top edge of the header in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • footer <Number> distance from the bottom edge of the page to the bottom edge of the footer in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • gutter <Number> amount of extra space added to the specified margin, above any existing margin values. This setting is typically used when a document is being created for binding in TWIP. Defaults to 0. Supports equivalent measurement in pixel, cm or inch.
    • title <?String> title of the document.
    • subject <?String> subject of the document.
    • creator <?String> creator of the document. Defaults to html-to-docx
    • keywords <?Array<String>> keywords associated with the document. Defaults to ['html-to-docx'].
    • description <?String> description of the document.
    • lastModifiedBy <?String> last modifier of the document. Defaults to html-to-docx.
    • revision <?Number> revision of the document. Defaults to 1.
    • createdAt <?Date> time of creation of the document. Defaults to current time.
    • modifiedAt <?Date> time of last modification of the document. Defaults to current time.
    • headerType <"default"|"first"|"even"> type of header. Defaults to default.
    • header <?Boolean> flag to enable header. Defaults to false.
    • footerType <"default"|"first"|"even"> type of footer. Defaults to default.
    • footer <?Boolean> flag to enable footer. Defaults to false.
    • font <?String> font name to be used. Defaults to Times New Roman.
    • fontSize <?Number> size of font in HIP(Half of point). Defaults to 22. Supports equivalent measure in pt.
    • complexScriptFontSize <?Number> size of complex script font in HIP(Half of point). Defaults to 22. Supports equivalent measure in pt.
    • table <?Object>
      • row <?Object>
        • cantSplit <?Boolean> flag to allow table row to split across pages. Defaults to false.
    • pageNumber <?Boolean> flag to enable page number in footer. Defaults to false. Page number works only if footer flag is set as true.
    • skipFirstHeaderFooter <?Boolean> flag to skip first page header and footer. Defaults to false.
    • lineNumber <?Boolean> flag to enable line numbering. Defaults to false.
    • lineNumberOptions <?Object>
      • start <Number> start of the numbering - 1. Defaults to 0.
      • countBy <Number> skip numbering in how many lines in between + 1. Defaults to 1.
      • restart <"continuous"|"newPage"|"newSection"> numbering restart strategy. Defaults to continuous.
  • footerHTMLString <String> clean html string equivalent of footer. Defaults to <p></p> if footer flag is true.

Returns

<Promise<Buffer|Blob>>

Notes

Currently page break can be implemented by having div with classname "page-break" or style "page-break-after" despite the values of the "page-break-after", and contents inside the div element will be ignored. <div class="page-break" style="page-break-after: always;"></div>


CSS list-style-type for <ol> element are now supported. Just do something like this in the HTML:

  <ol style="list-style-type:lower-alpha;">
    <li>List item</li>
    ...
  </ol>

List of supported list-style-type:

  • upper-alpha, will result in A. List item
  • lower-alpha, will result in a. List item
  • upper-roman, will result in I. List item
  • lower-roman, will result in i. List item
  • decimal, will result in 1. List item
  • lower-alpha-bracket-end, will result in a) List item
  • decimal-bracket-end, will result in 1) List item
  • decimal-bracket, will result in (1) List item

Also you could add attribute data-start="n" to start the numbering from the n-th.

<ol data-start="2"> will start the numbering from ( B. b. II. ii. 2. )

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to branch new branches off of develop for contribution.

License

MIT

html-to-docx's People

Contributors

amrita-syn avatar charuthab avatar hanagejet avatar kurukururuu avatar privateomega avatar robinminso avatar

Watchers

 avatar

Forkers

jwasnoggin tupaca

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.