Giter VIP home page Giter VIP logo

html-to-xlsx's Introduction

html-to-xlsx

NPM Version License Build Status

node.js html to xlsx transformation

Transformation only supports html table and several basic style properties. No images or charts are currently supported.

Usage

const util = require('util')
const fs = require('fs')
const conversionFactory = require('html-to-xlsx')
const puppeteer = require('puppeteer')
const chromeEval = require('chrome-page-eval')({ puppeteer })
const writeFileAsync = util.promisify(fs.writeFile)

const conversion = conversionFactory({
  extract: async ({ html, ...restOptions }) => {
    const tmpHtmlPath = path.join('/path/to/temp', 'input.html')

    await writeFileAsync(tmpHtmlPath, html)

    const result = await chromeEval({
      ...restOptions,
      html: tmpHtmlPath,
      scriptFn: conversionFactory.getScriptFn()
    })

    const tables = Array.isArray(result) ? result : [result]

    return tables.map((table) => ({
      name: table.name,
      getRows: async (rowCb) => {
        table.rows.forEach((row) => {
          rowCb(row)
        })
      },
      rowsCount: table.rows.length
    }))
  }
})

async function run () {
  const stream = await conversion(`<table><tr><td>cell value</td></tr></table>`)

  stream.pipe(fs.createWriteStream('/path/to/output.xlsx'))
}

run()

Supported properties

  • background-color - cell background color
  • color - cell foreground color
  • border-left-style - as well as positions will be transformed into excel cells borders
  • text-align - text horizontal align in the excel cell
  • vertical-align - vertical align in the excel cell
  • width - the excel column will get the highest width, it can be little bit inaccurate because of pixel to excel points conversion
  • height - the excel row will get the highest height
  • font-size - font size
  • colspan - numeric value that merges current column with columns to the right
  • rowspan - numeric value that merges current row with rows below.
  • overflow - the excel cell will have text wrap enabled if this is set to scroll or auto.

Constructor options

const conversionFactory = require('html-to-xlsx')
const puppeteer = require('puppeteer')
const chromeEval = require('chrome-page-eval')({ puppeteer })
const conversion = conversionFactory({ /*[constructor options here]*/})
  • extract function [required] - a function that receives some input (an html file path and a script) and should return some data after been evaluated the html passed. the input that the function receives is:
    {
      html: <file path to a html file>,
      scriptFn: <string that contains a javascript function to evaluate in the html>,
      timeout: <time in ms to wait for the function to complete, the function should use this value to abort any execution when the time has passed>,
      /*options passed to `conversion` will be propagated to the input of this function too*/
    }
  • tmpDir string - the directory path that the module is going to use to save temporary files needed during the conversion. defaults to require('os').tmpdir()
  • timeout number - time in ms to wait for the conversion to complete, when the timeout is reached the conversion is cancelled. defaults to 10000

Conversion options

const fs = require('fs')
const conversionFactory = require('html-to-xlsx')
const puppeteer = require('puppeteer')
const chromeEval = require('chrome-page-eval')({ puppeteer })
const conversion = conversionFactory({
  extract: async ({ html, ...restOptions }) => {
    const tmpHtmlPath = path.join('/path/to/temp', 'input.html')

    await writeFileAsync(tmpHtmlPath, html)

    const result = await chromeEval({
      ...restOptions,
      html: tmpHtmlPath,
      scriptFn: conversionFactory.getScriptFn()
    })

    const tables = Array.isArray(result) ? result : [result]

    return tables.map((table) => ({
      name: table.name,
      getRows: async (rowCb) => {
        table.rows.forEach((row) => {
          rowCb(row)
        })
      },
      rowsCount: table.rows.length
    }))
  }
})

async function main () {
  const stream = await conversion(/* html */, /* extract options */)
}

main()
  • html string - the html source that will be transformed to an xlsx, the html should contain a table element
  • extractOptions object - additional options to pass to the specified extract function

License

See license

html-to-xlsx's People

Contributors

allenconcur avatar areinmeyer avatar bjrmatos avatar fmmich avatar jamcnaughton avatar pofider avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

html-to-xlsx's Issues

Word wrap for long text

Sometimes we have so long text and we can see that heigth of cell has been changed. Do we have any way to apply word wrap?

Current behavior:
no_word_wrap

Expected result:
expected_result

Support for cell types

It would be great is cell types were supported. Eg. Number, text, date, ...
Currently everything is a string.

error : res is not defined

var conversion = require ('html-to-xlsx') ();
var fs = require('fs');

fs.readFile('./Sample9.html', 'utf8', function (err,data) {

if (err) return console.log(err);

conversion(data,function(err,stream) {

    stream.pipe(res);

})

});

I am trying to read the html file(ie file name Sample9.html) place in my root directory and convert it into xlsx , which is not happening can any one help me with this issue

Converting tables with > 400k cells

Conversion crashes when there is more than ~400k cells in the table. Mostly because of the memory limits. I have tried several approaches to fix this, but don't have an ultimate solution for this yet.

  1. The current implementation uses phantom.eval to return cells' values and styles back to the phantom and then to node process. The return from phantom.eveal crashes if you have ~200k cells.

I was able to get over this with batch processing phantom.eval on just 1000 rows.

  1. There are still way too many data to keep in memory. However the cell styles are mostly the same.

I was able to decrease the memory impact with just storing a style id for each cell and styles separately.

  1. phantomjs doesn't seems to be releasing memory at all. Even if I just iterate over table it grows and grows.

Calling phantomjs.page.close() after each phantomjs.page.eval. helps a bit.


With the previous fixes, it still doesn't process more than 800k cells.

The final solution could be to write the partial cell information batches to a file. However the phantomjs memory management is really strange and the solution doesn't need to necessarily work.

unexpected width of cell when using colspan

as reported here and with a demo available here, the second row of excel has the first column width matching the width of the first row (one cell merged from 3 cells). since browsers does not treat this kind of content in the same way (they treat first cell of second row as something standalone from the previous row width) i would say that it is the best if we can match the browser behaviour.

current output:

captura de pantalla 2017-10-19 a las 10 34 22 a m

browser output:

captura de pantalla 2017-10-19 a las 10 34 07 a m

Latest JSreport 1.8.2 Installation Problem with specific Port Num

Hello Sir, i downloaded jsreport-win.zip(1.8.2) file for installing in windows 10. Using jsreport configure command in command prompt i configured and one dev.config.json file was created. in this configuration i have given 3002 as my port number.After this, using jsreport win-install command i installed jsreport output showed your jsreport service is running. Again i typed jsreport start command it is throwing error and the service which was installed is not working with that port number.Can you please help me how to install this latest version with User required port number[ex : 3002]. And also i am using old version jsreport and i want to upgrade engines and receipes also in my visual studio[.net]. Tell me how to upgrade relevant nuget packages for supporting this latest version.

unexpected height of cell when using rowspan

(similar to #19) second and third cells of first row has more height than normally when using rowspan in first cell. since browsers does not treat this kind of content in the same way (they treat cell height as something standalone from the previous cell height) we should match the browser behaviour.

demo of issue here

current output:

captura de pantalla 2017-10-31 a las 9 46 26 a m

browser output:

captura de pantalla 2017-10-31 a las 9 46 48 a m

Error: connect ECONNREFUSED

I've been banging my head over this for about a week now and I can't seem to find a solution.
I'm on OSX Sierra with phantomjs-prebuilt in my node_modules and I get the following error:

Error: connect ECONNREFUSED 127.0.0.1:49837
     at Object.exports._errnoException (util.js:1023:11)
     at exports._exceptionWithHostPort (util.js:1046:20)
     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1090:14)

And this is the code:

let html = pug.renderFile(Utils.root('src/templates/test.pug'), { data: data });
convertToExcel(html, (err, stream) => {
  if (err) {
    console.error(err);
    throw err;
  }
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename=report.xlsx');
  stream.pipe(res);
}

I've tried several different configurations with no success so far.
Can you provide any insight?

AWS Lambda puppeteer

I've been using this package for a while on AWS Lambda on the pre 1.x version. I was looking to update my package to the latest version. But I'm having trouble with running puppeteer in AWS Lambda I've search for various solutions online but haven't found one yet that works. Was wondering if you had a possible solution or an alternative. It all started with this error:

Failed to launch chrome!\n/var/task/node_modules/puppeteer/

I've tried using puppeteer-core, puppeteer-lambda and varios other things I've found online.

Multiple rowspan not supported

In our implementation we came across a bug in the rowspan support. It seems when multiple rowspans exist for the same row (i.e., multiple columns span multiple rows), the Excel document that is rendered either is corrupted or has cell values not in the correct order.

I'm currently working on a fix for this, but wanted to document the issue for others and have a issue to attach to the PR.

multiple tables in single sheet?

currently my html page having 3 tables
after converting html to excel 3 tables is showing in 3 different sheets i want it in single sheet.

multi rowspan not render correctly

In may case I need to create that table:

one two test test
one two
one two

The table output is missing the values that comes after the first row, like this:

one two test test

If I'll remove one of the "rowspan" it will work correctly.
And if I put the rowspan as the first tds it also will work correctly.
Any suggestion?

Add delimeter ' to the number fields

After converting the html to xlsx file . All the columns which holds number are added with ' delimiter which causes the problem in calculating their total in excel as it has ' so the total can't be done.
Reply me on this soon.

delimeter

rowspan not render correctly

<table>
    <tr>
        <td>1</td>
        <td rowspan="2">2</td>
    </tr>
    <tr>
        <td>3</td>
    </tr>
</table>

cell 3 is not shown in the xlsx file

Ampersand in html

Need to find a way how to put ampersand in to the xlsx.

Neither of &, &amp, &amp;amp; currently works.

Invalid unicode characters in input data crashes node

"message":"Error: Invalid character (\u001f) in string: \n adri\u001f_\n 

at index 17\n at XMLStringifier.module.exports.XMLStringifier.assertLegalChar 
(E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLStringifier.js:150:15)\n at 
XMLStringifier.assertLegalChar (E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLStringifier.js:4:59)\n at 
XMLStringifier.module.exports.XMLStringifier.eleText (E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLStringifier.js:27:19)\n at new 
XMLText (E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLText.js:19:35)\n at XMLElement.module.exports.XMLNode.text 
(E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLNode.js:168:15)\n at XMLElement.module.exports.XMLNode.node 
(E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLNode.js:160:15)\n at XMLElement.module.exports.XMLNode.element 
(E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLNode.js:103:28)\n at XMLElement.module.exports.XMLNode.ele 
(E:\\approot\\node_modules\\xmlbuilder\\lib\\XMLNode.js:271:19)\n at SharedStrings.toxml 
(E:\\approot\\node_modules\\msexcel-builder-extended\\lib\\msexcel-builder.js:299:10)\n at Workbook.save 
(E:\\approot\\node_modules\\msexcel-builder-extended\\lib\\msexcel-builder.js:831:75)\n at Workbook.save 
(E:\\approot\\node_modules\\msexcel-builder-extended\\lib\\msexcel-builder.js:7:59)\n at tableToXlsx 
(E:\\approot\\node_modules\\html-to-xlsx\\lib\\conversion.js:113:14)\n at icb 
(E:\\approot\\node_modules\\html-to-xlsx\\lib\\conversion.js:51:9)\n at 
E:\\approot\\node_modules\\html-to-xlsx\\lib\\serverStrategy.js:43:13\n at 
E:\\approot\\node_modules\\phantom-workers\\lib\\phantomManager.js:102:9\n at IncomingMessage.<anonymous> 
(E:\\approot\\node_modules\\phantom-workers\\lib\\phantomWorker.js:134:13)\n at IncomingMessage.emit (events.js:117:20)"

conversion fails when last cell of a row contains rowspan

demo of the problem here.

the problem is mainly because this line, when last cell has a rowspan different than default (1), the curr_row ends with a value that points to an unexisting row.

this can be fixed with some conditions but the bigger problem is that this case is special.

this is the browser output for this case:

captura de pantalla 2017-10-31 a las 1 07 11 p m

as you can see the cells of second row starts at the position of first row, so this would require to have an special mapping of rows/cells to achieve the same behaviour of browsers.

Col Span is not working

Hello, I am trying to merge two or three columns using Colspan, it is not coming,
and i want to display in multiple table in the excel instead of showing all the data in a single table[ multiple grids in Excel], is it possible to do that one

I need to send options as arguments to puppeteer

You have

...restOptions,
         html: tmpHtmlPath,
         scriptFn: conversionFactory.getScriptFn()

I want to add:

headless: true,
         slowMo: 100,
         args: amIRoot() ? ['--no-sandbox', '--disable-setuid-sandbox'] : undefined,

But I need a few options to be passed to puppeteer. Please direct me to where in the lib to make the adjustment, or let know if there's another way to do this.

Just started using this package...pretty good!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.