Comments (5)

pietersv commented on August 21, 2024

officegen is primarily a document generation library. It sounds like you want to read existing .docx files and extract their properties. On Node.js, you can loop through the filenames in a directory, read each file, and count its explicit page breaks:

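var JSZip = require('jszip');
var fs = require('fs');
var path = require('path');
var cheerio = require('cheerio');

var DIRECTORY = './docs'; // placeholder: folder containing the .docx files

fs.readdir(DIRECTORY, function (err, files) {
  if (err) throw err;
  files.filter(function (file) {
    return path.extname(file).toLowerCase() === '.docx';
  }).forEach(function (file) {
    // path.join avoids a missing separator between directory and file name
    fs.readFile(path.join(DIRECTORY, file), function (err, content) {
      if (err) return console.error(file, err.message);
      // JSZip 3.x API; JSZip 2.x used zip.load(content) and .asText() instead
      JSZip.loadAsync(content).then(function (zip) {
        return zip.file('word/document.xml').async('string');
      }).then(function (xml) {
        var $xml = cheerio.load(xml, {xmlMode: true});
        // Explicit page breaks are stored as <w:br w:type="page"/>
        var breaks = $xml('w\\:br[w\\:type="page"]');
        console.log([file, breaks.length]);
      }).catch(function (err) {
        console.error(file, err.message);
      });
    });
  });
});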
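
Note that counting `w:br` elements with `w:type="page"` only finds explicit page breaks; pages that start because text simply flowed over are not counted. As a complementary sketch: Word normally caches a page count in the `docProps/app.xml` part of the package, inside a `Pages` element. The helper below is hypothetical (the name `cachedPageCount` and the sample string are made up for illustration), and it assumes the XML text of that part has already been read out of the zip. The cached value can be missing or stale if the file was written by something other than Word.

```javascript
// Sketch: read the page count cached in docProps/app.xml.
// Input is the XML text of that part, however it was extracted from the zip.
function cachedPageCount(appXml) {
  // The extended-properties part stores the count as <Pages>N</Pages>;
  // an optional namespace prefix is tolerated.
  var match = /<(?:\w+:)?Pages>\s*(\d+)\s*<\/(?:\w+:)?Pages>/.exec(appXml);
  // null means the producing application did not record a page count.
  return match ? parseInt(match[1], 10) : null;
}

// Hypothetical, trimmed sample of an app.xml part.
var sample =
  '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">' +
  '<Pages>3</Pages><Words>412</Words></Properties>';

console.log(cachedPageCount(sample)); // prints 3
```

Treating the cached value as a hint and the explicit break count as a lower bound is probably the safest reading of the two numbers.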

from officegen.

Mako-L commented on August 21, 2024

Thanks, it helped me a lot with my research. I tried doing it with .doc files, but it doesn't seem to work. I don't want to sound stupid or anything, but could you make an example for .doc too, please? Thank you so much for the support 👍

pietersv commented on August 21, 2024

Hm, the binary .doc format is tougher. This may be a better topic for Stack Overflow, e.g.
http://stackoverflow.com/questions/9038231/can-i-read-pdf-or-word-docs-with-node-js
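
One possible workaround, offered only as a sketch and not something suggested in this thread: convert the .doc file to .docx with LibreOffice in headless mode, then run the .docx approach on the result. This assumes LibreOffice is installed and its `soffice` binary is on the PATH; the function names `buildConvertArgs` and `convertDocToDocx` are hypothetical.

```javascript
var execFile = require('child_process').execFile;

// Build the argument list for a headless LibreOffice conversion to .docx.
function buildConvertArgs(docPath, outDir) {
  return ['--headless', '--convert-to', 'docx', '--outdir', outDir, docPath];
}

// Convert one .doc file. Assumption: the `soffice` binary is on the PATH.
// The callback receives an error if the process could not run or failed.
function convertDocToDocx(docPath, outDir, callback) {
  execFile('soffice', buildConvertArgs(docPath, outDir), function (err) {
    callback(err);
  });
}
```

Calling `convertDocToDocx('report.doc', 'converted', cb)` should leave `converted/report.docx` behind if the conversion succeeds. Page layout can shift during conversion, so any count taken from the converted file is an approximation.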

Jonexuan commented on August 21, 2024

@MakoMakox Hi, I'm a beginner in web development. I wonder if you have already solved this problem. Could you share an example? Thanks a lot.

vishal7201 commented on August 21, 2024

The docx-pdf-pagecount module (https://www.npmjs.com/package/docx-pdf-pagecount) can return the number of pages in a .docx or .pdf file:

const getPageCount = require('docx-pdf-pagecount');

getPageCount('E:/sample/document/aa/test.docx')
  .then(pages => {
    console.log(pages);
  })
  .catch((err) => {
    console.log(err);
  });
getPageCount('E:/sample/document/vb.pdf')
  .then(pages => {
    console.log(pages);
  })
  .catch((err) => {
    console.log(err);
  });