mlewand / rtf-parse

A simplified RTF parser.

License: MIT License

JavaScript 100.00%

rtf node nodejs parser

rtf-parse's Introduction

rtf-parse

A simplified RTF parser.

Installation

$ npm install --save rtf-parse

Usage examples

const rtfParse = require( 'rtf-parse' ),
	path = require( 'path' );

rtfParse.parseFile( path.join( '_fixtures', 'rtfSimple.rtf' ) )
	.then( doc => {
		// Do anything you like with rtf.model.Document instance of your document.
	} );

const rtfParse = require( 'rtf-parse' );

rtfParse.parseString( '{\\rtf1 foobar}' )
	.then( doc => {
		// Do anything you like with rtf.model.Document instance of your document.
	} );

You can find more usage examples in the examples directory.

You can also browse the tests to see how the API is used.

Contribute

This is a fully open source pet project. If you're in the mood for a pull request, you're more than welcome to send one!

Getting In Touch

You can always ping me on Twitter: @m_lewand.

License

MIT © Marek Lewandowski

rtf-parse's Issues

Provide unified names for commands

Currently the command model operates on the "raw" command value picked up while parsing. This means the same command may appear as \rtf1 or as \rtf1 followed by a space. Instead, the model should have a property such as name, where the value is unified to simply rtf1.
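
A minimal sketch of what such a unification could look like. The helper name unifyCommandName is hypothetical and not part of rtf-parse; it assumes the raw value is a backslash-prefixed control word that may carry trailing delimiter whitespace.

```javascript
// Hypothetical helper, not part of the rtf-parse API.
// Assumes `rawValue` looks like '\rtf1' or '\rtf1 ' (trailing delimiter space).
function unifyCommandName( rawValue ) {
	return rawValue
		.replace( /^\\/, '' ) // drop the leading backslash
		.replace( /\s+$/, '' ); // drop trailing delimiter whitespace
}

console.log( unifyCommandName( '\\rtf1' ) ); // rtf1
console.log( unifyCommandName( '\\rtf1 ' ) ); // rtf1
```

Keeping the raw value next to the unified name would still let the model reproduce the original input if needed.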

Merge subsequent text entries

When creating the model, it would be nice to join all subsequent text models, even if they were split by a \r\n line separator.
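
One way this merging could work, sketched on plain objects. The { type, value } shape and the mergeTextEntries name are assumptions made for illustration; the real model classes may look different.

```javascript
// Assumption: entries are plain objects such as { type: 'text', value: 'foo' }.
// This is an illustrative sketch, not the library's model API.
function mergeTextEntries( entries ) {
	const merged = [];

	for ( const entry of entries ) {
		const last = merged[ merged.length - 1 ];

		if ( entry.type === 'text' && last && last.type === 'text' ) {
			// Adjacent text entries collapse into the previous one.
			last.value += entry.value;
		} else {
			// Copy the entry so the input array is left untouched.
			merged.push( Object.assign( {}, entry ) );
		}
	}

	return merged;
}

const entries = [
	{ type: 'text', value: 'foo' },
	{ type: 'text', value: 'bar' },
	{ type: 'command', value: '\\par' },
	{ type: 'text', value: 'baz' }
];

console.log( mergeTextEntries( entries ) );
```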

Paragraph support

Paragraphs in RTF are somewhat funny, as they're enclosed between the \pard and \par commands. A paragraph is probably also enclosed in a group.

{\pard This is my fancy text.\par}

Event for parser context change

Every time the parser changes context, a proper event should be fired.

Similar to #7.
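
A rough sketch of how such an event could be fired, using Node's built-in EventEmitter. The ContextTracker class, the setContext method, and the contextChange event name are all hypothetical; the issue does not specify them.

```javascript
const EventEmitter = require( 'events' );

// Hypothetical class, for illustration only.
class ContextTracker extends EventEmitter {
	constructor() {
		super();
		this.context = null;
	}

	setContext( next ) {
		if ( next === this.context ) {
			return; // Nothing changed, so no event is fired.
		}

		const previous = this.context;
		this.context = next;
		this.emit( 'contextChange', { previous, current: next } );
	}
}

const tracker = new ContextTracker();
tracker.on( 'contextChange', evt => console.log( evt.previous, '->', evt.current ) );
tracker.setContext( 'group' );
tracker.setContext( 'text' );
```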

Error on building with webpack - Uglify Error

Hello, first of all thanks for this great job.

I'm using this plugin in a web-based application. At the end I'm trying to create a build with

"npm run build"

but I'm getting this error:
ERROR in 0.a9abdcd49066c2a9cfb7.chunk.js from UglifyJs
Unexpected token: name (Parser) [0.a9abdcd49066c2a9cfb7.chunk.js:96957,7]

The problem is in node_modules/rtf-parse/Parser.js: it is written in ES6, and UglifyJS can't handle ES6.

I'm using this config with webpack, but it doesn't help.


module: {
    loaders: [{
      test: /\.js$/, // Transform all .js files required somewhere with Babel
      loader: 'babel-loader',
      exclude: /node_modules\/(?!(rtf\-parse)\/).*/,
      query: options.babelQuery,
    }, {

Can you please publish releases that are converted to ES5, or do you have an idea how I can fix my problem?

Tokenizer.process is called recursively

We need to refactor this method so that it is no longer called recursively. Because of the recursion, it currently throws an exception about the call stack being too big when working with big RTF files.
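
The general shape of such a refactoring is to replace the self-call with a loop that advances an offset. The sketch below is not the actual Tokenizer code; tokenizeIteratively and the readToken callback (which returns the token found at an offset plus the offset to continue from) are assumptions made for illustration.

```javascript
// Hypothetical sketch: `readToken( text, offset )` is assumed to return
// { token, nextOffset }. The loop keeps the stack depth constant no matter
// how long the input is.
function tokenizeIteratively( text, readToken ) {
	const tokens = [];
	let offset = 0;

	while ( offset < text.length ) {
		const { token, nextOffset } = readToken( text, offset );

		if ( nextOffset <= offset ) {
			// Guard against a reader that makes no progress.
			throw new Error( `Tokenizer stalled at offset ${ offset }` );
		}

		tokens.push( token );
		offset = nextOffset;
	}

	return tokens;
}

// Toy reader used only for this example: every character is one token.
const readChar = ( text, offset ) => ( { token: text[ offset ], nextOffset: offset + 1 } );

console.log( tokenizeIteratively( '{\\rtf1 foobar}', readChar ).length );
```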

Implement missing applyToModel methods

At least the token.Escape class still has no applyToModel method implemented.

For a brief moment I was thinking about softening the "virtual" implementation in the base Token interface, but it's actually better when it fails as soon as a dev forgets to implement one.
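
The "fail loudly" approach described above can be sketched as follows. The class names and the array used as a model are simplified stand-ins for illustration, not the library's real classes.

```javascript
// Simplified stand-ins for illustration; not the actual rtf-parse classes.
class Token {
	applyToModel( model ) {
		// "Virtual" method: subclasses are expected to override it.
		throw new Error( `${ this.constructor.name } does not implement applyToModel()` );
	}
}

class TextToken extends Token {
	constructor( value ) {
		super();
		this.value = value;
	}

	applyToModel( model ) {
		model.push( this.value );
	}
}

// Deliberately left without an override to show the failure.
class EscapeToken extends Token {}

const model = [];
new TextToken( 'foobar' ).applyToModel( model );

try {
	new EscapeToken().applyToModel( model );
} catch ( err ) {
	console.log( err.message );
}
```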

Provide an option to early return

A great addition would be the possibility to return early from parsing.

There are some cases for this:

You want to parse only up to the point where you find the interesting data; no further parsing is needed.
You want to parse just a part of the RTF and return as fast as possible.

I'll explain the second case:

For instance, I might have loaded a 30 MB RTF with a picture at its very end. I can play the smart guy and just find the {\pict position in the string, then start parsing from there all the way until I get to the matching GroupEnd, so that I have all the picture data loaded. I'm not interested in parsing the whole file, so once I have this information I want to abort parsing.
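
The second case can be approximated without the parser at all, by scanning from the marker to its matching closing brace. The sketch below is a plain string scan under stated assumptions: extractGroup is a hypothetical helper, it treats a backslash as escaping the next character, and it does not handle raw binary data (\bin), which may contain unbalanced braces.

```javascript
// Hypothetical helper, not part of rtf-parse. Returns the group that starts
// at the first occurrence of `marker` (which must begin with '{'), or null.
function extractGroup( rtf, marker ) {
	const start = rtf.indexOf( marker );

	if ( start === -1 ) {
		return null;
	}

	let depth = 0;

	for ( let i = start; i < rtf.length; i++ ) {
		const ch = rtf[ i ];

		if ( ch === '\\' ) {
			i++; // Skip the next character, so \{ and \} are not counted.
		} else if ( ch === '{' ) {
			depth++;
		} else if ( ch === '}' ) {
			depth--;

			if ( depth === 0 ) {
				// Matching group end found: stop scanning here.
				return rtf.slice( start, i + 1 );
			}
		}
	}

	return null; // Unbalanced input.
}

const rtf = '{\\rtf1 foo {\\pict\\pngblip 89504e47}{\\b bar}}';
console.log( extractGroup( rtf, '{\\pict' ) );
```

The extracted substring could then be handed to parseString, so only the picture group is parsed.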

Refactor tokens

It became clear that token classes need to be separate classes from the AST model types. So it makes perfect sense to create a dedicated namespace for tokens to keep things clear.

Implement Travis CI

Some Linux / OSX CI would be nice.

Better bmp handling

While #12 added initial handling for images, the support is not yet fully featured.

One thing I saw missing is support for BMP images.

To reproduce, just create an RTF using WordPad on Win10 and paste some image from the clipboard. Then use example/images.js to extract images from this file.

The produced BMPs contain some correct information, because a graphics app can open and display them; however, the image size is incorrect.

Methods for traversing model

Now that we have some fundamentals in place, we can focus on methods that will help us get the actual work done.

We need some convenient ways to work with models, e.g. parentModel.find( curModel => curModel.value !== 'foo' ).

It should also allow for nested search.

Actually it's possible that this event will be helpful for #5.
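
A possible shape for such a method, sketched as a standalone function. It assumes a model is a plain object with an optional children array; the deep flag for nested search is also an assumption, since the issue does not define the signature.

```javascript
// Illustrative sketch, not the library's API.
// Assumption: each model may have a `children` array of nested models.
function find( parent, predicate, deep = false ) {
	const results = [];
	// An explicit stack avoids recursion; reversing keeps document order.
	const stack = [ ...( parent.children || [] ) ].reverse();

	while ( stack.length ) {
		const current = stack.pop();

		if ( predicate( current ) ) {
			results.push( current );
		}

		if ( deep && current.children ) {
			for ( let i = current.children.length - 1; i >= 0; i-- ) {
				stack.push( current.children[ i ] );
			}
		}
	}

	return results;
}

const doc = {
	children: [
		{ value: 'foo' },
		{ value: 'group', children: [ { value: 'bar' }, { value: 'foo' } ] },
		{ value: 'baz' }
	]
};

console.log( find( doc, curModel => curModel.value !== 'foo' ).map( m => m.value ) );
console.log( find( doc, curModel => curModel.value !== 'foo', true ).map( m => m.value ) );
```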

Image handling

It's about time to add support for \pict commands.