rtf-parse's Introduction

rtf-parse

A simplified RTF parser.

Installation

$ npm install --save rtf-parse

Usage examples

const rtfParse = require( 'rtf-parse' ),
	path = require( 'path' );

rtfParse.parseFile( path.join( '_fixtures', 'rtfSimple.rtf' ) )
	.then( doc => {
		// Do anything you like with rtf.model.Document instance of your document.
	} );

To parse a string instead:

const rtfParse = require( 'rtf-parse' );

rtfParse.parseString( '{\\rtf1 foobar}' )
	.then( doc => {
		// Do anything you like with rtf.model.Document instance of your document.
	} );

You can find more usage examples in the examples directory.

You can also browse the tests to see how the API is used.

Contribute

This is a fully open source pet project. If you're in the mood for a pull request, you're more than welcome to open one!

Getting In Touch

You can always ping me on Twitter @m_lewand.

License

MIT © Marek Lewandowski

rtf-parse's Issues

Provide unified names for commands

Currently the command model operates on the "raw" command value picked up while parsing. This means the same command might appear as \rtf1 as well as \rtf1 followed by a space. Instead, the model should have a property such as name, where the value is unified to simply rtf1.
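
A minimal sketch of the unification described above. The helper name unifyCommandName is hypothetical and not part of the rtf-parse API; it assumes the raw value is the matched text including the leading backslash and an optional trailing delimiter space:

```javascript
// Hypothetical helper, not part of rtf-parse: derives a unified command name
// from the raw text matched while parsing.
function unifyCommandName( raw ) {
	// Drop the leading backslash and any trailing delimiter whitespace.
	return raw.replace( /^\\/, '' ).replace( /\s+$/, '' );
}

console.log( unifyCommandName( '\\rtf1' ) ); // rtf1
console.log( unifyCommandName( '\\rtf1 ' ) ); // rtf1
```

Keeping the raw value next to the unified name would still allow the original input to be reproduced if needed.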

Merge subsequent text entries

When creating the model it would be nice to join all the subsequent text models, even if they were split by a \r\n line separator.
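
One possible shape of such a merge, sketched on plain objects rather than the actual rtf-parse model classes. The type and value properties, and the function name mergeTextEntries, are assumptions made for illustration only. The sketch also assumes line separators have already been dropped, so split text arrives as adjacent text entries:

```javascript
// Hypothetical sketch: entries are plain objects, not rtf-parse model classes.
function mergeTextEntries( entries ) {
	const merged = [];

	for ( const entry of entries ) {
		const last = merged[ merged.length - 1 ];

		if ( entry.type === 'text' && last && last.type === 'text' ) {
			// Extend the previous text entry instead of adding a new one.
			last.value += entry.value;
		} else {
			// Copy the entry so the input array is left untouched.
			merged.push( Object.assign( {}, entry ) );
		}
	}

	return merged;
}

const result = mergeTextEntries( [
	{ type: 'text', value: 'foo' },
	{ type: 'text', value: 'bar' },
	{ type: 'command', value: '\\par' },
	{ type: 'text', value: 'baz' }
] );

console.log( result.map( entry => entry.value ).join( ' | ' ) ); // foobar | \par | baz
```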

Paragraph support

Paragraphs in RTF are somewhat funny, as they're enclosed between the \pard and \par commands. A paragraph is probably also enclosed in a group.

{\pard This is my fancy text.\par}

Error on building with webpack - Uglify Error

Hello, first of all, thanks for the great work.

I'm using this plugin in a web-based application, and at the end I'm trying to create a build with

"npm run build"

but I'm getting this error:
ERROR in 0.a9abdcd49066c2a9cfb7.chunk.js from UglifyJs
Unexpected token: name (Parser) [0.a9abdcd49066c2a9cfb7.chunk.js:96957,7]

The problem is in node_modules/rtf-parse/Parser.js: it is written in ES6, and UglifyJS can't handle ES6.

I'm using this webpack config, but it doesn't help.


module: {
    loaders: [{
      test: /\.js$/, // Transform all .js files required somewhere with Babel
      loader: 'babel-loader',
      exclude: /node_modules\/(?!(rtf\-parse)\/).*/,
      query: options.babelQuery,
    }, {

Could you please publish releases transpiled to ES5, or do you have an idea how I can fix this problem?

Tokenizer.process is called recursively

We need to refactor the mentioned method so that it is no longer called recursively. Because of the recursion, it currently throws a call stack size error when working with big RTF files.
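
The general idea of replacing the recursion with a loop could look roughly like this. It is only a sketch under assumptions: matchToken is a hypothetical stand-in that recognises just braces and text runs, and none of these names mirror the real Tokenizer internals:

```javascript
// Hypothetical stand-in for the real matching logic: recognises only group
// braces and runs of any other characters.
function matchToken( text, offset ) {
	const char = text[ offset ];

	if ( char === '{' || char === '}' ) {
		return { value: char, length: 1 };
	}

	// Anything else: consume up to the next brace as a single token.
	let end = offset;

	while ( end < text.length && text[ end ] !== '{' && text[ end ] !== '}' ) {
		end++;
	}

	return { value: text.slice( offset, end ), length: end - offset };
}

function processIteratively( text ) {
	const tokens = [];
	let offset = 0;

	// A loop keeps the stack depth constant regardless of the input size.
	while ( offset < text.length ) {
		const token = matchToken( text, offset );

		tokens.push( token.value );
		offset += token.length;
	}

	return tokens;
}

console.log( processIteratively( '{\\rtf1 foobar}' ).length ); // 3
```

Because each iteration only advances an offset, the number of tokens in the input no longer affects how deep the call stack gets.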

Implement missing applyToModel methods

At least the token.Escape class still has no applyToModel method implemented.

For a brief moment I was thinking about softening the "virtual" implementation in the base Token interface, but it's actually better when it fails loudly as soon as a developer forgets to implement one.

Provide an option to early return

A great addition would be the ability to return early from parsing.

There are some cases for this:

  • You might simply want to parse up to the point where you find the interesting data; no further parsing is needed.
  • You want to parse just a part of the RTF, and return as fast as possible.

I'll explain the second case:

As an example, I might have loaded a 30 MB RTF file with a picture at its very end. I can be smart and just find the {\pict position in the string, then start parsing from there until I reach the matching GroupEnd, so that all the picture data is loaded. I'm not interested in parsing the whole file, so once I have this information I want to abort parsing.
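
Until such an option exists, the second case can be approximated outside the parser by cutting the picture group out of the string first. The following is a sketch only: extractPictGroup is a hypothetical helper, not part of rtf-parse, and it assumes the picture data is hex-encoded, since raw binary data could contain unescaped braces:

```javascript
// Hypothetical helper, not part of rtf-parse: returns the substring holding the
// first {\pict ...} group, or null if there is none or it is unbalanced.
function extractPictGroup( rtf ) {
	const start = rtf.indexOf( '{\\pict' );

	if ( start === -1 ) {
		return null;
	}

	let depth = 0;

	for ( let i = start; i < rtf.length; i++ ) {
		const char = rtf[ i ];

		if ( char === '\\' ) {
			// Skip the character after a backslash, so \{ and \} are not counted.
			i++;
		} else if ( char === '{' ) {
			depth++;
		} else if ( char === '}' ) {
			depth--;

			if ( depth === 0 ) {
				// Matching group end found, no need to scan any further.
				return rtf.slice( start, i + 1 );
			}
		}
	}

	return null;
}

console.log( extractPictGroup( '{\\rtf1 foo{\\pict{\\*\\picprop}ffd8}bar}' ) );
// {\pict{\*\picprop}ffd8}
```

Whether parseString accepts such a fragment on its own is not confirmed here; the sketch only shows how the interesting span can be located without processing the rest of the file.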

Refactor tokens

It became clear that token classes need to be separate from the AST model types. So it makes perfect sense to create a dedicated namespace for tokens to keep things clear.

Better bmp handling

While #12 added initial handling for images, the support is not yet fully featured.

One thing I saw missing is support for bmp images.

To reproduce, just create an RTF file using WordPad on Win10 and paste some image from the clipboard. Then use example/images.js to extract images from this file.

The produced bmps contain some correct information, because a graphics app can open and display them; however, the image size is incorrect.

Methods for traversing model

Now that we have some fundamentals in place, we can focus on methods that will help us get the actual work done.

We need some convenient ways to work with models, e.g. parentModel.find( curModel => curModel.value !== 'foo' ).

It should also allow for nested search.
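
A rough sketch of what such a lookup might do, written as a standalone function over plain objects. The children property and the nested flag are assumptions made for illustration; the real model classes may be shaped differently:

```javascript
// Hypothetical sketch on plain objects; real rtf-parse models may differ.
function find( parent, predicate, nested = true, matches = [] ) {
	for ( const child of parent.children || [] ) {
		if ( predicate( child ) ) {
			matches.push( child );
		}

		if ( nested ) {
			// Descend into the child, collecting into the same array.
			find( child, predicate, true, matches );
		}
	}

	return matches;
}

const doc = {
	value: 'doc',
	children: [
		{ value: 'foo' },
		{ value: 'group', children: [ { value: 'bar' }, { value: 'foo' } ] }
	]
};

console.log( find( doc, model => model.value !== 'foo' ).map( model => model.value ).join( ', ' ) );
// group, bar
console.log( find( doc, model => model.value !== 'foo', false ).map( model => model.value ).join( ', ' ) );
// group
```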

Implement RTF model

Now that the PoC for token parsing is ready, we need to create an RTF model based on the tokens. This is what people using the lib will actually work with.

In this project, the RTF model would mean pretty much the same as what the DOM means for HTML.

Expose model classes

All the Model subclasses should be exposed so that they can be used conveniently with the Model.getChildren() method and so on.

Event for model instances

We need an event to be fired when a model entry is added.

Currently we have only the Tokenizer.matched event, which is triggered for Token instances. We need the same thing for models.

Actually, it's possible that this event will be helpful for #5.
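
As an illustration of the idea, here is a sketch built on Node's EventEmitter. Both the ModelBuilder class and the modelAdded event name are hypothetical and do not come from rtf-parse:

```javascript
const EventEmitter = require( 'events' );

// Hypothetical sketch: neither ModelBuilder nor the 'modelAdded' event name
// comes from rtf-parse.
class ModelBuilder extends EventEmitter {
	constructor() {
		super();
		this.entries = [];
	}

	add( model ) {
		this.entries.push( model );
		// Notify listeners as soon as the entry becomes part of the model.
		this.emit( 'modelAdded', model );
	}
}

const builder = new ModelBuilder();

builder.on( 'modelAdded', model => {
	console.log( 'added:', model.value );
} );

builder.add( { value: 'foobar' } ); // added: foobar
```

A listener like this sees every entry at the moment it is added, without having to wait for the whole document to be built.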
