
masala-parser's People

Contributors

d-plaindoux, dependabot[bot], domdomegg, lndamaral, ltearno, nicolas-zozol, scamden, simon-zozol, thecrypticace


masala-parser's Issues

Thoughts about Bennu?

I started writing my own parser combinators library until I came across this project and Bennu. Bennu hasn't been worked on in a while, but it seems quite mature. It's really lightweight and has all the bells and whistles I could imagine.

Any reason you decided to build a library of your own rather than using Bennu? Just curious, what are you guys using this library for?

Parser Extension: p.sequence(x1,x2,x3)=>[X1,X2,X3]

The goal is to easily write these frequently seen structures:

2+2

P.sequence(this.number(), '+', this.number())

will result in an array: [2, '+', 2]

So we can write:

P.sequence(this.number(), '+', this.number()).map(values => values[0] + values[2]);
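A minimal sketch of how such an n-ary sequence could collect its values, using a toy functional parser model (plain functions, not masala-parser's actual Parser class):

```javascript
// Toy parser model: a parser is a function (input, index) => {value, index} or null.
// This only sketches the proposed behaviour; masala-parser's real API differs.
const char = (c) => (input, index) =>
    input[index] === c ? { value: c, index: index + 1 } : null;

const digit = () => (input, index) =>
    /\d/.test(input[index]) ? { value: Number(input[index]), index: index + 1 } : null;

// sequence(p1, p2, ...) runs each parser in order and collects values in a flat array
const sequence = (...parsers) => (input, index) => {
    const values = [];
    for (const p of parsers) {
        const result = p(input, index);
        if (result === null) return null;   // one failure fails the whole sequence
        values.push(result.value);
        index = result.index;
    }
    return { value: values, index };
};

const addition = sequence(digit(), char('+'), digit());
const parsed = addition('2+2', 0);
console.log(parsed.value);                       // [2, '+', 2]
console.log(parsed.value[0] + parsed.value[2]);  // 4
```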

Brainfuck parser

Write the parser of a Kiss compiler. The created context must reference variables and scopes. If this stupid language has none, use ultra-basic JavaScript.

Overloading

We often use combinator.parse(Stream.of(document)). We could add a combinator.parseString(document) function, or modify the parse(doc | stream) function to test whether the argument is a String or a Stream.

The question is the same for the extractor. Should we create textUntil(string | combinator), or create textUntil(combinator) and textUntilString(string)?
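A sketch of the runtime-dispatch option, with a stand-in StringStream class (the names here are assumptions for illustration, not the library's confirmed API):

```javascript
// Hypothetical stand-in for the library's stream type.
class StringStream {
    constructor(source) { this.source = source; }
}

// One possible shape for parse(doc | stream): test the argument's type at runtime
// and wrap plain strings into a stream before parsing.
function parse(input, index = 0) {
    const stream = typeof input === 'string' ? new StringStream(input) : input;
    return { streamSource: stream.source, index };  // placeholder for real parsing
}

console.log(parse('hello').streamSource);                 // 'hello'
console.log(parse(new StringStream('hi')).streamSource);  // 'hi'
```

The same dispatch pattern would apply to textUntil(string | combinator); the trade-off is one polymorphic entry point versus two explicitly named ones.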

Build automatically only master on dev

It's mainly to avoid emails when we push on WIP branches. I don't know what the best practice is, but I'm pretty sure we don't need emails at midnight to say the build is broken.

Parser Extension: thenSpread

The objective is to quickly combine a single element a with multiple elements b.

In Parser class :

thenSpread(p) {
    return this.flatmap((a) => p.map((b) => [a, ...b]));
}

Real example: we have a paragraph, then a succession of others.

function paragraphs() {
    return P.try(paragraph().thenSpread(followingParagraph().rep()));
}

It will return :

parsing Accept {
  offset: 90,
  consumed: true,
  value: 
   [ { paragraph: [Object] },
     { paragraph: [Object] },
     { paragraph: [Object] } ],
  input: 
   StringStream {
     source: 'Lorem ipsum is a *first* paragraph\nSecond line\n\nThe second paragraph\n\nThe Third paragraph\n' } }

The counterpart, `return this.flatmap((a) => p.map((b) => [...a, b]));`, might also be interesting in some cases.

Extractor features

I will open a branch for a text Extractor in the standard/ directory. It comes after my "real world mission". The main features will be:

  • stringIn(['John', 'Jack']): like charIn, but with an array of strings
  • charNumber: searches for numbers, but returns a string and keeps leading zeros; useful for dates or phone numbers (0502)
  • simpleWords(separators): grabs words, defined as letters separated by separators. Separators default to [' ', '\n']
  • textUntil(stop, including:bool = false): eats text until stop. Stop can be a combinator or a text. Set the including flag to true to also eat the stop
  • looseDate(): a very loose search for a date. A real ISO-formatted date parser would be a full-time project; giving a simple reusable example is a better idea

"We are the 23-09-2013 at noon.": textUntil(looseDate()) will return "We are the " and the offset is placed at the beginning of the date. textUntil(looseDate(), true) would return "We are the 23-09-2013".
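The behaviour for a plain-string stop can be sketched directly on strings (textUntilString is a hypothetical helper name, used here only to illustrate the semantics):

```javascript
// Plain-string sketch of the proposed textUntil semantics; combinator stops
// are out of scope here.
function textUntilString(text, stop, including = false) {
    const at = text.indexOf(stop);
    if (at === -1) return null;                 // stop not found: reject
    const end = including ? at + stop.length : at;
    return { value: text.slice(0, end), offset: end };
}

const source = 'We are the 23-09-2013 at noon.';
console.log(textUntilString(source, '23-09-2013').value);        // 'We are the '
console.log(textUntilString(source, '23-09-2013', true).value);  // 'We are the 23-09-2013'
```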

Setting up build for using ES6

Hello @d-plaindoux
I have set up ES6 with grunt-babel, and jshint does its job correctly. Grunt needs to stay at the old 0.4 version because of the grunt-coverage dependency.

At the end babel produces a dist/app.js ES5 file with sourcemap. Options are defined in the Gruntfile.

When running grunt --gruntfile Gruntfile_Coverage.js, I get no error, but I see no coverage information and no lib-cov/report directory. What is it supposed to do?

If it's ok, I will ES6 everything as soon as possible.

How to select up to a given sequence?

What is the proper way to select everything up to a given sequence?

Let's say that I have this source : "wordSTOPwordagainSTOP"
And I want my value to be : ["word","wordagain"]

I have written:

function stop(){
    return P.string('STOP').rep();
}
function detectStop(){
    return P.try(stop().or(P.letters())).rep();
}
function parseText( line, offset=0){
    return detectStop().parse(stream.ofString(line), offset)
}

So once the parser enters letters(), it does not come back to check whether there is a STOP, which nevertheless has higher priority. How would I do that?

====

But suppose that the separator is now -stop-, which contains not only letters:

function stop(){
    return P.string('-stop-').rep();
}

If we test test-stop-test-stop-, then my parser works.

It is easy, because the parser detects the special character '-'. But we don't always want to introduce a special character.
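Leaving combinators aside, the expected values can be checked with a plain split; this only illustrates what the parser should ultimately produce, not how to fix the combinator priority:

```javascript
// Split the source on the separator and drop empty tokens: these are the
// values the parser is expected to yield.
function tokensBetween(source, separator) {
    return source.split(separator).filter((token) => token.length > 0);
}

console.log(tokensBetween('wordSTOPwordagainSTOP', 'STOP'));   // ['word', 'wordagain']
console.log(tokensBetween('test-stop-test-stop-', '-stop-'));  // ['test', 'test']
```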

====

Add a debug function for Parser so that we understand what's going on

I propose a debug function used like this:

function paragraph() {
    // if we have found a line, then we will enter in debug
    return line().debug('found a line').thenLeft(eol.opt()).map(paragraphText);
}

Let's take a line of Markdown as input. The console output is:

[debug] : found a line [ { text: 'Lorem ' }, { bold: { text: 'ipsum' } }...]

Because debug() is placed after line(), the parser enters debug mode. If there were no line, it would not.
I will send a PR with an example of code. If you have more ideas, they are welcome.

letter is only for US ascii letters

The P.letter parser will not accept accents like é or any other foreign UTF-8 characters. There is a trick that is sufficient for the moment:

var firstLetter = name.charAt(0).toUpperCase();
if( firstLetter.toLowerCase() != firstLetter) {
    // it's a letter
}
else {
    // it's a symbol
}

The first solution is to rename it P.asciiLetter. The second is to use the trick, but that will make it noticeably slower. The third is to redefine the method and use a flag, as in P.letter(onlyAscii=true).
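A fourth option worth noting: since ES2018, JavaScript regexes support Unicode property escapes, which classify letters directly:

```javascript
// \p{L} matches any Unicode letter; the u flag enables property escapes (ES2018).
const isLetter = (ch) => /\p{L}/u.test(ch);

console.log(isLetter('e'));  // true
console.log(isLetter('é'));  // true
console.log(isLetter('5'));  // false
console.log(isLetter('-'));  // false
```

Unlike the case-flipping trick, this also handles caseless scripts (e.g. CJK characters), where toUpperCase() and toLowerCase() return the same string and the trick would classify a letter as a symbol.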

Test `optrep` does not match its message

The question is: when you use optrep(), is it OK to have zero elements? I think so, according to the passing tests.

'expect (optrep) to accepted': function(test) {

    test.deepEqual(parser.char("a").optrep().parse(stream.ofString("a"), 0).isAccepted(),
                   true,
                   'should be accepted.');
},

'expect (optrep) none to accepted': function(test) {

    test.deepEqual(parser.char("a").optrep().parse(stream.ofString("b"), 0).isAccepted(),
                   true,
                   'should be rejected.');  // <===== HERE: did you mean 'should be accepted'?
},

I'm not fond of this semantics: in a real language, you repeat when you have at least two elements. optrep() should be OK with one element or more, but not with zero elements.

So we need to build something else for testing a real repetition (at least two elements). Something like:

    P.try(anyTitle()).blankLine().blankLine().optrep().paragraph()  // at least one blankline

Optimise Buffered stream

A stream can be buffered. For this purpose, an entropic cache is built. When a parser is accepted, a cut mechanism can be applied in order to flush the cache and optimise memory usage.

Operation Parser

Creating a simple operation parser mainly for educational purposes

Release 0.2 and integration

  • Make sure one can use parsec minified
  • Make sure one can access the standard elements
  • Make the source code visible, mainly for examples

Creating an extensible LineParser class

I want to create a class LineParser :

export default class LineParser {  // ideally it could extend Parser or ParserHelper directly

    textValue(chars) {
        return { text: chars.join('').trim() };
    }

    text(separator) {
        if (separator) {
            // vvvv 'this' is null inside this.textValue below
            return P.not(eol.or(P.string(separator))).optrep().map(this.textValue);
        } else { /* .... */ }
    }
}

The problem is that this function is extracted from the class by this code:

// (('b -> Parser 'a 'c) * 'b) -> Parser 'a 'c
function lazy(p, parameters) {
    // vvvv 'this' will always be null :(
    return new Parser((input, index=0) => p.apply(null, parameters).parse(input, index));
}

Do you have any clue how to avoid this p.apply(null)?
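One possible way out, sketched with plain functions rather than the real Parser class: let lazy accept an optional self context and pass it to apply instead of null. The self parameter here is an assumption for illustration, not the library's confirmed signature:

```javascript
// Minimal demonstration of the problem and one fix.
class LineParser {
    textValue(chars) { return { text: chars.join('').trim() }; }
    describe() { return this.textValue(['a', 'b', ' ']); }
}

// Toy stand-in for lazy: accept an explicit context instead of hard-coding null.
function lazy(p, parameters = [], self = null) {
    return () => p.apply(self, parameters);
}

const helper = new LineParser();
const broken = lazy(helper.describe);            // calling broken() would throw: `this` is null
const fixed = lazy(helper.describe, [], helper); // `this` is restored to the instance
console.log(fixed().text);                       // 'ab'
```

Alternatively, the caller can bind the method up front (lazy(helper.describe.bind(helper))) and leave lazy unchanged.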

FlattenDeep

This method is unused. I think it is useless, since map and flatmap provide the best approach for data transformation.

Code simplification

In the markdown token.js, the following function:

function fourSpacesBlock() {
    return P.char('\t').or(P.try(P.charIn(' \u00A0').then(P.charIn(' \u00A0'))
        .then(P.charIn(' \u00A0')).then(P.charIn(' \u00A0'))));
}

can be simplified and replaced by:

function fourSpacesBlock() {
    return P.char('\t').or(P.charIn(' \u00A0').occurrence(4));
}

Infix operators using Sweet

Expressiveness can be increased using infix operators.

p1 <*> p2    // == p1.then(p2)
p1 <|> p2    // == p1.or(p2)
p1 >>= p2    // == p1.flatmap(p2)
p1 || p2     // == p1.chain(p2)

This can be achieved using Sweet.JS meta language.

operator <*> left 1 = (left, right) => #`${left}.then(${right})`;
operator <|> left 1 = (left, right) => #`${left}.or(${right})`;
operator >>= left 1 = (left, right) => #`${left}.flatmap(${right})`;
operator || left 1  = (left, right) => #`${left}.chain(${right})`;

Performance issue

Since v0.3, hotelhub automated tests are 10x slower. The main differences are that their custom code has been wrapped into the Extractor bundle, and that we have split the code into bundles.

Accept compareTo function in stream.substreamAt

Here is the current implementation of subStreamAt:

// Stream 'a => [Comparable 'a] -> number -> boolean
subStreamAt(s, index){
    for (var i = 0; i < s.length; i++) {
        var value = this.get(i + index);
        if (!value.isSuccess() || value.success() !== s[i]) { // <=== compareTo
            return false;
        }
    }
    return true;
}

Suppose we want to create P.stringIgnoreCase("john doe"). We could pass a compareTo function to subStreamAt(s, index, [compareTo]).

if (!value.isSuccess() || !compareTo(value.success(), s[i])) {
    return false;
}

Is it a good idea?
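The proposed comparator can be sketched with a plain array standing in for the stream (subArrayAt and the comparator names are illustrative, not the library's API):

```javascript
// Default comparator: strict equality, matching today's behaviour.
const defaultCompare = (a, b) => a === b;
// Comparator for the stringIgnoreCase use case.
const ignoreCase = (a, b) => a.toLowerCase() === b.toLowerCase();

// Does sequence `s` occur in `source` at `index`, under the given comparator?
function subArrayAt(source, s, index, compareTo = defaultCompare) {
    for (let i = 0; i < s.length; i++) {
        const value = source[index + i];
        if (value === undefined || !compareTo(value, s[i])) return false;
    }
    return true;
}

const source = [...'John Doe'];
console.log(subArrayAt(source, [...'john'], 0));              // false
console.log(subArrayAt(source, [...'john'], 0, ignoreCase));  // true
```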

F.first, F.last

It's a mapping function to pick the first or last element of an array:

var helloParser = C.string("Hello")
                    .then(C.char(' ').rep())
                    .then(C.char("'"))
                    .thenRight(C.letter.rep()) // keeping repeated ascii letters
                    .thenLeft(C.char("'"))     // keeping previous letters
                    .map(F.last);              // keep last letter
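A possible implementation of the two helpers (the names follow this issue; they may not match what the F module eventually ships):

```javascript
// first/last as plain mapping functions over the parser's value array.
const first = (values) => values[0];
const last = (values) => values[values.length - 1];

console.log(first(['a', 'b', 'c']));  // 'a'
console.log(last(['a', 'b', 'c']));   // 'c'
```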

Use kebab-case for file names

Underscores or upperCase in file names look weird in JavaScript. We are used to kebab-case such as
line-parser.js, mostly because of Windows/Linux file compatibility.

poll

Improving build system

  • Allow use of rm -rf or mkdir tasks on Windows, using rimraf
  • Copy sample files easily

Standard reorganisation

The standard directory contains a json parser, a naive markdown parser (obsolete), and a markdown parser. This should be reorganised with sub-directories, i.e. one per parser.

Mix of tabs and spaces for bullets

This test passes. I think it should not.

'test bullet niveau 2': function (test) {
    const line = "\t  * This is another lvl2 bullet \n  ";
    testLine(line);
    test.deepEqual({bullet: {level: 2, content: [{text: 'This is another lvl2 bullet '}]}}, value, 'probleme test:test bullet Lvl2');
    test.done();
},

RxJS and similar streams: `parser.ofRx()`

It should be quite easy to create a stream from RxJS, and as it is a de facto standard, it should be well accepted.
A bit more complex is allowing a Response to be sent inside another Rx stream.

1.0 Roadmap discussion

Suppose there is no bug. What feature do we need for 1.0 ?

  • Kiss compiler
  • Clear separation of concerns:
    • Use of Bundles #52
    • isolated libs in standard
    • examples in examples
  • Good naming
  • Dealing with internationalization: #51
  • Compatible with Fantasy Land library for monadic support
  • Build a pattern matching library on top of parsec
  • Binary stream decoder (Scala Scodec)
  • Data Marshaller based on binary stream decoder

Any ideas ?

Add an easy-to-start parseString function

Instead of

var parsec = require('parser-combinator');
var S = require('parser-combinator').stream;
const document = "Hello World in 2017";
const stream = S.ofString(document);
var P = parsec.parser;

We could just write parser.parseString(document); or something like that.
To be defined ...
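A sketch of the shortcut with stubs in place of the real library, just to show the shape of the wrapper (the stub stream and parse result are assumptions, not the actual implementation):

```javascript
// Stub stream factory standing in for the library's S.ofString.
const S = { ofString: (text) => ({ source: text, get: (i) => text[i] }) };

const parser = {
    // Stub parse: the real one would run the combinators over the stream.
    parse(stream, index = 0) { return { input: stream, offset: index }; },
    // The proposed shortcut: fold S.ofString into the entry point.
    parseString(text, index = 0) { return this.parse(S.ofString(text), index); },
};

const response = parser.parseString('Hello World in 2017');
console.log(response.input.source);  // 'Hello World in 2017'
```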

Parser Extension: flattenDeep

The goal is to easily parse:

!image: duck.png

Using something like :

P.char('!').then(text()).then(P.char(':')).thenLeft(spaces()).then(text())

We now have a strange soup of nested arrays. Using:

P.char('!').then(text()).then(P.char(':')).thenLeft(spaces()).then(text()).flattenDeep()

We now have a flat array : ['!','image', ':', 'duck']
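The transformation described here can be written as a small recursive reduce over nested arrays (a standalone sketch, not the library's implementation):

```javascript
// Recursively flatten arbitrarily nested arrays into one flat array.
const flattenDeep = (values) =>
    values.reduce(
        (acc, v) => acc.concat(Array.isArray(v) ? flattenDeep(v) : v),
        []
    );

console.log(flattenDeep([[['!', 'image'], ':'], 'duck']));  // ['!', 'image', ':', 'duck']
```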

Windows Support for eol

Windows uses \r\n for line feeds. There are a lot of \n tests, especially in Markdown, so we should check whether \r\n also works.

Example: markdown/bullet-parser.js

function bulletLv1(){
    // TODO: check if T.eol is better on Windows
    return C.char('\n').optrep();
}
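An end-of-line predicate that accepts both conventions can be sketched with plain character tests (outside the combinator API):

```javascript
// True if an end-of-line starts at `index`: '\n' (Unix) or '\r\n' (Windows).
const isEol = (text, index) =>
    text[index] === '\n' || (text[index] === '\r' && text[index + 1] === '\n');

console.log(isEol('a\nb', 1));    // true  (Unix)
console.log(isEol('a\r\nb', 1));  // true  (Windows)
console.log(isEol('ab', 1));      // false
```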

Choose Token place

There is a Token class, some tokens (?) in Parser such as P.letter, and a standard/token file. Maybe the term token is inappropriate for an email.

Integration tests

Making a two-step integration:

  • prepublishing: npm run prepublish will package parser-combinator.js to /pre-integration and verify that it can be used
  • postpublishing: npm run integration will download from npm and check that it's OK to be used

Rename thenLeft and thenRight

I think it's not easy for beginners to understand.

We could change x.thenLeft(y) by x.thenSkip(y) and x.thenRight(y) by x.thenKeep(y);

What do you think ?
