
clarinet's Introduction

clarinet


clarinet is a sax-like streaming parser for JSON. it works in the browser and node.js. clarinet is inspired by (and forked from) sax-js. just like you shouldn't use sax when you need dom, you shouldn't use clarinet when you need JSON.parse. for a more detailed introduction and a performance study please refer to this article.

design goals

clarinet is very much like yajl but written in javascript:

  • written in javascript
  • portable
  • robust (~110 tests pass before even announcing the project)
  • data representation independent
  • fast
  • generates verbose, useful error messages including context of where the error occurs in the input text.
  • can parse json data off a stream, incrementally
  • simple to use
  • tiny

motivation

the reason behind this work was to create better full text support in node. creating indexes out of large (or many) json files doesn't require a full understanding of the json file, but it does require something like clarinet.

installation

node.js

  1. install npm
  2. npm install clarinet
  3. var clarinet = require('clarinet');

browser

  1. minify clarinet.js
  2. load it into your webpage
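
for example (a sketch, assuming the build registers a global clarinet object when CommonJS exports are absent; some setups instead define a var exports = {} shim, as in the issue reports further down the page):

<script src="clarinet.js"></script>
<script>
  var parser = clarinet.parser();
  parser.onvalue = function (v) { console.log("value:", v); };
  parser.write('{"foo": "bar"}').close();
</script>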

usage

basics

var clarinet = require("clarinet")
  , parser = clarinet.parser()
  ;

parser.onerror = function (e) {
  // an error happened. e is the error.
};
parser.onvalue = function (v) {
  // got some value.  v is the value. can be string, double, bool, or null.
};
parser.onopenobject = function (key) {
  // opened an object. key is the first key.
};
parser.onkey = function (key) {
  // got a subsequent key in an object.
};
parser.oncloseobject = function () {
  // closed an object.
};
parser.onopenarray = function () {
  // opened an array.
};
parser.onclosearray = function () {
  // closed an array.
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('{"foo": "bar"}').close();
// stream usage
// takes the same options as the parser
var stream = require("clarinet").createStream(options);
stream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e)
  // clear the error
  this._parser.error = null
  this._parser.resume()
})
stream.on("openobject", function (node) {
  // same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
var fs = require("fs");
fs.createReadStream("file.json")
  .pipe(stream)
  .pipe(fs.createWriteStream("file-altered.json"));

arguments

pass the following arguments to the parser function. all are optional.

opt - object bag of settings regarding string formatting. all default to false.

settings supported:

  • trim - boolean. whether or not to trim text values.
  • normalize - boolean. if true, then turn any whitespace into a single space.
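
for example, both settings can be enabled at once (a sketch; the same options bag is accepted by createStream):

var clarinet = require("clarinet")
  , parser = clarinet.parser({ trim: true, normalize: true })
  ;
// string values now arrive trimmed, with runs of whitespace
// collapsed to single spaces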

methods

write - write bytes onto the stream. you don't have to do this all at once. you can keep writing as much as you want.

close - close the stream. once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.

resume - to gracefully handle errors, assign a listener to the error event. then, when the error is taken care of, you can call resume to continue parsing. otherwise, the parser will not continue while in an error state.
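
for example, a sketch of graceful error handling with resume on the raw parser:

var parser = require("clarinet").parser();

parser.onerror = function (e) {
  // inspect the error, then clear the error state so parsing can continue
  console.error("recovered from:", e.message);
  parser.resume();
};

parser.write('{"a": 1} stray garbage {"b": 2}');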

members

at all times, the parser object will have the following members:

line, column, position - indications of the position in the json document where the parser currently is looking.

closed - boolean indicating whether or not the parser can be written to. if it's true, then wait for the ready event to write again.

opt - any options passed into the constructor.

and a bunch of other stuff that you probably shouldn't touch.

events

all events emit with a single argument. to listen to an event, assign a function to on<eventname>. functions get executed in the this-context of the parser object. the list of supported events is also in the exported EVENTS array.

when using the stream interface, assign handlers using the EventEmitter on function in the normal fashion.
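
for example, a generic logger can be attached to every supported event via the EVENTS array (a sketch):

var clarinet = require("clarinet")
  , parser = clarinet.parser()
  ;

clarinet.EVENTS.forEach(function (ev) {
  parser["on" + ev] = function (arg) {
    // every event emits with at most a single argument
    console.log(ev, arg);
  };
});

parser.write('[1, {"a": true}]').close();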

error - indication that something bad happened. the error will be hanging out on parser.error, and must be deleted before parsing can continue. by listening to this event, you can keep an eye on that kind of stuff. note: this happens much more in strict mode. argument: instance of Error.

value - a json value. argument: value, can be a boolean, null, string, or number.

openobject - object was opened. argument: key, a string with the first key of the object (if any)

key - an object key. argument: key, a string with the current key. not called for the first key (use openobject for that).

closeobject - indication that an object was closed

openarray - indication that an array was opened

closearray - indication that an array was closed

end - indication that the closed stream has ended.

ready - indication that the stream has reset, and is ready to be written to.

samples

some samples are available to help you get started. one that creates a list of top npm contributors, and another that gets a bunch of data from twitter and generates valid json.

roadmap

check issues

contribute

everyone is welcome to contribute: patches, bug-fixes, new features.

  1. create an issue so the community can comment on your idea
  2. fork clarinet
  3. create a new branch git checkout -b my_branch
  4. create tests for the changes you made
  5. make sure you pass both existing and newly inserted tests
  6. commit your changes
  7. push to your branch git push origin my_branch
  8. create a pull request

helpful tips:

check index.html. there are two env vars you can set, CRECORD and CDEBUG.

  • CRECORD allows you to record the event sequence from a new json test so you don't have to write everything.
  • CDEBUG can be set to info or debug. info will console.log all emits, debug will console.log what happens to each char.

in test/clarinet.js there are two lines you might want to change. line 8 is where seps is defined: if you are isolating a test you probably just want to run one sep, so change that array to [undefined]. line 718, which reads for (var key in docs) {, is where you choose which docs to run; e.g. to run only foobar, change it to for (var key in {foobar:''}) {.

meta

(oO)--',- in caos


clarinet's Issues

Not emitting first key

When running simple.html, I never see the onkey event emitted for the first key (firstName).

This is my output:

Value: John simple.html:11
Key: lastName simple.html:14
Value: Smith simple.html:11
Key: age simple.html:14
Value: 25 simple.html:11
Key: address simple.html:14
Value: 21 2nd Street simple.html:11
Key: city simple.html:14
Value: New York simple.html:11
Key: state simple.html:14
Value: NY simple.html:11
Key: postalCode simple.html:14
Value: 10021 simple.html:11
Key: phoneNumber simple.html:14
Value: home simple.html:11
Key: number simple.html:14
Value: 212 555-1234 simple.html:11
Value: fax simple.html:11
Key: number simple.html:14
Value: 646 555-4567 

Thanks for your patience.

Big files on npm

Hello!

Thanks for the awesome project.
I am currently using your repository (as a dependency of one of my dependencies) in a project of mine, and am packaging it for multiple platforms. However, the clarinet folder is 36MB on disk and it all gets automatically packaged! Would you perhaps consider .npmignoreing test/* and samples/*? They aren't needed to use the library, so it doesn't make sense to require all devs to download them to their environments, and, in my case, my users to download them in a binary.

Off by one character

Using what I hoped was the simplest set-up, it seems to be parsing, but all of my keys and values are off by one character.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <title>Clarinet</title>
    <script type="text/javascript">var exports = {};</script>
    <script type="text/javascript" src="../lib/clarinet.js"></script>
    <script type="text/javascript">
    var parser = exports.parser();
    parser.onvalue = function (v) {
      console.log("Value: " + v);
    };
    parser.onkey = function (key) {
      console.log("Key: " + key);
    };
    parser.write('{ "firstName": "John", "lastName" : "Smith", "age" : 25, "address" : { "streetAddress": "21 2nd Street", "city" : "New York", "state" : "NY", "postalCode" : "10021" }, "phoneNumber": [ { "type" : "home", "number": "212 555-1234" }, { "type" : "fax", "number": "646 555-4567" } ] }').close();
    </script>
</head>
<body>
  Look at the console.
</body>
</html>

results in

Value: ohn
Key: astName
Value: mith
Key: ge
Value: 25
Key: ddress
Value: 1 2nd Street
Key: ity
Value: ew York
Key: tate
Value: Y
Key: ostalCode
Value: 0021
Key: honeNumber
Value: ome
Key: umber
Value: 12 555-1234
Value: ax
Key: umber
Value: 46 555-4567

This is happening in Chrome 17.0.963.79. Am I doing something dumb?

outdated benchmark dependencies cause CVEs

Hi,

the stuff in the benchmark folder is obviously installed with clarinet and the dependencies defined in benchmark/package.json are very old. So running a CVE scanner results in:

Vulnerability ID           Package                        Severity        Fix            CVE Refs              Vulnerability URL                                        Type          Feed Group        Package Path                                       
GHSA-jf85-cpcp-j695        lodash-4.17.11                 High            4.17.12        CVE-2019-10744        https://github.com/advisories/GHSA-jf85-cpcp-j695        npm           github:npm        /opt/magic_mirror/node_modules/clarinet/benchmark/node_modules/lodash/package.json
GHSA-p6mc-m468-83gw        lodash-4.17.11                 Low             4.17.19        CVE-2020-8203         https://github.com/advisories/GHSA-p6mc-m468-83gw        npm           github:npm        /opt/magic_mirror/node_modules/clarinet/benchmark/node_modules/lodash/package.json

Is there a chance to update these dependencies, or to avoid installing this stuff by using npm install --only=production?

Thanks,

Karsten.

Error: Max buffer length exceeded: textNode - What is the limit, can I raise it, and if not, what alternatives do I have?

I do have very large string values in my documents, so this is probably expected. Do I have any way of avoiding this? (I think some of the nodes could be > 64M. I know it sounds strange, but basically my JSON file is a database dump that could contain large text objects, or large binary objects that are HEXBINARY encoded.)

Error: Max buffer length exceeded: textNode
Line: 1
Column: 523
Char:
at error (C:\Development\YADAMU\Oracle\node\node_modules\clarinet\clarinet.js:324:10)
at checkBufferLength (C:\Development\YADAMU\Oracle\node\node_modules\clarinet\clarinet.js:108:13)
at CParser.write (C:\Development\YADAMU\Oracle\node\node_modules\clarinet\clarinet.js:650:7)
at CStream.write (C:\Development\YADAMU\Oracle\node\node_modules\clarinet\clarinet.js:253:20)
at RowParser._transform (C:\Development\YADAMU\Oracle\node\cImport.js:358:21)
at RowParser.Transform._read (_stream_transform.js:190:10)
at RowParser.Transform._write (_stream_transform.js:178:12)
at doWrite (_stream_writable.js:410:12)
at writeOrBuffer (_stream_writable.js:394:5)
at RowParser.Writable.write (_stream_writable.js:294:11)
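
Judging from the checkBufferLength frame in the trace, the limit is a module-level cap (64KB by default). A workaround sketch, treating the property name as an assumption against your installed version:

var clarinet = require('clarinet');

// assumption: like sax-js, clarinet exposes its buffer cap as a
// module-level tunable checked by checkBufferLength (default 64KB)
clarinet.MAX_BUFFER_LENGTH = 128 * 1024 * 1024; // 128MB; trades memory for headroom

var parser = clarinet.parser();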

Closure Compiler Advanced Optimizations

Hi! Thanks so much for Clarinet -- it's a great project!

We're including it as part of a build that is being minified with the Google Closure Compiler using the ADVANCED_OPTIMIZATIONS option. There are a few minor code changes I had to make to get it to work, and I have the minified version passing the unit tests.

I'm not sure if this particular use-case warrants a pull request, but I thought I'd share just in case it's useful for anyone else. It's in the closure-compiler-advanced-optimizations branch on my fork.

https://github.com/semmypurewal/clarinet/tree/closure-compiler-advanced-optimizations

Thanks again!

Parser Line value seems to be incorrect in certain edge cases

With larger, deeply nested JSONs that have been JSON.stringified (to prettify/autoformat the JSON), the parser line value appears to be off: it is incrementing even in situations where there is no \n newline.

I was able to solve this by doing parser.line -= 1 in parser.onclosearray and parser.oncloseobject, but this issue probably deserves addressing without a manual fix. A cursory review of the clarinet.js source file suggests that we are simply incrementing when we encounter a new LineFeed (i.e., \n), so the code doesn't look immediately wrong, and I'm perplexed by what could be causing this.

Alternatively, if I've grossly misused the library somehow, that would be great to know too!

As an example case, try printing out parser.line in every single event callback (i.e., parser.onopenobject and parser.onvalue) for this -- you will see some line number discrepancies if you don't do line adjustments.

{
	"a": {
		"b": 5
	},
	"c": {
		"d": {
			"e": 1
		},
		"f": [
			7,
			8,
			9
		],
		"g": 5
	}
}

Text parsing buffer length check

I'm getting "Max buffer length exceeded" errors while parsing large text nodes. Looking at the code, one line seems to indicate that the parser should be able to handle strings that are larger than MAX_BUFFER_LENGTH. But closeText() doesn't exist. In practice, it never gets called because the buffer is named "textNode" instead of "text".

I can work around the issue by increasing MAX_BUFFER_LENGTH, but being able to deal with arbitrary string sizes would be nicer, of course.

Streaming multi-byte UTF8 characters not being parsed correctly

Hey @dscape! I was using another streaming parser (jsonparse) when I found this class of bug, and checked clarinet to see if it was present, turns out it is. To save github some bytes of storage I'll link to the bug description at creationix/jsonparse.

In a nutshell, when you do .toString() on a streamed buffer, if the stream breaks between a multi-byte utf8 character, it will not be able to properly convert the split character, and it ends up putting two replacement characters in the stream instead. I haven't devised a specific solution for clarinet yet, but I know what needs to happen.

I also wrote a little demo repo so you can see the bug in action.

guess I didn't save those storage bytes after all.
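
A sketch of the usual fix for this class of bug: run incoming buffers through node's string_decoder, which withholds an incomplete multi-byte sequence until its remaining bytes arrive, and hand the parser strings instead of raw buffers:

var fs = require('fs');
var StringDecoder = require('string_decoder').StringDecoder;
var clarinet = require('clarinet');

var decoder = new StringDecoder('utf8');
var parser = clarinet.parser();

fs.createReadStream('file.json')
  .on('data', function (buf) {
    // decoder.write only emits complete characters; a multi-byte
    // sequence split across chunks is held back until it completes
    parser.write(decoder.write(buf));
  })
  .on('end', function () {
    parser.write(decoder.end()).close();
  });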

Strict mode mentioned but not documented

I see a mention to strict mode in the readme but I don't see any mention of it in the code. I have invalid JSON and the parser will just steam right through it and not throw any errors out.

The sample invalid JSON: [{"n"'ame":"Hello" }]

missing full text of license of original work

Hi,

the MIT license you're referring to in the LICENSE file states:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

So unless your fork is now 'not a substantial portion' of sax-js, you have to comply by simply including the full MIT license.
I'm aware sax-js has changed license, but if i understand what happened correctly, you forked it long before the change, so https://github.com/isaacs/sax-js/blob/715951fff7e7477f6b42aee68c65a5de851b723a/LICENSE applies.

Regards,
Jérémy.

Haven't been able to install the latest clarinet version from npm

This just started since the last merge; I pinned our dependency of clarinet to 0.7.2 until it's resolved.

Please let me know if this isn't a clarinet issue.

$ npm install clarinet
npm WARN package.json [email protected] No repository field.
npm http GET https://registry.npmjs.org/clarinet
npm http 304 https://registry.npmjs.org/clarinet
npm http GET https://registry.npmjs.org/clarinet/-/clarinet-0.7.3.tgz
npm http 404 https://registry.npmjs.org/clarinet/-/clarinet-0.7.3.tgz
npm ERR! fetch failed https://registry.npmjs.org/clarinet/-/clarinet-0.7.3.tgz
npm ERR! Error: 404 Not Found
npm ERR! at WriteStream. (/usr/local/lib/node_modules/npm/lib/utils/fetch.js:57:12)
npm ERR! at WriteStream.EventEmitter.emit (events.js:117:20)
npm ERR! at fs.js:1596:14
npm ERR! at /usr/local/lib/node_modules/npm/node_modules/graceful-fs/graceful-fs.js:103:5
npm ERR! at Object.oncomplete (fs.js:107:15)
npm ERR! If you need help, you may report this log at:
npm ERR! http://github.com/isaacs/npm/issues
npm ERR! or email it to:
npm ERR! [email protected]

npm ERR! System Linux 3.2.0-57-generic
npm ERR! command "node" "/usr/local/bin/npm" "install" "clarinet"
npm ERR! cwd /home/gio/morph/middle
npm ERR! node -v v0.10.6
npm ERR! npm -v 1.3.8
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR! /home/gio/morph/middle/npm-debug.log
npm ERR! not ok code 0

Somehow the old version no longer installs due to jade, and this version locks

When using yarn upgrade --latest clarinet the system never returns for me.
Not quite sure how to get past this.
But somehow my production build with clarinet 0.12.1 stopped building today, with the following error:

Step #2: error /run/lib-exec/node_modules/clarinet: Command failed.
Step #2: Exit code: 1
Step #2: Command: cd benchmark && npm i
Step #2: Arguments: 
Step #2: Directory: /run/lib-exec/node_modules/clarinet
Step #2: Output:
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN notice [SECURITY] growl has the following vulnerability: 1 critical. Go here for more details: https://nodesecurity.io/advisories?search=growl&version=1.5.1 - Run `npm i npm@latest -g` to upgrade your npm version, and then `npm audit` to get more info.
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
Step #2: 
Step #2: > [email protected] postinstall /run/lib-exec/node_modules/clarinet/benchmark/node_modules/clarinet
Step #2: > cd benchmark && npm i
Step #2: 
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
Step #2: 
Step #2: > [email protected] postinstall /run/lib-exec/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet
Step #2: > cd benchmark && npm i
Step #2: 
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
Step #2: 
Step #2: > [email protected] postinstall /run/lib-exec/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet
Step #2: > cd benchmark && npm i
Step #2: 
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
Step #2: 
Step #2: > [email protected] postinstall /run/lib-exec/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet
Step #2: > cd benchmark && npm i
Step #2: 
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
Step #2: 
Step #2: > [email protected] postinstall /run/lib-exec/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet/benchmark/node_modules/clarinet
Step #2: > cd benchmark && npm i
Step #2: 
Step #2: npm WARN deprecated [email protected]: Jade has been renamed to pug, please install the latest version of pug instead of jade
Step #2: npm WARN lifecycle The node binary used for scripts is /tmp/yarn--1546643216990-0.9583703550637648/node but npm is using /usr/local/bin/node itself. Use the `--scripts-prepend-node-path` option to 
.
.
.
.

Parser fails for non-object, non-array inputs

Test case

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8" />
    <title>Clarinet</title>
    <script type="text/javascript">var exports = {};</script>
    <script type="text/javascript" src="../clarinet.js"></script>
    <script type="text/javascript">
    var parser = exports.parser();
    parser.onvalue = function (v) {
      console.log("Value: " + v);
    };
    parser.onkey = function (key) {
      console.log("Key: " + key);
    };
    parser.write('1').close();
    </script>
</head>
<body>
  Look at the console.
</body>
</html>

Expected outcome

parser.onvalue is called

Actual outcome

Error: Non-whitespace before {[.
Line: 1
Column: 1
Char: 49

Discussion

JSON now allows 123 as a legal value. If clarinet only supports the older version of JSON where this was not legal, then please update the documentation.

First key in object handled differently (in `openobject` event rather than `key` event)

As previously reported with a documentation clarification (see #46).

It seems to me that API usability would be improved by eliminating this special case, such that openobject only signifies the start of an object and key events be raised exactly once for every key without exceptions, starting in a v1.0 release (simultaneously committing to full and meaningful semver compliance). However I would appreciate feedback from @dscape as well as current maintainers of other dependent public packages.

Alternatively it might be nice to offer some more detailed usage guidance including coverage of edge cases. Here's an excerpt from a module I wrote (using clarinet and converting its output to a top-level object/array stream) where I internally capture and normalize the current behavior:

function setKey(key) {
    // update current key
}
sax.onkey = setKey;

sax.onopenobject = function(key) {
    // track object creation

    if(key !== undefined) setKey(key);
}

stream interface should provide destroy() method

The stream interface example in README.md uses node's Stream.pipe to send data to clarinet. Stream.pipe assumes that the destination stream (in this case clarinet) implements a destroy() method, which it invokes in the event of an abrupt close of the source stream.

https://github.com/joyent/node/blob/master/lib/stream.js#L74

This will happen, for example, if we are piping from a socket and the socket is abruptly disconnected. Since clarinet's stream does not provide destroy(), this chain results in an error in the global scope such as the following:

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
TypeError: Object #<CStream> has no method 'destroy'
    at IncomingMessage.onclose (stream.js:98:10)
    at IncomingMessage.emit (events.js:88:20)
    at abortIncoming (http.js:1386:11)
    at Socket.<anonymous> (http.js:1459:5)
    at Socket.emit (events.js:88:20)
    at Array.0 (net.js:320:10)
    at EventEmitter._tickCallback (node.js:192:40)
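
A stopgap until CStream implements it: attach a minimal destroy() to the instance so Stream.pipe has something to call (a sketch; the exact teardown semantics are an assumption):

var clarinet = require('clarinet');
var stream = clarinet.createStream();

// Stream.pipe invokes dest.destroy() when the source closes abruptly;
// without it, the TypeError above escapes to the global scope
stream.destroy = function () {
  this.writable = false;
  this.emit('close');
};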

clarinet with watchFile/tail -f

Hi. Just started playing with clarinet today - really handy, thanks.

Is there a recommended method to use clarinet in a tail -f / fs.watchFile / node-tail mode, tracking an ever-growing file? Should this be more of a node-tail question about whether it presents a fs.createReadStream/pipe-like interface?

TIA.

Very large stringified JSON

I'm interested in parsing very large stringified JSON strings (up to 2GB). I have the string in memory, but the problem is that JSON.parse takes too long to do the parsing. So I'm looking at solutions like clarinet to parse progressively. What I'm wondering is whether clarinet blocks while parsing. Does it ever use setImmediate or yield or something like that, so that the node event loop can run something else and then come back to clarinet?
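
For what it's worth, clarinet's write() parses its argument synchronously, so it blocks for as long as the chunk you hand it; any yielding has to happen between writes. A sketch that feeds the in-memory string in slices and defers with setImmediate so the event loop can breathe:

var clarinet = require('clarinet');

function parseBig(str, parser, done) {
  var CHUNK = 1 << 20; // 1M characters per write; tune as needed
  var offset = 0;
  (function step() {
    if (offset >= str.length) {
      parser.close();
      return done();
    }
    parser.write(str.slice(offset, offset + CHUNK));
    offset += CHUNK;
    setImmediate(step); // yield to the event loop between slices
  })();
}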

Retrieving chunks of a value

Is it possible to get an event on chunks of a value?
I get very large JSON objects (> 20MB) as the result of an ajax call.
The JSON object has some meta data (a few kb) and one very large value.
The vlv ("very large value") must be post-processed (the meta data is needed for this) and I would like to do it on the fly.
So I would need:

  • onopenvalue
  • onchunkvalue
  • onclosevalue

(Btw the JSON object is something like the JSON representation of a pkcs#7 object and I have to base64-decode and decrypt the document.)

Missing objectclose events?

It appears that if the last key in an object is a value, we don't get an objectclose event. This is causing me some problems when trying to create the list of keys in the file.

Is this on purpose? I see a few examples which seem to imply this is a bug.

column info seems to be off

I get unexpected column values when parsing a JSON file.
Have a look at the following code.
For example: I expect value 3 to have a column value of 78, but the tool reports 65

import * as c from "clarinet"

const { Readable } = require('stream')

//       1         2         3         4         5         6         7         8
//.4.6.8.0.2.4.6.8.0.2.4.6.8.0.2.4.6.8.0.2.4.6.8.0.2.4.6.8.0.2.4.6.8.0.2.4.6.8.0
const readable = Readable.from([`{
   "a0": "x",
   "b0": "x", "b1": 0, "b2": "", "b3": "", "b4": "", "b5": 1, "b6": 2, "b7": 3 
}`])

var parse_stream = c.createStream()

parse_stream.on('openobject', function (name: string) {
  console.log("openobj", name, "*", `${parse_stream._parser.line}:${parse_stream._parser.column}`, parse_stream._parser.position)
});

parse_stream.on('key', function (name: string) {
  console.log("key", name, "*", `${parse_stream._parser.line}:${parse_stream._parser.column}`, parse_stream._parser.position)
});

parse_stream.on('string', function (name: string) {
  console.log("string", name, "*", `${parse_stream._parser.line}:${parse_stream._parser.column}`, parse_stream._parser.position)
});

parse_stream.on('value', function (value: string) {
  console.log("value", value, "*", `${parse_stream._parser.line}:${parse_stream._parser.column}`, parse_stream._parser.position)
});

// parse_stream.on('end', function () {
//   console.log("nu end", `${parse_stream._parser.line}:${parse_stream._parser.column}`, parse_stream._parser.position)
// });

readable.pipe(parse_stream);

/*
output:

openobj a0 * 2:6 10
value x * 2:10 15
key b0 * 3:6 24
value x * 3:10 29
key b1 * 3:14 35
value 0 * 3:17 38
key b2 * 3:22 45
value  * 3:26 49
key b3 * 3:30 55
value  * 3:34 59
key b4 * 3:38 65
value  * 3:42 69
key b5 * 3:46 75
value 1 * 3:49 78
key b6 * 3:54 85
value 2 * 3:57 88
key b7 * 3:62 95
value 3 * 3:65 98

*/

Requesting support of stand-alone values as valid JSON

Hey Nuno,

Hoping to get support of stand-alone values within your parser to match http://www.json.org/ definition of valid JSON. Right now the parser throws an error if passed a Number or Boolean value.

Line 346

    case S.BEGIN:
      if (c === "{") parser.state = S.OPEN_OBJECT;
      else if (c === "[") parser.state = S.OPEN_ARRAY;
      else if (c !== '\r' && c !== '\n' && c !== ' ' && c !== '\t') 
        error(parser, "Non-whitespace before {[.");
    continue;

So the parser only accepts arrays or objects as being valid, otherwise it throws an error.

Thanks!

Setup Windows CI (e.g., AppVeyor)

Since #45 has been merged, why not also run a Windows CI? I could create and test an appveyor.yml if this would be accepted.

@evan-king Do you have the right access to this repo to configure it for AppVeyor? Is this plausible/useful? Otherwise I won’t bother working on this ;-).

Version bump in npm

The npm registry has no idea yet about clarinet's 0.9.1 version which is available here on GitHub. Could you make it happen? 😃
Thanks.

Implement pause()

Hi,
I'm trying to parse a large JSON file without blowing up my RAM, and this seems like a good solution. Specifically, I want to read each object, restructure it, and write it to its own file, then go on to the next object.

However, the stream doesn't appear to support the native stream.pause() method. So it can't wait for the file to write before it keeps reading new objects. In your example code, you seem to be implementing a buffer/stack to capture the objects as they're parsed - but that kind of defeats the purpose of reading them one at a time and not using too much memory.

Is there an intrinsic reason Clarinet doesn't support pause(), or is it something that could be potentially added?

Thanks!
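
A common workaround in the meantime is to pause the source stream feeding clarinet rather than clarinet itself (a sketch; restructureAndWrite is a hypothetical stand-in for the per-object work, and note that events for chunks clarinet has already received may still fire):

var fs = require('fs');
var clarinet = require('clarinet');

var source = fs.createReadStream('big.json');
var stream = clarinet.createStream();

stream.on('closeobject', function () {
  source.pause(); // stop new chunks; buffered data may still emit events
  restructureAndWrite(function () { // hypothetical async per-object work
    source.resume();
  });
});

source.pipe(stream);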

ES6 import?

Is there any way clarinet can be made to dual support node and ES6 modules?

That would certainly make it a lot easier to depend on for the browser.

Position Isn't Getting Calculated correctly

The position value isn't getting calculated correctly. There are some cases where the code moves past the current character being parsed/examined without incrementing the position counter. I have a PR here around this to fix it.

empty keys not being handled properly

Empty keys are emitting value events instead of key events. This is because an empty string is used as the default value for text nodes, instead of undefined. I've got this fixed and will be filing a pull request shortly.

JSONTestSuite test results

There's this Parsing JSON is a Minefield article which details various issues (mostly in edge cases) across different JSON parsers.

Its author produced a test suite to test various implementations. I've run this test suite against Clarinet (on node 9.6.1); results compared to native JSON.parse are below (successful tests not shown).

Generally, I think the results are good, with Clarinet being pretty close to the native implementation and never failing too hard. But there are two things it still does wrong: it does not fail on some invalid inputs, and it fails on legitimate inputs consisting of a single literal (e.g. null). The former is pretty minor I think, while the latter may lead to compatibility issues.

I'm raising this issue more for the purposes of discussion. What do you think, should those be addressed?

[Results screenshot and legend images omitted.]

Snippet I used to run Clarinet (derived from their JSON.parse):

var fs = require('fs');
var path = process.argv[2];
var clarinet = require('clarinet');
var parser = clarinet.parser();

try {
	var data = fs.readFileSync(path, 'utf-8');

	parser.onerror = function(e) {
		throw e;
	}
	parser.write(data).close();
	
} catch (e) {
	console.log("--", e.message)
	process.exit(1);
}

process.exit(0);

Note that at the time of writing the test runner has a really weird bug preventing it from running which isn't hard to patch out though.

[clarinet] string escaping

Well, no clarinet here, can't test stuff, but looks to me like we're just stripping backslashes from strings, not interpreting \u0000 or so.

Big twitter.json file

This file in /samples is over 13MB and is not used. I feel like this is completely unnecessary, considering there are some packages that depend on this. It just bloats the package.

json streaming does not work without node.js

Maybe I'm doing something wrong, but clarinet seems not to be working in browser in streaming mode. Normal "block" decoding example provided works just fine.

I'm trying to use it like this:

// depends on clarinet lib
function json_stream_parser_c()
{
    function on_close_object()
    {
        alert("on close object");
    }

    var stream_parser = clarinet.createStream();

    stream_parser.on("closeobject", on_close_object);
    stream_parser.write("{ }");
}

It would be very nice if I could use it without node.js plugin as part of a web-site I'm making (client side code).

Skipping first key in an object?

It looks like when stream parsing an object, it always skips the first key.

Test JSON

{"Name": "Mike", "Favorite Color": "Blue", "Spirit Animal": "Tiger", "Birthday": "03/01/1987"}

Parser Code

var fs = require('fs');
var json = require('clarinet');

var file = './test.json';

var jsonStream = json.createStream({
    strict: true
});

var curObj = undefined;
var curKey = undefined;

jsonStream.on('openobject', function() {
    curObj = {};
});

jsonStream.on('key', function(key) {
    console.log('\tkey: ', key);
    curKey = key;
});

jsonStream.on('value', function(data) {
    console.log('\tvalue: ', data);
    curObj[curKey] = data;
    curKey = undefined;
});

jsonStream.on('closeobject', function() {
    console.log('Completed: ', curObj);
});


jsonStream.on('error', function(err) {
    console.log('ERROR! ', err);
});

jsonStream.on('end', function() {
    console.log('finished');
});

fs.createReadStream(file).pipe(jsonStream);

Output

    value:  Mike
    key:  Favorite Color
    value:  Blue
    key:  Spirit Animal
    value:  Tiger
    key:  Birthday
    value:  03/01/1987
Completed:  { undefined: 'Mike',
  'Favorite Color': 'Blue',
  'Spirit Animal': 'Tiger',
  Birthday: '03/01/1987' }
finished

Notice that the first key 'Name' is completely skipped, and the value 'Mike' comes out first, hence the key in the completed object of 'undefined'. I'm using the latest from NPM, 0.10.0.
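
This is clarinet's documented (if surprising) behavior rather than a dropped event: the first key arrives as the argument to the openobject event, not as a key event. A sketch of the fix for the handler above:

jsonStream.on('openobject', function (key) {
    curObj = {};
    if (key !== undefined) curKey = key; // the first key rides along here
});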

Question... Can I pause the event Stream...

I have a situation where I need to do some significant processing after reaching the end of a particular object. This processing must complete before I can process the remaining objects in the file. Is there a way, from within the on... event handlers, to pause and resume event generation? I've tried unpipe and pipe on the readstream that is feeding the parser, but this doesn't seem to have the desired effect (presumably the parser already has a buffer full of data to process).
