
cheshire's Introduction

Cheshire

'Cheshire Puss,' she began, rather timidly, as she did not at all know whether it would like the name: however, it only grinned a little wider. 'Come, it's pleased so far,' thought Alice, and she went on. 'Would you tell me, please, which way I ought to go from here?'

'That depends a good deal on where you want to get to,' said the Cat.

'I don't much care where--' said Alice.

'Then it doesn't matter which way you go,' said the Cat.

'--so long as I get SOMEWHERE,' Alice added as an explanation.

'Oh, you're sure to do that,' said the Cat, 'if you only walk long enough.'

Cheshire is fast JSON encoding, based on clj-json and clojure-json, with additional features like Date/UUID/Set/Symbol encoding and SMILE support.


Why?

clojure-json had really nice features (custom encoders), but was slow; clj-json had no features, but was fast. Cheshire encodes JSON fast, with added support for more types and the ability to use custom encoders.

Usage

[cheshire "5.13.0"]

;; Cheshire v5.13.0 uses Jackson 2.17.0

;; In your ns statement:
(ns my.ns
  (:require [cheshire.core :refer :all]))
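
;; Or, if you prefer an alias instead of referring every var
;; (a matter of style; some later examples in this README use one):
(ns my.ns
  (:require [cheshire.core :as json]))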

Encoding

;; generate some json
(generate-string {:foo "bar" :baz 5})

;; write some json to a stream
(generate-stream {:foo "bar" :baz 5} (clojure.java.io/writer "/tmp/foo"))

;; generate some SMILE
(generate-smile {:foo "bar" :baz 5})

;; generate some JSON with Dates
;; the Date will be encoded as a string using
;; the default date format: yyyy-MM-dd'T'HH:mm:ss'Z'
(generate-string {:foo "bar" :baz (java.util.Date. 0)})

;; generate some JSON with Dates with custom Date encoding
(generate-string {:baz (java.util.Date. 0)} {:date-format "yyyy-MM-dd"})

;; generate some JSON with pretty formatting
(generate-string {:foo "bar" :baz {:eggplant [1 2 3]}} {:pretty true})
;; {
;;   "foo" : "bar",
;;   "baz" : {
;;     "eggplant" : [ 1, 2, 3 ]
;;   }
;; }

;; generate JSON escaping UTF-8
(generate-string {:foo "It costs £100"} {:escape-non-ascii true})
;; => "{\"foo\":\"It costs \\u00A3100\"}"

;; generate JSON and munge keys with a custom function
(generate-string {:foo "bar"} {:key-fn (fn [k] (.toUpperCase (name k)))})
;; => "{\"FOO\":\"bar\"}"

;; write pretty JSON to a file; spit writes the raw string, so the
;; backslash escapes you see at the REPL do not end up in the file
(spit "foo.json" (generate-string {:foo "bar"} {:pretty true}))

In the event encoding fails, Cheshire will throw a JsonGenerationException.
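
For example, a minimal sketch of catching that exception for a value that has no encoder registered (java.lang.Object here is just an arbitrary unencodable value):

(try
  (generate-string {:foo (Object.)})
  (catch com.fasterxml.jackson.core.JsonGenerationException e
    (.getMessage e)))
;; => a message explaining that java.lang.Object cannot be JSON encoded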

Custom Pretty Printing Options

If Jackson's default pretty printer is not what you desire, you can create your own pretty printer and pass it to the generate-string or encode functions:

(let [my-pretty-printer (create-pretty-printer
                          (assoc default-pretty-print-options
                                 :indent-arrays? true))]
  (generate-string {:foo [1 2 3]} {:pretty my-pretty-printer}))

See the default-pretty-print-options for a list of options that can be changed.

Decoding

;; parse some json
(parse-string "{\"foo\":\"bar\"}")
;; => {"foo" "bar"}

;; parse some json and get keywords back
(parse-string "{\"foo\":\"bar\"}" true)
;; => {:foo "bar"}

;; parse some json and munge keywords with a custom function
(parse-string "{\"foo\":\"bar\"}" (fn [k] (keyword (.toUpperCase k))))
;; => {:FOO "bar"}

;; top-level strings are valid JSON too
(parse-string "\"foo\"")
;; => "foo"

;; parse some SMILE (keywords option also supported)
(parse-smile <your-byte-array>)

;; parse a stream (keywords option also supported)
(parse-stream (clojure.java.io/reader "/tmp/foo"))

;; parse a stream lazily (keywords option also supported)
(parsed-seq (clojure.java.io/reader "/tmp/foo"))

;; parse a SMILE stream lazily (keywords option also supported)
(parsed-smile-seq (clojure.java.io/reader "/tmp/foo"))

In 2.0.4 and up, Cheshire allows passing in a function, called with each JSON array's field name, to specify what kind of collection to parse that array into, like so:

;; In this example a function that checks for a certain key
(decode "{\"myarray\":[2,3,3,2],\"myset\":[1,2,2,1]}" true
        (fn [field-name]
          (if (= field-name "myset")
            #{}
            [])))
;; => {:myarray [2 3 3 2], :myset #{1 2}}

The collection returned must be "transient-able", so use either #{} or [].

Custom Encoders

Custom encoding is supported in 2.0.0 and up; if you encounter a bug, please open a GitHub issue. From 5.0.0 onwards, custom encoding has been moved into the core namespace (no namespace change is required).

;; Custom encoders let you swap the fast encoder for one that is
;; slightly slower but can encode arbitrary custom types:
(ns myns
  (:require [cheshire.core :refer :all]
            [cheshire.generate :refer [add-encoder encode-str remove-encoder]]))

;; First, add a custom encoder for a class:
(add-encoder java.awt.Color
             (fn [c jsonGenerator]
               (.writeString jsonGenerator (str c))))

;; There are also helpers for common encoding actions:
(add-encoder java.net.URL encode-str)

;; List of common encoders that can be used: (see generate.clj)
;; encode-nil
;; encode-number
;; encode-seq
;; encode-date
;; encode-bool
;; encode-named
;; encode-map
;; encode-symbol
;; encode-ratio

;; Then you can use encode from the custom namespace as normal
(encode (java.awt.Color. 1 2 3))
;; => "java.awt.Color[r=1,g=2,b=3]"

;; Custom encoders can also be removed:
(remove-encoder java.awt.Color)

;; Decoding remains the same; you are responsible for any custom decoding.

NOTE: `cheshire.custom` has been deprecated in version 5.0.0

Custom and Core encoding have been combined in Cheshire 5.0.0, so there is no longer any need to require a different namespace depending on what you would like to use.

Aliases

There are also a few aliases for commonly used functions:

encode -> generate-string
encode-stream -> generate-stream
encode-smile -> generate-smile
decode -> parse-string
decode-stream -> parse-stream
decode-smile -> parse-smile
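
Since they are plain aliases, either spelling can be used interchangeably; for example, a quick round trip:

(decode (encode {:foo "bar"}) true)
;; => {:foo "bar"}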

Features

Cheshire supports encoding standard Clojure data structures, with a few additions.

Cheshire encoding supports:

Clojure data structures

  • strings
  • lists
  • vectors
  • sets
  • maps
  • symbols
  • booleans
  • keywords (qualified and unqualified)
  • numbers (Integer, Long, BigInteger, BigInt, Double, Float, Ratio, Short, Byte, primitives)
  • clojure.lang.PersistentQueue

Java classes

  • Date
  • UUID
  • java.sql.Timestamp
  • any java.util.Set
  • any java.util.Map
  • any java.util.List

Custom class encoding while still being fast

Also supports

  • Stream encoding/decoding
  • Lazy decoding
  • Pretty-printing JSON generation
  • Unicode escaping
  • Custom keyword coercion
  • Arbitrary precision for decoded values:

Cheshire will automatically use a BigInteger if needed for non-floating-point numbers; for floating-point numbers, Doubles will be used unless the *use-bigdecimals?* dynamic var is bound to true:

(ns foo.bar
  (:require [cheshire.core :as json]
            [cheshire.parse :as parse]))

(json/decode "111111111111111111111111111111111.111111111111111111111111111111111111")
;; => 1.1111111111111112E32 (a Double)

(binding [parse/*use-bigdecimals?* true]
  (json/decode "111111111111111111111111111111111.111111111111111111111111111111111111"))
;; => 111111111111111111111111111111111.111111111111111111111111111111111111M (a BigDecimal)

Change Log

Change log is available on GitHub.

Speed

Cheshire is about twice as fast as data.json.

Check out the benchmarks in cheshire.test.benchmark; or run lein benchmark. If you have scenarios where Cheshire is not performing as well as expected (compared to a different library), please let me know.

Experimental things

In the cheshire.experimental namespace:

$ echo "Hi. \"THIS\" is a string.\\yep." > /tmp/foo

$ lein repl
user> (use 'cheshire.experimental)
nil
user> (use 'clojure.java.io)
nil
user> (println (slurp (encode-large-field-in-map {:id "10"
                                                  :things [1 2 3]
                                                  :body "I'll be removed"}
                                                 :body
                                                 (input-stream (file "/tmp/foo")))))
{"things":[1,2,3],"id":"10","body":"Hi. \"THIS\" is a string.\\yep.\n"}
nil

encode-large-field-in-map is used for streamy JSON encoding where you want to JSON encode a map, but don't want the map in memory all at once (it returns a stream). Check out the docstring for full usage.

It's experimental, like the name says. Based on Tigris.

Advanced customization for factories

See the Jackson JsonFactory, JsonParser, and JsonGenerator feature documentation for the features that can be customized, if desired. A custom factory can be used like so:

(ns myns
  (:require [cheshire.core :as core]
            [cheshire.factory :as factory]))

(binding [factory/*json-factory* (factory/make-json-factory
                                  {:allow-non-numeric-numbers true})]
  (core/decode "{\"foo\":NaN}" true))

See the default-factory-options map in factory.clj for a full list of configurable options. Smile factories can also be created, and factories work exactly the same with custom encoding.

Future Ideas/TODOs

  • move away from using Java entirely, use Protocols for the custom encoder (see custom.clj)
  • allow custom encoders (see custom.clj)
  • figure out a way to encode namespace-qualified keywords
  • look into overriding the default encoding handlers with custom handlers
  • better handling when Java numbers overflow ECMAScript's safe integer range (-(2^53 - 1) to 2^53 - 1)
  • handle encoding java.sql.Timestamp the same as java.util.Date
  • add benchmarking
  • get criterium benchmarking ignored for 1.2.1 profile
  • look into faster exception handling by pre-allocating an exception object instead of creating one on-the-fly (maybe ask Steve?)
  • make it as fast as possible (ongoing)

License

Released under the MIT license. See LICENSE for the full license.

Thanks

Thanks go to Mark McGranaghan for clj-json and Jim Duey for the name suggestion. :)

cheshire's People

Contributors

aiba, amalloy, borkdude, brabster, bronsa, crazymerlyn, dakrone, gfredericks, goodwink, hiredman, jakepearson, kostafey, kumarshantanu, laczoka, lucacervello, maxnoel, metame, mpenet, nilern, niwinz, pjstadig, prayerslayer, rymndhng, sbtourist, sjamaan, sleepful, tanzoniteblack, technomancy, zk, ztellman


cheshire's Issues

PersistentHashMap key order lost

The order of the keys of a PersistentHashMap gets lost:

(def h1 {"f" 6 "b" 2 "a" 1 "c" 3 "e" 5 "d" 4 "g" 7 "h" 8})
(def h2 {"f" 6 "b" 2 "a" 1 "c" 3 "e" 5 "d" 4 "g" 7 "h" 8 "i" 9})

(class h1) ; => clojure.lang.PersistentArrayMap
(class h2) ; => clojure.lang.PersistentHashMap

(generate-string h1) ; => "{\"f\":6,\"b\":2,\"a\":1,\"c\":3,\"e\":5,\"d\":4,\"g\":7,\"h\":8}"         => Order of keys not changed
(generate-string h2) ; => "{\"a\":1,\"b\":2,\"c\":3,\"d\":4,\"e\":5,\"f\":6,\"g\":7,\"h\":8,\"i\":9}" => Order of keys changed

Is there a possibility to solve this issue? Please see the related Stack Overflow entry.
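
One possible workaround (not a Cheshire feature): build the map as an array map, whose seq order is insertion order, so generate-string emits the keys in the order given. Note that assoc-ing many new keys onto an array map can promote it back to a hash map.

(generate-string (apply array-map ["f" 6 "b" 2 "a" 1 "c" 3 "e" 5 "d" 4 "g" 7 "h" 8 "i" 9]))
;; => "{\"f\":6,\"b\":2,\"a\":1,\"c\":3,\"e\":5,\"d\":4,\"g\":7,\"h\":8,\"i\":9}"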

CSON support

It would be helpful if cheshire supported CSON. Is there a timeline for when you will be adding support?

License - MIT or Apache?

README.md says MIT, project.clj says Apache.

In case you're wondering why I care, my company has software to automatically check that open-source dependencies have an acceptable (i.e. not GPL) license. I just need to list which license each library uses.

Invalid JSON parsed without any exception

Hi,

If one tries to parse JSON where the opening curly bracket is missing, some strange parsing happens. Example:
(parse-string " \"value\" : 1 }") returns "value"
I would expect an exception to be raised instead. It would be great to have this fixed.

Thanks & Regards,

Michal

Inconsistent conversion to LazySeq/PersistentVector depending on input

(def result (cheshire/parse-string "[1,2,3]"))

(println result) => (1 2 3)
(class result) => clojure.lang.LazySeq

(def result-2 (cheshire/parse-string "{\"a\":[1,2,3]}"))
(println result-2) => {"a" [1 2 3]}
(class (get result-2 "a")) => clojure.lang.PersistentVector

Is this inconsistency by design?

Rangel
@raspasov

PS Nice library, keep up the good work.

Architectural query: msgpack dataformat

I work for Puppet Labs on PuppetDB with @senior and we're fans of your library, but we're considering support for msgpack as well as JSON for our API.

I've found this: https://github.com/cowtowncoder/jackson-dataformat-msgpack ... we're also taking a gander at the equivalent for CBOR.

My question is more about how you would like to include support for external data formats; this is not core Jackson but certainly plugs into the framework. In concept this is a little like SMILE, and we'd like to follow your development and somehow plug in to it. How would you prefer to handle this if we were to contribute something to support cheshire/msgpack? Would this be of interest for core, or should we do a separate plugin for clojars, etc.? Our goal as OSS developers, anyway, would be to work with the community and contribute back as much as possible.

At the moment we're still investigating, so no promises yet. Alternatives include more 'pure', non-Jackson msgpack serialisation/deserialisation, but I'm investigating this route. Our decisions will primarily be based on core performance and streaming support.

Anyway, thanks for your time :-).

Remove call to deprecated method & add type hint to parameter to avoid reflection

Symptom:

Reflection warning, cheshire/core.clj:63:3 - call to method createJsonGenerator on com.fasterxml.jackson.core.JsonFactory can't be resolved (argument types: unknown).

Check:
https://github.com/FasterXML/jackson-core/blob/master/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1132
and #L1145

The createJsonGenerator method has been deprecated since Jackson 2.2; createGenerator should be used instead.

To avoid reflection, add type hints:

(defn create-generator
  "Returns JsonGenerator for given writer."
  [writer]
  (.createGenerator ^JsonFactory (or factory/*json-factory*
                                     factory/json-factory)
                    ^java.io.Writer writer))

Strange behaviour of parsed-seq

Hello,

I'm trying to extract an array out of a JSON file via parsed-seq, and this is what I observe:

(require
  '[cheshire.core :as json]
  '[clojure.java.io :as io])

(=
  (json/parsed-seq (io/reader "1.json") true)
  (json/parse-string (slurp "1.json") true))
; false

(=
  (first (json/parsed-seq (io/reader "1.json") true))
  (json/parse-string (slurp "1.json") true))
; true

Here goes the file:

[{"ItemsCount":633,"Name":"Сухой корм","CID":63,"Pred":0,"Desc":""},
{"ItemsCount":113,"Name":"Консервы","CID":74,"Pred":0,"Desc":""}]

Is this an expected behaviour?

P.S. Got a strange feeling that this is related to #43.

Add option to omit JSON keys with null values to generate-string

As per my Stack Overflow question, I'm generating some JSON for data structures like this:

(require '[cheshire.core :refer [generate-string]])
(generate-string {:id 123, :foo "something", :bar nil})

Which produces JSON like this:

{"id": 123, "foo": "something", "bar": null}

What I'd like is for the JSON to omit the keys without values; e.g.

{"id": 123, "foo": "something"}

I can certainly pre-filter the map before calling generate-string, but since Cheshire has to traverse my data structure anyway, I thought it would be more performant to instruct Cheshire to do the filtering.

To be clear, I am in no way proposing that this should be the default behaviour, simply that it could be an optional flag similar to :date-format, :pretty, etc.

If this is a feature that you could countenance, I'd be happy to send a pull request.
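
In the meantime, the pre-filtering approach mentioned above is a short sketch (it only touches the top level of the map):

(generate-string (into {} (remove (comp nil? val) {:id 123, :foo "something", :bar nil})))
;; => "{\"id\":123,\"foo\":\"something\"}"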

No way to lazily decode top-level sequences

This is sort of the dual of #35; if the top-level data structure being decoded is an array, then it may be desirable to have it be lazily decoded. One potential use-case for this is streaming JSON requests.

I could see an argument for making this the default behavior, but at the very least having a new lazy parser would be useful.

I'm happy to make a pull request for this feature, but I wanted to test the waters first.

cheshire.custom encodes long values incorrectly

cheshire.core encodes long values correctly:

user=> (use 'cheshire.core)
nil
user=> (encode 2147483648)                                 
"2147483648"

But cheshire.custom does not:

user=> (use 'cheshire.custom)
nil
user=> (encode 2147483648)   
"-2147483648"

(generate-string {"a" "b"}) works in cheshire.core, but not custom

=> (cheshire.core/generate-string {"a" "b"})
"{\"a\":\"b\"}"

=> (cheshire.custom/generate-string {"a" "b"})
org.codehaus.jackson.JsonGenerationException: Can not write text value, expecting field name (NO_SOURCE_FILE:0)

custom looks like a drop-in replacement for core, so I'd expect that to work. Btw. numbers are fine:

=> (cheshire.core/generate-string {"a" 2})
"{\"a\":2}"

=> (cheshire.custom/generate-string {"a" 2})
"{\"a\":2}"

This is version 2.0.0.

Update: I wonder how test/custom.clj can run through - suspecting something in my setup, as I couldn't "generate-string" the test-obj from your test/custom.clj either. Will try to test from cheshire sources now first.

No easy option for creating lazy ring responses

When returning JSON from a ring server, generate-string works fine but requires setting up the whole response in memory at once. generate-stream is undoubtedly useful for other cases, but for this one requires figuring out how to hook a BufferedWriter up to an InputStream and presumably to execute generate-stream on another thread.

If there were a third option that either returned an InputStream or a lazy seq of strings (in the same manner that enlive does), this would make the ring use case much easier.

writeNumber turns longs into negative numbers

I've tried encoding a DateTime into a timestamp, using the following:

(add-encoder  org.joda.time.DateTime
              (fn [c jg]
                (.writeNumber jg  (coerce/to-long c))))

Whereas coerce/to-long works fine, .writeNumber does not: the longs show up as some kind of crazy negative numbers, e.g. -189129648.

I tried some experiments.

I tried hardcoding a long in there, to see if .writeNumber might be broken. Indeed, it appears to be broken. In other words:

(add-encoder  org.joda.time.DateTime
              (fn [c jg]
                (.writeNumber jg  1360471091853)))

Gives a result in the encoded JSON of -1033540979.

So, maybe writeNumber can't handle longs, even though the javadoc indicates it can? Or is it silently casting to int? I'm unclear. At any rate, this:

(add-encoder  org.joda.time.DateTime
              (fn [c jg]
                (.writeNumber jg  (/ (coerce/to-long c) 1000))))

Will generate numbers like 1343962834, which are at least not negative, but have the distinct disadvantage of being completely wrong too.

I may be doing something dumb here, but having a long show up in the generated JSON as a negative int is at least surprising.

Or it may be a bug in jackson-whatever, but I can't chase it quite that far down the rabbit hole at the moment. I haven't been able to find a workaround either.
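
A possible workaround, assuming the cause is the reflective call resolving to the int overload of writeNumber: type-hint the generator argument so the long overload is selected at compile time (coerce is clj-time.coerce as above; on pre-5.x Cheshire the generator class is org.codehaus.jackson.JsonGenerator instead):

(add-encoder org.joda.time.DateTime
             (fn [c ^com.fasterxml.jackson.core.JsonGenerator jg]
               (.writeNumber jg (long (coerce/to-long c)))))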

parse-string accepts invalid json string

When I do:

 (cheshire.core/parse-string "\"foo\": \"bar\"}")

I get

"foo"

note the lack of a starting {

I would expect this to throw an exception instead. May be linked to #65

It must see everything after "foo" as junk...

Fails to parse large string

I have a config.json file which is about 500MB. This fails:

(parse-string (slurp "config.json") true)

;; This fails, too

(parse-stream (clojure.java.io/reader "config.json") true)

There is no error thrown, the process just dies peacefully.

Any ideas on what I'm doing wrong?

parse-string should not accept trailing junk

Right now this works:

(cheshire.core/parse-string "{\"foo\": 1}asdf") ; => {"foo" 1}

Instead, it should raise an exception about the trailing asdf. This is confusing and incorrect behavior for a function called parse-string, which reads a single JSON value from a string. It only makes sense that the entire string should be a valid JSON object.

(This issue is related to #52 -- in that bug, you identified that a string is a valid JSON value, which is certainly true; however, the problem there, as here, is that parse-string accepts trailing junk.)

The reason this bug exists (and the reason the same exact bug is in data.json and clj-json) is that parse-string is implemented using a streaming JSON reader (jackson in this case). The (pseudo-)code is essentially

(read-json-object-from-stream (StringReader. s))

If you use a streaming parser on a single string like this, you need to also check that the StringReader is empty (drained) after parsing the JSON value.

For comparison, Go, Javascript, Ruby, and Python all have JSON parsers in their standard libraries which all have some kind of parse-string equivalent function that rejects trailing data.
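
A caller-side sketch of that check, using Jackson directly rather than the Cheshire API (the trailing-data? helper is made up for illustration): read one value, skip to its end, and make sure no further token follows. Trailing junk such as asdf will instead surface as a parse exception, which also signals the problem.

(defn trailing-data? [^String s]
  (with-open [p (.createParser (com.fasterxml.jackson.core.JsonFactory.) s)]
    (.nextToken p)     ;; move onto the first value
    (.skipChildren p)  ;; skip to the end of that value if it is an object/array
    (some? (.nextToken p))))

(trailing-data? "{\"foo\": 1}")    ;; => false
(trailing-data? "{\"foo\": 1}{}")  ;; => true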

Common key encoding/decoding functions?

Would you accept a patch that adds a few common key encoding/decoding functions?

For example:

(defn decode-underscore-key
  [k]
  (-> k
      (clojure.string/replace "_" "-")
      keyword))

(defn encode-underscore-key
  [k]
  (-> k
      name
      (clojure.string/replace "-" "_")))

I also have ones for camel case.

feature: context specific encoders

I would like to make a structured logger with JSON output, but one catch would be that you might want a different output for a given object when logging as compared to other usage of JSON in the same application.

Is there an obvious way to implement this with the existing cheshire design?

Is this a feature that would be welcome for cheshire if I implemented it?

custom decoding

Creating a full tree of typed objects on decode seems like a natural complement to the custom encoders. Is this omitted on purpose as non-Clojurish, or ?

Add java Byte and Short support.

When generating a JSON string from a Java Byte or Short object, an error is raised.

For example, (generate-string (Byte. "3")) raises:
JsonGenerationException Cannot JSON encode object of class: class java.lang.Byte: 3 cheshire.generate/generate (generate.clj:76)
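
A caller-side workaround on versions without Byte/Short support (later releases list them in the features above) is to coerce the value to a supported numeric type before encoding; the values below are just examples:

(generate-string (int (Byte. "3")))
;; => "3"
(generate-string {:b (int (Byte. "3"))})
;; => "{\"b\":3}"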

Allow custom decoder function

Hi,

I find myself often parsing a JSON string into a Clojure data structure, then doing a deep walk to coerce values into a custom type based on their key.

I would love to be able to pass an (fn [k v] ...) function which does the coercion of values during decoding.

Wdyt?
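
A sketch of the deep-walk approach described above, done after parsing rather than during it (the coerce-values helper, the :price key, and the bigdec coercion are made-up examples):

(require '[clojure.walk :as walk])

(defn coerce-values [parsed coerce-fn]
  (walk/postwalk
    (fn [x]
      (if (map? x)
        (into {} (map (fn [[k v]] [k (coerce-fn k v)]) x))
        x))
    parsed))

(coerce-values (parse-string "{\"item\":{\"price\":\"1.50\"}}" true)
               (fn [k v] (if (= k :price) (bigdec v) v)))
;; => {:item {:price 1.50M}}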

RFE: support for java.sql.Timestamp

I have these timestamps running all around my app; they get pulled out of the database (Postgres).

The format I need/want looks like this:

2011-06-26T18:35:49Z

Which I understand to be compliant with RFC 3339 and ISO 8601.

There are lots of nice JavaScript/jQuery tools that can process timestamps in this format…..

Here is how I added this to the Dan Larkin json lib:

(:require [org.danlarkin.json :as json2])
(:use [clojure.contrib.logging :only [log]]
      [clojure.contrib.json :as json]
      clj-time.core
      clj-time.format
      clj-time.coerce)
(:import java.io.File
         java.util.Calendar
         java.util.Date
         java.text.SimpleDateFormat
         java.util.TimeZone
         java.util.UUID
         java.sql.Timestamp
         java.io.FileWriter))

(defn- timestamp-encoder
  [timestamp writer pad current-indent start-token-indent indent-size]
  (.append writer (str start-token-indent "\""
                       (unparse (formatters :date-time-no-ms)
                                (from-long (. timestamp getTime)))
                       "\"")))

(json2/add-encoder java.sql.Timestamp timestamp-encoder)

(defn- datetime-encoder
  [date-time writer pad current-indent start-token-indent indent-size]
  (.append writer (str start-token-indent "\""
                       (unparse (formatters :date-time-no-ms) date-time)
                       "\"")))

(json2/add-encoder org.joda.time.DateTime datetime-encoder)

Using protocols in cheshire.generate

It seems to me that the cheshire.custom namespace could be deprecated if a protocol was added to the cheshire.generate namespace.

Is there any reason why protocols aren't used in cheshire.generate? The only reason I can think of is performance, but with type hinting protocols are as fast as calling any Java method, AFAIK.

edit: changed wording to be clearer

Write to stream while iterating through a record set.

I need to iterate through a record set (probably really huge) and actively output data per row to the stream. Like this:

(<generate-stream> {:header "some-info" :data []} writer)
(j/query my-db
         ["select * from universe"]
         :row-fn (fn [row]
                   ;; :data contents here
                   (<generate-stream> row writer)))
(<generate-stream> {:tail "more-info"} writer)

Is it possible?

Can't lazily parse large JSON objects

Cheshire has support for lazily parsing JSON files containing large numbers of objects, but not for parsing large objects themselves.

For example:

{"stars" : [
    {"name" : "Betelgeuse", "magnitude" : 0.58},
    {"name" : "Rigel", "magnitude" : 0.12},
    ... 300 billion others ...
    {"name" : "Antares", "magnitude" : 0.92}]}

Trying to use Cheshire to lazily get the stars in the Milky Way one by one from this JSON doesn't work - if I try to parse the "stars" object using core/parsed-seq, it calls directly into parse/parse*, which bypasses the lazy array functionality in parse/parse, and attempts to evaluate the entire galaxy. Even using parse/parse directly doesn't seem to work (but I'm not sure why).

top level arrays decode as seqs, lower level as vectors; override?

parse-string returns a lazy seq rather than a vector for top-level arrays; at lower levels, say in an object, vectors are returned.

(json/parse-string "{\"foo\": [1, 2, 3]}" true)   
  ;=> {:foo [1 2 3]}
(json/parse-string "[1, 2, 3]" true)        
  ;=> (1 2 3)

The latter is a nice feature when laziness is wanted, but I want all arrays to decode as vectors in my application (decoding and validation are separated, data sizes are small).

Is there a way to automatically override this behavior and have arrays just decode to vecs? Basically what I'd like is an option to parse (or if necessary a dynamic var) that allows substituting parse-array for lazily-parse-array in the conditional within parse/parse.

And just to be clear: I know there are some simple workarounds. For instance, I could check the string for an initial array token before parse-string, or do a seq? check followed by a vec if true after parse-string. But while that works, it's not very pleasing. I'd have to do it for every request, and it complicates my decoding logic, which I had delegated to Cheshire. In contrast, the parser knows when it is dealing with an array, so it could do this automatically and efficiently.
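
The seq?-then-vec workaround mentioned above, as a sketch:

(let [parsed (json/parse-string "[1, 2, 3]" true)]
  (if (seq? parsed) (vec parsed) parsed))
;; => [1 2 3]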

Thanks for your consideration. I've been very happy with Cheshire; it's a great package!

RFC: partial decoding

I made a proof of concept for this a while back, pushed a branch, and then didn't actually explain what it is: https://github.com/dakrone/cheshire/tree/field-predicate-feature

Basically, we want to be able to use the "skip children" feature in the Jackson parser. Unfortunately, it's hard to specify where to skip without some sort of schema, but schemas (typically) would require us to enumerate all the fields we want to parse, rather than use some sort of programmatic specification.

My questionable workaround for this is a predicate function which passes in the parent and child keys as it traverses the JSON structure. So for this structure:

{"a": {"b": 1}}

The predicate would first get passed [nil "a"] to check if {"b": 1} should be parsed, and then ["a" "b"] to check if the 1 should be parsed. This is an adequate solution, at best, but I can't think of an obviously better one.

The performance gains here are at best 50%, as Jackson still needs to figure out where the children nodes end. However, 50% is still pretty good. I'd be interested to hear your thoughts.

Clarity on laziness

Hello,

I recently had a problem with parse-stream in cheshire 5.4.0 acting lazily (it caused a "Stream closed" exception because the lazy loading happened outside the scope of the with-open block that created the stream).

Previously, the same code worked (with an earlier version of cheshire) so I assume that something changed with respect to laziness.

The docstring currently says:

"If laziness is needed, see parsed-seq"

That sort of implies that parse-stream is not meant to be lazy.... but the behaviour implies otherwise. Can we get clarity in the docs on whether this is actually meant to be lazy?
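
Whatever the documentation ends up saying, a defensive sketch is to realize the (possibly lazy) result before the stream is closed; the file path here is just an example:

(with-open [r (clojure.java.io/reader "/tmp/data.json")]
  (doall (cheshire.core/parse-stream r true)))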

Can't parse top-level arrays

I've got some JSON that contains an array at the top level:

[
  {"foo": "bar"},
  {"hi": "there"}
]

When I try to parse this I get the following output:

(parse-string "[{\"foo\": \"bar\"}, {\"hi\": \"there\"}]")
ClassCastException clojure.core$identity cannot be cast to com.fasterxml.jackson.core.JsonParser cheshire.parse/lazily-parse-array/fn--365 (parse.clj:49)

The JSON RFC indicates that a JSON document can be an object or an array (section 2). Is this a bug, or is there something I need to do to parse this sort of JSON document?

custom vs core encode symbols differently

Using the core encoding, symbols (whether interned or not) have no namespace prefixed to the symbol name. This is not the case for the custom encoder, which prefixes the namespace to the symbol name if the symbol has been interned. This behavior is inconsistent with the core behavior.

Is core supposed to be so much faster that resolving namespaces for symbols is considered a performance hit?

Jackson dependency causes failures in AOT-compiled projects

The Jackson project deploys changes to existing versions, so you can't reliably depend on the "version" you are using. Today they made a change that causes ClassNotFound exceptions (Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/json/ReaderBasedJsonParser). I am presently looking through their versions in maven to see if overriding with any of these might work.

You can reproduce the problem by AOT compiling a project with Cheshire as a dependency and then attempting to run it.

EDIT: this is evidently not sufficient to reproduce the problem. I am looking more into why I am getting this error.

Seeking to preserve order of keys in map after decode

I am retrieving a JSON template from a remote server which I will use to populate and then send back to create a record on that server. When I decode the template in the body of the response, the resulting map reorders the keys in the original template. Unfortunately, the remote server will not accept my request to create a record since I have submitted a JSON template with reordered keys.

I found a library that allows me to keep the ordered nature of the JSON template using the ordered-map function. With that find, I tried using the decode method to preserve the ordering of the keys in the JSON template by checking the root level field name and converting the associated value to an ordered map:

;; here (new-contribution-receipt) returns a response from the remote server containing
;; the JSON template in the body key
(decode
  (:body (new-receipt))
  false
  (fn [field-name]
    (if (= field-name "contributionReceipt")
      ordered-map
      {})))
;; => keys are not ordered

That didn't work, so I'm trying to think of the problem from a different angle. Perhaps I need to modify Cheshire's decode function so that I can pass an option that would let me apply a function (i.e. ordered-map) to values that are maps?

Does the community have any ideas on how I can tackle this problem?

In 2.0.4 the custom encoding fails to correctly encode a hash-map

While trying to use the custom encoder on this hash-map:
{:resources {:robot (reader trough-1)}, :notes {}, :duration 0.35, :name "robot-move", :procedures ["script3228" "script3220"], :promise "a-promise", :ident "T-105"}

The custom encoder produces this result:
"{"resources":{"robot":["/","/"]},"notes":{},"duration":0.35,"name":"robot-move","procedures":["script3228","script3220"],"promise":"a-promise","ident":"T-105"}"

While the non-custom encoder produces this result:
"{"resources":{"robot":["reader","trough-1"]},"notes":{},"duration":0.35,"name":"robot-move","procedures":["script3228","script3220"],"promise":"a-promise","ident":"T-105"}"

Note that the :robot value from the custom encoder is ["/","/"] but it should be ["reader","trough-1"].
The primary reason for using a custom encoder is that the value "a-promise" is a promise object and a custom encoder is necessary so that encoding can be done at all.

No custom encoders had been added when the call to custom/encode was invoked.

Dates don't default to ISO formatting when in the key position

generate-string formats dates as strings. But it does that differently when the same date is in the key versus value position:

(require 'cheshire.core)
(import 'java.util.Date)
(def now (Date.))
(cheshire.core/generate-string {now now})

=> "{\"Thu Apr 17 15:39:35 PDT 2014\":\"2014-04-17T22:39:35Z\"}"

This is surprising.

pretty print stream of data

Is there a way to pretty-print a stream of data? I figured generate-string and generate-stream would have comparable options, but only generate-string seems to be pretty printed.

(println (generate-string data {:pretty true})) ; pretty-printed
(generate-stream data (clojure.java.io/writer "test.json") {:pretty true}) ; not pretty-printed in file

Default Encoder for java.lang.Character

Was surprised to hit a JsonGenerationException when passing a character literal through generate-smile. Simple enough to add support by doing (add-encoder java.lang.Character encode-str), but was also curious to know if there's a reason this isn't provided by default.
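
Written out, the workaround mentioned above looks like this:

(add-encoder java.lang.Character encode-str)
(generate-string {:c \a})
;; => "{\"c\":\"a\"}"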

JSON factory parameters do not work

I am using Cheshire to create parameters for Neo4j queries out of Clojure maps. The problem is that Neo4j only accepts JSON (a subset of it, but that's beside the point) with unquoted field names. I was unable to make Cheshire behave this way using my custom factory. Probably something is broken. Here are the snippets:

Manually calling and setting the Jackson Java thing (works):

(binding [factory/*json-factory* (factory/make-json-factory {})]
   (.disable factory/*json-factory* com.fasterxml.jackson.core.JsonGenerator$Feature/QUOTE_FIELD_NAMES)
   (c/generate-string {:a 1}))
=> "{a:1}"

Using Cheshire factory parameters (does not work):

(binding [factory/*json-factory* (factory/make-json-factory {:allow-unquoted-field-names true})]
   (c/generate-string {:a 1}))
 => "{\"a\":1}"

Use `[com.fasterxml.jackson.core/jackson-core "2.0.0"]` to avoid conflicts with old versions

Hi,
I am the developer of ringMon, a Ring middleware that provides a web browser REPL interface in various incarnations:

  • It can be inserted into a web app's Ring chain
  • it can be added as a development dependency for a non-web app
  • it is part of lein-webrepl, a lein2 plugin that can start a browser-based REPL within any Clojure 1.3.0/lein 2.0 project in the same manner as lein repl, with no changes to project.clj needed.

In all the above use cases ringMon needs to fit in with existing projects, so to avoid conflicts its dependencies were kept to a minimum. It needs a JSON encoder/decoder just to and from strings. Since I started with a Noir 1.2.2 application as a showcase for ringMon running on Heroku, I chose clj-json, already a Noir 1.2.2 dependency, which was using [org.codehaus.jackson/jackson-core-asl "1.5.0"].

All was fine until I needed to parse project.clj. Since clj-json can't do it, I switched to cheshire 3.0.0, which uses [org.codehaus.jackson/jackson-core-asl "1.9.0"], but that does not work with Noir 1.2.2, since Noir indirectly depends on [org.codehaus.jackson/jackson-core-asl "1.5.0"], which is unfortunately in conflict with the 1.9.0 needed by cheshire :).

So, I took drastic action that did not feel right: cherry-pick the bits needed from cheshire, cobble them all into one file, and declare a dependency on Jackson 1.5.0; now the embedded cheshire works fine with Noir 1.2.2. Today I tried to upgrade to Noir 1.3.0 beta, and it turns out that Noir also uses cheshire, which depends on Jackson 1.9.x and is therefore incompatible with ringMon.

Out of desperation I decided to find the jackson source, change the package name and pull it in as well and then I had a moment of joy - somebody else has already done it:
Jackson 2.0.0

Basically, from 2.0.0 Jackson has moved to GitHub and changed the package name to [com.fasterxml.jackson.core/jackson-core "2.0.0"]. It took me a very short time to get it working with embedded cheshire; some configuration flags are shuffled around, but nothing major.

Now ringMon is conflict free and it would be very nice if you could try this brand new version of jackson in cheshire - so I can eventually stop embedding it into ringMon :)
Regards
Zoka

difference with clojure.contrib.json on backslash escaping

Hello,

clojure.contrib.json backslash-escapes forward slashes by default. Even though it is not a standard feature, Jackson supports this as well through http://jackson.codehaus.org/1.9.0/javadoc/org/codehaus/jackson/JsonParser.Feature.html#ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER

Here is an example that illustrates the issue:

(use 'cheshire.core)
user> (generate-string "<script>explodes</script>")
"\"<script>explodes</script>\""

(use 'clojure.contrib.json)
(json-str "<script>explodes</script>")
"\"<script>explodes<\\/script>\""

Most of the time the default Jackson behavior is fine, but if your JSON is embedded in an HTML page inside a script tag and it contains an HTML tag like </script> (unescaped, it will be parsed as the closing tag of the enclosing script element before the real one), this can cause some headaches.

A possible solution would be to enable this feature by default, matching json-str behavior.

What do you think about this?
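
One caller-side option in the meantime (a sketch): backslash-escape every forward slash after encoding, which matches the json-str output shown above for this input.

(require '[clojure.string :as str])

(str/replace (generate-string "<script>explodes</script>") "/" "\\/")
;; => "\"<script>explodes<\\/script>\""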

Custom encoder doesn't work on PersistentQueue

(cheshire.custom/add-encoder clojure.lang.PersistentQueue cheshire.custom/encode-seq)
(cheshire.core/encode clojure.lang.PersistentQueue/EMPTY)

results in: No matching clause: clojure.lang.PersistentQueue@0

custom encoder problem

(ns my-namespace
  (:require
            ;;cheshire stuff
            [cheshire.core :refer :all]
            [cheshire.generate :refer [add-encoder encode-str remove-encoder]]))

;; There are also helpers for common encoding actions:
(add-encoder java.net.URI encode-str)

I have the above code in a NS with a route in it. (also tried this in my core.clj).
I am using wrap-json-response ring middleware on my route.

I am getting java.lang.Exception: Don't know how to write JSON of class java.net.URI exceptions when my renderer gets to objects of type java.net.URI.

Any help would be appreciated.

I followed the example from the readme.

New release?

AFAICS, there hasn't been a release in 11 months. Time for 5.4.0?
