
transit-clj's Issues

Incorrect read from ByteArrayInputStream String.getBytes

The code below demonstrates the problem

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])

(let [body {:name ["John"]}
      out (ByteArrayOutputStream.)
      w (transit/writer out :msgpack)]
  (transit/write w body)
  (transit/read
    (transit/reader (ByteArrayInputStream. (.getBytes (.toString out)))
                    :msgpack)))
;=> -17

With :json the same code runs correctly, and it also runs correctly with :msgpack if the input stream is built from (ByteArrayInputStream. (.toByteArray out)) instead.
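
For reference, here is the binary-safe variant described in the sentence above, round-tripping the raw bytes instead of going through a String; the result shown is the expected round trip, since msgpack output is binary and decoding it to a String and re-encoding it is lossy:

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])

(let [body {:name ["John"]}
      out (ByteArrayOutputStream.)
      w (transit/writer out :msgpack)]
  (transit/write w body)
  (transit/read
    (transit/reader (ByteArrayInputStream. (.toByteArray out)) :msgpack)))
;; => {:name ["John"]}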

transit/read throwing RuntimeException, not EOFException

The docstring for transit/read says it throws an EOFException when the input is empty. However, in ReaderFactory it catches all Throwables and rethrows them as RuntimeExceptions.

It seems like either the docstring for transit-clj read is wrong, or the implementation in transit-java shouldn't be catching all Throwables?

This is with transit-clj 0.8.295 and transit-java 0.8.319.

Here's a minimal test case to reproduce it:

(let [out (java.io.ByteArrayOutputStream. 4096)
      writer (transit/writer out :json)
      _ (transit/write writer {:a 1 :b 2 :c 3 :d 4})
      reader (transit/reader (java.io.ByteArrayInputStream. (.toByteArray out)) :json)]
  (transit/read reader)
  (try (transit/read reader)
       (catch java.io.EOFException e
         (println "Caught EOF Exception"))
       (catch RuntimeException e
         (println "Caught Runtime Exception"))))

MsgPack for Large System Snapshots?

Hi, I'm the author of Prevayler, prevayler-clj and Sneer, the sovereign platform.

I have read old anecdotes on the web claiming that transit's msgpack format wasn't production-ready, and I couldn't find any post stating otherwise. What is the official take on that?

Is transit msgpack suited for large amounts of data, say tens of gigabytes? Is there a known limit?

Keep up the great work and thanks, Klaus

NPE when using :handlers instead of meaningful error message

It seems transit throws an NPE instead of a meaningful error message when :handlers is used and a value of a type it can't handle is written:

clojure -Sdeps '{:deps {com.cognitect/transit-clj {:mvn/version "RELEASE"}}}' -M /tmp/transit.clj
(require '[cognitect.transit :as transit])

(def ldt-write-handler (transit/write-handler "pod.babashka.sql/local-date-time" str))

(defn write-transit [v]
  (let [baos (java.io.ByteArrayOutputStream.)]
    (transit/write (transit/writer baos :json {:handlers {java.time.LocalDateTime ldt-write-handler}}) v)
    (.toString baos "utf-8")))

(write-transit (into-array String ["foo"]))
Syntax error (NullPointerException) compiling at (/tmp/transit.clj:19:1).
null

Double/NaN is encoded as "NaN" in JSON

Writing the value Double/NaN in JSON encoding emits just the string NaN. I would expect the string ~dNaN or something similar. The spec says a JSON number should be emitted. As JSON doesn't support NaN in numbers, the explicit tagged string value of ~dNaN would be an option.

This issue can be reproduced in a REPL:

(import '[java.io ByteArrayOutputStream ByteArrayInputStream])
(require '[cognitect.transit :as transit])
(def out (ByteArrayOutputStream. 4096))
(def writer (transit/writer out :json))
(transit/write writer [Double/NaN])

(String. (.toByteArray out) "UTF-8")

The last expression evaluates to "[\"NaN\"]". I enclosed Double/NaN in a vector to avoid more verbose quoting.

It should be obvious that the reader can't read back the JSON string NaN as Double/NaN. So a simple round trip test with Double/NaN fails also.

The same is also true for Double/POSITIVE_INFINITY and Double/NEGATIVE_INFINITY.
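
Continuing the REPL session above, the failed round trip can be seen directly; the value shown is what I'd expect given the emitted JSON, so treat it as illustrative:

(def in (ByteArrayInputStream. (.toByteArray out)))
(def rdr (transit/reader in :json))
(transit/read rdr)
;; => ["NaN"]   ; a vector containing the string "NaN", not Double/NaN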

record-read-handler doesn't demunge hyphenated namespaces

If you try to use a record that was created in a hyphenated namespace, e.g. my-hyphenated-ns.MyRecord, record-read-handler will try to resolve the symbol my_hyphenated_ns/map->MyRecord instead of my-hyphenated-ns/map->MyRecord. As a result, attempting to read a tagged literal for the record type will result in a NullPointerException.
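
A hedged workaround sketch until this is fixed: build the read handler yourself and demunge the namespace portion of the class name. The record and namespace names below are placeholders for whatever your application uses, the underscore-to-hyphen replacement is a simplified demunge, and requiring-resolve needs Clojure 1.10+:

(require '[clojure.string :as str]
         '[cognitect.transit :as transit])

(defn demunged-record-read-handler
  "Like transit/record-read-handler, but resolves map->Record in the
   hyphenated namespace instead of the munged (underscored) one."
  [^Class record-class]
  (let [cls-name (.getName record-class)             ; e.g. "my_hyphenated_ns.MyRecord"
        dot      (.lastIndexOf cls-name ".")
        rec-ns   (str/replace (subs cls-name 0 dot) "_" "-")
        rec-name (subs cls-name (inc dot))
        ctor     (requiring-resolve (symbol rec-ns (str "map->" rec-name)))]
    (transit/read-handler (fn [rep] (ctor rep)))))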

Transit reader produces Clojure keywords that cannot be round-tripped to EDN

Minimal reproduction:

(-> "[\"^ \",\"~:foo}\",\"bar\"]"
    (.getBytes "UTF-8")
    (java.io.ByteArrayInputStream.)
    (cognitect.transit/reader :json)
    cognitect.transit/read
    pr-str
    clojure.edn/read-string)

The above results in a RuntimeException from read-string ("Map literal must contain an even number of forms"), due to the } at the end of the :foo} in the transit string (which, to be clear, I've constructed by hand, rather than outputting from Transit). If you remove the read-string and replace it with keys first, you can see that Transit has not identified the keyword as invalid, and has produced a keyword :foo}. This issue is me asking for Transit instead to report that the input was malformed or is not representable as a valid Clojure data structure.

I understand that for performance reasons Clojure doesn't validate keywords when interning, and I'm not proposing to bikeshed that decision. However, for Transit, that approach has the unfortunate consequence that you have to re-validate data that Transit has read from the potentially-untrusted network before you can assume that its keywords are safe to use in everyday Clojure code.

Specifically, we have a Clojure web app with a ClojureScript front-end, and we use Transit to communicate between them. Sometimes we'll need to take some of the data received from the front-end and write it out somewhere else (to be read back in future, or to communicate it to a separate process, etc). The natural way to do that is via pr-str and clojure.edn/read. This issue introduces a new point of failure in our application -- reading the EDN back in might fail, even though it was written out via pr-str, because Transit might have given us a keyword which is not printable/readable in Clojure.

For additional context, this issue was discovered by an external pen-test, attempting an XSS attack (by manually replaying a modified request), so the answer for us can't just be "well then don't have your front end send data across the wire that you're not going to be capable of reading correctly". To give ourselves some level of confidence that our app isn't going to break when someone tries XSS, we're now having to write additional code to wrap around our calls to transit/read to ensure that the keywords it has read are valid keywords, etc. That seems like it would be better handled when reading the input in the first place.
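
A hedged sketch of the kind of wrapper described above; the names and the exact check are illustrative, not the author's actual code:

(require '[clojure.walk :as walk]
         '[clojure.edn :as edn])

(defn assert-readable-keywords
  "Throws if any keyword in data would not survive pr-str -> edn/read-string."
  [data]
  (walk/postwalk
    (fn [x]
      (when (and (keyword? x)
                 (not= x (try (edn/read-string (pr-str x))
                              (catch Exception _ ::unreadable))))
        (throw (ex-info "Keyword from transit input is not readable as EDN"
                        {:printed (pr-str x)})))
      x)
    data))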

Falsey cmap Entries are discarded by the Reader

The reader fails to read back maps with composite and falsey keys like {[] 0, false 0} or {[] 0, nil 0}. The bug does not depend on the encoding used; all of :json, :json-verbose and :msgpack show it.

The problem manifests itself in transit.clj line 211, where (if-let [k @next-key] ...) is used. It's quite obvious that falsey keys get skipped here.

The following code can be used to reproduce this issue in the REPL:

(import '[java.io ByteArrayOutputStream ByteArrayInputStream])
(require '[cognitect.transit :as transit])
(def out (ByteArrayOutputStream. 4096))
(def writer (transit/writer out :json))
(transit/write writer {[] 0, false 0})
(def in (ByteArrayInputStream. (.toByteArray out)))
(def reader (transit/reader in :json))
(prn (transit/read reader))

The last expression evaluates to {[] 0} but it should be {[] 0, false 0}.

I found this bug using test.check with the any-printable generator. The shrunk test case was [{false 0, [] 0}], which is only one wrapping vector away from minimal. Very impressive!
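
A hedged illustration of the kind of fix for the if-let check mentioned above: track "no pending key" with a sentinel instead of relying on truthiness, so false and nil keys survive. This is simplified, not the actual transit.clj map builder:

(def no-key ::no-key)

(defn add-kv-step
  "One step of a cmap-style builder: stash x as the pending key, or pair it
   with the pending key already seen."
  [[m pending] x]
  (if (identical? pending no-key)
    [m x]
    [(assoc m pending x) no-key]))

(first (reduce add-kv-step [{} no-key] [[] 0 false 0]))
;; => {[] 0, false 0}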

version 0.8.307 is broken?

Adding this to a vanilla lein project and running the most basic writer example yields:

broken-transit.core> (def out (ByteArrayOutputStream. 4096))
#'broken-transit.core/out
broken-transit.core> (def writer (transit/writer out :json))
#'broken-transit.core/writer
broken-transit.core> (transit/write writer "foo")
NullPointerException   cognitect.transit/writer/reify--5416 (transit.clj:160)
broken-transit.core>

Example repo. Details here.

UPDATE: in Slack, Alex Miller mentioned that the Java version may be relevant here, so adding that info just in case it helps:

$ java -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
$

UPDATE 2: upgraded to JDK 10, still busted.

$ java -version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)
$

transit/read doesn't close InputStream

I am trying to understand whether transit/read is supposed to close the InputStream. It currently doesn't, as demonstrated here:

(let [s "[\"~#list\",[0,1,2.0,true,false,\"five\",\"~:six\",\"~$seven\",\"~~eight\",null]]"
      close-count (atom 0)
      bais (proxy [ByteArrayInputStream] [(.getBytes s)]
             (close []
               (swap! close-count inc)
               (proxy-super close)))
      rdr (transit/reader bais :json {})
      result (transit/read rdr)]
  (println result)
  (println "InputStream closed" @close-count "times") ;prints out "0 times"!
  (.close bais)                                             ; show that close() really works
  (println "InputStream closed" @close-count "times")) ;now it prints out "1 times"

If I call (transit/read rdr) twice, it will close the InputStream the second time -- although there is nothing there to read since the InputStream is at the end position, so no data is returned.
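
Either way, a hedged usage sketch for now: manage the stream's lifetime yourself with with-open rather than relying on transit/read to close it (s here is the transit string from the snippet above):

(with-open [in (ByteArrayInputStream. (.getBytes s "UTF-8"))]
  (transit/read (transit/reader in :json)))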

Don't quote MessagePack scalar values at the top level

It seems like the MessagePack transit writer quotes scalar values at the top level. Reading through the spec: https://github.com/cognitect/transit-format#quoting it seems like there's a good reason for doing this for JSON, but I'm not sure that same reason carries over into MessagePack, especially since users of the binary protocol might be more space conscious?

(def bs (java.io.ByteArrayOutputStream.))
(def writer (transit/writer bs :msgpack))
(transit/write writer 42)
(do (doseq [b (.toByteArray bs)] (printf "%02x " b)) (println))
;; => 92 a3 7e 23 27 2a
;; which is a quoted value, I'd like it to be the single byte 0x2a
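;; For reference, decoding those bytes (standard msgpack framing, so I'm fairly
;; confident): 92 is a 2-element array header, a3 7e 23 27 is the 3-byte string
;; "~#'", and 2a is the integer 42. That is, the output is the quote wrapper
;; ["~#'", 42] rather than the bare value.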

Intended way to gracefully upgrade?

The doc says:

If storing Transit data durably, readers and writers are expected to use the same version of Transit and you are responsible for migrating/transforming/re-storing that data when and if the transit format changes.

Does Transit have any way to detect what version of Transit data it is dealing with? Or is it expected that the application tracks separately which version a payload was encoded in, so that it can choose the correct version for decoding and possibly re-encode in the new version?

[question] Is there a generic way to (de)serialize arrays using transit?

Bringing this question over here:

https://ask.clojure.org/index.php/10617/generic-way-to-de-serialize-arrays-using-transit

Is there a generic way to (de)serialize arrays using transit?

I find myself in this situation:

(require '[cognitect.transit :as transit])

(def array-write-handler (transit/write-handler "pod.babashka.sql/array" vec))

(def array-type (class (into-array Object [])))

(defn write-transit [v]
  (let [baos (java.io.ByteArrayOutputStream.)]
    (transit/write (transit/writer baos :json {:handlers {array-type array-write-handler}}) v)
    (.toString baos "utf-8")))

(prn (write-transit (into-array Object ["foo"]))) ;; works
(prn (write-transit (into-array String ["foo"]))) ;; ERROR

and it looks like you have to register serialization explicitly for every concrete type, which can be error-prone and tedious. Can I use some sort of fallback handler that runs before transit decides it cannot encode a certain type of object? In that fallback I could check whether the object is an array using .isArray.

The way I currently work around this is using walk/postwalk to build some custom representation, but this seems to defeat the purpose of transit.
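
A hedged workaround sketch in the meantime, reusing array-write-handler from the snippet above: register the same handler under each concrete array class you expect. As the error shows, a handler registered for Object[] does not cover String[] (an array class's direct superclass is Object), so per-class registration is needed; the class list and write-transit* name are illustrative:

(def array-classes
  [(class (into-array Object []))
   (class (into-array String []))
   (class (long-array 0))
   (class (int-array 0))
   (class (double-array 0))])

(def array-handlers
  (zipmap array-classes (repeat array-write-handler)))

(defn write-transit* [v]
  (let [baos (java.io.ByteArrayOutputStream.)]
    (transit/write (transit/writer baos :json {:handlers array-handlers}) v)
    (.toString baos "utf-8")))

(prn (write-transit* (into-array String ["foo"]))) ;; handled now (illustrative)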

Request for newline between multiple objects

In the example usage in the README, the string representation of multiple transit objects has a single space interposed between the objects:

(.toString out)
;; "{\"~#'\":\"foo\"} [\"^ \",\"~:a\",[1,2]]"

It turns out that interposing newlines between the objects makes it trivial for JavaScript code to read multiple objects, since .split("\n") can 'tokenize' each JSON blob so it can be individually parsed by the JSON reader. Since JSON does not support raw newlines in strings, and Transit does not appear to use them anywhere else, newlines do not appear to break Transit readers. I'm requesting that newlines be used instead of spaces, like this:

(.toString out)
;; => "{\"~#'\":\"foo\"}
;;[\"^ \",\"~:a\",[1,2]]"

I'm not familiar with all Transit readers so this may be a misguided request. If so, please disregard.
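
A hedged workaround sketch in the meantime: write the separator to the stream yourself between top-level values. This assumes each value is flushed to the stream after it is written (as reported in the chunked-encoding issue further down), and the underlying JSON generator may still add its own root-value separator:

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayOutputStream])

(def out (ByteArrayOutputStream. 4096))
(def w (transit/writer out :json))
(transit/write w "foo")
(.write out (int \newline))
(transit/write w {:a [1 2]})
(.toString out)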

'^' crashes reader with StringIndexOutOfBoundsException

MWE: https://nextjournal.com/a/LEjZdbN3nL5irFe2khgCM?token=KBtQJAPvS6TVvcmXyeN5U5

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])
(def out "\"^\"")
(def in (ByteArrayInputStream. (.getBytes out)))
(def reader (transit/reader in :json))
(transit/read reader)

Stacktrace:

{:via
 [{:type java.lang.RuntimeException,
   :message "java.lang.StringIndexOutOfBoundsException: String index out of range: 1",
   :at [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 114]}
  {:type java.lang.StringIndexOutOfBoundsException,
   :message "String index out of range: 1",
   :at [java.lang.String charAt "String.java" 658]}],
 :trace
 [[java.lang.String charAt "String.java" 658]
  [com.cognitect.transit.impl.ReadCache codeToIndex "ReadCache.java" 30]
  [com.cognitect.transit.impl.ReadCache cacheRead "ReadCache.java" 40]
  [com.cognitect.transit.impl.JsonParser parseVal "JsonParser.java" 60]
  [com.cognitect.transit.impl.JsonParser parse "JsonParser.java" 46]
  [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 112]
  [cognitect.transit$read invokeStatic "transit.clj" 319]
  [cognitect.transit$read invoke "transit.clj" 315]
  [user$eval871 invokeStatic "NO_SOURCE_PATH" 5]
  [user$eval871 invoke "NO_SOURCE_PATH" 5]
  [clojure.lang.Compiler eval "Compiler.java" 7176]
  [clojure.lang.Compiler eval "Compiler.java" 7166]
  [clojure.lang.Compiler eval "Compiler.java" 7131]
  [clojure.core$eval invokeStatic "core.clj" 3214]
  [clojure.core$eval invoke "core.clj" 3210]
  [user$eval869 invokeStatic "NO_SOURCE_PATH" 5]
  [user$eval869 invoke "NO_SOURCE_PATH" 5]
  [clojure.lang.Compiler eval "Compiler.java" 7176]
  [clojure.lang.Compiler eval "Compiler.java" 7131]
  [clojure.core$eval invokeStatic "core.clj" 3214]
  [clojure.core$eval invoke "core.clj" 3210]
  [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638$fn__639$fn__640 invoke "NO_SOURCE_FILE" 697]
  [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638$fn__639 invoke "NO_SOURCE_FILE" 697]
  [clojure.lang.AFn applyToHelper "AFn.java" 152]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 665]
  [clojure.core$with_bindings_STAR_ invokeStatic "core.clj" 1973]
  [clojure.core$with_bindings_STAR_ doInvoke "core.clj" 1973]
  [clojure.lang.RestFn invoke "RestFn.java" 425]
  [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638 invoke "NO_SOURCE_FILE" 689]
  [clojure.core$binding_conveyor_fn$fn__5739 invoke "core.clj" 2030]
  [clojure.lang.AFn call "AFn.java" 18]
  [java.util.concurrent.FutureTask run "FutureTask.java" 266]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1149]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 624]
  [java.lang.Thread run "Thread.java" 748]],
 :cause "String index out of range: 1"}

When used with HTTP chunked transfer encoding, 6.7MB becomes 10.5MB due to chunking overhead

This is a big problem for me. I am using transit-json on a web endpoint, and I am streaming the encoded result. The stream uses HTTP chunked transfer encoding, which essentially indicates that the overall size is unknown and repeatedly announces how many bytes are about to come until the response is finished.

Due to the flushes from Transit, these chunks end up very small, usually just a string and some open/close parens. This overhead causes the 6.7MB of transit to transfer as 10.5MB of HTTP; that is, it gets significantly bigger when transferred this way. For this application and its expected timing, this directly adds 10s to the load time.

transit-clj seems to flush after every write (https://github.com/cognitect/transit-java/blob/cff7111c2081fc8415cd9bd6c6b2ba518680d660/src/main/java/com/cognitect/transit/impl/AbstractEmitter.java#L189). I would guess this is related to #43.
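
One hedged workaround sketch: wrap the response stream in a proxy that swallows flush(), so transit's per-write flushes don't force tiny chunks, and flush the real stream once at the end. resp-out and payload below are placeholders for the response OutputStream and the data being sent:

(defn suppress-flush ^java.io.OutputStream [^java.io.OutputStream os]
  (proxy [java.io.FilterOutputStream] [os]
    (flush [])                                   ; swallow the per-value flushes
    (write                                       ; delegate writes straight through
      ([b] (if (bytes? b) (.write os ^bytes b) (.write os (int b))))
      ([b off len] (.write os ^bytes b (int off) (int len))))))

(let [w (cognitect.transit/writer (suppress-flush resp-out) :json)]
  (cognitect.transit/write w payload)
  (.flush resp-out))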

Reflection Warning on REPL Start

Hi,
I've started getting a Reflection warning:
"Reflection warning, cognitect/transit.clj:142:19 - call to static method writer on com.cognitect.transit.TransitFactory can't be resolved (argument types: unknown, java.io.OutputStream, unknown)."
Reproduced in Java SE 8u152, using Clojure 1.8.0 and Clojure 1.9.0; the warning only appears when I uncomment the following function definition:

(defn connect-testdb
  []
  (def conn
   (<!!
     (client/connect
       {:db-name "test"
        :account-id client/PRO_ACCOUNT
        :access-key *****
        :secret *****
        :region "none"
        :endpoint "localhost:8998"
        :service "peer-server"}))))

Transit reader fails on empty input

Hi.

Should it return nil instead?

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream])
(-> (byte-array 0) 
    (ByteArrayInputStream.) 
    (transit/reader :json) 
    (transit/read))
;; EOFException   com.cognitect.transit.impl.JsonParser.parse (JsonParser.java:44)

Please make the spec versioned as well as the implementation releases

If you're not careful, it could become confusing to correlate which implementation versions work together.

(i.e. v??? of the Ruby implementation is compatible with which versions of the ClojureScript implementation?)

To solve this you could possibly version the spec as well as the individual releases, and publish a version table so people can know which implementation versions are compatible with which spec versions. FWIW, my 2 cents.

Writing large object to file is slow

Problem:

Writing a large object to a file with transit-clj was slower than I expected. It takes about 1 second with transit-clj and roughly 1/10th of that time with regular spit.

Test data:

test.edn.zip
Extract to test.edn.

Repro:

clj -Sdeps '{:deps {com.cognitect/transit-clj {:mvn/version "0.8.313"}}}'
(require '[clojure.edn :as edn])
(def edn (edn/read-string (slurp "test.edn")))
(count (keys edn)) ;;=> 799
(require '[cognitect.transit :as transit])
(require '[clojure.java.io :as io])
(def writer (transit/writer (io/output-stream (io/file "transit.json")) :json))
(time (transit/write writer edn)) ;; prints:
"Elapsed time: 1151.116438 msecs"

Writing the same EDN to a file by printing it with str is much faster:

(time
  (with-open [fos (java.io.FileOutputStream. "/tmp/foo.edn")]
    (let [w (io/writer fos)]
      (.write w (str edn))
      (.flush fos)))) ;; prints:
"Elapsed time: 73.316135 msecs"

Flamegraphs

Created with https://github.com/clojure-goes-fast/clj-async-profiler

Flamegraph of transit:
transit-flamegraph.svg.zip

Flamegraph of EDN to file:
edn-flamegraph.svg.zip

Make test.check a dev dependency

test.check currently leaks into projects that depend on transit-clj. As it is only used for tests, it could safely be declared as a dev/test-only dependency.
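
In the meantime, consumers can exclude it themselves; a hedged deps.edn sketch (the version is illustrative):

{:deps {com.cognitect/transit-clj {:mvn/version "0.8.313"
                                   :exclusions [org.clojure/test.check]}}}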

Memory leak in version 0.8.271

My co-worker (Tanya Romankova) and I have discovered a memory leak in version 0.8.271 of transit-clj. We were attempting to encode several million items with transit (each with a separate writer), and eventually got a GC overhead limit OOME. After examining the heap dump it became clear that the issue was transit-java's WriteHandler cache.

Every time a writer is created a new hash map of default handlers is also created by calling cognitect.transit/default-write-handlers. Since the values of the hash map that is generated are reified objects, they have identity equality semantics. This means that no two hash maps that come out of cognitect.transit/default-write-handlers are equal:

(not= (cognitect.transit/default-write-handlers) (cognitect.transit/default-write-handlers))
=> true

The result is that the WriteHandler cache in transit-java accrues a new entry every time you create a writer.

We've created a repository that demonstrates the memory leak. The README contains details about our diagnosis as well as sample output.

https://github.com/pjstadig/transit-clj-0.8.271-memory-leak
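
A hedged mitigation sketch until this is fixed: build the handler map once and reuse it for every writer, so the transit-java cache keeps seeing the same map. This assumes write-handler-map is available (it exists in transit-clj releases newer than the 0.8.271 reported here):

(def handlers (transit/write-handler-map {}))   ; add any custom handlers to the map

(defn encode [v]
  (let [out (java.io.ByteArrayOutputStream.)]
    (transit/write (transit/writer out :json {:handlers handlers}) v)
    (.toByteArray out)))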

Error when encoding map with metadata on symbol key

transit-clj errors when trying to encode a map with a symbol key with metadata on it:

(require 'cognitect.transit)
(in-ns 'cognitect.transit)

(import [java.io File ByteArrayInputStream ByteArrayOutputStream OutputStreamWriter])

(def out (ByteArrayOutputStream. 2000))
(def w (writer out :json {:transform write-meta}))

;; meta on val works
(write w {'key (with-meta 'val {:foo 'bar})})

;; but writing a map with metadata on the key errors
(write w {(with-meta 'key {:foo 'bar}) 'val})

;; Execution error at com.cognitect.transit.impl.AbstractEmitter/emitEncoded (AbstractEmitter.java:1).
;;   Cannot be used as a map key cognitect.transit.WithMeta@5f483ec3

;; writing the same with a second `{}` key works:
(write w {(with-meta 'key {:foo 'bar}) 'val {} 'val2})

I think the issue is that transit should choose the cmap encoding for a map whose key is a symbol with metadata on it.

The same works with transit-cljs:

(require '[cognitect.transit :as t])
(def w (t/writer :json {:transform t/write-meta}))
(cognitect.transit/write w {(with-meta 'key {:foo 'bar}) 'val})
;; => "["^ ",["~#with-meta",["~$key",["^ ","~:foo","~$bar"]]],"~$val"]"

Warn user about custom handler's tags potentially conflicting with default handlers

As a new user of transit, it wasn't obvious to me that the default handlers' tags share the same namespace as my custom tags.
It makes sense now, but hindsight is 20/20...

Ideally for me, a warning would be issued when trying to add a handler on a tag that already denotes another type, but the current "late binding" API doesn't seem amenable to this.

Instead, I would suggest adding a warning in the documentation of write-handler reminding the user to check for conflicts with previously declared tags (default or otherwise). Ideally pointing to the spec page on github where said default tags are easily visible.

Happy to contribute this if there's consensus!

Provide sequence abstraction over stream of Transit-encoded objects

I often want to read a sequence of Transit-encoded objects from a stream, and apply some transformations to them. The attached files create a lazy sequence from an arbitrary InputStream, and properly handle the end-of-file exception.

You're welcome to use them however you wish; I warrant that I am the author and sole holder of copyright, that there are no known patents covering this work, and hereby grant to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

transit-seq.zip

Mapping with read

How can one go about reading from a reader with an arbitrary number of values on the stream without hitting an EOFException? For example:

(def out (ByteArrayOutputStream. 4096))
(def writer (transit/writer out :json))
(run! (partial transit/write writer) some-collection) ; run! forces the writes (map is lazy)
(def in (ByteArrayInputStream. (.toByteArray out)))
(def reader (transit/reader in :json))

Now I'd like to do as many transit/read calls as are necessary to read the whole stream, but no more, preferably through a map or reduce construct, but it's not clear how to go about that. There doesn't appear to be a way to query the reader about whether there are more items to read.
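
A hedged sketch of one approach, assuming the EOF behavior discussed in the earlier issues (the EOFException may arrive wrapped in a RuntimeException): read lazily until the stream is exhausted.

(require '[cognitect.transit :as transit])

(defn transit-seq
  "Lazy sequence of the values remaining on rdr."
  [rdr]
  (lazy-seq
    (let [v (try
              (transit/read rdr)
              (catch Throwable t
                (if (or (instance? java.io.EOFException t)
                        (instance? java.io.EOFException (.getCause t)))
                  ::eof
                  (throw t))))]
      (when-not (= ::eof v)
        (cons v (transit-seq rdr))))))

(doall (transit-seq reader))   ; reader is the transit reader defined above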

Add support for transit-java-0.8.327 and its defaultWriteHandler

As this is quite important functionality I guess transit-clj should not be left behind.

This should not be hard to implement, and I'm happy to do a pull request (but I think you don't accept those from people outside the contributor process, right?).

Changes required would basically be something like:

(defn writer
  "Creates a writer over the provided destination `out` using
   the specified format, one of: :msgpack, :json or :json-verbose.
   An optional opts map may be passed. Supported options are:
   :handlers - a map of types to WriteHandler instances; they are merged
   with the default handlers and then with the default handlers provided
   by transit-java.
   :unsupported-handler - a default WriteHandler used for types that don't
   have a handler (by default an exception is thrown). This way unknown types
   can still be handled (e.g. converted to a string) and remain part of the
   payload. Use with caution: such values will likely not be unmarshalled
   back to their original type."
  ([out type] (writer out type {}))
  ([^OutputStream out type {:keys [handlers unsupported-handler]}]
   (if (#{:json :json-verbose :msgpack} type)
     (let [handler-map (if (instance? HandlerMapContainer handlers)
                         (handler-map handlers)
                         (merge default-write-handlers handlers))]
       (Writer. (TransitFactory/writer (transit-format type) out handler-map unsupported-handler)))
     (throw (ex-info "Type must be :json, :json-verbose or :msgpack" {:type type})))))

As outlined in the transit-java readme, a default write handler can look and be used like this:

WriteHandler customDefaultWriteHandler = new WriteHandler() {
    @Override
    public String tag(Object o) { return "unknown"; }
    @Override
    public Object rep(Object o) { return o.toString(); }
    @Override
    public String stringRep(Object o) { return o.toString(); }
    @Override
    public WriteHandler getVerboseHandler() { return this; }
};
OutputStream out = new ByteArrayOutputStream();
Writer w = TransitFactory.writer(TransitFactory.Format.JSON, out, customDefaultWriteHandler);
w.write(new Point(37,42));
System.out.print(out.toString());
// => "[\"~#unknown\",\"Point at 37, 42\"]"

IMHO this is a super useful feature and a vital use case; without it transit cannot be used in an enterprise environment, where one simply cannot fully control what data types are used by the other (upstream/downstream) applications. It has to be handled more gracefully than a runtime error / 500.
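
For completeness, hypothetical usage of the proposed option from the sketch above (not available in a released transit-clj at the time of writing), assuming cognitect.transit with that change is aliased as transit; the "unknown" tag mirrors the Java example:

(def unknown-handler
  (transit/write-handler (constantly "unknown") str))

(def out (java.io.ByteArrayOutputStream.))
(def w (transit/writer out :json {:unsupported-handler unknown-handler}))
(transit/write w (Object.))   ; any type without a registered handler
(.toString out "UTF-8")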

map-as-array Marker not escaped in JSON

Another bug I found with test.check. If I write the value ["^ "] it is read back as {}. The problem here is that the string "^ " is the special map-as-array marker which is not escaped but should be.

Note that the value [(str "^" " ")] is escaped correctly and emitted as ["~^ "] in JSON but still read back as {}.

The reason why the string literal "^ " (or any other interned string equal to "^ ") is not escaped can be found in AbstractEmitter line 79. There the string to escape is compared to Constants.MAP_AS_ARRAY with the == identity operator. The reasoning behind using == was presumably to avoid escaping Constants.MAP_AS_ARRAY itself, which is nearly what JsonEmitter line 156 does; I say "nearly" because there the literal "^ " is used instead of Constants.MAP_AS_ARRAY. Because of string interning this strategy doesn't work either, so I think a special way to output "^ " is needed.

The reason why a correctly escaped JSON value of ["~^ "] is not read back as ["^ "] can be found in JsonParser line 125 where the unescaped string "^ " is compared to Constants.MAP_AS_ARRAY.

Using a custom object, like a Java enum, for Constants.MAP_AS_ARRAY might be one overall solution to both problems.
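
For completeness, the asymmetry is easy to see in a REPL (same setup as the other issues; the final value is the behavior described above):

(def out (ByteArrayOutputStream. 4096))
(def writer (transit/writer out :json))
(transit/write writer ["^ "])
(def in (ByteArrayInputStream. (.toByteArray out)))
(transit/read (transit/reader in :json))
;; => {}   ; read back as an empty map instead of ["^ "]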

Handle Umlauts correctly

Currently transit-clj cannot handle umlauts correctly.
This is the shortest example I can come up with:

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])

(def out (ByteArrayOutputStream. 4096))
(def writer (transit/writer out :json))
(transit/write writer "ü")
(.toString out)
=> "[\"~#'\",\"ü\"]"

Add support for type ArrayMap

I'm not sure if this is a CLJ or CLJS issue.

When I send a larger array-map as transit, it loses order when being converted back in cljs. It does look like the data continues to maintain order through the conversion to transit.

My array-map is 12 entries long (i.e. larger than the 8-entry threshold below which Clojure map literals are already array-maps). When my data had fewer than 8 entries, order was maintained.

NegativeArraySizeException when decoding some msgpack strings

It seems that if I've encoded a string using msgpack with a length greater than or equal to 128 bytes and stored the header + length as the bytes D9 80, transit-clj gives a NegativeArraySizeException (stack trace at bottom). If the header + length is instead provided in the two-byte-length format DA 00 80, the message is decoded as expected.

To reproduce, I generated a file with the following code:

(require '[clojure.java.io :as io])
(require '[cognitect.transit :as transit])

(with-open [f (io/output-stream "test_good.msgpack")]
   (transit/write (transit/writer f :msgpack) (clojure.string/join "" (repeat 128 "X"))))

I then copied test_good.msgpack to test_bad.msgpack, opened it with a hex editor, and changed the bytes DA 00 80 to D9 80. I can then see that, while both files decode the same when using msgpack, trying to read the second file fails:

(require '[clojure.java.io :as io])
(require '[cognitect.transit :as transit])

(defn read-transit [f] 
   (with-open [s (io/input-stream f)] 
      (transit/read (transit/reader s :msgpack))))

(read-transit "test_good.msgpack") ; => okay, string of 128 Xs
(read-transit "test_bad.msgpack") ; => exception

Examples of said files attached: test files.zip

Verified that the two files decode to the same msgpack value with the following (using clojure-msgpack 1.2.0):

(require '[clojure.java.io :as io])
(require '[msgpack.core :as msg])

(defn read-msg [fname]
   (with-open [f (io/input-stream fname)] (msg/unpack f)))

(= (read-msg "test_good.msgpack") (read-msg "test_bad.msgpack")) ; => true

Stack trace when trying to decode:

#error {
 :cause nil
 :via
 [{:type java.lang.RuntimeException
   :message "java.lang.NegativeArraySizeException"
   :at [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 119]}
  {:type java.lang.NegativeArraySizeException
   :message nil
   :at [org.msgpack.unpacker.MessagePackUnpacker readRawBody "MessagePackUnpacker.java" 362]}]
 :trace
 [[org.msgpack.unpacker.MessagePackUnpacker readRawBody "MessagePackUnpacker.java" 362]
  [org.msgpack.unpacker.MessagePackUnpacker readOneWithoutStackLarge "MessagePackUnpacker.java" 228]
  [org.msgpack.unpacker.MessagePackUnpacker readOneWithoutStack "MessagePackUnpacker.java" 139]
  [org.msgpack.unpacker.MessagePackUnpacker readValue "MessagePackUnpacker.java" 566]
  [org.msgpack.unpacker.AbstractUnpacker readValue "AbstractUnpacker.java" 65]
  [com.cognitect.transit.impl.MsgpackParser parseVal "MsgpackParser.java" 55]
  [com.cognitect.transit.impl.MsgpackParser parseArray "MsgpackParser.java" 135]
  [com.cognitect.transit.impl.MsgpackParser parseVal "MsgpackParser.java" 53]
  [com.cognitect.transit.impl.MsgpackParser parse "MsgpackParser.java" 44]
  [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 117]
  [cognitect.transit$read invokeStatic "transit.clj" 296]
  [cognitect.transit$read invoke "transit.clj" 293]
