smee / binary

Clojure API for binary format I/O using Java's stream APIs

Java 19.56% Clojure 80.44%

binary clojure codec stream

binary's Issues

Trailing bytes with repeated :separator-using strings

I'm not sure if this is intended behavior or not:

(let [str-seq (repeated (string "UTF-8" :separator 0))
      in (java.io.ByteArrayInputStream. (.getBytes "abc\u0000def\u0000ghi" "UTF-8"))]
  [(decode str-seq in) (.read in)])
; [["abc" "def"] -1]

Should the parser be consuming the trailing bytes ("ghi") in this case? If so, is there a way for my code to access those bytes?
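
To make the question concrete, here is a minimal sketch in plain Clojure and java.io of a separator-based reader that keeps whatever follows the last separator instead of dropping it. It is not smee/binary's implementation; `read-separated` and its return shape are hypothetical names chosen for illustration.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream InputStream))

;; Hypothetical helper, not part of smee/binary.
(defn read-separated
  "Reads `in` to exhaustion, splitting on the byte `sep`.
  Returns {:items [...] :rest s}. :rest holds the bytes that followed the
  last separator, decoded with `charset`, or nil if there were none."
  [^InputStream in sep charset]
  (loop [items []
         buf   (ByteArrayOutputStream.)]
    (let [b (.read in)]
      (cond
        ;; End of stream: report the unterminated tail instead of dropping it.
        (neg? b)   {:items items
                    :rest  (when (pos? (.size buf)) (.toString buf charset))}
        ;; Separator: finish the current item and start a fresh buffer.
        (== b sep) (recur (conj items (.toString buf charset))
                          (ByteArrayOutputStream.))
        ;; Ordinary byte: accumulate it.
        :else      (do (.write buf b)
                       (recur items buf))))))
```

With the input from the snippet above, this sketch yields `"abc"` and `"def"` as items and `"ghi"` under `:rest`, so the caller can decide what to do with the tail.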

Binary utilities

Hi! So I recently (finally) (sort of) finished my BACnet implementation. I ended up with a bunch of generic utility functions. I'd much prefer to contribute to an existing project, rather than spin off a new one, so I thought I'd ask if you'd want to merge any or all of the following as a separate util namespace?

https://gist.github.com/WhittlesJr/dd94e7e4d9e21460b4dd9cd31b9fcaa1

The "util.core" namespace has more generic functions. I'm thinking of making a separate library for my map-matching functions, but I included them in the gist so you could see what they are.

I included the npdu example so you can get a sense for my use case, but it's just a small part of the BACnet protocol.

Conditionals and complex bit handling

Is there currently a way to use "conditional" fields? My use case is for the BACnet protocol, which is somewhat complex. Many fields are included in the spec that only show up if a previous field matches a certain value (or some other more complicated condition is met). Or sometimes, based on an earlier condition, the parsing rules for further segments will change...

I'm not sure how to do that with this library... maybe I'm missing something? I'm investigating header further to see if it can do everything I need, and if I find out I'll close this issue.
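
As an illustration of the pattern being asked about, here is a minimal sketch in plain Clojure and java.io in which an earlier field decides whether a later field is present. The frame layout is invented for the example and is not BACnet; `read-frame` and `frame-from-bytes` are hypothetical names. The `header` docstring quoted in the patch further down this page describes the same idea in library terms: a function receives the decoded header and returns the codec used for the body.

```clojure
(import '(java.io ByteArrayInputStream DataInputStream))

;; Hypothetical frame layout, for illustration only:
;;   byte 0 : flags; if bit 0 is set, a 2-byte big-endian priority follows
;;   then   : 1 length byte, followed by that many payload bytes
(defn read-frame [^DataInputStream in]
  (let [flags    (.readUnsignedByte in)
        ;; Conditional field: only read when the flag bit says it is present.
        priority (when (bit-test flags 0) (.readUnsignedShort in))
        n        (.readUnsignedByte in)
        payload  (byte-array n)]
    (.readFully in payload)
    (cond-> {:flags flags :payload (vec payload)}
      priority (assoc :priority priority))))

(defn frame-from-bytes
  "Convenience wrapper: parses a frame from a seq of byte values (-128..127)."
  [bs]
  (read-frame (DataInputStream. (ByteArrayInputStream. (byte-array bs)))))
```

Calling `(frame-from-bytes [1 1 2 2 7 8])` reads the optional priority field, while `(frame-from-bytes [0 1 9])` skips it.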

Q: header example

Hi,

I'm extending your Bitcoin protocol example (demo/bitcoin.clj) to
handle Bitcoin messages that are sent over the wire. The format is:

magic (4 bytes)
command (12 bytes)
length (4 bytes)
checksum (4 bytes)
payload (variable length)

The problem I have run into is the checksum field between the length
and the payload. I tried using something like:

(def payload (binary/blob :prefix length-and-checksum))

and having length-and-checksum reify BinaryIO so that it ignores the
checksum when reading and just returns the length. However, for
writing, I don't have access to the payload from here so I can't
compute the checksum. Also, I'd prefer to compute the checksum outside
of the codec.

Do you know of any way of doing this? Sorry if I missed something
obvious and thank you for creating smee/binary.

Regards,
@harrigan
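
One way to keep the checksum outside the codec is to compute it as a separate step and assemble the envelope around an already-encoded payload. The sketch below does this with plain JDK interop rather than smee/binary. It assumes the usual Bitcoin conventions: the checksum is the first four bytes of a double SHA-256 of the payload, the command is NUL-padded ASCII, and the integers are little-endian. The names `sha256`, `checksum` and `frame` are hypothetical.

```clojure
(import '(java.nio ByteBuffer ByteOrder)
        '(java.security MessageDigest)
        '(java.util Arrays))

(defn sha256 ^bytes [^bytes bs]
  (.digest (MessageDigest/getInstance "SHA-256") bs))

(defn checksum
  "First four bytes of SHA-256(SHA-256(payload))."
  ^bytes [^bytes payload]
  (Arrays/copyOf (sha256 (sha256 payload)) 4))

(defn frame
  "Builds magic (4) | command (12, NUL-padded) | length (4) | checksum (4) | payload."
  ^bytes [magic ^String command ^bytes payload]
  (let [;; copyOf zero-pads the command name up to 12 bytes.
        cmd (Arrays/copyOf (.getBytes command "US-ASCII") 12)
        buf (doto (ByteBuffer/allocate (+ 24 (alength payload)))
              (.order ByteOrder/LITTLE_ENDIAN)
              (.putInt (unchecked-int magic))
              (.put cmd)
              (.putInt (alength payload))
              (.put (checksum payload))
              (.put payload))]
    (.array buf)))
```

For example, `(frame 0xD9B4BEF9 "verack" (byte-array 0))` produces a 24-byte message with an empty payload. When reading, the same `checksum` function can be applied to the decoded payload and compared against the checksum field.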

Public BinaryIO protocol or codec for `nil`

Thanks for the awesome library!

In some cases I need to serialize the absence of a value. For example:

(b/header :int-be
          (fn header->body-codec [length]
            (if (= -1 length)
              codec/null
              (b/blob :length length)))
          "not used")

So I added a null codec, but it's ugly:

;; hack
(def null
  (b/compile-codec
   (byte-array 0)
   (constantly (byte-array 0))
   (constantly nil)))

Would it be possible to make the BinaryIO protocol public, or to add a primitive nil codec?

Terminated strings

Some codecs use null-terminated strings whose length isn't known in advance, which is very awkward to parse at the moment. An optional :suffix or :terminator argument to string and/or repeated would be very useful.

Use exceptions rather than assertions for parse errors

Would you have any objection to changing constant and enum to throw exceptions rather than assertion errors?
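
The motivation, as a sketch: `assert` throws `java.lang.AssertionError`, which is an `Error` rather than an `Exception`, and assertions are compiled out when `*assert*` is false. A check that throws `ex-info` stays active and carries data the caller can inspect. The helper below is hypothetical and is not the library's code.

```clojure
;; Hypothetical strict lookup, for illustration only.
(defn strict-lookup
  "Returns the value mapped to `k` in `m`, or throws an ExceptionInfo that
  records the offending value and the accepted keys."
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Unknown enum value: " (pr-str k))
                    {:value k :allowed (set (keys m))}))))
```

A caller can then handle parse failures with an ordinary `(catch clojure.lang.ExceptionInfo e (ex-data e))`.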

Can't omit final separator when encoding

(Related to Issue #3)

I'm having trouble encoding a fixed-length string sequence while omitting the final separator:

(defn fixed-string-seq [size]
  (padding (repeated (string "UTF-8" :separator 0)) size))

(let [out (java.io.ByteArrayOutputStream.)]
  (encode (fixed-string-seq 11) out ["abc" "def" "ghi"]))
; IllegalArgumentException Data should be max. 11 bytes, but attempting to write 0 bytes more!  org.clojars.smee.binary.core/padding/reify--1466 (core.clj:302)

The content should be exactly 11 bytes without the trailing null, but it seems the encoder doesn't like this.
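
For reference, the byte layout I'm after can be sketched in plain Clojure without the library: put separators only between items, then zero-pad up to the fixed size. `encode-fixed` is a hypothetical helper, not part of smee/binary. The `padding` docstring quoted in the patch further down this page shows a `:truncate? true` option that produces the same eleven bytes.

```clojure
(import '(java.io ByteArrayOutputStream)
        '(java.util Arrays))

;; Hypothetical helper, not part of smee/binary.
(defn encode-fixed
  "Writes `strs` as UTF-8 with the byte `sep` between (not after) items and
  zero-pads the result to exactly `size` bytes. Throws if the data is longer."
  [strs sep size]
  (let [out (ByteArrayOutputStream.)]
    (doseq [chunk (interpose ::sep strs)]
      (if (= ::sep chunk)
        (.write out (int sep))
        (let [bs (.getBytes ^String chunk "UTF-8")]
          (.write out bs 0 (alength bs)))))
    (when (> (.size out) size)
      (throw (IllegalArgumentException.
              (str "Data is " (.size out) " bytes, but only " size " fit"))))
    ;; copyOf pads with zero bytes when the data is shorter than `size`.
    (Arrays/copyOf (.toByteArray out) (int size))))
```

Note that when the data is shorter than `size`, the zero padding is indistinguishable from separators, so a decoder would see extra empty strings.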

[PATCH] [fix] typo fixed

---
 src/org/clojars/smee/binary/core.clj | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/org/clojars/smee/binary/core.clj b/src/org/clojars/smee/binary/core.clj
index e85f6f7..2223ffe 100644
--- a/src/org/clojars/smee/binary/core.clj
+++ b/src/org/clojars/smee/binary/core.clj
@@ -52,7 +52,7 @@
    :short    (primitive-codec .readShort .writeShort short :be)
    :short-le (primitive-codec .readShort .writeShort short :le)
    :short-be (primitive-codec .readShort .writeShort short :be)
-   
+
    :ushort    (primitive-codec .readUnsignedShort .writeUnsignedShort int :be)
    :ushort-le (primitive-codec .readUnsignedShort .writeUnsignedShort int :le)
    :ushort-be (primitive-codec .readUnsignedShort .writeUnsignedShort int :be)
@@ -315,7 +315,7 @@ Flag names `null` are ignored. Bit count will be padded up to the next multiple
                    (fn [bytes] (set (map idx->flags (filter #(bit-set? bytes %) bit-indices)))))))

 (defn header
-  "Decodes a header using `header-codec`. Passes this datastructure to `header->body` which returns the codec to
+  "Decodes a header using `header-codec`. Passes this datastructure to `header->body-codec` which returns the codec to
 use to parse the body. For writing this codec calls `body->header` with the data as parameter and
 expects a value to use for writing the header information.
 If the optional flag `:keep-header` is set, read will return a vector of `[header body]`
@@ -327,11 +327,11 @@ else only the `body` will be returned."
         (let [header (read-data header-codec big-in little-in)
               body-codec (header->body-codec header)
               body (read-data body-codec big-in little-in)]
-          (if keep-header? 
-            {:header header 
+          (if keep-header?
+            {:header header
              :body body}
             body)))
-      (write-data [_ big-out little-out value] 
+      (write-data [_ big-out little-out value]
         (let [body (if keep-header? (:body value) value)
               header (if keep-header? (:header value) (body->header body))
               body-codec (header->body-codec header)]
@@ -354,9 +354,9 @@ Example:
     (encode (padding (repeated (string \"UTF8\" :separator 0)) :length 11 :truncate? true) outstream [\"abc\" \"def\" \"ghi\"])
     => ; writes bytes [97 98 99 0 100 101 102 0 103 104 105]
        ; observe: the last separator byte was truncated!"
-  [inner-codec & {:keys [length 
+  [inner-codec & {:keys [length
                          padding-byte
-                         truncate?] 
+                         truncate?]
                   :or {padding-byte 0
                        truncate? false}
                   :as opts}]
@@ -426,12 +426,12 @@ Example:
     Object (toString [_] (str "<BinaryIO aligned, options=" opts ">"))))


-(defn union 
+(defn union
   "Union is a C-style union. A fixed number of bytes may represent different values depending on the
 interpretation of the bytes. The value returned by `read-data` is a map of all valid interpretations according to
 the specified unioned codecs.
 Parameter is the number of bytes needed for the longest codec in this union and a map of value names to codecs.
-This codec will read the specified number of bytes from the input streams and then successively try to read 
+This codec will read the specified number of bytes from the input streams and then successively try to read
 from this byte array using each individual codec.

 Example: Four bytes may represent an integer, two shorts, four bytes, a list of bytes with prefix or a string.
@@ -442,7 +442,7 @@ Example: Four bytes may represent an integer, two shorts, four bytes, a list of
               :prefixed (repeated :byte :prefix :byte)
               :str (string \"UTF8\" :prefix :byte)})"
   [bytes-length codecs-map]
-  (padding 
+  (padding
     (reify BinaryIO
       (read-data  [_ big-in _]
         (let [arr (byte-array bytes-length)
@@ -480,14 +480,14 @@ Example: Four bytes may represent an integer, two shorts, four bytes, a list of
   "An enumerated value. `m` must be a 1-to-1 mapping of names (e.g. keywords) to their decoded values.
 Only names and values in `m` will be accepted when encoding or decoding."
   (let [pre-encode (strict-map m lenient?)
-        post-decode (strict-map (map-invert m) lenient?)] 
+        post-decode (strict-map (map-invert m) lenient?)]
     (compile-codec codec pre-encode post-decode)))

-#_(defn at-offsets 
+#_(defn at-offsets
   "Read from a stream at specific offsets. Problems are we are skipping data inbetween and we miss data earlier in the stream."
   [offset-name-codecs]
   {:pre [(every? #(= 3 (count %)) offset-name-codecs)]}
-  (let [m (reduce (fn [m [offset name codec]] (assoc m offset [name codec])) (sorted-map) offset-name-codecs)] 
+  (let [m (reduce (fn [m [offset name codec]] (assoc m offset [name codec])) (sorted-map) offset-name-codecs)]
     (reify BinaryIO
       (read-data [this big-in little-in]
         (loop [pos (.size big-in), pairs (seq m), res {}]
@@ -495,7 +495,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
             res
             (let [[seek-pos [name codec]] (first pairs)
                   _ (.skipBytes big-in (- seek-pos pos))
-                  obj (read-data codec big-in little-in)]              
+                  obj (read-data codec big-in little-in)]
               (recur (.size big-in) (next pairs) (assoc res name obj))))))
       (write-data [this big-out little-out values]
         (throw :not-implemented)))))
@@ -513,7 +513,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
       bytes))
   (write-data [this out _ _]
     (.write ^OutputStream out (.getBytes ^String this)))
-  
+
   java.lang.String
   (read-data [this big-in _]
     (let [^bytes bytes (read-bytes big-in (count this))
@@ -522,7 +522,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
       res))
   (write-data [this out _ _]
     (.write ^OutputStream out (.getBytes ^String this)))
-  
+
   clojure.lang.ISeq
   (read-data [this big-in little-in]
     (map #(read-data % big-in little-in) this))
-- 
2.1.4

needs docs

This library worked great for me, but it needs docs. ;) Some usage examples, e.g. reading from a byte array, would be nice.

Native arrays for repeated primitives

Would it be possible for repeated to output a native array when given a primitive codec? This would be especially useful for codecs that include binary blobs, since wrapping each byte in a java.lang.Byte is quite wasteful.

A separate codec like repeated-prim or even just bytes would also work.
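
To show the difference, here is a minimal sketch using plain java.io rather than the library: it reads a fixed number of bytes straight into a primitive `byte[]`, so no per-element `java.lang.Byte` objects are created. `read-blob` is a hypothetical name.

```clojure
(import '(java.io ByteArrayInputStream DataInputStream))

;; Hypothetical helper, not part of smee/binary.
(defn read-blob
  "Reads exactly `n` bytes into a primitive byte[]. Throws EOFException if
  the stream ends before `n` bytes are available."
  ^bytes [^DataInputStream in n]
  (let [buf (byte-array n)]
    (.readFully in buf)
    buf))
```

The stream is left positioned directly after the blob, so subsequent fields can be read as usual.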
