smee / binary
Clojure API for binary format I/O using java's stream apis
I'm not sure if this is intended behavior or not:
(let [str-seq (repeated (string "UTF-8" :separator 0))
in (java.io.ByteArrayInputStream. (.getBytes "abc\u0000def\u0000ghi" "UTF-8"))]
[(decode str-seq in) (.read in)])
; [["abc" "def"] -1]
Should the parser be consuming the trailing bytes ("ghi") in this case? If so, is there a way for my code to access those bytes?
Hi! So I recently (finally) (sort of) finished my BACnet implementation. I ended up with a bunch of generic utility functions. I'd much prefer to contribute to an existing project, rather than spin off a new one, so I thought I'd ask if you'd want to merge any or all of the following as a separate util namespace?
https://gist.github.com/WhittlesJr/dd94e7e4d9e21460b4dd9cd31b9fcaa1
The "util.core" namespace has more generic functions. I'm thinking of making a separate library for my map-matching functions, but I included them in the gist so you could see what they are.
I included the npdu example so you can get a sense for my use case, but it's just a small part of the BACnet protocol.
Is there currently a way to use "conditional" fields? My use case is for the BACnet protocol, which is somewhat complex. Many fields are included in the spec that only show up if a previous field matches a certain value (or some other more complicated condition is met). Or sometimes, based on an earlier condition, the parsing rules for further segments will change...
I'm not sure how to do that with this library... maybe I'm missing something? I'm investigating header further to see if it can do everything I need, and if I find out I'll close this issue.
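A minimal sketch of how header might cover the simplest conditional case, where one leading field decides how the rest is parsed. It assumes the argument order described in the header docstring quoted in the patch further down this page (header codec, then a function from header to body codec, then a function from body to header). The tagged-value name and the tag values 1 and 2 are hypothetical and have nothing to do with BACnet itself.

```clojure
(require '[org.clojars.smee.binary.core :as b])

;; Hypothetical format: one tag byte, then a body whose layout depends on the tag.
;;   tag 1 -> big-endian 32-bit integer
;;   tag 2 -> string with a one-byte length prefix
(def tagged-value
  (b/header
    :byte
    ;; reading: choose the body codec from the decoded tag
    (fn tag->body-codec [tag]
      (if (= 1 tag)
        :int-be
        (b/string "UTF8" :prefix :byte)))
    ;; writing: derive the tag from the value that is about to be written
    (fn body->tag [body]
      (if (integer? body) 1 2))))

(defn round-trip
  "Encodes value with codec into a byte array and decodes it again."
  [codec value]
  (let [out (java.io.ByteArrayOutputStream.)]
    (b/encode codec out value)
    (b/decode codec (java.io.ByteArrayInputStream. (.toByteArray out)))))

(round-trip tagged-value 42)
(round-trip tagged-value "abc")
```

Conditions that depend on several earlier fields would need the header codec itself to be a composite, so this only illustrates the one-field case.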
Hi,
I'm extending your Bitcoin protocol example (demo/bitcoin.clj) to
handle Bitcoin messages that are sent over the wire. The format is:
The problem I have run into is the checksum field between the length
and the payload. I tried using something like:
(def payload (binary/blob :prefix length-and-checksum))
and having length-and-checksum reify BinaryIO so that it ignores the
checksum when reading and just returns the length. However, for
writing, I don't have access to the payload from here so I can't
compute the checksum. Also, I'd prefer to compute the checksum outside
of the codec.
Do you know of any way of doing this? Sorry if I missed something
obvious and thank you for creating smee/binary.
Regards,
@harrigan
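For the part about computing the checksum outside of the codec, here is a small sketch using only the JDK. It assumes the usual Bitcoin convention that the checksum is the first four bytes of a double SHA-256 over the payload. The function names sha256 and payload-checksum are made up for this illustration and are not part of smee/binary.

```clojure
(import 'java.security.MessageDigest)

(defn sha256
  "SHA-256 digest of a byte array."
  ^bytes [^bytes data]
  (.digest (MessageDigest/getInstance "SHA-256") data))

(defn payload-checksum
  "First four bytes of SHA-256(SHA-256(payload)), returned as a byte array."
  ^bytes [^bytes payload]
  (java.util.Arrays/copyOf (sha256 (sha256 payload)) (int 4)))

;; Checksum of an empty payload, shown as signed bytes (hex 5d f6 e0 e2)
(vec (payload-checksum (byte-array 0)))
;; => [93 -10 -32 -30]
```

The resulting four bytes could then be passed into the value being encoded, so the codec only has to write them and never needs to see the payload itself.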
Thanks for the awesome library!
I need to serialize no value at all in some cases. For example:
(b/header :int-be
          (fn header->body-codec [length]
            (if (= -1 length)
              codec/null
              (b/blob :length length)))
          "not used")
So I added a null codec, but it's ugly:
;; hack
(def null
  (b/compile-codec
    (byte-array 0)
    (constantly (byte-array 0))
    (constantly nil)))
Is it possible to make the BinaryIO protocol public, or to add a nil primitive codec?
Some codecs use null-terminated strings whose length isn't known in advance, which is very awkward to parse at the moment. An optional :suffix or :terminator argument to string and/or repeated would be very useful.
Would you have any objection to changing constant and enum to throw exceptions rather than assertion errors?
(Related to Issue #3)
I'm having trouble encoding a fixed-length string sequence while omitting the final separator:
(defn fixed-string-seq [size]
(padding (repeated (string "UTF-8" :separator 0)) size))
(let [out (java.io.ByteArrayOutputStream.)]
(encode (fixed-string-seq 11) out ["abc" "def" "ghi"]))
; IllegalArgumentException Data should be max. 11 bytes, but attempting to write 0 bytes more! org.clojars.smee.binary.core/padding/reify--1466 (core.clj:302)
The content should be exactly 11 bytes without the trailing null, but it seems the encoder doesn't like this.
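The padding docstring quoted in the patch further down this page describes a :truncate? option for exactly this case. The sketch below simply restates that docstring example as a complete snippet. It assumes a library version in which padding takes :length and :truncate? as keyword options instead of a positional size.

```clojure
(require '[org.clojars.smee.binary.core :as b])

;; Keyword-option form of the fixed-length string sequence from above.
(def fixed-string-seq
  (b/padding (b/repeated (b/string "UTF8" :separator 0))
             :length 11
             :truncate? true))

(let [out (java.io.ByteArrayOutputStream.)]
  (b/encode fixed-string-seq out ["abc" "def" "ghi"])
  (vec (.toByteArray out)))
;; according to the docstring, this writes
;; [97 98 99 0 100 101 102 0 103 104 105]
;; i.e. the last separator byte is truncated
```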
---
src/org/clojars/smee/binary/core.clj | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/src/org/clojars/smee/binary/core.clj b/src/org/clojars/smee/binary/core.clj
index e85f6f7..2223ffe 100644
--- a/src/org/clojars/smee/binary/core.clj
+++ b/src/org/clojars/smee/binary/core.clj
@@ -52,7 +52,7 @@
:short (primitive-codec .readShort .writeShort short :be)
:short-le (primitive-codec .readShort .writeShort short :le)
:short-be (primitive-codec .readShort .writeShort short :be)
-
+
:ushort (primitive-codec .readUnsignedShort .writeUnsignedShort int :be)
:ushort-le (primitive-codec .readUnsignedShort .writeUnsignedShort int :le)
:ushort-be (primitive-codec .readUnsignedShort .writeUnsignedShort int :be)
@@ -315,7 +315,7 @@ Flag names `null` are ignored. Bit count will be padded up to the next multiple
(fn [bytes] (set (map idx->flags (filter #(bit-set? bytes %) bit-indices)))))))
(defn header
- "Decodes a header using `header-codec`. Passes this datastructure to `header->body` which returns the codec to
+ "Decodes a header using `header-codec`. Passes this datastructure to `header->body-codec` which returns the codec to
use to parse the body. For writing this codec calls `body->header` with the data as parameter and
expects a value to use for writing the header information.
If the optional flag `:keep-header` is set, read will return a vector of `[header body]`
@@ -327,11 +327,11 @@ else only the `body` will be returned."
(let [header (read-data header-codec big-in little-in)
body-codec (header->body-codec header)
body (read-data body-codec big-in little-in)]
- (if keep-header?
- {:header header
+ (if keep-header?
+ {:header header
:body body}
body)))
- (write-data [_ big-out little-out value]
+ (write-data [_ big-out little-out value]
(let [body (if keep-header? (:body value) value)
header (if keep-header? (:header value) (body->header body))
body-codec (header->body-codec header)]
@@ -354,9 +354,9 @@ Example:
(encode (padding (repeated (string \"UTF8\" :separator 0)) :length 11 :truncate? true) outstream [\"abc\" \"def\" \"ghi\"])
=> ; writes bytes [97 98 99 0 100 101 102 0 103 104 105]
; observe: the last separator byte was truncated!"
- [inner-codec & {:keys [length
+ [inner-codec & {:keys [length
padding-byte
- truncate?]
+ truncate?]
:or {padding-byte 0
truncate? false}
:as opts}]
@@ -426,12 +426,12 @@ Example:
Object (toString [_] (str "<BinaryIO aligned, options=" opts ">"))))
-(defn union
+(defn union
"Union is a C-style union. A fixed number of bytes may represent different values depending on the
interpretation of the bytes. The value returned by `read-data` is a map of all valid interpretations according to
the specified unioned codecs.
Parameter is the number of bytes needed for the longest codec in this union and a map of value names to codecs.
-This codec will read the specified number of bytes from the input streams and then successively try to read
+This codec will read the specified number of bytes from the input streams and then successively try to read
from this byte array using each individual codec.
Example: Four bytes may represent an integer, two shorts, four bytes, a list of bytes with prefix or a string.
@@ -442,7 +442,7 @@ Example: Four bytes may represent an integer, two shorts, four bytes, a list of
:prefixed (repeated :byte :prefix :byte)
:str (string \"UTF8\" :prefix :byte)})"
[bytes-length codecs-map]
- (padding
+ (padding
(reify BinaryIO
(read-data [_ big-in _]
(let [arr (byte-array bytes-length)
@@ -480,14 +480,14 @@ Example: Four bytes may represent an integer, two shorts, four bytes, a list of
"An enumerated value. `m` must be a 1-to-1 mapping of names (e.g. keywords) to their decoded values.
Only names and values in `m` will be accepted when encoding or decoding."
(let [pre-encode (strict-map m lenient?)
- post-decode (strict-map (map-invert m) lenient?)]
+ post-decode (strict-map (map-invert m) lenient?)]
(compile-codec codec pre-encode post-decode)))
-#_(defn at-offsets
+#_(defn at-offsets
"Read from a stream at specific offsets. Problems are we are skipping data inbetween and we miss data earlier in the stream."
[offset-name-codecs]
{:pre [(every? #(= 3 (count %)) offset-name-codecs)]}
- (let [m (reduce (fn [m [offset name codec]] (assoc m offset [name codec])) (sorted-map) offset-name-codecs)]
+ (let [m (reduce (fn [m [offset name codec]] (assoc m offset [name codec])) (sorted-map) offset-name-codecs)]
(reify BinaryIO
(read-data [this big-in little-in]
(loop [pos (.size big-in), pairs (seq m), res {}]
@@ -495,7 +495,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
res
(let [[seek-pos [name codec]] (first pairs)
_ (.skipBytes big-in (- seek-pos pos))
- obj (read-data codec big-in little-in)]
+ obj (read-data codec big-in little-in)]
(recur (.size big-in) (next pairs) (assoc res name obj))))))
(write-data [this big-out little-out values]
(throw :not-implemented)))))
@@ -513,7 +513,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
bytes))
(write-data [this out _ _]
(.write ^OutputStream out (.getBytes ^String this)))
-
+
java.lang.String
(read-data [this big-in _]
(let [^bytes bytes (read-bytes big-in (count this))
@@ -522,7 +522,7 @@ Only names and values in `m` will be accepted when encoding or decoding."
res))
(write-data [this out _ _]
(.write ^OutputStream out (.getBytes ^String this)))
-
+
clojure.lang.ISeq
(read-data [this big-in little-in]
(map #(read-data % big-in little-in) this))
--
2.1.4
This library worked great for me, but it needs docs. ;) Some use examples, e.g. reading from a byte array, would be nice.
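As a starting point for such an example, here is a minimal sketch of reading from a byte array. It only uses calls that appear elsewhere on this page (decode, and repeated with a :byte prefix). The byte-list name and the sample bytes are made up for illustration.

```clojure
(require '[org.clojars.smee.binary.core :as b])

;; A list of bytes preceded by a one-byte element count.
(def byte-list (b/repeated :byte :prefix :byte))

(let [raw (byte-array [3 10 20 30])             ; count = 3, then three payload bytes
      in  (java.io.ByteArrayInputStream. raw)]  ; decode expects an InputStream
  (b/decode byte-list in))
```

Wrapping the array in a java.io.ByteArrayInputStream is the only extra step, since decode works on streams and not on arrays directly.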
Would it be possible for repeated to output a native array when given a primitive codec? This would be especially useful for codecs that include binary blobs, since wrapping each byte in a java.lang.Byte is quite wasteful.
A separate codec like repeated-prim or even just bytes would also work.