binaryparse's Introduction

binaryparse

This module implements a macro to create binary parsers. The generated parsers read from a Stream and return a tuple with each named field. The general format the macro takes is:

[type]<size>: <name>[options]

Optional parts are in [] brackets and required parts are in <> brackets. Each part has a separate meaning, as described in the table below:

Name | Description
type | The type of value found in this field. If no type is specified, the field is parsed as an integer. Supported types are u for unsigned integers, f for floating point, s for strings, and * for a custom parser.
size | The size, in bits, of the field to read. For unsigned and signed integers, sizes from 1 to 64 inclusive are supported. For floats, only 32 and 64 are supported. For strings, this field specifies the number of characters to read; a string without a size is read up to the first NULL byte (this only applies to strings). For the custom parser type, this field holds the name of the custom parser procedure.
name | The name of the value, used as the field name in the resulting tuple. If the value doesn't need to be stored, use _ as the name and it will not get a field in the result.
options | These change the regular behaviour of reading into a field. Since they differ so much in what they do, they are described below instead of in this table.

Many binary formats include special "magic" sequences to identify the file or regions within it. The option = <value> checks that a field has a certain value; if it doesn't match, a MagicError is raised. The value must match the type of the field it checks. When the field is a string type, exactly the length of the magic string is read. To include a terminating NULL byte, use \0 in the string literal.
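As a rough illustration of what a string magic check does, here is a plain std/streams sketch. The names expectMagic and MagicError are hypothetical stand-ins, not binaryparse's own code. The sketch reads exactly as many bytes as the magic literal holds, so an explicit \0 counts as one byte.

```nim
import std/streams

type
  # Hypothetical stand-in for binaryparse's MagicError, so the sketch compiles alone.
  MagicError = object of CatchableError

# Reads exactly magic.len bytes and raises if they differ from the expected magic.
# An explicit "\0" in the literal is one of those bytes.
proc expectMagic(s: Stream, magic: string) =
  let got = s.readStr(magic.len)
  if got != magic:
    raise newException(MagicError, "magic sequence does not match")
```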

To read several fields of the same kind into a sequence, use the option [[count]] (square brackets with an optional count inside). If the brackets are left empty, the field must either be the last field, in which case it is read until the end of the stream, or be followed by a magic field, which terminates the sequence. As the count you can use the name of any previous field, literals, previously defined variables, or a combination of these. Note that all sequences are assumed to end on a byte boundary, even if their size can be evaluated statically.
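The empty-bracket case followed by a magic field can be pictured with the sketch below. It reads null-terminated strings until the upcoming bytes equal the magic. readCString and readStringsUntil are hypothetical helpers built only on std/streams. The magic itself is left in the stream for the next field to consume.

```nim
import std/streams

# Reads characters up to (and consuming) the next NULL byte.
# readChar also returns '\0' at the end of the stream, which ends the loop.
proc readCString(s: Stream): string =
  while true:
    let c = s.readChar()
    if c == '\0':
      break
    result.add c

# Reads null-terminated strings until the next bytes equal `magic` (or the
# stream ends). peekStr does not advance the position, so the magic stays put.
proc readStringsUntil(s: Stream, magic: string): seq[string] =
  while not s.atEnd and s.peekStr(magic.len) != magic:
    result.add s.readCString()
```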

Repeating blocks, or formats nested within the format, are also common in binary formats. These can be read with a custom parser. Custom parsers technically support any procedure that takes a Stream as its first argument, but care must be taken to leave the Stream in the correct position. You can also define the inner format with a parser from this module and pass that parser to the outer parser, which makes nesting parsers easy. If the inner parser needs values from the outer parser, add parameters to it by giving colon expressions before the body (e.g. the call createParser(list, size: uint16) creates a parser proc (stream: Stream, size: uint16): <return type>). To call a parser, use the * type described above and give it the name of the parser and any optional arguments. The stream is added automatically as the first argument.
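The following plain-Nim sketch mirrors that calling convention. A hypothetical inner procedure list takes the Stream first and an extra size argument second. The outer procedure passes in a value it has already read. It only illustrates the shape of such procedures and is not code generated by createParser.

```nim
import std/streams

# Hypothetical inner format: `size` bytes, where `size` comes from the outer format.
proc list(s: Stream, size: uint16): seq[uint8] =
  for _ in 0 ..< int(size):
    result.add s.readUint8()

# Hypothetical outer format: a one-byte size followed by the inner list.
proc outer(s: Stream): tuple[size: uint16, inner: seq[uint8]] =
  result.size = s.readUint8().uint16
  # Stream first, then the extra argument; the stream is left after the list.
  result.inner = list(s, result.size)
```

After outer returns, the stream sits right after the inner list, so a following field would be read from the correct position.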

Creating a parser gives you a tuple with two members, get and put, stored with let under the identifier passed to createParser. Both are procedures:

- get takes a stream (plus any optional arguments as described above) and returns a tuple containing all the fields.
- put takes a stream and a tuple containing all the fields (the same tuple that get returns) and writes the format to the stream.
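A hand-written analogue of the (get, put) pair for a two-field format might look like this. getHeader, putHeader and the Header type are hypothetical. Integers are read and written in host byte order, which is an assumption of this sketch rather than binaryparse's documented behaviour.

```nim
import std/streams

type Header = tuple[magic: uint8, size: uint16]

# Reads the fields in order and returns them as a tuple.
proc getHeader(s: Stream): Header =
  result.magic = s.readUint8()
  result.size = s.readUint16()   # host byte order

# Writes the same fields back in the same order.
proc putHeader(s: Stream, h: Header) =
  s.write(h.magic)
  s.write(h.size)

# Bundled the same way a generated parser is: a tuple of two procs.
let header = (get: getHeader, put: putHeader)
```

Round-tripping works as with a generated parser: put a tuple into a stream, rewind it, then get it back.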

Example: In lieu of proper examples, the binaryparse.nim file contains a when isMainModule: block showcasing how it can be used. The table below describes that block in more detail:

Format | Description
u8: _ = 128 | Reads an unsigned 8-bit integer and checks that it equals 128, without storing the value as a field in the returned tuple.
u16: size | Reads an unsigned 16-bit integer and names it size in the returned tuple.
4: data[size*2] | Reads a sequence of 4-bit integers into a data field in the returned tuple. The number of integers read is size*2, where size is the value read above.
s: str[] | Reads null-terminated strings into a str field in the returned tuple. Since the brackets are empty, the next field must be a magic field, and strings are read until the magic is found.
s: _ = "9xC\0" | Reads exactly as many bytes as the magic string (4, including the explicit \0) and checks that they equal it.
*list(size): inner | Calls the previously defined procedure list with the current Stream and the size read earlier, and stores the return value in a field inner in the returned tuple.
u8: _ = 67 | Reads an unsigned 8-bit integer and checks that it equals 67, without storing the value.

This file is automatically generated from the documentation found in binaryparse.nim. Use nim doc2 binaryparse.nim to get the full documentation.

binaryparse's People

Contributors

pmunch, sealmove


binaryparse's Issues

bug on simple parsing?

Ran into this while trying the library for today's AoC:

import binaryparse, streams, sugar  # sugar provides dump

block:
  let test1 = newStringStream("\xD2\xFE\x28")

  createParser(packet):
    u3: version
    u3: typeId
  
  var data = packet.get(test1)
  dump data

outputs:

data = (version: 6, typeId: 7)

but typeId should be 4.

For comparison binarylang has the correct output:

import binarylang

block:
  let test1 = newStringBitStream("\xD2\xFE\x28")

  struct(packet):
    u3: version
    u3: typeId

  var data = packet.get(test1)
  # data is an object without a $ proc defined
  echo data.version
  echo data.typeId

outputs:

6
4
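Reading the fields by hand confirms the expected values. The sketch below uses a hypothetical readBitsMSB helper (not part of binaryparse) that consumes bits most-significant-first. 0xD2 is 11010010, so the first three bits are 110 (6) and the next three are 100 (4).

```nim
# Hypothetical helper: reads `n` bits MSB-first starting at bit offset `pos`
# into `data`, and advances `pos` past them.
proc readBitsMSB(data: string, pos: var int, n: int): uint64 =
  for _ in 0 ..< n:
    let byteVal = uint8(data[pos div 8])
    let bit = (byteVal shr (7 - pos mod 8)) and 1'u8
    result = (result shl 1) or uint64(bit)
    inc pos

var pos = 0
let packet = "\xD2\xFE\x28"
let version = readBitsMSB(packet, pos, 3)
let typeId = readBitsMSB(packet, pos, 3)
echo version, " ", typeId
```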

Internal error when specifying a count

Hello, I'm trying a simple example with the following code:

import streams, binaryparse

var strm = newFileStream("demodado.tem",fmRead)

createParser(simple):
  u16: field1[2]

echo simple.get(strm)

Adding any count to field1 produces the following error:

\binaryparse-0.2.2\binaryparse.nim(476, 22) Error: internal error: environment misses: :tmp

Without the "[2]", it compiles fine.

[Question] Unexpected results

I have the following in Python:

import struct

a = 0x95006c08 
print( a & 0x3FF )           # First: 10 bits  --> 8
print( (a >> 10) & 0x1FFF )  # Second: 13 bits --> 27

And I am trying to do the equivalent with binaryparse:

import binaryparse
import streams

createParser(p):
  10: a
  13: b 

var 
  a = 0x95006C08 
  s = newStringStream()

s.write(a)
s.setPosition(0)

var tmp = p.get(s)
echo tmp.a  # --> 33 (instead of 8)
echo tmp.b  # --> 2688 (instead of 27)

What am I doing wrong? I am looking to replicate python's results.

By the way, binaryparse is AWESOME. I love it.
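The Python code numbers bits from the least significant end of the integer, while binaryparse reads fields starting from the most significant bit of each byte. On a little-endian machine the first bytes written by s.write(a) are 08 6C 00 95. Their first ten bits read MSB-first are 0000100001, i.e. (0x08 shl 2) or (0x6C shr 6) = 33, which matches the first value reported above.

A hypothetical LSB-first reader like the one below reproduces Python's numbers. It is a sketch of the bit order, not a binaryparse option.

```nim
# Hypothetical helper: reads `n` bits LSB-first (bit 0 of each byte first),
# placing the i-th bit read at bit position i of the result.
proc readBitsLSB(data: string, pos: var int, n: int): uint64 =
  for i in 0 ..< n:
    let bit = (uint8(data[pos div 8]) shr (pos mod 8)) and 1'u8
    result = result or (uint64(bit) shl i)
    inc pos

# 0x95006C08 as little-endian bytes, as written on x86.
let raw = "\x08\x6C\x00\x95"
var pos = 0
let first = readBitsLSB(raw, pos, 10)   # same bits as a & 0x3FF
let second = readBitsLSB(raw, pos, 13)  # same bits as (a >> 10) & 0x1FFF
echo first, " ", second
```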

endian processing is wrong

Basic test (on x86):

import binaryparse, streams

createParser(simple):
  lu32: size

var t: typeGetter(simple)
t.size = 1
let s = newStringStream("")
simple.put(s, t)
echo repr s.data
s.setPosition 0
echo simple.get(s)

got:

0000000000960060"\0\0\0\1"
(size: 16777216)
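For reference, a little-endian uint32 holding 1 should be the bytes \1\0\0\0, not the \0\0\0\1 that put produced. Reading \0\0\0\1 back as little-endian gives 0x01000000 = 16777216, exactly the value shown. This suggests put wrote big-endian while get read little-endian.

The hypothetical helpers below encode and decode little-endian explicitly, independent of host byte order.

```nim
import std/streams

# Writes `x` as 4 bytes, least significant byte first.
proc writeU32LE(s: Stream, x: uint32) =
  for i in 0 .. 3:
    s.write(uint8((x shr (8 * i)) and 0xFF))

# Reads 4 bytes, least significant byte first.
proc readU32LE(s: Stream): uint32 =
  for i in 0 .. 3:
    result = result or (uint32(s.readUint8()) shl (8 * i))
```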

[suggestion] new "loop" section

I have a file consisting of many entries, each with 4 int32 fields (so the content looks like i32 i32 i32 i32 i32 i32 i32 i32; this example has two entries), but the file has no "size" field.

Of course I can easily read this file with the streams module, but it might be good for binaryparse to support this too, for example with a "loop" section or something similar.
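The README above says a sequence field with empty brackets in the last position is read until the end of the stream. The sketch below shows that behaviour with plain std/streams for this file layout: fixed-size entries of four int32 values, read until atEnd. readEntries and Entry are hypothetical names, and values are read in host byte order.

```nim
import std/streams

type Entry = array[4, int32]   # four int32 fields per entry

# Reads fixed-size entries until the stream is exhausted; with no count field,
# the end of the stream is what terminates the sequence.
proc readEntries(s: Stream): seq[Entry] =
  while not s.atEnd:
    var e: Entry
    for i in 0 .. 3:
      e[i] = s.readInt32()
    result.add e
```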

Custom encoder does not support extra parameters

When creating a custom parser with more parameters than just the stream, binaryparse expects an encoder signature with only two parameters: the stream and the input. Adding more parameters to the encoder doesn't work. For example:

import streams
import binaryparse

proc parseCustom(stream: Stream; extra: int): tuple[a: int] =    
  result = (1,)

proc encodeCustom(stream: Stream; input: var tuple[a: int], extra: int) =    
  discard

let custom = (get: parseCustom, put: encodeCustom)

createParser(x):
  *custom(1): y

This errors with:

test.nim(12, 13) template/generic instantiation of `createParser` from here
binaryparse.nim(358, 15) Error: type mismatch: got <Stream, tuple[a: int]>
but expected one of:
proc (stream: Stream, input: var tuple[a: int], extra: int){.noSideEffect, gcsafe, locks: 0.}

How to use a socket as input

Is there a way of using a socket connection directly as a stream?
I only see examples of StringStream and FileStream but don't know how to get data from a socket
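Any Stream works with a generated parser's get. One approach is therefore to receive a complete message from the socket first and wrap it in a StringStream. The helper below is a hypothetical sketch using std/net. It assumes the message length is known in advance (for example, from a fixed-size header).

```nim
import std/[net, streams]

# Hypothetical helper: receives `size` bytes from a connected socket and wraps
# them in a StringStream that a parser's get can read from.
proc streamFromSocket(sock: Socket, size: int): StringStream =
  # On a buffered socket (newSocket's default) recv tries to read all `size`
  # bytes; a shorter result means the peer closed the connection early.
  let data = sock.recv(size)
  if data.len != size:
    raise newException(IOError, "connection closed before the full message arrived")
  result = newStringStream(data)
```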

peekData*E is wrong

They only ever read the first byte. peekData doesn't move the stream position, so every iteration peeks the same byte again.

binaryparse/binaryparse.nim

Lines 141 to 153 in 5f8f4a7

template peekDataBE*(stream: Stream, buffer: pointer, size: int) =
  for i in 0..<size:
    let tmp = cast[pointer](cast[int](buffer) + ((size-1)-i))
    if stream.peekData(tmp, 1) != 1:
      raise newException(IOError,
        "Unable to peek the requested amount of bytes from file")

template peekDataLE*(stream: Stream, buffer: pointer, size: int) =
  for i in 0..<size:
    let tmp = cast[pointer](cast[int](buffer) + i)
    if stream.peekData(tmp, 1) != 1:
      raise newException(IOError,
        "Unable to peek the requested amount of bytes from file")
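A possible fix is to peek all size bytes in a single call and then reorder them. The procs below are hypothetical replacements, written as procs rather than templates. Like the originals, the big-endian version assumes a little-endian host.

```nim
import std/streams

# Peeks all `size` bytes at once (peekData never advances the position),
# then stores them reversed so a big-endian value lands correctly in a
# little-endian host integer.
proc peekDataBEFixed(stream: Stream, buffer: pointer, size: int) =
  var tmp = newSeq[byte](size)
  if size > 0 and stream.peekData(tmp[0].addr, size) != size:
    raise newException(IOError,
      "Unable to peek the requested amount of bytes from file")
  let dest = cast[ptr UncheckedArray[byte]](buffer)
  for i in 0 ..< size:
    dest[i] = tmp[size - 1 - i]

# On a little-endian host no reordering is needed: peek straight into buffer.
proc peekDataLEFixed(stream: Stream, buffer: pointer, size: int) =
  if stream.peekData(buffer, size) != size:
    raise newException(IOError,
      "Unable to peek the requested amount of bytes from file")
```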

Difference with stream read

I'm comparing binaryparse with using streams directly.
Why doesn't the following code produce the same output both ways?

import streams, binaryparse

var strm = newFileStream("demodado.tem",fmRead)
strm.setPosition(0)

var datehour : array[7,uint16]

for i in 0..6:
  datehour[i] = strm.readUint16()

echo datehour

strm.setPosition(0)
createParser(simple):
  u16:field1[7]

echo simple.get(strm)
strm.close()

The echo results are:

[20, 2, 1990, 21, 44, 52, 22]
(field1: @[5120, 512, 50695, 5376, 11264, 13312, 5632])
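The two outputs differ by a byte swap: 20 (0x0014) becomes 5120 (0x1400), and 1990 (0x07C6) becomes 50695 (0xC607). That suggests the parser read the fields big-endian, while readUint16 uses the host's little-endian order. The hypothetical helpers below make both orders explicit.

```nim
import std/streams

# Reads a uint16 with the most significant byte first (big-endian).
proc readU16BE(s: Stream): uint16 =
  let hi = uint16(s.readUint8())
  let lo = uint16(s.readUint8())
  (hi shl 8) or lo

# Reads a uint16 with the least significant byte first (little-endian).
proc readU16LE(s: Stream): uint16 =
  let lo = uint16(s.readUint8())
  let hi = uint16(s.readUint8())
  (hi shl 8) or lo
```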

parsing binary incorrectly

When reading uint16s, the parser appears to read them backwards: all the numbers I get seem to be shifted 8 bits to the left of where they should be. Maybe I am doing something wrong. Here is the code I am testing it on.

import streams
import binaryparse

createParser(las):
    s4:sig = "LASF"
    u16:id
    u16:encoding
    u32:data1
    u16:data2
    u16:data3
    s8:data4
    s1:major
    s1:minor
    s32:sysID
    s32:software
    u16:daycreated
    u16:yearcreated
    u16:headersize

var strm = newFileStream("NEONDSSampleLiDARPointCloud.las", fmread)

https://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf is the specification for the format,
and the dataset I am testing it on can be found here: https://figshare.com/articles/dataset/NEON_Teaching_Data_LiDAR_Point_Cloud_las_Data/4307750

When I parse the file manually I get the correct numbers, but the numbers from the generated parser are wrong. For example, ID should be 101, but I get 25856 instead. I can share the manual parser if needed. Thank you.
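101 is 0x0065 and 25856 is 0x6500, so the two bytes were decoded in the opposite order. LAS stores its data little-endian, so a hand-written reader would decode id least-significant byte first. Below is a minimal sketch with std/endians; readLasStart is a hypothetical name.

```nim
import std/[streams, endians]

# Hypothetical reader for the first two LAS header fields.
proc readLasStart(s: Stream): tuple[sig: string, id: uint16] =
  result.sig = s.readStr(4)
  if result.sig != "LASF":
    raise newException(ValueError, "not a LAS file")
  var raw = s.readUint16()                  # two bytes, interpreted in host order
  littleEndian16(addr result.id, addr raw)  # swaps bytes only on big-endian hosts
```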
