scott-griffiths / bitstring

A Python module to help you manage your bits

Home Page: https://bitstring.readthedocs.io/en/stable/index.html

License: MIT License

Python 100.00%

bitstring bit-manipulation python bitarray binary-data

bitstring's Issues

Right- and left-justified hex functions

It would be nice to have "hex"-like properties that work on BitStrings even
when the length is not evenly divisible by 4. The new properties (rhex and lhex?)
would ideally right- or left-justify the data (depending on the property
called), pad with zeros to ensure the length is a multiple of 4, and then return
the corresponding hex value.

>>> b = BitString('0b11')
>>> b.rhex
'0x3'
>>> b.lhex
'0xc'
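
The padding rule described above can be sketched on plain strings of '0' and '1' characters. This is only an illustration of the proposal, not the library's implementation; the module-level functions rhex and lhex and the helper _bits_to_hex are hypothetical.

```python
def _bits_to_hex(padded: str) -> str:
    """Convert a bit string whose length is a multiple of 4 to hex digits."""
    digits = (format(int(padded[i:i + 4], 2), "x") for i in range(0, len(padded), 4))
    return "0x" + "".join(digits)


def rhex(bits: str) -> str:
    """Right-justify: pad with zeros on the left (hypothetical helper)."""
    return _bits_to_hex("0" * (-len(bits) % 4) + bits)


def lhex(bits: str) -> str:
    """Left-justify: pad with zeros on the right (hypothetical helper)."""
    return _bits_to_hex(bits + "0" * (-len(bits) % 4))


print(rhex("11"))  # 0x3
print(lhex("11"))  # 0xc
```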

Original issue reported on code.google.com by [email protected] on 8 Sep 2009 at 4:51

append/prepend could be more efficient

These are effectively the same:

a.append(b)

b.prepend(a)

but the performance could be wildly different, depending on how much data
has to be bit-shifted to the correct alignment before joining.

We should make append() and prepend() private, then the new append() and
prepend() can call whichever one is most appropriate for the input they're
given.

Original issue reported on code.google.com by [email protected] on 15 Jul 2009 at 2:09

Replacing BitString with self leads to unpredictable behaviour

>>> a = BitString('0b11')
>>> a.replace('0b1', a)
2
>>> a
BitString('0b11')

So nothing changed when replacing one bit with two. But using a copy
everything works:

>>> a.replace('0b1', a[:])
2
>>> a
BitString('0xf')

Original issue reported on code.google.com by [email protected] on 28 Jul 2009 at 3:34

Need deletebytes to complement deletebits

Have:
deletebits(bits, bitpos)

Need:
deletebytes(bytes, bytepos)

Original issue reported on code.google.com by [email protected] on 13 Feb 2009 at 12:16

Auto initialisation from file object

Using the filename= method of initialisation is fine, but we really should
accept a file object that has been opened elsewhere. This would be more in
keeping with standard library behaviour. So:

>>> f = open('somefile', 'rb')
>>> s = BitString(f)

would be roughly equivalent to

>>> s = BitString(filename='somefile')

except that you'd still have the file object in scope if you wanted it.

We should also add a tofile function, that also takes a file object as a
parameter. It should write the BitString to the file in chunks, to avoid it
being read into memory unnecessarily. This doesn't really come into its own
until the immutable arrays come along, but could still be useful for
copying a chunk of one file to another.

Original issue reported on code.google.com by [email protected] on 4 Aug 2009 at 8:59

Make join a member function?

Currently we have a module-level join function:

bitstring.join(bsl)

We could copy the standard library by implementing one that uses the
BitString instance as the separator:

BitString('0b1').join(bsl)  # Put a '1' bit between every item in bsl.

Easy enough to do, but would it actually be useful to anyone? And should we
deprecate the current join function?

Original issue reported on code.google.com by [email protected] on 18 Jun 2009 at 1:34

Multiplicative factors and groupings for tokens

For example:

a = pack('8*uint:16', range(8))

The '*' isn't strictly necessary but it makes the intent a fair bit
clearer. Brackets could also be employed:

b = BitString('3*(bit:1, uint:7)', '1', 34, '0', 12, '1', 33)

and finally with a variable factor:

c = b.unpack('n*(bin:1, uint:7)', n=3)

Original issue reported on code.google.com by [email protected] on 30 Aug 2009 at 7:55

Slicing raises IndexErrors when the length of the BitString is exceeded.

Raising IndexError if a slice index is greater than the length of the
BitString was the intended behaviour, but doesn't match the usual sequence
slicing behaviour.

Standard Python sequences silently clamp slice indices that exceed the
length of the container to that length (this is only true for slices and
not for indexing).
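
For reference, this is how the built-in sequence types behave. The sketch uses a plain list and the standard slice.indices method; the helper clamp_slice is hypothetical and only shows the clamping a BitString could adopt.

```python
def clamp_slice(start, stop, length):
    """Return (start, stop) clamped to a container of the given length."""
    clamped_start, clamped_stop, _step = slice(start, stop).indices(length)
    return clamped_start, clamped_stop


data = [1, 0, 1, 1]

# Slicing past the end is silently clamped.
print(data[2:100])              # [1, 1]
print(clamp_slice(2, 100, 4))   # (2, 4)

# Indexing past the end still raises.
try:
    data[100]
except IndexError:
    print("IndexError")
```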

Original issue reported on code.google.com by [email protected] on 5 May 2009 at 9:01

findall() should return a generator.

Much nicer if findall() returns a generator because if you only need the
first few results you won't need to find them all.
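
A minimal sketch of the lazy behaviour, using a plain string of bits rather than a BitString. The module-level findall here is a hypothetical stand-in; it reports overlapping matches, which may or may not be what the real method does.

```python
from itertools import islice


def findall(bits: str, pattern: str):
    """Yield each position where pattern occurs, searching lazily."""
    pos = bits.find(pattern)
    while pos != -1:
        yield pos
        pos = bits.find(pattern, pos + 1)


# Only the first three matches are ever searched for.
first_three = list(islice(findall("1" * 1000, "1"), 3))
print(first_three)  # [0, 1, 2]
```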

Original issue reported on code.google.com by [email protected] on 7 May 2009 at 4:44

Flexible 'interpret' function

Not sure if interpret is the best name. Serves a similar purpose to read or
peek, but doesn't depend on the current bitpos.

a, b, c = s.interpret('10:uint8, +5, hex4, 100:se')

is equivalent to:

s.bitpos = 10
a = s.read('uint8')
s.bitpos += 5
b = s.read('hex4')
s.bitpos = 100
c = s.read('se')

Note that '+5' means advance 5 bits, '-5' means retreat 5 bits and '5'
means return next 5 bits as a BitString.

Could also allow one token to be indeterminate length. This would then
consume the rest of the BitString.

a, b, c = s.interpret('oct9, bin, uint12')

Note that if the first bit position isn't given then it defaults to zero.
Also I think that I might have to think carefully about what the flexible
size item does when multiple bit positions and movements are present...

If you need the rest of the BitString just as a BitString then use the
'rest' token (A better name?)

a, b = s.read('uint32, rest')

Here I'm using it in the read function, as I think it would work well there
too (and peek of course).

Original issue reported on code.google.com by [email protected] on 17 Jul 2009 at 5:36

Rotate functions


To complement the bit shift functions ( <<, <<=, >>, >>= ) it would be nice
to have some bit rotation functions.

s.ror(12) # rotate bits to the right by 12
s.rol(10) # rotate bits to the left by 10

>>> s = BitString('0b001111')
>>> s.ror(1)
BitString('0b100111')
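
The rotation in the example can be sketched on a string of bits. The free functions ror and rol below are hypothetical and return a new string rather than modifying anything in place.

```python
def ror(bits: str, n: int) -> str:
    """Rotate right: the last n bits wrap around to the front."""
    if not bits:
        return bits
    k = len(bits) - (n % len(bits))
    return bits[k:] + bits[:k]


def rol(bits: str, n: int) -> str:
    """Rotate left: the first n bits wrap around to the end."""
    if not bits:
        return bits
    k = n % len(bits)
    return bits[k:] + bits[:k]


print(ror("001111", 1))  # 100111
print(rol("001111", 2))  # 111100
```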

It would be consistent to have startbit and endbit parameters too, which
then leaves open the question of whether startbit and endbit are needed for
the ordinary shift operations. As they couldn't be used for the operators,
we could have:

s.shl(bits, startbit, endbit) # a bit like s[startbit:endbit] << bits
                              # except it's done in-place.
s.shr(bits, startbit, endbit)

Original issue reported on code.google.com by [email protected] on 17 Jul 2009 at 10:10

Some properties such as hex, bin, int, can be computationally expensive

The python style guide suggests avoiding the use of properties where their
use could be computationally expensive.

I've mostly ignored that advice here so expressions like 

a = BitString()
a.bin = '10001001101'
print(a.bin)

can be more expensive than they look. At present it doesn't even cache the
binary representation.

Original issue reported on code.google.com by [email protected] on 21 Dec 2008 at 11:06

findall() should have a 'count' parameter

This would bring its interface closer to replace and split.

  list(a.findall(s, count=n))

should be equivalent to

  list(a.findall(s))[:n]


I think that there is also a case for renaming the split function's maxsplit
parameter to count. It's nice not to have to remember the difference...

Original issue reported on code.google.com by [email protected] on 2 Jun 2009 at 1:35

Problem with prepend when used with offsetted BitStrings.


>>> b = BitString(data='\x30', length=2, offset=2)
>>> b.prepend(b)
BitString('0x3')

It should be '0xf' (0b1111). Not good.

Original issue reported on code.google.com by [email protected] on 11 Jun 2009 at 8:00

Problems with very large files

There remain some problems analysing very large files. The magic number is
probably 4GB.

For example, using findbytealigned a BitString initialised with a filename
of a very large file may raise an OverflowError (at least it does for me).
Might be platform dependent.

Original issue reported on code.google.com by [email protected] on 16 Jan 2009 at 4:45

Convenience tokens for reading / construction.

Might be nice to have some shorthand for common types, for example:

byte -> bytes:1
bit -> bits:1
short -> uint:16
long -> uint:32
quad -> uint:64

etc.

Or we could go for something more snappy. These are lifted from Perl's pack:

c -> int:8
C -> uint:8
s -> int:16
S -> uint:16
l -> int:32
L -> uint:32
q -> int:64
Q -> uint:64

So we internally translate from

>>> s = bitstring.pack('C, l, Q', 10, 100, 1000)
>>> a, b, c = s.unpack('Q, C, l')

to 

>>> s = bitstring.pack('uint:8, int:32, uint:64', 10, 100, 1000)
>>> a, b, c = s.unpack('uint:64, uint:8, int:32')
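
The internal translation could be as simple as a lookup table applied to each comma-separated token. This is a sketch only; SHORT_TOKENS and expand_tokens are hypothetical names, and the table just copies the Perl-style list above.

```python
# Hypothetical shorthand table, taken from the Perl-style list in this issue.
SHORT_TOKENS = {
    "c": "int:8", "C": "uint:8",
    "s": "int:16", "S": "uint:16",
    "l": "int:32", "L": "uint:32",
    "q": "int:64", "Q": "uint:64",
}


def expand_tokens(fmt: str) -> str:
    """Replace shorthand tokens with their long form; leave others untouched."""
    tokens = (tok.strip() for tok in fmt.split(","))
    return ", ".join(SHORT_TOKENS.get(tok, tok) for tok in tokens)


print(expand_tokens("C, l, Q"))  # uint:8, int:32, uint:64
```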

Original issue reported on code.google.com by [email protected] on 13 Aug 2009 at 4:32

split() shouldn't return the whole BitString if maxsplit is set.

At the moment the whole of the remaining BitString is returned as the final
item from split(). If you have specified maxsplit, then this probably isn't
what you wanted (the final item could be huge!)

  list(a.split(delimiter, maxsplit=n))

should give the same result as

  list(a.split(delimiter))[:n]

(but hopefully much more quickly!)

Original issue reported on code.google.com by [email protected] on 2 Jun 2009 at 1:26

Some properties don't work for file-type BitStrings

You can't use .int, .uint, .se and .ue properties on BitStrings initialised
using filename.

The work-around is just to copy the whole BitString and use the copy.

Original issue reported on code.google.com by [email protected] on 22 Jan 2009 at 4:46

Modifying BitString initialised with filename fails

For example:
a = BitString(filename='foo')
a.append('0xff')

will fail.

Original issue reported on code.google.com by [email protected] on 18 Feb 2009 at 8:40

Allow initialisation with lists?

This may or may not make sense...

Allow lists to initialise BitString objects, by evaluating each element as
a bool. e.g.

>>> a = BitString([True, False, 7, [False], '0', 'hello', []])
>>> a.bin
'0b1011110'
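
The rule is just Python truthiness applied per element, which is why [False] and '0' both count as 1 (they are non-empty). A sketch with a hypothetical helper that returns the bits as a string:

```python
def bits_from_items(items) -> str:
    """Map each element to '1' if it is truthy, else '0'."""
    return "".join("1" if item else "0" for item in items)


print(bits_from_items([True, False, 7, [False], "0", "hello", []]))  # 1011110
```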

Original issue reported on code.google.com by [email protected] on 27 Apr 2009 at 5:04

Enable step in slice() function

Summary says it all really. Want to be able to say

t = s.slice(a, b, c)

instead of having to use

t = s[a:b:c]

Original issue reported on code.google.com by [email protected] on 28 May 2009 at 10:09

Some new function ideas

Improvements to current ones:

find(bs, bytealigned=True, startbit=None, endbit=None)
split(delimiter, bytealigned=True, startbit=None, endbit=None)

And some new ones:

replace(old, new, bytealigned=True, startbit=None, endbit=None)
count(bs, bytealigned=True, startbit=None, endbit=None)
rfind(bs, bytealigned=True, startbit=None, endbit=None)

Original issue reported on code.google.com by [email protected] on 17 Mar 2009 at 12:21

unpack to dictionary

To try to get better symmetry between pack and unpack it would be nice to
return a dictionary with unpack.

>>> f = 'hex:32=start_code, uint:12=width, uint:12=height'
>>> s = pack(f, start_code='0x000001b3', width=352, height=288)
>>> s.unpack(f)
{'height': 288, 'start_code': '0x000001b3', 'width': 352}

Which is fine and lovely, but what happens if there is also a list being
returned?

>>> s.unpack('hex:32, uint:12, uint:12=height')

Should it return a tuple of a list and dictionary? Seems a bit extreme...

Original issue reported on code.google.com by [email protected] on 30 Aug 2009 at 8:06

Merged into: #88

Assertion from truncateend()

s = BitString('0b111')
s.truncatestart(2)
s.truncateend(1) # asserts

Original issue reported on code.google.com by [email protected] on 6 Apr 2009 at 11:30

Allow concatenated strings to be used to initialise

For example, allow

a = BitString('0b000b10b111') # a.bin == 0b001111
b = BitString('0xff0xe2')     # b.hex == 0xffe2

This will allow constructions like
a += '0b0' + '0b1' + '0b1110'
which currently fail as the strings are concatenated first.

Note that we can't combine '0x' and '0b' strings (unfortunately) because
'0b' is valid hex as well as being the binary indicator. Annoying that.
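
One way to parse such a string, assuming it uses a single prefix throughout, is to split on the prefix itself. The helper split_prefixed is hypothetical and does no digit validation beyond rejecting empty pieces.

```python
def split_prefixed(text: str, prefix: str) -> list:
    """Split a string like '0b000b10b111' into its digit groups."""
    if not text.startswith(prefix):
        raise ValueError("string must start with " + repr(prefix))
    pieces = text.split(prefix)[1:]  # drop the empty piece before the first prefix
    if any(piece == "" for piece in pieces):
        raise ValueError("empty group in " + repr(text))
    return pieces


print("".join(split_prefixed("0b000b10b111", "0b")))  # 001111
print("".join(split_prefixed("0xff0xe2", "0x")))      # ffe2
```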

Original issue reported on code.google.com by [email protected] on 16 Feb 2009 at 2:32

Allow slice to be specified for reversebits()


i.e. reversebits(startbit, endbit) would reverse the bits in the slice
[startbit:endbit] in place.

This would let you write things like:

>>> a = BitString('0x01020408')
>>> for i in range(a.length/8):
...     a.reversebits(i*8, (i+1)*8)
>>> a.hex
'0x80402010'
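
The same loop can be checked on a plain bit string. The function below is a hypothetical stand-in for the proposed method and returns a new string instead of working in place.

```python
def reversebits(bits: str, startbit: int = 0, endbit: int = None) -> str:
    """Reverse the bits in [startbit:endbit], leaving the rest untouched."""
    if endbit is None:
        endbit = len(bits)
    return bits[:startbit] + bits[startbit:endbit][::-1] + bits[endbit:]


a = format(0x01020408, "032b")
for i in range(len(a) // 8):
    a = reversebits(a, i * 8, (i + 1) * 8)

print(format(int(a, 2), "08x"))  # 80402010
```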

Original issue reported on code.google.com by [email protected] on 27 Apr 2009 at 1:32

Add support for octal.

Would need to use the Python 3.0 notation (prefix of '0o' or '0O') rather
than the '0' prefix.

a = BitString('0o777')
b = BitString(oct='777')

Original issue reported on code.google.com by [email protected] on 16 Feb 2009 at 4:29

Multiple reads in one statement

Instead of 
>>> a = s.readbits(10)
>>> b = s.readbits(4)

Why not
>>> a, b = s.readbits(10, 4)

You could then also write things like 

>>> [x.uint for x in s.readbits(5, 6, 5)]

Would need to modify readbits, peekbits, readbytes, peekbytes (but not
peekbit etc.)

Original issue reported on code.google.com by [email protected] on 29 Jun 2009 at 3:08

Inventive use of stride when slicing

Currently using the stride is not allowed when slicing a BitString. This is
primarily because it just isn't very useful - each item is just a single bit.

Suggestion is to use the stride to indicate the *size* of the items being
sliced. For example using a stride of 8 would make the start and stop
indices into byte indices:

>>> a = BitString('0xabcdef')
>>> print a[0:16]
'0xabcd'
>>> print a[0:16:1]
'0xabcd'
>>> print a[0:2:8]
'0xabcd'
>>> print a[2:4:4]
'0xcd'

I think that the notation a[x:y:8] is cleaner than the equivalent (and
frequently used) a[x*8:y*8].

Negative strides are interesting too. a[::-1] would be the reversed bit
BitString, whereas a[::-8] would reverse the byte order.
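
The positive-stride case amounts to multiplying both indices by the stride, so a[x:y:8] is a[x*8:y*8]. A sketch on a plain bit string, with hypothetical helpers stride_slice and to_hex (negative strides are left out):

```python
def stride_slice(bits: str, start: int, stop: int, step: int = 1) -> str:
    """Treat start and stop as indices of step-sized items."""
    if step < 1:
        raise ValueError("this sketch only handles positive strides")
    return bits[start * step:stop * step]


def to_hex(bits: str) -> str:
    """Hex digits for a bit string whose length is a multiple of 4."""
    return format(int(bits, 2), "0{}x".format(len(bits) // 4))


a = format(0xabcdef, "024b")
print(to_hex(stride_slice(a, 0, 16)))    # abcd
print(to_hex(stride_slice(a, 0, 2, 8)))  # abcd
print(to_hex(stride_slice(a, 1, 3, 8)))  # cdef
```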

What could possibly go wrong?

Original issue reported on code.google.com by [email protected] on 24 Apr 2009 at 3:57

Add support for more special methods

Some of these should be appropriate:

__invert__
__mul__
__lshift__
__rshift__
__hex__
__oct__
__imul__
__ilshift__
__irshift__
__setitem__

Original issue reported on code.google.com by [email protected] on 17 Feb 2009 at 2:53

Intelligent string parsing for initialisation

For example:

s += BitString(uint=12, length=8)

could be written as

s += 'uint8 12'

while

s = BitString('0x12') + BitString(ue=4) + BitString('0b1')

becomes

s = BitString('0x12, ue4, 0b1')

Lots of questions as to what the best format is. Separator could be ',' or
':' (or either). Is 'ue4' better than 'ue=4' or 'ue 4'?

Of course the one that wouldn't work is the 'data' initialiser, as it would
be impossible to work out when the data ended...

Original issue reported on code.google.com by [email protected] on 28 Jun 2009 at 8:50

split() doesn't always return initial bytes before delimiter

In particular if there are no bytes before the delimiter then it should
yield an empty BitString as the first item, but it fails to do so.

It could just be the documentation for split() that is incorrect and the
behaviour is intended.

Original issue reported on code.google.com by [email protected] on 7 Jan 2009 at 2:36

Data is often copied between BitStrings when a reference would suffice.

Many operations that return a new BitString don't alter the underlying data
in any way, often just needing a slice of it. Currently the data is always
copied, which could be rather expensive in some cases.

Suggestion is to improve memory and computational efficiency by allowing a
BitString's internal byte data store to reference another BitString's data
rather than taking a copy.

Original issue reported on code.google.com by [email protected] on 17 Jan 2009 at 10:54

[deleted issue]

[deleted issue]

oct() and hex() should be deprecated.

Rationale: In Python 2.6 there's also a bin() function, but it can't be
overloaded in the same way as oct() and hex() (i.e. treating leading zeros
as significant).

In Python 3.0 it's even worse as the oct() and hex() won't work either.

Overall I think it's better to have a consistent interface across
hex/oct/bin as well as across Python 2.x/3.x, so the only way to go is to
get rid of hex() and oct().

Original issue reported on code.google.com by [email protected] on 16 Jun 2009 at 4:50

Intelligent string parsing for reading.

Rather than

>>> h = s.readbits(12).hex

use

>>> h = s.read('hex12')

Then we can start joining them:

>>> start_code, width, height = s.read('hex32, uint12, uint12')

Needs to work for peek() as well as read() of course.

Original issue reported on code.google.com by [email protected] on 29 Jun 2009 at 3:02

More concise creation from binary or hexadecimal

It would be nice to be able to use the '0x' and '0b' prefixes to specify
hex and binary without the explicit initialiser. For example:
s = BitString('0xff')
t = BitString('0b0001')
instead of
s = BitString(hex='0xff')
t = BitString(bin='0b0001')

Also, this could be used in functions that require a BitString argument:
s.append('0b0')
t.findbytealigned('0x47')
instead of
s.append(BitString(bin='0'))
t.findbytealigned(BitString(hex='0x47'))

Original issue reported on code.google.com by [email protected] on 13 Feb 2009 at 11:35

Library only compatible with Python 2.5/2.6. Should pass tests for 2.4 too.

Some code won't run under Python 2.4.

In particular the 'a if c else b' construction is used.

It wouldn't be too much work to get the unit tests to pass for python 2.4.

Original issue reported on code.google.com by [email protected] on 21 Jan 2009 at 5:14

Endianness reading and construction

The int, uint properties are bit-wise big-endian. To support other
endianness suggest we add:

intle - little endian int. Must be a multiple of 8 bits long
uintle - little endian uint. Must be a multiple of 8 bits long
intbe - synonym for int
uintbe - synonym for uint


Suggest that we don't add explicit support for bit-wise little-endian
interpretations. (use reversebits() or [::-1] slice)

Also having things like hexle or binle would just get very confusing!

For example:

s = BitString(intle=104, length=16)
(or)
s = BitString('intle16=104')
assert s.intle == 104
s.intle = 950
i = s.read('intle16')
assert i == 950
assert s[::-8].int == 950
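
The byte-wise interpretation can be checked with the standard int.to_bytes and int.from_bytes methods. The helpers pack_intle and unpack_intle are hypothetical and work on bytes objects rather than BitStrings.

```python
def pack_intle(value: int, length: int) -> bytes:
    """Encode a signed int as little-endian bytes; length is in bits."""
    if length % 8:
        raise ValueError("length must be a multiple of 8 bits")
    return value.to_bytes(length // 8, "little", signed=True)


def unpack_intle(data: bytes) -> int:
    """Decode little-endian bytes as a signed int."""
    return int.from_bytes(data, "little", signed=True)


s = pack_intle(950, 16)
print(s.hex())           # b603
print(unpack_intle(s))   # 950

# Reversing the byte order and reading big-endian gives the same value,
# which mirrors the s[::-8].int assertion above.
print(int.from_bytes(s[::-1], "big", signed=True))  # 950
```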

Original issue reported on code.google.com by [email protected] on 16 Jul 2009 at 4:52

Can't set a length on a BitString initialised with a filename

Any length set when creating a file-based BitString will be ignored when,
for example, displaying the BitString as a hex string.

Original issue reported on code.google.com by [email protected] on 22 Jan 2009 at 4:48

Problems appending to BitStrings with offsets.


b = BitString(data='\x28\x28', offset=1)
b.append('0b0')  # asserts


It also fails for prepend, and probably more. The assert itself isn't all
that important so programs should still function if you use -O.

Need to add unit test to cover this!

Original issue reported on code.google.com by [email protected] on 12 Mar 2009 at 4:14

Problem when using length and offset when initialising with auto.

For example:

>>> s = BitString('0o777', length=1, offset=1)
>>> s
BitString('0b11')


...which clearly doesn't have a length of 1.

Original issue reported on code.google.com by [email protected] on 6 Jun 2009 at 7:32

Can't use setitem to insert

__setitem__ can be used to replace a slice of a BitString with another, but
can't be used to insert. This should be possible:

a = BitString('0x0011223344')
a[16:16] = '0xff'
print a              # 0x0011ff223344

But instead it raises an IndexError. Of course you can still use insert()
to do this.

Original issue reported on code.google.com by [email protected] on 27 Apr 2009 at 8:45

split() could also split into constant length chunks

The first parameter of split() could be an integer, which would then mean
that it would return a generator for constant sized chunks. For example

for byte in s.split(8):
  do_something_with(byte)
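
A sketch of the constant-size case as a generator over a plain bit string. The name chunks is hypothetical, and letting the final chunk be shorter when the length is not a multiple of the size is an assumption.

```python
def chunks(bits: str, size: int):
    """Yield consecutive pieces of at most size bits."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(bits), size):
        yield bits[start:start + size]


for byte in chunks("0000000111111111", 8):
    print(byte)
```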

Original issue reported on code.google.com by [email protected] on 5 Jun 2009 at 8:30

Initialisation via 'filename' not working fully yet

The option to initialise a BitString with a filename isn't fully
implemented yet.

If you want to analyse a file the suggested method is still to do something
like:

s = BitString(data=open('filename', 'rb').read())

which obviously isn't going to work very well if the file is very large.

If you need to analyse 20GB files (as I occasionally do) then feel free to
try the filename initialiser, but the interface and functionality have yet
to be finalised.

Original issue reported on code.google.com by [email protected] on 21 Dec 2008 at 11:40

Assertion from prepend()

c = BitString('0x1122334455667788')
c.bitpos = 40
c.append('0b1').prepend('0x6666666') # asserts in _assertsanity()

Original issue reported on code.google.com by [email protected] on 20 Mar 2009 at 4:21

Allow slice initialisation from integers

Generally it's not possible to use integers to initialise a BitString
without providing a length, which means that it can be more cumbersome than
hex or bin initialisation.

However, if a slice is being specified then we already have a default
length so that this could make sense:

>>> a = BitString('0x000000')
>>> a[8:16] = 100
>>> print a
'0x006400'

If the signed or unsigned integer doesn't fit then a ValueError would be
raised.
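
A sketch of the unsigned case on a plain bit string, where the slice width supplies the length. The helper set_slice_uint is hypothetical, returns a new string, and does not cover signed values.

```python
def set_slice_uint(bits: str, start: int, end: int, value: int) -> str:
    """Overwrite bits[start:end] with value as an unsigned integer."""
    width = end - start
    if width <= 0 or not 0 <= value < (1 << width):
        raise ValueError("{} does not fit in {} bits".format(value, width))
    return bits[:start] + format(value, "0{}b".format(width)) + bits[end:]


a = "0" * 24                      # same bits as 0x000000
a = set_slice_uint(a, 8, 16, 100)
print(format(int(a, 2), "06x"))   # 006400
```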

Original issue reported on code.google.com by [email protected] on 1 May 2009 at 3:04

BitString.advancebits(0) is incorrectly proscribed in the doc string

What steps will reproduce the problem?
Python 2.6.1 (r261:67515, Jan 22 2009, 11:41:14) 
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bitstring
>>> bs = bitstring.BitString('0x900dbeef')
>>> bs.bitpos
0
>>> bs.advancebits(0)
>>> bs.bitpos
0
>>> 


What is the expected output? What do you see instead?
The docstring for advancebits() says:
     """Advance position by bits.

        bits -- Number of bits to increment bitpos by. Must be >= 0.

        Raises ValueError if bits is negative or if bitpos goes past the end
        of the BitString.
"""

The doc text for bits should read "Must be > 0." The last sentence is correct.

What version of the product are you using? On what operating system?
Path: .
URL: http://python-bitstring.googlecode.com/svn/trunk
Repository Root: http://python-bitstring.googlecode.com/svn
Repository UUID: 442ccf1e-c85e-11dd-94fd-9de6169c3690
Revision: 288
Node Kind: directory
Schedule: normal
Last Changed Author: python.bitstring
Last Changed Rev: 285
Last Changed Date: 2009-04-24 03:38:13 +1000 (Fri, 24 Apr 2009)


Please provide any additional information below.
This may be the most trivial issue I have ever raised...sorry. It's in good
faith, I promise.

I believe the same bug occurs in the docstrings for advancebytes and the
retreat* methods.

Original issue reported on code.google.com by [email protected] on 1 May 2009 at 4:49

Endianness changing functions.

Personally I dislike the name byteswap() (as used in the array module) as
it doesn't really say what's going on - i.e. which bytes are being swapped
with which. bytereverse() is closer to the truth.

Suggestion:

To change endianness of 2-byte data:

>>> s.reversebytes(size=2)

So base it on reversebits(), which could also change to have a size parameter.

def reversebytes(startbit=None, endbit=None, size=0)
def reversebits(startbit=None, endbit=None, size=0)

A size==0 implies that the whole slice just gets reversed, which is
backward compatible with the current reversebits().

Examples:
s = BitString('0x0011002200330044')
s.reversebytes()         # 0x4400330022001100
s.reversebytes(size=2)   # 0x1100220033004400
s.reversebytes(size=4)   # 0x2200110044003300
s.reversebytes(size=3)   # 0x001100330022 (the rest gets truncated)
s.reversebytes(size=1)   # Unchanged - no effect
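
The examples can be reproduced with a short sketch on a bytes object. The function reverse_bytes is hypothetical, returns a new object, and drops any trailing bytes that do not fill a whole group, as the size=3 case describes.

```python
def reverse_bytes(data: bytes, size: int = 0) -> bytes:
    """Reverse byte order within each size-byte group (0 means the whole input)."""
    if size == 0:
        return data[::-1]
    usable = len(data) - len(data) % size  # trailing partial group is truncated
    return b"".join(data[i:i + size][::-1] for i in range(0, usable, size))


s = bytes.fromhex("0011002200330044")
print(reverse_bytes(s).hex())          # 4400330022001100
print(reverse_bytes(s, size=2).hex())  # 1100220033004400
print(reverse_bytes(s, size=4).hex())  # 2200110044003300
print(reverse_bytes(s, size=3).hex())  # 001100330022
```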

I'm not sure I like the name of the 'size' parameter, but I can't think of
anything better right now.

Original issue reported on code.google.com by [email protected] on 10 Jul 2009 at 10:37

Exception on pack() with upper-case key

Example:

import bitstring
format = 'bits:4=BL_OFFT, uint:12=width, uint:12=height'
d = {'BL_OFFT': '0b1011', 'width': 352, 'height': 288}
s = bitstring.pack(format, **d)

No output expected. Instead, got a ValueError:
Traceback (most recent call last):
  File "trybs.py", line 4, in <module>
    s = bitstring.pack(format, **d)
  File "C:\Python26\lib\site-packages\bitstring.py", line 2663, in pack
    s.append(_init_with_token(name, length, value))
  File "C:\Python26\lib\site-packages\bitstring.py", line 101, in _init_with_token
    b = BitString(value)
  File "C:\Python26\lib\site-packages\bitstring.py", line 576, in __init__
    func(d, offset, length)
  File "C:\Python26\lib\site-packages\bitstring.py", line 1115, in _setauto
    self.append(_init_with_token(*token))
  File "C:\Python26\lib\site-packages\bitstring.py", line 107, in _init_with_token
    raise ValueError("Can't parse token name %s." % name)
ValueError: Can't parse token name bl_offt.

Note lower-case name 'bl_offt' in last line of output.

Changing the key to lower-case in the format and dictionary allowed the 
example to run.

I'm using r456 in Subversion. Python version 2.6.2 on Windows XP.

Original issue reported on code.google.com by [email protected] on 8 Sep 2009 at 2:19
