farsightsec / pymtbl
Python extension module for libmtbl
License: Apache License 2.0
Update to support Python 3.
Typo fix:
diff --git a/mtbl.pyx b/mtbl.pyx
index b15c853..82365c9 100644
--- a/mtbl.pyx
+++ b/mtbl.pyx
@@ -586,7 +586,7 @@ cdef class sorter(object):
Keyword arguments:
temp_dir -- temporary directory (default "/var/tmp")
- max_memory -- maxmimum amount of memory for in-memory sorting in bytes (default 1 GB)
+ max_memory -- maximum amount of memory for in-memory sorting in bytes (default 1 GB)
"""
cdef mtbl_sorter *_instance
cdef _lock
The code linked below will cause a segfault when using merger
objects to process files with duplicates in them.
https://gist.github.com/tsellers-r7/aab8a5fd99da8091308da336b643be25
The crashes can be triggered by looping over .iteritems() multiple times on one instance, or by repeatedly creating new objects and then looping over .iteritems(). In all cases the segfault seems to require processing multiple files where at least two of them have matching keys, which triggers the merge function.
The linked code contains 3 tests, which can be run independently by uncommenting the call to the relevant function. The test(s) that cause the segfault vary depending on the operating environment and pymtbl version. test1 and test2 will segfault on Ubuntu 16.04.1 LTS with the latest versions of pymtbl and mtbl from GitHub. test3 will segfault on Ubuntu 14.04 using pymtbl 0.3.0 and mtbl 0.6.0.
Failure seems to be highly situational and depends on the operating environment. Sometimes test1 and test2 fail on the first iteration of the loop; other times the failure occurs on loop 2 or 3. Adjusting the contents of the padding function, which isn't called anywhere, can change when the failure occurs. Similarly, in test3 the number of loops (and so the file name length) at which the crash occurs changes depending on whether the debugging print statement on line 93 is commented out.
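For readers unfamiliar with the merge step described above, here is a minimal pure-Python sketch of what merging key-sorted inputs with duplicate keys looks like. It does not use pymtbl and is not the library's implementation or the gist's code. The names merge_sorted, sum_merge, file_a, and file_b are hypothetical. The three-argument callback shape (key, val0, val1) is an assumption, chosen because it is consistent with the "(3 given)" in the TypeError shown in the sample output.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted(sources, merge_func):
    """Merge key-sorted (key, value) iterables, combining duplicate keys.

    merge_func(key, val0, val1) is called once for each extra value that
    shares a key. Illustration only; not how pymtbl is implemented.
    """
    merged = heapq.merge(*sources, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        result = values[0]
        for value in values[1:]:
            result = merge_func(key, result, value)
        yield key, result


def sum_merge(key, val0, val1):
    # Hypothetical merge function: treat values as decimal integers and add.
    return str(int(val0) + int(val1))


# Two stand-ins for input files; both contain the key "a".
file_a = [("a", "1"), ("b", "2")]
file_b = [("a", "1"), ("c", "3")]

# Iterating several times mirrors the repeated .iteritems() loops in the report.
for loop in range(1, 4):
    items = list(merge_sorted([file_a, file_b], sum_merge))
    print("Loop #", loop, items)
```

The merge callback only runs when at least two inputs share a key, which matches the observation that the crash needs duplicate keys across files. In the real extension the callback is invoked from C through mtbl.merge_func_wrapper, which is where the reported TypeError originates.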
Steps to reproduce the environment on Ubuntu 16.04.1 LTS:
sudo apt-get install build-essential liblz4-1 liblz4-dev python git dh-autoreconf pkgconf libsnappy1v5 libsnappy-dev zlib1g-dev
mkdir testing
cd testing
git clone https://github.com/farsightsec/mtbl.git
(cd mtbl && ./autogen.sh && ./configure && make && sudo make install && sudo ldconfig)
sudo apt-get install python-pip
pip install cython
git clone https://github.com/farsightsec/pymtbl.git
(cd pymtbl && sudo python setup.py install)
save example as 'example.py' and make it executable
mkdir data
rm ./data/*.mtbl & ./example.py
Sample output
Loop # 1
Enumerate contents of the 'my_merger' object:
key: a, value:1
Loop # 2
Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored
Loop # 3
Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored
Loop # 4
Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored
Loop # 5
Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored
[1]+ Done rm ./data/*.mtbl
Segmentation fault (core dumped)
Sample output from test3, which increments the length of one of the input files' names.
Testing file len: 46 and name: ./data/123456789_123456789_123456789_1234.mtbl
DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_1234.mtbl
DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1
Testing file len: 47 and name: ./data/123456789_123456789_123456789_12345.mtbl
DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_12345.mtbl
DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1
Testing file len: 48 and name: ./data/123456789_123456789_123456789_123456.mtbl
DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_123456.mtbl
DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1
Testing file len: 49 and name: ./data/123456789_123456789_123456789_1234567.mtbl
DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_1234567.mtbl
DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1
Testing file len: 50 and name: ./data/123456789_123456789_123456789_12345678.mtbl
DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_12345678.mtbl
DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
Segmentation fault (core dumped)