pymtbl's People

Contributors

alesage, edmonds, hstern, jeffmurphy, massar, reedjc, rep

pymtbl's Issues

typo in mtbl.pyx MTBL sorter docs

Typo fix:

diff --git a/mtbl.pyx b/mtbl.pyx
index b15c853..82365c9 100644
--- a/mtbl.pyx
+++ b/mtbl.pyx
@@ -586,7 +586,7 @@ cdef class sorter(object):
 
     Keyword arguments:
     temp_dir -- temporary directory (default "/var/tmp")
-    max_memory -- maxmimum amount of memory for in-memory sorting in bytes (default 1 GB)
+    max_memory -- maximum amount of memory for in-memory sorting in bytes (default 1 GB)
     """
     cdef mtbl_sorter *_instance
     cdef _lock

Segfault after multiple calls to merger objects

The code linked below will cause a segfault when using merger objects to process files with duplicates in them.

https://gist.github.com/tsellers-r7/aab8a5fd99da8091308da336b643be25

The crashes can be triggered by multiple loops over .iteritems() on one instance or by repeatedly creating new objects and then looping over .iteritems(). In all cases the segfault seems to require the processing of multiple files where at least two of them have matching 'keys' which trigger the merge function.
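To make the triggering condition concrete, here is a minimal pure-Python sketch of merge semantics over sorted inputs. It does not use pymtbl and is not how mtbl is implemented; it only illustrates when a merge function is expected to be invoked. The name `merge_sorted` is hypothetical, and the three-argument `merge_func(key, val0, val1)` shape is an assumption based on the "(3 given)" text in the error output further down.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted(sources, merge_func):
    """Yield (key, value) pairs from several key-sorted sources.

    merge_func(key, val0, val1) is called only when the same key
    appears in more than one source (hypothetical helper, not pymtbl).
    """
    # heapq.merge lazily interleaves the already-sorted inputs by key.
    merged = heapq.merge(*sources, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group]
        result = values[0]
        # Fold duplicate values pairwise through the merge function.
        for value in values[1:]:
            result = merge_func(key, result, value)
        yield key, result
```

With inputs that share no keys, `merge_func` is never called, which matches the observation that the crash only appears when at least two files contain a matching key.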

The code contains three tests, which can be run independently by uncommenting the call to the relevant function. The test(s) that cause the segfault vary depending on the operating environment and pymtbl version. test1 and test2 will segfault on Ubuntu 16.04.1 LTS with the latest versions of pymtbl and mtbl from GitHub. test3 will segfault on Ubuntu 14.04 using pymtbl 0.3.0 and mtbl 0.6.0.

The failure seems to be highly situational and depends on the operating environment. Sometimes test1 and test2 fail on the first iteration of the loop; other times they fail on loop 2 or 3. Adjusting the contents of the padding function, which isn't called anywhere, can change when the failure occurs. Similarly, in test3 the number of loops (and so the file name length) at which the crash occurs changes depending on whether the debugging print statement on line 93 is commented out.
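Because the crash is intermittent, it can help to run the script several times and count how often it dies from a segmentation fault. The following is a minimal sketch of such a harness, assuming a POSIX system and Python 3 for the harness itself (the script under test can use any interpreter). The function names `run_once` and `count_segfaults` are hypothetical and are not part of pymtbl or the gist.

```python
import signal
import subprocess


def run_once(argv):
    """Run argv as a child process; return True if it was killed by SIGSEGV."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was terminated by signal N.
    return proc.returncode == -signal.SIGSEGV


def count_segfaults(argv, runs=10):
    """Run argv `runs` times and return how many runs ended in a segfault."""
    return sum(run_once(argv) for _ in range(runs))
```

For example, `count_segfaults(["python", "example.py"], runs=20)` would report how many of 20 runs crashed, assuming the gist has been saved as `example.py` in the current directory.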

Steps to reproduce the environment on Ubuntu 16.04.1 LTS:

sudo apt-get install build-essential liblz4-1 liblz4-dev python git dh-autoreconf pkgconf libsnappy1v5 libsnappy-dev zlib1g-dev

mkdir testing
cd testing

git clone https://github.com/farsightsec/mtbl.git
(cd mtbl && ./autogen.sh && ./configure && make && sudo make install && sudo ldconfig)


sudo apt-get install python-pip
pip install cython

git clone https://github.com/farsightsec/pymtbl.git
(cd pymtbl && sudo python setup.py install)

Save the example as 'example.py' and make it executable:

mkdir data
rm ./data/*.mtbl & ./example.py

Sample output

Loop # 1
    Enumerate contents of the 'my_merger' object:
        key: a, value:1

Loop # 2
    Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored

Loop # 3
    Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored

Loop # 4
    Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored

Loop # 5
    Enumerate contents of the 'my_merger' object:
Exception TypeError: 'mtbl.iteritems.__next__ (mtbl.c:3106)() takes no arguments (3 given)' in 'mtbl.merge_func_wrapper' ignored
[1]+  Done                    rm ./data/*.mtbl
Segmentation fault (core dumped)

Sample output from test3, which increments the length of one input file's name on each iteration:

Testing file len: 46 and name: ./data/123456789_123456789_123456789_1234.mtbl
    DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_1234.mtbl
    DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1

Testing file len: 47 and name: ./data/123456789_123456789_123456789_12345.mtbl
    DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_12345.mtbl
    DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1

Testing file len: 48 and name: ./data/123456789_123456789_123456789_123456.mtbl
    DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_123456.mtbl
    DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1

Testing file len: 49 and name: ./data/123456789_123456789_123456789_1234567.mtbl
    DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_1234567.mtbl
    DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
key: a, value:1

Testing file len: 50 and name: ./data/123456789_123456789_123456789_12345678.mtbl
    DEBUG: Adding mtbl file: ./data/123456789_123456789_123456789_12345678.mtbl
    DEBUG: Adding mtbl file: ./data/first_fname.mtbl
Enumerate contents of the 'merger' object:
Segmentation fault (core dumped)
