Comments (28)

mojotx commented on May 16, 2024

It's definitely something in libmagic1 5.09-2. I downloaded the source for libmagic 5.10 and 5.15 (most recent) from ftp://ftp.astron.com/pub/file/ and it started working. Strange.

from python-magic.

mojotx commented on May 16, 2024

Apparently libmagic1 5.09-2 isn't detecting the mime type, so python-magic is treating it as an error and throwing the "MagicException". I wrote my own little C test program using libmagic1, and version 5.11-2ubuntu4 of libmagic1 is returning "application/vnd.ms-excel; charset=binary" as the mime type, but version 5.09-2 is returning the mime type set to "; charset=binary".
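
To make the difference between the two results concrete, here is a minimal sketch that splits a libmagic MIME string into type and charset. `split_mime_result` is a hypothetical helper, not part of python-magic; it only assumes the `type; charset=...` shape of the two strings quoted above.

```python
def split_mime_result(result):
    """Split a 'type; charset=...' string into (mime_type, charset).

    Hypothetical helper for illustration; an empty type becomes None.
    """
    mime_type, _, params = result.partition(";")
    mime_type = mime_type.strip() or None
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset" and value:
            charset = value
    return mime_type, charset


# The two strings reported for libmagic1 5.11-2ubuntu4 and 5.09-2.
print(split_mime_result("application/vnd.ms-excel; charset=binary"))
print(split_mime_result("; charset=binary"))
```

With the 5.09-2 string the type part comes back as None, which is the case a caller would have to treat as "unable to determine MIME type".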

I suppose I will have to treat all instances of this MagicException as an "unable to determine mime type" error. Either that or fork my own copy of python-magic and modify the errorcheck_null method to return "application/octet-stream" if the MIME type cannot be determined (which is probably a more RFC-compliant solution). For what it's worth, libmagic1 5.09-2 is correctly identifying it as an Excel spreadsheet, it's just not finding the mime-type for this particular file. I suspect it's a bug in libmagic1.

As another test, I used "dd if=/dev/random" to just create a big file of random binary data, and fed THAT to my program. It correctly identified it as "application/octet-stream", so I don't know why it's choking on this Excel spreadsheet.

My short-term fix will be to modify errorcheck_null to return "application/octet-stream" if result is None.
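
A rough sketch of that short-term fix, assuming errorcheck_null follows the usual ctypes errcheck signature `(result, func, args)`. The function name and the DEFAULT_MIME constant are made up for this example, and the real errorcheck_null in python-magic may look different.

```python
DEFAULT_MIME = "application/octet-stream"


def errorcheck_null_with_default(result, func, args):
    """ctypes-style errcheck hook taking (result, func, args).

    Sketch only: instead of raising when libmagic returns NULL (seen as
    None through ctypes), fall back to a generic MIME type.
    """
    if result is None:
        return DEFAULT_MIME
    return result


# Simulate both cases without calling libmagic.
print(errorcheck_null_with_default(None, None, ()))
print(errorcheck_null_with_default("application/vnd.ms-excel", None, ()))
```

The downside of this approach is that every NULL return is mapped to the default, including genuine errors, so it hides failures that a caller might want to see.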

ahupp commented on May 16, 2024

Sigh. This is a new error checking path that is supposed to adhere more closely to the documented error behaviour:

"The magic_buffer(), magic_getpath(), and magic_file(), functions return a string on success and NULL on failure. "

The number in the args array is a pointer to the magic_t that was created when you initially initialized the library.

I just pushed a hypothetical fix for this issue to master, take a look at 75eab74.

Does this work for you?

mojotx commented on May 16, 2024

Looks like there's a bug. You are looking for self.flags in line 88 of magic.py, and that's not an attribute of the Magic class.

$ ./test_magic.py fail
Traceback (most recent call last):
  File "./test_magic.py", line 35, in <module>
    mime_type = magic.from_file(filename, mime=True)
  File "/usr/local/lib/python2.7/dist-packages/python_magic-0.4.6-py2.7.egg/magic.py", line 132, in from_file
    return m.from_file(filename)
  File "/usr/local/lib/python2.7/dist-packages/python_magic-0.4.6-py2.7.egg/magic.py", line 82, in from_file
    return self._handle509Bug(e)
  File "/usr/local/lib/python2.7/dist-packages/python_magic-0.4.6-py2.7.egg/magic.py", line 88, in _handle509Bug
    if e.message is None and (self.flags & MAGIC_MIME):
AttributeError: Magic instance has no attribute 'flags'

mojotx commented on May 16, 2024

Changing "flags" to "self.flags" in the "__init__()" method of class Magic in magic.py works, in that it returns "application/octet-stream" as the MIME type.
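
To illustrate why the AttributeError appears, here is a minimal sketch with two stand-in classes. Neither is the real Magic class and neither calls libmagic; the flag value 0x10 is only a placeholder.

```python
class BrokenMagic(object):
    """Stand-in: keeps flags in a local variable only."""

    def __init__(self, mime=False):
        flags = 0x10 if mime else 0  # local name, gone once __init__ returns


class FixedMagic(object):
    """Stand-in: stores flags on the instance."""

    def __init__(self, mime=False):
        self.flags = 0x10 if mime else 0  # readable later from other methods


print(hasattr(BrokenMagic(mime=True), "flags"))
print(hasattr(FixedMagic(mime=True), "flags"))
```

Any later method that reads self.flags, such as _handle509Bug, only works with the second form.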

ahupp commented on May 16, 2024

This should be fixed now. Thanks for the report and help with the fix!

mojotx commented on May 16, 2024

Are you going to bump the setup.py and make this an official change, or do I need to continue pulling the code from GitHub?

goodwillcoding commented on May 16, 2024

Bump. Can this be released to PyPI please?

goodwillcoding commented on May 16, 2024

This or something very similar is still an issue with 5.15 on Arch Linux. I am experiencing it with an empty file.

In my case the exception does not actually have a message attribute, so I get a double whammy.

Traceback:

$ python reproduce_bug.py 
Traceback (most recent call last):
  File "/tmp/python-magic-bug/libs/python-magic/magic.py", line 67, in from_buffer
    return magic_buffer(self.cookie, buf)
  File "/tmp/python-magic-bug/libs/python-magic/magic.py", line 227, in magic_buffer
    return _magic_buffer(cookie, buf, len(buf))
  File "/tmp/python-magic-bug/libs/python-magic/magic.py", line 180, in errorcheck_null
    raise MagicException(err)
magic.MagicException: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "reproduce_bug.py", line 10, in <module>
    mime_magic.from_buffer(blob_chunk)
  File "/tmp/python-magic-bug/libs/python-magic/magic.py", line 69, in from_buffer
    return self._handle509Bug(e)
  File "/tmp/python-magic-bug/libs/python-magic/magic.py", line 88, in _handle509Bug
    if e.message is None and (self.flags & MAGIC_MIME):
AttributeError: 'MagicException' object has no attribute 'message'

OS and library version:
ArchLinux
libmagic version: 5.15-1

How to reproduce:

Create virtualenv with latest python-magic checkout

Create empty file

$ touch empty_file

Create reproduce_bug.py

import magic
mime_magic = magic.Magic(mime_encoding=True)
empty_file = open("empty_file")
blob_chunk = empty_file.read()
mime_magic.from_buffer(blob_chunk)
empty_file.close()

Run test case

$ python ./reproduce_bug.py

mojotx commented on May 16, 2024

@goodwillcoding

Out of curiosity, I ran your test on two Ubuntu versions:

  • Ubuntu 13.10, libmagic 5.11-2ubuntu4
  • Ubuntu 12.04 LTS, libmagic1 5.09-2

In both cases I had the latest Git version of python-magic. I don't get an exception; instead I get "None" returned when it analyzes the blob_chunk.

Changed your code to print result:

import magic, pprint
mime_magic = magic.Magic(mime_encoding=True)
empty_file = open("empty_file")
blob_chunk = empty_file.read()
pprint.pprint( mime_magic.from_buffer(blob_chunk) )
empty_file.close()

Here's the output:

$ python reproduce_bug.py 
None

Also, in both cases, running the "file" command gave this mime type:

$ file --mime-type empty_file 
empty_file: inode/x-empty

goodwillcoding commented on May 16, 2024

This works for me as well, so clearly file is able to figure it out.

$ file --mime-type empty_file 
empty_file: inode/x-empty

goodwillcoding commented on May 16, 2024

@mojotx

So a little more info: I am using Python 3. Under Python 2 I get the same output as you (I probably should have mentioned it before, but it slipped my mind).

There are apparently Python bindings included with the "file" distribution here: ftp://ftp.astron.com/pub/file/
Extract the archive and look in ./python

The bindings were able to identify the file as empty just fine, so I suspect libmagic works fine; it might be that some API changed and python-magic is tripping up on it.

goodwillcoding commented on May 16, 2024

@mojotx
FYI, the file Python bindings are able to identify the file correctly under Python 2.7 and 3.3.

goodwillcoding commented on May 16, 2024

@mojotx
OK, so for the record, the example Python bindings included with file also return None as the result when trying to read the MIME type using from_buffer.

python-magic, on the other hand, treats the None return as an error. It then tries to get the proper error using magic_error(args[0]) on line 179, but magic_error also returns None. That None is passed as the parameter to MagicException, and when _handle509Bug handles the exception it errors out, since the exception has no message attribute.

The "file" utility, however, returns the MIME type as inode/x-empty, so something is wrong somewhere.

goodwillcoding commented on May 16, 2024

OK, I think I located the bug in libmagic: the file_buffer function does not handle the case where the flag is set to MAGIC_MIME_ENCODING. I am going to try to locate the maintainer and see whether my fix is correct, or whether it's late and I am tripping.

goodwillcoding commented on May 16, 2024

@mojotx

I might have found a bug and fix in libmagic

See: https://github.com/glensc/file/blob/master/src/funcs.c#L176

Line should probably be this:

if ((!mime || (mime & MAGIC_MIME_TYPE) || (mime & MAGIC_MIME_ENCODING)) &&

This way when python-magic has mime_encoding=True it still works

mojotx commented on May 16, 2024

@goodwillcoding

You can report the bug here:

http://bugs.gw.com/my_view_page.php

goodwillcoding commented on May 16, 2024

@mojotx @ahupp

So I was able to reproduce the condition using file.

Determining the MIME encoding gives the same error:

$ file --special-files --mime-encoding empty_file 
empty_file: ERROR: (null)

However, running

$ file --mime-encoding empty_file 
empty_file: binary

"--special-files" is a flag that tells file to skip "stat"-style detection, treat the file as a normal file, and read it into a buffer to be analyzed, so as far as I can tell it uses magic_buffer rather than magic_file.

There is a similar but more extensive condition with a 1-byte file

Create 1-byte file

echo -n 1 > onebytefile

$ file --special-files --mime-encoding onebytefile 
onebytefile: ERROR: (null)

$ file --mime-encoding onebytefile 
onebytefile: ERROR: (null)

In short, libmagic is not actually detecting the MIME encoding for files that are 0 or 1 byte in length when reading from a buffer. In the case of the 1-byte file, the "stat" detection does not help either.

So my takeaway is that libmagic needs to add MIME-encoding handling for these 3 conditions.
I am going to try to investigate this more and file a bug (and maybe a potential fix).

Now to the handle509 fix. I get 2 errors when I run 5.15 against an empty (and now a 1-byte) file. I am not sure exactly which case _handle509Bug handles (since I could not reproduce it), but I would suggest it is a bit broad.

if e.message is None and (self.flags & MAGIC_MIME):
    return "application/octet-stream"

For one, it returns "application/octet-stream" for both MAGIC_MIME_TYPE and MAGIC_MIME_ENCODING, since
MAGIC_MIME = MAGIC_MIME_TYPE|MAGIC_MIME_ENCODING

So probably a more precise fix is:
if e.message is None and (self.flags & MAGIC_MIME_TYPE):
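
A small sketch of the bitmask point, using the flag values from libmagic's magic.h. The constants defined inside python-magic's magic.py may not match these, and the assumption that mime_encoding=True sets only MAGIC_MIME_ENCODING is mine.

```python
# Values as defined in libmagic's magic.h.
MAGIC_MIME_TYPE = 0x000010
MAGIC_MIME_ENCODING = 0x000400
MAGIC_MIME = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING

# Assumed to be what Magic(mime_encoding=True) ends up with.
flags = MAGIC_MIME_ENCODING

# The broad test matches even though only the encoding was requested.
print(bool(flags & MAGIC_MIME))

# The narrower test matches only when the MIME type was requested.
print(bool(flags & MAGIC_MIME_TYPE))
```

With the broad test, a caller asking only for the encoding would get "application/octet-stream" back, which is a MIME type rather than an encoding.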

Additionally, in the case I experienced, the err passed into MagicException is None, since apparently this is not an error that libmagic handles (which might be its own bug). As such, the _handle509Bug method actually causes its own bug.

So the full, more precise fix might be:

class MagicException(Exception):
    def __init__(self, magic_err, *args, **kwargs):
        self.magic_err = magic_err
        Exception.__init__(self, magic_err, *args, **kwargs)

And in _handle509Bug:

    if e.magic_err is not None and e.message is None and (self.flags & MAGIC_MIME):
        return "application/octet-stream"
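
For reference, a minimal runnable sketch of the exception class above. It assumes nothing about libmagic itself; the raise just simulates magic_error() having returned None, and error_text is a hypothetical helper. The point is that a handler can read e.magic_err directly instead of e.message, which Python 3 exceptions do not provide.

```python
class MagicException(Exception):
    def __init__(self, magic_err, *args):
        # Keep libmagic's error text (possibly None) on its own attribute.
        self.magic_err = magic_err
        Exception.__init__(self, magic_err, *args)


def error_text(exc):
    """Return the stored libmagic error, or a placeholder when there is none."""
    if exc.magic_err is None:
        return "(no error reported by libmagic)"
    return exc.magic_err


try:
    # Simulates libmagic returning NULL with no error set.
    raise MagicException(None)
except MagicException as e:
    print(error_text(e))
```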

If I may ask, was a bug filed with libmagic to handle the condition described in _handle509Bug?

goodwillcoding commented on May 16, 2024

Let me know if the updated fix above is more reasonable and I'll do a pull request.

goodwillcoding commented on May 16, 2024

@mojotx @ahupp
One more question: why is the MIME type returned in _handle509Bug "application/octet-stream"? Does this happen only for binary files? Can you maybe provide the file that is throwing the exception?

I ask because python-magic is now doing the job of libmagic, which can be problematic if all use cases are not fleshed out.

goodwillcoding commented on May 16, 2024

Additionally, if the fix only handles 5.09 issues, we should check the magic lib for that version.

mojotx commented on May 16, 2024

@goodwillcoding In the case of my particular bug (certain Microsoft Office documents not being identified) the problem was resolved in later versions of file/libmagic. I compiled 5.15 from source and that "fixed" my particular problem. I've filed a bug with Canonical/Launchpad (https://bugs.launchpad.net/ubuntu/+source/file/+bug/1243938) asking that the Ubuntu 12.04 LTS version of file and libmagic1 be updated, but I filed no bug report with the maintainers of actual file/libmagic source.

It sounds like you've found some further bugs, with 5.15, and it sounds entirely reasonable to file a bug report with them. Once you file it, if you need assistance with people chiming in saying "me too!" to get it fixed, just let me know. Unfortunately, when it comes to open source projects, the squeaky wheel is often the only one greased.

goodwillcoding commented on May 16, 2024

@mojotx @ahupp

Bug filed with file/libmagic maintainers: http://bugs.gw.com/view.php?id=294

mojotx commented on May 16, 2024

@goodwillcoding I left a "me too" note on your bug report.

goodwillcoding commented on May 16, 2024

@mojotx thank you.

By the way do you know which magic version fixed your bug? Can you check please? Or give me a sample file to check it on.

mojotx commented on May 16, 2024

I can't share the sample file. I compiled 5.10 from source and that worked, using a C program linked with libmagic1.

Here's the C code I used. I had to copy the magic.h file from the source (there is no libmagic1-dev), but I just passed the Windows document name as an argument.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include "magic.h"

#if 0
#define MY_MAGIC_FLAGS (MAGIC_NONE|MAGIC_DEBUG|MAGIC_MIME)
#else
#define MY_MAGIC_FLAGS (MAGIC_NONE|MAGIC_MIME)
#endif


int main(int argc, const char **argv)
{
    magic_t cookie;
    const char *fileInfo;
    const char **fp=argv;

    if (argc<2) {
        fprintf(stderr, "Usage:  %s [filename1 . . . filenameN]\n", *fp );
        return 0;
    }

    if ((cookie=magic_open( MY_MAGIC_FLAGS ) ) == NULL ) {
        int err=errno;
        const char *me = magic_error( cookie );
        fprintf( stderr, "Error opening the magic file:  %s (%s)\n", strerror(err), (me ? me : "NULL"));
        return -1;
    }

    if ((magic_load(cookie, NULL )) != 0) {
        int err=errno;
        const char *me = magic_error( cookie );
        fprintf( stderr, "Error opening the magic database:  %s (%s)\n", strerror(err), (me ? me : "NULL"));
        return -1;
    }

    for (++fp; *fp;  ++fp ) {
        printf( "processing %s\n", *fp );


        if ((fileInfo=magic_file( cookie, *fp )) == NULL ) {
            int err=errno;
            const char *me = magic_error( cookie );
            fprintf( stderr, "Error analyzing the file %s:  %s (%s)\n", *fp, strerror(err), (me ? me : "NULL"));
            return -1;
        }
        printf( "fileInfo=\"%s\" %s\n", fileInfo, *fp );
    }

    magic_close( cookie );

    return 0;
}

mojotx commented on May 16, 2024

Also, I used C to prove that it was an issue with the actual libmagic library, and not something in python-magic. My bug was really two separate issues:

  • A bug in the libmagic1 implementation for Ubuntu 12.04 LTS, not handling some common file types correctly. Since my issue appears to have been fixed with 5.10, I didn't bother the maintainers of file/libmagic, but instead filed the Launchpad bug to try to get a fix pushed for 12.04 LTS.
  • A bug in python-magic, not handling the libmagic1 bug gracefully

goodwillcoding commented on May 16, 2024

@ahupp @mojotx

Based on the above conversation, I am doing a pull request that still accommodates the problem described by @mojotx but does not cause problems when other libmagic errors occur, like the one I filed on empty/1-byte files. It is the same fix, but it only runs when the right conditions are met:

  • magic version < 510
  • when magic_error actually returns an error and its message can be checked.
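
One possible reading of the two conditions above, as a sketch. The function name, its parameters, and the integer version form (509, 515) are all hypothetical, and the code in the feature branch may differ.

```python
MAGIC_MIME_TYPE = 0x000010  # value from libmagic's magic.h


def should_apply_509_workaround(libmagic_version, magic_err, flags):
    """Decide whether to substitute 'application/octet-stream'.

    Hypothetical gate: libmagic_version is an integer such as 509 or 515,
    magic_err is whatever magic_error() returned (None if nothing).
    """
    if libmagic_version >= 510:
        return False  # only libmagic older than 5.10 is covered
    if magic_err is None:
        return False  # libmagic reported no error, so nothing to check
    return bool(flags & MAGIC_MIME_TYPE)


print(should_apply_509_workaround(509, "some libmagic error", MAGIC_MIME_TYPE))
print(should_apply_509_workaround(515, None, MAGIC_MIME_TYPE))
```

Under this reading, the empty-file case on 5.15 falls through to the normal error path instead of being caught by the workaround.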

@mojotx can you test the fix using 509, your file and my feature branch here: https://github.com/goodwillcoding/python-magic/tree/updated_509_fix
