mime-detective's People

Contributors

andersonpimentel, muraad, wewebber

mime-detective's Issues

XML File Type

Hi,

I tried the definition below to get the MIME type for an XML file. I took the XML byte array definition from MimeTypes.cs and converted it from hex to decimal, but the byte sequence does not match in some XML file cases.

private static readonly byte[] XML = { 114, 115, 105, 111, 110, 61, 34, 49, 46, 48, 34, 63, 62 }; // converted from HEX to DEC from MimeTypes.cs

private static readonly byte[] XML = { 60, 99, 101, 114, 116, 105, 102, 105, 99, 97 };// this works in most cases except for some

Any suggestions?
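
Decoded as ASCII, the first array is `rsion="1.0"?>` (the tail of an XML declaration) and the second is `<certifica` (the start of one particular root element), which would explain why neither matches every XML file. A more general check is to look for `<?xml` at the start of the file, after an optional UTF-8 byte order mark. The following is only a sketch under assumptions: `XmlSniffer` and `LooksLikeXml` are hypothetical names, not part of Mime-Detective, and the check deliberately tolerates leading whitespace even though strict XML does not allow it before the declaration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Mime-Detective.
public static class XmlSniffer
{
    // "<?xml" in ASCII: 60, 63, 120, 109, 108.
    private static readonly byte[] XmlDeclaration = Encoding.ASCII.GetBytes("<?xml");

    public static bool LooksLikeXml(byte[] header)
    {
        if (header == null) return false;
        int pos = 0;

        // Skip an optional UTF-8 byte order mark (EF BB BF).
        if (header.Length >= 3 && header[0] == 0xEF && header[1] == 0xBB && header[2] == 0xBF)
            pos = 3;

        // Skip leading ASCII whitespace (space, tab, CR, LF).
        while (pos < header.Length &&
               (header[pos] == 0x20 || header[pos] == 0x09 || header[pos] == 0x0D || header[pos] == 0x0A))
            pos++;

        // Compare the next bytes with "<?xml".
        if (header.Length - pos < XmlDeclaration.Length) return false;
        for (int i = 0; i < XmlDeclaration.Length; i++)
        {
            if (header[pos + i] != XmlDeclaration[i]) return false;
        }
        return true;
    }
}
```

This still misses XML files that omit the declaration and files encoded as UTF-16 or UTF-32, so it is a starting point rather than a complete answer.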

What is the use of offset?

I don't get why we have to use an offset, for example for doc files. The first 4 bytes of doc files are always the same, so can't we use them to detect the file type?
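
One reason an offset helps: legacy Office files (.doc, .xls, .ppt) and Outlook .msg files are all OLE2 compound files and share the same 8-byte magic number at position 0, so the first bytes alone cannot tell them apart. A second signature deeper in the file can. Below is a minimal sketch of offset-based matching; `SignatureMatcher` and `MatchesAt` are hypothetical names, and null entries are assumed to act as wildcards.

```csharp
using System;

// Hypothetical helper, not part of Mime-Detective.
public static class SignatureMatcher
{
    // Magic number shared by all OLE2 compound files (.doc, .xls, .ppt, .msg).
    public static readonly byte?[] Ole2Header = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true when every non-null signature byte equals the data byte
    // at the same position counted from 'offset'. A null entry matches any byte.
    public static bool MatchesAt(byte[] data, byte?[] signature, int offset)
    {
        if (data == null || signature == null || offset < 0) return false;
        if (data.Length - offset < signature.Length) return false;

        for (int i = 0; i < signature.Length; i++)
        {
            if (signature[i].HasValue && data[offset + i] != signature[i].Value)
                return false;
        }
        return true;
    }
}
```

With something like this, a detector could first call `MatchesAt(data, SignatureMatcher.Ole2Header, 0)` to recognise the container, and then test a type-specific signature at a later offset to decide which kind of document it is.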

.msg files are undetected

.msg files are undetected. I need to get the application/vnd.ms-outlook MIME type for them. Can you please add it?

Thanks.

PDF files detected as text/plain

This example PDF file gets detected as text/plain when only the first MaxHeaderSize bytes are used for the detection: http://www.orimi.com/pdf-test.pdf

I would run the file signature detection before checking for plain text files.

public static FileType GetFileType(Func<byte[]> fileHeaderReadFunc, string fileFullName = "")
{
    // if none of the types match, return null
    FileType fileType = null;

    // read first n-bytes from the file
    byte[] fileHeader = fileHeaderReadFunc();

    // compare the file header to the stored file headers
    foreach (FileType type in types)
    {
        int matchingCount = GetFileMatchingCount(fileHeader, type);
        if (matchingCount == type.Header.Length)
        {
            // check for docx and xlsx only if a file name is given
            // there may be situations where the file name is not given
            // or it is impractical to write a temp file to get the FileInfo
            if (type.Equals(ZIP) && !String.IsNullOrEmpty(fileFullName))
                fileType = CheckForDocxAndXlsx(type, fileFullName);
            else
                fileType = type;    // if all the bytes match, return the type

            break;
        }
    }

    if (fileType == null)
    {
        // nothing found yet; maybe just plain text?
        // treat the data as binary if it contains a zero byte (not exact, but should do the job)
        // note: UTF-16 and UTF-32 text usually contains zero bytes, so it is not detected as text here
        if (!fileHeader.Any(b => b == 0))
        {
            fileType = TXT;
        }

        // this would be the place to add detection based on file extension e.g. .csv

    }

    return fileType;
}

Undetected PDF that can still be opened in a PDF reader

Hi.
Recently I received a PDF document that was not corrupt and could be opened in a PDF reader, but was not detected as a PDF by Mime-Detective.
The PDF standard says that a PDF document should start with the magic number and a version number. See 'Technical overview - File structure' here: https://en.wikipedia.org/wiki/PDF But the document that I received started with a new line and a few stray bytes (shown as òÀ� in a text editor), followed by the magic number and version number. You can replicate this by taking any working PDF document and adding those bytes to the beginning of the file in a text editor. Setting the PDF type offset to 4 makes Mime-Detective detect it as a PDF, since it skips the added gibberish.
The question is: since PDF readers can safely open such documents, shouldn't Mime-Detective detect it as a valid PDF document?
The problem seems to be in the GetFileMatchingCount method in the MimeTypes class. It expects the header to be the first thing it sees and breaks out immediately otherwise.
Cheers!
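
One lenient approach, instead of a fixed offset, is to search for the `%PDF-` marker anywhere in a small window at the start of the file. Adobe's implementation notes for the PDF reference describe Acrobat viewers as accepting the header anywhere within the first 1024 bytes, so 1024 is used as the default window here. This is only a sketch: `PdfSniffer` and `FindPdfHeader` are hypothetical names, not Mime-Detective API, and the header buffer passed in would need to be at least as large as the window.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Mime-Detective.
public static class PdfSniffer
{
    // "%PDF-" in ASCII: 37, 80, 68, 70, 45.
    private static readonly byte[] PdfMagic = Encoding.ASCII.GetBytes("%PDF-");

    // Returns the index at which "%PDF-" starts, provided the whole marker
    // lies within the first 'searchLimit' bytes; returns -1 otherwise.
    public static int FindPdfHeader(byte[] data, int searchLimit = 1024)
    {
        if (data == null) return -1;

        int lastStart = Math.Min(data.Length, searchLimit) - PdfMagic.Length;
        for (int start = 0; start <= lastStart; start++)
        {
            int i = 0;
            while (i < PdfMagic.Length && data[start + i] == PdfMagic[i]) i++;
            if (i == PdfMagic.Length) return start;
        }
        return -1;
    }
}
```

The trade-off is more false positives: a plain text file that happens to mention `%PDF-` near the top would also match, so the window should stay small.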

Position into the stream not reset

When using GetFileType, the position in the stream is modified.
I had trouble working out why my stream was suddenly treated as 0 bytes long: right after calling GetFileType, I tried to upload it somewhere.
I suggest saving the current position of the stream before reading and restoring it once the data has been copied from the stream.
In MimeTypes.cs:

  • line 214 : long currentPosition = stream.Position;
  • line 224 : stream.Position = currentPosition;
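
Until something like that is in the library, the same idea can be applied on the caller's side. The sketch below reads the header bytes and then puts the stream back where it was; `StreamHeaderReader` and `ReadHeader` are hypothetical names, and a seekable stream is assumed.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Mime-Detective.
public static class StreamHeaderReader
{
    // Reads up to maxHeaderSize bytes from the current position and then
    // restores the position, so later readers see the stream unchanged.
    public static byte[] ReadHeader(Stream stream, int maxHeaderSize)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new NotSupportedException("Stream must be seekable to restore its position.");

        long originalPosition = stream.Position;
        try
        {
            byte[] buffer = new byte[maxHeaderSize];
            int total = 0;
            int read;
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the stream is reached.
            while (total < buffer.Length &&
                   (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
            if (total < buffer.Length) Array.Resize(ref buffer, total);
            return buffer;
        }
        finally
        {
            // Restore the position even if reading throws.
            stream.Position = originalPosition;
        }
    }
}
```

The returned bytes can then be handed to the `Func<byte[]>` overload of GetFileType, for example `GetFileType(() => header)`.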

Support svg

Can you please add support for sniffing SVG? That's the only thing missing for me to use this package for image content format sniffing.
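
SVG is XML text, so it has no fixed magic number; a common heuristic is to look for an `<svg` tag in the first bytes of the file. Here is a rough sketch of that heuristic, assuming UTF-8 content. `SvgSniffer` and `LooksLikeSvg` are hypothetical names, not part of the package.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Mime-Detective.
public static class SvgSniffer
{
    public static bool LooksLikeSvg(byte[] header)
    {
        if (header == null || header.Length == 0) return false;

        // Binary formats usually contain zero bytes; UTF-8 SVG text should not.
        if (Array.IndexOf(header, (byte)0) >= 0) return false;

        // Decode the header as UTF-8 and look for the opening svg tag.
        string text = Encoding.UTF8.GetString(header);
        return text.IndexOf("<svg", StringComparison.Ordinal) >= 0;
    }
}
```

Known gaps: compressed .svgz files are not recognised, an HTML page with inline SVG near the top would match too, and a long comment or DOCTYPE can push the `<svg` tag past the bytes that were read.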

GPL 2 license. Evaluate possibility to switch to MIT/Apache 2.0

My "fork" started as a short hack, but since it was mentioned on StackOverflow it has been getting some attention.

The original code from https://filetypedetective.codeplex.com/ is GPL2 licensed. That means you have to be open source too if you use this, unless you use it only internally and don't distribute it.

GPL 2 is "problematic" for many use cases, I think. You have to be careful and know what you are doing.
Switching to MIT or Apache 2.0 would make usage much easier. But we need permission to do this from trailmax (https://www.codeplex.com/site/users/view/trailmax)

@trailmax if you have time, can you comment on this issue and tell me your thoughts about it?
