muraad / mime-detective

Mime type for files.

License: MIT License

C# 100.00%

mime-detective's Issues

Add Unit Tests for content detection

IndexOutOfRangeException for short files

I created a simple 1.txt file with the short content "1" and got a System.IndexOutOfRangeException.
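
A likely cause, assuming the library compares each signature byte by byte against the header without first checking the header's length (the issue includes no stack trace), is that a one-byte file is shorter than most signatures. A minimal sketch of a bounds-checked comparison follows; `SafeHeaderMatch.Matches` is a hypothetical helper, not Mime-Detective's API, and it treats a `null` entry in the `byte?[]` signature as a wildcard.

```csharp
using System;

// Hypothetical helper (not the library's GetFileMatchingCount): compares a
// signature against the header at a given offset without ever indexing past
// the end of the header array.
public static class SafeHeaderMatch
{
    // A null entry in the signature acts as a wildcard, like byte?[] in FileType.
    public static bool Matches(byte[] header, byte?[] signature, int offset = 0)
    {
        if (header == null || signature == null || offset < 0)
            return false;

        // A file shorter than offset + signature length cannot match,
        // so bail out instead of reading out of range.
        if (header.Length < offset + signature.Length)
            return false;

        for (int i = 0; i < signature.Length; i++)
        {
            if (signature[i].HasValue && header[offset + i] != signature[i].Value)
                return false;
        }
        return true;
    }
}
```

With this check, a one-byte file such as `1.txt` simply fails to match every signature instead of throwing.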

XML File Type

Hi,

I have tried to use the definition below to get the MIME type for an XML file. I took the XML byte array definition from MimeTypes.cs and converted it from hex to decimal, as shown below, but I am not getting the correct byte sequence for some XML files.

private static readonly byte[] XML = { 114, 115, 105, 111, 110, 61, 34, 49, 46, 48, 34, 63, 62 }; // converted from hex to decimal from MimeTypes.cs; ASCII: rsion="1.0"?>

private static readonly byte[] XML = { 60, 99, 101, 114, 116, 105, 102, 105, 99, 97 }; // ASCII: <certifica; this works in most cases, except for some

Any suggestions?
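
A fixed byte sequence can only match one exact spelling of the XML declaration, so declarations that add `encoding` or `standalone` attributes are missed. One alternative, sketched here as an assumption rather than how Mime-Detective works, is to skip an optional UTF-8 byte order mark and leading whitespace and then check for `<?xml`. `XmlSniffer.LooksLikeXml` is a hypothetical name, and the sketch handles neither UTF-16 files nor XML documents that have no declaration.

```csharp
using System;
using System.Text;

// Hypothetical helper: a tolerant XML sniff based on the XML declaration
// rather than a fixed byte signature. Assumes UTF-8 or ASCII content.
public static class XmlSniffer
{
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };
    private static readonly byte[] XmlDecl = Encoding.ASCII.GetBytes("<?xml");

    public static bool LooksLikeXml(byte[] header)
    {
        if (header == null) return false;
        int i = 0;

        // Skip an optional UTF-8 byte order mark.
        if (header.Length >= 3 && header[0] == Utf8Bom[0] && header[1] == Utf8Bom[1] && header[2] == Utf8Bom[2])
            i = 3;

        // Skip leading ASCII whitespace (space, tab, CR, LF).
        while (i < header.Length && (header[i] == 0x20 || header[i] == 0x09 || header[i] == 0x0D || header[i] == 0x0A))
            i++;

        // The remaining bytes must start with "<?xml".
        if (header.Length - i < XmlDecl.Length) return false;
        for (int j = 0; j < XmlDecl.Length; j++)
        {
            if (header[i + j] != XmlDecl[j]) return false;
        }
        return true;
    }
}
```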

What is the use of offset?

I don't understand why we have to use an offset, for example for .doc files, since the first 4 bytes of .doc files are always the same and could be used to detect the file type.
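
The first bytes of a .doc file are the OLE2 compound file header (D0 CF 11 E0 A1 B1 1A E1), which .xls, .ppt and .msg files share, so the start of the file alone cannot tell them apart; a type-specific marker has to be read further into the file, and an offset expresses where. The sketch below assumes a Word marker of EC A5 C1 00 at offset 512 purely for illustration (that is where the Word FIB commonly sits, but the compound file format does not guarantee it); `OffsetSignatures` is a hypothetical class, and the definitions the library actually uses are in MimeTypes.cs.

```csharp
using System;

public static class OffsetSignatures
{
    // OLE2 compound file header shared by .doc, .xls, .ppt and .msg.
    public static readonly byte[] Ole2Header = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Assumed Word marker, checked at offset 512 (illustrative values).
    public static readonly byte[] WordMarker = { 0xEC, 0xA5, 0xC1, 0x00 };
    public const int WordMarkerOffset = 512;

    // True when 'signature' appears in 'data' starting exactly at 'offset'.
    public static bool MatchesAt(byte[] data, byte[] signature, int offset)
    {
        if (offset < 0 || data.Length < offset + signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[offset + i] != signature[i]) return false;
        }
        return true;
    }

    public static string Classify(byte[] data)
    {
        if (!MatchesAt(data, Ole2Header, 0)) return "not OLE2";
        // Every OLE2 file gets this far, so only the marker at the offset
        // distinguishes a Word document from the other Office formats.
        return MatchesAt(data, WordMarker, WordMarkerOffset) ? "doc" : "other OLE2 (xls, ppt, msg, ...)";
    }
}
```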

.NET Standard 1.3 Port

Just a heads-up: I made a fork of this to begin porting it to .NET Core. The fork currently builds on 1.0.0-RC2 using the netstandard1.5 definitions.

https://github.com/clarkis117/Mime-Detective

.msg files are undetected

.msg files are not detected. I need the application/vnd.ms-outlook MIME type. Could you please add it?

Thanks.

PDF files detected as text/plain

This example PDF file is detected as text/plain when the first MaxHeaderSize bytes are used for detection: http://www.orimi.com/pdf-test.pdf

I would run the file signature detection before checking for plain text files.

public static FileType GetFileType(Func<byte[]> fileHeaderReadFunc, string fileFullName = "")
{
    // if none of the types match, return null
    FileType fileType = null;

    // read first n-bytes from the file
    byte[] fileHeader = fileHeaderReadFunc();

    // compare the file header to the stored file headers
    foreach (FileType type in types)
    {
        int matchingCount = GetFileMatchingCount(fileHeader, type);
        if (matchingCount == type.Header.Length)
        {
            // check for docx and xlsx only if a file name is given;
            // there may be situations where the file name is not given
            // or it is impractical to write a temp file to get the FileInfo
            if (type.Equals(ZIP) && !String.IsNullOrEmpty(fileFullName))
                fileType = CheckForDocxAndXlsx(type, fileFullName);
            else
                fileType = type;    // if all the bytes match, return the type

            break;
        }
    }

    if (fileType == null)
    {
        // nothing found yet; maybe it's just plain text?
        // treat the file as text if the header contains no zero bytes (a rough binary check)
        // this won't work for UTF-16 or UTF-32 files, which do contain zero bytes
        // (Any() requires using System.Linq;)
        if (!fileHeader.Any(b => b == 0))
        {
            fileType = TXT;
        }

        // this would be the place to add detection based on file extension e.g. .csv

    }

    return fileType;
}

New version for NuGet?

That would be good.

Undetected PDF that can still be opened in pdf reader.

Hi.
Recently I received a PDF document that was not corrupt and could be opened in a PDF reader, but was not detected as a PDF by Mime-Detective.
The PDF standard says that a PDF document should start with the magic number and a version number (see "Technical overview - File structure" here: https://en.wikipedia.org/wiki/PDF). But the document I received started with a newline and the characters òÀ�, followed by the magic number and version number. You can reproduce this by taking any working PDF document and adding these characters to the beginning of the file in a text editor. Setting the PDF type's offset to 4 makes Mime-Detective detect it as a PDF, since it skips the added garbage.
The issue is: since PDF readers can safely open such documents, shouldn't Mime-Detective detect them as valid PDF documents?
The problem seems to be in the GetFileMatchingCount method of the MimeTypes class: it expects the header to be the first thing in the file and breaks out immediately when it is not.
Cheers!
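
Readers are lenient here: the implementation notes in Adobe's PDF Reference state that Acrobat only requires the `%PDF-` header to appear somewhere within the first 1024 bytes. Rather than hard-coding an offset of 4, a detector could search that window. A minimal sketch under that assumption; `PdfSniffer.FindPdfHeader` is a hypothetical helper, not part of Mime-Detective.

```csharp
using System;

public static class PdfSniffer
{
    // "%PDF-" in ASCII.
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46, 0x2D };
    private const int SearchWindow = 1024;

    // Returns the offset of "%PDF-" if the whole marker lies within the first
    // 1024 bytes of 'data', or -1 if it is absent.
    public static int FindPdfHeader(byte[] data)
    {
        if (data == null) return -1;
        int lastStart = Math.Min(data.Length, SearchWindow) - PdfMagic.Length;
        for (int start = 0; start <= lastStart; start++)
        {
            int j = 0;
            while (j < PdfMagic.Length && data[start + j] == PdfMagic[j]) j++;
            if (j == PdfMagic.Length) return start;
        }
        return -1;
    }
}
```

For the document described above (a newline plus three junk bytes before `%PDF-`), the search returns 4, the same offset that fixed it by hand.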

Position into the stream not reset

When using GetFileType, the position of the stream is changed.
I had trouble figuring out why my stream was suddenly considered to be 0 bytes long, since right after calling GetFileType I tried to upload it somewhere.
I suggest saving the current position of the stream and restoring it afterwards, once the data has been copied from the stream.
In MimeTypes.cs:

line 214: long currentPosition = stream.Position;
line 224: stream.Position = currentPosition;
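
The same idea can be wrapped in a try/finally block so the position is restored even if reading throws. A minimal sketch, assuming a seekable stream; `HeaderReader.ReadHeader` is a hypothetical helper, not the code in MimeTypes.cs.

```csharp
using System;
using System.IO;

public static class HeaderReader
{
    // Reads up to maxBytes from the current position, then restores the
    // stream's original position so callers can keep using the stream.
    public static byte[] ReadHeader(Stream stream, int maxBytes)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));
        if (!stream.CanSeek) throw new NotSupportedException("Stream must be seekable to restore its position.");

        long originalPosition = stream.Position;
        try
        {
            var buffer = new byte[maxBytes];
            int total = 0;
            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the buffer is full or the stream ends.
            while (total < maxBytes)
            {
                int read = stream.Read(buffer, total, maxBytes - total);
                if (read == 0) break;
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
        finally
        {
            // Runs even if Read throws, so the position is always restored.
            stream.Position = originalPosition;
        }
    }
}
```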

Not working for .docx files: they are detected as zip
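
.docx (like .xlsx and .pptx) files are ZIP archives, so they begin with the ZIP signature 50 4B 03 04 ("PK") and a header-only check reports zip. Telling them apart means looking inside the archive; the suggested GetFileType code in the PDF issue above does this through CheckForDocxAndXlsx, but only when a file name is supplied. The sketch below is an assumption, not the library's implementation: it opens the stream with System.IO.Compression.ZipArchive and checks for the `[Content_Types].xml` part plus a `word/`, `xl/` or `ppt/` folder. `OoxmlSniffer.Classify` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class OoxmlSniffer
{
    // Classifies a ZIP-based file by the part names inside the archive.
    // Assumes a seekable stream positioned at the start of a ZIP file;
    // the stream is left open for the caller.
    public static string Classify(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            var names = archive.Entries.Select(e => e.FullName).ToList();
            // Office Open XML packages always contain [Content_Types].xml.
            bool isOpc = names.Contains("[Content_Types].xml");
            if (isOpc && names.Any(n => n.StartsWith("word/", StringComparison.Ordinal))) return "docx";
            if (isOpc && names.Any(n => n.StartsWith("xl/", StringComparison.Ordinal))) return "xlsx";
            if (isOpc && names.Any(n => n.StartsWith("ppt/", StringComparison.Ordinal))) return "pptx";
            return "zip";
        }
    }
}
```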

Remove WPF project and turn into a standalone library

I would like to turn this into a library without the WPF project. What are your thoughts on this?

MIME type detection for .bmp images is incorrect

new FileType(new byte?[] { 66, 77 }, "bmp", "image/gif");

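must be changed to the following (66, 77 and 0x42, 0x4D are the same bytes, ASCII "BM"; the actual fix is the MIME type):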

new FileType(new byte?[] { 0x42, 0x4D }, "bmp", "image/bmp"); // or image/x-windows-bmp

.webp files throw NullReferenceException

.webp image files throw a NullReferenceException when GetFileType().Mime is called.
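
Presumably GetFileType returns null when no type matches (the suggested code in the PDF issue above says "if none of the types match, return null"), so reading `.Mime` directly throws; a null-conditional such as `GetFileType(...)?.Mime` avoids the crash. Recognizing WebP itself needs a two-part signature, because a WebP file is a RIFF container: "RIFF" at offset 0, a 4-byte size that varies per file, then "WEBP" at offset 8. A minimal sketch; `WebpSniffer.IsWebp` is a hypothetical helper, not part of Mime-Detective.

```csharp
using System;

public static class WebpSniffer
{
    // WebP layout: "RIFF" + 4-byte little-endian chunk size + "WEBP".
    public static bool IsWebp(byte[] header)
    {
        if (header == null || header.Length < 12) return false;
        // Bytes 4..7 (the chunk size) are deliberately ignored; in a byte?[]
        // signature they would be null wildcards.
        return header[0] == (byte)'R' && header[1] == (byte)'I' && header[2] == (byte)'F' && header[3] == (byte)'F'
            && header[8] == (byte)'W' && header[9] == (byte)'E' && header[10] == (byte)'B' && header[11] == (byte)'P';
    }
}
```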

.msg file detected as PPT

Some Outlook .msg files with attachments are detected as PPT.
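
Outlook .msg and PowerPoint .ppt files are both OLE2 compound files, so a header match alone cannot separate them. One heuristic, an assumption and not what Mime-Detective does, is to search for stream names that belong to one format: MSG files store properties in streams named `__substg1.0_…`, while PPT files contain a `PowerPoint Document` stream. Compound file directory entries store names in UTF-16LE, so the sketch searches the raw bytes for that encoding and checks the MSG marker first, because an .msg with a .ppt attachment stores the attachment's bytes and can therefore contain both names. `OleFlavorSniffer` is a hypothetical class.

```csharp
using System;
using System.Text;

public static class OleFlavorSniffer
{
    private static readonly byte[] Ole2Header = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Directory entry names in a compound file are stored as UTF-16LE.
    private static readonly byte[] MsgMarker = Encoding.Unicode.GetBytes("__substg1.0_");
    private static readonly byte[] PptMarker = Encoding.Unicode.GetBytes("PowerPoint Document");

    public static string Classify(byte[] data)
    {
        if (data == null || !StartsWith(data, Ole2Header)) return "not OLE2";
        // MSG marker first: an attached .ppt brings its own stream names along.
        if (IndexOf(data, MsgMarker) >= 0) return "msg";
        if (IndexOf(data, PptMarker) >= 0) return "ppt";
        return "other OLE2";
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Naive byte-sequence search; returns the first index or -1.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```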

Support svg

Could you please add support for sniffing SVG? That's the only thing missing for me to use this package for sniffing image content formats.
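
SVG has no binary magic number: it is XML text, so sniffing it means decoding the start of the file and looking for an `<svg` root element, which may follow an XML declaration, comments or a DOCTYPE. A rough sketch, assuming UTF-8 content and that the root element appears within the first 1024 bytes; `SvgSniffer.LooksLikeSvg` is a hypothetical helper, and a `<svg` inside a leading comment would be a false positive.

```csharp
using System;
using System.Text;

public static class SvgSniffer
{
    public static bool LooksLikeSvg(byte[] header, int window = 1024)
    {
        if (header == null || header.Length == 0 || window <= 0) return false;

        // Decode at most 'window' bytes; a multi-byte character cut at the
        // boundary becomes U+FFFD, which does not affect the search.
        string text = Encoding.UTF8.GetString(header, 0, Math.Min(header.Length, window));

        // Ordinal match: SVG element names are case-sensitive.
        int index = text.IndexOf("<svg", StringComparison.Ordinal);
        while (index >= 0)
        {
            // Require the tag name to end here, so "<svgfoo" does not match.
            int after = index + 4;
            if (after < text.Length && IsNameEnd(text[after])) return true;
            index = text.IndexOf("<svg", after, StringComparison.Ordinal);
        }
        return false;
    }

    private static bool IsNameEnd(char c) =>
        c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '>' || c == '/';
}
```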

GPL 2 license: evaluate the possibility of switching to MIT/Apache 2.0

My "fork" started as a quick hack, but since it was mentioned on Stack Overflow it has been getting some attention.

The original code from https://filetypedetective.codeplex.com/ is licensed under GPL 2. That means that if you use this library, your code has to be open source too, EXCEPT if you use it only internally and don't distribute it.

GPL 2 is "problematic" for many use cases, I think. You have to be careful and know what you are doing.
Switching to MIT or Apache 2.0 would make usage much easier, but we need permission from trailmax (https://www.codeplex.com/site/users/view/trailmax) to do this.

@trailmax if you have time, could you comment on this issue and tell me your thoughts?
