muraad / mime-detective
Mime type for files.
License: MIT License
I created a simple 1.txt file with short content ("1") and got a System.IndexOutOfRangeException.
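A likely cause, assumed here because the library's matching code is not shown, is that the header comparison indexes past the end of a file that is shorter than the signature being tested. A bounds-checked comparison avoids that. The following is a minimal sketch with hypothetical names (SignatureMatcher, MatchesAt), not the library's own GetFileMatchingCount:

```csharp
using System;

// Hypothetical helper, not part of Mime-Detective: compares a signature
// against a file header without ever indexing past the end of either array.
public static class SignatureMatcher
{
    // A null entry in the signature acts as a wildcard byte, mirroring the
    // byte?[] headers used in the FileType definitions.
    public static bool MatchesAt(byte[] header, byte?[] signature, int offset = 0)
    {
        if (header == null || signature == null || offset < 0)
            return false;

        // A 1-byte file such as "1" is simply too short to match; report
        // "no match" instead of throwing IndexOutOfRangeException.
        if (header.Length - offset < signature.Length)
            return false;

        for (int i = 0; i < signature.Length; i++)
        {
            byte? expected = signature[i];
            if (expected.HasValue && header[offset + i] != expected.Value)
                return false;
        }
        return true;
    }
}
```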
Hi,
I have tried to use the definition below to get the MIME type for an XML file. I took the XML byte-array definition from MimeTypes.cs and converted it from hex to decimal, as shown below, but I am not getting the correct byte sequence for some XML files:
private static readonly byte[] XML = { 114, 115, 105, 111, 110, 61, 34, 49, 46, 48, 34, 63, 62 }; // converted from hex to decimal from MimeTypes.cs; ASCII: rsion="1.0"?>
private static readonly byte[] XML_ALT = { 60, 99, 101, 114, 116, 105, 102, 105, 99, 97 }; // alternative; ASCII: <certifica; this works in most cases, except for some
Any suggestions?
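Decoded as ASCII, the first array reads rsion="1.0"?>, the tail of <?xml version="1.0"?>. One likely reason it misses some files is that a declaration such as <?xml version="1.0" encoding="UTF-8"?> has more attributes after the version, so that exact byte run never occurs. The second array decodes to <certifica, which only matches documents whose root element happens to start that way. A looser check is sketched below under stated assumptions: it accepts a header that starts with "<?xml", optionally preceded by a UTF-8 byte order mark. XmlSniffer is a hypothetical name, and UTF-16 encoded XML or XML without a declaration is not recognized:

```csharp
using System;
using System.Text;

// Hypothetical heuristic, not the library's definition: treats a header as XML
// when it starts with "<?xml", optionally after a UTF-8 byte order mark.
// UTF-16 encoded XML and XML files without a declaration are not recognized.
public static class XmlSniffer
{
    private static readonly byte[] XmlDecl = Encoding.ASCII.GetBytes("<?xml");

    public static bool LooksLikeXml(byte[] header)
    {
        if (header == null) return false;
        int pos = 0;

        // Skip a UTF-8 byte order mark (EF BB BF) if present.
        if (header.Length >= 3 && header[0] == 0xEF && header[1] == 0xBB && header[2] == 0xBF)
            pos = 3;

        // The declaration must be the very first thing after the optional BOM.
        if (header.Length - pos < XmlDecl.Length) return false;
        for (int i = 0; i < XmlDecl.Length; i++)
            if (header[pos + i] != XmlDecl[i]) return false;
        return true;
    }
}
```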
I don't understand why we have to use an offset for, e.g., .doc files. Aren't the first 4 bytes of all .doc files the same, so that we could use them to detect the file type?
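Legacy .doc, .xls and .ppt files (and Outlook .msg files) are all OLE2 compound files, and they share the same first 8 bytes, D0 CF 11 E0 A1 B1 1A E1. The leading bytes therefore say "compound file" but not which application wrote it, which is presumably why a second signature at an offset is used. The sketch below illustrates such a two-stage check. The Word subheader EC A5 C1 00 at offset 512 is taken from commonly published file-signature tables, not from this repository, and OleSniffer is a hypothetical name:

```csharp
using System;

// Sketch: two-stage check for a legacy Word .doc file.
// Stage 1: the OLE2 compound-file signature shared by .doc, .xls, .ppt and .msg.
// Stage 2: an assumed Word-specific subheader at offset 512 (EC A5 C1 00), as
// listed in common file-signature tables; it is a heuristic, not a guarantee.
public static class OleSniffer
{
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    private static readonly byte[] WordSubheader = { 0xEC, 0xA5, 0xC1, 0x00 };
    private const int WordSubheaderOffset = 512;

    private static bool BytesAt(byte[] data, byte[] expected, int offset)
    {
        if (data == null || offset < 0 || data.Length - offset < expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
            if (data[offset + i] != expected[i]) return false;
        return true;
    }

    public static bool IsOleCompoundFile(byte[] header) => BytesAt(header, OleSignature, 0);

    // Needs at least 516 header bytes; a shorter read cannot confirm a .doc.
    public static bool LooksLikeWordDoc(byte[] header) =>
        IsOleCompoundFile(header) && BytesAt(header, WordSubheader, WordSubheaderOffset);
}
```

The subheader check presumably assumes that the Word stream begins in the first sector of the compound file, so a miss at offset 512 does not prove that the file is not a .doc.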
Just a heads-up: I made a fork of this to begin porting it to .NET Core. The fork currently builds on 1.0.0-RC2 using the netstandard1.5 definitions.
.msg files are not detected. I need to get the application/vnd.ms-outlook MIME type. Could you please add it?
Thanks.
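Outlook .msg files are OLE2 compound files as well, which would also explain why some of them are confused with other compound formats such as PPT. The [MS-OXMSG] specification defines a top-level stream named __properties_version1.0, and compound-file directory entries store names as UTF-16LE, so one heuristic is to look for that name in the raw bytes and report application/vnd.ms-outlook on a match. This is a sketch with a hypothetical class name, and it needs the whole file rather than a short header, because the directory can sit anywhere in the file:

```csharp
using System;
using System.Text;

// Heuristic sketch, not the library's API: an Outlook .msg file is an OLE2
// compound file whose directory contains a stream named
// "__properties_version1.0" (see [MS-OXMSG]). Directory entry names are stored
// as UTF-16LE, so we search the raw bytes for that encoding of the name.
// Pass the whole file, not just the first few hundred header bytes.
public static class MsgSniffer
{
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // Encoding.Unicode is UTF-16LE; GetBytes does not emit a byte order mark.
    private static readonly byte[] PropertiesStreamName = Encoding.Unicode.GetBytes("__properties_version1.0");

    public static bool LooksLikeOutlookMsg(byte[] fileBytes)
    {
        if (fileBytes == null || fileBytes.Length < OleSignature.Length) return false;
        for (int i = 0; i < OleSignature.Length; i++)
            if (fileBytes[i] != OleSignature[i]) return false;
        return IndexOf(fileBytes, PropertiesStreamName) >= 0;
    }

    // Naive byte-pattern search; adequate for a sketch.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```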
This example PDF file is detected as text/plain when only the first MaxHeaderSize bytes are used for detection: http://www.orimi.com/pdf-test.pdf
I would run file-signature detection before checking for plain-text files:
// requires: using System; using System.Linq; (for String.IsNullOrEmpty and Any)
public static FileType GetFileType(Func<byte[]> fileHeaderReadFunc, string fileFullName = "")
{
    // if none of the types match, return null
    FileType fileType = null;

    // read the first n bytes from the file
    byte[] fileHeader = fileHeaderReadFunc();

    // compare the file header to the stored file headers
    foreach (FileType type in types)
    {
        int matchingCount = GetFileMatchingCount(fileHeader, type);
        if (matchingCount == type.Header.Length)
        {
            // check for docx and xlsx only if a file name is given;
            // there may be situations where the file name is not given
            // or it is impractical to write a temp file to get the FileInfo
            if (type.Equals(ZIP) && !String.IsNullOrEmpty(fileFullName))
                fileType = CheckForDocxAndXlsx(type, fileFullName);
            else
                fileType = type; // if all the bytes match, return the type
            break;
        }
    }

    if (fileType == null)
    {
        // nothing found yet; maybe just plain text?
        // check whether it's binary (not exact, but should do the job);
        // this won't work with UTF-16 or UTF-32 files
        if (!fileHeader.Any(b => b == 0))
        {
            fileType = TXT;
        }
        // this would be the place to add detection based on file extension, e.g. .csv
    }

    return fileType;
}
It would be good.
Hi.
Recently I received a PDF document that was not corrupt and could be opened in a PDF reader, but was not detected as a PDF by Mime-Detective.
The PDF standard says that a PDF document should start with the magic number and a version number (see "Technical overview - File structure" here: https://en.wikipedia.org/wiki/PDF). But the document I received started with a newline and a few garbage bytes (òÀ�), followed by the magic number and version number. You can replicate this by taking any working PDF document and adding such bytes to the beginning of the file in a text editor. Setting the PDF type's offset to 4 makes Mime-Detective detect it as a PDF, since it then skips the added gibberish.
Since PDF readers can open such documents without problems, shouldn't Mime-Detective detect it as a valid PDF document too?
The problem seems to be in the GetFileMatchingCount method in the MimeTypes class: it expects the header to be the first thing it sees and breaks out immediately otherwise.
Cheers!
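Adobe's PDF Reference notes that Acrobat viewers only require the %PDF- header to appear somewhere within the first 1024 bytes of the file, which is consistent with the reader accepting this document. A detector could scan that window instead of requiring the marker at offset 0. A minimal sketch (PdfSniffer is a hypothetical name, not the library's GetFileMatchingCount):

```csharp
using System;
using System.Text;

// Hypothetical sketch: accept a PDF whose "%PDF-" marker starts anywhere in
// the first 1024 bytes instead of requiring it at offset 0, tolerating
// leading junk such as a stray newline.
public static class PdfSniffer
{
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("%PDF-");
    private const int SearchWindow = 1024;

    public static bool LooksLikePdf(byte[] header)
    {
        if (header == null) return false;
        int limit = Math.Min(header.Length, SearchWindow);
        // The marker must start inside the window and fit inside the buffer.
        for (int i = 0; i < limit && i + Marker.Length <= header.Length; i++)
        {
            int j = 0;
            while (j < Marker.Length && header[i + j] == Marker[j]) j++;
            if (j == Marker.Length) return true;
        }
        return false;
    }
}
```

The trade-off is a higher false-positive rate, since a text file that mentions %PDF- near its start would also match, and the caller has to read at least 1024 bytes for the full window to be checked.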
When using GetFileType, the position of the stream is modified.
It took me a while to find out why my stream suddenly appeared to be 0 bytes long: right after calling GetFileType, I tried to upload it somewhere.
I suggest saving the current position of the stream and restoring it afterwards, once the data has been copied from the stream:
in MimeType.cs
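The save-and-restore approach can be sketched as follows. StreamHeader.ReadHeader is a hypothetical helper, not the code that was proposed for MimeType.cs; it assumes a seekable stream and that detection should always look at the bytes starting from position 0:

```csharp
using System;
using System.IO;

// Hypothetical helper (not the library's code): reads up to maxBytes from the
// start of a seekable stream and restores the caller's position afterwards,
// so the stream can still be uploaded or read after detection.
public static class StreamHeader
{
    public static byte[] ReadHeader(Stream stream, int maxBytes)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new NotSupportedException("Stream must be seekable to restore its position.");

        long originalPosition = stream.Position;
        try
        {
            // Assumption: the file content starts at position 0.
            stream.Position = 0;
            var buffer = new byte[maxBytes];
            int total = 0;
            // Read may return fewer bytes than requested, so loop until full or EOF.
            while (total < maxBytes)
            {
                int read = stream.Read(buffer, total, maxBytes - total);
                if (read == 0) break;
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
        finally
        {
            // Put the stream back where the caller left it.
            stream.Position = originalPosition;
        }
    }
}
```

For non-seekable streams, such as network streams, the caller would first have to buffer the data, for example by copying it into a MemoryStream.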
I would like to turn this into a library without the WPF project. What are your thoughts on this?
The BMP definition reports the wrong MIME type:
new FileType(new byte?[] { 66, 77 }, "bmp", "image/gif");
must be changed to
new FileType(new byte?[] { 0x42, 0x4D }, "bmp", "image/bmp"); // "BM"; or image/x-windows-bmp
Calling GetFileType().Mime on .webp image files throws a NullReferenceException.
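The exception most likely comes from GetFileType returning null when no definition matches (as the comment in the GetFileType code above says) and .Mime then being read on that null. A WebP file is a RIFF container: it starts with "RIFF", a 4-byte size, and then "WEBP" at offset 8. A sketch of a matching check, with a hypothetical class name:

```csharp
using System;
using System.Text;

// Sketch of a WebP check: a WebP file is a RIFF container, so it begins with
// "RIFF", a 4-byte little-endian size, then "WEBP" at offset 8.
// Checking "RIFF" alone would also match other RIFF files such as WAV and AVI.
public static class WebpSniffer
{
    private static readonly byte[] Riff = Encoding.ASCII.GetBytes("RIFF");
    private static readonly byte[] Webp = Encoding.ASCII.GetBytes("WEBP");

    private static bool BytesAt(byte[] data, byte[] expected, int offset)
    {
        if (data.Length - offset < expected.Length) return false;
        for (int i = 0; i < expected.Length; i++)
            if (data[offset + i] != expected[i]) return false;
        return true;
    }

    public static bool LooksLikeWebp(byte[] header) =>
        header != null && BytesAt(header, Riff, 0) && BytesAt(header, Webp, 8);
}
```

Independently of adding a definition, callers can guard against the null result with the null-conditional operator, for example fileType?.Mime ?? "application/octet-stream" (the fallback value is only an example). The usual MIME type for WebP is image/webp.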
Some Outlook .msg files with attachments are detected as PPT.
Can you please add support for sniffing SVG? That's the only thing missing before I can use this package for image content-format sniffing.
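SVG has no binary magic number; it is XML text whose root element is svg. A common heuristic is to look for an "<svg" tag near the start of the file, after any XML declaration, DOCTYPE or comments, and to report image/svg+xml on a match. The sketch below assumes UTF-8 or ASCII content and uses a hypothetical class name; an HTML page with inline SVG would also match, so it is best run only after the binary signatures have failed:

```csharp
using System;
using System.Text;

// Heuristic sketch, not the library's API: looks for an "<svg" start tag in
// the header. It may follow an XML declaration, a DOCTYPE or comments.
// An HTML page with inline SVG would also match.
public static class SvgSniffer
{
    public static bool LooksLikeSvg(byte[] header)
    {
        if (header == null || header.Length == 0) return false;
        // Assumes UTF-8 (or ASCII) text; a UTF-8 BOM decodes to U+FEFF and is harmless here.
        string text = Encoding.UTF8.GetString(header);
        int idx = text.IndexOf("<svg", StringComparison.Ordinal);
        if (idx < 0) return false;
        // Require a delimiter after the element name so that e.g. "<svgx" is rejected.
        int next = idx + 4;
        return next < text.Length && (char.IsWhiteSpace(text[next]) || text[next] == '>' || text[next] == '/');
    }
}
```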
My "fork" started as a quick hack, but since it was mentioned on Stack Overflow it has been getting some attention.
The original code from https://filetypedetective.codeplex.com/ is licensed under GPL 2. That means that if you use it, your own code has to be open source too, unless you use it only internally and don't distribute it.
I think GPL 2 is "problematic" for many use cases; you have to be careful and know what you are doing.
Switching to MIT or Apache 2.0 would make usage much easier, but we need permission from trailmax (https://www.codeplex.com/site/users/view/trailmax) to do this.
@trailmax, if you have time, could you comment on this issue and tell me your thoughts?