
filetype's Introduction

filetype

Small, dependency-free Go package to infer the file type and MIME type by checking the magic numbers signature.

For SVG file type checking, see go-is-svg package. Python port: filetype.py.

Features

  • Supports a wide range of file types
  • Provides file extension and proper MIME type
  • File discovery by extension or MIME type
  • File discovery by class (image, video, audio...)
  • Provides a bunch of helpers and file matching shortcuts
  • Pluggable: add custom new types and matchers
  • Simple and semantic API
  • Blazing fast, even when processing large files
  • Only the first 262 bytes (the maximum file header size) are required, so you can just pass a slice
  • Dependency-free (just Go code, no C compilation needed)
  • Cross-platform file recognition

Installation

go get github.com/h2non/filetype

API

See Godoc reference.

Subpackages

Examples

Simple file type checking

package main

import (
  "fmt"
  "os"

  "github.com/h2non/filetype"
)

func main() {
  // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+)
  buf, err := os.ReadFile("sample.jpg")
  if err != nil {
    fmt.Println(err)
    return
  }

  kind, _ := filetype.Match(buf)
  if kind == filetype.Unknown {
    fmt.Println("Unknown file type")
    return
  }

  fmt.Printf("File type: %s. MIME: %s\n", kind.Extension, kind.MIME.Value)
}

Check type class

package main

import (
  "fmt"
  "os"

  "github.com/h2non/filetype"
)

func main() {
  // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+)
  buf, err := os.ReadFile("sample.jpg")
  if err != nil {
    fmt.Println(err)
    return
  }

  if filetype.IsImage(buf) {
    fmt.Println("File is an image")
  } else {
    fmt.Println("Not an image")
  }
}

Supported type

package main

import (
  "fmt"

  "github.com/h2non/filetype"
)

func main() {
  // Check if file is supported by extension
  if filetype.IsSupported("jpg") {
    fmt.Println("Extension supported")
  } else {
    fmt.Println("Extension not supported")
  }

  // Check if the MIME type is supported
  if filetype.IsMIMESupported("image/jpeg") {
    fmt.Println("MIME type supported")
  } else {
    fmt.Println("MIME type not supported")
  }
}

File header

package main

import (
  "fmt"
  "os"

  "github.com/h2non/filetype"
)

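func main() {
  // Open a file descriptor
  file, err := os.Open("movie.mp4")
  if err != nil {
    fmt.Println(err)
    return
  }
  defer file.Close()

  // Only the file header (the first 261 bytes) needs to be passed
  head := make([]byte, 261)
  n, err := file.Read(head)
  if err != nil {
    fmt.Println(err)
    return
  }

  // Files shorter than 261 bytes fill only head[:n]
  if filetype.IsImage(head[:n]) {
    fmt.Println("File is an image")
  } else {
    fmt.Println("Not an image")
  }
}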

Add additional file type matchers

package main

import (
  "fmt"

  "github.com/h2non/filetype"
)

var fooType = filetype.NewType("foo", "foo/foo")

func fooMatcher(buf []byte) bool {
  return len(buf) > 1 && buf[0] == 0x01 && buf[1] == 0x02
}

func main() {
  // Register the new matcher and its type
  filetype.AddMatcher(fooType, fooMatcher)

  // Check if the new type is supported by extension
  if filetype.IsSupported("foo") {
    fmt.Println("New supported type: foo")
  }

  // Check if the new type is supported by MIME
  if filetype.IsMIMESupported("foo/foo") {
    fmt.Println("New supported MIME type: foo/foo")
  }

  // Try to match the file
  fooFile := []byte{0x01, 0x02}
  kind, _ := filetype.Match(fooFile)
  if kind == filetype.Unknown {
    fmt.Println("Unknown file type")
  } else {
    fmt.Printf("File type matched: %s\n", kind.Extension)
  }
}
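The matcher passed to AddMatcher is just a func([]byte) bool, so fixed signatures can be generated instead of written by hand. A minimal sketch, assuming a hypothetical prefixMatcher helper that is not part of filetype: bytes.HasPrefix already returns false for buffers shorter than the signature, so no explicit length check is needed.

```go
package main

import (
	"bytes"
	"fmt"
)

// prefixMatcher is a hypothetical helper that builds a matcher reporting
// whether buf starts with sig. bytes.HasPrefix handles len(buf) < len(sig),
// so the returned matcher cannot index out of range.
func prefixMatcher(sig []byte) func(buf []byte) bool {
	return func(buf []byte) bool {
		return bytes.HasPrefix(buf, sig)
	}
}

// fooMatcher behaves like the hand-written matcher in the example above.
var fooMatcher = prefixMatcher([]byte{0x01, 0x02})

func main() {
	fmt.Println(fooMatcher([]byte{0x01, 0x02, 0xFF})) // true
	fmt.Println(fooMatcher([]byte{0x01}))             // false: too short
}
```

Such a matcher could then be registered with filetype.AddMatcher(fooType, fooMatcher) as shown above.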

Supported types

Image

  • jpg - image/jpeg
  • png - image/png
  • gif - image/gif
  • webp - image/webp
  • cr2 - image/x-canon-cr2
  • tif - image/tiff
  • bmp - image/bmp
  • heif - image/heif
  • jxr - image/vnd.ms-photo
  • psd - image/vnd.adobe.photoshop
  • ico - image/vnd.microsoft.icon
  • dwg - image/vnd.dwg
  • avif - image/avif

Video

  • mp4 - video/mp4
  • m4v - video/x-m4v
  • mkv - video/x-matroska
  • webm - video/webm
  • mov - video/quicktime
  • avi - video/x-msvideo
  • wmv - video/x-ms-wmv
  • mpg - video/mpeg
  • flv - video/x-flv
  • 3gp - video/3gpp

Audio

  • mid - audio/midi
  • mp3 - audio/mpeg
  • m4a - audio/mp4
  • ogg - audio/ogg
  • flac - audio/x-flac
  • wav - audio/x-wav
  • amr - audio/amr
  • aac - audio/aac
  • aiff - audio/x-aiff

Archive

  • epub - application/epub+zip
  • zip - application/zip
  • tar - application/x-tar
  • rar - application/vnd.rar
  • gz - application/gzip
  • bz2 - application/x-bzip2
  • 7z - application/x-7z-compressed
  • xz - application/x-xz
  • zstd - application/zstd
  • pdf - application/pdf
  • exe - application/vnd.microsoft.portable-executable
  • swf - application/x-shockwave-flash
  • rtf - application/rtf
  • iso - application/x-iso9660-image
  • eot - application/octet-stream
  • ps - application/postscript
  • sqlite - application/vnd.sqlite3
  • nes - application/x-nintendo-nes-rom
  • crx - application/x-google-chrome-extension
  • cab - application/vnd.ms-cab-compressed
  • deb - application/vnd.debian.binary-package
  • ar - application/x-unix-archive
  • Z - application/x-compress
  • lz - application/x-lzip
  • rpm - application/x-rpm
  • elf - application/x-executable
  • dcm - application/dicom

Documents

  • doc - application/msword
  • docx - application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • xls - application/vnd.ms-excel
  • xlsx - application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • ppt - application/vnd.ms-powerpoint
  • pptx - application/vnd.openxmlformats-officedocument.presentationml.presentation

Font

  • woff - application/font-woff
  • woff2 - application/font-woff
  • ttf - application/font-sfnt
  • otf - application/font-sfnt

Application

  • wasm - application/wasm
  • dex - application/vnd.android.dex
  • dey - application/vnd.android.dey

Benchmarks

Measured using real files.

Environment: OSX x64 i7 2.7 GHz

BenchmarkMatchTar-8    1000000        1083 ns/op
BenchmarkMatchZip-8    1000000        1162 ns/op
BenchmarkMatchJpeg-8   1000000        1280 ns/op
BenchmarkMatchGif-8    1000000        1315 ns/op
BenchmarkMatchPng-8    1000000        1121 ns/op

License

MIT - Tomas Aparicio

filetype's Issues

filetype.IsApplication isn't defined?

I'm using this code:

if filetype.IsArchive(head) {
  fileType = "archive"
} else if filetype.IsDocument(head) {
  fileType = "document"
} else if filetype.IsFont(head) {
  fileType = "font"
} else if filetype.IsAudio(head) {
  fileType = "audio"
} else if filetype.IsVideo(head) {
  fileType = "video"
} else if filetype.IsImage(head) {
  fileType = "image"
} else if filetype.IsApplication(head) {
  fileType = "application"
} else {
  fileType = "other"
}

Go displays this error:

undefined: filetype.IsApplication

Add text/plain support?

Can you add this?

I tried to use AddMatcher, but it doesn't work.

My code:

var txtType = filetype.NewType("txt", "text/plain")

func txtMatcher(buf []byte) bool {
	return len(buf) > 1 && buf[0] == 0x01 && buf[1] == 0x02
}

filetype.AddMatcher(txtType, txtMatcher)

// Check if the new type is supported by extension
if filetype.IsSupported("txt") {
	fmt.Println("New supported type: txt")
}

// Check if the new type is supported by MIME
if filetype.IsMIMESupported("text/plain") {
	fmt.Println("New supported MIME type: text/plain")
}

head := make([]byte, 261)
f, _ := file.Open()
f.Read(head)
kind, _ := filetype.Match(head)
if kind == filetype.Unknown {
	fmt.Println("Unknown file type")
	fmt.Println(kind)
	fmt.Println(head)
	return errors.New("Este arquivo não é permitido") // "This file is not allowed"
} else {
	fmt.Printf("File type matched: %s\n", kind.Extension)
}
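Plain text has no magic number, so a byte-signature matcher like the one above (which checks for 0x01 0x02, copied from the README's foo example) cannot identify it. A common heuristic, sketched here as an assumption rather than anything filetype provides, is to treat a buffer as text when it contains no NUL bytes and is valid UTF-8, tolerating a multi-byte character cut off by the end of the header slice.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// txtMatcher is a heuristic, not a magic-number check: it accepts buffers
// that contain no NUL bytes and are valid UTF-8.
func txtMatcher(buf []byte) bool {
	if len(buf) == 0 || bytes.IndexByte(buf, 0x00) != -1 {
		return false
	}
	for len(buf) > 0 {
		if !utf8.FullRune(buf) {
			// The header slice may cut a multi-byte character in half;
			// tolerate an incomplete rune at the very end.
			return true
		}
		r, size := utf8.DecodeRune(buf)
		if r == utf8.RuneError && size == 1 {
			return false // invalid UTF-8 byte
		}
		buf = buf[size:]
	}
	return true
}

func main() {
	fmt.Println(txtMatcher([]byte("hello, world\n")))  // true
	fmt.Println(txtMatcher([]byte{0x01, 0x02, 0x00})) // false: contains NUL
}
```

Because almost any text-based format (HTML, CSV, JSON) also passes this check, it is best treated as a last-resort fallback rather than a precise type test.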

DNG image format support?

It seems the file command already supports it:

$ file /tmp/sample1.dng
/tmp/sample1.dng: TIFF image data, little-endian, direntries=54, height=171, bps=662, compression=none, PhotometricIntepretation=RGB, manufacturer=Canon, model=Canon EOS 350D DIGITAL, orientation=upper-left, width=256

Sample file can be found at https://filesamples.com/formats/dng

tar file not being recognized

👋 filetype happy user here! Today someone opened an issue in my bin project (marcosnils/bin#140) which led me here.

filetype is unable to detect the tar archive inside this gzipped file: https://github.com/sass/dart-sass/releases/download/1.52.3/dart-sass-1.52.3-linux-x64.tar.gz. However, tar -xf works, and running file <dart-sass-1.52.3-linux-x64.tar> correctly detects the file type.

It seems the file's MIME headers are not being set properly. Still, it's interesting that file detects it as a tar archive even when the extension is removed.

file -i pepe 
pepe: application/x-tar; charset=binary

Any pointers here?

.mp4 file not detected

See the attached file (GitHub doesn't let me upload a video, so it's in a zip).

file doesn't recognize the MIME type either:

$ file --mime-type vid-20180326-wa0000.mp4
vid-20180326-wa0000.mp4: application/octet-stream
$ file vid-20180326-wa0000.mp4
vid-20180326-wa0000.mp4: ISO Media

It plays fine with VLC. Here's the part I thought looked relevant:

[00007fdae4000fa0] main input source debug: creating demux: access='file' demux='any' location='/tmp/vid-20180326-wa0000.mp4' file='/tmp/vid-20180326-wa0000.mp4'
[00007fdae4c14c50] main demux debug: looking for demux module matching "mp4": 55 candidates
[00007fdae4c01d20] mp4 stream warning: unknown box type beam (incompletely loaded)
[00007fdae4c01d20] mp4 stream debug: dumping root Box "root"
[00007fdae4c01d20] mp4 stream debug: |   + ftyp size 28 offset 0
[00007fdae4c01d20] mp4 stream debug: |   + beam size 24 offset 28 (????)
[00007fdae4c01d20] mp4 stream debug: |   + moov size 752 offset 52
[00007fdae4c01d20] mp4 stream debug: |   |   + mvhd size 108 offset 60
[00007fdae4c01d20] mp4 stream debug: |   |   + trak size 636 offset 168
[00007fdae4c01d20] mp4 stream debug: |   |   |   + tkhd size 92 offset 176
[00007fdae4c01d20] mp4 stream debug: |   |   |   + mdia size 536 offset 268
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   + mdhd size 32 offset 276
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   + hdlr size 34 offset 308
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   + minf size 462 offset 342
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   + vmhd size 20 offset 350
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   + dinf size 36 offset 370
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + dref size 28 offset 378
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   |   + url  size 12 offset 394
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   + stbl size 398 offset 406
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stsd size 134 offset 414
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   |   + avc1 size 118 offset 430
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   |   |   + avcC size 32 offset 516
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stts size 24 offset 548
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stsc size 28 offset 572
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stsz size 164 offset 600
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stco size 20 offset 764
[00007fdae4c01d20] mp4 stream debug: |   |   |   |   |   |   + stss size 20 offset 784
[00007fdae4c01d20] mp4 stream debug: |   + mdat size 230684 offset 804
[00007fdae4c14c50] mp4 demux debug: unrecognized major media specification (mp4v).
[00007fdae4c14c50] mp4 demux debug: found 1 tracks
[00007fdae4c14c50] mp4 demux debug: track[Id 0x1] read 1 chunk
[00007fdae4c14c50] mp4 demux warning: STTS table of 1 entries
[00007fdae4c14c50] mp4 demux debug: track[Id 0x1] read 36 samples length:1s
[00007fdaec000c40] main input debug: selecting program id=0
[00007fdae4c14c50] mp4 demux debug: adding track[Id 0x1] video (enable) language undef
[00007fdae4c14c50] main demux debug: using demux module "mp4"

Also ffmpeg output:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/vid-20180326-wa0000.mp4':
  Metadata:
    major_brand     : mp4v
    minor_version   : 0
    compatible_brands: mp4vmp42isom
  Duration: 00:00:01.14, start: 0.000000, bitrate: 1631 kb/s
    Stream #0:0(und): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p, 640x272, 1625 kb/s, 31.71 fps, 31.71 tbr, 31714 tbn, 63428 tbc (default)

HEIF support

Hi!

Do you have any plans to add a matcher for the HEIF image format?

"go.mod has non-....v1 module path" with [email protected]

go get gopkg.in/xxx seems to be broken with this error...

GO111MODULE=on go get gopkg.in/h2non/[email protected]
go: gopkg.in/h2non/[email protected]: go.mod has non-....v1 module path "github.com/h2non/filetype" at revision v1.0.8
go: error loading module requirements

but fetching via github works?

$ GO111MODULE=on go get -v github.com/h2non/[email protected]
$ echo $?
0
go version
go version go1.12.4 darwin/amd64

I'm not sure if this is helpful, but the docs describe an exception for gopkg.in:

Three Transitional Exceptions

gopkg.in

Existing code that uses import paths starting with gopkg.in (such as gopkg.in/yaml.v1 and gopkg.in/yaml.v2) can continue to use those forms for their module paths and import paths even after opting in to modules.

https://github.com/golang/go/wiki/Modules#how-to-upgrade-and-downgrade-dependencies

I'm helping support nmrshll/gphotos-uploader-cli and this error recently started occurring.

In the meantime, I'll try to reference the module via GitHub directly, but I wanted to let you know in case you weren't aware of the issue above.

Let me know how I can help and provide further info. Thanks again.

3gp videos not detected

I have a video file that is detected as an unknown type; however, it is video/3gpp.

file --mime-type file.mp4
file.mp4: video/3gpp

I think this is related to #37.
Thanks

MP4 file that is not H.264 isn't detected

I have an MP4 file that contains a "mpeg-4" video (MPEG-4 Visual in MediaInfo), but it is detected as "Unknown" by the library.
Shouldn't it check only the MP4 header, not the contained codec per se?

Add AVIF image filetype

AVIF support is getting traction in browsers and it would be great to detect AVIF-files as images 👍🏼

Wrong category for PDF and RTF, and maybe PS?

Why are Portable Document Format (PDF) and RTF listed as archive formats? They are documents.
Also it looks like PostScript (programming language) should be Application (rather than archive) similar to wasm.

Full file needed for Documents

The README specifically states:

Only first 262 bytes representing the max file header is required, so you can just pass a slice

I've tried this out and it works fine for all files except MS Office docs such as docx, xlsx, etc. These files have a kind of application/zip if given only the first 262 bytes, but if you give them the full file, either with MatchFile or MatchReader they are detected correctly.

In fact, each file type seems to need a different minimum buffer length for filetype to report accurately: docx seems to require at least 1750 bytes, and .xlsm at least 1855 bytes. With a shorter buffer, these files are incorrectly reported as application/zip. For my application, this is very important.

For now I'll have to do the work of determining the minimum buffer size for MSO files to report accurately, but if you know this already, please update the docs, or at least have a caveat around the 262 number.

Err is nil when filetype is unknown

Expected behavior:

kind, err := filetype.MatchReader(badReader)

We would expect err != nil, but this is not so.

The documentation appears to contradict itself. In one example, we have:

  kind, unknown := filetype.Match(buf)
  if unknown != nil {
    fmt.Printf("Unknown: %s", unknown)
    return
  }

Yet, in another example, we have:

  // Try to match the file
  fooFile := []byte{0x01, 0x02}
  kind, _ := filetype.Match(fooFile)
  if kind == filetype.Unknown {
    fmt.Println("Unknown file type")
  } else {
    fmt.Printf("File type matched: %s\n", kind.Extension)
  }

I would expect the documentation to be consistent and the error to be populated if we have an unknown filetype.

Matches .doc as application/vnd.ms-excel

I am trying to detect the MIME type of a .doc file, and the result I get is either
File type: xls. MIME: application/vnd.ms-excel
or
File type: ppt. MIME: application/vnd.ms-powerpoint
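Legacy Office 97-2003 .doc, .xls and .ppt files are all OLE2 Compound File Binary containers that begin with the same 8-byte signature, D0 CF 11 E0 A1 B1 1A E1. The header alone therefore cannot tell them apart; that requires reading the container's directory for streams such as WordDocument, Workbook or PowerPoint Document. A minimal sketch of the shared check (isCFB is a hypothetical name) illustrating why a header-only matcher can confuse these formats:

```go
package main

import (
	"bytes"
	"fmt"
)

// cfbSignature is the 8-byte magic that starts every OLE2 Compound File
// (legacy .doc, .xls, .ppt and others).
var cfbSignature = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

// isCFB only says "this is a compound file"; it cannot say which Office
// application produced it.
func isCFB(buf []byte) bool {
	return bytes.HasPrefix(buf, cfbSignature)
}

func main() {
	doc := append(append([]byte{}, cfbSignature...), make([]byte, 504)...)
	fmt.Println(isCFB(doc))                // true
	fmt.Println(isCFB([]byte("%PDF-1.7"))) // false
}
```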

Can't match my own file with text/html

Hello, this is my first time creating an issue, so I'm a little nervous.

I need to match my own files with the MIME type text/html in my program, but this module can't do that yet.

Running my main.go against sample.html results in Unknown file type:

$ od -N14 -hc sample.html 
0000000      213c    6f64    7463    7079    2065    7468    6c6d        
           <   !   d   o   c   t   y   p   e       h   t   m   l        
0000016

$ go run main.go 
Unknown file type

I tried adding a MIME type for text and created a pull request for it.
Please check my commit!

Best regards.
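HTML has no binary magic number, so detecting it is a text heuristic. Go's standard library already implements the WHATWG MIME-sniffing rules in net/http.DetectContentType, which recognizes a leading "<!DOCTYPE HTML" or "<HTML" case-insensitively after optional whitespace. A sketch of using it as a fallback when a magic-number matcher finds nothing; sniffHTML is a hypothetical helper and this wiring is an assumption, not filetype behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// sniffHTML reports whether the standard library's content sniffer
// classifies buf as HTML. DetectContentType considers at most 512 bytes.
func sniffHTML(buf []byte) bool {
	return strings.HasPrefix(http.DetectContentType(buf), "text/html")
}

func main() {
	sample := []byte("<!doctype html>\n<html><body>hi</body></html>\n")
	fmt.Println(http.DetectContentType(sample)) // text/html; charset=utf-8
	fmt.Println(sniffHTML(sample))              // true
}
```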

switching filetype to use Ragel

Thanks for the library.

The benchmark was of particular interest to me. When matching the contents of a file, there are more efficient ways to detect binary patterns.

I did a proof of concept using Ragel. It is an external dependency, but it generates the final golang code as an efficient state machine.

At the time of writing this issue, I was able to support your benchmarks for images, zip, and tar. The documents that have XML were skipped at the moment because I cannot discern their patterns as easily as the others.

The benchmarks were run with the same fixtures.

These are the results. A test was used to validate that the correct file types were being returned, too.

goos: darwin
goarch: amd64
BenchmarkMatchTar-4    	50000000	       183 ns/op
BenchmarkMatchZip-4    	1000000000	         6.23 ns/op
BenchmarkMatchJpeg-4   	2000000000	         4.98 ns/op
BenchmarkMatchGif-4    	2000000000	         4.47 ns/op
BenchmarkMatchPng-4    	1000000000	         6.80 ns/op

This happened on a 1.7 GHz Intel Core i7 Macbook Air 2014.

I'd like to contribute the work back. It seems that we can get this to be really fast.

Ragel machines can be language agnostic, so the same machine could be used for C-Python.

test data is non-free

Hello,

It looks like the test data cannot be distributed in this repository.

filetype.v1/fixtures/sample.tif:

techsoft as - P.O.BOX 132, N-3201 Sandefjord,NORWAY, PixEdit Version 3.72, Not licensed - for internal TechSoft use only

Please consider using distributable fixtures.

Cheers,

panic: runtime error: slice bounds out of range

While running go-fuzz on one of our services, I discovered an input that raised the following runtime error:

panic: runtime error: slice bounds out of range

goroutine 1 [running]:
github.com/h2non/filetype/matchers/isobmff.GetFtyp(0x7f3b1f727000, 0x1a, 0x1a, 0x489801, 0x4cf652, 0x4cf652, 0x4, 0x240bd42694a81301, 0xc000049c70, 0x40c7ff)
	/home/<name>/gocode/src/github.com/h2non/filetype/matchers/isobmff/isobmff.go:27 +0x353
github.com/h2non/filetype/matchers.Heif(0x7f3b1f727000, 0x1a, 0x1a, 0x4a2070)
	/home/<name>/gocode/src/github.com/h2non/filetype/matchers/image.go:119 +0xb8
github.com/h2non/filetype/matchers.NewMatcher.func1(0x7f3b1f727000, 0x1a, 0x1a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/<name>/gocode/src/github.com/h2non/filetype/matchers/matchers.go:26 +0x81
gopkg.in/h2non/filetype%2ev1.Match(0x7f3b1f727000, 0x1a, 0x1a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/<name>/gocode/src/gopkg.in/h2non/filetype.v1/match.go:29 +0x20a
gopkg.in/h2non/filetype%2ev1.Get(...)
	/home/<name>/gocode/src/gopkg.in/h2non/filetype.v1/match.go:40
github.com/h2non/filetype.Fuzz(0x7f3b1f727000, 0x1a, 0x1a, 0x4)
	/home/<name>/gocode/src/github.com/h2non/filetype/fuzz.go:9 +0x7a
go-fuzz-dep.Main(0xc000049f80, 0x1, 0x1)
	/tmp/go-fuzz-build324713724/goroot/src/go-fuzz-dep/main.go:36 +0x1b6
main.main()
	/tmp/go-fuzz-build324713724/gopath/src/github.com/h2non/filetype/go.fuzz.main/main.go:15 +0x52
exit status 2

83f44c13f8e6579e1f5e3ec0d047160288363c99.zip
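The trace points at isobmff.GetFtyp slicing past what appears to be a 26-byte input (len 0x1a). ISO base media files (MP4, M4V, MOV, 3GP, HEIF, AVIF) begin with an ftyp box whose first four bytes give the box size, so a parser must check the declared size against the buffer before slicing. A bounds-checked sketch; parseFtyp is a hypothetical name and not the library's code. It clamps an oversized box to the buffer, since a header slice may end before the box does.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseFtyp extracts the major brand and compatible brands from an ISO BMFF
// "ftyp" box at the start of buf. Every slice is guarded by a length check,
// so truncated or malformed input returns ok == false instead of panicking.
func parseFtyp(buf []byte) (major string, compat []string, ok bool) {
	if len(buf) < 16 || string(buf[4:8]) != "ftyp" {
		return "", nil, false
	}
	size := uint64(binary.BigEndian.Uint32(buf[0:4]))
	// A valid ftyp box holds at least size, type, major brand and minor
	// version (16 bytes).
	if size < 16 {
		return "", nil, false
	}
	end := size
	if end > uint64(len(buf)) {
		end = uint64(len(buf)) // header slice may be shorter than the box
	}
	major = string(buf[8:12])
	for off := uint64(16); off+4 <= end; off += 4 {
		compat = append(compat, string(buf[off:off+4]))
	}
	return major, compat, true
}

func main() {
	// First 32 bytes of the M4V header quoted in the "M4V not being recognized" issue.
	head := []byte{
		0x00, 0x00, 0x00, 0x28, 'f', 't', 'y', 'p', 'M', '4', 'V', ' ', 0x00, 0x00, 0x00, 0x00,
		'M', '4', 'V', ' ', 'M', '4', 'A', ' ', 'm', 'p', '4', '2', 'm', 'p', '4', '1',
	}
	major, compat, ok := parseFtyp(head)
	fmt.Printf("%q %q %v\n", major, compat, ok) // "M4V " ["M4V " "M4A " "mp42" "mp41"] true
}
```

Matching on the major and compatible brands, rather than on a fixed box size, would also cover files like the mp4v example whose ftyp box is 28 bytes long.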

Buffer overrun in matchers.Mp4()

The Mp4() function needs an additional pair of parentheses to evaluate correctly. Diff included below:

$ git diff -w
diff --git a/matchers/video.go b/matchers/video.go
index bb5ba26..e135b76 100644
--- a/matchers/video.go
+++ b/matchers/video.go
@@ -86,7 +86,7 @@ func Flv(buf []byte) bool {

 func Mp4(buf []byte) bool {
        return len(buf) > 27 &&
-               (buf[0] == 0x0 && buf[1] == 0x0 && buf[2] == 0x0 &&
+               ((buf[0] == 0x0 && buf[1] == 0x0 && buf[2] == 0x0 &&
                        (buf[3] == 0x18 || buf[3] == 0x20) && buf[4] == 0x66 &&
                        buf[5] == 0x74 && buf[6] == 0x79 && buf[7] == 0x70) ||
                        (buf[0] == 0x33 && buf[1] == 0x67 && buf[2] == 0x70 && buf[3] == 0x35) ||
@@ -95,5 +95,5 @@ func Mp4(buf []byte) bool {
                                buf[8] == 0x6D && buf[9] == 0x70 && buf[10] == 0x34 && buf[11] == 0x32 &&
                                buf[16] == 0x6D && buf[17] == 0x70 && buf[18] == 0x34 && buf[19] == 0x31 &&
                                buf[20] == 0x6D && buf[21] == 0x70 && buf[22] == 0x34 && buf[23] == 0x32 &&
-                       buf[24] == 0x69 && buf[25] == 0x73 && buf[26] == 0x6F && buf[27] == 0x6D)
+                               buf[24] == 0x69 && buf[25] == 0x73 && buf[26] == 0x6F && buf[27] == 0x6D))
 }

No equivalent of mime.ExtensionsByType

I want to replace my use of mime.ExtensionsByType with filetype because filetype understands more types (e.g. HEIF). However there's not currently a function in filetype that returns a Type by MIME. I'll send a PR.

File header example is misleading

Hi, first, thanks for the awesome package. I've used it in a pet project and found an issue in the documentation. The File header section contains this example:

// Read a file
buf, _ := ioutil.ReadFile("sample.jpg")

// We only have to pass the file header = first 261 bytes
head := buf[:261]

The problem is that ioutil.ReadFile reads the whole file, which is unnecessary and slow for big files. A better way is:

file, _ := os.Open("movie.mp4")
head := make([]byte, 261)
file.Read(head)

I've tested it on 2GB video files, and it reduces processing time from 5 seconds to a few milliseconds on my PC. I can open a PR to update the docs if you agree.

Export max size of header needed

It would be helpful if this package exported the maximum size of the header slice that would be used to match against. Currently this information is available in the README, and is hardcoded in my package that uses it.

If the size were exported, I could import that and ensure that my application automatically retrieves the updated length slice when this library is updated.

M4V not being recognized

The file I'm having problems has a codec of "H264 - MPEG-4 AVC (part 10) (avc1)"

00 00 00 28 66 74 79 70  4D 34 56 20 00 00 00 00  | ...(ftypM4V ....
4D 34 56 20 4D 34 41 20  6D 70 34 32 6D 70 34 31  | M4V M4A mp42mp41

We currently check for exactly "00 00 00 1C 66 74 79 70 4D 34 56" (....ftypM4V); however, this is too strict, because the leading box size varies (0x28 in this file). We just want the type and sub-type.

Solution: check for "66 74 79 70 4D 34 56" (ftypM4V)
http://www.file-recovery.com/m4v-signature-format.htm

m4a mime type

filetype version: 1.1.3

An audio file uploaded with the .m4a extension is detected as video:

filetype.IsVideo(buf) == true
filetype.IsAudio(buf) == false

types.Type={{video 3gpp video/3gpp} 3gp})

I believe it should be something like audio/m4a.

Specific MP3 file not detected

test.zip
This MP3 is not detected as MP3.
It has a bit rate of 8 kbps and a sample rate of 24000 Hz. It is basically one second of silence.

PS. gabriel-vasile/mimetype detects it correctly.

Buffer overrun in matchers.Eot()

Eot() only checks len(buf) > 10 before indexing up to buf[35]:

func Eot(buf []byte) bool {
    return len(buf) > 10 &&
    buf[34] == 0x4C && buf[35] == 0x50 &&
    ((buf[8] == 0x02 && buf[9] == 0x00 &&
        buf[10] == 0x01) || (buf[8] == 0x01 &&
        buf[9] == 0x00 && buf[10] == 0x00) ||
        (buf[8] == 0x02 && buf[9] == 0x00 && buf[10] == 0x02))
}
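The highest index read is buf[35], so the guard must require at least 36 bytes. A corrected sketch that keeps the quoted byte checks unchanged and only tightens the length guard:

```go
package main

import "fmt"

// Eot reports whether buf looks like an Embedded OpenType font. The length
// guard now covers the highest index read (35), so short input cannot panic.
func Eot(buf []byte) bool {
	return len(buf) > 35 &&
		// "LP" magic at offsets 34-35
		buf[34] == 0x4C && buf[35] == 0x50 &&
		// byte patterns at offsets 8-10, unchanged from the original
		((buf[8] == 0x02 && buf[9] == 0x00 && buf[10] == 0x01) ||
			(buf[8] == 0x01 && buf[9] == 0x00 && buf[10] == 0x00) ||
			(buf[8] == 0x02 && buf[9] == 0x00 && buf[10] == 0x02))
}

func main() {
	fmt.Println(Eot(make([]byte, 11))) // false instead of an index-out-of-range panic
}
```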

After reading the 261-byte header, the upload of the full file is missing those 261 bytes. How can this be handled gracefully?

The following test case reproduces it:

func (o *OSSESSuite) Test_Upload_FileStream_SetContentType2() {
	file, err := os.Open("../../test/assets/HgWFAEPozfVdcst")
	assert.NoError(o.T(), err)

	// Check the file size (this assertion passes).
	info, err := file.Stat()
	assert.Equal(o.T(), int64(87064), info.Size())

	head := make([]byte, 261)
	_, err = file.Read(head)
	assert.NoError(o.T(), err)
	match, err := filetype.Match(head)
	assert.Equal(o.T(), "image/png", match.MIME.Value)

	// Check the file size again (this assertion also passes).
	info, err = file.Stat()
	assert.NoError(o.T(), err)
	assert.Equal(o.T(), int64(87064), info.Size())

	o.client.SetObjectContentType(match.MIME.Value)
	err = o.client.PutObject("HgWFAEPozfVdcst", file)
	assert.NoError(o.T(), err)
}

The test case result:

=== RUN   TestOSSESSuite
--- FAIL: TestOSSESSuite (0.03s)
=== RUN   TestOSSESSuite/Test_Upload_FileStream_SetContentType2
    --- FAIL: TestOSSESSuite/Test_Upload_FileStream_SetContentType2 (0.03s)
        osses_test.go:105: 
            	Error Trace:	osses_test.go:105
            	Error:      	Received unexpected error:
            	            	Put http://hello-world.oss-cn-beijing.aliyuncs.com/HgWFAEPozfVdcst: net/http: HTTP/1.x transport connection broken: http: ContentLength=87064 with Body length 86803
            	Test:       	TestOSSESSuite/Test_Upload_FileStream_SetContentType2
FAIL

The key part is: ContentLength=87064 with Body length 86803

sample.dex file triggering antivirus engines :/

I just had an awkward situation trying to go get a tool that used this module from my work laptop and the corporate cybersecurity solution (Fortinet Forticlient Antivirus) tripped on the sample.dex telling me it thinks it's some kind of Android trojan:

VirusTotal also reports positives from several other AV engines:
https://www.virustotal.com/gui/file/8995adc809fd239ecd2806c6957ee98db6eb06b64dac55089644014d87e6f956/detection

That said, I don't believe you meant harm or are trying to sneak in trojans to the world though. This looks like an unfortunate case of a suspicious file that made it into the unit tests suite; that is all.

I saw it was added by a commit from @mikusjelly but where did they get the file from? In any case, do you think it could be possible to swap it for another .dex that is not flagged as highly suspicious? -- If you upload the new .dex to virustotal.com for a scan and if it comes out totally clean then it's good for the repo.

What do you think?

ps: I emailed Fortinet to report it as a possible false positive and they came back to me with:

The sample contains suspicious codes that are related to the SMS service, purchase interface, payment, bill, China Mobile, China Unicom, and China Telecommunications Corporation.
The class names and function names are all simply obfuscated, and it also involved the "android.provider.Telephony.SMS_RECEIVED" and "android.provider.Telephony.SMS_DELIVER" as part of the suspicious behaviors.

Some types of MP3 are not matched

Hi,

I have a couple of .mp3 files which are not matched in my testing. It seems the magic bytes can change depending on channels and bitrates of the files. For example, one mono .mp3 has 0xFFFA90 and a stereo file has 0xFFFB90. Those that start 0x494433 are correctly matched.

It seems numerous examples can be seen here: https://github.com/mirror/jdownloader/blob/master/src/org/jdownloader/extensions/extraction/mime.type

I will shortly attempt to fix this in my program and provide a patch in due course.

Is adding to the 'if' statement the preferred way?
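MP3 data without an ID3 tag starts directly with an MPEG audio frame header: an 11-bit frame sync (all ones), a 2-bit version ID, a 2-bit layer field and a protection bit. 0xFFFA and 0xFFFB differ only in that protection (CRC) bit, and an MPEG-2 Layer III file such as the 24000 Hz one in another issue would begin with 0xFFF2 or 0xFFF3 if it has no ID3 tag. Checking bit fields with masks covers every variant instead of listing byte values in the if statement. A sketch (isMP3 is a hypothetical name); frame sync alone can still produce false positives on arbitrary binary data.

```go
package main

import (
	"bytes"
	"fmt"
)

// isMP3 accepts either an ID3v2 tag or a bare MPEG audio Layer III frame
// header, validating the bit fields rather than fixed byte values.
func isMP3(buf []byte) bool {
	if bytes.HasPrefix(buf, []byte("ID3")) {
		return true
	}
	if len(buf) < 3 {
		return false
	}
	sync := buf[0] == 0xFF && buf[1]&0xE0 == 0xE0 // 11 set bits
	version := (buf[1] >> 3) & 0x03               // 01 is reserved
	layer := (buf[1] >> 1) & 0x03                 // 01 means Layer III
	bitrate := buf[2] >> 4                        // 1111 is invalid
	rate := (buf[2] >> 2) & 0x03                  // 11 is reserved
	return sync && version != 0x01 && layer == 0x01 && bitrate != 0x0F && rate != 0x03
}

func main() {
	fmt.Println(isMP3([]byte{0xFF, 0xFA, 0x90})) // true: first example from the issue
	fmt.Println(isMP3([]byte{0xFF, 0xFB, 0x90})) // true: second example
	fmt.Println(isMP3([]byte("ID3\x04")))        // true: ID3v2 tag
}
```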
