file_signature

Introduction

Magic numbers are patterns in the first few bytes of a file or data stream which (hopefully, uniquely) identify its type. For example, a file which starts with "%!PS-Adobe-" is very likely a PostScript file, whereas one which starts with the three bytes 1F 8B 08 is very likely a gzip-compressed file.
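
To make the idea concrete, here is a minimal sketch in plain Ruby that matches the two signatures mentioned above against the leading bytes of some data. The `SIGNATURES` table and the `sniff_type` helper are hypothetical names used only for illustration; they are not part of this gem, and the sketch assumes Ruby 2.0 or later (for `String#b`).

```ruby
# Hypothetical signature table: type symbol => leading bytes (binary strings).
# Only the two examples from the text above are included.
SIGNATURES = {
  postscript: "%!PS-Adobe-".b,
  gzip:       "\x1F\x8B\x08".b
}.freeze

# Return the first type whose signature is a prefix of the data, or nil.
def sniff_type(data)
  bytes = data.b # compare as raw bytes, ignoring the string's encoding
  SIGNATURES.each do |type, magic|
    return type if bytes.start_with?(magic)
  end
  nil
end

sniff_type("%!PS-Adobe-3.0\n")      # => :postscript
sniff_type("\x1F\x8B\x08\x00".b)    # => :gzip
sniff_type("plain text")            # => nil
```

A match only says the data is very likely of that type; a short signature can collide with unrelated content.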

This gem allows you to check a file or IO stream against a list of known magic numbers, and thereby determine its type.

Want to help? Feel free to fork it and submit a pull request.

Install quickstart

Install (for the previous 1.1.x version):

gem install file_signature

Bundler (for the current 1.2.x version):

gem "file_signature", :git => 'http://github.com/distler/file_signature.git'

Require:

require "file_signature"

Details

This gem infers the content type based on widespread programming conventions for data file formats. It has been tested on MRI 1.8.7 and 1.9.3. If you find that it works on other Ruby implementations, please share your results.

Our principal guide to these magic numbers is http://www.garykessler.net/library/file_sigs.html, but we've also found useful input from libmagic and other sources on the internet.

Typical uses of magic numbers:

  • quickly guess at a file's MIME type
  • check if data matches the file's MIME type or extension
  • check if a web form file upload matches its HTTP content type
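
The upload check in the last item can be sketched roughly as follows. This is an illustration in plain Ruby under stated assumptions, not the gem's implementation: `MAGIC_TO_MIME`, `sniffed_mime`, and `upload_consistent?` are hypothetical names, the table holds only three well-known signatures, and the stream is assumed to be seekable so that it can be rewound after peeking.

```ruby
require "stringio"

# Hypothetical table: leading bytes => MIME type (three well-known formats).
MAGIC_TO_MIME = {
  "\x89PNG\r\n\x1A\n".b => "image/png",
  "%PDF-".b             => "application/pdf",
  "\x1F\x8B\x08".b      => "application/gzip"
}.freeze

# Peek at the first bytes of a seekable IO and return a MIME type string,
# falling back to "application/octet-stream" when nothing matches.
def sniffed_mime(io)
  head = io.read(16).to_s.b # read returns nil at EOF, hence to_s
  io.rewind                 # assumption: the stream supports rewind
  MAGIC_TO_MIME.each do |magic, mime|
    return mime if head.start_with?(magic)
  end
  "application/octet-stream"
end

# True when the declared content type agrees with the sniffed one.
def upload_consistent?(io, declared_type)
  sniffed_mime(io) == declared_type
end

upload = StringIO.new("%PDF-1.7\n".b)
upload_consistent?(upload, "application/pdf") # => true
upload_consistent?(upload, "image/png")       # => false
```

A mismatch is a reason to reject or inspect the upload further; a match is not proof that the rest of the file is well-formed.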

Usage

 require 'file_signature'

The gem adds two class methods to the File class, and two instance methods to the IO class.

 File.magic_number_type('myfile')

and

 f = File.open('myfile')
 f.magic_number_type

both return a symbol (or nil if the type is unrecognized) corresponding to the type of file found. Similarly,

 File.mime_type('myfile')

and

 f = File.open('myfile')
 f.mime_type

both return a string, which is the corresponding MIME type ('application/octet-stream' if unrecognized).
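
The pairing of a class method on File with an instance method on IO can be pictured with the sketch below, in which the class method opens the path and delegates to the instance method. This shows the general pattern only and is not the gem's source: the method name `sketch_magic_type` and the `SKETCH_MAGIC` table are hypothetical, chosen so they cannot clash with the gem's own methods, and `Tempfile.create` assumes Ruby 2.1 or later.

```ruby
require "tempfile"

# Hypothetical table, far smaller than a real signature list.
SKETCH_MAGIC = {
  gzip:       "\x1F\x8B\x08".b,
  postscript: "%!PS-Adobe-".b
}.freeze

class IO
  # Instance method: inspect the bytes at the stream's current position.
  def sketch_magic_type
    head = read(16).to_s.b
    SKETCH_MAGIC.each do |type, magic|
      return type if head.start_with?(magic)
    end
    nil
  end
end

class File
  # Class method: open the path in binary mode and delegate to the
  # instance method, closing the file when the block returns.
  def self.sketch_magic_type(path)
    File.open(path, "rb") { |f| f.sketch_magic_type }
  end
end

Tempfile.create("sketch") do |tmp|
  tmp.binmode
  tmp.write("\x1F\x8B\x08\x00".b)
  tmp.flush
  File.sketch_magic_type(tmp.path) # => :gzip
end
```

Because File is a subclass of IO, the instance method is available on open files as well as on other IO objects.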

TODO

  • This library currently contains only a small handful of byte strings for type identification. See if we can import entries from /usr/share/file/magic or a similar source, so that it's useful for more people.

Changes

  • 2012-03-14 1.0.0 Update docs, tests
  • 2012-05-31 1.1.0 Add memo, reformat and rename things for clarity
  • 2012-08-10 1.2.0 Many new magic numbers and corrections to existing ones, add methods to retrieve mime_type, lots of tests

License

You may choose any of these open source licenses:

  • Apache License
  • BSD License
  • Creative Commons License, Non-commercial Share Alike
  • GNU General Public License Version 2 (GPL 2)
  • GNU Lesser General Public License (LGPL)
  • MIT License
  • Perl Artistic License
  • Ruby License

The software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement.

In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

This license is for the included software that is created by SixArm; some of the included software may have its own licenses, copyrights, authors, etc. and these do take precedence over the SixArm license.

Copyright (c) 2005-2012 Joel Parker Henderson
