Voight-Kampff

Voight-Kampff relies on a user agent list for its detection. It can easily tell you if a request is coming from a crawler, spider or bot. This can be especially helpful in analytics such as page hit tracking.

Installation

gem install voight_kampff

Configuration

A JSON file is used to match user agent strings to a list of known bots.

If you'd like to use an updated list or make your own customizations, run rake voight_kampff:import_user_agents. This will download a crawler-user-agents.json file into the ./config directory.

Note: The pattern entries in the JSON file are evaluated as regular expressions.
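To illustrate what regular-expression evaluation means in practice, here is a minimal, self-contained sketch. It is not the gem's internal code: the sample entries, the SAMPLE_JSON constant and the matches_bot? helper are made up for illustration, and it assumes only that each entry carries a "pattern" key, as the note above describes.

```ruby
require 'json'

# Hypothetical sample data: each entry has a "pattern" string.
# The single-quoted heredoc keeps the backslashes exactly as written.
SAMPLE_JSON = <<~'JSON'
  [
    { "pattern": "Googlebot\\/" },
    { "pattern": "[wW]get" }
  ]
JSON

# Compile every pattern string into a Regexp once, up front.
PATTERNS = JSON.parse(SAMPLE_JSON).map { |entry| Regexp.new(entry['pattern']) }

# Hypothetical helper: true if any pattern matches the user agent string.
def matches_bot?(user_agent)
  PATTERNS.any? { |pattern| pattern.match?(user_agent.to_s) }
end

puts matches_bot?('Mozilla/5.0 (compatible; Googlebot/2.1)')
puts matches_bot?('wget/1.0')
puts matches_bot?('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0')
```

Because the entries are regular expressions rather than literal strings, characters such as . or [ ] in a custom pattern carry their regex meaning and need escaping if you want them matched literally.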

Usage

There are three ways to use Voight-Kampff:

  1. Through Rack::Request such as in your Ruby on Rails controllers:
    request.bot?

  2. Through the VoightKampff module:
    VoightKampff.bot? 'your user agent string'

  3. Through a VoightKampff::Test instance:
    VoightKampff::Test.new('your user agent string').bot?

Each of the three interfaces above responds to both human? and bot?. Both methods return true or false.
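As a rough sketch of the page hit tracking use case mentioned in the introduction, the example below skips hits that a detector flags as bots. HitCounter and stand_in_detector are hypothetical names invented here, and the stand-in regex is only a placeholder so the sketch runs without the gem installed. With the gem loaded, a lambda wrapping VoightKampff.bot? could presumably take its place.

```ruby
# Counts page hits, ignoring requests whose user agent the detector flags as a bot.
class HitCounter
  attr_reader :hits

  # detector: any object responding to #call(user_agent) and returning true or false.
  def initialize(detector)
    @detector = detector
    @hits = Hash.new(0)
  end

  # Returns true if the hit was counted, false if it was skipped as a bot.
  def record(path, user_agent)
    return false if @detector.call(user_agent)

    @hits[path] += 1
    true
  end
end

# Placeholder detector (an assumption, not the gem's logic).
# With the gem loaded: ->(ua) { VoightKampff.bot?(ua) }
stand_in_detector = ->(ua) { ua.to_s.match?(/bot|crawler|spider/i) }

counter = HitCounter.new(stand_in_detector)
counter.record('/pricing', 'Mozilla/5.0 (compatible; Googlebot/2.1)')
counter.record('/pricing', 'Mozilla/5.0 (Macintosh) Safari/605.1.15')
puts counter.hits['/pricing']
```

Passing the detector in as a callable keeps the counting logic independent of how bots are identified.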

Upgrading to version 1.0

Version 1.0 uses a new source for its list of bot user agent strings, since the old source was no longer maintained. This new source, unfortunately, does not include as much detail. Therefore, the following methods have been deprecated:

  • #browser?
  • #checker?
  • #downloader?
  • #proxy?
  • #crawler?
  • #spam?

In general, the #bot? method tends to cover all of these, and I doubt that many people were getting this granular with their bot checking. So I see it as a small price to pay for an open and up-to-date bot list.

Also, the gem no longer extends ActionDispatch::Request. Instead, it extends Rack::Request, which ActionDispatch::Request inherits from. This keeps the same functionality for Rails while opening the gem up to other Rack-based projects.
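The inheritance point can be sketched with stand-in classes. BaseRequest, FrameworkRequest and BotCheck are hypothetical names playing the roles of Rack::Request, ActionDispatch::Request and the gem's extension, so the example runs without Rack or Rails. The regex inside bot? is a placeholder, not the gem's detection logic.

```ruby
# Stand-in for Rack::Request (hypothetical, simplified).
class BaseRequest
  attr_reader :user_agent

  def initialize(user_agent)
    @user_agent = user_agent
  end
end

# Stand-in for ActionDispatch::Request, which inherits from the base class.
class FrameworkRequest < BaseRequest; end

# Stand-in for the extension that adds bot? and human?.
module BotCheck
  def bot?
    user_agent.to_s.match?(/bot|crawler|spider/i) # placeholder check
  end

  def human?
    !bot?
  end
end

# Extending only the base class is enough: subclasses pick the methods up.
BaseRequest.include(BotCheck)

request = FrameworkRequest.new('Mozilla/5.0 (compatible; bingbot/2.0)')
puts request.bot?
puts request.human?
```

Any other class built on the same base would gain the methods the same way, which is why extending the parent class serves both Rails and other Rack-based projects.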

FAQ

Q: What's with the name?
A: It's the machine in Blade Runner that is used to test whether someone is a human or a replicant.

Q: I've found a bot that isn't being matched
A: The list is pulled from github.com/monperrus/crawler-user-agents. If you'd like to have entries added to the list, please create a pull request against that project. Once that pull request is merged, feel free to create an issue here and I'll release a new gem version with the updated list. In the meantime, you can always run rake voight_kampff:import_user_agents in your project to get the updated list.

Q: Why don't you use the user agent list from ______________?
A: If you know of a better source for a list of bot user agent strings, please create an issue and let me know. I'm open to switching to a better source or supporting multiple sources. There are others out there, but I like the openness of monperrus' list.

Thanks

Thanks to github.com/monperrus/crawler-user-agents for providing an open and easily updatable list of bot user agents.

Contributing

PRs without tests will not be merged. Make sure you write tests for both the API and the Rails app. Feel free to ask for help if you do not know how to write a particular test.

Running Tests

  • bundle install
  • bundle exec rspec
