
CrawlerDetect

Ruby gem to detect bots and crawlers via the user agent.

About

CrawlerDetect is a Ruby version of the PHP class CrawlerDetect (JayBizzle/CrawlerDetect).

It helps to detect bots/crawlers/spiders via the user agent and other HTTP headers. It is currently able to detect thousands of bots/spiders/crawlers.

Why CrawlerDetect?

Compared with other popular bot-detection gems:

                                                 CrawlerDetect   Voight-Kampff   Browser
Number of bot patterns                           >1000           ~280            ~280
Number of checked HTTP headers                   10              1               1
Number of bot-list updates (1st half of 2018)    14              1               7

In order to remain up-to-date, this gem does not accept any crawler data updates; any PRs to edit the crawler data should be offered to the original JayBizzle/CrawlerDetect project.

Installation

Add this line to your application's Gemfile:

gem 'crawler_detect'
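
And then execute bundle install, or install the gem yourself with gem install crawler_detect.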

Basic Usage

CrawlerDetect.is_crawler?("Bot user agent")
# => true
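
For contrast, a regular browser user agent should not match. The exact result depends on the current pattern list, so this output is illustrative:

# A typical desktop browser UA string (illustrative example)
CrawlerDetect.is_crawler?("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
# => false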

Or, if you need the crawler name:

detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)")
detector.is_crawler?
# => true
detector.crawler_name
# => "Googlebot"

Rack::Request extension

Optionally, you can add extra methods to the request object:

request.is_crawler?
# => false
request.crawler_name
# => nil

It's more flexible to use request.is_crawler? rather than CrawlerDetect.is_crawler?, because it automatically checks 10 HTTP headers, not only HTTP_USER_AGENT.

The only thing you have to do is configure the Rack::CrawlerDetect middleware:

Rails

class Application < Rails::Application
  # ...
  config.middleware.use Rack::CrawlerDetect
end

Rack

use Rack::CrawlerDetect
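
For illustration, here is a minimal config.ru wiring the middleware into a bare Rack app. This is a sketch: the handler and response body are hypothetical, and it assumes the middleware activates the Rack::Request extension shown above.

# config.ru -- minimal sketch; handler and response body are hypothetical
require "crawler_detect"

use Rack::CrawlerDetect

run lambda { |env|
  request = Rack::Request.new(env)
  # is_crawler? / crawler_name come from the gem's Rack::Request extension
  body = request.is_crawler? ? "Hello, #{request.crawler_name}" : "Hello, human"
  [200, { "content-type" => "text/plain" }, [body]]
}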

Configuration

In some cases you may want to use your own whitelist, blacklist, or list of HTTP headers used to detect the user agent.

This is possible via CrawlerDetect::Config. For example, you may have an initializer like this:

CrawlerDetect.setup! do |config|
  config.raw_headers_path    = File.expand_path("crawlers/MyHeaders.json", __dir__)
  config.raw_crawlers_path   = File.expand_path("crawlers/MyCrawlers.json", __dir__)
  config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__)
end

Make sure your files are valid JSON. Look at the raw files used by default for more information.
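
For illustration, a custom headers file could list the request env keys to inspect. The file below is hypothetical and assumes the same array-of-strings layout as the bundled defaults; check the gem's raw files for the actual schema.

crawlers/MyHeaders.json (hypothetical contents):

[
  "HTTP_USER_AGENT",
  "HTTP_X_ORIGINAL_USER_AGENT"
]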

License

MIT License