  • Stars: 999
  • Rank 45,913 (Top 1.0 %)
  • Language: Python
  • License: MIT License
  • Created over 10 years ago
  • Updated about 1 year ago

Repository Details

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome ⭐

crawler-user-agents

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, as a single JSON file.

Install

Direct download

Download the crawler-user-agents.json file from this repository directly.
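Once downloaded, the file can be read with any JSON parser. The following is a minimal Python sketch, assuming the file was saved locally as crawler-user-agents.json and holds a JSON array of entries; the function name load_crawler_entries is illustrative and not part of this repository.

```python
import json
from pathlib import Path


def load_crawler_entries(path):
    """Load crawler entries from a local copy of the JSON file.

    Assumption: the file is a JSON array of objects, each with a "pattern" key.
    """
    with Path(path).open(encoding="utf-8") as handle:
        entries = json.load(handle)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of crawler entries")
    return entries


# Example call, assuming the file sits in the current directory:
# entries = load_crawler_entries("crawler-user-agents.json")
```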

Npm / Yarn

crawler-user-agents is published on npmjs.com: https://www.npmjs.com/package/crawler-user-agents

To install it with npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);

Usage

Each pattern is a regular expression. It should work out of the box with your favorite regex library:

  • JavaScript: if (RegExp(entry.pattern).test(req.headers['user-agent'])) { ... }
  • PHP: add a slash before and after the pattern: if (preg_match('/'.$entry['pattern'].'/', $_SERVER['HTTP_USER_AGENT'])): ...
  • Python: if re.search(entry['pattern'], ua): ...
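The Python one-liner above can be expanded into a small helper. This is a sketch under stated assumptions: SAMPLE_ENTRIES is a hand-written stand-in for the real list (only the "pattern" key is used), and the names compile_patterns and is_crawler are illustrative.

```python
import re

# Stand-in for the real list; the entries follow the same shape as the JSON file.
SAMPLE_ENTRIES = [
    {"pattern": "rogerbot"},
    {"pattern": "totobot"},
]


def compile_patterns(entries):
    """Compile each entry's pattern once so repeated checks stay cheap."""
    return [re.compile(entry["pattern"]) for entry in entries]


def is_crawler(user_agent, compiled_patterns):
    """Return True if any pattern matches anywhere in the user-agent string."""
    return any(p.search(user_agent) for p in compiled_patterns)


PATTERNS = compile_patterns(SAMPLE_ENTRIES)
print(is_crawler("rogerbot/2.3 example UA", PATTERNS))  # True
print(is_crawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0", PATTERNS))  # False
```

Compiling the patterns up front avoids re-parsing each regular expression on every request. Here re.search is used rather than re.match because the patterns are fragments that may appear anywhere in the user-agent string.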

Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

  • contain a single addition
  • specify a discriminating and relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
  • contain the pattern (generic regular expression), the discovery date (year/month/day) and the official URL of the robot
  • result in a valid JSON file (don't forget the comma between items)

Example:

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances": ["rogerbot/2.3 example UA"]
}
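Before opening a pull request, an entry can be checked locally against the requirements listed above. The sketch below is not the repository's own validation. The function name check_entry and the exact checks are assumptions derived from that list, and requiring every string in "instances" to match the pattern is a further assumption.

```python
import re
from datetime import datetime

# Keys named in the contribution requirements above.
REQUIRED_KEYS = ("pattern", "addition_date", "url")


def check_entry(entry):
    """Return a list of problems found in one entry (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if problems:
        return problems
    try:
        regex = re.compile(entry["pattern"])
    except re.error as exc:
        return [f"invalid pattern: {exc}"]
    try:
        # The date is expected as year/month/day, e.g. 2014/02/28.
        datetime.strptime(entry["addition_date"], "%Y/%m/%d")
    except ValueError:
        problems.append("addition_date is not in year/month/day form")
    # Assumption: each sample user agent should be matched by the pattern.
    for instance in entry.get("instances", []):
        if not regex.search(instance):
            problems.append(f"pattern does not match instance: {instance}")
    return problems


entry = {
    "pattern": "rogerbot",
    "addition_date": "2014/02/28",
    "url": "http://moz.com/help/pro/what-is-rogerbot-",
    "instances": ["rogerbot/2.3 example UA"],
}
print(check_entry(entry))  # []
```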

License

The list is under an MIT License. The versions prior to Nov 7, 2016 were under a CC-SA license.

Related work

There are a few wrapper libraries that use this data to detect bots:

Other systems for spotting robots, crawlers, and spiders that you may want to consider are:

More Repositories

  1. ExpandAnimations (FreeBasic, 208 stars): LibreOffice/OpenOffice.org extension to expand animations before exporting to PDF. Looking for maintainers.
  2. bibtexbrowser (PHP, 77 stars): Beautiful publication lists with bibtex and PHP (standalone or in Wordpress)
  3. jskomment (JavaScript, 37 stars): open source AJAX commenting system
  4. jexast (Java, 16 stars): Enables the extraction of Java AST nodes with plain JDT
  5. misc (TeX, 10 stars)
  6. alloy-quick-reference (TeX, 10 stars): A helper document about the Alloy specification language
  7. apache-svn-commits (5 stars): 1,7 million commits of the main Apache SVN repository. Searchable thanks to Github.
  8. content-assist-example (Java, 5 stars): an example code completion system for Eclipse
  9. bots.yml (4 stars): Specification of bots.yml for software bots
  10. Asus-NovaGo-TP370QL (4 stars): Debug information about Asus-NovaGo-TP370QL
  11. git-api-diff (Shell, 4 stars): Computes Git diff at the API level (new methods, modified methods, etc)
  12. pascal3g (OpenEdge ABL, 4 stars): ANTLR v3 grammar for Pascal
  13. travis-metronome (3 stars): A piece of software art
  14. litjava (3 stars): A literate programming tool in Java for Java
  15. dl-gdocs (Python, 3 stars): Downloads and backups Google Documents (texts, spreadsheets, presentations)
  16. real-bug-fixes-icse-2015 (Python, 3 stars): Open-science repository for the bug fix commit dataset of "An Empirical Study on Real Bug Fixes" (ICSE 2015).
  17. academia.json (Python, 2 stars): Metadata about academic publications in JSON. Pull-requests welcome.
  18. roundingsat (C++, 2 stars)
  19. travis-metronome-tok (1 star): The counter-weight of https://github.com/monperrus/travis-metronome
  20. typeusage (Java, 1 star): extracts type-usages from Java bytecode or source code using Soot
  21. gakoci (Python, 1 star): Self-hosted continuous integration server for Github, based on webhooks
  22. fun-with-travis (Shell, 1 star): Fun experiments with travis
  23. kyss (1 star)
  24. btc-supply-chain (Python, 1 star): Database of SHA256 of software packages for the Bitcoin software supply chain
  25. one-million-branches (1 star)
  26. wp-publications (PHP, 1 star): Repository of the wordpress plugin wp-publications (pull requests welcome :)
  27. dataset-diff-gumtree (HTML, 1 star): Dataset of diff files used in "Fine-grained and Accurate Source Code Differencing"
  28. exgen (Solidity, 1 star): copy from public artefact https://drive.google.com/file/d/10unHPpARh9FBrVIyarkiSVMKSd3qfGV5/view
  29. bug-fixes-saner16 (Java, 1 star): Extraction of the 34836 diffs of https://github.com/xuanbachle/data-bugfixes/blob/master/all.zip
  30. airport-by-foot (1 star): Instructions on the feasibility and pleasantness of reaching airports by foot ✈✈✈✈✈
  31. rugrat (Java, 1 star): Fork of the RUGRAT Random Program Generator (aka Carfast)