Predator
Anti-Automation System
Introduction
Predator is a prototype web application designed to demonstrate anti-crawling, anti-automation and bot detection techniques. It can be used as a honeypot, an anti-crawling system or a false-positive test bed for vulnerability scanners.
Warning: I strongly discourage deploying the demonstrated methods on a production server without understanding exactly what they do. Only the techniques that suit your web application should be implemented. Predator is a collection of techniques; its code shouldn't be used as-is.
The mind map below is a rough visualization of how the techniques demonstrated here can be implemented in a production environment.
Note: The numbers and factors in the "Observation Phase" can be used to assign a reputation score to a client, which then serves as a strong indicator of malicious activity once a threshold is hit.
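A minimal sketch of such a reputation score, assuming hypothetical signal names, weights and a ban threshold (none of these come from Predator's code):

```php
<?php
// Hypothetical signal weights; a real deployment would tune these.
$weights = [
    'missing_headers'    => 2,  // e.g. no Accept-Language header
    'hidden_link_visit'  => 5,  // followed a CSS-hidden link
    'bad_resource_ratio' => 3,  // pages fetched without CSS/JS/images
];

function reputation(array $observed, array $weights): int {
    $score = 0;
    foreach ($observed as $signal) {
        $score += $weights[$signal] ?? 0;
    }
    return $score;
}

$observed = ['missing_headers', 'hidden_link_visit'];
if (reputation($observed, $weights) >= 5) {   // threshold is arbitrary
    http_response_code(403);                  // or redirect to limbo.php
    exit;
}
```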
Techniques Used
Bot Detection
User-Agent and Header Inspection
HTTP headers sent by bots often arrive in a different order than a real browser's, or are missing altogether. Many bots disclose themselves in the User-Agent header for the sake of ethics, while others don't send one at all.
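A minimal sketch of this kind of inspection, assuming a hypothetical keyword list and header checks (not Predator's actual rules):

```php
<?php
// Flag clients with a missing or bot-like User-Agent, or without
// headers that real browsers normally send.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$botKeywords = ['curl', 'python-requests', 'scrapy', 'bot', 'spider'];

$uaLooksBotty = false;
foreach ($botKeywords as $keyword) {
    if (stripos($ua, $keyword) !== false) {
        $uaLooksBotty = true;
        break;
    }
}

$suspicious = $ua === ''                          // no User-Agent at all
    || !isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])   // browsers normally send this
    || $uaLooksBotty;

if ($suspicious) {
    // flag the client, e.g. raise its reputation score
}
```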
Webdriver Detection
Most of the HTML mutation techniques described here can be bypassed with browser-based frameworks such as Selenium and Puppeteer, but these frameworks can be detected with various tests, as implemented in isBot.js.
Resource Usage
Most bots only request webpages and images, while resource files such as .css are often ignored because the HTTP implementation in use doesn't download them. Bots can be detected when the ratio of webpage/image requests to such resource file requests rises above a predefined threshold.
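A minimal sketch of tracking that ratio, assuming session-based counters and arbitrary thresholds; a real setup would key on the client IP and persist counts server-side:

```php
<?php
session_start();

// Classify the current request as a resource or a page/image request.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$isResource = (bool) preg_match('/\.(css|js|woff2?)$/i', $path);
$key = $isResource ? 'resources' : 'pages';
$_SESSION[$key] = ($_SESSION[$key] ?? 0) + 1;

$pages     = $_SESSION['pages']     ?? 0;
$resources = $_SESSION['resources'] ?? 0;

// Many page loads without a single CSS/JS fetch looks like a bot,
// as does a very high page-to-resource ratio.
if (($pages > 10 && $resources === 0)
    || ($resources > 0 && $pages / $resources > 20)) {
    // flag the client, e.g. raise its reputation score
}
```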
Malformed HTML
Many HTML parsers used in crawlers can't handle broken HTML the way browsers do. For example, clicking the following link in a browser leads to page_1, but affected parsers extract the latter value, i.e. page_2:
<a/href="page_1"/href="page_2">Click</a>
This can be used to detect and ban crawlers without affecting the user experience.
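A minimal sketch of acting on the decoy value, assuming page_2 is never reachable by a real browser; the ban() helper is hypothetical:

```php
<?php
// Only broken parsers resolve the malformed link to page_2, so any
// request for it can be treated as a crawler.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (basename($path) === 'page_2') {
    ban($_SERVER['REMOTE_ADDR']);   // hypothetical: add the IP to a blocklist
    http_response_code(403);
    exit;
}
```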
Invisible Links
Some links are hidden from users with CSS, but automated programs can still see them. These links can be used to detect bots and take a desired action, such as banning the IP address.
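A minimal sketch of such a trap, assuming a hypothetical /trap.php path and ban() helper; real users never see or click the link, so any visit to it is treated as a bot:

```php
<?php
// In the page template: a link hidden from humans via CSS.
echo '<a href="/trap.php" style="display:none" tabindex="-1" aria-hidden="true">stats</a>';
```

```php
<?php
// trap.php (hypothetical): anyone requesting this URL gets flagged.
ban($_SERVER['REMOTE_ADDR']);   // hypothetical blocklist helper
http_response_code(403);
exit;
```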
Bait Links
When Predator suspects that the visitor is a bot, it generates a random number of random links that point to a page (limbo.php) containing more random links, and this process keeps repeating.
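A minimal sketch of bait-link generation in the spirit of limbo.php; the query parameter name and link counts are arbitrary, not Predator's actual values:

```php
<?php
// Emit a random number of random links back into the same trap page,
// so a crawler that follows them never runs out of URLs.
$count = random_int(5, 20);
for ($i = 0; $i < $count; $i++) {
    $token = bin2hex(random_bytes(8));   // random-looking path segment
    echo '<a href="limbo.php?p=' . $token . '">' . $token . "</a>\n";
}
```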
Signature Reversing
Vulnerability scanners usually submit a payload and check whether the web app responds in a certain way. Predator can pretend to have a vulnerability by including the expected response, i.e. the signature, within the HTML.
Predator mimics the following vulnerabilities at the moment:
- SQL Injection
- Cross Site Scripting (XSS)
- Local File Inclusion (LFI)
This method makes it possible to set up a honeypot without actually hosting any vulnerable code, and it doubles as a test bed for false-positive testing.
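A minimal sketch of signature reversing for the SQL injection and LFI cases; the parameter name and signature strings are illustrative, not taken from Predator:

```php
<?php
$input = $_GET['id'] ?? '';

if (strpos($input, "'") !== false) {
    // Fake MySQL error message that SQLi checks commonly look for.
    echo "You have an error in your SQL syntax; check the manual that "
       . "corresponds to your MySQL server version";
} elseif (strpos($input, 'etc/passwd') !== false) {
    // Fake /etc/passwd content to satisfy LFI signature checks.
    echo "root:x:0:0:root:/root:/bin/bash\n";
}
```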
Credits
PatheticGeek did all the front-end magic to make Predator look good.