Passive TCP/IP Fingerprinting Tool. Run this on your server and find out what Operating Systems your clients are *really* using.

Passive TCP/IP Fingerprinting 🚀

Zardaxt.py is a passive TCP/IP fingerprinting tool. Run Zardaxt.py on your server to find out what operating systems your clients are really using. This tool considers the header fields and options from the very first incoming SYN packet of the TCP 3-Way Handshake.

Test your TCP/IP Fingerprint with curl:

curl 'https://tcpip.incolumitas.com/classify?by_ip=1'
curl 'https://tcpip.incolumitas.com/classify?by_ip=1&detail=1'

Why the rewrite?

  • p0f is dead. Its fingerprint database is outdated, and its C code base is overkill for this task and hard to modify quickly.
  • satori.py is extremely buggy and hard to use (although the ideas behind the code are awesome). In fact, some code and inspiration in zardaxt were taken from satori.py.
  • The actual statistics/traffic samples behind TCP/IP fingerprinting are more important than the tool itself. Therefore it makes sense to rewrite it.

What can I do with this tool?

This tool may be used to correlate an incoming TCP/IP connection with an operating system class. For example, it can be used to detect proxies if the proxy's operating system (mostly Linux) differs from the operating system taken from the User-Agent.

If the key os_mismatch is true, then the TCP/IP inferred OS is different from the User-Agent OS.

On the other hand, most VPN protocols cannot be revealed by TCP/IP fingerprint mismatches. This is because VPN protocols work on the network layer, and VPN servers do not establish a dedicated TCP/IP connection that could have the TCP/IP characteristics of the VPN server.
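The os_mismatch check described above could be sketched roughly as follows. This is a minimal illustration, not the code zardaxt.py actually runs: the function names, the User-Agent patterns, and the OS class labels ("Windows", "Linux", and so on) are all assumptions made for this example.

```python
import re

# Hypothetical mapping from User-Agent substrings to coarse OS classes.
# Order matters: Android User-Agents also contain "Linux", and iOS
# User-Agents also contain "Mac OS X".
UA_PATTERNS = [
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad|iPod")),
    ("Windows", re.compile(r"Windows NT")),
    ("Mac OS", re.compile(r"Mac OS X|Macintosh")),
    ("Linux", re.compile(r"Linux")),
]


def os_from_user_agent(user_agent):
    """Return a coarse OS class for a User-Agent string, or None if unknown."""
    for name, pattern in UA_PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


def os_mismatch(tcpip_os, user_agent):
    """True if the TCP/IP inferred OS differs from the User-Agent OS.

    Returns False when either side is unknown, because a missing value
    is not evidence of a proxy.
    """
    ua_os = os_from_user_agent(user_agent)
    if ua_os is None or tcpip_os is None:
        return False
    return ua_os != tcpip_os
```

For instance, a client whose User-Agent claims Windows NT but whose SYN packet looks like Linux would yield os_mismatch("Linux", user_agent) == True, which hints at a Linux proxy in between.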

Installation & Usage

First clone the repo:

# clone repo
git clone https://github.com/NikolaiT/zardaxt
# move into directory
cd zardaxt

I am using pew to create Python virtual environments. If you don't have pew installed yet, install it as follows:

pip3 install pew

Note: For newer Python 3 versions (such as Python 3.10), you will have to install pcapy-ng (see https://pypi.org/project/pcapy-ng/) instead of pcapy.

# create a virtual environment with pew
pew new zardaxt
# work on virtual environment `zardaxt`
pew workon zardaxt
# install packages now with pip inside the environment `zardaxt`
pip install dpkt pcapy-ng requests

By default, zardaxt.py looks for a configuration file named zardaxt.json that should reside in the same directory as zardaxt.py. You can also provide the path to your own config file as the first argument to zardaxt.py.

python zardaxt.py ./zardaxt.json
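The lookup rule described above can be pictured with the following sketch. It only illustrates the rule (first argument if given, otherwise zardaxt.json next to the script); resolve_config_path is a hypothetical helper and not necessarily how zardaxt.py implements it.

```python
import os


def resolve_config_path(argv, script_file):
    """Pick the config path: first CLI argument, else zardaxt.json beside the script.

    argv is expected to look like sys.argv, script_file like __file__.
    """
    if len(argv) > 1:
        return argv[1]
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, "zardaxt.json")
```

Inside a script this would be called as resolve_config_path(sys.argv, __file__).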

Or run zardaxt.py in the background on your server:

nohup pew in zardaxt python zardaxt.py &

Serving over https via nginx

If you want to serve zardaxt.py over nginx, your configuration has to look something like this. HTTPS is provided by Let’s Encrypt (certbot).

server {
  listen 443 ssl default_server;
  listen [::]:443 ssl default_server;
  
  server_name tcpip.incolumitas.com;

  location / {
    proxy_pass http://localhost:8249;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Host $host;
    proxy_set_header  X-Real-IP $remote_addr;
    proxy_cache_bypass $http_upgrade;
  }
  
  ssl_certificate /etc/letsencrypt/live/abs.incolumitas.com/fullchain.pem; # managed by Certbot
  ssl_certificate_key /etc/letsencrypt/live/abs.incolumitas.com/privkey.pem; # managed by Certbot
}

API Support

When you run zardaxt.py, the program automatically launches a simple web API that you can query. An HTTP server is bound to 0.0.0.0:8249. You can query it at http://0.0.0.0:8249/classify.

If you want to query the TCP/IP fingerprint only for the client IP address, use

curl "http://0.0.0.0:8249/classify"

And if you want to have all details in the API output, add the detail=1 parameter to the URL:

curl "http://0.0.0.0:8249/classify?detail=1"

If you want to query all fingerprints in the API database, you have to specify the API key:

curl "http://0.0.0.0:8249/classify?key=abcd1234"

If you want to query/lookup a specific IP address (Example: 103.14.251.215), you will have to specify the IP address and the API key:

curl "http://0.0.0.0:8249/classify?key=abcd1234&ip=103.14.251.215"
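The curl calls above can also be assembled from Python. The sketch below only builds the query URLs from the parameters shown in this section (key, ip, detail); the helper name classify_url is an assumption made for this example, and the response format is not shown here.

```python
from urllib.parse import urlencode


def classify_url(base="http://0.0.0.0:8249", key=None, ip=None, detail=False):
    """Build a /classify URL from the query parameters described above."""
    params = {}
    if key is not None:
        params["key"] = key        # API key, needed for database-wide or per-IP lookups
    if ip is not None:
        params["ip"] = ip          # look up a specific client IP address
    if detail:
        params["detail"] = 1       # ask for the detailed output
    query = urlencode(params)
    return f"{base}/classify" + (f"?{query}" if query else "")


# With the requests package installed earlier, a lookup could then be sent as:
#   requests.get(classify_url(key="abcd1234", ip="103.14.251.215"))
```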

What header fields are used for TCP/IP fingerprinting?

Several fields, such as the TCP options, the TCP window size, or the IP fragment flags, depend heavily on the OS type and version. Detecting operating systems by analyzing the first incoming SYN packet is certainly not an exact science, but it's better than nothing.

Entropy from the IP header

  • IP.ihl (4 bits) - Internet Header Length (IHL) - The IPv4 header is variable in size due to the optional 14th field (Options). The IHL field contains the size of the IPv4 header in 32-bit words. The minimum value for this field is 5 (20 bytes) and the maximum value is 15 (60 bytes). If the IP options field correlates with the underlying OS (which I don't think is necessarily the case), the IP.ihl is relevant.
  • IP.len (16 bits) - Total Length - This 16-bit field defines the entire packet size in bytes, including header and data. The minimum size is 20 bytes (header without data) and the maximum is 65,535 bytes. IP.len is likely relevant for the TCP/IP fingerprint.
  • IP.id (16 bits) - Identification - This field is an identification field and is primarily used for uniquely identifying the group of fragments of a single IP datagram. However, the IP.id field is used for other purposes and it seems that its behavior is OS dependent: "We find that that the majority of hosts adopts a constant IP-IDs (39%) or local counter (34%), that the fraction of global counters (18%) significantly diminished, that a non marginal number of hosts have an odd behavior (7%) and that random IP-IDs are still an exception (2%)."
  • IP.flags (3 bits) - Flags - Don't fragment (DF) and more fragments (MF) flags; bit 0 (RF) is reserved and always 0. In the flags field of the IPv4 header, there are three bits for control flags. The "don't fragment" (DF) bit plays a central role in Path Maximum Transmission Unit Discovery (PMTUD) because it determines whether or not a packet is allowed to be fragmented. Some operating systems set the DF flag in the IP header, others don't.
  • IP.ttl (8 bits) - Time to live (TTL) - An eight-bit time to live field limits a datagram's lifetime to prevent network failure in the event of a routing loop. The TTL indicates how long an IP packet is allowed to circulate in the Internet. Each hop (such as a router) decrements the TTL field by one. The maximum TTL value is 255, the maximum value of a single octet (8 bits). A recommended initial value is 64, but some operating systems customize this value. Hence its relevance for TCP/IP fingerprinting.
  • IP.protocol (8 bits) - Protocol - This field defines the protocol used in the data portion of the IP datagram. IANA maintains a list of IP protocol numbers as directed by RFC 790. It does not seem to be that relevant for TCP/IP fingerprinting, since it is mostly TCP (6).
  • IP.sum (16 bits) - Header checksum - The 16-bit IPv4 header checksum field is used for error-checking of the header. When a packet arrives at a router, the router calculates the checksum of the header and compares it to the checksum field. If the values do not match, the router discards the packet. Errors in the data field must be handled by the encapsulated protocol. Both UDP and TCP have separate checksums that apply to their data. Probably has no use for TCP/IP fingerprinting.
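To make the IP header fields listed above concrete, here is a small sketch that unpacks the fixed 20-byte IPv4 header with Python's struct module. It is not the parser zardaxt.py uses (the tool relies on dpkt). The function names are made up for this example, and guess_initial_ttl is only a heuristic: it assumes the sender started from one of the common initial values 64, 128 or 255.

```python
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract the fingerprint-relevant fields from the first 20 bytes of an IPv4 packet."""
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, checksum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,             # header length in 32-bit words
        "len": total_len,                  # total length in bytes
        "id": ident,
        "df": bool(flags_frag & 0x4000),   # don't fragment
        "mf": bool(flags_frag & 0x2000),   # more fragments
        "ttl": ttl,
        "protocol": proto,                 # 6 means TCP
        "sum": checksum,
    }


def guess_initial_ttl(observed_ttl: int) -> int:
    """Heuristic (assumption): round the observed TTL up to a common initial value."""
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial
    return 255
```

Because every hop decrements the TTL, the observed value is lower than the one the client sent. Rounding up recovers the likely initial value, for example an observed TTL of 52 suggests an initial TTL of 64.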

Entropy from the TCP header

  • TCP.sequence_number (32 bits) - Sequence Number - If the SYN flag is set (1), then this is the initial sequence number. It might be the case that different operating systems use different initial sequence numbers, but the initial sequence number is most likely randomly chosen. Therefore this field is most likely of no particular help regarding fingerprinting.
  • TCP.acknowledgment_number (32 bits) - Acknowledgment Number - If the ACK flag is set then the value of this field is the next sequence number that the sender of the ACK is expecting. Should be zero if the SYN flag is set.
  • TCP.data_offset (4 bits) - Data Offset - This is the size of the TCP header in 32-bit words with a minimum size of 5 words and a maximum size of 15 words. Therefore, the maximum TCP header size is 60 bytes (with 40 bytes of options data). The TCP header size thus depends on how many options are present at the end of the header. This correlates with the OS, since the TCP options correlate with the TCP/IP fingerprint.
  • TCP.flags (9 bits) - Flags - This header field contains 9 one-bit flags for TCP protocol controlling purposes. The initial SYN packet usually has a flags value of 2 (which means that only the SYN flag is set). However, I have also observed flags values of 194 (2^1 + 2^6 + 2^7), which means that the SYN, ECE and CWR flags are set to one. If the SYN flag is set, ECE means that the client is ECN capable. Congestion window reduced (CWR) means that the sending host received a TCP segment with the ECE flag set and has responded with its congestion control mechanism.
  • TCP.window_size (16 bits) - Window Size - Initial window size. The idea is that different operating systems use a different initial window size in the initial TCP SYN packet.
  • TCP.checksum (16 bits) - Checksum - The 16-bit checksum field is used for error-checking of the TCP header, the payload and an IP pseudo-header. The pseudo-header consists of the source IP address, the destination IP address, the protocol number for the TCP protocol (6) and the length of the TCP headers and payload (in bytes).
  • TCP.urgent_pointer (16 bits) - Urgent Pointer - If the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte. It should be zero in initial SYN packets.
  • TCP.options (Variable 0-320 bits) - Options - All TCP options. The length of this field is determined by the data offset field. It contains a lot of information, most importantly the Maximum Segment Size (MSS) and the window scale value. Because the TCP options data is variable in size, it is the most important source of entropy to distinguish operating systems. The order of the TCP options is also taken into account.
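The TCP fields above, including the option order, can be read from a raw SYN segment as in the following sketch. Again this is an illustration under assumptions and not the parser zardaxt.py uses: the function names, the result keys, and the short option labels in OPTION_NAMES are invented for this example.

```python
import struct

# Labels for a few common TCP option kinds (naming is an assumption of this sketch).
OPTION_NAMES = {0: "EOL", 1: "NOP", 2: "MSS", 3: "WScale", 4: "SACK_PERM", 8: "Timestamp"}


def parse_tcp_options(raw: bytes) -> list:
    """Return a list of (kind, data) tuples in the order they appear."""
    options, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                      # end of option list
            options.append((kind, b""))
            break
        if kind == 1:                      # NOP is a single padding byte
            options.append((kind, b""))
            i += 1
            continue
        if i + 1 >= len(raw):
            break                          # truncated option
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                          # malformed option
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


def parse_tcp_syn(segment: bytes) -> dict:
    """Extract flags, window size and options from a TCP segment."""
    (_sport, _dport, _seq, _ack, off_flags,
     window, _checksum, _urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = off_flags >> 12          # header length in 32-bit words
    options = parse_tcp_options(segment[20:data_offset * 4])
    result = {
        "data_offset": data_offset,
        "flags": off_flags & 0x01FF,       # 2 means only SYN is set
        "window_size": window,
        "mss": None,
        "window_scale": None,
        "option_order": ",".join(OPTION_NAMES.get(k, str(k)) for k, _ in options),
    }
    for kind, data in options:
        if kind == 2 and len(data) == 2:
            result["mss"] = struct.unpack("!H", data)[0]
        elif kind == 3 and len(data) == 1:
            result["window_scale"] = data[0]
    return result
```

The option_order string is kept as its own field because two systems can send the same set of options in a different order.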

Sources

  1. Mostly Wikipedia TCP/IP fingerprinting article
  2. A lot of inspiration from satori.py
  3. Another TCP/IP fingerprinting tool
