# Gowap: Wappalyzer implementation in Go

Features:

- JS analysing (using Rod)
- DNS scraping
- Confidence rate
- Recursive crawling
- Rod browser integration (Colly can still be used: faster, but does not load JS)
- Can be used as a command-line tool (technologies.json file embedded)
- Test coverage 100%
- robots.txt compliance
## Usage

### Using the package
```sh
go get github.com/unstppbl/gowap
```
Call the `Init()` function with a `Config` object created with the `NewConfig()` function. It returns a `Wappalyzer` object on which you can call the `Analyze` method with a URL string as argument.
```go
// Create a Config object and customize it
config := gowap.NewConfig()
// Path to override the default technologies.json file
config.AppsJSONPath = "path/to/my/technologies.json"
// Timeout in seconds for fetching the URL
config.TimeoutSeconds = 5
// Timeout in seconds for loading the page
config.LoadingTimeoutSeconds = 5
// Don't analyze pages at a depth greater than this number. Default (0) means no recursion (only the first page is analyzed)
config.MaxDepth = 2
// Max number of pages to visit. Exit when reached
config.MaxVisitedLinks = 10
// Delay in ms between requests
config.MsDelayBetweenRequests = 200
// Choose scraper between rod (default) and colly
config.Scraper = "colly"
// Override the user-agent string
config.UserAgent = "GoWap"
// Output as a JSON string
config.JSON = true
// Initialisation
wapp, err := gowap.Init(config)
// Scraping
url := "https://scrapethissite.com/"
res, err := wapp.Analyze(url)
```
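
For reference, here is a minimal, self-contained program built from the snippet above; the error handling and the final print are illustrative additions, not part of the original example:

```go
package main

import (
	"fmt"
	"log"

	"github.com/unstppbl/gowap"
)

func main() {
	config := gowap.NewConfig()
	// Output as a JSON string so the result can be printed directly
	config.JSON = true

	// Initialisation
	wapp, err := gowap.Init(config)
	if err != nil {
		log.Fatal(err) // illustrative: abort on initialisation failure
	}

	// Scraping
	res, err := wapp.Analyze("https://scrapethissite.com/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
}
```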
### Using the cmd

You can build the cmd with the following command:

```sh
go build -o gowap cmd/gowap/main.go
```

Then run the compiled binary. You must specify a URL to analyse:
```
Usage : gowap [options] <url>
  -delay int
        Delay in ms between requests (default 100)
  -depth int
        Don't analyze page when depth superior to this number. Default (0) means no recursivity (only first page will be analyzed)
  -file string
        Path to override default technologies.json file
  -h    Help
  -loadtimeout int
        Timeout in seconds for loading the page (default 3)
  -maxlinks int
        Max number of pages to visit. Exit when reached (default 5)
  -pretty
        Pretty print json output
  -scraper string
        Choose scraper between rod (default) and colly (default "rod")
  -timeout int
        Timeout in seconds for fetching the url (default 3)
  -useragent string
        Override the user-agent string
```
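
For example, an invocation combining several of the flags above might look like this (the URL and flag values are only illustrative):

```sh
./gowap -scraper colly -maxlinks 10 -pretty https://scrapethissite.com/
```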
## To Do

Some ideas:
- analyse robots (field robots)
- analyse certificates (field certIssuer)
- analyse css (field css)
- analyse xhr requests (field xhr)
- scrape a list of URLs from a file given in args
- ability to choose what is scraped (DNS, cookies, HTML, scripts, etc.)
- more tests in "real life"
- performance? regex matching on HTML seems slow
- should the output be the same as the original Wappalyzer? + ordering