  • Stars: 148
  • Rank 249,983 (Top 5 %)
  • Language: Go
  • License: MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Domain names collector - Crawl websites and collect domain names along with their availability status.

Spidy

A tool that crawls websites to find domain names and checks their availability.

Install

git clone https://github.com/twiny/spidy.git
cd ./spidy

# build
go build -o bin/spidy -v cmd/spidy/main.go

# run
./bin/spidy -c config/config.yaml -u https://github.com

Usage

NAME:
   Spidy - Domain name scraper

USAGE:
   spidy [global options] command [command options] [arguments...]

VERSION:
   2.0.0

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config path, -c path  path to config file
   --help, -h              show help (default: false)
   --urls urls, -u urls    urls of page to scrape  (accepts multiple inputs)
   --version, -v           print the version (default: false)

Configuration

# main crawler config
crawler:
    max_depth: 10 # max depth of pages to visit per website.
    # filter: [] # regexp filter
    rate_limit: "1/5s" # 1 request per 5 sec
    max_body_size: "20MB" # max page body size
    user_agents: # array of user-agents
      - "Spidy/2.1; +https://github.com/twiny/spidy"
    # proxies: [] # array of proxy. http(s), SOCKS5
# Logs
log:
    rotate: 7 # log rotation
    path: "./log" # log directory
# Store
store:
    ttl: "24h" # keep cache for 24h 
    path: "./store" # store directory
# Results
result:
    path: ./result # result directory
parralle: 3 # number of concurrent workers 
timeout: "5m" # request timeout
tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] # array of domain extensions to check.

TODO

  • Add support for more writers.
  • Add terminal logging.
  • Add test cases.

Issues

NOTE: This package is provided "as is" with no guarantee. Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue.

More Repositories

  1. wbot (Go, 17 stars): A simple & efficient web crawler.
  2. wails-template (Svelte, 10 stars): wails v2 app template using Svelte & Tailwind
  3. snaky (Go, 10 stars): snake game implementation using 2d array in Go
  4. sigma (Go, 8 stars): a small wrapper around go-chi HTTP router.
  5. blockscan (Go, 8 stars): a mini blockchain scanner
  6. screenshot (Go, 7 stars): A small HTTP server that takes a screenshot of a web page.
  7. domaincheck (Go, 6 stars): Domain Name Availability Checker
  8. domain (Go, 5 stars): domain name availability check using WHOIS protocol.
  9. dice (Go, 4 stars): random string/int generator for the Go language
  10. whois (Go, 4 stars): domain name WHOIS client.
  11. svelte-electron-tailwind (Svelte, 3 stars): Electron App starter template using Svelte & Tailwind (JIT)
  12. tinyq (Go, 3 stars): an implementation of a persistent FIFO queue, with the ability to pause/start dequeue.
  13. carbon (Go, 3 stars): a wrapper around BadgerDB providing a simple API.
  14. limiter (Go, 3 stars): IP based rate limiter
  15. npgs (3 stars): docker compose nginx postgres golang svelte & certbot
  16. gocross (Dockerfile, 3 stars): golang cross platform compiler
  17. svelte-app-tailwind (JavaScript, 3 stars): svelte app + tailwind starter template
  18. twiny (3 stars)
  19. flog (Go, 3 stars): A simple logger API.
  20. valve (Go, 3 stars): simple rate limiter API
  21. grpc-template (Go, 3 stars): phonebook, a CLI that stores contacts. Uses gRPC & the buf command to generate Go files.
  22. bprint (Go, 3 stars): convert byte size into a human-readable format in Go
  23. xagent (Go, 2 stars): Fast HTTP User Agent Detector in Go.
  24. leaky (Go, 2 stars): a leaky bucket
  25. flare (Go, 1 star): A lightweight signaling mechanism for Go.
  26. poxa (Go, 1 star): structured way to manage and rotate through a collection of proxy servers.