"Word Lists" for Software Security Test Cases
Word lists, Dictionary Files, Attack Strings, Miscellaneous Datasets and Proof-of-Concept Test Cases With a Collection of Tools for Penetration Testers
- Brief Introduction to werdlists
- Inspiration Taken from Similar Projects
- Repository Directory Hierarchy and Structure
- Folder Names and Description of Contents
Brief Introduction to werdlists
This project is a collection of word lists, most of them whitespace-delimited
or line-based. Although the `passes-dicts` folder contains inputs for password cracking,
the files amassed here are mainly intended to help drive a program into an insecure state
with the help of a black-box fuzzer or scanning tool. The vast majority of files are plain ASCII
with UNIX-style newlines. Beware that this project makes no attempt to be minimalist or concise!
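As a rough illustration of the file format described above, the following bash sketch checks whether a file is pure 7-bit ASCII with UNIX newlines. The function name `is_plain_wordlist` is hypothetical and is not one of the scripts shipped in this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of werdlists): succeed only when the given
# file contains nothing but 7-bit ASCII bytes and no carriage returns.
is_plain_wordlist() {
    local file=$1 non_ascii carriage_returns
    # Delete every byte in the ASCII range and count what is left over.
    non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c)
    # Keep only carriage returns; any hit means DOS/Mac line endings.
    carriage_returns=$(LC_ALL=C tr -cd '\r' < "$file" | wc -c)
    [ "$non_ascii" -eq 0 ] && [ "$carriage_returns" -eq 0 ]
}
```

A call such as `is_plain_wordlist some-list.txt && echo clean` (with `some-list.txt` standing in for any data file) would then flag the minority of files that need different handling.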
Inspiration Taken From Similar Projects
`werdlists` is very similar to `fuzzdb` and `SecLists`. `SecLists` is maintained by my former colleague at IOActive, Daniel Miessler.
Admittedly, `werdlists` shares their mission as a centralized resource for attack strings
and input data. Nonetheless, `werdlists` expands on a number of concepts: it has its own style and organization,
original hand-crafted contents, scripts for dataset creation, management and validation, scanner springboards, etc.
Unique Features Only Available With werdlists
`werdlists` cross-references the code repositories of third-party scanners with the datasets of its own that each tool can benefit from.
Moreover, there are specialized parsing scripts exclusive to `werdlists` that extract results produced by pairing test tools with its data. Output strings are gathered from those results and fed back into the test tools. In other words, a number of interactive and/or
tunable feedback loops are implemented. Quite a few of the `werdlists` data files were created this way.
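The feedback idea can be sketched in a few lines of bash. This is only an illustration under assumptions: it presumes a parsing script has already reduced a tool's results to one discovered string per line, and the function name `merge_feedback` is hypothetical rather than taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fold strings discovered during a scan back into the
# word list that drove the scan, so the next run starts from a richer input.
#   $1 = word list to grow, $2 = file of discovered strings (one per line)
merge_feedback() {
    local wordlist=$1 results=$2 merged
    merged=$(mktemp) || return 1
    # Combine both files, drop duplicates and remove empty lines.
    LC_ALL=C sort -u -- "$wordlist" "$results" | sed '/^$/d' > "$merged"
    # Overwrite in place so the word list keeps its original permissions.
    cat -- "$merged" > "$wordlist"
    rm -f -- "$merged"
}
```

Running the scanner again with the merged list and repeating until no new strings appear is one way such a loop could terminate.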
Repository Directory Hierarchy and Structure
The `scripts` folder consists of shell scripts used for repository maintenance.
There is a sub-directory of `scripts` called `init` where scripts that initialize data files are stored. If the filename of a script stored in `init` contains
two dashes, then its output should reflect the contents of the associated data file. For example, compare `manpages-environ` and `clib-package-names`. All scripts were written using bash syntax.
The `contrib` folder is for storing scripts contributed via pull request, and the `utils` folder contains utilities that aren't necessarily specific to the `werdlists`
project, such as scripts for managing any wordlist file.
Other data files were composed by hand, and a small handful were created by recycling output strings back into input parameter lists, such as `dirbdirs-feedback`.
The `tools` folder lists security tools that accept the datasets contained in this repository as input.
Individual folders are detailed in the Folder Names and Description of Contents section below.
All files in each dataset directory are detailed in the local `README.md` file for that folder (as opposed to the global `README.md` in the root directory being read now).
Naming Scheme, Syntax and Meaning
Most files have the `*.txt` extension, signifying the `text/plain` MIME type.
Often used formats besides plain text include Comma-Separated Values (`text/csv`),
Extensible Markup Language (`application/xml`),
Hyper Text Markup Language (`text/html`), etc.
Any file that is larger than 1MB uncompressed will be compressed with `xz`
according to the commands in the `scripts/xzlarge-files` bash script. Other file extensions in use are:
`*.ans`, `*.asc`, `*.bin`, `*.c`, `*.conf`, `*.cpp`, `*.csv`, `*.html`, `*.inf`, `*.ini`, `*.json`, `*.md`, `*.rpz`, `*.rst`, `*.sh`, `*.txt`, `*.xml`, `*.yaml`, `*.yml`, `*.zip`, and `*.zone`.
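The 1MB compression rule could be approximated as shown below. This is a sketch under stated assumptions, not the contents of `scripts/xzlarge-files`: it treats 1MB as 1048576 bytes, skips `.git` metadata and files that already end in `.xz`, and uses the hypothetical names `MAX_BYTES`, `list_large_files` and `compress_large_files`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "compress anything over 1MB" rule.
MAX_BYTES=1048576   # assumption: 1MB is read as 1 MiB

# Print every regular file under $1 (default: current directory) that is
# strictly larger than MAX_BYTES and not already xz-compressed.
list_large_files() {
    find "${1:-.}" -type f ! -name '*.xz' ! -path '*/.git/*' \
        -size "+${MAX_BYTES}c"
}

# Compress the same set of files; xz replaces each FILE with FILE.xz.
compress_large_files() {
    find "${1:-.}" -type f ! -name '*.xz' ! -path '*/.git/*' \
        -size "+${MAX_BYTES}c" -exec xz -- {} +
}
```

Calling `list_large_files` first gives a dry run of what `compress_large_files` would touch.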
Folder Names and Description of Contents
Folder Name | Description of Contents |
---|---|
apple-paths | |
apple-data | |
arpa-headers | |
ascii-art | |
biology-info | |
browser-data | |
cert-data | |
char-encodes | Various character encodings provided by different locales/charsets |
char-sequence | |
chat-data | |
cipher-data | |
cmd-usage | |
code-keywords | |
cpu-arch | Low-level computer architecture and hardware subjects |
crypt-output | |
database-strs | |
dns-domains | |
dns-hostnames | The host name part of an FQDN |
dns-records | Data specific to resource records (RRs) in the DNS |
dns-servers | Data provided to, produced by or related to DNS name servers |
dns-toplevel | TLDs, or top-level domains, in the uppermost part of the DNS hierarchy |
environ-vars | |
exploit-info | Technical information on exploitation of security vulnerabilities |
file-extens | Filename extensions, i.e. the part after the dot |
file-specs | |
ftp-data | Various FTP data from RFCs and elsewhere |
glibc-data | Data taken from the source code of the GNU C Library |
html-words | |
http-agents | Software version banners for HTTP user agents such as browsers |
http-headers | |
http-methods | |
http-params | |
http-security | |
http-servers | |
http-status | |
inet-addrs | |
inet-routes | |
inet-services | /etc/services |
infosec-people | Noteworthy individuals known from information security communities |
iso-codes | |
java-data | |
linux-data | |
linux-paths | |
malware-iocs | |
mobile-devs | |
net-attacks | Info about attacks on telecommunications and internetworks |
net-ifaces | |
ntfs-paths | |
owasp-data | |
passes-dicts | Dictionary files for brute-force attacks against account passwords |
passes-sites | |
perl-data | Data often seen in Perl (Practical Extraction and Report Language) |
php-data | |
postal-data | United States Postal Service information |
python-data | |
radio-data | |
regex-data | |
ruby-data | |
search-dorks | |
smtp-messages | |
soap-messages | SOAP (Simple Object Access Protocol) messages |
social-data | |
software-strs | Strings describing software engineering, programming languages, etc. |
string-enums | |
system-admin | |
system-notices | |
telco-data | |
text-files | |
text-words | |
top-secret | |
unicode-data | |
unix-data | |
unix-paths | |
uri-attacks | |
uri-schemes | |
uri-data | |
vuln-data | Information about security vulnerabilities found in server software |
webapp-attacks | |
webapp-data | Data associated with applications hosted on web servers |
webapp-dirs | |
webapp-files | |
webapp-paths | |
webapp-words | |
web-sites | |
wifi-networks | |
windows-data | |