  • Stars: 672
  • Rank: 67,180 (Top 2 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 5 years ago
  • Updated 4 months ago

Repository Details

StringSifter is a machine learning tool that automatically ranks strings based on their relevance for malware analysis.

Quick Links

Usage

StringSifter requires Python version 3.6 or newer. Run the following commands to get the code, run unit tests, and use the tool:

Installation

Use pip to get running immediately. Choose the major version corresponding to your version of Python:

Python Version    StringSifter Version    Branch       Example Pip Command
3.8+              2.x                     master       pip install stringsifter~=2.0
3.6, 3.7          1.x                     python3.7    pip install stringsifter~=1.0

For development, check out the correct branch for your Python version or stay on master for the latest supported version. Then use pipenv:

git clone https://github.com/fireeye/stringsifter.git
cd stringsifter
git checkout python3.7  # Optional: only needed for Python 3.6 or 3.7
pipenv install --dev

Running Unit Tests

To run unit tests from the StringSifter installation directory:

pipenv run tests

Running from the Command Line

The pip install command installs two runnable scripts, flarestrings and rank_strings, into your Python environment. When developing from source, use pipenv run flarestrings and pipenv run rank_strings.

flarestrings mimics features of GNU binutils' strings, and rank_strings accepts piped input, for example:

flarestrings <my_sample> | rank_strings
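
For scripted use, the same pipeline can be driven from Python. The following is a minimal sketch, not part of StringSifter: rank_sample is a hypothetical helper name, and it assumes flarestrings and rank_strings are on the PATH (as after a pip install). The extractor and ranker parameters are illustrative additions so that another command can be substituted.

```python
import subprocess


def rank_sample(sample_path, extractor=("flarestrings",), ranker=("rank_strings",)):
    """Run `<extractor> <sample_path> | <ranker>` and return the output lines.

    rank_sample is a hypothetical helper, not part of the StringSifter API.
    """
    # First stage: extract strings from the sample and write them to a pipe.
    extract = subprocess.Popen([*extractor, sample_path], stdout=subprocess.PIPE)
    try:
        # Second stage: read that pipe on stdin, like the shell `|` operator.
        ranked = subprocess.run(
            list(ranker), stdin=extract.stdout, stdout=subprocess.PIPE, check=True
        )
    finally:
        # Close our copy of the pipe and reap the first process.
        extract.stdout.close()
        extract.wait()
    if extract.returncode != 0:
        raise subprocess.CalledProcessError(extract.returncode, extract.args)
    return ranked.stdout.decode("utf-8", errors="replace").splitlines()
```

Under these assumptions, rank_sample("<my_sample>") would return the ranked strings as a list, most relevant first, which can then be sliced or written out as needed.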

rank_strings supports a number of command line arguments. The positional argument input_strings specifies a file of strings to rank. The optional arguments are:

Option              Meaning
--scores (-s)       Include the rank scores in the output
--limit (-l)        Limit output to the top limit ranked strings
--min-score (-m)    Limit output to strings with score >= min-score
--batch (-b)        Specify a folder of strings outputs for batch processing

Ranked strings are written to standard output unless the --batch option is specified, in which case ranked outputs are written to files named <input_file>.ranked_strings.
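
The effect of --limit and --min-score can be illustrated in a few lines of Python. This is a sketch of the semantics described in the table, not the code rank_strings itself runs: filter_ranked is a hypothetical name, the scores below are made-up values, and it assumes a higher score means a more relevant string.

```python
def filter_ranked(scored, limit=None, min_score=None):
    """Apply --min-score and --limit style filters to (score, string) pairs.

    Hypothetical helper illustrating the option semantics only.
    """
    # Order the pairs with the highest score first.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    if min_score is not None:
        # Keep only strings with score >= min_score.
        ranked = [(score, text) for score, text in ranked if score >= min_score]
    if limit is not None:
        # Keep only the top `limit` entries.
        ranked = ranked[:limit]
    return ranked


# Made-up scores, for illustration only.
example = [(1.5, "kernel32.dll"), (7.2, "http://example.com/payload"), (-0.3, "AAAA")]
top = filter_ranked(example, limit=1)
relevant = filter_ranked(example, min_score=1.0)
```

Because the list is sorted before filtering, applying the two filters in either order gives the same result.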

flarestrings supports an option -n (or --min-len) to print sequences of characters that are at least min-len characters long, instead of the default 4. For example:

flarestrings -n 8 <my_sample> | rank_strings

will print and rank only strings of length 8 or greater.
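
To make the minimum-length rule concrete, here is a simplified Python sketch of extracting runs of printable characters from raw bytes. It illustrates the general idea only and is not flarestrings' actual implementation: ascii_strings is a hypothetical name, and "printable" is assumed here to mean the ASCII range from space through tilde.

```python
import re


def ascii_strings(data, min_len=4):
    """Return printable-ASCII runs of at least min_len bytes found in data."""
    # 0x20-0x7e covers the printable ASCII characters (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


sample = b"\x00abc\x00GetProcAddress\x01http://x\x00"
default_strings = ascii_strings(sample)          # min_len of 4
long_strings = ascii_strings(sample, min_len=9)  # comparable to -n 9
```

With the default of 4, the three-character run "abc" is dropped; raising the minimum to 9 also drops the eight-character "http://x".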

Running from a Docker container

  • After cloning the repo, build the container. From the package's top-level directory:
docker build -t stringsifter -f docker/Dockerfile .
  • Run the container with the flarestrings or rank_strings argument to use the respective command. The containerized commands can be used in pipelines:
cat <my_sample> | docker run -i stringsifter flarestrings | docker run -i stringsifter rank_strings
  • Or, run the container without arguments to get a shell prompt, using the -v flag to expose a host directory to the container:
docker run -v <my_malware>:/samples -it stringsifter

where <my_malware> contains samples for analysis, for example:

docker run -v $HOME/malware/binaries:/samples -it stringsifter
  • At the container prompt:
flarestrings /samples/<my_sample> | rank_strings <options>

All command line arguments are supported in the containerized scripts.

Running on FLOSS Output

StringSifter can be applied to arbitrary lists of strings, making it useful for practitioners looking to glean insights from alternative intelligence-gathering sources such as live memory dumps, sandbox runs, or binaries that contain obfuscated strings. For example, FireEye Labs Obfuscated Strings Solver (FLOSS) extracts printable strings just as Strings does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. It can be used as an in-line replacement for Strings, meaning that StringSifter can be similarly invoked on FLOSS output using the following command:

$PY2_VENV/bin/floss -q <options> <my_sample> | rank_strings <options>

Notes:

  1. The -q argument suppresses headers and formatting to show only extracted strings. To learn more about additional FLOSS options, please see its Usage Docs.
  2. FLOSS requires Python 2, while StringSifter requires Python 3. In the example command, at least one of floss or rank_strings must include a relative path referencing a Python virtual environment.
  3. FLOSS can be downloaded as a standalone executable. In this case it is not required to specify a Python environment because the executable does not rely on a Python interpreter.

Notes on running strings

This distribution includes the flarestrings program to ensure predictable output across platforms. If you choose to run your system's installed strings, note that its options are not consistent across versions and platforms:

Linux

Most Linux distributions include the strings program from GNU Binutils. To extract both "wide" and "narrow" strings, the program must be run twice, redirecting to an output file:

strings <my_sample>       > strs.txt   # narrow strings
strings -el <my_sample>  >> strs.txt   # wide strings.  note the ">>"
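
The difference between the two passes can be sketched in Python. This is an illustration under stated assumptions, not how strings or flarestrings is implemented: narrow_and_wide_strings is a hypothetical name, "narrow" is taken to mean runs of printable ASCII bytes, and "wide" is taken to mean the same characters encoded as 16-bit little-endian (each byte followed by a zero byte), which is what the -el option selects.

```python
import re


def narrow_and_wide_strings(data, min_len=4):
    """Collect ASCII ("narrow") and UTF-16LE ("wide") strings from data."""
    # Narrow: min_len or more consecutive printable ASCII bytes.
    narrow = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide: min_len or more printable ASCII bytes, each followed by a zero byte.
    wide = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [match.group().decode("ascii") for match in narrow.finditer(data)]
    found += [match.group().decode("utf-16-le") for match in wide.finditer(data)]
    return found


blob = b"\x01cmd.exe\x00\x02" + "wide_str".encode("utf-16-le") + b"\x00\x00"
both = narrow_and_wide_strings(blob)
```

A narrow-only scan of this blob would miss "wide_str" entirely, because the interleaved zero bytes break every run of printable characters.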

MacOS

Some versions of BSD strings packaged with MacOS do not support wide strings. Also note that the -a option to strings to scan the whole file may be disabled in the default configuration. Without -a informative strings may be lost. We recommend installing GNU Binutils via Homebrew or MacPorts to get a version of strings that supports wide characters. Use care to invoke the correct version of strings.

Windows

strings is not installed by default on Windows. We recommend installing Windows Sysinternals, Cygwin, or Malcode Analyst Pack to get a working strings.

Discussion

This version of StringSifter was trained using Strings outputs from sampled malware binaries associated with the first EMBER dataset. Ordinal labels were generated using weak supervision procedures, and supervised learning is performed by Gradient Boosted Decision Trees with a learning-to-rank objective function. See Quick Links for further technical details. Please note that neither labeled data nor training code is currently available, though we may reconsider this approach in future releases.

Issues

We use GitHub Issues for posting bugs and feature requests.

Acknowledgements

  • Thanks to the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams for review and feedback.
  • StringSifter was designed and developed by Philip Tully (FDS), Matthew Haigh (FLARE), Jay Gibble (FLARE), and Michael Sikorski (FLARE).
  • The StringSifter logo was designed by Josh Langner (FLARE).
  • flarestrings is derived from the excellent tool FLOSS.
