Cosa Nostra, a FOSS graph based malware clusterization toolkit.

Cosa Nostra

Cosa Nostra is an open source software clustering toolkit with a focus on malware analysis. It can create phylogenetic trees of binary malware samples that are structurally similar. It was initially released during SyScan360 Shanghai (2016).

I have basically maintained it since 2016 for my one and only user, who happens to be a friend... Well.

Getting started

Required 3rd party tools

In order to use Cosa Nostra you will need a version of Python 3.X as well as one of the following tools in order to perform code analysis:

  • IDA: Written in C++. It supports analysing a plethora of executable types that you have probably never even heard of. Commercial product. Only the 7.X versions are now supported.
  • Radare2: Written in pure C. Like IDA, it supports extremely rare CPUs and binary formats. Also, it's open source!

Analysing binaries

Once you have installed one of the tools mentioned above, you will need to use the appropriate batch tool to analyse the malware samples, as in the example below:

$ cd $COSA_NOSTRA_DIR
$ /path/to/ida64 -B -A -S/full/path/to/ida_batch.py example.exe

Or

$ cd $COSA_NOSTRA_DIR
$ python r2_batch.py example.exe

Automating the Analysis of a Malware Dataset

The easiest way to analyse a malware dataset is by simply running a command like the following example:

$ find /your/malware/dataset/path -type f -exec python r2_batch.py {} ';'

It can be done in parallel by using the "GNU Parallel" tool, as in the following example:

$ find /your/malware/dataset/path -type f | parallel -j 8 ida64 -B -A -S/path/to/ida_batch.py {}

In the example above, GNU Parallel will run up to 8 ida64 processes at a time, each one executing ida_batch.py on a single sample.
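If GNU Parallel is not available, the same idea can be sketched in plain Python. This is only an illustration, not part of Cosa Nostra: the function names (build_command, list_samples, analyse_dataset) are made up for this example, and it assumes you run it from $COSA_NOSTRA_DIR so that r2_batch.py can be found.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(sample, batch_script="r2_batch.py"):
    """Return the command line used to analyse one sample (hypothetical helper)."""
    return [sys.executable, batch_script, str(sample)]


def list_samples(dataset_dir):
    """Return every file below dataset_dir, roughly like `find -type f`."""
    return sorted(p for p in Path(dataset_dir).rglob("*") if p.is_file())


def analyse_dataset(dataset_dir, jobs=8, make_command=build_command):
    """Run one batch process per sample, with at most `jobs` running at once.

    Returns a list of (sample, exit_code) tuples.
    """
    samples = list_samples(dataset_dir)

    def run(sample):
        # Each worker thread only waits for an external process to finish.
        result = subprocess.run(make_command(sample))
        return sample, result.returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, samples))
```

Calling analyse_dataset("/your/malware/dataset/path") returns the exit code of each batch run, which makes it easy to list the samples whose analysis failed. To drive IDA instead, pass a different make_command that builds the ida64 command line shown above.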

Database configuration

After the malware samples are analysed, and if the analysis was successful, the call graph data for each sample is stored, by default, in a single SQLite database named "db.sqlite". You can configure the database name, path, database system, etc. by editing the file $COSA_NOSTRA_DIR/, as shown below:

$ cat config.cfg 
########################################################################
# Configuration for SQLite3
########################################################################
[database]
dbn=sqlite
# Database name
db=db.sqlite
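
To check that the analysis step actually wrote something to the default SQLite database, you can list its tables with Python's standard sqlite3 module. The sketch below makes no assumption about the schema Cosa Nostra uses (it is not documented here); list_tables and count_rows are hypothetical helper names, and it only reports whatever tables exist in the file.

```python
import sqlite3


def list_tables(db_path="db.sqlite"):
    """Return the names of all tables in the SQLite database at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return the number of rows in `table` (a name taken from list_tables)."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        return conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
    finally:
        conn.close()
```

Note that sqlite3.connect creates an empty file if the path does not exist, so an empty table list may simply mean the path is wrong.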

If you prefer to use, say, a MySQL database system, you can configure it in config.cfg by putting the following configuration sections with the appropriate values for your setup:

########################################################################
# Example configuration for MySQL
########################################################################

[database]
dbn=mysql
# Database hostname or IP address
host=localhost
# Database name
db=db_name
# Database username
user=username
# Database password
pw=password
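
Both configuration examples use plain INI syntax, so they can be read with Python's standard configparser module. The following is a minimal sketch for checking your own config.cfg. How Cosa Nostra itself parses the file is not shown in this document, and read_database_config is a hypothetical name.

```python
import configparser


def read_database_config(path="config.cfg"):
    """Return the [database] section of a config.cfg-style file as a dict."""
    parser = configparser.ConfigParser()
    with open(path) as handle:
        # Lines starting with '#' are treated as comments by default.
        parser.read_file(handle)
    return dict(parser["database"])
```

For the MySQL example above, the returned dictionary contains the keys dbn, host, db, user and pw. A missing [database] section raises a KeyError, which is a quick way to spot a malformed file.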

Clusterization of malware samples

This is the step that will take the most time. Once you have analysed all the malware samples from your datasets and the call graph signatures, corresponding prime numbers, etc. are calculated and stored in the database, the next step is to find clusters. The tool for doing so is called "cn_clusterer.py". It uses the same database configuration file ($COSA_NOSTRA_DIR/config.cfg) to extract the call graph signatures for the analysed samples. Running it is as simple as doing the following:

$ cd $COSA_NOSTRA_DIR
$ ./cn_clusterer.py
(...)
Calculating difference matrix for 2357, iteration 5540280 out of 7507600 (4858784 matches, 600144 cache misses)
Calculating difference matrix for 2354, iteration 5543020 out of 7507600 (4861293 matches, 600373 cache misses)
Calculating difference matrix for 471, iteration 5545760 out of 7507600 (4863903 matches, 600373 cache misses)
(...)
Making tree for group with 59 sample(s), iteration 0 out of 256
Making tree for group with 393 sample(s), iteration 1 out of 256
Making tree for group with 1347 sample(s), iteration 2 out of 256
(...)
[Wed Nov  2 13:37:12 2016 2830:140561185462080] Creating unnamed cluster...
[Wed Nov  2 13:37:12 2016 2830:140561185462080] Creating cluster with name u'Win.Trojan.Skylock-4'...
[Wed Nov  2 13:37:12 2016 2830:140561185462080] Creating cluster with name u'Win.Downloader.133181-1'...
[Wed Nov  2 13:37:12 2016 2830:140561185462080] Creating cluster with name u'Win.Trojan.Agent-1213378'...
[Wed Nov  2 13:37:13 2016 2830:140561185462080] Done processing phylogenetic trees!
[Wed Nov  2 13:37:13 2016 2830:140561185462080] Done

When the process finishes, clusters grouping the analysed malware samples will be created in the specified database.

Watching clusters: the web GUI

The last step is to launch the web.py based web application and log in:

$ cd $COSA_NOSTRA_DIR
$ python cosa_nostra.py [optional port to listen to]
http://0.0.0.0:YOURPORT/

Then, open a browser and navigate to the address printed out by cosa_nostra.py. A login form will be displayed asking for a username and password. By default, it's "admin/cosanostra". You can change it in the file $COSA_NOSTRA_DIR/config.py:

$ cat config.py
#!/usr/bin/env python

#-----------------------------------------------------------------------
# Configuration for Cosa Nostra
#-----------------------------------------------------------------------
DEBUG=False
CN_USER="admin"
# SHA1 hash of the password "cosanostra", change to the SHA1 hash of
# whatever password you prefer.
CN_PASS="048920dedfe36c112d74dc8108abb4db5185a918"
(...)
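
Since CN_PASS holds the SHA1 hash of the password, a new value can be generated with Python's standard hashlib module. This is a small sketch: sha1_hex is a hypothetical helper name, and encoding the password as UTF-8 is an assumption (for plain ASCII passwords it makes no difference).

```python
import hashlib


def sha1_hex(password):
    """Return the SHA1 hex digest of a password string."""
    # Assumption: the password is hashed as UTF-8 encoded bytes.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# Example: print(sha1_hex("my new password")) and paste the result into CN_PASS.
```

The result is a 40-character hexadecimal string that replaces the value of CN_PASS in config.py.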

Once you're logged in, you can select one of the following options from the left panel:

  • Samples: See the samples in the current database.
  • Clusters: See the list of clusters that Cosa Nostra found for the given datasets.

In the "Clusters" view, one can select different clusters and view a hierarchical graph of the discovered malware family.

Screenshots

The screenshots show the following:

  • The list of clusters as shown in Cosa Nostra.
  • A small cluster of Trojan.Backspace-1 (name by ClamAV).
  • A small cluster of MiniDukes.
  • A cluster of Kazy/Bifroses.
  • A small part of a really big cluster of FannyWorms.
