• Stars: 175
• Rank: 216,820 (Top 5 %)
• Language: Python
• License: Apache License 2.0
• Created over 4 years ago
• Updated about 1 year ago

Repository Details

Security ML models encoded as Yara rules

Sophos AI YaraML Rules Repository

Questions, concerns, ideas, results, and feedback are appreciated; please email [email protected]

YaraML is a tool that automatically generates Yara rules from training data by translating scikit-learn logistic regression and random forest binary classifiers into the Yara language. Give YaraML a directory of malware files and a directory of benign files of any format, and it will extract substring features, downselect the feature space, train a model, and then "compile" the model and return it as a textual Yara rule. To get a feel for what this looks like, see the logistic regression PowerShell detector generated by YaraML and given below.

rule Generic_Powershell_Detector
{
    strings:
        ...
        $s4 = "DownloadFile"       fullword // weight: 3.257
        $s5 = "WOW64"              fullword // weight: 3.232
        $s6 = "bypass"             fullword // weight: 3.021
        $s7 = "meMoRYSTrEaM"       fullword // weight: 2.68
        $s8 = "obJEct"             fullword // weight: 2.679
        $s9 = "OBJecT"             fullword // weight: 2.659
        $s10 = "ReGeX"             fullword // weight: 2.592
        $s11 = "samratashok"       fullword // weight: 2.548
        $s12 = "Dependencies"      fullword // weight: 2.494
        $s13 = "TVqQAAMAAAAEAAAA"  fullword // weight: 2.428
        $s14 = "CompressionMode"   fullword // weight: 2.366
        ...
    condition:
        ...
        ((#s0 * 5.567) + (#s1 * 4.122) + (#s2 * 3.904) + (#s3 * 3.820) +
        (#s4 * 3.257) + (#s5 * 3.232) + (#s6 * 3.021) + (#s7 * 2.680) +
        (#s8 * 2.679) + (#s9 * 2.659) + (#s10 * 2.592) + (#s11 * 2.548) +
        ...
        > 0
}
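
As a rough illustration of the "compile" step described above, the Python sketch below renders a list of substrings and their logistic regression weights as Yara rule text in the same shape as the detector shown. It is not YaraML's actual implementation. The function names render_yara_rule and escape_yara_string, the fullword modifier on every string, and the choice to fold the model intercept into the condition as a constant term are all assumptions made for illustration.

```python
def escape_yara_string(text):
    """Escape backslashes and double quotes for a Yara text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_yara_rule(rule_name, substrings, weights, intercept):
    """Render a linear model as Yara rule text (illustrative sketch only)."""
    if len(substrings) != len(weights):
        raise ValueError("need exactly one weight per substring")

    lines = ["rule " + rule_name, "{", "    strings:"]
    for i, (text, weight) in enumerate(zip(substrings, weights)):
        # One string definition per feature, annotated with its weight.
        lines.append(
            f'        $s{i} = "{escape_yara_string(text)}" fullword // weight: {weight:.3f}'
        )

    # #sN is Yara's match count for string $sN; the rule fires when the
    # weighted sum of counts plus the intercept is positive.
    terms = [f"(#s{i} * {weight:.3f})" for i, weight in enumerate(weights)]
    terms.append(f"({intercept:.3f})")  # assumption: intercept as a constant term
    lines += ["    condition:", "        (" + " + ".join(terms) + ") > 0", "}"]
    return "\n".join(lines)


# Hypothetical weights and intercept, chosen only to show the output format.
rule_text = render_yara_rule(
    "demo_detector", ["DownloadFile", "bypass"], [3.257, 3.021], -4.0
)
print(rule_text)
```

The condition is simply the logistic regression decision function written out term by term, which is why a linear model maps onto Yara so directly.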

How do I get started?

Clone this repo and install it by running python setup.py install. Please use Python 3.6 or above. The tool has been tested on OSX, Ubuntu, and Red Hat; your mileage may vary on Windows. Invoke the tool as yaraml.

Here's an example invocation, assuming you have malicious PowerShell scripts in powershell_malware/ (or any of its subdirectories) and benign PowerShell scripts in powershell_benign/ (or any of its subdirectories):

# Positional arguments, in order: the malware directory, the benign directory,
# the directory where the resulting rule is written, and the name of your Yara rule.
# --max_benign_files and --max_malicious_files optionally cap the number of files to train on.
# --model_type is either logisticregression or randomforest and uses sklearn default hyperparams.
yaraml powershell_malware/ powershell_benign/ \
    powershell_model \
    powershell_detector \
    --max_benign_files=100 --max_malicious_files=100 \
    --model_type="logisticregression"
# N.B.: you can set hyperparams by using --model_instantiation instead of --model_type
# and calling the appropriate sklearn constructor:
# --model_instantiation="LogisticRegression(penalty='l1',solver='liblinear')"
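
To give a sense of why you might pass that particular constructor, here is a small scikit-learn sketch that fits the same LogisticRegression(penalty='l1',solver='liblinear') model on a toy matrix of substring counts. The feature names, counts, and labels are made up for illustration, and this is not how YaraML builds its features internally. An L1 penalty tends to push some weights to exactly zero, which would translate into fewer strings in the generated rule.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical substring-count features: one row per file, one column per substring.
feature_names = ["DownloadFile", "bypass", "Dependencies"]
X = np.array([
    [2, 1, 0],  # malicious
    [1, 2, 1],  # malicious
    [3, 0, 0],  # malicious
    [0, 0, 1],  # benign
    [0, 0, 2],  # benign
    [0, 1, 1],  # benign
])
y = np.array([1, 1, 1, 0, 0, 0])

# Same constructor call as in the --model_instantiation example above.
model = LogisticRegression(penalty='l1', solver='liblinear')
model.fit(X, y)

# Keep only the substrings whose weight survived the L1 penalty.
weights = model.coef_[0]
kept = [
    (name, round(float(w), 3))
    for name, w in zip(feature_names, weights)
    if w != 0.0
]
print("intercept:", round(float(model.intercept_[0]), 3))
print("nonzero weights:", kept)
```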

Why YaraML?

Because sometimes we want to use ML models to do blue team work but only Yara is available. And sometimes writing hand-crafted rules is too time-consuming, or we want an ML alternative to trusting only our rule-writing judgment.

How well maintained is this code base?

We're providing research code here but will happily respond to questions and bug reports. We want your feedback and we want to make this tool useful to the community.

How do I cite YaraML?

@misc{Saxe2020,
  author = {Saxe, Joshua},
  title = {YaraML},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/sophos-ai/yaraml_rules/}}
}
