Project Phinn

A toolkit to generate an offline Chrome extension to detect phishing attacks using a bespoke convolutional neural network.

Background

When it comes to phishing attacks, what is the attacker actually attempting to accomplish? Primarily, they are trying to trick a user into voluntarily giving up their primary, and sometimes even secondary, credentials through brand impersonation. As browser update hygiene has improved, attackers targeting modern corporate infrastructure have become less reliant on browser exploits to gain a foothold in the corporate network.

Corporate initiatives like user training and Google's Safe Browsing have helped stymie attackers, but they have shortcomings. Administrators can't rely entirely on the vigilance of users, and blacklist approaches won't help with targeted attacks, whose pages have likely never been seen before.

What

Phinn itself is a toolkit for enabling corporate administrators to generate and train a custom Chrome extension that can then be pushed out to the rest of their organization.

The Chrome extension analyzes rendered page content for stylistic similarities between login forms through the use of a machine learning algorithm called a Convolutional Neural Network as implemented by the convnetjs library.

How

Phinn can be configured with the identity providers or other web properties an organization uses that are likely phishing targets, such as Google Accounts or Office 365. Once training is complete and the Chrome extension is installed, whenever a user navigates to a web page and a login form is identified, a screenshot of the rendered page is captured and passed through the neural network. If Phinn judges the page to be visually very similar in style to one of the configured identity providers, it generates an alert for the user.

Getting Started

Phinn ships with a network that's pre-trained on 8 providers: Amazon Web Services, Dropbox, GitHub, Google Accounts, Live, Office 365, Salesforce, and Twitter. These might not match your threat profile and should be modified.

First, install the unpacked Chrome extension as follows:

  1. Visit chrome://extensions (via omnibox or menu -> Tools -> Extensions).

  2. Enable Developer mode by ticking the checkbox in the upper-right corner.

  3. Click on the "Load unpacked extension..." button.

  4. Select the chrome-ext directory from this repository.

You should now see the Phinn icon in your extension tray.

Collecting Samples

With a property or identity provider in mind, create a directory in the samples subdirectory and create a config.json file in it.

The config file is a very basic JSON document that provides a short identifier used internally, a user-friendly name, and a list of valid domains.

For instance samples/google/config.json looks like this:

{ "id":"goog", "fullname":"Google", "domains": ["accounts.google.com"]}

Note that the domain list contains fully qualified domain names, so subdomains must be accounted for manually. For instance, to configure Dropbox for Phinn you'd want to cover dropbox.com as well as www.dropbox.com, like so:

{ "id":"box", "fullname":"Dropbox", "domains": ["dropbox.com", "www.dropbox.com"]}
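The exact-match behavior described above can be sketched as follows. This is only an illustration: the helper name isValidDomain is hypothetical and is not taken from Phinn's source, and the only thing it assumes is the config shape shown in the examples.

```javascript
// Hypothetical helper, not part of Phinn's codebase.
// Assumes the config shape shown above: { id, fullname, domains }.
function isValidDomain(config, hostname) {
  // Exact, case-insensitive comparison: subdomains are NOT matched implicitly.
  const host = hostname.toLowerCase();
  return config.domains.some((d) => d.toLowerCase() === host);
}

const dropbox = {
  id: 'box',
  fullname: 'Dropbox',
  domains: ['dropbox.com', 'www.dropbox.com'],
};

console.log(isValidDomain(dropbox, 'www.dropbox.com'));   // true
console.log(isValidDomain(dropbox, 'login.dropbox.com')); // false: not listed
```

Under this assumption, any subdomain that is missing from the list would be treated like an unknown site, which is why each one has to be listed explicitly.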

Once you have this done, it's time to take a reference sample. Navigate to a login page and click the Phinn Chrome extension button. After a few seconds you will be presented with the network's analysis of the login form, as can be seen in this Google example.

Demo

Click the Source link to display an unmarked version of the image, then right-click it and save it to the folder you created in the samples directory.

Repeat this process for all other web properties or identity providers you care about, and remove the subdirectories for those you do not. Leave only the special-purpose negative folder, which contains negative samples and anything that triggers a false positive.

Training the Network

To train the network, make sure you have Node.js installed and execute ./train_network.sh. The duration of this process is highly dependent on both the number of configured providers and their styling, and can range from a couple of hours to more than twelve.

The trainer writes a network.json file every 1000 ticks and terminates on its own once it reaches an accuracy of 95%.
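The checkpoint-and-stop behavior can be pictured with a small control-flow sketch. Everything here is an assumption made for illustration: runTraining, step, and save are hypothetical names, the maxTicks safety cap is not mentioned in the text, and the real trainer may structure its loop differently.

```javascript
// Hypothetical control-flow sketch; `step` and `save` are stand-ins
// supplied by the caller, not functions from Phinn's trainer.
function runTraining(step, save, opts = {}) {
  const targetAccuracy = opts.targetAccuracy ?? 0.95;   // stop threshold from the text
  const checkpointEvery = opts.checkpointEvery ?? 1000; // ticks between network.json writes
  const maxTicks = opts.maxTicks ?? Infinity;           // safety cap (assumption)

  let tick = 0;
  let accuracy = 0;
  while (accuracy < targetAccuracy && tick < maxTicks) {
    tick += 1;
    accuracy = step(tick); // one training tick; returns the current accuracy estimate
    if (tick % checkpointEvery === 0) save(tick);
  }
  return { tick, accuracy };
}

// Toy usage: accuracy jumps past the threshold at tick 2400.
const saved = [];
const result = runTraining(
  (t) => (t < 2400 ? 0.5 : 0.96),
  (t) => saved.push(t)
);
console.log(result.tick, saved);
```

In this toy run the loop checkpoints at ticks 1000 and 2000 and stops at tick 2400. Whether the real trainer also writes a final network.json on termination is not stated, so the sketch does not do so.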

Testing the Network

Execute ./copy_net_to_extension.sh to copy the network.json file from the trainer directory to the chrome-ext folder.

Visit the chrome://extensions page again and click the Reload link on Phinn's extension.

Visit the identity provider's login page and click on Phinn's icon. If everything went well, you will be presented with the marked-up image showing network activations and an affirmative message such as "This looks like a GOOGLE page to me!"

If you have a known phishing sample, load it and see if the alert is generated a few seconds after page load.

Deployment

To create a package for your extension, execute ./make_release.sh, which takes the unpacked extension and generates a zip file that can be uploaded to the Chrome Web Store.

NOTE: You'll probably have to edit the extension manifest (chrome-ext/manifest.json) to specify the extension key as generated by Google and to increment the version number.

Handling False Positives

When dealing with neural networks, false positives are bound to crop up. Luckily they are fairly straightforward to handle, but they do require retraining the network.

When a report of a false positive comes in, perform the collection procedure described in the Collecting Samples section, place the unmarked image in the samples/negative folder, and retrain the network by executing ./train_network.sh again.

Implementation Details

Neural Network Design

The CNN's input layer takes a 96x96 pixel square with 3 color channels.

This is then fed through three pairs of convolution and pooling layers with ReLU activations before reaching the final softmax output layer, which has one output per label.

In convnetjs terms, the network is defined as follows:

var layer_defs = [];
layer_defs.push({type:'input', out_sx:SLICE_SIZE, out_sy:SLICE_SIZE, out_depth:3});
layer_defs.push({type:'conv', sx:5, filters:18, stride:1, pad: 2, activation:'relu'});
layer_defs.push({type:'pool', sx:4, stride:2});
layer_defs.push({type:'conv', sx:5, filters:20, stride:1, pad: 2, activation:'relu'});
layer_defs.push({type:'pool', sx:4, stride:2});
layer_defs.push({type:'conv', sx:5, filters:20, stride:1, pad: 2, activation:'relu'});
layer_defs.push({type:'pool', sx:4, stride:2});

layer_defs.push({type:'softmax', num_classes:labels.length});
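To see how the 96x96 input shrinks on its way to the softmax layer, the sketch below walks the layer list and computes each output volume. It assumes the standard output-size formula floor((in + 2*pad - sx) / stride + 1), which convnetjs appears to follow, and a pool padding of 0; the helper name outSize and the stack array are illustrative only.

```javascript
// Output size of a conv or pool layer along one spatial axis,
// assuming floor((in + 2*pad - sx) / stride + 1) and a default pad of 0.
function outSize(inSize, { sx, stride, pad = 0 }) {
  return Math.floor((inSize + 2 * pad - sx) / stride + 1);
}

const SLICE_SIZE = 96;

// Same conv/pool parameters as the definition above.
const stack = [
  { type: 'conv', sx: 5, stride: 1, pad: 2, filters: 18 },
  { type: 'pool', sx: 4, stride: 2 },
  { type: 'conv', sx: 5, stride: 1, pad: 2, filters: 20 },
  { type: 'pool', sx: 4, stride: 2 },
  { type: 'conv', sx: 5, stride: 1, pad: 2, filters: 20 },
  { type: 'pool', sx: 4, stride: 2 },
];

let size = SLICE_SIZE;
let depth = 3; // RGB input
const shapes = [];
for (const layer of stack) {
  size = outSize(size, layer);
  if (layer.type === 'conv') depth = layer.filters; // pooling keeps the depth
  shapes.push(`${layer.type}: ${size}x${size}x${depth}`);
}
console.log(shapes.join('\n'));
```

Under these assumptions the padded convolutions preserve the spatial size and each pooling layer roughly halves it (96, 47, 22, 10), so the softmax layer would see a 10x10x20 volume.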

Training Process

The training process takes the super-samples (full images in samples/x/) and performs a random crop on each to get a 96x96x3 volume that the network can ingest. The network is trained using the adadelta algorithm. Once the network becomes fairly competent at identifying the configured labels, the trainer starts to increase the ratio of negative super-samples to give the network more resilience in handling the open set that is the internet. Negative samples also go through additional augmentation to stretch their usefulness.
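The random crop step can be sketched as picking a random top-left corner such that a 96x96 window still fits inside the super-sample. The function name randomCropOrigin and the injectable rng parameter are assumptions for this sketch; the actual trainer may choose crops differently.

```javascript
// Hypothetical sketch: choose a random 96x96 crop window inside a larger image.
// `rng` is injectable so the behavior can be checked deterministically.
const SLICE_SIZE = 96;

function randomCropOrigin(width, height, rng = Math.random) {
  if (width < SLICE_SIZE || height < SLICE_SIZE) {
    throw new RangeError('image is smaller than the crop size');
  }
  // Valid origins run from 0 to (dimension - SLICE_SIZE), inclusive.
  const x = Math.floor(rng() * (width - SLICE_SIZE + 1));
  const y = Math.floor(rng() * (height - SLICE_SIZE + 1));
  return { x, y, w: SLICE_SIZE, h: SLICE_SIZE };
}

console.log(randomCropOrigin(400, 300));
```

Because every crop comes from a different position, a single saved super-sample can yield many distinct training volumes.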

Additionally, roulette-selection is performed when feeding positive case samples preferring the bad performers.
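Roulette selection that prefers bad performers can be sketched as a weighted random pick in which each sample's weight is its current error. The sample shape { name, error } and the function name rouletteSelect are hypothetical; how Phinn actually scores its samples is not described here.

```javascript
// Hypothetical sketch of roulette-wheel selection weighted toward poor performers.
// `samples` is an array of { name, error }; a higher error means a worse performer.
function rouletteSelect(samples, rng = Math.random) {
  const total = samples.reduce((sum, s) => sum + s.error, 0);
  if (total <= 0) {
    // No useful weights yet: fall back to a uniform pick.
    return samples[Math.floor(rng() * samples.length)];
  }
  // Spin the wheel: walk the samples until the cumulative weight passes the threshold.
  let threshold = rng() * total;
  for (const s of samples) {
    threshold -= s.error;
    if (threshold < 0) return s;
  }
  return samples[samples.length - 1]; // guards against floating-point drift
}

const pool = [
  { name: 'google', error: 1 },
  { name: 'dropbox', error: 9 },
];
console.log(rouletteSelect(pool).name);
```

With these weights the second sample would be drawn about nine times as often as the first, so training effort concentrates on the provider the network currently handles worst.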

Extension Functionality

The Chrome extension itself can be split into three logical parts.

Form Identification

Login forms are identified by iterating through all input elements after the DOM has settled from initial page load. Visibility checks are performed to make sure the elements are actually visible before moving on to the capture phase.

Capture

Capture is performed through the screenshot API, and the resulting image is scaled to 50% of its original size to improve network evaluation performance. The captured image is then cropped to a bounding area around the login form and passed on for use in network evaluation.
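The crop geometry can be sketched as the union of the input elements' rectangles, mapped into the 50%-scaled image. This is a sketch under stated assumptions: the rectangle shape { left, top, right, bottom }, the margin parameter, and the function name scaledBoundingBox are all hypothetical and not taken from the extension's source.

```javascript
// Hypothetical sketch: bounding box around input rectangles,
// expressed in the coordinates of the scaled-down screenshot.
// `rects` are { left, top, right, bottom } in page pixels; `margin` is an assumed padding.
function scaledBoundingBox(rects, scale = 0.5, margin = 0) {
  if (rects.length === 0) {
    throw new RangeError('at least one rectangle is required');
  }
  const left = Math.min(...rects.map((r) => r.left)) - margin;
  const top = Math.min(...rects.map((r) => r.top)) - margin;
  const right = Math.max(...rects.map((r) => r.right)) + margin;
  const bottom = Math.max(...rects.map((r) => r.bottom)) + margin;

  // Clamp to the image origin and round outward so the form is never clipped.
  const x = Math.max(0, Math.floor(left * scale));
  const y = Math.max(0, Math.floor(top * scale));
  return {
    x,
    y,
    w: Math.ceil(right * scale) - x,
    h: Math.ceil(bottom * scale) - y,
  };
}

const fields = [
  { left: 100, top: 200, right: 300, bottom: 240 }, // e.g. a username field
  { left: 100, top: 260, right: 300, bottom: 300 }, // e.g. a password field
];
console.log(scaledBoundingBox(fields, 0.5, 20));
```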

Activation

The cropped, 50% scale image (i.e. what you get when you perform sample capture) is then traversed manually in 96x96 squares, each of which is passed through the network. If there are more than three strong activations (confidence over 50%), the global label is deemed to apply and is passed down to the content script for alert generation.
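The tiling and voting rule can be sketched as follows. Several details are assumptions: tiles are non-overlapping (stride equal to the tile size), strong activations are counted per label, the negative class is named 'negative', and classify is a hypothetical stand-in for a forward pass through the network.

```javascript
const SLICE_SIZE = 96;

// Enumerate 96x96 tile origins; the default stride equals the tile size,
// so the tiles do not overlap.
function tileOrigins(width, height, stride = SLICE_SIZE) {
  const origins = [];
  for (let y = 0; y + SLICE_SIZE <= height; y += stride) {
    for (let x = 0; x + SLICE_SIZE <= width; x += stride) {
      origins.push({ x, y });
    }
  }
  return origins;
}

// `classify` is a hypothetical stand-in for the network's forward pass;
// it returns { label, confidence } for the tile at the given origin.
function globalLabel(origins, classify, minConfidence = 0.5, minVotes = 3) {
  const votes = new Map();
  for (const origin of origins) {
    const { label, confidence } = classify(origin);
    if (label !== 'negative' && confidence > minConfidence) {
      votes.set(label, (votes.get(label) || 0) + 1);
    }
  }
  // The label applies only with MORE than `minVotes` strong activations.
  for (const [label, count] of votes) {
    if (count > minVotes) return label;
  }
  return null;
}

// Toy usage: the top row of tiles is confidently "goog", plus one tile below it.
const origins = tileOrigins(300, 200);
const fakeClassify = ({ x, y }) =>
  y === 0 || x === 0
    ? { label: 'goog', confidence: 0.9 }
    : { label: 'goog', confidence: 0.2 };
console.log(globalLabel(origins, fakeClassify));
```

A smaller stride passed to tileOrigins would produce overlapping tiles and more checks per page, at the cost of more network evaluations.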

Limitations

  • Currently framed forms are not supported.
  • Occasionally the V8 optimizer decides it's not happy, and network evaluations can take a very long time.
  • Lack of GPU acceleration on activation limits the number of checks Phinn can do in a reasonable amount of time. Ideally a stride less than the network input size should be utilized. keras-js looks promising on this front.
  • Mitigating false positive cases requires full retraining of the network which is also greatly hindered by lack of GPU acceleration and limits iteration.
