  • Language: Python
  • License: Apache License 2.0

🐍 🔍 GuardDog is a CLI tool to identify malicious PyPI and npm packages

GuardDog

GuardDog is a CLI tool that identifies malicious PyPI and npm packages. It runs a set of heuristics on the package source code (through Semgrep rules) and on the package metadata.

GuardDog can be used to scan local or remote PyPI and npm packages using any of the available heuristics.

Getting started

Installation

pip install guarddog

Or use the Docker image:

docker pull ghcr.io/datadog/guarddog
alias guarddog='docker run --rm ghcr.io/datadog/guarddog'

Note: On Windows, the only supported installation method is Docker.

Sample usage

# Scan the most recent version of the 'requests' package
guarddog pypi scan requests

# Scan a specific version of the 'requests' package
guarddog pypi scan requests --version 2.28.1

# Scan the 'requests' package using 2 specific heuristics
guarddog pypi scan requests --rules exec-base64 --rules code-execution

# Scan the 'requests' package using all rules but one
guarddog pypi scan requests --exclude-rules exec-base64

# Scan a local package
guarddog pypi scan /tmp/triage.tar.gz

# Scan a local directory. The packages must be located directly in that directory.
# For instance, if you have several PyPI packages in ./samples/ such as:
# ./samples/package1.tar.gz ./samples/package2.zip ./samples/package3.whl
# Note: if guarddog finds a file it does not support, it returns an error.
# Scan the directory with:
guarddog pypi scan ./samples/

# Scan every package referenced in a requirements.txt file of a local folder
guarddog pypi verify workspace/guarddog/requirements.txt

# Scan every package referenced in a requirements.txt file and output a SARIF file (works only for verify)
guarddog pypi verify --output-format=sarif workspace/guarddog/requirements.txt

# Output JSON to standard output - works for every command
guarddog pypi scan requests --output-format=json

# All the commands also work on npm
guarddog npm scan express

# Run in debug mode
guarddog --log-level debug npm scan express

Heuristics

GuardDog comes with 2 types of heuristics, source code heuristics and package metadata heuristics, with a separate set for each ecosystem:

PyPI

Source code heuristics:

| Heuristic | Description |
|---|---|
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| obfuscation | Identify when a package uses a common obfuscation method often used by malware |
| exfiltrate-sensitive-data | Identify when a package reads and exfiltrates sensitive data from the local system |
| download-executable | Identify when a package downloads a remote binary and makes it executable |
| exec-base64 | Identify when a package dynamically executes base64-encoded code |
| silent-process-execution | Identify when a package silently executes an executable |
| steganography | Identify when a package retrieves hidden data from an image and executes it |
| code-execution | Identify when an OS command is executed in the setup.py file |
| cmd-overwrite | Identify when the 'install' command is overwritten in setup.py, indicating code that runs automatically when the package is installed |
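To give a feel for what a source code heuristic such as exec-base64 looks for, here is a minimal Python sketch. It is not GuardDog's implementation: GuardDog expresses these checks as Semgrep rules, while this sketch swaps in Python's standard 'ast' module. The function names finds_exec_base64 and _is_b64decode are hypothetical, and the check only catches a direct exec()/eval() call wrapping a b64decode() call, so aliased or further obfuscated code would slip through.

```python
import ast


def _is_b64decode(func: ast.expr) -> bool:
    # Matches base64.b64decode(...) as well as a bare b64decode(...)
    # imported via "from base64 import b64decode".
    if isinstance(func, ast.Attribute):
        return func.attr == "b64decode"
    if isinstance(func, ast.Name):
        return func.id == "b64decode"
    return False


def finds_exec_base64(source: str) -> bool:
    """Return True if `source` passes the result of b64decode() to exec() or eval().

    The source is only parsed, never executed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Look for calls to the built-in exec or eval.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"exec", "eval"}
            and node.args
        ):
            # Search the whole first argument, so exec(b64decode(x).decode()) matches too.
            for inner in ast.walk(node.args[0]):
                if isinstance(inner, ast.Call) and _is_b64decode(inner.func):
                    return True
    return False
```

Working on the syntax tree rather than on raw text means comments or strings that merely mention b64decode do not trigger a finding.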

Metadata heuristics:

| Heuristic | Description |
|---|---|
| empty_information | Identify packages with an empty description field |
| release_zero | Identify packages with a release version of 0.0 or 0.0.0 |
| typosquatting | Identify packages whose name is close to that of a highly popular package |
| potentially_compromised_email_domain | Identify when a package maintainer's e-mail domain (and therefore their package manager account) might have been compromised |
| repository_integrity_mismatch | Identify packages with a linked GitHub repository where the package has extra unexpected files |
| single_python_file | Identify packages that contain only a single Python file |
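The typosquatting heuristic compares a package name against popular package names. One common way to measure how "close" two names are is the Levenshtein edit distance; the sketch below assumes that metric, a maximum distance of 1, and a tiny hypothetical list of popular packages (POPULAR_PACKAGES). GuardDog's own rule may use a different popularity source and additional checks.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical list: a real check would use a much larger set of popular names.
POPULAR_PACKAGES = {"requests", "numpy", "django", "flask"}


def typosquat_targets(name: str, max_distance: int = 1) -> list[str]:
    """Return the popular packages that `name` is suspiciously close to."""
    name = name.lower()
    return sorted(
        p for p in POPULAR_PACKAGES
        if p != name and levenshtein(name, p) <= max_distance
    )
```

For example, typosquat_targets("reqests") returns ["requests"], while the legitimate name "requests" itself is excluded.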

npm

Source code heuristics:

| Heuristic | Description |
|---|---|
| npm-serialize-environment | Identify when a package serializes 'process.env' to exfiltrate environment variables |
| npm-silent-process-execution | Identify when a package silently executes an executable |
| shady-links | Identify when a package contains a URL to a domain with a suspicious extension |
| npm-exec-base64 | Identify when a package dynamically executes code through 'eval' |
| npm-install-script | Identify when a package has a pre-install or post-install script that automatically runs commands |
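npm runs the preinstall, install and postinstall lifecycle scripts automatically during 'npm install', which is what makes them attractive to attackers. Below is a minimal sketch of listing those scripts from a package.json file, assuming the manifest is available as a JSON string. The function name install_scripts is hypothetical, and this is not GuardDog's rule.

```python
import json

# Lifecycle scripts that npm runs automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_scripts(package_json: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json document."""
    manifest = json.loads(package_json)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
```

Scripts such as "test" are ignored because npm only runs them on explicit request.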

Metadata heuristics:

| Heuristic | Description |
|---|---|
| empty_information | Identify packages with an empty description field |
| release_zero | Identify packages with a release version of 0.0 or 0.0.0 |
| potentially_compromised_email_domain | Identify when a package maintainer's e-mail domain (and therefore their package manager account) might have been compromised |
| typosquatting | Identify packages whose name is close to that of a highly popular package |
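Both ecosystems share the release_zero and empty_information heuristics, whose descriptions are simple enough to sketch directly. The function below assumes the version and description strings have already been extracted from the registry metadata; its name, metadata_findings, is hypothetical.

```python
from typing import Optional


def metadata_findings(version: str, description: Optional[str]) -> list[str]:
    """Return the names of the metadata heuristics triggered by a release."""
    findings = []
    # release_zero: the release version is exactly 0.0 or 0.0.0.
    if version.strip() in {"0.0", "0.0.0"}:
        findings.append("release_zero")
    # empty_information: the description is missing or only whitespace.
    if not (description or "").strip():
        findings.append("empty_information")
    return findings
```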

Running GuardDog in a GitHub Action

The easiest way to integrate GuardDog in your CI pipeline is to leverage the SARIF output format, and upload it to GitHub's code scanning feature.

Using this, you get:

  • Automated comments on your pull requests based on the GuardDog scan output
  • Built-in false positive management directly in the GitHub UI

Sample GitHub Action using GuardDog:

name: GuardDog

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

permissions:
  contents: read

jobs:
  guarddog:
    permissions:
      contents: read # for actions/checkout to fetch code
      security-events: write # for github/codeql-action/upload-sarif to upload SARIF results
    name: Scan dependencies
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install GuardDog
        run: pip install guarddog

      - run: guarddog pypi verify requirements.txt --output-format sarif --exclude-rules repository_integrity_mismatch > guarddog.sarif

      - name: Upload SARIF file to GitHub
        uses: github/codeql-action/upload-sarif@v2
        with:
          category: guarddog-builtin
          sarif_file: guarddog.sarif
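Before uploading guarddog.sarif, it can be handy to summarize it locally. The sketch below relies only on the generic SARIF 2.1.0 layout (a top-level 'runs' array whose entries hold a 'results' array, each result carrying a 'ruleId'); it makes no assumption about which rule IDs GuardDog emits. The function name count_findings is hypothetical.

```python
import json
from collections import Counter


def count_findings(sarif_text: str) -> Counter:
    """Count SARIF results per ruleId across all runs."""
    log = json.loads(sarif_text)
    counts = Counter()
    for run in log.get("runs", []):
        for result in run.get("results") or []:
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts
```

For example, count_findings(open("guarddog.sarif", encoding="utf-8").read()).most_common() lists the most frequently triggered rules first.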

Development

Running a local version of GuardDog

Using pip

  • Ensure Python >= 3.10 is installed
  • Clone the repository
  • Create a virtualenv: python3 -m venv venv && source venv/bin/activate
  • Install requirements: pip install -r requirements.txt
  • Run GuardDog using python -m guarddog

Using poetry

  • Ensure poetry has an env with Python >= 3.10: poetry env use 3.10.0
  • Install dependencies: poetry install
  • Run GuardDog: poetry run guarddog, or poetry shell and then guarddog

Unit tests

Running all unit tests: make test

Running unit tests against Semgrep rules: make test-semgrep-rules. These tests use the standard methodology for testing Semgrep rules.

Running unit tests against package metadata heuristics: make test-metadata-rules.

Benchmarking

You can run GuardDog on legitimate and malicious packages to determine false positives and false negatives. See ./tests/samples
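Once you know which samples GuardDog flagged, the false positive and false negative rates reduce to set arithmetic. In this sketch the package names, the function name benchmark, and the way you collect the flagged set are hypothetical; it only assumes you have labelled sets of malicious and benign samples.

```python
def benchmark(flagged: set[str], malicious: set[str], benign: set[str]) -> dict[str, float]:
    """Compare the packages GuardDog flagged against labelled samples."""
    false_positives = flagged & benign    # benign packages that were flagged
    false_negatives = malicious - flagged  # malicious packages that were missed
    return {
        "false_positive_rate": len(false_positives) / len(benign) if benign else 0.0,
        "false_negative_rate": len(false_negatives) / len(malicious) if malicious else 0.0,
    }
```

For example, with 4 benign samples of which 1 is flagged, the false positive rate is 0.25.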

Code quality checks

Run the type checker with

mypy --install-types --non-interactive guarddog

and the linter with

flake8 guarddog --count --select=E9,F63,F7,F82 --show-source --statistics --exclude tests/analyzer/sourcecode,tests/analyzer/metadata/resources,evaluator/data
flake8 guarddog --count --max-line-length=120 --statistics --exclude tests/analyzer/sourcecode,tests/analyzer/metadata/resources,evaluator/data --ignore=E203,W503
