Python Pickle Malware Scanner

A security scanner that detects Python Pickle files performing suspicious actions.

Getting started

Scan a malicious model on Hugging Face:

pip install picklescan
picklescan --huggingface ykilcher/totally-harmless-model

The scanner reports that the Pickle is calling eval() to execute arbitrary code:

https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin:archive/data.pkl: global import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1
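
To illustrate what a "dangerous global" looks like, here is a minimal sketch that builds a pickle whose loading would call eval(), then lists the globals it imports without ever unpickling it. This is not picklescan's actual implementation: the Payload class and list_globals helper are hypothetical names, and the sketch only handles the GLOBAL opcode emitted by older pickle protocols.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical malicious object: unpickling it would call eval()."""

    def __reduce__(self):
        # On load, pickle would call eval("1 + 1"); real attacks pass harmful code.
        return (eval, ("1 + 1",))


def list_globals(data: bytes) -> list:
    """Return the 'module name' strings imported by GLOBAL opcodes.

    The pickle stream is only disassembled, never loaded.
    """
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            found.append(arg)
    return found


# Protocol 2 emits the GLOBAL opcode. Newer protocols use STACK_GLOBAL,
# which this sketch deliberately does not handle.
data = pickle.dumps(Payload(), protocol=2)
globals_found = list_globals(data)
print(globals_found)
```

A scanner built along these lines would compare each reported global against a list of known-dangerous imports such as eval or os.system.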

The scanner can also load Pickles from local files, directories, URLs, and zip archives (the format PyTorch uses to save models):

picklescan --path downloads/pytorch_model.bin
picklescan --path downloads
picklescan --url https://huggingface.co/sshleifer/tiny-distilbert-base-cased-distilled-squad/resolve/main/pytorch_model.bin
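
The scan output above names the Pickle inside the archive (pytorch_model.bin:archive/data.pkl). As a rough sketch of how Pickles can be located inside such a zip archive, the following builds a small stand-in archive in memory and lists its Pickle members. The find_pickle_members helper and the ".pkl" suffix filter are assumptions made for illustration; picklescan may identify Pickles differently.

```python
import io
import pickle
import zipfile


def find_pickle_members(archive_bytes: bytes) -> list:
    """Return the names of zip members that look like Pickle files.

    Assumption: a '.pkl' suffix marks a Pickle member.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [name for name in zf.namelist() if name.endswith(".pkl")]


# Build a small stand-in for a PyTorch-style archive (not a real model).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("archive/data.pkl", pickle.dumps({"weights": [1, 2, 3]}))
    zf.writestr("archive/version", "3\n")

members = find_pickle_members(buffer.getvalue())
print(members)
```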

To scan NumPy's .npy files, pip install the numpy package first.

The scanner's exit status codes follow the ClamAV convention:

0: scan did not find malware
1: scan found malware
2: scan failed
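
These exit codes make the scanner easy to call from scripts. The sketch below assumes picklescan is installed and on the PATH; the describe_exit_code and scan_path helpers are hypothetical names, not part of the package.

```python
import subprocess
import sys

# Meanings of the exit status codes listed above.
EXIT_MEANINGS = {
    0: "scan did not find malware",
    1: "scan found malware",
    2: "scan failed",
}


def describe_exit_code(code: int) -> str:
    """Translate a picklescan exit status into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def scan_path(path: str) -> int:
    """Run `picklescan --path <path>` and return its exit status."""
    result = subprocess.run(["picklescan", "--path", path], check=False)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    code = scan_path(sys.argv[1])
    print(describe_exit_code(code))
    sys.exit(code)
```

Saved under a name of your choice, the script takes the file or directory to scan as its only argument and exits with the scanner's own status code.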

Develop

Create and activate the conda environment (miniconda is sufficient):

conda env create -f conda.yaml
conda activate picklescan

Install the package in editable mode to develop and test:

python3 -m pip install -e .

Edit with VS Code:

code .

Run unit tests:

pytest tests

Run manual tests:

Local PyTorch (zip) file

mkdir downloads
wget -O downloads/pytorch_model.bin https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin
picklescan -l DEBUG -p downloads/pytorch_model.bin

Remote PyTorch (zip) URL

picklescan -l DEBUG -u https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin

Publish the package to PyPI: bump the package version in setup.cfg and create a GitHub release. This triggers the publish workflow.

Alternative manual steps to publish the package:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m build
python3 -m twine upload dist/*

Test the package: bump the version of picklescan in conda.test.yaml and run:

conda env remove -n picklescan-test
conda env create -f conda.test.yaml
conda activate picklescan-test
picklescan --huggingface ykilcher/totally-harmless-model

Tested on Linux 5.10.102.1-microsoft-standard-WSL2 x86_64 (WSL2).

References

pickletools.py -- The pickletools source code is the most detailed documentation of the Pickle format.
Machine Learning Attack Series: Backdooring Pickle Files, Johann Rehberger, 2022
Hugging Face Pickle Scanning, Luc Georges, 2022
The hidden dangers of loading open-source AI models (ARBITRARY CODE EXPLOIT!), Yannic Kilcher, 2022
Secure Machine Learning at Scale with MLSecOps, Alejandro Saucedo, 2022
Backdooring Pickles: A decade only made things worse, ColdwaterQ, DEFCON 2022
Never a dill moment: Exploiting machine learning pickle files, Evan Sultanik, 2021 (tool: Fickling)
Exploiting Python pickles, David Hamann, 2020
Dangerous Pickles - malicious python serialization, Evan Sangaline, 2017
Python Pickle Security Problems and Solutions, Travis Cunningham, 2015
Arbitrary code execution with Python pickles, Stephen Checkoway, 2013
Sour Pickles, A serialised exploitation guide in one part, Marco Slaviero, BlackHat USA 2011 (see also: doc, slides)

mmaitre314/picklescan
