Python Pickle Malware Scanner
Security scanner detecting Python Pickle files performing suspicious actions.
Getting started
Scan a malicious model on Hugging Face:
pip install picklescan
picklescan --huggingface ykilcher/totally-harmless-model
The scanner reports that the Pickle is calling eval()
to execute arbitrary code:
https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin:archive/data.pkl: global import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1
The scanner can also load Pickles from local files, directories, URLs, and zip archives (a-la PyTorch):
picklescan --path downloads/pytorch_model.bin
picklescan --path downloads
picklescan --url https://huggingface.co/sshleifer/tiny-distilbert-base-cased-distilled-squad/resolve/main/pytorch_model.bin
To scan Numpy's .npy
files, pip install the numpy
package first.
The scanner exit status codes are (a-la ClamAV):
0
: scan did not find malware1
: scan found malware2
: scan failed
Develop
Create and activate the conda environment (miniconda is sufficient):
conda env create -f conda.yaml
conda activate picklescan
Install the package in editable mode to develop and test:
python3 -m pip install -e .
Edit with VS Code:
code .
Run unit tests:
pytest tests
Run manual tests:
- Local PyTorch (zip) file
mkdir downloads
wget -O downloads/pytorch_model.bin https://huggingface.co/ykilcher/totally-harmless-model/resolve/main/pytorch_model.bin
picklescan -l DEBUG -p downloads/pytorch_model.bin
- Remote PyTorch (zip) URL
picklescan -l DEBUG -u https://huggingface.co/prajjwal1/bert-tiny/resolve/main/pytorch_model.bin
Publish the package to PyPI: bump the package version in setup.cfg
and create a GitHub release. This triggers the publish
workflow.
Alternative manual steps to publish the package:
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m build
python3 -m twine upload dist/*
Test the package: bump the version of picklescan
in conda.test.yaml
and run
conda env remove -n picklescan-test
conda env create -f conda.test.yaml
conda activate picklescan-test
picklescan --huggingface ykilcher/totally-harmless-model
Tested on Linux 5.10.102.1-microsoft-standard-WSL2 x86_64
(WSL2).
References
- pickletools.py -- The pickletool code is the most detailed documentation of the Pickle format.
- Machine Learning Attack Series: Backdooring Pickle Files, Johann Rehberger, 2022
- Hugging Face Pickle Scanning, Luc Georges, 2022
- The hidden dangers of loading open-source AI models (ARBITRARY CODE EXPLOIT!, Yannic Kilcher, 2022
- Secure Machine Learning at Scale with MLSecOps, Alejandro Saucedo, 2022
- Backdooring Pickles: A decade only made things worse, ColdwaterQ, DEFCON 2022
- Never a dill moment: Exploiting machine learning pickle files, Evan Sultanik, 2021 (tool: Fickling)
- Exploiting Python pickles, David Hamann, 2020
- Dangerous Pickles - malicious python serialization, Evan Sangaline, 2017
- Python Pickle Security Problems and Solutions, Travis Cunningham, 2015
- Arbitrary code execution with Python pickles, Stephen Checkoway, 2013
- Sour Pickles, A serialised exploitation guide in one part, Marco Slaviero, BlackHat USA 2011 (see also: doc, slides)