AIJack: Security and Privacy Risk Simulator for Machine Learning
What is AIJack?
AIJack is an easy-to-use open-source simulation tool for testing the security of your AI system against hijackers. It provides defense techniques such as Differential Privacy, Homomorphic Encryption, K-anonymity, and Federated Learning to help protect your AI. With AIJack, you can simulate attacks such as Poisoning, Model Inversion, Backdoor, and Free-Rider, and test defenses against them. We support more than 30 state-of-the-art methods. For more information, check our documentation and start securing your AI today with AIJack.
Installation
You can install AIJack with pip. AIJack requires Boost and pybind11.
apt install -y libboost-all-dev
pip install -U pip
pip install "pybind11[global]"
pip install aijack
If you want to use the latest version, you can install it directly from GitHub.
pip install git+https://github.com/Koukyosyumei/AIJack
We also provide a Dockerfile.
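After installing, a quick sanity check can confirm that the packages are visible to your Python interpreter. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of AIJack, and the top-level module names `pybind11` and `aijack` are taken from the install commands and import statements in this README.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Report whether each package from the installation steps can be found.
    for name in ("pybind11", "aijack"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

This only checks that the packages can be located; it does not verify that the C++ backend was built correctly.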
Quick Start
This section gives a brief overview of AIJack.
Features
- All-around abilities for both attack & defense
- PyTorch-friendly design
- Compatible with scikit-learn
- Fast Implementation with C++ backend
- MPI-Backend for Federated Learning
- Extensible modular APIs
Basic Interface
Python API
For standard machine learning algorithms, AIJack allows you to simulate attacks against machine learning models with Attacker APIs. AIJack mainly supports PyTorch and sklearn models.
# abstract code
attacker = Attacker(target_model)
result = attacker.attack()
For distributed learning such as Federated Learning and Split Learning, AIJack offers four basic APIs: Client, Server, API, and Manager. Client and Server represent each client and server within each distributed learning scheme. You can execute training by registering the clients and servers with API and running it. Manager gives additional abilities such as attack, defense, or parallel computing to Client, Server, or API via the attach method.
# abstract code
client = [Client(), Client()]
server = Server()
api = API(client, server)
api.run() # execute training
c_manager = ClientManagerForAdditionalAbility(...)
s_manager = ServerManagerForAdditionalAbility(...)
ExtendedClient = c_manager.attach(Client)
ExtendedServer = s_manager.attach(Server)
extended_client = [ExtendedClient(...), ExtendedClient(...)]
extended_server = ExtendedServer(...)
api = API(extended_client, extended_server)
api.run() # execute training
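One way to picture the attach pattern described above is a manager that takes a class and returns a subclass with extra behaviour. The following is a minimal sketch under that assumption, not AIJack's actual implementation; `ToyClient` and `OffsetUploadManager` are hypothetical names invented for illustration.

```python
class ToyClient:
    """Hypothetical stand-in for a distributed-learning client (not an AIJack class)."""

    def __init__(self, weight: float):
        self.weight = weight

    def upload(self) -> float:
        # Send the local parameter to the server.
        return self.weight


class OffsetUploadManager:
    """Hypothetical manager that adds a fixed offset to whatever the client uploads."""

    def __init__(self, offset: float):
        self.offset = offset

    def attach(self, cls):
        offset = self.offset

        class Wrapper(cls):
            def upload(self):
                # Run the original behaviour, then apply the additional ability.
                return super().upload() + offset

        Wrapper.__name__ = f"Extended{cls.__name__}"
        return Wrapper


manager = OffsetUploadManager(offset=0.5)
ExtendedClient = manager.attach(ToyClient)  # a new class; ToyClient itself is unchanged
client = ExtendedClient(weight=1.0)
print(client.upload())  # 1.5
```

Because `attach` returns a new class rather than modifying the original, plain and extended clients can coexist in the same simulation.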
For example, the code below implements a scenario where the server in Federated Learning tries to steal the training data with a gradient-based model inversion attack.
from aijack.collaborative.fedavg import FedAVGAPI, FedAVGClient, FedAVGServer
from aijack.attack.inversion import GradientInversionAttackServerManager

# input_shape, model_1 to model_3, criterion, optimizers, and dataloaders are
# placeholders that you define for your own task.
manager = GradientInversionAttackServerManager(input_shape)
FedAVGServerAttacker = manager.attach(FedAVGServer)  # server class with the attack attached
clients = [FedAVGClient(model_1), FedAVGClient(model_2)]
server = FedAVGServerAttacker(clients, model_3)
api = FedAVGAPI(server, clients, criterion, optimizers, dataloaders)
api.run()
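To see why shared gradients can leak training data at all, consider a deliberately simplified case: a single linear model with a bias term and squared-error loss. There the gradient with respect to the weights equals the residual times the input, and the gradient with respect to the bias equals the residual, so dividing one by the other recovers the input exactly. The sketch below illustrates only this analytic special case; it is not the optimization-based procedure that the attack manager above runs, and all function names are hypothetical.

```python
def client_gradients(w, b, x, y):
    """Gradients of 0.5 * (w.x + b - y)^2 with respect to w and b."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [residual * xi for xi in x]  # dL/dw = residual * x
    grad_b = residual                     # dL/db = residual
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from the shared gradients; needs a non-zero bias gradient."""
    if grad_b == 0:
        raise ValueError("bias gradient is zero; the input cannot be recovered this way")
    return [g / grad_b for g in grad_w]


# The client computes gradients on one private sample and shares them.
w, b = [0.5, -1.0, 2.0], 0.25
private_x, private_y = [1.0, 2.0, 3.0], 1.0
grad_w, grad_b = client_gradients(w, b, private_x, private_y)

# The server sees only the gradients, yet obtains the private input.
recovered = reconstruct_input(grad_w, grad_b)
print(recovered)
```

For deep networks and batched updates no such closed form is generally available, which is why gradient inversion attacks instead search for an input whose gradients match the observed ones.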
AIValut: A simple DBMS for debugging ML Models
We also provide a simple DBMS named AIValut, designed specifically for SQL-based algorithms. AIValut currently supports Rain, a SQL-based debugging system for ML models. In the future, we plan to integrate additional advanced features from AIJack, including K-Anonymity, Homomorphic Encryption, and Differential Privacy.
AIValut has its own storage engine and query parser, and you can train and debug ML models with SQL-like queries. For example, the Complaint query automatically removes problematic records given the specified constraint.
# We train an ML model to classify whether each customer will go bankrupt based on their age and debt.
# We want the trained model to classify a customer as positive when their debt is greater than or equal to 100.
# The 10th record seems problematic for the above constraint.
>>Select * From bankrupt
id age debt y
1 40 0 0
2 21 10 0
3 22 10 0
4 32 30 0
5 44 50 1
6 30 100 1
7 63 310 1
8 53 420 1
9 39 530 1
10 49 1000 0
# Train logistic regression for 100 iterations with a learning rate of 1.
# The name of the target feature is `y`, and we use all other features as training data.
>>Logreg lrmodel id y 100 1 From Select * From bankrupt
Trained Parameters:
(0) : 2.771564
(1) : -0.236504
(2) : 0.967139
AUC: 0.520000
Prediction on the training data is stored at `prediction_on_training_data_lrmodel`
# Remove one record so that the model will predict `positive (class 1)` for the samples with `debt` greater than or equal to 100.
>>Complaint comp Shouldbe 1 Remove 1 Against Logreg lrmodel id y 100 1 From Select * From bankrupt Where debt Geq 100
Fixed Parameters:
(0) : -4.765492
(1) : 8.747224
(2) : 0.744146
AUC: 1.000000
Prediction on the fixed training data is stored at `prediction_on_training_data_comp_lrmodel`
For more detailed information and usage instructions, please refer to aivalut/README.md.
Please use AIValut only for research purposes.
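The constraint in the example above can also be checked by hand. The following sketch reuses the ten rows of the bankrupt table and lists the records that fall under the constraint (debt greater than or equal to 100) but carry a label other than 1. This is a plain label check written for illustration, not the procedure the Complaint query uses to choose which record to remove; the function name `violating_ids` is hypothetical.

```python
# Rows of the `bankrupt` table from the example: (id, age, debt, y).
ROWS = [
    (1, 40, 0, 0),
    (2, 21, 10, 0),
    (3, 22, 10, 0),
    (4, 32, 30, 0),
    (5, 44, 50, 1),
    (6, 30, 100, 1),
    (7, 63, 310, 1),
    (8, 53, 420, 1),
    (9, 39, 530, 1),
    (10, 49, 1000, 0),
]


def violating_ids(rows, debt_threshold=100, expected_label=1):
    """Return ids of rows covered by the constraint whose label contradicts it."""
    return [
        row_id
        for row_id, _age, debt, label in rows
        if debt >= debt_threshold and label != expected_label
    ]


suspects = violating_ids(ROWS)
print(suspects)  # [10]
```

The check flags the 10th record, the same one the example comments describe as problematic.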
Resources
You can also find more examples in our tutorials and documentation.
Supported Algorithms
| Category | Type | Algorithms |
|---|---|---|
| Collaborative | Horizontal FL | FedAVG, FedProx, FedKD, FedGEMS, FedMD, DSFL |
| Collaborative | Vertical FL | SplitNN, SecureBoost |
| Attack | Model Inversion | MI-FACE, DLG, iDLG, GS, CPL, GradInversion, GAN Attack |
| Attack | Label Leakage | Norm Attack |
| Attack | Poisoning | History Attack, Label Flip, MAPF, SVM Poisoning |
| Attack | Backdoor | DBA |
| Attack | Free-Rider | Delta-Weight |
| Attack | Evasion | Gradient-Descent Attack |
| Attack | Membership Inference | Shadow Attack |
| Defense | Homomorphic Encryption | Paillier |
| Defense | Differential Privacy | DPSGD, AdaDPS |
| Defense | Anonymization | Mondrian |
| Defense | Debugging | Model Assertions, Rain, Neuron Coverage |
| Defense | Others | Soteria, FoolsGold, MID, Sparse Gradient |
Contact
welcome2aijack[@]gmail.com
Citation
@software{Hideaki_AIJack_2023,
author = {Hideaki, Takahashi},
month = jun,
title = {{AIJack}},
url = {https://github.com/Koukyosyumei/AIJack},
year = {2023}
}