Concrete ML is an open-source set of Privacy-Preserving Machine Learning (PPML) tools built on top of Concrete by Zama. It aims to simplify the use of fully homomorphic encryption (FHE) for data scientists by helping them automatically turn machine learning models into their homomorphic equivalents. Concrete ML was designed with ease of use in mind, so that data scientists can use it without knowledge of cryptography. Notably, the Concrete ML model classes are similar to those in scikit-learn, and it is also possible to convert PyTorch models to FHE.
Main features.
Data scientists can use models with APIs that are close to those of the frameworks they already use, with additional options to run inference in FHE.
Concrete ML features:
- built-in models, which are ready-to-use FHE-friendly models with a user interface that is equivalent to their scikit-learn and XGBoost counterparts
- support for custom models that can use quantization-aware training. These are developed by the user using PyTorch or Keras/TensorFlow and are imported into Concrete ML through ONNX
Installation.
Depending on your OS, Concrete ML may be installed with Docker or with pip:
OS / HW | Available on Docker | Available on pip |
---|---|---|
Linux | Yes | Yes |
Windows | Yes | Coming soon |
Windows Subsystem for Linux | Yes | Yes |
macOS 11+ (Intel) | Yes | Yes |
macOS 11+ (Apple Silicon: M1, M2, etc.) | Yes | Yes |
Note: Concrete ML only supports Python 3.8, 3.9 and 3.10.
Concrete ML can be installed on Kaggle (see question on community for more details) and on Google Colab.
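Because only Python 3.8, 3.9 and 3.10 are supported, it can be convenient to check the interpreter before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` and the constant `SUPPORTED_MINORS` are illustrative and are not part of Concrete ML.

```python
import sys

# Minor versions of Python 3 listed as supported in the note above.
# (Illustrative constant, not provided by Concrete ML.)
SUPPORTED_MINORS = (8, 9, 10)


def is_supported_python(version_info=sys.version_info):
    """Return True if the given version is Python 3.8, 3.9 or 3.10."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    if is_supported_python():
        print(f"Python {current} is supported")
    else:
        print(f"Python {current} is not supported; use 3.8, 3.9 or 3.10")
```

Running the script with the interpreter you plan to use for the installation reports whether it falls in the supported range.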
Docker
To install with Docker, pull the concrete-ml image as follows:
docker pull zamafhe/concrete-ml:latest
Pip
To install Concrete ML from PyPI, run the following:
pip install -U pip wheel setuptools
pip install concrete-ml
More detailed installation instructions are available in the documentation.
A simple Concrete ML example with scikit-learn.
A simple example, very close to scikit-learn, is shown below for a logistic regression:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression
# Let's create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)
# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)
# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)
# We then compile on a representative set
model.compile(X_train)
# Finally, we run the inference on encrypted inputs
y_pred_fhe = model.predict(X_test, fhe="execute")
print("In clear :", y_pred_clear)
print("In FHE :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")
# Output:
# In clear : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# In FHE : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# Similarity: 100%
This example is explained in more detail in the linear model documentation. Concrete ML built-in models have APIs that are almost identical to their scikit-learn counterparts, and PyTorch networks can be converted to FHE with the Concrete ML conversion APIs. Please refer to the linear models, tree-based models and neural networks documentation for more examples.
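The `n_bits=8` argument in the example above controls the number of bits used for quantization. As a rough illustration of what quantization means, here is a minimal sketch of uniform (affine) quantization in pure Python. It is not Concrete ML's implementation; the function names `quantize` and `dequantize` and the sample `weights` are hypothetical and chosen for illustration only.

```python
def quantize(values, n_bits):
    """Map floats to integers in [0, 2**n_bits - 1] on a uniform grid."""
    v_min, v_max = min(values), max(values)
    levels = 2**n_bits - 1
    # Guard against a constant input, where the value range would be zero
    scale = (v_max - v_min) / levels if v_max > v_min else 1.0
    q_values = [round((v - v_min) / scale) for v in values]
    return q_values, scale, v_min


def dequantize(q_values, scale, v_min):
    """Map quantized integers back to approximate float values."""
    return [q * scale + v_min for q in q_values]


# Hypothetical weights, used only to show the effect of the bit width
weights = [-1.5, -0.25, 0.0, 0.75, 2.0]

for n_bits in (2, 8):
    q, scale, offset = quantize(weights, n_bits)
    approx = dequantize(q, scale, offset)
    max_error = max(abs(w - a) for w, a in zip(weights, approx))
    print(f"n_bits={n_bits}: integers={q}, max error={max_error:.4f}")
```

Fewer bits give a coarser grid and a larger reconstruction error, which is one way to picture the trade-off between quantization and accuracy that several of the tutorials in this README explore.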
Documentation.
Full, comprehensive documentation is available here: https://docs.zama.ai/concrete-ml.
Online demos and tutorials.
Various tutorials are provided for the built-in models and for deep learning. In addition, several complete use-cases are explored:
- Encrypted Large Language Model: converts a user-defined part of a Large Language Model for encrypted text generation. Shows the trade-off between quantization and accuracy for text generation and shows how to run the model in FHE.
- Credit Scoring: predicts the chance of a given loan applicant defaulting on loan repayment while keeping the user's data private. Shows how Concrete ML models easily replace their scikit-learn equivalents.
- Health diagnosis: based on a patient's symptoms, history and other health factors, gives a diagnosis using FHE to preserve the privacy of the patient.
- MNIST: a Python script and notebook showing quantization-aware training following FHE constraints. A fully-connected neural network is implemented with Brevitas and is converted to FHE with Concrete ML.
- Titanic: a notebook that gives a solution to the Kaggle Titanic competition. Implemented with XGBoost from Concrete ML, this example is a companion to the Kaggle notebook and was the subject of a blog post in KDnuggets.
- Sentiment analysis with transformers: a Gradio demo which predicts if a tweet / short message is positive, negative or neutral, with FHE of course! The live interactive demo is available on Hugging Face. A blog post explains how this demo works.
- CIFAR10 FHE-friendly model with Brevitas: code for training a VGG-like FHE-compatible neural network from scratch using Brevitas, and a script to run the neural network in FHE. Execution in FHE takes ~20 minutes per image and shows an accuracy of 88.7%.
- CIFAR10 / CIFAR100 FHE-friendly models with Transfer Learning approach: a series of three notebooks that show how to convert a pre-trained FP32 VGG11 neural network into a quantized model using Brevitas. The model is fine-tuned on the CIFAR data-sets, converted for FHE execution with Concrete ML and evaluated using FHE simulation. For CIFAR10 and CIFAR100, respectively, our simulations show an accuracy of 90.2% and 68.2%.
- FHE neural network splitting for client/server deployment: explains how to split a computationally-intensive neural network model in two parts. The first part is executed on the client side in the clear, and the output of this step is encrypted. Then, to complete the computation, the second part of the model is evaluated with FHE. This tutorial also shows the impact of the FHE speed/accuracy trade-off on CIFAR10, limiting PBS to 8-bit, and thus achieving 62% accuracy.
- Encrypted image filtering: the live demo for our 6-min is available, in the form of a Gradio application. It takes encrypted images and applies some filters (for example black-and-white, ridge detection, or your own filter).
More generally, if you have built awesome projects using Concrete ML, feel free to let us know and we'll link to them!
Citing Concrete ML
To cite Concrete ML, notably in academic papers, please use the following entry, which lists authors by order of first commit:
@Misc{ConcreteML,
title={Concrete {ML}: a Privacy-Preserving Machine Learning Library using Fully Homomorphic Encryption for Data Scientists},
author={Arthur Meyre and Benoit {Chevallier-Mames} and Jordan Frery and Andrei Stoian and Roman Bredehoft and Luis Montero and Celia Kherfallah},
year={2022},
note={\url{https://github.com/zama-ai/concrete-ml}},
}
License.
This software is distributed under the BSD-3-Clause-Clear license. If you have any questions, please contact us at [email protected].