  • Stars: 142
  • Rank 258,495 (Top 6%)
  • Language: C++
  • License: Apache License 2.0
  • Created about 4 years ago
  • Updated about 1 year ago

Repository Details

A tree-based federated learning system (MLSys 2023)

Documentation

Overview

FedTree is a federated learning system for tree-based models. It is designed to be highly efficient, effective, and secure. It currently has the following features:

  • Federated training of gradient boosting decision trees.
  • Parallel computing on multi-core CPUs and GPUs.
  • Support for homomorphic encryption, secure aggregation, and differential privacy.
  • Support for classification and regression.

The overall architecture of FedTree is shown below.

[Figure: FedTree architecture]

Getting Started

Refer to the project's primary documentation for details.

Prerequisites

  • CMake 3.15 or above
  • GMP
  • NTL
  • gRPC 1.50.0 (required for distributed version)
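
Before building, it can help to confirm that the installed CMake meets the 3.15 minimum listed above. The following is a minimal sketch, not part of FedTree: the helper names version_ge and check_cmake are hypothetical, and it assumes that the first line of `cmake --version` has the form "cmake version X.Y.Z".

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Components are compared numerically, so 3.9 is older than 3.15.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# check_cmake: report whether the cmake on PATH is at least version 3.15.
check_cmake() {
  # Assumes the first line looks like "cmake version 3.22.1".
  found="$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')"
  if [ -z "$found" ]; then
    echo "cmake not found on PATH"
    return 1
  fi
  if version_ge "$found" "3.15"; then
    echo "cmake $found is new enough"
  else
    echo "cmake $found is older than 3.15"
    return 1
  fi
}
```

Calling check_cmake after sourcing these functions prints one line and returns a non-zero status if the requirement is not met. GMP, NTL, and gRPC are not checked here.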

Use the following commands to install the NTL library.

wget https://libntl.org/ntl-11.5.1.tar.gz
tar -xvf ntl-11.5.1.tar.gz
cd ntl-11.5.1/src
./configure SHARED=on
make
make check
sudo make install

If you install NTL in a non-default location, pass that location as NTL_PATH when building FedTree (e.g., cmake .. -DNTL_PATH="PATH_TO_NTL").

After installing gRPC, remember to add its local bin folder to your PATH variable, e.g.,

export PATH="$gRPC_INSTALL_DIR/bin:$PATH"

We suggest installing gRPC 1.50.0, i.e., passing -b v1.50.0 when cloning the gRPC repo.

If your gRPC version is not 1.50.0, go to the src/FedTree/grpc directory and regenerate the gRPC and protobuf sources:

protoc -I ./ --grpc_out=. --plugin=protoc-gen-grpc=`which grpc_cpp_plugin` ./fedtree.proto
protoc -I ./ --cpp_out=. ./fedtree.proto

Clone and Install submodules

git clone https://github.com/Xtra-Computing/FedTree.git
cd FedTree
git submodule init
git submodule update

Standalone Simulation

Build on Linux (Recommended)

# under the directory of FedTree
mkdir build && cd build 
cmake .. -DDISTRIBUTED=OFF
make -j

Build on macOS

Build with Apple Clang

Install libomp first:

brew install libomp

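Build FedTree. The paths below assume Homebrew's /usr/local prefix (Intel Macs); on Apple Silicon, Homebrew installs under /opt/homebrew, so adjust the libomp paths accordingly: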

# under the directory of FedTree
mkdir build
cd build
cmake -DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include" \
  -DOpenMP_C_LIB_NAMES=omp \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include" \
  -DOpenMP_CXX_LIB_NAMES=omp \
  -DOpenMP_omp_LIBRARY=/usr/local/opt/libomp/lib/libomp.dylib \
  ..
make -j

Run training

# under 'FedTree' directory
./build/bin/FedTree-train ./examples/vertical_example.conf

Distributed Setting

Each machine that participates in federated learning must build the library first. When building, pass the -DDISTRIBUTED=ON option to cmake.

mkdir build && cd build
cmake .. -DDISTRIBUTED=ON
make -j
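
The standalone and distributed builds differ only in the -DDISTRIBUTED flag, and a custom NTL location adds -DNTL_PATH. As a rough sketch, a small helper can assemble these flags so the same script serves both variants. The function name cmake_flags is hypothetical; the two flags are the ones documented in this README.

```shell
# cmake_flags [ON|OFF] [NTL_DIR]: print the cmake flags for a FedTree build,
# one per line. The first argument defaults to OFF (standalone simulation).
cmake_flags() {
  printf '%s\n' "-DDISTRIBUTED=${1:-OFF}"
  # Only pass NTL_PATH when a custom NTL install directory is given.
  if [ -n "${2:-}" ]; then
    printf '%s\n' "-DNTL_PATH=${2}"
  fi
}
```

From the build directory, this could be used as `cmake .. $(cmake_flags ON /opt/ntl)`, where /opt/ntl is a placeholder path. The unquoted substitution relies on word splitting, so this sketch does not handle paths that contain spaces.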

Then write a configuration file that specifies the IP address of the server machine (ip_address=xxx). Run FedTree-distributed-server on the server machine and FedTree-distributed-party on each party machine. Below are two examples, one for horizontal FedTree and one for vertical FedTree.
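
When the same example configuration is copied to several machines, the server address has to be updated in each copy. The helper below is a hedged sketch: it assumes the configuration file holds one key=value pair per line with no spaces around the equals sign, as in ip_address=xxx above. The function name set_conf_value is hypothetical and not part of FedTree.

```shell
# set_conf_value FILE KEY VALUE: set KEY=VALUE in FILE.
# An existing "KEY=..." line is replaced; otherwise the pair is appended.
set_conf_value() {
  conf_tmp="$(mktemp)" || return 1
  awk -v k="$2" -v v="$3" '
    BEGIN { FS = "="; found = 0 }
    $1 == k { print k "=" v; found = 1; next }   # replace the matching line
    { print }                                    # keep every other line as-is
    END { if (!found) print k "=" v }            # append when the key was absent
  ' "$1" > "$conf_tmp" || { rm -f "$conf_tmp"; return 1; }
  # Write back through redirection so the original file permissions are kept.
  cat "$conf_tmp" > "$1"
  rm -f "$conf_tmp"
}
```

For example, `set_conf_value ./examples/adult/a9a_horizontal_p0.conf ip_address 192.0.2.10` would point party 0 at a server at 192.0.2.10 (a documentation-only address used here as a placeholder).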

Distributed Horizontal FedTree

# under 'FedTree' directory
# under server machine
./build/bin/FedTree-distributed-server ./examples/adult/a9a_horizontal_server.conf
# under party machine 0
./build/bin/FedTree-distributed-party ./examples/adult/a9a_horizontal_p0.conf 0
# under party machine 1
./build/bin/FedTree-distributed-party ./examples/adult/a9a_horizontal_p1.conf 1
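
To try the horizontal example on a single machine, the three commands above can be wrapped in one script. This is only a sketch under several assumptions: the configuration files point ip_address at the local machine, a two-second pause is enough for the server to start listening, and the script runs from the FedTree directory. The names party_conf and run_horizontal_local are hypothetical.

```shell
# Locations taken from the example commands above; override via environment.
BIN_DIR="${BIN_DIR:-./build/bin}"
CONF_DIR="${CONF_DIR:-./examples/adult}"

# party_conf ID: print the example config path for party ID.
party_conf() {
  printf '%s/a9a_horizontal_p%s.conf\n' "$CONF_DIR" "$1"
}

# run_horizontal_local N: start the server, then parties 0..N-1, all in the
# background, and block until every process has exited.
run_horizontal_local() {
  "$BIN_DIR/FedTree-distributed-server" "$CONF_DIR/a9a_horizontal_server.conf" &
  sleep 2   # assumption: gives the server time to start before parties connect
  i=0
  while [ "$i" -lt "$1" ]; do
    "$BIN_DIR/FedTree-distributed-party" "$(party_conf "$i")" "$i" &
    i=$((i + 1))
  done
  wait
}
```

With the two party configurations shipped in examples/adult, the call would be `run_horizontal_local 2`. On separate machines, run the individual commands shown above instead.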

Distributed Vertical FedTree

# under 'FedTree' directory
# under server (i.e., the party with label) machine 0
./build/bin/FedTree-distributed-server ./examples/credit/credit_vertical_p0_withlabel.conf
# on the same machine, in a new terminal (machine 0 also runs party 0)
./build/bin/FedTree-distributed-party ./examples/credit/credit_vertical_p0_withlabel.conf 0
# under party machine 1
./build/bin/FedTree-distributed-party ./examples/credit/credit_vertical_p1.conf 1

Other information

FedTree is built on ThunderGBM, a fast training system for GBDTs and Random Forests on GPUs.

Citation

Please cite our paper if you use FedTree in your work.

@inproceedings{fedtree,
  title={FedTree: A Federated Learning System For Trees},
  author={Li, Qinbin and Wu, Zhaomin and Cai, Yanzheng and Han, Yuxuan and Yung, Ching Man and Fu, Tianyuan and He, Bingsheng},
  booktitle={Proceedings of Machine Learning and Systems},
  year={2023}
}

Call for contributions

We aim to keep improving FedTree and welcome contributions. If you would like to contribute to FedTree in depth and are familiar with C++, please send your CV to [email protected].

More Repositories

  1. thundersvm (C++, 1,564 stars): ThunderSVM: A Fast SVM Library on GPUs and CPUs
  2. thundergbm (C++, 692 stars): ThunderGBM: Fast GBDTs and Random Forests on GPUs
  3. NIID-Bench (Python, 558 stars): Federated Learning Benchmark - Federated Learning on Non-IID Data Silos: An Experimental Study (ICDE 2022)
  4. ThunderGP (C++, 135 stars): HLS-based Graph Processing Framework on FPGAs
  5. Medusa (Cuda, 61 stars): Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code
  6. Awesome-Literature-ILoGs (58 stars): Awesome literature on imbalanced learning on graphs
  7. G3 (Cuda, 42 stars): G3: A Programmable GNN Training System on GPU
  8. briskstream (Java, 39 stars): A Multicore, NUMA Optimised Data Stream Processing System
  9. PyOE (Python, 28 stars): Python library for data stream learning
  10. ThunderRW (C++, 26 stars): Source code of "ThunderRW: An In-Memory Graph Random Walk Engine" published in VLDB'2021 - By Shixuan Sun, Yuhang Chen, Shengliang Lu, Bingsheng He and Yuchen Li
  11. FedSim (Python, 23 stars): A coupled vertical federated learning framework that boosts the model performance with record similarities (NeurIPS 2022)
  12. PrivML (20 stars)
  13. SOFF (Python, 19 stars)
  14. ConsisGAD (Python, 18 stars)
  15. SimFL (C++, 18 stars): Practical Federated Gradient Boosting Decision Trees (AAAI 2020)
  16. ForkGraph (C++, 16 stars)
  17. ReGraph (C++, 16 stars): Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines
  18. ThundeRiNG (C++, 14 stars): Fast Multiple Independent Random Number Sequences Generation on FPGAs
  19. hacc_demo (Shell, 14 stars)
  20. FedOV (Python, 14 stars): Towards Addressing Label Skews in One-Shot Federated Learning (ICLR 2023)
  21. Vine (C++, 14 stars): Accelerating Exact Constrained Shortest Paths on GPUs
  22. PathEnum (C++, 14 stars): Source code of "PathEnum: Towards Real-Time Hop-Constrained s-t Path Enumeration", published in SIGMOD'2021 - By Shixuan Sun, Yuhang Chen, Bingsheng He, and Bryan Hooi
  23. OEBench (Python, 13 stars): OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams (VLDB 2024)
  24. VertiBench (Jupyter Notebook, 9 stars): Feature partitioner by imbalance or correlation (ICLR 2024)
  25. omniDB (C++, 7 stars): General query processing engine
  26. LightRW (C++, 6 stars)
  27. HashjoinOnHARP (C++, 5 stars): The MAIN project of the paper "Is FPGA useful for Hash Joins?"
  28. PMP (Python, 5 stars)
  29. RUSH (Python, 4 stars): A fast library for real-time burst subgraph detection
  30. On-the-fly-data-shuffling-for-OpenCL-based-FPGAs (JavaScript, 4 stars)
  31. DeltaBoost (C++, 4 stars): GBDT-based model with efficient unlearning (SIGMOD 2023)
  32. ModelGo (TeX, 4 stars)
  33. Pyper (3 stars)
  34. KGraph (3 stars): Concurrent Graph Query Processing with Memoization on Graph
  35. Awesome-Prompt-For-Research (2 stars): Awesome prompts for computer science research including paper editting and code debugging
  36. Melia (C, 2 stars)
  37. Query_on_OpenCL_FPGA (C++, 1 star)
  38. FedGMA (Python, 1 star): Communication-Efficient Generalized Neuron Matching for Federated Learning (ICPP'23)
  39. HashJoin_HMA (C++, 1 star): A hash join implementation optimized for many-core processors with die-stacked HBMs
  40. Clementi (C++, 1 star): Clementi: A Scalable Multi-FPGA Graph Processing Framework