  • Language: Python
  • License: BSD 2-Clause "Sim...
  • Created about 6 years ago
  • Updated over 1 year ago


LEAF: A Benchmark for Federated Settings

Resources

Datasets

  1. FEMNIST
  • Overview: Image Dataset
  • Details: 62 classes (10 digits, 26 lowercase letters, 26 uppercase letters); images are 28 by 28 pixels (with an option to make them all 128 by 128 pixels); 3,500 users
  • Task: Image Classification
  2. Sentiment140
  • Overview: Text Dataset of Tweets
  • Details: 660,120 users
  • Task: Sentiment Analysis
  3. Shakespeare
  • Overview: Text Dataset of Shakespeare Dialogues
  • Details: 1,129 users (reduced to 660 with our choice of sequence length. See bug.)
  • Task: Next-Character Prediction
  4. Celeba
  5. Synthetic Dataset
  • Overview: We propose a process to generate synthetic, challenging federated datasets. The high-level goal is to create devices whose true models are device-dependent. For a description of the whole generative process, please refer to the paper.
  • Details: The user can customize the number of devices, the number of classes, and the number of dimensions, among others
  • Task: Classification
  6. Reddit
  • Overview: We preprocess the Reddit data released by pushshift.io corresponding to December 2017.
  • Details: 1,660,820 users with a total of 56,587,343 comments.
  • Task: Next-Word Prediction
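
To make the idea behind the Synthetic Dataset entry more concrete, here is a minimal sketch of what "devices whose true models are device-dependent" can look like. It is not the generative process from the paper: the function name `make_synthetic_devices`, its parameters, the Gaussian sampling, and the returned dictionary layout are all assumptions chosen for illustration. Each device draws its own linear classifier and labels its own samples with it, so no single global model fits every device exactly.

```python
import random


def make_synthetic_devices(num_devices=5, num_classes=3, num_dims=4,
                           samples_per_device=20, seed=0):
    """Hypothetical generator (not the paper's procedure).

    Every device gets its own randomly drawn linear model, and that
    model labels the device's samples.
    """
    rng = random.Random(seed)
    devices = {}
    for d in range(num_devices):
        # Device-specific parameters: this is what makes the "true model"
        # differ from one device to the next.
        weights = [[rng.gauss(0.0, 1.0) for _ in range(num_dims)]
                   for _ in range(num_classes)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]

        xs, ys = [], []
        for _ in range(samples_per_device):
            x = [rng.gauss(0.0, 1.0) for _ in range(num_dims)]
            # Score each class with the device's own model; the label is
            # the index of the highest-scoring class.
            scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
                      for w, b in zip(weights, biases)]
            y = max(range(num_classes), key=scores.__getitem__)
            xs.append(x)
            ys.append(y)
        devices[f"device_{d}"] = {"x": xs, "y": ys}
    return devices


devices = make_synthetic_devices()
print(len(devices), "devices,",
      len(devices["device_0"]["x"]), "samples on device_0")
```

The three knobs mentioned in the list (number of devices, classes, and dimensions) map directly onto the function arguments; the fixed seed only keeps the sketch reproducible.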

Notes

  • Install the libraries listed in requirements.txt
    • For example, with pip: run pip3 install -r requirements.txt
  • Go to the directory of the respective dataset for instructions on generating data
    • On macOS, check that wget is installed and working
  • The models directory contains instructions on running the baseline reference implementations
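
The wget check from the notes above can be done from Python before starting any data generation. This is only a convenience sketch: the helper name `find_tool` is hypothetical and not part of the repository, and it only confirms that an executable is on the PATH, not that downloads actually succeed.

```python
import shutil


def find_tool(name="wget"):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


path = find_tool("wget")
if path is None:
    print("wget was not found on PATH; install it before generating data.")
else:
    print(f"wget found at {path}")
```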