  • Stars: 147
  • Rank: 249,816 (Top 5%)
  • Language: JavaScript
  • License: Apache License 2.0
  • Created: over 6 years ago
  • Updated: over 1 year ago


Repository Details

The official Syft worker for Web and Node, built in JavaScript

syft.js logo


Syft.js

Syft.js is the “web” part of OpenMined's open-source ecosystem for federated learning, which currently spans web, iOS, Android, and servers/IoT.

Syft.js has the following core features:

  • 🛠️ Integration with PyGrid federated learning API.
  • ⚙️ Training and inference of any PySyft model written in PyTorch or TensorFlow.
  • 👤 Allows all data to stay on the user's device.
  • 🔒 Support for secure multi-party computation and secure aggregation protocols using peer-to-peer WebRTC connections (in progress).

The library is built on top of TensorFlow.js.

There are a variety of additional privacy-preserving protections that may be applied, including differential privacy, multi-party computation, and secure aggregation.

If you want to know how scalable federated systems are built, Towards Federated Learning at Scale is a fantastic introduction!

Installation

Note that syft.js requires the TensorFlow.js library as a peer dependency.

If you're using a package manager like NPM:

npm install --save @openmined/syft.js @tensorflow/tfjs-core

Or if Yarn is your cup of tea:

yarn add @openmined/syft.js @tensorflow/tfjs-core

If you're not using a package manager, you can include syft.js with a <script> tag. In this case, the library classes will be available under the syft global object.

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/[email protected]/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@openmined/syft.js@latest/dist/index.min.js"></script>

<script type="text/javascript">
  // Create syft worker
  const worker = new syft.Syft({...});
  ...
</script>

Quick Start

As a developer, there are a few steps to building your own secure federated learning system upon the OpenMined infrastructure:

  1. 🤖 Develop an ML model and training procedure (aka a Plan in PySyft terminology) using PySyft.
  2. 🌎 Host the model and Plans on PyGrid, which will deal with all the federated learning components of your pipeline.
  3. 🎉 Execute the training on a variety of end-user devices using a client library (syft.js, SwiftSyft, KotlinSyft, PySyft).
  4. 🔒 Securely aggregate the trained user models in PyGrid.

📓 The entire workflow is described in greater detail in the Web & Mobile Federated Learning project roadmap.

Syft.js provides a minimalistic API to communicate with PyGrid federated learning endpoints and execute PySyft Plans in the browser. A federated learning cycle implemented with syft.js contains the following steps:

  • Register into a training cycle on PyGrid.
  • Download the required model and Plans from PyGrid.
  • Execute the Plan with the given model parameters and the user's local data (multiple times) to produce a better model.
  • Submit the difference between the original and trained model parameters for aggregation.

These steps can be expressed in the following code:

import * as tf from '@tensorflow/tfjs-core';
import { Syft } from '@openmined/syft.js';

const gridUrl = 'ws://pygrid.myserver.com:5000';
const modelName = 'my-model';
const modelVersion = '1.0.0';

// if the model is protected with authentication token (optional)
const authToken = '...';

const worker = new Syft({ gridUrl, verbose: true });
const job = await worker.newJob({ modelName, modelVersion, authToken });
job.request();

job.on('accepted', async ({ model, clientConfig }) => {
  const batchSize = clientConfig.batch_size;
  const lr = clientConfig.lr;

  // Load data.
  const [data, target] = LOAD_DATA();
  const batches = MAKE_BATCHES(data, target, batchSize);

  // Load model parameters.
  let modelParams = model.params.map((p) => p.clone());

  // Main training loop.
  for (let [dataBatch, targetBatch] of batches) {
    // NOTE: this is just one possible example.
    // The Plan name (e.g. 'training_plan'), its input arguments, and its outputs depend on the FL configuration and the actual Plan implementation.
    let updatedModelParams = await job.plans['training_plan'].execute(
      job.worker,
      dataBatch,
      targetBatch,
      batchSize,
      lr,
      ...modelParams
    );

    // Use updated model params in the next iteration.
    for (let i = 0; i < modelParams.length; i++) {
      modelParams[i].dispose();
      modelParams[i] = updatedModelParams[i];
    }
  }

  // Calculate & send model diff.
  const modelDiff = await model.createSerializedDiff(modelParams);
  await job.report(modelDiff);
});

job.on('rejected', ({ timeout }) => {
  // Handle the job rejection, e.g. re-try after timeout.
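  // Sketch (assumption, not the official API contract): if PyGrid suggests a
  // timeout, wait for it and then request to join the next cycle. The unit of
  // `timeout` is assumed to be seconds here.
  if (timeout) {
    setTimeout(() => job.request(), timeout * 1000);
  }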
});

job.on('error', (err) => {
  // Handle errors.
});
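
LOAD_DATA and MAKE_BATCHES in the example above are placeholders for your own data pipeline. As an illustration only (not part of the syft.js API), a minimal MAKE_BATCHES sketch, assuming data and target are tf.Tensors whose first dimension indexes samples, could look like this:

// Hypothetical helper: split tensors into [dataBatch, targetBatch] pairs
// along the first (sample) axis.
function MAKE_BATCHES(data, target, batchSize) {
  const batches = [];
  const numSamples = data.shape[0];
  // Drop the last incomplete batch so every batch matches `batchSize`,
  // which the training Plan above receives as an explicit argument.
  for (let start = 0; start + batchSize <= numSamples; start += batchSize) {
    batches.push([data.slice(start, batchSize), target.slice(start, batchSize)]);
  }
  return batches;
}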

Model Training API

Plan execution and model training can be implemented more easily using the training helper, which runs the training loop for you (the model, batch size, etc. are taken from the Job automatically):

  // Main training loop.
  const training = job.train('training_plan', {
    inputs: [/* ... */],
    outputs: [/* ... */],
    data,
    target,
  });

  training.on('end', async () => {
      // Calculate & send model diff.
      const modelDiff = await model.createSerializedDiff(modelParams);
      await job.report(modelDiff);
  });

inputs and outputs need to be specified using PlanInputSpec and PlanOutputSpec and must match the Plan's arguments and outputs. For example, if the Plan has the following arguments and outputs:

loss, accuracy, modelParams1, modelParams2, modelParams3, modelParams4 = 
    plan(dataBatch, targetBatch, batchSize, lr, modelParams1, modelParams2, modelParams3, modelParams4)

The corresponding inputs and outputs for job.train will be:

const inputs = [
    new PlanInputSpec(PlanInputSpec.TYPE_DATA),
    new PlanInputSpec(PlanInputSpec.TYPE_TARGET),
    new PlanInputSpec(PlanInputSpec.TYPE_BATCH_SIZE),
    new PlanInputSpec(PlanInputSpec.TYPE_CLIENT_CONFIG_PARAM, 'lr'),
    new PlanInputSpec(PlanInputSpec.TYPE_MODEL_PARAM, 'param1', 0),
    new PlanInputSpec(PlanInputSpec.TYPE_MODEL_PARAM, 'param2', 1),
    new PlanInputSpec(PlanInputSpec.TYPE_MODEL_PARAM, 'param3', 2),
    new PlanInputSpec(PlanInputSpec.TYPE_MODEL_PARAM, 'param4', 3),
];

const outputs = [
    new PlanOutputSpec(PlanOutputSpec.TYPE_LOSS),
    new PlanOutputSpec(PlanOutputSpec.TYPE_METRIC, 'accuracy'),
    new PlanOutputSpec(PlanOutputSpec.TYPE_MODEL_PARAM, 'param1', 0),
    new PlanOutputSpec(PlanOutputSpec.TYPE_MODEL_PARAM, 'param2', 1),
    new PlanOutputSpec(PlanOutputSpec.TYPE_MODEL_PARAM, 'param3', 2),
    new PlanOutputSpec(PlanOutputSpec.TYPE_MODEL_PARAM, 'param4', 3),
];
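
With these specs in place, the earlier job.train call can be wired up as shown below. This is only a sketch: data and target are assumed to be prepared as in the Quick Start example (or supplied via a DataLoader, described later).

// Sketch: pass the inputs and outputs defined above to the training helper.
const training = job.train('training_plan', { inputs, outputs, data, target });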

Stop & Resume

PlanTrainer allows stopping and resuming the training using the stop and resume methods:

  // Main training loop.
  const training = job.train('training_plan', {
    inputs: [/* ... */],
    outputs: [/* ... */],
    data,
    target,
  });

  training.on('start', () => {
    // training is started!
  });

  training.on('stop', () => {
    // training is stopped!
  });

  document.getElementById('stop-button').onclick = () => {
    training.stop();
  };

  document.getElementById('resume-button').onclick = () => {
    training.resume();
  };

Checkpointing

The stop method returns the current training state as a PlanTrainerCheckpoint object, which can be serialized to JSON and restored from JSON later to continue the training:

const checkpoint = await training.stop();
const checkpointJson = await checkpoint.toJSON();
const checkpointJsonString = JSON.stringify(checkpointJson);
localStorage.setItem('checkpoint', checkpointJsonString);

// ... checkpoint can survive page reload ...

const checkpointJsonString = localStorage.getItem('checkpoint');
const checkpointJson = JSON.parse(checkpointJsonString);
const checkpoint = PlanTrainerCheckpoint.fromJSON(worker, checkpointJson);
    
// Main training loop.
const training = job.train('training_plan', {
  // Pass checkpoint into train method to resume from it
  // NOTE: checkpoint doesn't store Plan and training data, these still need to be supplied
  checkpoint,
  inputs: [/* ... */],
  outputs: [/* ... */],
  data,
  target,
});

A checkpoint can also be created directly from the PlanTrainer object using the createCheckpoint method and applied back using applyCheckpoint:

const checkpoint = training.createCheckpoint();
// ...
training.applyCheckpoint(checkpoint);
training.resume();

Dataset / DataLoader API

One way to provide training data to the PlanTrainer is to prepare and pass the data and target parameters as plain tf.Tensors. Another way is to use the Dataset and DataLoader classes, which are simplified versions of PyTorch's implementation.

The Dataset class needs to be extended to implement element-wise access to the data. The resulting dataset is used with a DataLoader, which handles shuffling and batching of dataset elements. The DataLoader can be passed as the data parameter into the PlanTrainer.

class MyDataset extends data.Dataset {
  
  constructor() {
    super();
    // this.data = ...;
    // this.target = ...;
  }

  getItem(index) {
    return [
      this.data[index],
      this.target[index]
    ];
  }

  get length() {
    return this.data.length;
  }
}

const dataset = new MyDataset();
const dl = new DataLoader({dataset, batchSize: 64, shuffle: true});

// Use with PlanTrainer
const training = job.train('training_plan', {
    inputs: [/* ... */],
    outputs: [/* ... */],
    data: dl
});

// Or use with custom training loop
for (const [data, target] of dl) {
  // ...
}

The MNIST example includes an implementation of the MNIST dataset based on the Dataset class, demonstrates DataLoader usage, and additionally introduces data transformations using Transform.

API Documentation

See the API Documentation for a complete reference.

Running the Demo App

The “Hello World” syft.js demo is the MNIST training example located in the examples/mnist folder. It demonstrates how a simple neural network model created in PySyft can be trained in a browser and how the training results from multiple federated learning participants are averaged.

syft.js MNIST demo animation

Running the demo is a multi-stage, multi-component process (as is federated learning itself).

Below are example instructions that assume you want to put everything under the ~/fl-demo folder.

Installation

It is recommended that you install the Python packages in a separate virtualenv or conda environment, e.g.:

virtualenv -p python3 syft
source syft/bin/activate

or

conda create -n syft python=3.7
conda activate syft

Now you will need to install the following packages:

  • PySyft. Follow the PySyft installation guide to install the latest 0.2.x branch of PySyft.

  • PyGrid. Follow the PyGrid documentation to install the latest dev branch of PyGrid.

  • Syft.js with the MNIST demo. Check out the latest dev branch of syft.js, which includes the MNIST demo app:

    cd ~/fl-demo
    git clone https://github.com/OpenMined/syft.js
    cd syft.js
    npm install
    cd examples/mnist
    npm install

Seeding the Model & Plan

Syft.js connects to PyGrid to pick up the model and training Plan. For the demo to work, we need to populate PyGrid with that data.

Run PyGrid Node

See Getting Started for details. It is possible to start a PyGrid Node using Docker or using the console script.

We assume you don't need to change the default PyGrid Node configuration and that it listens on localhost:5000. If you need to use a different host/port, the PyGrid URL will need to be adjusted accordingly in the following steps.

Create Model & Plan

After PyGrid is running, the next step is to create the model and training plan and host them in PyGrid. The MNIST example Jupyter notebooks guide you through this process.

Fire up Jupyter Notebook in the PyGrid root folder:

cd ~/fl-demo/PyGrid
jupyter notebook --notebook-dir=$(pwd)

In the console you should see the URL to open, or the browser will open automatically. After this, navigate to examples/model-centric and run the first notebook. At this point, you can pull down the model and training plan with syft.js. However, if you'd like to see how to execute the plan using the PySyft FL worker, try running the second notebook.

PyGrid Node Clean-up

In case you need to reset the PyGrid Node database to a blank state, stop the process with Ctrl+C and remove the databaseGateway.db file in PyGrid. Or, if you used docker-compose, stop and restart it using the docker-compose up --force-recreate command.

Starting the Demo

Finally, we get to the browser part of the demo:

cd ~/fl-demo/syft.js/examples/mnist
npm start

This should start the development server and open localhost:8080 in the browser. Assuming the PyGrid URL, MNIST model name, and version were not modified in previous steps, just press “Start FL Worker”.

You should see the following in the dev console:

  • Syft.js registers into a training cycle on PyGrid and gets the configuration, Plan, and model.
  • The app loads the MNIST dataset and executes the training plan with each data batch. Charts are updated during this process, and you should see the training loss going down and the accuracy going up.
  • After training is complete, the model diff is submitted to PyGrid.

If “Keep making cycle requests” is checked, the whole cycle process is repeated until PyGrid tells the worker that model training is complete.

Compatibility

PySyft

Syft.js has been tested with PySyft 0.2.7.

PyGrid

Syft.js has been tested with the latest version of PyGrid on master.

TensorFlow.js

Syft.js was tested with TensorFlow.js v1.2.5.

Browser Support

Syft.js was tested with the Chrome and Firefox browsers.

Support

For support in using this library, please join the #lib_syftjs Slack channel. If you’d like to follow along with any code changes to the library, please join the #code_syftjs Slack channel. Click here to join our Slack community!

Contributing

Please check open issues as a starting point.

Bug reports and feature suggestions are welcome as well.

The workflow is the usual one for GitHub: the master branch is considered stable, and the dev branch is under active development:

  1. Star, fork, and clone the syft.js repository.
  2. Create a new branch for changes from dev.
  3. Push changes to this branch.
  4. Submit a PR to OpenMined/syft.js.
  5. PR is reviewed and accepted.

Read the contribution guide as a good starting place. Additionally, we welcome you to join the Slack for queries related to the library and contribution in general. The Slack channel #lib_syftjs is specific to syft.js development. See you there!

Contributors

These people were an integral part of the efforts to bring syft.js to fruition and of its active development.


Patrick Cason

🤔 💻 🎨 📖 💼

Vova Manannikov

💻 📖 ⚠️

Mike Nolan

💻

Ravikant Singh

💻 ⚠️ 📖

varun khare

💻

Pedro Espíndula

📖

José Benardi de Souza Nunes

⚠️

Tajinder Singh

💻

License

Apache License 2.0
