Pipelined Swarm Training
Swarm training "framework" using Haiku + JAX + Ray.
Designed for training large language models in a model-parallel fashion with unreliable, heterogeneous nodes (eventually).
Look in `swarm_run.py` for an example of running a character-level transformer on enwik8.
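
The sketch below is not taken from this repo; it is a minimal illustration, under assumed names (`LayerActor`, a toy feed-forward block standing in for a transformer layer), of the model-parallel idea described above: each Ray actor owns one layer's Haiku parameters and a JIT-compiled forward function, and activations hop between actors as arrays.

```python
# Hypothetical sketch of one-Ray-actor-per-layer model parallelism.
# Names here (LayerActor, the block definition) are illustrative, not the repo's API.
import haiku as hk
import jax
import jax.numpy as jnp
import numpy as np
import ray


@ray.remote
class LayerActor:
    """Holds the parameters of a single feed-forward block on one node."""

    def __init__(self, d_model: int, seed: int):
        def layer_fn(x):
            # Stand-in for a transformer block: layernorm + MLP with a residual.
            h = hk.LayerNorm(axis=-1, create_scale=True, create_offset=True)(x)
            h = hk.Linear(4 * d_model)(h)
            h = jax.nn.gelu(h)
            h = hk.Linear(d_model)(h)
            return x + h

        self._fwd = hk.without_apply_rng(hk.transform(layer_fn))
        dummy = jnp.zeros((1, d_model))
        self._params = self._fwd.init(jax.random.PRNGKey(seed), dummy)
        self._apply = jax.jit(self._fwd.apply)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Activations cross node boundaries as plain numpy arrays.
        return np.asarray(self._apply(self._params, jnp.asarray(x)))


if __name__ == "__main__":
    ray.init()
    d_model = 128
    # One actor per layer; in the real swarm these would live on different nodes.
    layers = [LayerActor.remote(d_model, seed=i) for i in range(4)]

    x = np.random.randn(8, d_model).astype(np.float32)
    for layer in layers:
        x = ray.get(layer.forward.remote(x))
    print("output shape:", x.shape)
```

Pipelining (one of the TODOs below) would overlap these per-layer calls across microbatches instead of running them strictly in sequence as above.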
TODOs
- Forward passes
- Backward passes with activation reconstruction
- Run optimizer
- Logging
- Checkpointing
- Actually do pipelining
- fp16 with static loss scaling (see the loss-scaling sketch after this list)
- Integer quantization for activations and gradients between layers (see the quantization sketch after this list)
- Get rid of pipeline stalls from running optimizer
- Data parallelism with multiple nodes per layer and gradient/weight aggregation
- Heterogeneous nodes with potentially multiple layers per node
- Handle unbalanced and unreliable nodes (layerdrop)
- Dynamic node addition
- 1T or bust?
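
As a rough illustration of the "fp16 with static loss scaling" item, here is a minimal JAX sketch with a placeholder `loss_fn` and an assumed scale of 2**12: the loss is multiplied by a fixed constant before differentiation so small fp16 gradients don't underflow, and the gradients are divided by the same constant (back in fp32) before the optimizer step.

```python
# Hypothetical sketch of static loss scaling; loss_fn and LOSS_SCALE are placeholders.
import jax
import jax.numpy as jnp

LOSS_SCALE = 2.0 ** 12  # fixed (static) scale, chosen ahead of time


def loss_fn(params, x):
    # Toy quadratic loss standing in for the real language-model loss.
    return jnp.mean((x @ params) ** 2)


@jax.jit
def scaled_grads(params, x):
    # Compute grads of the *scaled* loss in fp16 so small gradients don't
    # underflow, then unscale back to the true gradient magnitudes.
    params16 = jax.tree_util.tree_map(lambda p: p.astype(jnp.float16), params)
    x16 = x.astype(jnp.float16)
    grads = jax.grad(lambda p, b: loss_fn(p, b) * LOSS_SCALE)(params16, x16)
    # Unscale in fp32 before feeding the optimizer (master weights stay fp32).
    return jax.tree_util.tree_map(lambda g: g.astype(jnp.float32) / LOSS_SCALE, grads)
```

The fp32 master copy of the parameters would then be updated with these unscaled gradients as usual.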
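
For the integer-quantization item, a simple per-tensor uint8 scheme (one possible choice, not necessarily what this repo will use) for activations sent between pipeline stages might look like the following; the same encode/decode pair would apply to gradients flowing backward.

```python
# Hypothetical per-tensor uint8 quantization for inter-layer traffic.
import jax.numpy as jnp
import numpy as np


def quantize(x: jnp.ndarray):
    # Map the tensor's value range onto uint8 [0, 255]; ship ints plus offset/scale.
    lo, hi = jnp.min(x), jnp.max(x)
    scale = (hi - lo) / 255.0 + 1e-8
    q = jnp.clip(jnp.round((x - lo) / scale), 0, 255).astype(jnp.uint8)
    return np.asarray(q), float(lo), float(scale)


def dequantize(q: np.ndarray, lo: float, scale: float) -> jnp.ndarray:
    # Reconstruct a floating-point approximation on the receiving node.
    return jnp.asarray(q, dtype=jnp.float32) * scale + lo
```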