Awesome-DL-Scheduling-Papers
A curated list of DL cluster scheduling papers.
Please feel free to open a pull request or an issue to add papers.
Schedulers for DL training
Scheduler | Year | Series | Paper | Objective | Heter. | Elastic | AutoML | Code |
---|---|---|---|---|---|---|---|---|
Synergy | 2022 | OSDI | Paper | - | - | - | - | |
Singularity | 2022 | arxiv | Paper | - | - | - | ||
GADGET | 2022 | INFOCOM | Paper | - | - | Code | ||
EDL | 2022 | TPDS | Paper | - | - | - | ||
Aryl | 2022 | arxiv | Paper | - | - | |||
AOnline | 2022 | TCC | Paper | - | - | - | ||
Ali-MLaaS | 2022 | NSDI | Paper | - | - | - | Code | |
SMD | 2021 | INFOCOM | Paper | - | - | - | - | |
SEER | 2021 | SoCC | Paper | ▲ | - | - | ||
RubberBand | 2021 | EuroSys | Paper | - | - | |||
POP | 2021 | SOSP | Paper | - | - | Code | ||
Pollux | 2021 | OSDI | Paper | - | Code | |||
ONES | 2021 | SC | Paper | - | - | Code | ||
Liquid | 2021 | TPDS | Paper | - | - | - | Code | |
Jigsaw | 2021 | DistributedML | Paper | - | - | - | - | |
Horus | 2021 | TPDS | Paper | - | - | - | - | |
Hermes | 2021 | Electronics | Paper | - | - | - | ||
Helios | 2021 | SC | Paper | - | - | - | Code | |
DynamoML | 2021 | CLOSER | Paper | - | - | - | ||
Chronus | 2021 | SoCC | Paper | ✿ | - | - | - | Code |
Astraea | 2021 | TPDS | Paper | - | - | - | Code | |
ANDREAS | 2021 | FCloud | Paper | - | - | - | - | |
AFS | 2021 | NSDI | Paper | - | - | - | ||
2021 | TPDS | Paper | - | - | Code | |||
Yeung | 2020 | HotCloud | Paper | - | - | - | - | |
Vaibhav et al. | 2020 | MASCOTS | Paper | - | - | - | ||
Themis | 2020 | NSDI | Paper | - | - | - | - | |
SPIN | 2020 | INFOCOM | Paper | - | - | - | - | |
Salus | 2020 | MLSys | Paper | - | - | - | Code | |
Parrot | 2020 | TCC | Paper | - | - | - | - | |
Non-Intrusive | 2020 | SC | Paper | - | - | - | ||
MLFS | 2020 | CoNext | Paper | - | - | - | Code | |
MLCloudPrice | 2020 | DISPA | Paper | - | - | - | Code | |
MARBLE | 2020 | CCGRID | Paper | - | - | - | ||
HiveD | 2020 | OSDI | Paper | - | - | - | Code | |
GENIE | 2020 | TPDS | Paper | ✿ | - | - | - | |
Gavel | 2020 | OSDI | Paper | - | - | Code | ||
E-LAS | 2020 | ICPP | Paper | - | - | - | - | |
Elan | 2020 | ICDCS | Paper | - | - | - | ||
Co-scheML | 2020 | ACSOS | Paper | - | - | - | - | |
CODA | 2020 | ICDCS | Paper | - | - | - | ||
Antman | 2020 | OSDI | Paper | - | - | Code | ||
Ada-SRSF | 2020 | arxiv | Paper | - | - | - | ||
2020 | EuroSys | Paper | - | - | - | |||
Tiresias | 2019 | NSDI | Paper | - | - | - | Code | |
Philly | 2019 | ATC | Paper | - | - | - | Code | |
JPAS | 2019 | JNCA | Paper | - | - | - | ||
Jahani | 2019 | ICCCS | Paper | - | - | |||
HyperSched | 2019 | SoCC | Paper | ✿ ▲ | - | - | ||
Harmony | 2019 | INFOCOM | Paper | - | - | - | - | |
FfDL | 2019 | Middleware | Paper | - | - | - | Code | |
Dragon | 2019 | CLOSER | Paper | - | - | - | ||
Cynthia | 2019 | ICPP | Paper | - | - | - | ||
2019 | GLOBECOM | Paper | - | - | - | - | ||
2019 | CC | Paper | - | - | ||||
Optimus | 2018 | EuroSys | Paper | - | - | Code | ||
OASiS | 2018 | INFOCOM | Paper | - | - | - | ||
Gandiva | 2018 | OSDI | Paper | - | - | |||
Topology-Aware | 2017 | SC | Paper | - | - | - | Code | |
HyperDrive | 2017 | Middleware | Paper | - | - | - | ||
Dorm | 2017 | SMARTCOMP | Paper | - | - | - | - |
JCT:
Schedulers for DL Inference
Scheduler | Year | Series | Paper | Objective | Batch | Share | Cloud | Source Code |
---|---|---|---|---|---|---|---|---|
Cocktail | 2022 | NSDI | Paper | - | - | - | ||
INFaaS | 2021 | ATC | Paper | - | Code | |||
Mendoza et al. | 2021 | EuroMLSys | Paper | - | - | - | ||
Morphling | 2021 | SoCC | Paper | Code | ||||
Abacus | 2021 | SC | Paper | - | - | Code | ||
MIG-SERVING | 2021 | CoRR | Paper | - | - | |||
GSLICE | 2020 | SoCC | Paper | - | - | |||
Clockwork | 2020 | OSDI | Paper | - | - | Code | ||
CMS | 2020 | Future Internet | Paper | - | - | - | - | |
Irina | 2020 | APNet | Paper | - | - | |||
PERSEUS | 2020 | IC2E | Paper | - | Code | |||
AutoDeep | 2020 | INFOCOM | Paper | - | - | |||
DyBatch | 2020 | CCGrid | Paper | - | - | |||
Inferline | 2020 | SoCC | Paper | - | Code | |||
MArk | 2019 | ATC | Paper | - | Code | |||
Tolerance Tiers | 2019 | ISPASS | Paper | - | - | - | ||
ParM | 2019 | SOSP | Paper | - | - | Code | ||
Gilman et al. | 2019 | DIDL | Paper | - | - | - | ||
Nanily | 2019 | HPCC | Paper | - | - | - | ||
RRL | 2019 | SC | Paper | - | Code | |||
Kube-Knots | 2019 | CLUSTER | Paper | - | - | |||
TrIMS | 2019 | CLOUD | Paper | Code | ||||
Ebird | 2019 | ICCD | Paper | - | Code | |||
Rafiki | 2018 | VLDB | Paper | - | - | Code | ||
Space-Time | 2018 | NIPS | Paper | - | - | |||
Ease.ml | 2018 | VLDB | Paper | - | - | - | Code | |
HiveMind | 2018 | NIPS | Paper | - | - | |||
Clipper | 2017 | NSDI | Paper | - | - | Code |
Accuracy:
Glossary of Terms
Terminology | Definition |
---|---|
JCT | Job Completion Time (Job Finish Time - Job Submission Time) |
Fairness | A metric that assesses whether resources are shared fairly among users or jobs |
QoS | Quality of Service |
DDL | Deadline, the point in time by which a DL job must be completed |
SLO | Service Level Objective |
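
As a rough illustration of the JCT and DDL definitions above, the sketch below computes per-job JCT, the average JCT of a small trace, and the fraction of deadline-bound jobs that miss their deadline. The `Job` record, its field names, and the sample numbers are hypothetical and are not taken from any paper in this list; schedulers listed here may define and weight these metrics differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical job record; times are in seconds since the trace start."""
    name: str
    submit_time: float
    finish_time: float
    deadline: Optional[float] = None  # DDL; None means the job has no deadline


def jct(job: Job) -> float:
    """JCT = Job Finish Time - Job Submission Time."""
    return job.finish_time - job.submit_time


def average_jct(jobs: List[Job]) -> float:
    """Mean JCT over all jobs in the trace."""
    return sum(jct(j) for j in jobs) / len(jobs)


def deadline_miss_rate(jobs: List[Job]) -> float:
    """Fraction of deadline-bound jobs that finish after their DDL."""
    with_ddl = [j for j in jobs if j.deadline is not None]
    if not with_ddl:
        return 0.0
    missed = sum(1 for j in with_ddl if j.finish_time > j.deadline)
    return missed / len(with_ddl)


# Made-up sample trace for illustration only.
jobs = [
    Job("a", submit_time=0.0, finish_time=100.0, deadline=120.0),
    Job("b", submit_time=10.0, finish_time=70.0, deadline=60.0),
    Job("c", submit_time=20.0, finish_time=50.0),
]

print(jct(jobs[0]))
print(average_jct(jobs))
print(deadline_miss_rate(jobs))
```

Note that JCT measured this way includes queueing delay as well as run time, since it starts at submission rather than at the moment the job begins executing.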