Repository Details

Code for "Temporal Difference Learning for Model Predictive Control"

Temporal Difference Learning for Model Predictive Control

Original PyTorch implementation of TD-MPC from

Temporal Difference Learning for Model Predictive Control by

Nicklas Hansen, Xiaolong Wang*, Hao Su*



[Paper] [Website]

Method

TD-MPC is a framework for model predictive control (MPC) using a Task-Oriented Latent Dynamics (TOLD) model and a terminal value function learned jointly by temporal difference (TD) learning. TD-MPC plans actions entirely in latent space using the TOLD model, which learns compact task-centric representations from either state or image inputs. TD-MPC solves challenging Humanoid and Dog locomotion tasks in 1M environment steps.
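The planning idea described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names (`plan`, `toy_dynamics`, `toy_reward`, `toy_value`), the scalar latent state, and the use of plain random shooting as the trajectory optimizer are all assumptions made for illustration. What it shows is the core mechanism: candidate action sequences are scored by rolling a model forward in latent space, and the tail of the return is bootstrapped with a terminal value estimate.

```python
import random

def plan(z0, dynamics, reward, value, horizon=5, num_samples=256,
         gamma=0.99, seed=0):
    """Pick the first action of the best sampled action sequence.

    Hypothetical sketch: random shooting stands in for the planner,
    and z0 is a scalar latent state rather than a learned embedding.
    """
    rng = random.Random(seed)
    best_return, best_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        z, total, discount = z0, 0.0, 1.0
        for a in actions:
            total += discount * reward(z, a)  # predicted reward at this step
            z = dynamics(z, a)                # roll the latent state forward
            discount *= gamma
        total += discount * value(z)          # terminal value bootstrap
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action

# Toy stand-ins for the learned components (assumptions, not TOLD itself).
def toy_dynamics(z, a):
    return z + 0.1 * a

def toy_reward(z, a):
    return -(z ** 2)

def toy_value(z):
    return -10.0 * z ** 2

action = plan(1.0, toy_dynamics, toy_reward, toy_value)
```

In an MPC loop, only the returned first action would be executed before planning again from the next latent state. The terminal value term is what lets a short horizon account for rewards beyond the rollout.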

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@article{Hansen2022tdmpc,
	title={Temporal Difference Learning for Model Predictive Control},
	author={Nicklas Hansen and Xiaolong Wang and Hao Su},
	eprint={2203.04955},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	year={2022}
}

Instructions

Assuming that you already have MuJoCo installed, install dependencies using conda:

conda env create -f environment.yaml
conda activate tdmpc

After installing dependencies, you can train an agent by calling

python src/train.py task=dog-run

Evaluation videos and model weights can be saved with arguments save_video=True and save_model=True. Refer to the cfgs directory for a full list of options and default hyperparameters, and see tasks.txt for a list of supported tasks. We also provide results for all 23 state-based DMControl tasks in the results directory.
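When launching several runs, it can be convenient to assemble the command programmatically. The sketch below is one way to do that; `build_train_command` is a hypothetical helper, and only the script path, the `task` argument, `save_video`, and `save_model` are taken from the instructions above.

```python
import shlex

def build_train_command(task, **overrides):
    """Assemble a train.py invocation with key=value overrides.

    Hypothetical helper; it does not check that the override names
    exist in the cfgs directory.
    """
    cmd = ["python", "src/train.py", f"task={task}"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd

cmd = build_train_command("dog-run", save_video=True, save_model=True)
print(shlex.join(cmd))
# prints: python src/train.py task=dog-run save_video=True save_model=True
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root with the `tdmpc` environment active.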

The training script supports both local logging and cloud-based logging with Weights & Biases (W&B). To use W&B, provide a key by setting the environment variable WANDB_API_KEY=<YOUR_KEY> and add your W&B project and entity details to cfgs/default.yaml.
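Before starting a long run, a quick check that the key is visible to the process can save a failed launch. The following is a small sketch; `wandb_key_is_set` is a hypothetical helper, and how the training script itself behaves when the key is missing is not covered here.

```python
import os

def wandb_key_is_set(env=None):
    """Return True if WANDB_API_KEY is present and non-empty.

    Hypothetical helper; `env` defaults to the current process
    environment and can be replaced with any mapping for testing.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())

if not wandb_key_is_set():
    print("WANDB_API_KEY is not set; W&B logging may be unavailable.")
```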

Changelog

  • [08-29-2022] Added safeguard against NaNs in rare cases. Fixed an issue that caused multi-dimensional observation spaces to be inferred incorrectly.
  • [03-27-2022] Reduced memory usage in pixel experiments by 6x. Code improvements. Refactoring. Updated default pixel hyperparameters.
  • [03-10-2022] Initial code release.

License & Acknowledgements

TD-MPC is licensed under the MIT license. MuJoCo and DeepMind Control Suite are licensed under the Apache 2.0 license. We thank the DrQv2 authors for their implementation of DMControl wrappers.
