Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Self-Supervised Policy Adaptation during Deployment

PyTorch implementation of PAD and evaluation benchmarks from

Self-Supervised Policy Adaptation during Deployment

Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

[Paper] [Website]

Citation

If you find our work useful in your research, please consider citing the paper as follows:

@article{hansen2020deployment,
  title={Self-Supervised Policy Adaptation during Deployment},
  author={Nicklas Hansen and Rishabh Jangir and Yu Sun and Guillem Alenyà and Pieter Abbeel and Alexei A. Efros and Lerrel Pinto and Xiaolong Wang},
  year={2020},
  eprint={2007.04309},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

Setup

We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:

conda env create -f setup/conda.yml
conda activate pad
sh setup/install_envs.sh

Training & Evaluation

We provide training and evaluation scripts that can be run with sh scripts/train.sh and sh scripts/eval.sh. Alternatively, you can call the Python scripts directly; for example, to train, call

CUDA_VISIBLE_DEVICES=0 python3 src/train.py \
    --domain_name cartpole \
    --task_name swingup \
    --action_repeat 8 \
    --mode train \
    --use_inv \
    --num_shared_layers 8 \
    --seed 0 \
    --work_dir logs/cartpole_swingup/inv/0 \
    --save_model

which should give you an output of the form

| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 | 
  ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000
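If you want to post-process such log records, for example to plot a training curve, the pipe-separated format can be split into key/value pairs. The following is only a minimal sketch, assuming the records look exactly like the sample above; the function name parse_train_log is hypothetical and not part of this repository, and the field names are copied verbatim from the sample without interpreting them.

```python
import re


def parse_train_log(text):
    """Parse a '| train | E: 1 | S: 1000 | ...' record into a dict."""
    # Collapse whitespace so a record wrapped over two lines becomes one line.
    flat = " ".join(text.split())
    fields = [f.strip() for f in flat.split("|") if f.strip()]
    record = {"tag": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        # Keep the leading number and drop a trailing unit such as "s".
        match = re.match(r"-?\d+(?:\.\d+)?", value.strip())
        record[key.strip()] = float(match.group()) if match else value.strip()
    return record


sample = """| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 |
  ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000"""
record = parse_train_log(sample)
print(record["tag"], record["S"], record["D"])
```

All numeric fields are returned as floats, so integer-like fields such as S may need a cast before use.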

We provide a pre-trained model that can be used for evaluation. To run Policy Adaptation during Deployment, call

CUDA_VISIBLE_DEVICES=0 python3 src/eval.py \
    --domain_name cartpole \
    --task_name swingup \
    --action_repeat 8 \
    --mode color_hard \
    --use_inv \
    --num_shared_layers 8 \
    --seed 0 \
    --work_dir logs/cartpole_swingup/inv/0 \
    --pad_checkpoint 500k

which should give you an output of the form

Evaluating logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
eval reward: 666

Policy Adaptation during Deployment of logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
pad reward: 722
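To compare the two numbers programmatically, the evaluation output can be parsed and the gain from adaptation computed. This is a hedged sketch that assumes the output has exactly the "eval reward: N" and "pad reward: N" lines shown above; parse_rewards is a hypothetical helper, not a function from this repository.

```python
import re


def parse_rewards(text):
    """Return {'eval': ..., 'pad': ...} from 'eval reward: N' and 'pad reward: N' lines."""
    pattern = r"^(eval|pad) reward:[ \t]*(-?\d+(?:\.\d+)?)[ \t]*$"
    return {name: float(value) for name, value in re.findall(pattern, text, flags=re.MULTILINE)}


sample = """Evaluating logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
eval reward: 666

Policy Adaptation during Deployment of logs/cartpole_swingup/inv/0 for 100 episodes (mode: color_hard)
pad reward: 722
"""
rewards = parse_rewards(sample)
gain = rewards["pad"] - rewards["eval"]
relative_gain = gain / rewards["eval"]
print(f"absolute gain: {gain:.0f}, relative gain: {relative_gain:.1%}")
# prints "absolute gain: 56, relative gain: 8.4%"
```

For the sample output above, adaptation raises the reward from 666 to 722, a relative gain of about 8.4%.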

Here are a few samples from the training and test environments of our benchmark:

[Figure: samples]

Please refer to the project page and paper for results and experimental details.
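To repeat the training command shown earlier over several seeds, the argument list can be generated rather than typed by hand. The sketch below is an illustration only: it reuses the flags from the training command above, the seeds 1 and 2 are assumptions, the work directory pattern is inferred from the single example path, and build_train_command is a hypothetical helper. It prints the commands instead of running them.

```python
import shlex


def build_train_command(seed, domain="cartpole", task="swingup", action_repeat=8):
    """Build the argv list for one training run, mirroring the flags shown above."""
    # Assumed naming pattern, inferred from logs/cartpole_swingup/inv/0.
    work_dir = f"logs/{domain}_{task}/inv/{seed}"
    return [
        "python3", "src/train.py",
        "--domain_name", domain,
        "--task_name", task,
        "--action_repeat", str(action_repeat),
        "--mode", "train",
        "--use_inv",
        "--num_shared_layers", "8",
        "--seed", str(seed),
        "--work_dir", work_dir,
        "--save_model",
    ]


# Seeds other than 0 are illustrative assumptions.
commands = [build_train_command(seed) for seed in (0, 1, 2)]
for argv in commands:
    # Print instead of executing; pass argv to subprocess.run to launch a run.
    print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(argv))
```

Keeping each run in its own work directory avoids one seed overwriting the logs and saved model of another.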

Acknowledgements

We want to thank the numerous researchers and engineers involved in the work on which this implementation is based. Our SAC implementation is based on this repository, the original DeepMind Control suite is available here, and the gym wrapper for it is available here. Go check them out!
