  • Stars: 116
  • Rank: 302,098 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created about 3 years ago
  • Updated almost 2 years ago

Repository Details

Conservative Q Learning on top of SAC

CQL

A simple and modular implementation of the Conservative Q Learning and Soft Actor Critic algorithms in PyTorch.

If you like Jax, check out my reimplementation of this codebase in Jax, which runs 4 times faster.

Installation

  1. Install and use the included Anaconda environment
$ conda env create -f environment.yml
$ source activate SimpleSAC

You'll need to get your own MuJoCo key if you want to use MuJoCo.

  2. Add this repo directory to your PYTHONPATH environment variable.
export PYTHONPATH="$PYTHONPATH:$(pwd)"
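To confirm that the PYTHONPATH step took effect, a quick check along the following lines may help. This is only a sketch: the helper name is_importable is made up for illustration, and it assumes the package directory is called SimpleSAC, as the run commands in this README suggest.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if Python can locate a top-level package on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "SimpleSAC" is the package name used by the run commands in this README.
    if is_importable("SimpleSAC"):
        print("SimpleSAC found on the import path.")
    else:
        print("SimpleSAC not found; check that PYTHONPATH includes the repo directory.")
```

Run it from the same shell in which you exported PYTHONPATH, since the variable only applies to that session.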

Run Experiments

You can run SAC experiments using the following command:

python -m SimpleSAC.sac_main \
    --env 'HalfCheetah-v2' \
    --logging.output_dir './experiment_output'

All available command options can be seen in SimpleSAC/sac_main.py and SimpleSAC/sac.py.

You can run CQL experiments using the following command:

python -m SimpleSAC.conservative_sac_main \
    --env 'halfcheetah-medium-v0' \
    --logging.output_dir './experiment_output'

To run on CPU only, add the --device='cpu' option. All available command options can be seen in SimpleSAC/conservative_sac_main.py and SimpleSAC/conservative_sac.py.
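If you launch runs from a script rather than by hand, the two commands above can be assembled programmatically. The sketch below uses only the flags shown in this README (--env, --logging.output_dir, --device); the helper name build_command is hypothetical and not part of the codebase.

```python
import sys
from typing import List, Optional


def build_command(module: str, env: str, output_dir: str,
                  device: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one experiment run."""
    cmd = [sys.executable, "-m", module,
           "--env", env,
           "--logging.output_dir", output_dir]
    if device is not None:
        # Mirrors the --device='cpu' form shown in this README.
        cmd.append(f"--device={device}")
    return cmd


if __name__ == "__main__":
    sac = build_command("SimpleSAC.sac_main", "HalfCheetah-v2", "./experiment_output")
    cql = build_command("SimpleSAC.conservative_sac_main", "halfcheetah-medium-v0",
                        "./experiment_output", device="cpu")
    # Print instead of executing, so nothing is launched by accident.
    print(" ".join(sac))
    print(" ".join(cql))
```

To actually start a run, the returned list can be passed to subprocess.run(cmd, check=True).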

Visualize Experiments

You can visualize the experiment metrics with viskit:

python -m viskit './experiment_output'

and simply navigate to http://localhost:5000/

Weights and Biases Online Visualization Integration

This codebase can also log to the W&B online visualization platform. To log to W&B, you first need to set your W&B API key environment variable:

export WANDB_API_KEY='YOUR W&B API KEY HERE'

Then you can run experiments with W&B logging turned on:

python -m SimpleSAC.conservative_sac_main \
    --env 'halfcheetah-medium-v0' \
    --logging.output_dir './experiment_output' \
    --device='cuda' \
    --logging.online
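Before starting a long run with --logging.online, it can be worth checking that the API key is actually visible to the process. A minimal sketch, assuming only the WANDB_API_KEY variable named above; the function name wandb_key_is_set is illustrative and not part of this codebase or of W&B:

```python
import os
from typing import Mapping, Optional


def wandb_key_is_set(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if WANDB_API_KEY is present and not blank."""
    env = os.environ if environ is None else environ
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_is_set():
        print("WANDB_API_KEY is set; online logging should be able to authenticate.")
    else:
        print("WANDB_API_KEY is missing; export it before using --logging.online.")
```

The check only confirms that the variable is non-empty. It does not verify that the key is valid.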

Results of Running CQL on D4RL Environments

To save you time and compute resources, I've done a sweep of CQL on certain D4RL environments with various min Q weight values. The results can be seen here. You can choose the environment to visualize by filtering on env. The results for each cql.cql_min_q_weight on each env are repeated and averaged across 3 random seeds.
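The averaging described above groups runs by environment and cql.cql_min_q_weight, then takes the mean over seeds. The sketch below illustrates that grouping only. The record layout, the function name average_over_seeds, the weight value 5.0, and the metric values are all placeholders chosen for illustration; they are not taken from the sweep.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# One record per run: (env, cql_min_q_weight, seed, metric value).
Run = Tuple[str, float, int, float]


def average_over_seeds(runs: Iterable[Run]) -> Dict[Tuple[str, float], float]:
    """Average the metric over seeds for each (env, cql_min_q_weight) pair."""
    grouped = defaultdict(list)
    for env, min_q_weight, _seed, metric in runs:
        grouped[(env, min_q_weight)].append(metric)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Placeholder numbers only, NOT real experiment results.
    toy_runs = [
        ("halfcheetah-medium-v0", 5.0, 0, 1.0),
        ("halfcheetah-medium-v0", 5.0, 1, 2.0),
        ("halfcheetah-medium-v0", 5.0, 2, 3.0),
    ]
    print(average_over_seeds(toy_runs))
```

With three seeds per setting, each averaged entry summarizes three runs, which smooths out seed-to-seed variation when comparing min Q weight values.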

Credits

The project organization is inspired by TD3. The SAC implementation is based on rlkit. The CQL implementation is based on CQL. The viskit visualization is taken from viskit, which is taken from rllab.
