CORL (Clean Offline Reinforcement Learning)
🧵 CORL is an Offline Reinforcement Learning library that provides high-quality, easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase that lets you run or tune thousands of experiments. It is heavily inspired by cleanrl for online RL, so check that out too!
- 📜 Single-file implementation
- 📈 Benchmarked Implementation for N algorithms
- 🖼 Weights and Biases integration
- ⭐ If you're interested in discrete control, make sure to check out our new library, Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.
Getting started
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt
# alternatively, you could use docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
Algorithms Implemented
D4RL Benchmarks
You can check the links above for learning curves and details. Here, we report reproduced final and best scores. Note that the two differ by a significant margin, and papers do not always state explicitly which reporting methodology they use. If you want to re-collect our results in a more structured/nuanced manner, see results.
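To illustrate why final and best scores can differ, here is a minimal sketch of one possible aggregation. It is an assumption for illustration, not necessarily the procedure used to produce the tables: evaluation curves are averaged over seeds, the "last" score is the final point of the mean curve, and the "best" score is its maximum. The function name `last_and_best` and the toy numbers are hypothetical.

```python
import numpy as np


def last_and_best(curves):
    """Return ((last_mean, last_std), (best_mean, best_std)).

    `curves` has shape (n_seeds, n_evaluations). This aggregation
    (average over seeds first, then pick a checkpoint) is an assumed
    convention, not a documented one.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)  # mean over seeds at each evaluation
    std_curve = curves.std(axis=0)    # spread over seeds at each evaluation
    best_idx = int(mean_curve.argmax())
    last = (float(mean_curve[-1]), float(std_curve[-1]))
    best = (float(mean_curve[best_idx]), float(std_curve[best_idx]))
    return last, best


# Toy data: two seeds, three evaluations each (made-up numbers).
toy_curves = [[10.0, 50.0, 40.0], [20.0, 60.0, 30.0]]
last, best = last_and_best(toy_curves)
print(f"last: {last[0]:.2f} ± {last[1]:.2f}")
print(f"best: {best[0]:.2f} ± {best[1]:.2f}")
```

In this toy case the policy peaks in the middle of training and then degrades, so the best score is noticeably higher than the last one. The same effect shows up in the tables below.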
Offline
Last Scores
Gym-MuJoCo
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
locomotion average | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
Maze2d
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± 6.11 | 42.11 ± 0.58 | 106.87 ± 22.16 | 130.59 ± 16.52 | 95.26 ± 6.39 | 18.08 ± 25.42 |
maze2d-medium-v1 | 0.79 ± 3.25 | 14.25 ± 2.33 | 59.45 ± 36.25 | 52.88 ± 55.12 | 86.11 ± 9.68 | 34.85 ± 2.72 | 105.11 ± 31.67 | 88.61 ± 18.72 | 57.04 ± 3.45 | 31.71 ± 26.33 |
maze2d-large-v1 | 2.26 ± 4.39 | 11.32 ± 5.10 | 97.10 ± 25.41 | 209.13 ± 8.19 | 23.75 ± 36.70 | 61.72 ± 3.50 | 78.33 ± 61.77 | 204.76 ± 1.19 | 95.60 ± 22.92 | 35.66 ± 28.20 |
maze2d average | 1.13 | 12.58 | 61.99 | 114.89 | 33.65 | 46.23 | 96.77 | 141.32 | 82.64 | 28.48 |
Antmaze
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
antmaze-umaze-v2 | 55.25 ± 4.15 | 65.75 ± 5.26 | 70.75 ± 39.18 | 57.75 ± 10.28 | 92.75 ± 1.92 | 77.00 ± 5.52 | 97.75 ± 1.48 | 0.00 ± 0.00 | 0.00 ± 0.00 | 57.00 ± 9.82 |
antmaze-umaze-diverse-v2 | 47.25 ± 4.09 | 44.00 ± 1.00 | 44.75 ± 11.61 | 58.00 ± 7.68 | 37.25 ± 3.70 | 54.25 ± 5.54 | 83.50 ± 7.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 51.75 ± 0.43 |
antmaze-medium-play-v2 | 0.00 ± 0.00 | 2.00 ± 0.71 | 0.25 ± 0.43 | 0.00 ± 0.00 | 65.75 ± 11.61 | 65.75 ± 11.71 | 89.50 ± 3.35 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze-medium-diverse-v2 | 0.75 ± 0.83 | 5.75 ± 9.39 | 0.25 ± 0.43 | 0.00 ± 0.00 | 67.25 ± 3.56 | 73.75 ± 5.45 | 83.50 ± 8.20 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze-large-play-v2 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.75 ± 7.26 | 42.00 ± 4.53 | 52.25 ± 29.01 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze-large-diverse-v2 | 0.00 ± 0.00 | 0.75 ± 0.83 | 0.00 ± 0.00 | 0.00 ± 0.00 | 20.50 ± 13.24 | 30.25 ± 3.63 | 64.00 ± 5.43 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze average | 17.21 | 19.71 | 19.33 | 19.29 | 50.71 | 57.17 | 78.42 | 0.00 | 0.00 | 18.12 |
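The "average" rows appear to be unweighted means over the tasks in each domain. The sketch below checks that reading against the Antmaze last scores for a few algorithms. The values are copied from the table above, and the helper name `domain_average` is hypothetical.

```python
# Per-task mean scores copied from the Antmaze "Last Scores" table,
# in row order (umaze, umaze-diverse, medium-play, medium-diverse,
# large-play, large-diverse).
antmaze_last = {
    "BC": [55.25, 47.25, 0.00, 0.75, 0.00, 0.00],
    "CQL": [92.75, 37.25, 65.75, 67.25, 20.75, 20.50],
    "IQL": [77.00, 54.25, 65.75, 73.75, 42.00, 30.25],
    "ReBRAC": [97.75, 83.50, 89.50, 83.50, 52.25, 64.00],
}


def domain_average(task_scores):
    """Unweighted mean over tasks, rounded to two decimals."""
    return round(sum(task_scores) / len(task_scores), 2)


averages = {algo: domain_average(s) for algo, s in antmaze_last.items()}
print(averages)
```

The results (17.21, 50.71, 57.17, 78.42) match the "antmaze average" row. For other domains the recomputed value can differ in the last digit, presumably because the reported averages were taken over unrounded per-task scores.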
Adroit
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
pen-human-v1 | 71.03 ± 6.26 | 26.99 ± 9.60 | -3.88 ± 0.21 | 81.12 ± 13.47 | 13.71 ± 16.98 | 78.49 ± 8.21 | 103.16 ± 8.49 | 6.86 ± 5.93 | 5.07 ± 6.16 | 67.68 ± 5.48 |
pen-cloned-v1 | 51.92 ± 15.15 | 46.67 ± 14.25 | 5.13 ± 5.28 | 89.56 ± 15.57 | 1.04 ± 6.62 | 83.42 ± 8.19 | 102.79 ± 7.84 | 31.35 ± 2.14 | 12.02 ± 1.75 | 64.43 ± 1.43 |
pen-expert-v1 | 109.65 ± 7.28 | 114.96 ± 2.96 | 122.53 ± 21.27 | 160.37 ± 1.21 | -1.41 ± 2.34 | 128.05 ± 9.21 | 152.16 ± 6.33 | 87.11 ± 48.95 | -1.55 ± 0.81 | 116.38 ± 1.27 |
door-human-v1 | 2.34 ± 4.00 | -0.13 ± 0.07 | -0.33 ± 0.01 | 4.60 ± 1.90 | 5.53 ± 1.31 | 3.26 ± 1.83 | -0.10 ± 0.01 | -0.38 ± 0.00 | -0.12 ± 0.13 | 4.44 ± 0.87 |
door-cloned-v1 | -0.09 ± 0.03 | 0.29 ± 0.59 | -0.34 ± 0.01 | 0.93 ± 1.66 | -0.33 ± 0.01 | 3.07 ± 1.75 | 0.06 ± 0.05 | -0.33 ± 0.00 | 2.66 ± 2.31 | 7.64 ± 3.26 |
door-expert-v1 | 105.35 ± 0.09 | 104.04 ± 1.46 | -0.33 ± 0.01 | 104.85 ± 0.24 | -0.32 ± 0.02 | 106.65 ± 0.25 | 106.37 ± 0.29 | -0.33 ± 0.00 | 106.29 ± 1.73 | 104.87 ± 0.39 |
hammer-human-v1 | 3.03 ± 3.39 | -0.19 ± 0.02 | 1.02 ± 0.24 | 3.37 ± 1.93 | 0.14 ± 0.11 | 1.79 ± 0.80 | 0.24 ± 0.24 | 0.24 ± 0.00 | 0.28 ± 0.18 | 1.28 ± 0.15 |
hammer-cloned-v1 | 0.55 ± 0.16 | 0.12 ± 0.08 | 0.25 ± 0.01 | 0.21 ± 0.24 | 0.30 ± 0.01 | 1.50 ± 0.69 | 5.00 ± 3.75 | 0.14 ± 0.09 | 0.19 ± 0.07 | 1.82 ± 0.55 |
hammer-expert-v1 | 126.78 ± 0.64 | 121.75 ± 7.67 | 3.11 ± 0.03 | 127.06 ± 0.29 | 0.26 ± 0.01 | 128.68 ± 0.33 | 133.62 ± 0.27 | 25.13 ± 43.25 | 28.52 ± 49.00 | 117.45 ± 6.65 |
relocate-human-v1 | 0.04 ± 0.03 | -0.14 ± 0.08 | -0.29 ± 0.01 | 0.05 ± 0.03 | 0.06 ± 0.03 | 0.12 ± 0.04 | 0.16 ± 0.30 | -0.31 ± 0.01 | -0.17 ± 0.17 | 0.05 ± 0.01 |
relocate-cloned-v1 | -0.06 ± 0.01 | -0.00 ± 0.02 | -0.30 ± 0.01 | -0.04 ± 0.04 | -0.29 ± 0.01 | 0.04 ± 0.01 | 1.66 ± 2.59 | -0.01 ± 0.10 | 0.17 ± 0.35 | 0.16 ± 0.09 |
relocate-expert-v1 | 107.58 ± 1.20 | 97.90 ± 5.21 | -1.73 ± 0.96 | 108.87 ± 0.85 | -0.30 ± 0.02 | 106.11 ± 4.02 | 107.52 ± 2.28 | -0.36 ± 0.00 | 71.94 ± 18.37 | 104.28 ± 0.42 |
adroit average | 48.18 | 42.69 | 10.40 | 56.75 | 1.53 | 53.43 | 59.39 | 12.43 | 18.78 | 49.21 |
Best Scores
Gym-MuJoCo
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
halfcheetah-medium-v2 | 43.60 ± 0.14 | 43.90 ± 0.13 | 48.93 ± 0.11 | 50.06 ± 0.50 | 47.62 ± 0.03 | 48.84 ± 0.07 | 65.62 ± 0.46 | 72.21 ± 0.31 | 69.72 ± 0.92 | 42.73 ± 0.10 |
halfcheetah-medium-replay-v2 | 40.52 ± 0.19 | 42.27 ± 0.46 | 45.84 ± 0.26 | 46.35 ± 0.29 | 46.43 ± 0.19 | 45.35 ± 0.08 | 52.22 ± 0.31 | 67.29 ± 0.34 | 66.55 ± 1.05 | 40.31 ± 0.28 |
halfcheetah-medium-expert-v2 | 79.69 ± 3.10 | 94.11 ± 0.22 | 96.59 ± 0.87 | 96.11 ± 0.37 | 97.04 ± 0.17 | 95.38 ± 0.17 | 108.89 ± 1.20 | 111.73 ± 0.47 | 110.62 ± 1.04 | 93.40 ± 0.21 |
hopper-medium-v2 | 69.04 ± 2.90 | 73.84 ± 0.37 | 70.44 ± 1.18 | 97.90 ± 0.56 | 70.80 ± 1.98 | 80.46 ± 3.09 | 103.19 ± 0.16 | 101.79 ± 0.20 | 103.26 ± 0.14 | 69.42 ± 3.64 |
hopper-medium-replay-v2 | 68.88 ± 10.33 | 90.57 ± 2.07 | 98.12 ± 1.16 | 100.91 ± 1.50 | 101.63 ± 0.55 | 102.69 ± 0.96 | 102.57 ± 0.45 | 103.83 ± 0.53 | 103.28 ± 0.49 | 88.74 ± 3.02 |
hopper-medium-expert-v2 | 90.63 ± 10.98 | 113.13 ± 0.16 | 113.22 ± 0.43 | 103.82 ± 12.81 | 112.84 ± 0.66 | 113.18 ± 0.38 | 113.16 ± 0.43 | 111.24 ± 0.15 | 111.80 ± 0.11 | 111.18 ± 0.21 |
walker2d-medium-v2 | 80.64 ± 0.91 | 82.05 ± 0.93 | 86.91 ± 0.28 | 83.37 ± 2.82 | 84.77 ± 0.20 | 87.58 ± 0.48 | 87.79 ± 0.19 | 90.17 ± 0.54 | 95.78 ± 1.07 | 74.70 ± 0.56 |
walker2d-medium-replay-v2 | 48.41 ± 7.61 | 76.09 ± 0.40 | 91.17 ± 0.72 | 86.51 ± 1.15 | 89.39 ± 0.88 | 89.94 ± 0.93 | 91.11 ± 0.63 | 85.18 ± 1.63 | 89.69 ± 1.39 | 68.22 ± 1.20 |
walker2d-medium-expert-v2 | 109.95 ± 0.62 | 109.90 ± 0.09 | 112.21 ± 0.06 | 108.28 ± 9.45 | 111.63 ± 0.38 | 113.06 ± 0.53 | 112.49 ± 0.18 | 116.93 ± 0.42 | 116.52 ± 0.75 | 108.71 ± 0.34 |
locomotion average | 70.15 | 80.65 | 84.83 | 85.92 | 84.68 | 86.28 | 93.00 | 95.60 | 96.36 | 77.49 |
Maze2d
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
maze2d-umaze-v1 | 16.09 ± 0.87 | 22.49 ± 1.52 | 99.33 ± 16.16 | 136.61 ± 11.65 | 92.05 ± 13.66 | 50.92 ± 4.23 | 162.28 ± 1.79 | 153.12 ± 6.49 | 149.88 ± 1.97 | 63.83 ± 17.35 |
maze2d-medium-v1 | 19.16 ± 1.24 | 27.64 ± 1.87 | 150.93 ± 3.89 | 131.50 ± 25.38 | 128.66 ± 5.44 | 122.69 ± 30.00 | 150.12 ± 4.48 | 93.80 ± 14.66 | 154.41 ± 1.58 | 68.14 ± 12.25 |
maze2d-large-v1 | 20.75 ± 6.66 | 41.83 ± 3.64 | 197.64 ± 5.26 | 227.93 ± 1.90 | 157.51 ± 7.32 | 162.25 ± 44.18 | 197.55 ± 5.82 | 207.51 ± 0.96 | 182.52 ± 2.68 | 50.25 ± 19.34 |
maze2d average | 18.67 | 30.65 | 149.30 | 165.35 | 126.07 | 111.95 | 169.98 | 151.48 | 162.27 | 60.74 |
Antmaze
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
antmaze-umaze-v2 | 68.50 ± 2.29 | 77.50 ± 1.50 | 98.50 ± 0.87 | 78.75 ± 6.76 | 94.75 ± 0.83 | 84.00 ± 4.06 | 100.00 ± 0.00 | 0.00 ± 0.00 | 42.50 ± 28.61 | 64.50 ± 2.06 |
antmaze-umaze-diverse-v2 | 64.75 ± 4.32 | 63.50 ± 2.18 | 71.25 ± 5.76 | 88.25 ± 2.17 | 53.75 ± 2.05 | 79.50 ± 3.35 | 96.75 ± 2.28 | 0.00 ± 0.00 | 0.00 ± 0.00 | 60.50 ± 2.29 |
antmaze-medium-play-v2 | 4.50 ± 1.12 | 6.25 ± 2.38 | 3.75 ± 1.30 | 27.50 ± 9.39 | 80.50 ± 3.35 | 78.50 ± 3.84 | 93.50 ± 2.60 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.75 ± 0.43 |
antmaze-medium-diverse-v2 | 4.75 ± 1.09 | 16.50 ± 5.59 | 5.50 ± 1.50 | 33.25 ± 16.81 | 71.00 ± 4.53 | 83.50 ± 1.80 | 91.75 ± 2.05 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.50 ± 0.50 |
antmaze-large-play-v2 | 0.50 ± 0.50 | 13.50 ± 9.76 | 1.25 ± 0.43 | 1.00 ± 0.71 | 34.75 ± 5.85 | 53.50 ± 2.50 | 68.75 ± 13.90 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze-large-diverse-v2 | 0.75 ± 0.43 | 6.25 ± 1.79 | 0.25 ± 0.43 | 0.50 ± 0.50 | 36.25 ± 3.34 | 53.00 ± 3.00 | 69.50 ± 7.26 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
antmaze average | 23.96 | 30.58 | 30.08 | 38.21 | 61.83 | 72.00 | 86.71 | 0.00 | 7.08 | 21.04 |
Adroit
Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
---|---|---|---|---|---|---|---|---|---|---|
pen-human-v1 | 99.69 ± 7.45 | 59.89 ± 8.03 | 9.95 ± 8.19 | 121.05 ± 5.47 | 58.91 ± 1.81 | 106.15 ± 10.28 | 127.28 ± 3.22 | 56.48 ± 7.17 | 35.84 ± 10.57 | 77.83 ± 2.30 |
pen-cloned-v1 | 99.14 ± 12.27 | 83.62 ± 11.75 | 52.66 ± 6.33 | 129.66 ± 1.27 | 14.74 ± 2.31 | 114.05 ± 4.78 | 128.64 ± 7.15 | 52.69 ± 5.30 | 26.90 ± 7.85 | 71.17 ± 2.70 |
pen-expert-v1 | 128.77 ± 5.88 | 134.36 ± 3.16 | 142.83 ± 7.72 | 162.69 ± 0.23 | 14.86 ± 4.07 | 140.01 ± 6.36 | 157.62 ± 0.26 | 116.43 ± 40.26 | 36.04 ± 4.60 | 119.49 ± 2.31 |
door-human-v1 | 9.41 ± 4.55 | 7.00 ± 6.77 | -0.11 ± 0.06 | 19.28 ± 1.46 | 13.28 ± 2.77 | 13.52 ± 1.22 | 0.27 ± 0.43 | -0.10 ± 0.06 | 2.51 ± 2.26 | 7.36 ± 1.24 |
door-cloned-v1 | 3.40 ± 0.95 | 10.37 ± 4.09 | -0.20 ± 0.11 | 12.61 ± 0.60 | -0.08 ± 0.13 | 9.02 ± 1.47 | 7.73 ± 6.80 | -0.21 ± 0.10 | 20.36 ± 1.11 | 11.18 ± 0.96 |
door-expert-v1 | 105.84 ± 0.23 | 105.92 ± 0.24 | 4.49 ± 7.39 | 106.77 ± 0.24 | 59.47 ± 25.04 | 107.29 ± 0.37 | 106.78 ± 0.04 | 0.05 ± 0.02 | 109.22 ± 0.24 | 105.49 ± 0.09 |
hammer-human-v1 | 12.61 ± 4.87 | 6.23 ± 4.79 | 2.38 ± 0.14 | 22.03 ± 8.13 | 0.30 ± 0.05 | 6.86 ± 2.38 | 1.18 ± 0.15 | 0.25 ± 0.00 | 3.49 ± 2.17 | 1.68 ± 0.11 |
hammer-cloned-v1 | 8.90 ± 4.04 | 8.72 ± 3.28 | 0.96 ± 0.30 | 14.67 ± 1.94 | 0.32 ± 0.03 | 11.63 ± 1.70 | 48.16 ± 6.20 | 12.67 ± 15.02 | 0.27 ± 0.01 | 2.74 ± 0.22 |
hammer-expert-v1 | 127.89 ± 0.57 | 128.15 ± 0.66 | 33.31 ± 47.65 | 129.66 ± 0.33 | 0.93 ± 1.12 | 129.76 ± 0.37 | 134.74 ± 0.30 | 91.74 ± 47.77 | 69.44 ± 47.00 | 127.39 ± 0.10 |
relocate-human-v1 | 0.59 ± 0.27 | 0.16 ± 0.14 | -0.29 ± 0.01 | 2.09 ± 0.76 | 1.03 ± 0.20 | 1.22 ± 0.28 | 3.70 ± 2.34 | -0.18 ± 0.14 | 0.05 ± 0.02 | 0.08 ± 0.02 |
relocate-cloned-v1 | 0.45 ± 0.31 | 0.74 ± 0.45 | -0.02 ± 0.04 | 0.94 ± 0.68 | -0.07 ± 0.02 | 1.78 ± 0.70 | 9.25 ± 2.56 | 0.10 ± 0.04 | 4.11 ± 1.39 | 0.34 ± 0.09 |
relocate-expert-v1 | 110.31 ± 0.36 | 109.77 ± 0.60 | 0.23 ± 0.27 | 111.56 ± 0.17 | 0.03 ± 0.10 | 110.12 ± 0.82 | 111.14 ± 0.23 | -0.07 ± 0.08 | 98.32 ± 3.75 | 106.49 ± 0.30 |
adroit average | 58.92 | 54.58 | 20.51 | 69.42 | 13.65 | 62.62 | 69.71 | 27.49 | 33.88 | 52.60 |
Offline-to-Online
Scores
Each cell reports the score after offline pretraining → the score after online fine-tuning.
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
antmaze-umaze-v2 | 52.75 ± 8.67 → 98.75 ± 1.09 | 94.00 ± 1.58 → 99.50 ± 0.87 | 77.00 ± 0.71 → 96.50 ± 1.12 | 91.00 ± 2.55 → 99.50 ± 0.50 | 76.75 ± 7.53 → 99.75 ± 0.43 |
antmaze-umaze-diverse-v2 | 56.00 ± 2.74 → 0.00 ± 0.00 | 9.50 ± 9.91 → 99.00 ± 1.22 | 59.50 ± 9.55 → 63.75 ± 25.02 | 36.25 ± 2.17 → 95.00 ± 3.67 | 32.00 ± 27.79 → 98.50 ± 1.12 |
antmaze-medium-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 59.00 ± 11.18 → 97.75 ± 1.30 | 71.75 ± 2.95 → 89.75 ± 1.09 | 67.25 ± 10.47 → 97.25 ± 1.30 | 71.75 ± 3.27 → 98.75 ± 1.64 |
antmaze-medium-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 63.50 ± 6.84 → 97.25 ± 1.92 | 64.25 ± 1.92 → 92.25 ± 2.86 | 73.75 ± 7.29 → 94.50 ± 1.66 | 62.00 ± 4.30 → 98.25 ± 1.48 |
antmaze-large-play-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 28.75 ± 7.76 → 88.25 ± 2.28 | 38.50 ± 8.73 → 64.50 ± 17.04 | 31.50 ± 12.58 → 87.00 ± 3.24 | 31.75 ± 8.87 → 97.25 ± 1.79 |
antmaze-large-diverse-v2 | 0.00 ± 0.00 → 0.00 ± 0.00 | 35.50 ± 3.64 → 91.75 ± 3.96 | 26.75 ± 3.77 → 64.25 ± 4.15 | 17.50 ± 7.26 → 81.00 ± 14.14 | 44.00 ± 8.69 → 91.50 ± 3.91 |
antmaze average | 18.12 → 16.46 | 48.38 → 95.58 | 56.29 → 78.50 | 52.88 → 92.38 | 53.04 → 97.33 |
pen-cloned-v1 | 88.66 ± 15.10 → 86.82 ± 11.12 | -2.76 ± 0.08 → -1.28 ± 2.16 | 84.19 ± 3.96 → 102.02 ± 20.75 | 6.19 ± 5.21 → 43.63 ± 20.09 | -2.66 ± 0.04 → -2.68 ± 0.12 |
door-cloned-v1 | 0.93 ± 1.66 → 0.01 ± 0.00 | -0.33 ± 0.01 → -0.33 ± 0.01 | 1.19 ± 0.93 → 20.34 ± 9.32 | -0.21 ± 0.14 → 0.02 ± 0.31 | -0.33 ± 0.01 → -0.33 ± 0.01 |
hammer-cloned-v1 | 1.80 ± 3.01 → 0.24 ± 0.04 | 0.56 ± 0.55 → 2.85 ± 4.81 | 1.35 ± 0.32 → 57.27 ± 28.49 | 3.97 ± 6.39 → 3.73 ± 4.99 | 0.25 ± 0.04 → 0.17 ± 0.17 |
relocate-cloned-v1 | -0.04 ± 0.04 → -0.04 ± 0.01 | -0.33 ± 0.01 → -0.33 ± 0.01 | 0.04 ± 0.04 → 0.32 ± 0.38 | -0.24 ± 0.01 → -0.15 ± 0.05 | -0.31 ± 0.05 → -0.31 ± 0.04 |
adroit average | 22.84 → 21.76 | -0.72 → 0.22 | 21.69 → 44.99 | 2.43 → 11.81 | -0.76 → -0.79 |
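If you want to post-process the table above, the cells can be parsed with a small helper. This is a minimal sketch that assumes the value before the arrow is the score after offline pretraining and the value after it is the score after online fine-tuning. The names `parse_cell` and `improvement` are hypothetical.

```python
import re

# Matches cells such as "52.75 ± 8.67 → 98.75 ± 1.09".
CELL = re.compile(r"(-?\d+\.\d+) ± (\d+\.\d+) → (-?\d+\.\d+) ± (\d+\.\d+)")


def parse_cell(cell):
    """Split a 'mean ± std → mean ± std' cell into two (mean, std) pairs."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    off_mean, off_std, on_mean, on_std = map(float, match.groups())
    # Keys reflect the assumed meaning of the two sides of the arrow.
    return {"offline": (off_mean, off_std), "online": (on_mean, on_std)}


def improvement(cell):
    """Change in mean score from the left side to the right side."""
    parsed = parse_cell(cell)
    return round(parsed["online"][0] - parsed["offline"][0], 2)


# AWAC and Cal-QL on antmaze-umaze-v2, copied from the table.
print(improvement("52.75 ± 8.67 → 98.75 ± 1.09"))
print(improvement("76.75 ± 7.53 → 99.75 ± 0.43"))
```

A negative result flags a run whose score dropped between the two stages, for example AWAC on antmaze-umaze-diverse-v2.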
Regrets
Task-Name | AWAC | CQL | IQL | SPOT | Cal-QL |
---|---|---|---|---|---|
antmaze-umaze-v2 | 0.04 ± 0.01 | 0.02 ± 0.00 | 0.07 ± 0.00 | 0.02 ± 0.00 | 0.01 ± 0.00 |
antmaze-umaze-diverse-v2 | 0.88 ± 0.01 | 0.09 ± 0.01 | 0.43 ± 0.11 | 0.22 ± 0.07 | 0.05 ± 0.01 |
antmaze-medium-play-v2 | 1.00 ± 0.00 | 0.08 ± 0.01 | 0.09 ± 0.01 | 0.06 ± 0.00 | 0.04 ± 0.01 |
antmaze-medium-diverse-v2 | 1.00 ± 0.00 | 0.08 ± 0.00 | 0.10 ± 0.01 | 0.05 ± 0.01 | 0.04 ± 0.01 |
antmaze-large-play-v2 | 1.00 ± 0.00 | 0.21 ± 0.02 | 0.34 ± 0.05 | 0.29 ± 0.07 | 0.13 ± 0.02 |
antmaze-large-diverse-v2 | 1.00 ± 0.00 | 0.21 ± 0.03 | 0.41 ± 0.03 | 0.23 ± 0.08 | 0.13 ± 0.02 |
antmaze average | 0.82 | 0.11 | 0.24 | 0.15 | 0.07 |
pen-cloned-v1 | 0.46 ± 0.02 | 0.97 ± 0.00 | 0.37 ± 0.01 | 0.58 ± 0.02 | 0.98 ± 0.01 |
door-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.83 ± 0.03 | 0.99 ± 0.01 | 1.00 ± 0.00 |
hammer-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.65 ± 0.10 | 0.98 ± 0.01 | 1.00 ± 0.00 |
relocate-cloned-v1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
adroit average | 0.86 | 0.99 | 0.71 | 0.89 | 0.99 |
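The text does not define regret. One common definition for goal-reaching tasks, assumed here purely for illustration, is one minus the average success rate over all evaluations performed during online fine-tuning, so 0 is ideal and 1 means the agent never succeeded. The function name `cumulative_regret` and the sample numbers are hypothetical.

```python
import numpy as np


def cumulative_regret(success_rates):
    """One minus the mean success rate over online evaluations.

    `success_rates` holds values in [0, 1], one per evaluation during
    fine-tuning. This definition is an assumption, not taken from the
    tables above.
    """
    rates = np.asarray(success_rates, dtype=float)
    if rates.size == 0:
        raise ValueError("need at least one evaluation")
    return float(1.0 - rates.mean())


# A run that never succeeds has regret 1; a run that succeeds early has low regret.
print(cumulative_regret([0.0, 0.0, 0.0]))
print(cumulative_regret([0.5, 1.0, 1.0, 1.0]))
```

Under this reading, a regret of 1.00 corresponds to runs whose success rate stays at zero throughout fine-tuning, which is consistent with the AWAC rows on the medium and large Antmaze tasks.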
Citing CORL
If you use CORL in your work, please cite it with the following BibTeX entry:
@inproceedings{
tarasov2022corl,
title={{CORL}: Research-oriented Deep Offline Reinforcement Learning Library},
author={Denis Tarasov and Alexander Nikulin and Dmitry Akimov and Vladislav Kurenkov and Sergey Kolesnikov},
booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
year={2022},
url={https://openreview.net/forum?id=SyAS49bBcv}
}