Code for "Implementation Matters in Deep RL: A Case Study on PPO and TRPO"
This repository contains our implementation of PPO and TRPO, with manual toggles for the code-level optimizations described in our paper. We assume that the user has a machine with MuJoCo and mujoco_py properly set up and installed; i.e., you should be able to run the following Python snippet on your system without errors:
```python
import gym
gym.make("Humanoid-v2")
```
The code itself is quite simple to use. To reproduce the ablation case study discussed in our paper, run the following commands:
- `cd configs/`
- `mkdir PATH_TO_OUT_DIR`, and change `out_dir` to this path in the relevant config file. By default, agents will be written to `results/{env}_{algorithm}/agents/`.
- `python {config_name}.py`
- `cd ..`
- Edit the `NUM_THREADS` variable in the `run_agents.py` file according to your local machine.
- Train the agents: `python run_agents.py PATH_TO_OUT_DIR/agent_configs`
- The outputs will be in the `agents` subdirectory of `PATH_TO_OUT_DIR`, readable with the `cox` Python library (see the sketch after this list).
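For example, here is a minimal sketch for loading the logged results with `cox`'s `CollectionReader` (assuming `cox` and `pandas` are installed; the table name `"optimization"` below is a placeholder for whatever tables the training code actually logs):

```python
from cox.readers import CollectionReader

# Read every agent's store under the output directory.
# "PATH_TO_OUT_DIR" is the directory you created above; the table
# name "optimization" is a placeholder -- substitute a table that
# the training code actually logs.
reader = CollectionReader("PATH_TO_OUT_DIR/agents")
df = reader.df("optimization")  # pandas DataFrame, one row per logged entry
print(df.head())
```

Each trained agent corresponds to one `cox` store in that directory; `CollectionReader.df` concatenates the requested table across all of them, tagging rows with each agent's experiment ID.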
See the `MuJoCo.json` file for a full list of adjustable parameters.
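To inspect these parameters programmatically, a minimal sketch (assuming `MuJoCo.json` is a flat JSON object mapping parameter names to defaults; adjust if the file nests parameters differently):

```python
import json

# Print every adjustable parameter and its default value.
# Assumes MuJoCo.json is a flat JSON object of name -> default.
with open("MuJoCo.json") as f:
    params = json.load(f)

for name, default in sorted(params.items()):
    print(f"{name}: {default}")
```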