
A simple and highly efficient RTS-game-inspired environment for reinforcement learning (formerly Gym-MicroRTS)

Formerly Gym-μRTS/Gym-MicroRTS

This repo contains the source code for the gym wrapper of μRTS authored by Santiago Ontañón.

MicroRTS-Py will eventually be updated, maintained, and made compliant with the standards of the Farama Foundation (https://farama.org/project_standards). However, this is currently a lower priority than other projects we are working to maintain. If you'd like to contribute to development, you can join our Discord server here: https://discord.gg/jfERDCSw.

demo.gif

Get Started

Prerequisites:

  • Python 3.8+
  • Poetry
  • Java 8.0+
  • FFmpeg (for video recording utilities)
git clone --recursive https://github.com/Farama-Foundation/MicroRTS-Py.git && \
cd MicroRTS-Py
poetry install
# By default, the torch wheel is built with CUDA 10.2. If you are using newer NVIDIA GPUs (e.g., 3060 TI), you may need to specifically install CUDA 11.3 wheels by overriding the torch dependency with pip:
# poetry run pip install "torch==1.12.1" --upgrade --extra-index-url https://download.pytorch.org/whl/cu113
poetry run python hello_world.py
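
If you'd like a sense of what hello_world.py does before opening it, the minimal sketch below builds a single environment against a built-in bot and inspects the observation shape. It assumes the MicroRTSGridModeVecEnv constructor arguments shown (ai2s, map_paths, reward_weight); argument names have changed between releases, so treat it as illustrative rather than canonical.

# A minimal sketch, not the actual hello_world.py. Constructor argument names
# (e.g., map_paths vs. map_path) differ between gym_microrts releases, so
# check your installed version before copying this verbatim.
import numpy as np
from gym_microrts import microrts_ai
from gym_microrts.envs.vec_env import MicroRTSGridModeVecEnv

envs = MicroRTSGridModeVecEnv(
    num_selfplay_envs=0,
    num_bot_envs=1,                      # one env against a scripted opponent
    ai2s=[microrts_ai.coacAI],           # built-in CoacAI bot
    map_paths=["maps/16x16/basesWorkers16x16.xml"],
    reward_weight=np.array([10.0, 1.0, 1.0, 0.2, 1.0, 4.0]),
)
obs = envs.reset()
print(obs.shape)  # expected (1, 16, 16, 27): one env, h x w cells, 27 feature planes
envs.close()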

To train an agent, run the following:

cd experiments
python ppo_gridnet.py \
    --total-timesteps 100000000 \
    --capture-video \
    --seed 1

asciicast

To run a partially observable example, set the --partial-obs flag:

cd experiments
python ppo_gridnet.py \
    --partial-obs \
    --capture-video \
    --seed 1

Technical Paper

Before diving into the code, we highly recommend reading the preprint of our paper: Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning

Deprecation note

Note that the experiments in the technical paper above were done with gym_microrts==0.3.2. As we move beyond v0.4.x, we are planning to deprecate UAS despite its better performance in the paper, because UAS has a more complex implementation that makes it much harder to incorporate self-play or imitation learning in the future.

Environment Specification

Here is a description of Gym-μRTS's observation and action space:

  • Observation Space. (Box(0, 1, (h, w, 27), int32)) Given a map of size h x w, the observation is a tensor of shape (h, w, n_f), where n_f is the number of binary feature planes. The observation space used in the paper uses 27 feature planes, as shown in the following table. A feature plane can be thought of as a concatenation of multiple one-hot encoded features. As an example, if there is a worker with hit points equal to 1, not carrying any resources, owned by Player 1, and currently not executing any action, then the one-hot encoded features look like the following:

    [0,1,0,0,0], [1,0,0,0,0], [1,0,0], [0,0,0,0,1,0,0,0], [1,0,0,0,0,0]

    The 27 values of each feature plane for the position in the map of such worker will thus be:

    [0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0]

  • Partial Observation Space. (Box(0, 1, (h, w, 29), int32)) Given a map of size h x w, the observation is a tensor of shape (h, w, n_f), where n_f is the number of binary feature planes. The observation space for partial observability uses 29 feature planes, as shown in the following table. A feature plane can be thought of as a concatenation of multiple one-hot encoded features. As an example, if there is a worker with hit points equal to 1, not carrying any resources, owned by Player 1, currently not executing any action, and not visible to the opponent, then the one-hot encoded features look like the following:

    [0,1,0,0,0], [1,0,0,0,0], [1,0,0], [0,0,0,0,1,0,0,0], [1,0,0,0,0,0], [1,0]

    The 29 values of each feature plane for the position in the map of such worker will thus be:

    [0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0]

  • Action Space. (MultiDiscrete(concat(h * w * [[6 4 4 4 4 7 a_r]]))) Given a map of size h x w and a maximum attack range a_r=7, the action is a (7hw)-dimensional vector of discrete values as specified in the following table. The first 7 components of the action vector represent the actions issued to the unit at x=0,y=0, the next 7 components represent the actions issued to the unit at x=0,y=1, and so on. Within each group of 7, the first component is the action type, and the remaining components are the parameters that the different action types take. Depending on which action type is selected, the game engine uses the corresponding parameters to execute the action. As an example, if the RL agent issues a move-south action to the worker at x=0, y=1 in a 2x2 map, the action is encoded as follows (a worked NumPy sketch of both the observation and action encodings appears after this list):

    concat([0,0,0,0,0,0,0], [1,2,0,0,0,0,0], [0,0,0,0,0,0,0], [0,0,0,0,0,0,0]) = [0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

(Image: tables describing the observation feature planes and the action components.)
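
The self-contained sketch below (plain NumPy, no simulator required) reproduces the two encodings above: the 27-value observation vector for the example worker, and the flattened action vector for the move-south command on a 2x2 map. The plane sizes and index choices are taken directly from the example vectors shown above.

# Worked example of the observation and action encodings described above.
import numpy as np

def one_hot(index, size):
    v = np.zeros(size, dtype=np.int32)
    v[index] = 1
    return v

# Observation: plane sizes follow the 27-plane layout (5 + 5 + 3 + 8 + 6).
obs_worker = np.concatenate([
    one_hot(1, 5),   # hit points = 1
    one_hot(0, 5),   # carrying no resources
    one_hot(0, 3),   # owner = Player 1 (index per the example above)
    one_hot(4, 8),   # unit type = worker
    one_hot(0, 6),   # current action = none
])
print(obs_worker)
# [0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0]

# Action: 7 components per cell on a 2x2 map; the worker sits in the second cell.
# Component 0 is the action type (1 = move) and component 1 is the move
# direction (2 = south), matching the example above.
action = np.zeros((2 * 2, 7), dtype=np.int64)
action[1, 0] = 1  # action type: move
action[1, 1] = 2  # move parameter: south
print(action.flatten())
# [0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]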

Evaluation

You can evaluate trained agents against a built-in bot:

cd experiments
python ppo_gridnet_eval.py \
    --agent-model-path gym-microrts-static-files/agent_sota.pt \
    --ai coacAI

Alternatively, you can evaluate trained RL bots against each other:

cd experiments
python ppo_gridnet_eval.py \
    --agent-model-path gym-microrts-static-files/agent_sota.pt \
    --agent2-model-path gym-microrts-static-files/agent_sota.pt

Evaluate Trueskill of the agents

This repository already contains a preset TrueSkill database in experiments/league.db. To evaluate a new AI, try running the following command, which iteratively finds good matches for agent.pt until the engine is confident about agent.pt's TrueSkill rating (i.e., until the agent's TrueSkill sigma drops below --highest-sigma 1.4). A small sketch of this stopping criterion follows the command.

cd experiments
python league.py --evals gym-microrts-static-files/agent_sota.pt --highest-sigma 1.4 --update-db False
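
To see why sigma is a sensible stopping criterion, here is a conceptual illustration using the standalone trueskill Python package: each rated match typically shrinks the new agent's sigma (its rating uncertainty), and evaluation can stop once sigma falls below the threshold. The opponent ratings, win probabilities, and rating parameters below are made up; league.py's actual match scheduling may differ.

# Conceptual illustration of the --highest-sigma stopping rule, using the
# standalone `trueskill` package. The opponents and outcomes are invented.
import random
from trueskill import Rating, rate_1vs1

random.seed(0)
highest_sigma = 1.4
new_agent = Rating()                                   # unknown skill: mu=25, sigma~8.33
opponents = [Rating(mu=m, sigma=1.0) for m in (20.0, 25.0, 30.0)]

matches = 0
while new_agent.sigma > highest_sigma and matches < 200:
    opp = random.choice(opponents)
    # Pretend the new agent wins half of its games; a real league plays
    # MicroRTS matches to decide each winner.
    if random.random() < 0.5:
        new_agent, _ = rate_1vs1(new_agent, opp)
    else:
        _, new_agent = rate_1vs1(opp, new_agent)
    matches += 1

print(f"after {matches} matches: mu={new_agent.mu:.1f}, sigma={new_agent.sigma:.2f}")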

To recreate the preset TrueSkill database, remove the database in experiments/league.db and start a round-robin TrueSkill evaluation among the built-in AIs:

cd experiments
rm league.csv league.db
python league.py --evals randomBiasedAI workerRushAI lightRushAI coacAI

Multi-maps support

The training script allows you to train and evaluate agents on more than one map. Try executing:

cd experiments
python ppo_gridnet.py \
    --train-maps maps/16x16/basesWorkers16x16B.xml maps/16x16/basesWorkers16x16C.xml maps/16x16/basesWorkers16x16D.xml maps/16x16/basesWorkers16x16E.xml maps/16x16/basesWorkers16x16F.xml \
    --eval-maps maps/16x16/basesWorkers16x16B.xml maps/16x16/basesWorkers16x16C.xml maps/16x16/basesWorkers16x16D.xml maps/16x16/basesWorkers16x16E.xml maps/16x16/basesWorkers16x16F.xml

where --train-maps specifies the training maps and --eval-maps the evaluation maps. --train-maps and --eval-maps do not have to match, so you can evaluate on maps the agent has never trained on.

Known issues

[ ] Rendering does not work correctly on macOS. See jpype-project/jpype#906

Papers written using Gym-μRTS

PettingZoo API

We wrapped our Gym-µRTS simulator into a PettingZoo environment, which is defined in gym_microrts/pettingzoo_api.py. An example usage of the Gym-µRTS PettingZoo environment can be found in hello_world_pettingzoo.py.
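A typical agent-environment-cycle (AEC) loop over that environment would look roughly like the sketch below. The class name MicroRTSPZEnv and its constructor arguments are placeholders, not the wrapper's real API; check gym_microrts/pettingzoo_api.py and hello_world_pettingzoo.py for the actual names. Note also that env.last() returns a 4-tuple (obs, reward, done, info) on older PettingZoo releases.

# Hypothetical usage sketch; the wrapper's real class name and constructor
# arguments live in gym_microrts/pettingzoo_api.py (see also
# hello_world_pettingzoo.py). The AEC loop itself is standard PettingZoo usage.
from gym_microrts.pettingzoo_api import MicroRTSPZEnv  # placeholder name

env = MicroRTSPZEnv()  # placeholder arguments
env.reset()
for agent in env.agent_iter():
    obs, reward, termination, truncation, info = env.last()  # 4-tuple on older PettingZoo
    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # random policy, for illustration only
    env.step(action)
env.close()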

Cite this project

To cite the Gym-µRTS simulator:

@inproceedings{huang2021gym,
  author    = {Shengyi Huang and
               Santiago Onta{\~{n}}{\'{o}}n and
               Chris Bamford and
               Lukasz Grela},
  title     = {Gym-{\(\mathrm{\mu}\)}RTS: Toward Affordable Full Game Real-time Strategy
               Games Research with Deep Reinforcement Learning},
  booktitle = {2021 {IEEE} Conference on Games (CoG), Copenhagen, Denmark, August
               17-20, 2021},
  pages     = {1--8},
  publisher = {{IEEE}},
  year      = {2021},
  url       = {https://doi.org/10.1109/CoG52621.2021.9619076},
  doi       = {10.1109/CoG52621.2021.9619076},
  timestamp = {Fri, 10 Dec 2021 10:41:01 +0100},
  biburl    = {https://dblp.org/rec/conf/cig/HuangO0G21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

To cite the invalid action masking technique used in our training script:

@inproceedings{huang2020closer,
  author    = {Shengyi Huang and
               Santiago Onta{\~{n}}{\'{o}}n},
  editor    = {Roman Bart{\'{a}}k and
               Fazel Keshtkar and
               Michael Franklin},
  title     = {A Closer Look at Invalid Action Masking in Policy Gradient Algorithms},
  booktitle = {Proceedings of the Thirty-Fifth International Florida Artificial Intelligence
               Research Society Conference, {FLAIRS} 2022, Hutchinson Island, Jensen
               Beach, Florida, USA, May 15-18, 2022},
  year      = {2022},
  url       = {https://doi.org/10.32473/flairs.v35i.130584},
  doi       = {10.32473/flairs.v35i.130584},
  timestamp = {Thu, 09 Jun 2022 16:44:11 +0200},
  biburl    = {https://dblp.org/rec/conf/flairs/HuangO22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
