Repository Details

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

gym-super-mario-bros

An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) using the nes-py emulator.

Installation

The preferred installation of gym-super-mario-bros is from pip:

pip install gym-super-mario-bros

Usage

Python

You must import gym_super_mario_bros before trying to make an environment, because gym environments are registered at runtime.

By default, gym_super_mario_bros environments use the full NES action space of 256 discrete actions. To constrain this, gym_super_mario_bros.actions provides three action lists (RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT) for the nes_py.wrappers.JoypadSpace wrapper. See gym_super_mario_bros/actions.py for a breakdown of the legal actions in each of these three lists.

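from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# importing gym_super_mario_bros registers the environments with gym
env = gym_super_mario_bros.make('SuperMarioBros-v0')
# restrict the 256-action space to the SIMPLE_MOVEMENT action list
env = JoypadSpace(env, SIMPLE_MOVEMENT)

# start with done = True so the first iteration resets the environment
done = True
for step in range(5000):
    if done:
        state = env.reset()
    # take a uniformly random action from the restricted action space
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()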
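
The 256 figure follows from the NES controller having eight buttons, each either pressed or released on a given frame (2^8 = 256 combinations). The sketch below illustrates how a list of button names could be reduced to a small discrete action space. The bit assigned to each button (BUTTON_BITS) and the EXAMPLE_ACTIONS list are assumptions made for illustration, not values taken from nes_py or gym_super_mario_bros; see gym_super_mario_bros/actions.py for the real lists.

```python
# Assumed bit layout for the eight NES buttons (illustrative only; the
# emulator's actual layout may differ).
BUTTON_BITS = {
    "right": 0b10000000,
    "left": 0b01000000,
    "down": 0b00100000,
    "up": 0b00010000,
    "start": 0b00001000,
    "select": 0b00000100,
    "B": 0b00000010,
    "A": 0b00000001,
    "NOOP": 0b00000000,
}

# Hypothetical action list, in the same "list of button-name lists" shape
# as the lists described above. It is NOT one of the package's lists.
EXAMPLE_ACTIONS = [["NOOP"], ["right"], ["right", "A"], ["left"]]


def encode(buttons):
    """Combine button names into a single byte by OR-ing their bits."""
    byte = 0
    for button in buttons:
        byte |= BUTTON_BITS[button]
    return byte


def build_action_map(actions):
    """Map each discrete action index to its button byte."""
    return {index: encode(buttons) for index, buttons in enumerate(actions)}


# Eight independent buttons give 2 ** 8 possible button combinations.
full_space = 2 ** 8
action_map = build_action_map(EXAMPLE_ACTIONS)
```

With this mapping an agent chooses among len(EXAMPLE_ACTIONS) indices instead of all 256 button combinations, which is the kind of reduction the JoypadSpace wrapper is described as providing.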

NOTE: gym_super_mario_bros.make is just an alias to gym.make for convenience.

NOTE: remove calls to render in training code for a nontrivial speedup.
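
One way to follow the note above is to put rendering behind a flag so that the same loop serves both debugging and training. The sketch below assumes the 4-tuple step API used in the example above. StubEnv is a hypothetical stand-in that only exists to make the sketch runnable without a ROM; in practice you would pass the wrapped gym_super_mario_bros environment instead.

```python
import random


def rollout(env, steps, render=False):
    """Take random actions for `steps` steps, resetting when an episode ends.

    Rendering is skipped unless `render` is True.
    """
    total_reward = 0.0
    done = True
    for _ in range(steps):
        if done:
            env.reset()
        _, reward, done, _ = env.step(env.action_space.sample())
        total_reward += reward
        if render:
            env.render()
    return total_reward


class StubSpace:
    """Hypothetical discrete action space with a sample() method."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)


class StubEnv:
    """Hypothetical environment; not part of gym_super_mario_bros."""

    def __init__(self, episode_length=10):
        self.action_space = StubSpace(7)
        self.episode_length = episode_length
        self.t = 0
        self.resets = 0
        self.render_calls = 0

    def reset(self):
        self.t = 0
        self.resets += 1
        return None

    def step(self, action):
        # Every step gives reward 1.0; the episode ends after a fixed length.
        self.t += 1
        return None, 1.0, self.t >= self.episode_length, {}

    def render(self):
        self.render_calls += 1


stub = StubEnv()
total = rollout(stub, steps=25)  # render defaults to False
```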

Command Line

gym_super_mario_bros features a command line interface for playing environments using either the keyboard or uniform random movement.

gym_super_mario_bros -e <the environment ID to play> -m <`human` or `random`>

NOTE: by default, -e is set to SuperMarioBros-v0 and -m is set to human.

NOTE: the SuperMarioBrosRandomStages-* environments support the --stages/-S flag for supplying the set of stages to sample from, e.g., -S 1-4 2-4 3-4 4-4.

Environments

These environments allow 3 attempts (lives) to make it through the 32 stages in the game. The environments only send reward-able game-play frames to agents; no cut-scenes, loading screens, etc. are sent from the NES emulator to an agent, nor can an agent perform actions during these instances. If a cut-scene cannot be skipped by hacking the NES's RAM, the environment will lock the Python process until the emulator is ready for the next action.

Environment         Game  ROM
SuperMarioBros-v0   SMB   standard
SuperMarioBros-v1   SMB   downsample
SuperMarioBros-v2   SMB   pixel
SuperMarioBros-v3   SMB   rectangle
SuperMarioBros2-v0  SMB2  standard
SuperMarioBros2-v1  SMB2  downsample

Individual Stages

These environments allow a single attempt (life) to make it through a single stage of the game.

Use the template

SuperMarioBros-<world>-<stage>-v<version>

where:

  • <world> is a number in {1, 2, 3, 4, 5, 6, 7, 8} indicating the world
  • <stage> is a number in {1, 2, 3, 4} indicating the stage within a world
  • <version> is a number in {0, 1, 2, 3} specifying the ROM mode to use
    • 0: standard ROM
    • 1: downsampled ROM
    • 2: pixel ROM
    • 3: rectangle ROM

For example, to play 4-2 on the downsampled ROM, you would use the environment id SuperMarioBros-4-2-v1.
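
As a minimal sketch of the template above, the helper below builds an environment id from its three parts and rejects values outside the documented ranges. The function name stage_env_id and the ROM_MODES dictionary are hypothetical conveniences, not part of gym_super_mario_bros.

```python
# ROM modes as listed above, keyed by version number.
ROM_MODES = {0: "standard", 1: "downsampled", 2: "pixel", 3: "rectangle"}


def stage_env_id(world, stage, version=0):
    """Return the id for a single-stage environment, e.g. SuperMarioBros-4-2-v1."""
    if world not in range(1, 9):
        raise ValueError(f"world must be in 1..8, got {world!r}")
    if stage not in range(1, 5):
        raise ValueError(f"stage must be in 1..4, got {stage!r}")
    if version not in ROM_MODES:
        raise ValueError(f"version must be in 0..3, got {version!r}")
    return f"SuperMarioBros-{world}-{stage}-v{version}"


env_id = stage_env_id(4, 2, version=1)
```

The resulting string can then be passed to gym_super_mario_bros.make in place of 'SuperMarioBros-v0'.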

Random Stage Selection

The random stage selection environment randomly selects a stage and allows a single attempt to clear it. Upon a death and subsequent call to reset, the environment randomly selects a new stage. This is only available for the standard Super Mario Bros. game, not Lost Levels (at the moment).

To use these environments, append RandomStages to the SuperMarioBros id. For example, to use the standard ROM with random stage selection, use SuperMarioBrosRandomStages-v0.

To seed the random stage selection, use the seed method of the env, i.e., env.seed(222), before any calls to reset. Alternatively, pass the seed keyword argument to the reset method directly, e.g., reset(seed=222).

In addition to randomly selecting any of the 32 original stages, a subset of user-defined stages can be specified to limit the random choice of stages to a specific subset. For example, the stage selector could be limited to only sample castle stages, water levels, underground, and more.

To specify a subset of stages to randomly sample from, create a list of each stage to allow to be sampled and pass that list to the gym.make() function. For example:

gym.make('SuperMarioBrosRandomStages-v0', stages=['1-4', '2-4', '3-4', '4-4'])

The example above will sample a random stage from 1-4, 2-4, 3-4, and 4-4 upon every call to reset.
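
The stage list can also be generated and checked before it is handed to gym.make(). The sketch below builds the castle subset (the fourth stage of each world) and validates stage names against the 8 worlds by 4 stages described earlier. The names VALID_STAGES, castle_stages, and validate_stages are hypothetical helpers, not part of the package.

```python
# All 32 stage names of the form "<world>-<stage>".
VALID_STAGES = {f"{world}-{stage}" for world in range(1, 9) for stage in range(1, 5)}


def castle_stages():
    """Return the fourth stage of every world: 1-4, 2-4, ..., 8-4."""
    return [f"{world}-4" for world in range(1, 9)]


def validate_stages(stages):
    """Return the stages as a list, raising ValueError on unknown names."""
    unknown = [name for name in stages if name not in VALID_STAGES]
    if unknown:
        raise ValueError(f"unknown stages: {unknown}")
    return list(stages)


stages = validate_stages(castle_stages())

# The validated list would then be passed along as in the example above:
# env = gym.make('SuperMarioBrosRandomStages-v0', stages=stages)
```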

Step

This section describes the reward and the info dictionary returned by the step method.

Reward Function

The reward function assumes the objective of the game is to move as far right as possible (increase the agent's x value), as fast as possible, without dying. To model this game, three separate variables compose the reward:

  1. v: the difference in agent x values between states
    • in this case this is instantaneous velocity for the given step
    • v = x1 - x0
      • x0 is the x position before the step
      • x1 is the x position after the step
    • moving right ⇔ v > 0
    • moving left ⇔ v < 0
    • not moving ⇔ v = 0
  2. c: the difference in the game clock between frames
    • the penalty prevents the agent from standing still
    • c = c1 - c0
      • c0 is the clock reading before the step
      • c1 is the clock reading after the step
    • no clock tick ⇔ c = 0
    • clock tick ⇔ c < 0
  3. d: a death penalty that penalizes the agent for dying in a state
    • this penalty encourages the agent to avoid death
    • alive ⇔ d = 0
    • dead ⇔ d = -15

r = v + c + d

The reward is clipped into the range (-15, 15).
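
The three terms can be combined in a few lines. This is a sketch of the formula as described here, not the package's implementation. It makes two assumptions: the clock term is computed as the reading after the step minus the reading before it, so that a tick of the countdown clock gives a negative value (matching "clock tick ⇔ c < 0"), and the clip treats both bounds as inclusive.

```python
DEATH_PENALTY = -15
REWARD_MIN, REWARD_MAX = -15, 15


def step_reward(x0, x1, c0, c1, died):
    """Compute r = v + c + d for one step and clip it.

    x0, x1: x position before and after the step
    c0, c1: clock reading before and after the step
    died:   True if the agent died during the step
    """
    v = x1 - x0                       # positive when moving right
    c = c1 - c0                       # assumed sign: negative on a clock tick
    d = DEATH_PENALTY if died else 0  # death penalty
    r = v + c + d
    # Assumed inclusive clip to the documented range.
    return max(REWARD_MIN, min(REWARD_MAX, r))


moving_right = step_reward(x0=40, x1=43, c0=400, c1=400, died=False)
standing_still = step_reward(x0=40, x1=40, c0=400, c1=399, died=False)
dying = step_reward(x0=40, x1=42, c0=400, c1=399, died=True)
```

Moving right with no clock tick yields a positive reward, standing still through a tick yields a small penalty, and dying dominates the other two terms.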

info dictionary

The info dictionary returned by the step method contains the following keys:

Key       Type  Description
coins     int   The number of collected coins
flag_get  bool  True if Mario reached a flag or ax
life      int   The number of lives left, i.e., {3, 2, 1}
score     int   The cumulative in-game score
stage     int   The current stage, i.e., {1, ..., 4}
status    str   Mario's status, i.e., {'small', 'tall', 'fireball'}
time      int   The time left on the clock
world     int   The current world, i.e., {1, ..., 8}
x_pos     int   Mario's x position in the stage (from the left)
y_pos     int   Mario's y position in the stage (from the bottom)
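
As an illustration of consuming these keys, the sketch below types the dictionary and formats a one-line progress summary. StepInfo, describe, and the sample values are hypothetical; only the key names and types come from the table above.

```python
from typing import TypedDict


class StepInfo(TypedDict):
    """Keys of the info dictionary as listed in the table above."""

    coins: int
    flag_get: bool
    life: int
    score: int
    stage: int
    status: str
    time: int
    world: int
    x_pos: int
    y_pos: int


def describe(info: StepInfo) -> str:
    """Summarize the current stage, outcome, and progress on one line."""
    level = f"{info['world']}-{info['stage']}"
    outcome = "cleared" if info["flag_get"] else "in progress"
    return (
        f"{level} {outcome}: x={info['x_pos']}, score={info['score']}, "
        f"coins={info['coins']}, time={info['time']}"
    )


# Made-up values for illustration only.
sample: StepInfo = {
    "coins": 3, "flag_get": False, "life": 2, "score": 1200, "stage": 1,
    "status": "small", "time": 352, "world": 1, "x_pos": 594, "y_pos": 79,
}
summary = describe(sample)
```

In a training loop, the info value returned by each call to step could be passed to describe for periodic logging.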

Citation

Please cite gym-super-mario-bros if you use it in your research.

@misc{gym-super-mario-bros,
  author = {Christian Kauten},
  howpublished = {GitHub},
  title = {{S}uper {M}ario {B}ros for {O}pen{AI} {G}ym},
  URL = {https://github.com/Kautenja/gym-super-mario-bros},
  year = {2018},
}
