  • Language
    Python
  • License
    MIT License
  • Created over 3 years ago
  • Updated over 1 year ago

Benchmarking the Spectrum of Agent Capabilities

Status: Stable release

Crafter

Open world survival game for evaluating a wide range of agent abilities within a single environment.

Overview

Crafter features randomly generated 2D worlds where the player needs to forage for food and water, find shelter to sleep, defend against monsters, collect materials, and build tools. Crafter aims to be a fruitful benchmark for reinforcement learning by focusing on the following design goals:

  • Research challenges: Crafter poses substantial challenges to current methods, evaluating strong generalization, wide and deep exploration, representation learning, and long-term reasoning and credit assignment.

  • Meaningful evaluation: Agents are evaluated by semantically meaningful achievements that can be unlocked in each episode, offering insights into the ability spectrum of both reward agents and unsupervised agents.

  • Iteration speed: Crafter evaluates many agent abilities within a single env, vastly reducing the computational requirements compared to benchmark suites that require training on many separate envs from scratch.

See the research paper to find out more: Benchmarking the Spectrum of Agent Capabilities

@article{hafner2021crafter,
  title={Benchmarking the Spectrum of Agent Capabilities},
  author={Danijar Hafner},
  year={2021},
  journal={arXiv preprint arXiv:2109.06780},
}

Play Yourself

python3 -m pip install crafter  # Install Crafter
python3 -m pip install pygame   # Needed for human interface
python3 -m crafter.run_gui      # Start the game
Keyboard mapping

| Key | Action |
| --- | --- |
| WASD | Move around |
| SPACE | Collect material, drink from lake, hit creature |
| TAB | Sleep |
| T | Place a table |
| R | Place a rock |
| F | Place a furnace |
| P | Place a plant |
| 1 | Craft a wood pickaxe |
| 2 | Craft a stone pickaxe |
| 3 | Craft an iron pickaxe |
| 4 | Craft a wood sword |
| 5 | Craft a stone sword |
| 6 | Craft an iron sword |

Interface

To install Crafter, run pip3 install crafter. The environment follows the OpenAI Gym interface. Observations are images of shape (64, 64, 3), and each action is one of 17 categorical choices.

import gym
import crafter

env = gym.make('CrafterReward-v1')  # Or CrafterNoReward-v1
env = crafter.Recorder(
  env, './path/to/logdir',
  save_stats=True,
  save_video=False,
  save_episode=False,
)

obs = env.reset()
done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)
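
The loop above can be wrapped into a small reusable helper. The following is a minimal sketch rather than part of Crafter: `run_episode` and `StubEnv` are hypothetical names, and the stub only imitates the four-value `step` return used above so that the snippet runs without installing anything.

```python
import random


class StubEnv:
    """Hypothetical stand-in for an env with the old Gym step API."""

    def __init__(self, length=5, num_actions=17, seed=0):
        self._length = length
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self._step = 0

    def sample_action(self):
        # Uniformly random action index, like action_space.sample().
        return self._rng.randrange(self._num_actions)

    def reset(self):
        self._step = 0
        return None  # A real env would return a (64, 64, 3) image here.

    def step(self, action):
        self._step += 1
        done = self._step >= self._length
        reward = 1.0 if self._step == 2 else 0.0  # Made-up reward signal.
        return None, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (episode_return, episode_length)."""
    obs = env.reset()
    done = False
    total, length = 0.0, 0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        length += 1
    return total, length


env = StubEnv()
ret, length = run_episode(env, lambda obs: env.sample_action())
print(ret, length)
```

With the real environment, the same helper would presumably be called as `run_episode(env, lambda obs: env.action_space.sample())`, replacing the random policy with a trained agent.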

Evaluation

Agents are allowed a budget of 1M environment steps and are evaluated by their success rates of the 22 achievements and by their geometric mean score. Example scripts for computing these are included in the analysis directory of the repository.

  • Reward: The sparse reward is +1 for unlocking an achievement during the episode, -0.1 for each lost health point, and +0.1 for each regenerated health point. Results should be reported as success rates and score rather than as reward.

  • Success rates: The success rates of the 22 achievements are computed as the percentage across all training episodes in which the achievement was unlocked, allowing insights into the ability spectrum of an agent.

  • Crafter score: The score is the geometric mean of success rates, so that improvements on difficult achievements contribute more than improvements on achievements with already high success rates.
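
The success rates and score described above can be sketched as follows. This is an illustration, not the repository's analysis script: the function names and the episode data are made up, and the offset of 1 inside the logarithm (which keeps a 0% success rate from collapsing the geometric mean to zero) is an assumption based on the definition given in the paper. The scripts in the analysis directory remain the reference.

```python
import math


def success_rates(episodes, names):
    """Percentage of episodes in which each achievement was unlocked."""
    return {
        name: 100.0 * sum(name in ep for ep in episodes) / len(episodes)
        for name in names
    }


def crafter_score(rates):
    """Geometric mean of success rates in percent.

    Assumes the offset-by-one form exp(mean(log(1 + s))) - 1, so that
    achievements with a 0% success rate do not zero out the score.
    """
    logs = [math.log(1.0 + rate) for rate in rates]
    return math.exp(sum(logs) / len(logs)) - 1.0


# Illustrative data: the set of achievements unlocked in each episode.
names = ['collect_wood', 'place_table', 'collect_diamond']
episodes = [
    {'collect_wood'},
    {'collect_wood', 'place_table'},
    set(),
    {'collect_wood'},
]

rates = success_rates(episodes, names)
score = crafter_score(rates.values())
print(rates)
print(round(score, 2))
```

Because the mean is taken in log space, raising a rare achievement from 0% to 1% moves the score more than raising a common one from 75% to 76%, which matches the motivation stated for the Crafter score.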

Scoreboards

Please create a pull request if you would like to add your own or another algorithm to the scoreboards. For the reinforcement learning and unsupervised agents categories, the interaction budget is 1M environment steps. The external knowledge category is defined more broadly.

Reinforcement Learning

| Algorithm | Score (%) | Reward | Open Source |
| --- | --- | --- | --- |
| Curious Replay | 19.4±1.6 | - | AutonomousAgentsLab/cr-dv3 |
| DreamerV3 | 14.5±1.6 | 11.7±1.9 | danijar/dreamerv3 |
| LSTM-SPCNN | 12.1±0.8 | | astanic/crafter-ood |
| EDE | 11.7±1.0 | | yidingjiang/ede |
| OC-SA | 11.1±0.7 | | astanic/crafter-ood |
| DreamerV2 | 10.0±1.2 | 9.0±1.7 | danijar/dreamerv2 |
| PPO | 4.6±0.3 | 4.2±1.2 | DLR-RM/stable-baselines3 |
| Rainbow | 4.3±0.2 | 6.0±1.3 | Kaixhin/Rainbow |

Unsupervised Agents

| Algorithm | Score (%) | Reward | Open Source |
| --- | --- | --- | --- |
| Plan2Explore | 2.1±0.1 | 2.1±1.5 | danijar/dreamerv2 |
| RND | 2.0±0.1 | 0.7±1.3 | alirezakazemipour/PPO-RND |
| Random | 1.6±0.0 | 2.1±1.3 | |

External Knowledge

| Algorithm | Score (%) | Reward | Uses | Interaction | Open Source |
| --- | --- | --- | --- | --- | --- |
| Human | 50.5±6.8 | 14.3±2.3 | Life experience | 0 | crafter_human_dataset |
| SPRING | 27.3±1.2 | 12.3±0.7 | LLM, scene description, Crafter paper | 0 | |
| ELLM | 6.0±0.4 | | LLM, scene description | 5M | |

Baselines

Baseline scores of various agents are available for Crafter, both with and without rewards. The scores are available in JSON format in the scores directory of the repository. For comparison, the score of human expert players is 50.5%. The baseline implementations are available as a separate repository.

Questions

Please open an issue on GitHub.
