
A Reinforcement Learning agent that learns how to solve maze missions in Minecraft.

Minecraft AI

About

In this project, we used reinforcement learning to train a PPO agent to solve maze missions in Minecraft using the Malmo library. The agent was tested with different action spaces, rewards, and compression techniques. Our results showed that it was able to successfully navigate the mazes and complete the missions. This project demonstrates the potential of reinforcement learning for solving complex problems in virtual environments such as games, with possible applications in a wide range of fields.

The model structure:
Model Structure

To get started, follow the instructions for each of the requirements below.

Requirements

This project has only been tested on Ubuntu 22.04. In theory, any non-ARM-based OS should work, but this has not been verified.

OS Requirements

First, you need to install the following libraries:

  1. GCC and Make

    sudo apt-get update
    sudo apt-get install -y gcc
    sudo apt-get install -y make
  2. CUDA 11.0

    wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run
    sudo sh cuda_11.0.2_450.51.05_linux.run
  3. Anaconda

  4. Java 8

    sudo apt-get install -y openjdk-8-jdk
    echo -e "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> ~/.bashrc
    source ~/.bashrc
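Before moving on, it can help to confirm that the tools above are actually reachable. The following is a minimal sketch (not part of the repository) that checks for the `gcc`, `make`, `java` and `nvcc` executables on `PATH` and for a valid `JAVA_HOME`. The function name `check_os_requirements` is hypothetical, and `nvcc` may only be found after the CUDA `bin` directory (for example `/usr/local/cuda-11.0/bin`) has been added to `PATH`.

```python
import os
import shutil

# Executables installed by the steps above. nvcc is only found if the CUDA
# bin directory has been added to PATH.
REQUIRED_TOOLS = ["gcc", "make", "java", "nvcc"]


def check_os_requirements(tools=REQUIRED_TOOLS):
    """Map each tool (and JAVA_HOME) to True if it looks correctly installed."""
    results = {tool: shutil.which(tool) is not None for tool in tools}
    java_home = os.environ.get("JAVA_HOME", "")
    results["JAVA_HOME"] = bool(java_home) and os.path.isdir(java_home)
    return results


if __name__ == "__main__":
    for name, ok in check_os_requirements().items():
        print("{:10s} {}".format(name, "OK" if ok else "MISSING"))
```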

Python Requirements

  1. Now you need to create a conda environment and install the required Python packages:

    conda create -n minerl python=3.6
    conda activate minerl
    pip install -r requirements.txt
  2. Finally, you need to bootstrap the Malmo library and set the MALMO_XSD_PATH environment variable as follows:

    python3 -c "import malmo.minecraftbootstrap; malmo.minecraftbootstrap.download();"
    echo -e "export MALMO_XSD_PATH=$PWD/MalmoPlatform/Schemas" >> ~/.bashrc
    source ~/.bashrc
    cd MalmoPlatform/Minecraft
    (echo -n "malmomod.version=" && cat ../VERSION) > ./src/main/resources/version.properties
    cd ../..
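Similarly, a hypothetical `check_malmo_setup` helper (a sketch, not part of the repository) can verify the result of the bootstrap steps: that `MALMO_XSD_PATH` points to an existing directory and that `version.properties` was written with the `malmomod.version=` prefix. It assumes it is called from, or given, the directory where the commands above were run.

```python
import os
from pathlib import Path

# Location written by the `(echo -n "malmomod.version=" ...)` step above.
PROPS = Path("MalmoPlatform/Minecraft/src/main/resources/version.properties")


def check_malmo_setup(repo_root="."):
    """Return a list of problems found in the Malmo bootstrap (empty if none)."""
    problems = []
    xsd_path = os.environ.get("MALMO_XSD_PATH")
    if not xsd_path or not Path(xsd_path).is_dir():
        problems.append("MALMO_XSD_PATH is unset or not a directory")
    props = Path(repo_root) / PROPS
    if not props.is_file():
        problems.append("missing {}".format(props))
    elif not props.read_text().startswith("malmomod.version="):
        problems.append("{} lacks the malmomod.version= prefix".format(props))
    return problems


if __name__ == "__main__":
    for problem in check_malmo_setup() or ["Malmo setup looks OK"]:
        print(problem)
```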

Domain Problem

The problem we are working on is using reinforcement learning to train an agent to solve maze missions in Minecraft using the Malmo library. In this problem, the agent is placed in a maze within the Minecraft game world and must navigate the maze to reach the goal. The rules of the game are defined by the Minecraft environment and the specific maze mission that the agent is attempting to solve. To frame this problem as an MDP (Markov Decision Process), we need to define the state space, action space, and reward structure. The state space in this problem is the set of all possible configurations of the Minecraft game world that the agent can encounter while navigating the maze. This includes the location of the agent, the layout of the maze, and any other relevant information that the agent can observe.

The action space in this problem is the set of all possible actions that the agent can take at any given time. In the context of Minecraft, these actions might include moving in different directions, turning left and right, interacting with objects in the game world, or using items in the agent’s inventory. The reward structure in this problem is the set of rewards that the agent receives for taking different actions and achieving different goals within the maze. These rewards might include positive rewards for reaching the goal, negative rewards for encountering obstacles or hazards, and other rewards for achieving intermediate goals or completing certain tasks.

We examined different options for the state space, action space, and reward structure in this problem. For the state space, we used both the raw pixel data from the Minecraft game screen and pre-processed representations of the game state that extract relevant features and abstract away irrelevant details. An example of what the agent sees can be found in Figure 1. For the action space, we considered a discrete set of actions that the agent could take. For the reward structure, we considered a variety of rewards for different actions and goals, including both rewards built into the game environment and rewards defined by the objective of the maze mission.
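As an illustration of a pre-processed state representation, the sketch below converts a raw frame buffer into a small grayscale array scaled to [0, 1]. It assumes the frame arrives as a flat byte buffer in row-major height × width × channels order, which is how Malmo video frames are commonly read (via their `pixels`, `width`, `height` and `channels` attributes). The function name, the grayscale weights and the stride-based downsampling are illustrative choices, not necessarily what this project does.

```python
import numpy as np


def preprocess_frame(pixels, width, height, channels=3, stride=2):
    """Turn a raw frame buffer into a downsampled grayscale array in [0, 1].

    `pixels` is assumed to be a flat bytes-like buffer in row-major
    (height, width, channels) order.
    """
    frame = np.frombuffer(bytes(pixels), dtype=np.uint8)
    frame = frame.reshape(height, width, channels)
    rgb = frame[..., :3].astype(np.float32)  # drop any extra (e.g. depth) channel
    # ITU-R BT.601 luma weights for RGB -> grayscale.
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    small = gray[::stride, ::stride]  # keep every `stride`-th row and column
    return small / 255.0
```

Shrinking and graying the frame reduces the input size for the network while keeping the layout of walls and fire visible, at the cost of discarding color cues.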

The environment that we designed was a maze variant that forced the agent to rely on its visual input. The agent’s goal was to touch the tower of emerald blocks as quickly as possible without dying. There were two obstacles to this goal. First, small walls were scattered around the environment; these walls stopped the agent from moving in a given direction while still leaving it a line of sight to the target. Second, fires were generated throughout the environment and spread as the episode ran, creating a dynamic environment. The reward system was set up so that the agent received a large positive reward if it reached the tower, and a large negative reward if the episode timed out or it walked into fire. In addition, the agent received a small negative reward at every step to incentivize finding the target faster.
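The reward scheme can be summarized as a per-step function. In this minimal sketch the function name `step_reward` and the reward magnitudes are hypothetical (the README does not state the actual values), and the per-step penalty is also applied on the final step. In Malmo itself, such rewards are typically declared in the mission XML rather than computed in Python.

```python
def step_reward(reached_goal, touched_fire, timed_out,
                goal_reward=100.0, failure_reward=-100.0, step_penalty=-1.0):
    """Return (reward, episode_done) for a single step.

    The default magnitudes are placeholders, not the project's actual values.
    """
    # Every step costs a little, which rewards reaching the tower sooner.
    reward = step_penalty
    if reached_goal:
        return reward + goal_reward, True
    if touched_fire or timed_out:
        return reward + failure_reward, True
    return reward, False
```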

Running the code

To run the code, first make sure you are in the correct conda environment:

$ conda activate minerl

$ which python
/home/drkostas/anaconda3/envs/minerl/bin/python

Minecraft

Run the following command to start the Minecraft client:

cd MalmoPlatform/Minecraft
./launchClient.sh -port 9000
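Starting training before the client has finished loading is likely to fail to connect, so it can be useful to wait until the client's port accepts connections. The sketch below polls a TCP port using only the standard library; `wait_for_port` is a hypothetical helper, and it assumes that briefly opening and closing a connection to the Malmo client port is harmless.

```python
import socket
import time


def wait_for_port(port=9000, host="127.0.0.1", timeout=300.0, interval=2.0):
    """Poll until something accepts TCP connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Success means the client is listening; close the probe at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port(9000)` could be run before launching the training script below.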

RL Agent

The missions are placed in the missions folder and the configuration files in the configs folder. To train the agent, run the following command:

python train_agent.py -c configs/mazes.yml

The results are going to be saved in the checkpoints and logs folders.

An example of what the agent sees:
What the agent sees

The maze outline:
Maze outline

Results

The PPO model showed promising levels of learning under the right conditions. Each model was trained for 50 epochs, which took approximately 5 hours. Our initial agent interface allowed the agent to take six different actions: move forward, move backward, move left, move right, turn left, and turn right. Early tests with this action space were mostly successful; however, the model rarely turned to observe its surroundings. Instead, it used the image to avoid obvious obstacles in front of it while moving in semi-random directions until it reached the pillar. This behavior would likely be corrected with much longer training, but turning offered only potential, delayed rewards, which discouraged its regular use. To force the model to rely more on the image, we restricted the action space to four actions: move forward, move backward, turn right, and turn left. The model is therefore required to turn in order to move in a specific direction. This sped up learning and led to the rewards shown in the figure below:
AVG Reward - No PCA

Training with the modified action space shows clear improvement over time. However, there is a brief but significant dip that appears during epochs 30 to 32. While it is unclear what caused this dip, it recovered quickly and continued the improvement trend. After 50 training epochs the average reward is consistently close to the maximum allowed by the environment. This shows that the agent has not only learned to move around the walls in an efficient manner, but learned to avoid walking into areas where fire is spreading.
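The difference between the six- and four-action spaces described above can be made concrete as lists of discrete commands. In this sketch the command strings follow Malmo's movement commands (`move`, `strafe`, `turn`), but the exact strings, their order, and the `action_to_command` helper are assumptions rather than this project's code.

```python
# Hypothetical mapping from discrete action indices to Malmo-style commands.
SIX_ACTIONS = [
    "move 1",     # move forward
    "move -1",    # move backward
    "strafe -1",  # move left
    "strafe 1",   # move right
    "turn -1",    # turn left
    "turn 1",     # turn right
]

# Restricted space: no strafing, so changing direction requires turning.
FOUR_ACTIONS = ["move 1", "move -1", "turn 1", "turn -1"]


def action_to_command(action_index, action_space=FOUR_ACTIONS):
    """Translate a policy's discrete action index into a command string."""
    if not 0 <= action_index < len(action_space):
        raise ValueError("action index out of range: {}".format(action_index))
    return action_space[action_index]
```

A sampled action index would then be translated with `action_to_command(a)` and sent to the client, for example through Malmo's `AgentHost.sendCommand`. Dropping the two `strafe` commands is what forces the agent to turn, and therefore look, before it can change direction.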

Using PCA to reduce the dimensionality of the image interfered with the model’s ability to train. Figure 6 shows that the model makes no improvement over the 50 training epochs. It is likely that while PCA retains a significant amount of the information, the reduction removes patterns that the agent uses to learn. It is possible that applying PCA in a different manner, or using a different dimensionality reduction technique, could lead to better results.

Figure 6:
AVG Reward - With PCA
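For reference, PCA on image frames can be sketched with NumPy alone: flatten each frame, center the data, take an SVD, and keep the top-k principal directions. The helper names below are hypothetical and this is not necessarily how the project applied PCA. `choose_k` shows one way to tune the amount of reduction: keep the fewest components whose cumulative explained-variance ratio reaches a target.

```python
import numpy as np


def _flatten(frames):
    """Stack frames into an (n_samples, n_features) float array."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames.reshape(frames.shape[0], -1)


def fit_pca(frames):
    """Return the mean, principal directions and explained-variance ratios."""
    X = _flatten(frames)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratios = s ** 2 / np.sum(s ** 2)
    return mean, vt, ratios


def choose_k(ratios, target=0.95):
    """Smallest number of components whose cumulative ratio reaches `target`."""
    k = int(np.searchsorted(np.cumsum(ratios), target)) + 1
    return min(k, len(ratios))


def project(frames, mean, vt, k):
    """Project frames onto the top-k principal directions."""
    return (_flatten(frames) - mean) @ vt[:k].T


def reconstruct(codes, mean, vt):
    """Map k-dimensional codes back to flattened frames (lossy if k is small)."""
    k = codes.shape[1]
    return codes @ vt[:k] + mean
```

Because PCA keeps the directions of highest pixel variance, a high explained-variance ratio does not guarantee that small but decision-relevant details, such as a thin wall or an early fire, survive the projection.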

Conclusion

The PPO model showed a promising ability to complete tasks in a simple dynamic environment using only visual input, processed by a convolutional network. When the raw frames are fed in directly, the model learns at a mostly consistent rate until it finds the optimal behavior. This model could be tested further by making the environment more complex: the map could be expanded, or sight lines could be restricted, requiring a more complex and comprehensive decision-making process.

Other dynamic variables could also be introduced, such as hostile non-player characters. The attempt to use dimensionality reduction to reduce training time failed with the current method. PCA appeared promising because it is faster than many other reduction techniques and allows the amount of data reduction to be tuned. However, it appears to fail to capture the patterns the model requires in order to learn. Future work could try different dimensionality reduction techniques or improve the application of PCA through tuning.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
