

agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.

The algorithm is wrapped into a top-level structure (AZ for AlphaZero). The algorithm applies to any game able to fulfill a specified contract.

The contract specifies how a game state is described and which interactions it must support.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language

  • In the agogo package, each player of the game is an Agent, and in a game, two Agents play in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm: it describes a game's behavior rather than what a game is. The behavior is expressed as a set of functions operating on a State of the game. A State is an interface that represents the current game position as well as the allowed interactions. Interactions are made by a Player performing a PlayerMove. It is the implementer's responsibility to encode the game's rules by creating an object that fulfills the State contract and implements the allowed moves (see the sketch after this list).
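
To make the contract concrete, below is a minimal, self-contained sketch of the shape it takes. This is illustrative only: the authoritative definition is the State interface in the game package. Board, ToMove, Apply and PlayerMove appear in the examples further down; the Ended method and the exact field and element types are assumptions.

package sketch

// Colour is a stone or mark on the board (illustrative).
type Colour int8

// Player identifies a participant in the game (illustrative).
type Player Colour

// PlayerMove pairs the acting Player with a single move index,
// mirroring how game.PlayerMove is used in the inference example below.
type PlayerMove struct {
	Player Player
	Single int32
}

// State sketches the game contract: the implementer encodes the rules
// of a specific game by fulfilling it.
type State interface {
	Board() []Colour          // the current position as a flat slice
	ToMove() Player           // the player whose turn it is
	Apply(m PlayerMove) State // play a move and return the resulting state
	Ended() (bool, Player)    // whether the game ended and who won (assumed)
}

Fulfilling such a contract is all a game needs to do for the AZ structure to train on it.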

Training process

Applying the algorithm to a game

This package is designed to be extensible. You can therefore train AlphaZero on any board game that respects the contract of the game package. The trained model can then be saved and used as a player.

The steps to train the algorithm are as follows (a condensed sketch follows the list):

  • Creating a structure that fulfills the State interface (i.e., a game);
  • Creating a configuration for your AZ's internal MCTS and NN;
  • Creating an AZ structure based on the game and the configuration;
  • Executing the learning process (by calling the Learn method);
  • Saving the trained model (by calling the Save method).
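
In code, these steps map onto the API roughly as follows; this is a condensed sketch of the full Training example below, where the configuration and the Learn arguments are spelled out.

g := mnk.TicTacToe()                             // 1. a game fulfilling the State interface
conf := agogo.Config{ /* MCTS + NN settings */ } // 2. the configuration
a := agogo.New(g, conf)                          // 3. the AZ structure
if err := a.Learn(5, 50, 100, 100); err != nil { // 4. the learning process
	log.Println(err)
}
a.Save("example.model")                          // 5. save the trained model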

The steps to play against the algorithm are as follows (a sketch of the last step follows the list):

  • Creating an AZ object;
  • Loading the trained model (by calling the Load method, as in the inference example below);
  • Switching the agent to inference mode (via the SwitchToInference method);
  • Getting the AI's move by calling the Search method, and applying that move to the game manually.
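
Concretely, the last step looks like this. It is a fragment continuing from the setup shown in the full Inference example below; the names a.B (the agent playing Nought), g (the game) and state (the current game.State) are taken from or assumed around that example.

move := a.B.Search(state)        // ask the agent for its move
state = g.Apply(game.PlayerMove{ // the caller applies the move manually
	Player: a.B.Player,
	Single: move,
})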

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k game with m=n=k=3: it is played on a 3×3 board, and the first player to line up 3 of their marks wins.

Training

Here is sample code that trains the algorithm to play the game. The resulting model is saved in a file named example.model.

package main

import (
	"log"
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It flattens the board into a []float32 and appends a layer indicating whose turn it is.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001 // avoid exact zeros in the input
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	// Create the configuration of the neural network.
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board and increase features (that also allows you to increase K)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	// Create a new game.
	g := mnk.TicTacToe()
	// Create the AlphaZero structure.
	a := agogo.New(g, conf)
	// Launch the learning process: 5 epochs, 50 episodes, 100 NN iterations, 100 games.
	err := a.Learn(5, 50, 100, 100)
	if err != nil {
		log.Println(err)
	}
	// Save the model.
	a.Save("example.model")
}

Inference

package main

import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is the same GameEncoder used in the training example above.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	// Load the model trained in the previous example.
	a.Load("example.model")
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	// Put both agents in inference mode.
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)
	// Put X in the center.
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// Ask the agent playing O for its move.
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	// Apply the move to the game manually.
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}
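
Note that Search only returns a move; applying it to the game is the caller's responsibility, which keeps the agent decoupled from the rules of any particular game.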

Misc

A Funny Thing Happened On The Way To Reimplementing AlphaGo - a talk by @chewxy (one of the authors) about this specific implementation.

Credits

Original implementation credits to
