Language: Python
License: Apache License 2.0


πŸ‹ Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2


Deep Reinforcement Learning in TensorFlow2

DeepRL-TensorFlow2 is a repository that implements a variety of popular Deep Reinforcement Learning algorithms using TensorFlow2. Its main goal is code that is easy to understand, so if you are a student or researcher studying Deep Reinforcement Learning, I think this repository is a good place to study. Each algorithm is contained in a single Python script, so you don't have to jump between files to study a specific algorithm. The repository is being updated, and more Deep Reinforcement Learning algorithms will be added.

Algorithms


DQN

Paper: Playing Atari with Deep Reinforcement Learning
Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

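import random
from collections import deque

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam

# Hyperparameters (lr, batch_size, ...) are read from the script's parsed command-line `args`.
# idea01. Approximate the Q-function with a neural network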
def create_model(self):
    model = tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model

# idea02. Use a separate target network (a copy of the Q-network) to compute stable TD targets
self.target_model = ActionStateModel(self.state_dim, self.action_dim)
 
# idea03. Use a replay buffer to reuse past transitions (data efficiency) and break correlations between consecutive samples
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    
    def put(self, state, action, reward, next_state, done):
        self.buffer.append([state, action, reward, next_state, done])
    
    def sample(self):
        sample = random.sample(self.buffer, args.batch_size)
        states, actions, rewards, next_states, done = map(np.asarray, zip(*sample))
        states = np.array(states).reshape(args.batch_size, -1)
        next_states = np.array(next_states).reshape(args.batch_size, -1)
        return states, actions, rewards, next_states, done
    
    def size(self):
        return len(self.buffer)
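The three ideas meet when the regression targets for a sampled batch are built: the target network supplies the bootstrapped value and the online network's own predictions are kept for the actions that were not taken. A minimal NumPy sketch, assuming both networks' predictions for the batch are already available as arrays; `dqn_targets` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def dqn_targets(q_values, next_q_target, actions, rewards, dones, gamma=0.99):
    """Build DQN regression targets for a batch of transitions.

    q_values:      online-network Q(s, .) for the batch, shape (B, A)
    next_q_target: target-network Q(s', .) for the batch, shape (B, A)
    """
    # Start from the online predictions so untaken actions contribute zero MSE error
    targets = q_values.copy()
    batch = np.arange(len(actions))
    # y = r + gamma * (1 - done) * max_a' Q_target(s', a')
    max_next = next_q_target.max(axis=1)
    targets[batch, actions] = rewards + (1.0 - dones) * gamma * max_next
    return targets


q = np.array([[1.0, 2.0], [0.5, 0.0]])
q_next = np.array([[3.0, 4.0], [1.0, 2.0]])
t = dqn_targets(q, q_next, actions=np.array([0, 1]),
                rewards=np.array([1.0, 0.0]), dones=np.array([0.0, 1.0]), gamma=0.5)
print(t)
```

The second transition is terminal, so its target is just the reward and no bootstrapped value is added.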

Getting Started

# Discrete Action Space Deep Q-Learning
$ python DQN/DQN_Discrete.py

DRQN

Paper: Deep Recurrent Q-Learning for Partially Observable MDPs
Authors: Matthew Hausknecht, Peter Stone
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

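import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense

# idea01. Use an LSTM layer to encode a window of recent states as features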
def create_model(self):
    return tf.keras.Sequential([
        Input((args.time_steps, self.state_dim)),
        LSTM(32, activation='tanh'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
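Because the model's input has shape `(time_steps, state_dim)`, the agent has to keep a rolling window of its most recent observations. A minimal sketch of such a window; the `StateHistory` class is hypothetical, and padding the first window with copies of the initial observation is an assumption (zero-padding is another common choice).

```python
from collections import deque

import numpy as np


class StateHistory:
    """Keeps the last `time_steps` observations as one (time_steps, state_dim) array."""

    def __init__(self, time_steps):
        self.frames = deque(maxlen=time_steps)
        self.time_steps = time_steps

    def reset(self, first_state):
        # Assumption: fill the window with copies of the first observation of the episode
        self.frames.clear()
        for _ in range(self.time_steps):
            self.frames.append(np.asarray(first_state, dtype=np.float32))
        return self.get()

    def push(self, state):
        # deque(maxlen=...) drops the oldest observation automatically
        self.frames.append(np.asarray(state, dtype=np.float32))
        return self.get()

    def get(self):
        return np.stack(self.frames)  # oldest first, newest last


history = StateHistory(time_steps=3)
history.reset([0.0, 0.0])
window = history.push([1.0, 1.0])
batch_input = window[np.newaxis]  # add a batch axis before calling model.predict
```

Here `window` has shape `(3, 2)` and `batch_input` has shape `(1, 3, 2)`, which matches `Input((args.time_steps, self.state_dim))` for a batch of one.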

Getting Started

# Discrete Action Space Deep Recurrent Q-Learning
$ python DRQN/DRQN_Discrete.py

DoubleDQN

Paper: Deep Reinforcement Learning with Double Q-learning
Authors: Hado van Hasselt, Arthur Guez, David Silver
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

import numpy as np

# idea01. Reduce the overestimation bias of Q-learning:
# the online network selects the next action and the target network evaluates it
on_action = np.argmax(self.model.predict(next_states), axis=1)
next_q_values = self.target_model.predict(next_states)[range(args.batch_size), on_action]
targets[range(args.batch_size), actions] = rewards + (1-done) * next_q_values * args.gamma
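The effect of decoupling action selection from action evaluation can be seen on a toy batch. This NumPy sketch uses made-up Q-values and compares the standard DQN target with the Double DQN target computed the same way as the snippet above.

```python
import numpy as np

rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 0.0])
gamma = 0.9
q_online_next = np.array([[2.0, 5.0], [3.0, 1.0]])  # online network Q(s', .)
q_target_next = np.array([[4.0, 1.0], [2.0, 6.0]])  # target network Q(s', .)
batch = np.arange(len(rewards))

# Standard DQN: the target network both selects and evaluates the next action (max)
dqn_next = q_target_next.max(axis=1)

# Double DQN: the online network selects, the target network evaluates
on_action = np.argmax(q_online_next, axis=1)
double_next = q_target_next[batch, on_action]

dqn_y = rewards + (1 - dones) * gamma * dqn_next
double_y = rewards + (1 - dones) * gamma * double_next
print(dqn_y, double_y)
```

Taking the max over the same noisy estimates that are being evaluated always picks the largest value, which is the source of the upward bias; the Double DQN target is never larger than the standard one for the same arrays.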

Getting Started

# Discrete Action Space Double Deep Q-Learning
$ python DoubleDQN/DoubleDQN_Discrete.py

DuelingDQN

Paper: Dueling Network Architectures for Deep Reinforcement Learning
Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Discrete only

Core Ideas

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.optimizers import Adam

# idea01. Split the Q-function into a value function and an advantage function
def create_model(self):
    state_input = Input((self.state_dim,))
    backbone_1 = Dense(32, activation='relu')(state_input)
    backbone_2 = Dense(16, activation='relu')(backbone_1)
    value_output = Dense(1)(backbone_2)                    # V(s): one scalar per state
    advantage_output = Dense(self.action_dim)(backbone_2)  # A(s, a): one value per action
    # Q(s, a) = V(s) + A(s, a); V is added to every action's advantage.
    # (The paper subtracts the mean advantage first; this version adds them directly.)
    output = Add()([value_output, advantage_output])
    model = tf.keras.Model(state_input, output)
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model
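The Dueling DQN paper combines the two streams as Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')), while the snippet above adds them directly. A minimal NumPy sketch contrasting the two aggregations; `dueling_q` is a hypothetical helper used only for illustration.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine V(s) of shape (B, 1) with A(s, .) of shape (B, A)."""
    naive = value + advantage  # plain sum, as a bare Add() layer computes
    # Mean-subtracted aggregation described in the Dueling DQN paper
    centered = value + (advantage - advantage.mean(axis=1, keepdims=True))
    return naive, centered


v = np.array([[10.0]])
a = np.array([[1.0, 3.0]])
naive, centered = dueling_q(v, a)

# Moving 1.0 from A into V leaves the plain sum unchanged, so V and A
# cannot be told apart from Q alone; the centered form pins V to mean_a Q.
naive_shifted, centered_shifted = dueling_q(v + 1.0, a - 1.0)
print(naive, centered, np.allclose(naive, naive_shifted))
```

Both forms rank actions identically (the mean term is the same for every action), so greedy action selection is unaffected; the mean subtraction only makes the value/advantage split well defined.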

Getting Started

# Discrete Action Space Dueling Deep Q-Learning
$ python DuelingDQN/DuelingDQN_Discrete.py

A2C

Paper: Actor-Critic Algorithms
Authors: Vijay R. Konda, John N. Tsitsiklis
Method: ON-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

# idea01. Use the advantage (TD target minus the critic's baseline V(s)) to reduce variance
def advantage(self, td_targets, baselines):
    return td_targets - baselines
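The `td_targets` passed in above are typically one-step bootstrapped returns from the critic. A minimal NumPy sketch, assuming the critic's values for the current and next states are already available as arrays; `compute_td_targets` is a hypothetical helper, not taken from the repository.

```python
import numpy as np


def compute_td_targets(rewards, next_values, dones, gamma=0.99):
    # One-step bootstrapped return: r + gamma * V(s'), with no bootstrap at terminal steps
    return rewards + gamma * (1.0 - dones) * next_values


def advantage(td_targets, baselines):
    # How much better the outcome was than the critic expected in state s
    return td_targets - baselines


rewards = np.array([1.0, 1.0, 0.0])
values = np.array([2.0, 3.0, 1.0])       # critic's V(s)
next_values = np.array([3.0, 1.0, 0.0])  # critic's V(s')
dones = np.array([0.0, 0.0, 1.0])

targets = compute_td_targets(rewards, next_values, dones, gamma=0.5)
adv = advantage(targets, values)
print(targets, adv)
```

Subtracting the state-dependent baseline V(s) does not change the expected policy gradient, but it centers the signal that scales each log-probability gradient, which is why it lowers variance.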

Getting Started

# Discrete Action Space Advantage Actor-Critic
$ python A2C/A2C_Discrete.py

# Continuous Action Space Advantage Actor-Critic
$ python A2C/A2C_Continuous.py

A3C

Paper: Asynchronous Methods for Deep Reinforcement Learning
Authors: Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
Method: ON-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

import gym
import tensorflow as tf

# idea01. Reduce the correlation between samples by running multiple workers asynchronously
def train(self, max_episodes=1000):
    workers = []

    for i in range(self.num_workers):
        env = gym.make(self.env_name)
        workers.append(WorkerAgent(
            env, self.global_actor, self.global_critic, max_episodes))

    for worker in workers:
        worker.start()

    for worker in workers:
        worker.join()

# idea02. Encourage exploration with an entropy bonus
entropy_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
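The entropy bonus rewards policies that keep probability spread over several actions; it is typically scaled by a small coefficient and subtracted from the actor loss. A minimal NumPy sketch of the quantity involved. The cross-entropy of a distribution with itself equals its entropy, H(p, p) = H(p), which is one way a cross-entropy loss object like the one above can yield an entropy term; whether `from_logits=True` fits depends on whether the actor outputs logits or softmax probabilities. The helper names here are hypothetical.

```python
import numpy as np


def categorical_entropy(probs, eps=1e-8):
    # H(p) = -sum_a p(a) * log p(a), computed along the last axis
    probs = np.asarray(probs, dtype=np.float64)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def cross_entropy(p, q, eps=1e-8):
    # H(p, q) = -sum_a p(a) * log q(a)
    return -np.sum(np.asarray(p) * np.log(np.asarray(q) + eps), axis=-1)


uniform = categorical_entropy([0.5, 0.5])  # maximal for two actions: log(2)
peaked = categorical_entropy([0.99, 0.01])  # close to zero: nearly deterministic
p = np.array([0.7, 0.2, 0.1])
print(uniform, peaked, np.isclose(cross_entropy(p, p), categorical_entropy(p)))
```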

Getting Started

# Discrete Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Discrete.py

# Continuous Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Continuous.py

PPO

Paper: Proximal Policy Optimization Algorithms
Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Method: ON-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

Core Ideas

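import tensorflow as tf

# idea01. Use an importance-sampling ratio pi_new / pi_old so that data collected
#         by the old policy can be reused for several update epochs
# idea02. Clip the ratio to prevent overly large policy updates
def compute_loss(self, old_policy, new_policy, actions, gaes):
    # old_policy / new_policy: action probabilities, shape (batch, action_dim)
    # actions: one-hot encoded taken actions, shape (batch, action_dim)
    gaes = tf.reshape(tf.stop_gradient(gaes), [-1])
    # Log-probability of the taken action, one value per sample (sum over axis=1)
    old_log_p = tf.math.log(
        tf.reduce_sum(old_policy * actions, axis=1))
    old_log_p = tf.stop_gradient(old_log_p)
    log_p = tf.math.log(tf.reduce_sum(
        new_policy * actions, axis=1))
    ratio = tf.math.exp(log_p - old_log_p)
    clipped_ratio = tf.clip_by_value(
        ratio, 1 - args.clip_ratio, 1 + args.clip_ratio)
    # Pessimistic (minimum) surrogate, negated because optimizers minimize
    surrogate = -tf.minimum(ratio * gaes, clipped_ratio * gaes)
    return tf.reduce_mean(surrogate)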
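`compute_loss` expects precomputed `gaes`. One common way to obtain them is Generalized Advantage Estimation, which blends multi-step TD errors with a decay factor lambda. A minimal NumPy sketch of the standard backward recursion over one rollout; `compute_gae` is hypothetical, and the repository's own computation (for example, whether it normalizes the advantages) may differ.

```python
import numpy as np


def compute_gae(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    rewards, values, dones: arrays of length T collected with the old policy
    next_value: critic's V(s_T) for the state after the last step
    Returns (advantages, returns) where returns = advantages + values.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        v_next = next_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * not_done * v_next - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, cut off at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages, advantages + values


advantages, returns = compute_gae(
    rewards=np.array([1.0, 1.0]), values=np.array([0.0, 0.0]), next_value=0.0,
    dones=np.array([0.0, 0.0]), gamma=1.0, lam=1.0)
one_step, _ = compute_gae(
    rewards=np.array([1.0, 1.0]), values=np.array([0.0, 0.0]), next_value=0.0,
    dones=np.array([0.0, 0.0]), gamma=1.0, lam=0.0)
```

With `lam=1.0` each advantage is the full remaining return minus the baseline (`[2.0, 1.0]` here); with `lam=0.0` it reduces to the one-step TD error used in A2C (`[1.0, 1.0]`).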

Getting Started

# Discrete Action Space Proximal Policy Optimization
$ python PPO/PPO_Discrete.py

# Continuous Action Space Proximal Policy Optimization
$ python PPO/PPO_Continuous.py

DDPG

Paper: Continuous control with deep reinforcement learning
Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Continuous

Core Ideas

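import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda

# idea01. Use a deterministic actor: tanh keeps outputs in [-1, 1], Lambda scales them to the action bound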
def create_model(self):
    return tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(32, activation='relu'),
        Dense(self.action_dim, activation='tanh'),
        Lambda(lambda x: x * self.action_bound)
    ])

# idea02. Add exploration noise to the action, then clip it back into the valid range
action = np.clip(action + noise, -self.action_bound, self.action_bound)
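Because the actor is deterministic, exploration has to come from the noise added above. The DDPG paper used temporally correlated Ornstein-Uhlenbeck noise (theta = 0.15, sigma = 0.2); the sketch below assumes that process, and the `OUNoise` class and its `dt` step size are hypothetical choices rather than the repository's code. Plain Gaussian noise is a common, simpler alternative.

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process for continuous-action exploration.

    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu = mu * np.ones(action_dim)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Call at the start of each episode so noise does not carry over
        self.x = np.copy(self.mu)

    def sample(self):
        # Mean reversion toward mu plus a Gaussian increment
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x.copy()


action_bound = 2.0
noise = OUNoise(action_dim=1, seed=0)
action = np.array([1.95])  # e.g. the actor's output for one state
noisy = np.clip(action + noise.sample(), -action_bound, action_bound)
```

Consecutive samples are correlated, so the agent keeps pushing in a similar direction for several steps, which the DDPG authors found useful in physical control tasks with inertia.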

Getting Started

# Continuous Action Space Deep Deterministic Policy Gradient
$ python DDPG/DDPG_Continuous.py

TRPO

Paper: Trust Region Policy Optimization
Authors: John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
Method: ON-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

# NOTE: Not yet implemented!

TD3

Paper: Addressing Function Approximation Error in Actor-Critic Methods
Authors: Scott Fujimoto, Herke van Hoof, David Meger
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Continuous

# NOTE: Not yet implemented!

SAC

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Authors: Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Method: OFF-Policy / Temporal-Difference / Model-Free
Action: Discrete, Continuous

# NOTE: Not yet implemented!
