Reinforcement Learning
Reinforcement learning examples applied to a variety of environments (currently being ported to PyTorch)
Here is my new Repo for Policy Gradient!!
[Breakout / Use DQN(Nature2015)]
1. Q-Learning / SARSA
- FrozenLake(Gridworld)
- WindyGridWorld(in Sutton's book)
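
For orientation, here is a minimal sketch of the two tabular update rules named in this section. It is not taken from the repository's code: the function names, the `(state, action)` dictionary layout, and the `alpha`/`gamma` defaults are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy update: bootstrap from the action the agent actually takes next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```

The only difference between the two is the bootstrap term: Q-Learning uses the maximum over next actions, while SARSA uses the value of the next action chosen by the current policy.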
2. Q-Network (Action-Value Function Approximation)
3. DQN
DQN(NIPS2013) uses (Experience Replay Memory / CNN).
- CartPole(Classic Control) - for CartPole, no CNN is used; the agent learns from the sensor readings instead
DQN(Nature2015) uses (Experience Replay Memory / Target Network / CNN).
- CartPole(Classic Control)
- Breakout(atari)
- Breakout(atari)
- this version is written in PyTorch and is more efficient in memory use and training
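
The two DQN ingredients listed above, Experience Replay Memory and the Target Network, can be sketched roughly as follows. This is an illustrative, framework-free sketch and not the repository's implementation: `ReplayMemory`, `td_target`, and the plain list of next-state Q-values are hypothetical names and simplifications.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrapped target; next_q_values is assumed to come from the target network."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


memory = ReplayMemory(capacity=10000)
memory.push(state=[0.0, 0.1], action=1, reward=1.0, next_state=[0.0, 0.2], done=False)
```

In a PyTorch version, the target network is typically a second copy of the Q-network whose weights are refreshed every fixed number of steps with `target_net.load_state_dict(online_net.state_dict())`; the NIPS2013 variant has no such copy and computes the target from the online network itself.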
5. Vanilla Policy Gradient(REINFORCE)
6. Advantage Actor Critic
- episodic
- one-step
- n-step
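
The three variants above differ mainly in how far the critic's return target looks ahead before bootstrapping. The sketch below shows one common way to compute such targets; `n_step_returns` and its arguments are hypothetical names for illustration, not the repository's code.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted return targets for a rollout, computed backwards.

    rewards: rewards collected over the rollout, in time order.
    bootstrap_value: the critic's estimate V(s) for the state after the
        last reward, or 0.0 if the episode ended there.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# episodic: the whole episode, nothing left to bootstrap from
episodic = n_step_returns([1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5)
# one-step: a single reward plus the critic's estimate of the next state
one_step = n_step_returns([1.0], bootstrap_value=2.0, gamma=0.5)
```

With a rollout of n rewards, the first step receives a full n-step target and later steps receive shorter ones. The advantage for each step is then the target minus the critic's value for that step's state.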
7. Deep Deterministic Policy Gradient
8. Parallel Advantage Actor Critic (called 'A2C' by OpenAI)
- CartPole(Classic Control) (uses a single thread instead of multiple threads)
- CartPole(Classic Control) (uses multiprocessing in PyTorch)
- Super Mario Bros (uses multiprocessing in PyTorch)