"Reinforcement Learning" (1) Curriculum

Subject: Reinforcement Learning

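I started a doctoral program at the age of forty, driven by nothing more than admiration for artificial intelligence and without the background knowledge said to be required for studying it (Python, artificial intelligence, machine learning, statistics/mathematics). I also work as a researcher in a company's AI organization, where I carry out projects and develop services.

In the “major in” menu of this blog, I will organize by subject and share the knowledge and experience gained along the way from the materials and books I reviewed to reach my goals, from YouTube lectures, and from graduate coursework.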

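I do so because an office worker's anxiety, which comes from abilities that fall short of a changing reality and from worry about an unknowable future, can be fundamentally cured only by self-study, physical fitness, and the challenges taken on with them,

and because, for office workers short on time, growth is possible only together with people who value mutual benefit and time.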

I hope this helps those who share the same concerns.

Approach (goal): optimal decision making in uncertain environments, and development of personalized services based on reinforcement learning (2021 ~ )

- Table of Contents -

Contents

Background

MDP
Bellman Expectation Equation
Q function
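
For reference, the three background items above can be written out as follows. This is a sketch in common textbook notation, which is an assumption since the outline itself fixes no symbols: $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the MDP dynamics, and $\gamma \in [0, 1]$ the discount factor. The first line is the Bellman expectation equation for the state-value function, the second is the same equation for the Q function, and the third relates the two.

```latex
\begin{align}
v_\pi(s) &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr] \\
q_\pi(s, a) &= \sum_{s', r} p(s', r \mid s, a)\,\Bigl[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Bigr] \\
v_\pi(s) &= \sum_{a} \pi(a \mid s)\, q_\pi(s, a)
\end{align}
```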

Approaches to sequential decision making problems

Planning : model based & dynamic programming
Model free RL : no model , learning value function and/or policy from experience

Model free Reinforcement learning

Model Free prediction (evaluation)
Model Free control (improvement)

Value Function vs Policy

Value-based

Learned Value Function
Implicit policy (e.g. 𝜀-greedy)

Policy-based

No Value Function
Learned Policy

Actor-Critic

Learned Value Function
Learned Policy

Model based RL : learn a model from experience , plan value function and/or policy from model

Introduction to reinforcement learning

Model free control with Monte Carlo (MC)
Model free control with Temporal Difference (TD)

Dummy Q Learning - Sarsa

Learning Q(s, a)

discounted future reward

Future reward
Discounted future reward (environment is stochastic)
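
As a small illustration of the discounted future reward item above, the sketch below sums a reward sequence with a discount factor. The function name `discounted_return` and the default `gamma=0.9` are assumptions made for this example, not something the outline specifies.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a list of rewards."""
    total = 0.0
    # Walk backwards so each step applies one more factor of gamma
    # to everything that comes later in the sequence.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Hypothetical episode: three steps, each with reward 1.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With `gamma=1.0` this reduces to the plain (undiscounted) future reward; a smaller `gamma` weights later rewards less, which matters when the environment is stochastic and distant rewards are less certain.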

Q Learning

Select an action: "Exploit vs Exploration" * using the current values vs. adventure and challenge

Exploit vs Exploration (𝜀-greedy)
Exploit vs Exploration (decaying)
Exploit vs Exploration (add random noise)
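
The three action-selection variants listed above could look roughly like this for a tabular Q function. This is a minimal sketch under assumptions: `q_row` stands for the row of Q values for the current state, and the decay schedule `1 / (episode // 100 + 1)` and the noise scale `1 / (episode + 1)` are illustrative choices, not values fixed by this outline.

```python
import numpy as np


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    q_row = np.asarray(q_row, dtype=float)
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore
    return int(np.argmax(q_row))              # exploit


def decaying_epsilon(episode):
    """Assumed schedule: epsilon shrinks stepwise as episodes go by."""
    return 1.0 / (episode // 100 + 1)


def noisy_greedy(q_row, episode, rng):
    """Add Gaussian noise to the Q values, then act greedily on the result."""
    q_row = np.asarray(q_row, dtype=float)
    noise = rng.standard_normal(len(q_row)) / (episode + 1)
    return int(np.argmax(q_row + noise))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_row = [0.1, 0.5, 0.2]  # hypothetical Q values for one state
    print(epsilon_greedy(q_row, decaying_epsilon(0), rng))
    print(noisy_greedy(q_row, 1000, rng))
```

The noise variant differs from 𝜀-greedy in that exploration is still guided by the current Q values: an action with a clearly low value is rarely chosen, whereas 𝜀-greedy explores uniformly.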

Deterministic vs Stochastic * fixed outcome vs. probabilistic outcome

Deterministic
Stochastic (non-deterministic)

Learning incrementally : learning with Learning rate
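
Learning incrementally with a learning rate can be sketched as the usual tabular Q-learning update, where the old estimate is only partly replaced by the new target. The function name `q_update` and the defaults `alpha=0.1`, `gamma=0.9` are assumptions for illustration; terminal-state handling is left out to keep the sketch short.

```python
import numpy as np


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * target."""
    # Target: immediate reward plus the discounted best value of the next state.
    target = reward + gamma * np.max(Q[next_state])
    # Blend old estimate and target; alpha = 1 would overwrite the old value.
    Q[state, action] = (1 - alpha) * Q[state, action] + alpha * target
    return Q


if __name__ == "__main__":
    Q = np.zeros((2, 2))  # hypothetical table: 2 states x 2 actions
    q_update(Q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5)
    print(Q)
```

In a deterministic world `alpha=1.0` is enough, since every transition is reliable. In a stochastic world a smaller `alpha` keeps a single unlucky transition from overwriting what has been learned so far.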

Reinforcement learning algorithms

DQN (Deep Q-networks) [NIPS 2013] [Nature 2015] [code]
Gorila (Google Reinforcement Learning Architecture) DQN
D-DQN (Double DQN) [AAAI 2016]
Prioritized DQN (Prioritized Experience Replay) [ICLR 2016]
Dueling DQN (Dueling Network Architecture) [ICML 2016]
Multi-Step Learning [ICLR 2016]
Distributional Reinforcement Learning [ICML 2017]
Noisy Nets [ICLR 2018]
RAINBOW [AAAI 2018]

Policy Gradient

Overview
Policy Gradient Theorem
Proof of Policy Gradient Theorem

Improving algorithm performance

Policy Gradient Algorithms

VPG (Vanilla Policy Gradient Algorithm) [Sutton 2000]
REINFORCE (Monte-Carlo Policy gradient)
AC (Actor-Critic)
A3C (Asynchronous Advantage Actor-Critic) [ICML 2016] [paper|code]
A2C ((synchronous) Advantage Actor-Critic) [paper|code]
TRPO (Trust region policy optimization)
PPO (Proximal Policy Optimization)
ACKTR (Actor-Critic using Kronecker-factored Trust Region)

Exploration

Curiosity-driven Exploration by self-Supervised Prediction [2017 CVPRW]
Random Network Distillation [ICLR 2019]
Distributed Prioritized Experience Replay [ICLR 2018]
R2D2 (Recurrent Experience Replay in Distributed Reinforcement Learning) [ICLR 2019]
Never Give Up [ICLR 2020]
Agent 57 [2020]
Self-Supervised Exploration via Disagreement [2019 ICML]
AGAC (Adversarially Guided Actor-Critic) [ICLR 2021]
GAIL (Generative Adversarial Imitation Learning) [2016 NIPS]
Imitation learning

Type of Imitation learning

Behavioral Cloning
Direct Policy Learning
Inverse Reinforcement Learning

Generative Adversarial Imitation Learning (GAIL) [2016 NIPS]
Learning by Imitating Animals [RSS 2020]

2. Key papers

3. Practice (environment setup and review of trial and error)

OpenAI Gym

code review

Frozen lake
Cart Pole
halfcheetah bullet env

Environment setup

OS : Windows 10

Neural Network

PyTorch
TensorFlow

4. Reference

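Reference books

Deep Reinforcement Learning in Action (심층 강화 학습 인 액션), by Alex Zai and Brandon Brown, translated by 류광
Reinforcement Learning with Python and Keras: game AI you implement yourself (파이썬과 케라스로 배우는 강화 학습), by 이웅원, 양혁렬, 김건우, 이영무, 이의령
A Practical Introduction to Reinforcement Learning and Deep Reinforcement Learning with PyTorch (PyTorch 를 활용한 강화학습/심층 강화 학습 실전 입문), by 오이와 유타로, translated by 심효섭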
Machine Learning, Tom Mitchell, 1997

Helpful lectures

http://hunkim.github.io/ml/ Machine Learning/Deep Learning for Everyone lectures - Sung Kim (김성훈), HKUST
https://youtube.com/playlist?list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv Deep Learning for Everyone Season 2 [PyTorch]
2021 Reinforcement Learning lectures - Prof. 김유성, Sungkyunkwan University

Papers

Q-Learning : https://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf

Reference lectures

Season RL - Deep Reinforcement Learning

Video list (best watched after Season 1.)

Lecture 1: Course overview [video|lecture slides]
Lecture 2: Playing OpenAI GYM games [video|lecture slides]

Lab 2: Playing OpenAI GYM games [video|lab slides]

Lecture 3: Dummy Q-learning (table) [video|lecture slides]

Lab 3: Dummy Q-learning (table) [video|lab slides]

Lecture 4: Q-learning exploit&exploration and discounted reward [video|lecture slides]

Lab 4: Q-learning exploit&exploration and discounted reward [video|lab slides]

Lecture 5: Q-learning in non-deterministic world [video|lecture slides]

Lab 5: Q-learning in non-deterministic world [video|lab slides]
Lab 5-1: Q-learning web Demo [video]

Lecture 6: Q-Network [video|lecture slides]

Lab 6-1: Q Network for Frozen Lake [video|lab slides]
Lab 6-2: Q Network for Cart Pole [video|lab slides]

Lecture 7: DQN [video|lecture slides]

Lab 7-1: DQN 1 (NIPS 2013) [video|lab slides]
Lab 7-2: DQN 2 (Nature 2015) [video|lab slides]
Lab 7-3: DQN Cart Pole Demo [video]
Lab 7-4: DQN Simple Pacman Demo (How high a score can you get?) [video]
