Lecture 1: Introduction to Reinforcement Learning (work in progress)

2026-01-15

I wanted to study Reinforcement Learning, so I started watching the lecture series by Professor David Silver, which is often called the bible of RL.

2. About Reinforcement Learning

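RL is used across many different fields, and it sits at the intersection of them.
What makes RL different from other machine learning paradigms:
- There is no supervisor, only a reward signal.
- Feedback is delayed, not instantaneous.
- Time really matters (the data is sequential and not independent).
- The agent's actions affect the data it receives afterwards.
Examples of RL problems include: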
- Fly stunt manoeuvres in a helicopter
- Defeat the world champion at Backgammon
- Manage an investment portfolio
- Control a power station
- Make a humanoid robot walk
- Play many different Atari games better than humans
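The characteristics listed above can be seen in a toy agent-environment loop. The following is a minimal sketch, not taken from the lecture: `ToyEnvironment` (a hypothetical 1-D corridor), `run_episode`, and the `goal` parameter are illustrative names I made up. The only feedback is a scalar reward, that reward arrives only at the end (delayed feedback), each observation depends on the previous one (sequential data), and the agent's action determines what it observes next.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor: the agent starts at position 0 and the
    episode ends when it reaches `goal`. Reward is given only at the goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the chosen action changes
        # the next observation the agent will see
        self.position = max(0, self.position + action)
        done = self.position == self.goal
        # no supervisor telling the "correct" action, only a scalar reward,
        # and it is delayed until the goal is reached
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the sequence of (observation, action, reward)."""
    observation = env.position
    trajectory = []
    for _ in range(max_steps):
        action = policy(observation)
        next_observation, reward, done = env.step(action)
        trajectory.append((observation, action, reward))
        observation = next_observation  # next data point depends on the last one
        if done:
            break
    return trajectory


if __name__ == "__main__":
    # An "always move right" policy reaches the goal in three steps.
    print(run_episode(ToyEnvironment(goal=3), lambda obs: 1))
    # A random policy produces a different data stream from the same environment.
    rng = random.Random(0)
    print(run_episode(ToyEnvironment(goal=3), lambda obs: rng.choice([-1, 1])))
```

Running the "always move right" policy gives `[(0, 1, 0.0), (1, 1, 0.0), (2, 1, 1.0)]`: the first two steps earn nothing, and the reward for those moves only shows up at the third step.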

3. The Reinforcement Learning Problem

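A reward $R_{t}$ is a scalar feedback signal that indicates how well the agent is doing at step $t$.
The agent's job is to maximise cumulative reward.
RL is based on the reward hypothesis: all goals can be described by the maximisation of expected cumulative reward.
Examples of rewards: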
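To make "cumulative reward" concrete, here is a minimal sketch that assumes an episode ending at step $T$ and simply adds up the rewards, $R_1 + R_2 + \dots + R_T$. The two reward sequences are invented for illustration. Judged by immediate reward, the first one looks better. Judged by total reward, the second one wins.

```python
def cumulative_reward(rewards):
    """Sum of the scalar rewards R_1, ..., R_T collected over one episode."""
    return sum(rewards)


# Hypothetical reward sequences, for illustration only.
greedy = [1.0, 0.0, 0.0, 0.0]   # small reward right away, nothing afterwards
patient = [0.0, 0.0, 0.0, 5.0]  # nothing at first, larger reward at the end

if __name__ == "__main__":
    print(cumulative_reward(greedy))   # 1.0
    print(cumulative_reward(patient))  # 5.0
```

An agent that maximises cumulative reward would therefore prefer the second sequence, even though its first reward is lower.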
- Fly stunt manoeuvres in a helicopter
  - +ve reward for following desired trajectory
  - −ve reward for crashing
- Defeat the world champion at Backgammon
  - +/−ve reward for winning/losing a game
- Manage an investment portfolio
  - +ve reward for each $ in bank
- Control a power station
  - +ve reward for producing power
  - −ve reward for exceeding safety thresholds
- Make a humanoid robot walk
  - +ve reward for forward motion
  - −ve reward for falling over
- Play many different Atari games better than humans
  - +/−ve reward for increasing/decreasing score
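The bullets above describe rewards only in words. As one way to turn them into code, here is a hypothetical reward function for the "make a humanoid robot walk" case. The `WalkerState` fields and the `FALL_PENALTY` magnitude are my own assumptions and do not come from the lecture. Forward progress earns positive reward, and falling over subtracts a fixed penalty.

```python
from dataclasses import dataclass


@dataclass
class WalkerState:
    x_position: float  # hypothetical: torso position along the walking direction (m)
    has_fallen: bool   # hypothetical: True once the robot has fallen over


FALL_PENALTY = 10.0  # assumed magnitude, chosen only for illustration


def walking_reward(prev: WalkerState, curr: WalkerState) -> float:
    """Scalar reward for one transition prev -> curr."""
    # +ve reward for forward motion (and -ve if the robot moves backwards)
    reward = curr.x_position - prev.x_position
    # -ve reward for falling over
    if curr.has_fallen:
        reward -= FALL_PENALTY
    return reward


if __name__ == "__main__":
    print(walking_reward(WalkerState(0.0, False), WalkerState(0.5, False)))  # 0.5
    print(walking_reward(WalkerState(0.0, False), WalkerState(0.5, True)))   # -9.5
```

Because the fall penalty is much larger than the reward for a single step of progress, an agent maximising cumulative reward is pushed to stay upright rather than lunge forward. The exact trade-off depends entirely on the assumed constant.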

4. Inside An RL Agent

5. Problems within Reinforcement Learning