Recurrent Neural Networks

1. Recurrent Neural Networks

2. LSTM & GRU

Recurrent Neural Networks

Vanilla RNN

→ A model suited to data whose order carries information, such as time-series data

Basic neural network structures

one-to-one

vanilla neural networks

one-to-many

e.g. Image Captioning (image → sequence of words)

many-to-one

e.g. Sentiment Classification (sequence of words → sentiment)

many-to-many

e.g. Machine Translation (seq of words → seq of words)

many-to-many´

e.g. Video classification at the frame level (an output is produced in real time as each input arrives)

Background of the RNN

Computing the RNN hidden state

h(t) = f(W)(h(t-1), x(t))
h(t) = new state
f(W) = some function with parameters W
h(t-1) = old state
x(t) = input vector at some time step

→ The same function and the same set of parameters are used at every step

Computing the RNN output

h(t) = tanh(W(hh)h(t-1) + W(xh)x(t)) → previous hidden state + input vector at the current step
y(t) = W(hy)h(t)
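
The two equations above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the layer sizes, the random initialization, and the names rnn_step, W_hh, W_xh, and W_hy are assumptions made for the example.

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, W_hy):
    """One vanilla RNN step: returns the new hidden state and the output."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)  # h(t) = tanh(W(hh)h(t-1) + W(xh)x(t))
    y = W_hy @ h                           # y(t) = W(hy)h(t)
    return h, y

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, output_size = 3, 5, 2
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

xs = rng.normal(size=(4, input_size))  # a sequence of 4 input vectors
h = np.zeros(hidden_size)              # initial hidden state
outputs = []
for x in xs:
    # The same weights are reused at every time step.
    h, y = rnn_step(h, x, W_hh, W_xh, W_hy)
    outputs.append(y)
outputs = np.stack(outputs)            # one output vector per time step
```

Because the loop calls the same function with the same three weight matrices at each step, the sequence length can change without changing the number of parameters.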

<Training an RNN on 'hello'>

· training sequence: 'hello'

· Vocabulary: [h, e, l, o]

# Vocabulary = the set of all distinct tokens in the text, with duplicates removed (characters in this example)

· h(t) = tanh(W(hh)h(t-1) + W(xh)x(t))

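→ Zero-centered activations are important in an RNN, so the tanh activation function is used (output centered at 0)

· The current hidden state is obtained from the sum of the previous hidden state and the current input vector, each multiplied by its weight matrix

· Softmax is applied to the output layer to obtain the predicted values

→ The weights are updated through backpropagation

→ Because every step uses the same function and parameters, all steps are trained jointly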
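
The forward pass of the 'hello' example can be sketched as below. This is only an illustration under assumptions: the hidden size, the random weights, and the helper names one_hot and softmax are hypothetical, the loss is assumed to be cross-entropy, and the backpropagation and weight update step is left out.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a one-hot vector over the vocabulary."""
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

text = 'hello'
inputs, targets = text[:-1], text[1:]  # each character predicts the next one

hidden_size = 3  # hypothetical size for illustration
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, len(vocab)))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(len(vocab), hidden_size))

h = np.zeros(hidden_size)
loss = 0.0
probs_per_step = []
for ch_in, ch_out in zip(inputs, targets):
    h = np.tanh(W_hh @ h + W_xh @ one_hot(ch_in))  # hidden state update
    p = softmax(W_hy @ h)                          # distribution over next character
    probs_per_step.append(p)
    loss += -np.log(p[char_to_ix[ch_out]])         # cross-entropy for this step
```

Training would repeat this pass, backpropagate the summed loss through all four steps, and update the three shared weight matrices.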

Vanishing/Exploding gradient of RNN

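In tasks such as language modeling or question answering, the word that influences the next-step prediction can be far from the word being predicted. The farther apart they are, the more times the coefficient b in the hidden state computation h(t) = tanh(a·x(t) + b·h(t-1) + c) is multiplied into the gradient during the update. This repeated multiplication makes the gradient vanish (roughly when |b| < 1) or explode (roughly when |b| > 1).

→ The LSTM model was developed to handle long sequences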
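
The effect of repeatedly multiplying by the coefficient b can be seen in a small numeric sketch. It is a simplification that assumes a scalar hidden state and ignores the tanh derivative (which is at most 1 and would only shrink the product further); the function name repeated_factor is hypothetical.

```python
def repeated_factor(b, steps):
    """Product of the recurrent coefficient b over `steps` time steps."""
    grad = 1.0
    for _ in range(steps):
        grad *= b
    return grad

# Compare a coefficient below 1 with one above 1 over growing distances.
factors = {b: [repeated_factor(b, k) for k in (1, 10, 50)] for b in (0.5, 1.5)}
for b, values in factors.items():
    print(b, values)
```

With b below 1 the factor shrinks toward zero as the distance grows, and with b above 1 it grows without bound.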

LSTM & GRU

Long Short-Term Memory (LSTM) [Hochreiter et al., 1997]

f: Forget gate
i: Input gate
g: Gate gate
o: Output gate

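Cell state: carries long-term information → addresses the long-term dependency problem

A structure that lets information flow through essentially unchanged
The additive (+) update of the cell state mitigates the vanishing/exploding gradient problem of the RNN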
C(t) = f·C(t-1) + i·g

Hidden state: carries short-term information

h(t) = o·tanh(C(t))

Forget gate: how much of the previous information should be discarded?

f(t) = σ(W(f)·[h(t-1), x(t)] + b(f))

Input gate & Gate gate: which of the new information should be stored in the cell state?

i(t) = σ(W(i)·[h(t-1), x(t)] + b(i))
C´(t) = tanh(W(C)·[h(t-1), x(t)] + b(C))

Cell state: the current cell state is built from the past data selected by the forget gate plus the current information selected by the input gate

C(t) = f(t)*C(t-1) + i(t)*C´(t)

Output gate: the current cell state, passed through tanh, is filtered by the output gate

→ This produces the current hidden state, which is passed on to the next time step

o(t) = σ(W(o)·[h(t-1), x(t)] + b(o))
h(t) = o(t) * tanh(C(t))
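
Putting the gate equations above together, one LSTM time step might look like the NumPy sketch below. The sizes, the random initialization, the zero biases, and the names lstm_step, sigmoid, and c_tilde (standing in for C´(t)) are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, params):
    """One LSTM step following the forget/input/candidate/output equations."""
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    hx = np.concatenate([h_prev, x])   # [h(t-1), x(t)]
    f = sigmoid(W_f @ hx + b_f)        # forget gate
    i = sigmoid(W_i @ hx + b_i)        # input gate
    c_tilde = np.tanh(W_C @ hx + b_C)  # candidate values C´(t)
    o = sigmoid(W_o @ hx + b_o)        # output gate
    c = f * c_prev + i * c_tilde       # C(t) = f(t)*C(t-1) + i(t)*C´(t)
    h = o * np.tanh(c)                 # h(t) = o(t) * tanh(C(t))
    return h, c

# Hypothetical sizes, chosen only for illustration.
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
params = []
for _ in range(4):  # one weight matrix and one bias per gate vector
    params.append(rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)))
    params.append(np.zeros(hidden_size))

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 input vectors
    h, c = lstm_step(h, c, x, params)
```

Note that the cell state is updated only by an elementwise multiply and an add, while the hidden state is a gated, squashed view of it.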

Gated Recurrent Unit

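A lighter LSTM: the LSTM's 4 gate vectors → 3 gate vectors
Combines the forget gate and the input gate
Merges the cell state and the hidden state
Has fewer parameters than the LSTM, so it is typically faster, with performance that is often comparable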

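z(t) = σ(W(z)·[h(t-1), x(t)]): Update gate - decides how much of the previous step's information is carried over to the next step
r(t) = σ(W(r)·[h(t-1), x(t)]): Reset gate - decides how much of the past information is discarded
h´(t) = tanh(W·[r(t) * h(t-1), x(t)]) → candidate hidden state (takes over the role of the output gate)
h(t) = (1 - z(t)) * h(t-1) + z(t) * h´(t) → serves as both cell state and hidden state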

(1 - z(t)) → corresponds to the LSTM forget gate

z(t) → corresponds to the LSTM input gate

→ Modeled so that the two sum to 1
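
The GRU equations above can be sketched in NumPy as follows. As in the equations, no bias terms are used. The sizes, the random initialization, and the names gru_step, sigmoid, and h_tilde (standing in for h´(t)) are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, W_z, W_r, W):
    """One GRU step following the update/reset/candidate equations."""
    hx = np.concatenate([h_prev, x])                        # [h(t-1), x(t)]
    z = sigmoid(W_z @ hx)                                   # update gate
    r = sigmoid(W_r @ hx)                                   # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x]))  # candidate h´(t)
    # The weights (1 - z) and z sum to 1, so h(t) is an interpolation
    # between the old state and the candidate.
    return (1 - z) * h_prev + z * h_tilde

# Hypothetical sizes, chosen only for illustration.
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
shape = (hidden_size, hidden_size + input_size)
W_z = rng.normal(scale=0.1, size=shape)
W_r = rng.normal(scale=0.1, size=shape)
W = rng.normal(scale=0.1, size=shape)

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 input vectors
    h = gru_step(h, x, W_z, W_r, W)
```

Compared with the LSTM step, there is a single state vector and three weight matrices instead of four.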


동영`s 인공지능 공부방
