
Text Generation



Text Generation

(Natural Language Generation: NLG)

Given some input, a model generates new text

 

Text Generation Applications

· Machine Translation

· Open-ended Generation (free-form generation)

· Document Summarization

· Dialogue System

· Question Answering / Entity Retrieval

 

 


Formulation of Text Generation

Text classification vs. Text Generation

- Prediction space: only a few classes to predict vs. a very large probability space (vocab_size=10000, sequence_length=30 → 10000^30 possible texts)

- Formulation: MLE with cross-entropy vs. conditional probability with the chain rule

Conditional probability with chain rule

· X: Source Text, Y: Target Text (to be generated)

(In Machine Translation, X: Source, Y: Target,

In Summarization, X: Long paragraph, Y: Summary)
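For reference, the chain-rule factorization is usually written as follows. The symbols y_t (the t-th target word) and T (target length) are introduced here for illustration because the original figure is missing. The probability of the whole target is a product of per-word conditionals, and MLE training minimizes the summed negative log-likelihood, which is a cross-entropy loss over the target words:

```latex
P(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, X)
\qquad
\mathcal{L}_{\mathrm{MLE}} = -\sum_{t=1}^{T} \log P(y_t \mid y_1, \dots, y_{t-1}, X)
```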

 

 

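In a Seq2Seq model, the encoder reads X and the decoder generates Y one word at a time, with each step conditioned on the encoder output and the previously generated words.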

 

 

Teacher forcing

"How should input words be fed to the decoder?"

With Teacher forcing

- The ground-truth target word is fed to the decoder as its next input

- The prediction at the previous timestep does not affect the next prediction

Without Teacher forcing

- The generated word is fed to the decoder as its next input

- The prediction at the previous timestep affects the next prediction
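The two feeding modes can be sketched in a few lines. This sketch assumes a hypothetical toy decoder, a hand-written bigram table `BIGRAM` with a helper `decoder_step`, which is not a real model. The decoder wrongly predicts "the" at the first step. With teacher forcing, the next input is still the gold word "a", so the later steps are unaffected. Without teacher forcing, the mistake is fed back and the rest of the sequence drifts.

```python
# Hypothetical toy "decoder": next-token probabilities given only the previous token.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.3, "dog": 0.7},
    "a": {"cat": 0.8, "dog": 0.2},
    "cat": {"sat": 0.9, "</s>": 0.1},
    "dog": {"ran": 0.6, "</s>": 0.4},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def decoder_step(prev_token):
    """Return the most probable next token given the current input token."""
    dist = BIGRAM[prev_token]
    return max(dist, key=dist.get)

def decode(target, teacher_forcing):
    """Run one decoder step per target word and record (input, prediction) pairs."""
    prev = "<s>"
    steps = []
    for gold in target:
        pred = decoder_step(prev)
        steps.append((prev, pred))
        # Teacher forcing feeds the ground-truth word; otherwise feed the model's own prediction.
        prev = gold if teacher_forcing else pred
    return steps

target = ["a", "cat", "sat", "</s>"]
print("with teacher forcing:   ", decode(target, teacher_forcing=True))
print("without teacher forcing:", decode(target, teacher_forcing=False))
```

Because every input is known in advance under teacher forcing, all steps can also be computed in parallel. That property is what the GPU remark below refers to.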

 

Pros of Teacher forcing: Fast, stable, efficient training

Cons of Teacher forcing: Exposure Bias (when the generation model is tested, there is no ground-truth word to feed)

→ This discrepancy between training (ground truth available) and test (ground truth unavailable) degrades model performance

 

Scheduled Sampling

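Apply teacher forcing early in training, then gradually phase it out, feeding the model's own predictions more and more often

Scheduled Sampling reduces the discrepancy between training and test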
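Below is a minimal sketch of how scheduled sampling decides, at each decoder step, whether to feed the ground truth or the model's own prediction. It assumes the inverse sigmoid decay ε_i = k / (k + exp(i / k)), one of the schedules proposed in the original scheduled sampling paper (Bengio et al., 2015). The value k = 1000 and the function names are illustrative choices.

```python
import math
import random

def inverse_sigmoid_decay(step, k=1000.0):
    """Probability eps_i of feeding the ground-truth token at training step i:
    eps_i = k / (k + exp(i / k)) with k >= 1 (a larger k means slower decay)."""
    x = step / k
    if x > 700:  # math.exp would overflow; eps is effectively 0 here
        return 0.0
    return k / (k + math.exp(x))

def next_decoder_input(gold_token, predicted_token, step, rng):
    """Scheduled sampling: flip a biased coin for each decoder step."""
    eps = inverse_sigmoid_decay(step)
    # With probability eps use the ground truth (teacher forcing);
    # otherwise feed the model's own prediction, as happens at test time.
    return gold_token if rng.random() < eps else predicted_token

rng = random.Random(0)
for step in (0, 5000, 10000, 20000):
    fed_gold = sum(next_decoder_input("gold", "pred", step, rng) == "gold" for _ in range(1000))
    print(f"step {step:>5}: eps={inverse_sigmoid_decay(step):.3f}, gold fed {fed_gold}/1000")
```

Early in training almost every input is the gold word. Late in training almost every input is the model's own prediction, which matches the test-time condition.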

 

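cf) However, recent models (e.g., Transformers) tend to apply teacher forcing at every step, because it lets the decoder process all target positions in parallel on a GPU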

 


Decoding Strategy

The process of generating text with a trained model

 

Greedy Decoding

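Generate the most probable word (argmax) at each step

→ Not guaranteed to produce the optimal (most probable) sequence: a locally best word can lead to a sequence with a lower overall probability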
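Here is a minimal sketch of why greedy decoding is not optimal. It uses a hypothetical toy model `MODEL` whose next-word probabilities depend only on the previous word; this stands in for a real decoder. Greedy decoding takes "the" (0.6) first and ends with P = 0.6 × 0.55 = 0.33. The sequence that starts with the less likely "a" reaches P = 0.4 × 0.9 = 0.36.

```python
# Hypothetical toy model: P(next token | previous token), standing in for a decoder.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def greedy_decode(model, max_len=10):
    """Pick the argmax token at each step; return (tokens, sequence probability)."""
    tokens, prob = ["<s>"], 1.0
    while tokens[-1] != "</s>" and len(tokens) <= max_len:
        dist = model[tokens[-1]]
        best = max(dist, key=dist.get)  # locally most probable next token
        prob *= dist[best]
        tokens.append(best)
    return tokens[1:], prob

def sequence_prob(model, tokens):
    """Probability of a complete sequence under the chain rule."""
    prev, prob = "<s>", 1.0
    for tok in tokens:
        prob *= model[prev][tok]
        prev = tok
    return prob

greedy_tokens, greedy_p = greedy_decode(MODEL)
print(greedy_tokens, greedy_p)
print(sequence_prob(MODEL, ["a", "cat", "</s>"]))
```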

 

 

Beam Search

At each decoder step, keep the K most probable partial sequences (hypotheses)

If beam size = 2:

The key point is that exactly k = 2 hypotheses are kept at every decoder step!
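Here is a minimal beam search sketch on a hypothetical toy model, defined inside the block so it runs on its own. Hypotheses are scored by summed log-probabilities, and only the `beam_size` best partial sequences survive each step. With beam_size=2 it finds "a cat </s>" (P = 0.36), which a greedy search starting from "the" would miss. Moving finished hypotheses out of the beam is only one simple way to handle them among several.

```python
import math

# Hypothetical toy model: P(next token | previous token), standing in for a decoder.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(model, beam_size=2, max_len=10):
    """Return (tokens, log_prob) of the best hypothesis found with the given beam size."""
    beams = [(["<s>"], 0.0)]  # (partial sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every hypothesis in the beam by every possible next token.
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in model[tokens[-1]].items()
        ]
        # Keep only the beam_size most probable partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == "</s>":
                finished.append((tokens[1:], score))  # completed hypothesis
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if nothing reached </s> within max_len.
    pool = finished or [(tokens[1:], score) for tokens, score in beams]
    return max(pool, key=lambda h: h[1])

tokens, log_prob = beam_search(MODEL, beam_size=2)
print(tokens, round(math.exp(log_prob), 2))
```

With beam_size=1 the same function reduces to greedy decoding and returns "the cat </s>".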

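Pros of Beam search: considers K candidates per step instead of one, so it usually finds higher-probability sequences than greedy decoding

Cons of Beam search: tends to generate dull, repetitive sequences

- Dull: the generated sequences are so high-probability ("too optimal") that diversity ↓
- Repetitive: once a phrase repeats, its probability rises further, creating a positive feedback loop that keeps repeating it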

Penalized sampling

Reduces beam search's repetition problem by lowering the probability of words that have already appeared
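Below is a minimal sketch of penalized sampling in the style of CTRL (Keskar et al., 2019). Before the softmax, the logit of every token that was already generated is penalized by a factor θ; CTRL suggests θ ≈ 1.2. CTRL's formula divides the logit by θ. This sketch also multiplies negative logits by θ, as the repetition penalty in Hugging Face `transformers` does, because dividing a negative logit would actually raise its probability. The token ids and logits are made-up values.

```python
import math

def penalized_probs(logits, generated_ids, theta=1.2):
    """Softmax over logits after penalizing tokens that already appeared."""
    seen = set(generated_ids)
    adjusted = []
    for token_id, x in enumerate(logits):
        if token_id in seen:
            # Push the logit toward a lower probability regardless of its sign.
            x = x / theta if x > 0 else x * theta
        adjusted.append(x)
    m = max(adjusted)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 2.0, 1.0]  # hypothetical scores for token ids 0, 1, 2
print(penalized_probs(logits, generated_ids=[]))   # tokens 0 and 1 tie
print(penalized_probs(logits, generated_ids=[0]))  # token 0 is now less likely
```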

 

 

Sampling-based Decoding Strategy

"Humans do not always select the words with the highest probability."

→ We need to generate more diverse, surprising, non-boring text

Add an element of randomness

→ Random Sampling, Top-k Sampling, Top-p Sampling (or nucleus sampling)

 

Random Sampling

Sample the next word from the model's probability distribution

Con of Random Sampling: a word with very low probability may be selected
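Here is a minimal sketch of pure random sampling with Python's standard `random` module. The vocabulary and `probs` are a hypothetical next-word distribution. Even the 1% word is drawn from time to time, which is the weakness noted above.

```python
import random
from collections import Counter

def sample_next(probs, rng):
    """Draw one token id with probability proportional to probs[id]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

vocab = ["cat", "dog", "bird", "xylophone"]
probs = [0.6, 0.3, 0.09, 0.01]  # hypothetical model output
rng = random.Random(42)
counts = Counter(vocab[sample_next(probs, rng)] for _ in range(10_000))
print(counts)
```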

 

Random Sampling with Temperature

Divide the logits by a temperature t before the softmax to make the probability distribution sharper or flatter

- t < 1: sharper distribution (sequence optimality ↑)

- t > 1: flatter distribution (sequence diversity ↑)

- t→0: Greedy Decoding

- t→∞: Uniform Sampling
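Concretely, temperature scaling computes p_i = exp(z_i / t) / Σ_j exp(z_j / t) from the logits z. Here is a minimal sketch with made-up logits; sampling then proceeds from the rescaled distribution as in plain random sampling.

```python
import math

def softmax_with_temperature(logits, t):
    """p_i = exp(z_i / t) / sum_j exp(z_j / t); requires t > 0."""
    if t <= 0:
        raise ValueError("temperature must be positive (t -> 0 approaches greedy decoding)")
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three words
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```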

 

Top-K Sampling

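Sample only from the K most likely words (their probabilities are renormalized)

Con of Top-K Sampling: always sampling from a fixed number K of words is inefficient

The appropriate K differs for each sequence (and each step): a flat distribution needs many candidates, while a peaked one needs only a few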
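Here is a minimal sketch of top-k filtering and sampling on a hypothetical next-word distribution. All words outside the top k get probability 0, so they can never be sampled.

```python
import random

def top_k_filter(probs, k):
    """Zero out all but the k most probable token ids and renormalize."""
    top = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in top)
    return [probs[i] / total if i in top else 0.0 for i in range(len(probs))]

def top_k_sample(probs, k, rng):
    """Sample a token id from the renormalized top-k distribution."""
    filtered = top_k_filter(probs, k)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-word distribution
print(top_k_filter(probs, k=2))
rng = random.Random(0)
print([top_k_sample(probs, 2, rng) for _ in range(10)])  # only ids 0 and 1 appear
```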

 

 

Top-P Sampling (Nucleus Sampling)

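Set a probability threshold P and sample from the smallest set of most probable words whose cumulative probability reaches P

The number of candidates (K) is not fixed! It adapts to the shape of the distribution at each step.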
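Here is a minimal sketch of nucleus (top-p) filtering. The two distributions are hypothetical. With the same P = 0.9, a confident model keeps a single candidate and an unsure model keeps all four, which shows how the effective K adapts.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of most probable token ids whose cumulative
    probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = set(), 0.0
    for i in order:
        nucleus.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in nucleus)
    return [probs[i] / total if i in nucleus else 0.0 for i in range(len(probs))]

def top_p_sample(probs, p, rng):
    """Sample a token id from the renormalized nucleus."""
    filtered = top_p_filter(probs, p)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]

peaked = [0.9, 0.05, 0.03, 0.02]  # hypothetical: the model is confident
flat = [0.3, 0.3, 0.2, 0.2]       # hypothetical: the model is unsure
for name, dist in (("peaked", peaked), ("flat", flat)):
    kept = sum(1 for q in top_p_filter(dist, p=0.9) if q > 0)
    print(name, "keeps", kept, "candidates")
```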

 


Evaluation Metric in NLG

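Quantitative evaluation of NLG is one of the hardest problems in the field

 

Common approach: measure the similarity between the generated text and the reference (answer) text

→ Weak at judging expressions that are worded differently but similar in meaning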

Word-level Similarity

- BLEU: widely used in Machine Translation; based on n-gram overlap

- ROUGE

- METEOR

Embedding Similarity

- Word Average / Extrema / Greedy

- BERTScore
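Here is a minimal sketch of sentence-level BLEU for a single reference. It assumes whitespace tokenization and no smoothing. Clipped (modified) n-gram precisions for n = 1..4 are combined by a geometric mean and multiplied by a brevity penalty. For real evaluations, a maintained implementation such as sacreBLEU is preferable. This sketch only shows the mechanics, including why a degenerate output like "the the the the" scores poorly.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU without smoothing: any zero n-gram precision gives 0."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

reference = "the cat is on the mat".split()
print(sentence_bleu("the cat is on a mat".split(), reference))
print(sentence_bleu("the the the the".split(), reference, max_n=1))
```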

 

 

Perplexity (PPL)

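Used when no target (reference) sentence exists

Similar in form to the cross-entropy loss

Measures the performance of the generation model itself (not of a generated text!!)

- The reciprocal of how well the model predicts the text (the inverse of the geometric-mean probability the model assigns to each word)

- Lower PPL → better model

(since it is essentially a loss value!)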
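Written out, PPL = exp(-(1/N) Σ_i log P(w_i | w_<i)), which is the exponential of the average per-word cross-entropy. Here is a minimal sketch that assumes we already have the probability the model assigned to each actual word; the numbers are made up. A model that guesses uniformly over 4 words has a perplexity of 4, as if it were choosing among 4 equally likely options at every step.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

confident_model = [0.5, 0.5, 0.5, 0.5]
uniform_over_4 = [0.25, 0.25, 0.25, 0.25]
print(perplexity(confident_model))
print(perplexity(uniform_over_4))
```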


References

CS224n (http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture15-nlg.pdf)

https://ai-information.blogspot.com/2019/03/scheduled-sampling.html

 


[1908.04319] Neural Text Generation with Unlikelihood Training (arxiv.org)

[1904.09751] The Curious Case of Neural Text Degeneration (arxiv.org)

 


[1704.04368] Get To The Point: Summarization with Pointer-Generator Networks (arxiv.org)

 


[1909.05858] CTRL: A Conditional Transformer Language Model for Controllable Generation (arxiv.org)

 


Decoding Strategies that You Need to Know for Response Generation | by Vitou Phy | Towards Data Science

 
