Text Generation

(Natural Language Generation: NLG)

Given some inputs, a model generates new texts

Text Generation Applications

· Machine Translation

· Open-ended Genration (자유 생성)

· Documnet Summarization

· Dialogue System

· Question & Answering / Entity Retrival

Formulation of Text Generation

Text classification	Text Generation
Only a few prediciton	Large probability space of text generation (vocab_size=10000, sequence_lenght=30 → the number of cases for possible text = 10000^30)
MLE with Cross-Entropy	Conditional probability with chain rule

· X: Source Text, Y: Target Text (to be genrated)

(In Machine Translation, X: Source, Y: Target,

In Summarization, X: Long paragraph, Y: Summary)

In Seq2Seq model,

Teacher forcing

"How to provide input words to a decoder?"

With Teacher forcing

- Groud-truth target word가 decoder의 next input으로 투입

- 이전 timestep의 prediction은 다음 prediction에 영향을 끼치지 않음

Without Teacher forcing

- Generated word가 decoder의 next input으로 투입

- 이전 timestep의 prediction이 다음 prediction에 영향을 끼침

Pros of Teacher forcing: Fast, stable, efficient training

Cons of Teacher forcing: Exposure Bias(Generation model test시, groud-truth word가 없음)

→ Train과 test의 groud-truth 유무로 인한 discrependency가 model 성능 저하

Scheduled Sampling

Train 초반에 Teacher forcing 시행 후, 점차 제거

cf) 다만, 최근 model에서는 매번 teacher forcing을 시행하는 추세 (by GPU 병렬처리)

Decoding Strategy

Train된 model로 text generate하는 과정

Greedy Decoding

Most probable word (argmax)를 generate

→ Optimal한 generation은 아님

Beam Search

Decoder의 각 step에서 K개의 most probable partial sequence 선택

If beam size = 2:

Pros of Beam search: High recall value (TP / (TP + FN))

Cons of Beam search: Generate dull, reptitive sequence

Generate dull: Generate된 sequence가 너무 optimal해서 diversity ↓

Repeated phrase의 probability가 postivie feedback loop를 생성하여 반복

Penalized sampling

이전에 나온 word의 확률 값을 조정하여 Beam Search의 reptitive sequence 문제 해소

Sampling-based Decoding Strategy

"Human do not always select the words with the highest probability."

→ More diverse, surprising, not boring texts generate 필요

→ Randomness 요소 추가

→ Random Sampling, Top-k Sampling, Top-p Sampling (or nucleus sampling)

Random Sampling

Sample the next word from the probability

Con of Random Sampling: Very low probability word가 선택될 수도 있음

Random Sampling with Temperature

Adjust the probability flat or sharp

- t < 1: 더 sharp한 graph (seqeunce optimize ↑)

- t > 1: 더 flat한 graph (seqeunce diversity ↑)

- t→0: Greedy Decoding

- t→∞: Uniform Sampling

Top-K Sampling

K개의 most likely word를 sampling

Con of Tok-K Sampling: Fixed된 K개를 sampling하는 것은 별로 비효율적

Top-P Sampling (Nucleus Sampling)

Probability P를 설정하여 가장 possible한 set of words sampling

Evaluation Metric in NLG

NLG의 정량적 평가는 최고 난제

Common approach: Generated된 text와 answer text간의 similarity 비교

Word-level Similarity

- BLEU: Machine Translation에서 많이 사용, n-gram 기반

- ROUGE

- METEOR

Embedding Similarity

- Word Average / Extrema / Greedy

- BERTScore

Perplexity (PPL)

Target sentence가 존재하지 않을 때,

(Generated text가 아닌!!) Generation model의 performance를 측정

- Model이 얼마나 text를 predict할수 있는지 reciprocal

- PPL↓ model is better

(일종의 loss 값이기 때문에..!)

References

CS224n(http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture15-nlg.pdf)

https://ai-information.blogspot.com/2019/03/scheduled-sampling.html

Scheduled sampling

AI에 관련된 논문과 지식을 포스팅한 블로그입니다.

ai-information.blogspot.com

[1908.04319] Neural Text Generation with Unlikelihood Training (arxiv.org)

[1904.09751] The Curious Case of Neural Text Degeneration (arxiv.org)

The Curious Case of Neural Text Degeneration

Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though the use of likelihood as tr

arxiv.org

[1704.04368] Get To The Point: Summarization with Pointer-Generator Networks (arxiv.org)

Get To The Point: Summarization with Pointer-Generator Networks

Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: th

arxiv.org

[1909.05858] CTRL: A Conditional Transformer Language Model for Controllable Generation (arxiv.org)

CTRL: A Conditional Transformer Language Model for Controllable Generation

Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on con

arxiv.org

Decoding Strategies that You Need to Know for Response Generation | by Vitou Phy | Towards Data Science

Decoding Strategies that You Need to Know for Response Generation

Techniques to make your generative models better

towardsdatascience.com

'NLP' 카테고리의 다른 글

Neural Machine Translation (0)	2022.07.08
Sequence/Token Classification (0)	2022.07.06
GPT (0)	2022.07.05
BERT (0)	2022.07.01
Sinusoidal Positional Encoding 직접 계산해보기 (0)	2022.07.01

동영`s 인공지능 공부방

Text Generation

Text Generation

Text Generation

Text Generation Applications

Formulation of Text Generation

Teacher forcing

With Teacher forcing

Without Teacher forcing

Scheduled Sampling

Decoding Strategy

Greedy Decoding

Beam Search

Penalized sampling

Sampling-based Decoding Strategy

Random Sampling

Random Sampling with Temperature

Top-K Sampling

Top-P Sampling (Nucleus Sampling)

Evaluation Metric in NLG

Common approach: Generated된 text와 answer text간의 similarity 비교

Perplexity (PPL)

References

'NLP' 카테고리의 다른 글

티스토리툴바

Text Generation

Text Generation

Text Generation

Text Generation Applications

Formulation of Text Generation

Teacher forcing

With Teacher forcing

Without Teacher forcing

Scheduled Sampling

Decoding Strategy

Greedy Decoding

Beam Search

Penalized sampling

Sampling-based Decoding Strategy

Random Sampling

Random Sampling with Temperature

Top-K Sampling

Top-P Sampling (Nucleus Sampling)

Evaluation Metric in NLG

Common approach: Generated된 text와 answer text간의 similarity 비교

Perplexity (PPL)

References

'NLP' 카테고리의 다른 글

'NLP' Related Articles

티스토리툴바