Word Embedding

＊All English-language images used are from the cs231n lecture materials.＊

1. Word Embedding

2. Word2Vec

3. GloVe

4. Doc2Vec

Frequency-based Word Embedding	Distributed Representation Word Embedding
TF-IDF, Topic Modeling ···	Word2Vec, GloVe, Doc2Vec ···

The meaning of a word is spread across the dimensions of its vector, so that every dimension carries a value

e.g.)

'cat' / 'kitty' → Similar vector representation → Short distance

'hamburger' / 'cat' → Different vector representation → Far distance

Distributed vector representation

- A vector representation with nonzero values in multiple dimensions

- Semantic similarity between word vectors is computed with Euclidean distance, inner product, or cosine similarity
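The three measures above can be compared on a toy example. This is a minimal sketch: the 3-dimensional vectors for 'cat', 'kitty', and 'hamburger' are made up by hand for illustration and do not come from a trained model.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine similarity: closer to 1 means more similar."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical embeddings, chosen by hand (assumption, not trained values).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitty": [0.8, 0.9, 0.2],
    "hamburger": [0.1, 0.2, 0.9],
}

for other in ("kitty", "hamburger"):
    print(
        other,
        "cosine:", round(cosine(vectors["cat"], vectors[other]), 3),
        "euclidean:", round(euclidean(vectors["cat"], vectors[other]), 3),
        "inner product:", round(dot(vectors["cat"], vectors[other]), 3),
    )
```

With these values, 'kitty' scores higher than 'hamburger' on cosine similarity and inner product, and lower on Euclidean distance.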

Word2Vec

Word2Vec Overview

Assumption) Words that appear in similar contexts have similar meanings.

Train the vector representations of context words
Train so that the vector distance between similar words shrinks

Word2Vec algorithm

→ Build word pairs and train with the center word as input and a surrounding word as output: the SkipGram approach

- Sentence: "I study math."

- Vocabulary: {'I', 'study', 'math'} # unique word set

- Input: 'study' [0, 1, 0]

- Output: 'math' [0, 0, 1]
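The pair construction for this sentence can be sketched as follows. The window size of 1 and the helper names `skipgram_pairs` and `one_hot` are assumptions made for illustration; the notes do not specify a window size.

```python
def skipgram_pairs(tokens, window=1):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

def one_hot(word, vocab):
    """One-hot encode `word` against an ordered vocabulary."""
    return [1 if w == word else 0 for w in vocab]

tokens = ["I", "study", "math"]
vocab = ["I", "study", "math"]  # unique words, in a fixed order

pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, one_hot(center, vocab), "->", context, one_hot(context, vocab))
```

The pair ('study', 'math') in the output corresponds to the Input/Output example above.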

'study' vector: 2nd column of W₁

'math' vector: 3rd row of W₂

→ The 'study' vector in W₁ and the 'math' vector in W₂ have a high inner-product value.
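A minimal sketch of the forward pass may help here. It assumes an embedding dimension of 2 and hand-picked weights (both hypothetical), with W₁ stored as a 2 x 3 matrix whose columns are input vectors and W₂ as a 3 x 2 matrix whose rows are output vectors, matching the column/row description above.

```python
import math

vocab = ["I", "study", "math"]

# Hypothetical weights (assumption): embedding dimension 2.
# W1 is 2 x 3: column k is the input vector of the k-th word.
W1 = [
    [0.1, 1.0, -0.5],
    [0.3, 2.0, 0.4],
]
# W2 is 3 x 2: row k is the output vector of the k-th word.
W2 = [
    [-1.0, 0.2],
    [0.0, -0.5],
    [1.5, 1.0],
]

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [0, 1, 0]                # one-hot input for 'study'
hidden = matvec(W1, x)       # picks out the 2nd column of W1
scores = matvec(W2, hidden)  # inner product of hidden with each row of W2
probs = softmax(scores)
predicted = vocab[probs.index(max(probs))]
print(hidden, scores, predicted)
```

Multiplying W₁ by a one-hot vector only selects a column, so in practice the first layer is usually implemented as a table lookup rather than a full matrix product.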

→ Given the input vector 'eat', the model predicts one of 'apple', 'orange', 'rice' as the output vector

→ 'eat' has a high inner product value with each of 'apple', 'orange', and 'rice'.

Property of Word2Vec

The vector between two word points in the space = the relationship between those words
Same relationship = represented by the same vector

vec[queen] - vec[king] = vec[woman] - vec[man]

→ vec[queen] - vec[king] + vec[man] = vec[woman]

e.g. Company - CEO
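The analogy arithmetic can be checked on a toy example. The 2-dimensional vectors below are hypothetical and constructed by hand so that one axis roughly encodes "royalty" and the other "gender"; real Word2Vec vectors only satisfy such relations approximately.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors (assumption): axis 0 ~ royalty, axis 1 ~ gender.
vec = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

# vec[queen] - vec[king] + vec[man]
query = [q - k + m for q, k, m in zip(vec["queen"], vec["king"], vec["man"])]

# Search the remaining words for the nearest vector by cosine similarity.
candidates = {w: v for w, v in vec.items() if w not in ("queen", "king", "man")}
answer = max(candidates, key=lambda w: cosine(query, candidates[w]))
print(query, answer)
```

Excluding the three query words from the candidates is a common convention, since one of them is often the nearest neighbor of the result.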

Application of Word2Vec

Word similarity
Machine translation
Part-of-speech tagging and named entity recognition
Sentiment analysis
Clustering
Semantic Lexicon building

Recommender system by Word2Vec

- User vocabulary: {John, Jane, Michael}

- Item vocabulary: {Galaxy, iPhone, Macbook, iPad}

→ The 'Jane' vector in W₁ and the 'Macbook' vector in W₂ have a high inner product value.

→ Word2Vec can be used for a recommender system
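As a rough sketch of how this could be used for recommendation: once user vectors (from W₁) and item vectors (from W₂) are trained, items can be ranked for a user by inner product. The vector values and the `recommend` helper below are hypothetical, chosen only to illustrate the ranking step.

```python
users = ["John", "Jane", "Michael"]
items = ["Galaxy", "iPhone", "Macbook", "iPad"]

# Hypothetical learned vectors of dimension 2 (assumption, not trained values).
user_vec = {
    "John": [1.0, 0.1],
    "Jane": [0.1, 1.0],
    "Michael": [0.6, 0.6],
}
item_vec = {
    "Galaxy": [0.9, 0.0],
    "iPhone": [0.3, 0.6],
    "Macbook": [0.0, 0.9],
    "iPad": [0.2, 0.7],
}

def recommend(user, k=2):
    """Return the k items whose vectors have the highest inner product with the user."""
    scores = {
        item: sum(a * b for a, b in zip(user_vec[user], v))
        for item, v in item_vec.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

for user in users:
    print(user, recommend(user))
```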

GloVe

Global Vectors for Word Representation
First, compute the co-occurrence matrix so that identical word pairs are not trained on again and again
Then, perform matrix decomposition on the co-occurrence matrix

Word2Vec → when the same word pair appears many times, it is trained once per instance, which is inefficient

GloVe → counts how often words appear together

→ More efficient training (speed↑)
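The counting step can be sketched as below. The toy corpus, the window size of 1, and the `cooccurrence_counts` helper are assumptions for illustration. The point is that repeated pair instances collapse into a single entry with a count.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=1):
    """Count how often each (center, context) pair occurs within `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, center in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, tokens[j])] += 1
    return counts

# Hypothetical toy corpus (assumption).
corpus = [
    ["I", "study", "math"],
    ["I", "study", "physics"],
    ["you", "study", "math"],
]

counts = cooccurrence_counts(corpus)
print("pair instances:", sum(counts.values()))
print("distinct pairs:", len(counts))
print("('study', 'math') count:", counts[("study", "math")])
```

A Word2Vec-style loop would visit every pair instance, while a count-based method only needs one term per distinct pair.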

GloVe Loss function:

Log function → keeps the similarity from blowing up for words such as 'the' whose counts grow extremely large
f(Pij) → a training weight that grows with the frequency → capped at a fixed maximum
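The loss formula itself is not reproduced in the text here. The sketch below assumes the form given in the original GloVe paper, a weighted squared error f(count) · (uᵢ·vⱼ + bᵢ + bⱼ − log count)², and treats Pij as the co-occurrence count of words i and j. The defaults x_max = 100 and alpha = 0.75 are the paper's reported settings; the function names are hypothetical.

```python
import math

def weight(count, x_max=100.0, alpha=0.75):
    """Weighting f: grows with the count, then is capped at 1 once count >= x_max."""
    return (count / x_max) ** alpha if count < x_max else 1.0

def pair_loss(u_i, v_j, count, b_i=0.0, b_j=0.0):
    """Weighted squared error between the inner product (plus biases) and log(count)."""
    inner = sum(a * b for a, b in zip(u_i, v_j))
    return weight(count) * (inner + b_i + b_j - math.log(count)) ** 2

def glove_loss(U, V, counts):
    """Sum the pair losses over nonzero counts; `counts` maps (i, j) -> count."""
    return sum(pair_loss(U[i], V[j], c) for (i, j), c in counts.items() if c > 0)

# The weight rises with frequency but never exceeds 1.
for c in (1, 10, 100, 10000):
    print(c, round(weight(c), 4))
```

Taking the log compresses very large counts, and the cap in `weight` keeps very frequent pairs from dominating the sum.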

Doc2Vec

Doc2Vec (Paragraph2Vec)

Trains words together with various categorical variables
Words and documents in the same paragraph have high similarity.
Documents are embedded in the same space as word vectors.

Uses a multi-hot vector

(study, female, 10s, math)

The sum of the inner product values between the 'study', 'female', '10s' vectors in W₁ and the 'math' vector in W₂ should be high.
Total loss = α·loss_word + β·loss_gender + γ·loss_age (α, β, γ are hyperparameters)
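A minimal sketch of the multi-hot input and the weighted loss, under stated assumptions: the feature list, the weights in W₁, and the 'math' output vector are all hypothetical. It shows that feeding a multi-hot vector through W₁ sums the active columns, so the score against 'math' equals the sum of the individual inner products.

```python
# Hypothetical combined feature vocabulary (assumption).
features = ["study", "math", "female", "male", "10s", "20s"]

def multi_hot(active):
    """Encode a set of active features as a 0/1 vector over `features`."""
    return [1 if f in active else 0 for f in features]

# Hypothetical W1 (2 x 6): column k is the input vector of features[k].
W1 = [
    [0.5, 0.0, 0.2, -0.2, 0.3, -0.1],
    [0.1, 0.0, 0.4, 0.1, 0.2, 0.3],
]
w2_math = [1.0, 2.0]  # hypothetical output vector (row of W2) for 'math'

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column(M, k):
    return [row[k] for row in M]

x = multi_hot({"study", "female", "10s"})
hidden = [dot(row, x) for row in W1]  # sum of the three active columns of W1
score = dot(hidden, w2_math)

# Same score, computed as a sum of per-feature inner products.
score_by_parts = sum(
    dot(column(W1, features.index(f)), w2_math) for f in ("study", "female", "10s")
)

def total_loss(loss_word, loss_gender, loss_age, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the per-variable losses; alpha, beta, gamma are hyperparameters."""
    return alpha * loss_word + beta * loss_gender + gamma * loss_age

print(x, score, score_by_parts)
```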
