Clustering

1. Unsupervised learning introduction

2. K-means algorithm

3. Optimization objective

4. Random initialization

5. Choosing the number of clusters

Unsupervised learning introduction

Supervised learning

Training set: {(x₁, y₁), (x₂, y₂), (x₃, y₃), ... }

Unsupervised learning

Training set: {x₁, x₂, x₃, ... }

Clustering 적용 분야:

Market segmentation, Social network analysis, Organize computing clusters, Astronomical data analysis ...

K-means algorithm

Unsupervised data

Random 하게 cluster centroid 생성 (k = 2)

두 centroid를 잇는 직선에 수직 이등분 선을 그어 clustering

각 cluster 된 값들의 평균값으로 centroid 이동

다시 이동된 centroid를 잇는 직선의 수직 이등분선을 그어 clustering

다시 각 cluster된 값들의 평균 값으로 centroid 이동

다시 이동된 centroid를 잇는 직선의 수직 이등분선을 그어 clustering

이 과정을 반복하여 최종적인 cluster 생성

K 개수가 여러개라면? Voronoi diagram 이용

K-means algorithm

input:

- K (number of clusters)

- Training set: {x₁, x₂, x₃, x₄, ... , x(m)} # Unsupervised data

cluster centroid m₁, m₂, m₃, ... m(k)을 Random 하게 initialize 한 후,
Cluster assignment step: i = 1부터 m까지 C(i)를 cluster centroid가 x(i)에 근접하게 update
Move centroid: k = 1부터 K까지 m(k)를 각 cluster의 평균값으로 update # m₂ = (x₁ + x₄ + x₃) / 4

Optimization objective

K-means optimization objective

C(i) = 각 x(i)에 cluster(1, 2, ..., K)의 index가 할당되는 변수
m(k) = cluster centroid k (m(k) € R(n))
m(c(i)) = 각 x(i)의 cluster centroid가 할당되는 변수

Opimization objective:

M(i)와 각 x(i)들의 거리 (loss) 최소화

Random initialization

K < m (data 개수)
cluster centroid K를 random으로 initialize 할 때,

→ 너무 M (cluster centroid)가 data에서 멀리 initialize 되면 clustering이 잘 안 될 수 있음.

∴ cluster centroid를 data 분포의 x(i) 점 위로 initialize

Local optima

K-means algorithm 시행할 때마다 서로 다른 극점이 나올 수 있음.

→ objective function 값이 가장 작은 clustering 선택

Choosing the number of clusters

Elbow method

여러 k값을 설정한 후, loss 값이 급격히 낮아지는 지점 (elbow)으로 k 선택
elbow 지점 이후엔 loss값이 서서히 감소

'Machine Learning' 카테고리의 다른 글

Dimension Reduction (0)	2022.05.25
Cross Validation & Dimension Reduction (0)	2022.05.24
Regularization (0)	2022.05.23
Logistic Regression (0)	2022.05.20

동영`s 인공지능 공부방

Clustering

Unsupervised learning introduction

Supervised learning

Unsupervised learning

Clustering 적용 분야:

K-means algorithm

K-means algorithm

Optimization objective

Random initialization

Random initialization

Local optima

Choosing the number of clusters

Elbow method

'Machine Learning' 카테고리의 다른 글

티스토리툴바

Clustering

Unsupervised learning introduction

Supervised learning

Unsupervised learning

Clustering 적용 분야:

K-means algorithm

K-means algorithm

Optimization objective

Random initialization

Random initialization

Local optima

Choosing the number of clusters

Elbow method

'Machine Learning' 카테고리의 다른 글

'Machine Learning' Related Articles

티스토리툴바