Clustering

Clustering is the process of defining groups of data based on the characteristics of the given data, and of finding a representative point for each group.

1. K-Means Clustering

Step 1. Decide the number of classes and pick the initial centroids at random
Step 2. Assign each data point to the nearest class
Step 3. Recompute the centroid of each class
Step 4. Repeat Steps 2 and 3 until convergence
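
The four steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name kmeans, the fixed seed, the squared Euclidean distance, and the toy points are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups (Steps 1-4)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid
        # (squared Euclidean distance is an assumption of this sketch).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

Because the initial centroids are random, which group receives label 0 depends on the seed; only the grouping itself is meaningful.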

Drawback 1. The number of classes has to be decided in advance.
Drawback 2. Outliers can make the mean a poor choice of center. In that case K-Medians, which uses the median instead, is worth trying.
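
A small sketch of why the median helps: the only change from K-Means shown here is the center update. The helper name center and the sample cluster are hypothetical. K-Medians is also commonly paired with Manhattan distance in the assignment step, which this snippet does not show.

```python
from statistics import mean, median

def center(members, use_median=False):
    """Per-coordinate mean (K-Means) or median (K-Medians) of one cluster."""
    agg = median if use_median else mean
    return tuple(agg(xs) for xs in zip(*members))

# Four points near (1.5, 1.5) plus one outlier (hypothetical data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]
mean_center = center(cluster)                     # dragged toward the outlier
median_center = center(cluster, use_median=True)  # stays with the bulk of the data
```

The single outlier pulls the mean far away from the four nearby points, while the median stays among them.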

2. Mean-Shift Clustering

Mean-shift is a hill-climbing algorithm that keeps moving a sliding window toward regions of higher density.

Advantage 1. The number of classes does not need to be decided in advance.
Drawback 1. The kernel size, that is the window radius r, has to be chosen.
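
The sliding-window idea can be sketched as follows, assuming a flat kernel (every point inside the window counts equally) and one window started at each data point. The function name mean_shift, the tolerance, and the rule for merging nearby windows (closer than radius / 2) are assumptions of this example.

```python
import math

def mean_shift(points, radius, max_iter=100, tol=1e-6):
    """Shift one window per point toward denser regions using a flat kernel."""
    modes = []
    for start in points:
        window = start
        for _ in range(max_iter):
            # Points currently inside the window of the given radius.
            inside = [p for p in points if math.dist(p, window) <= radius]
            # Hill-climbing step: move the window to the mean of those points.
            shifted = tuple(sum(xs) / len(inside) for xs in zip(*inside))
            moved = math.dist(shifted, window)
            window = shifted
            if moved < tol:
                break
        modes.append(window)
    # Windows that end up close together are treated as one cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < radius / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical toy data: two groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = mean_shift(points, radius=3.0)
```

The number of clusters is not passed in; it falls out of how many distinct positions the windows converge to, which in turn depends on the chosen radius.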

3. DBSCAN

Density-Based Spatial Clustering of Applications with Noise

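Advantage 1. The number of classes does not need to be decided in advance.
Drawback 1. It does not work very well in some cases. In particular, a single distance threshold and minimum point count apply to the whole dataset, so clusters with very different densities are hard to separate.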
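
The notes above do not spell out the DBSCAN procedure, so the following is a sketch of the commonly described version: a point with at least min_pts neighbors within eps is a core point, clusters grow outward from core points, and anything unreachable is noise. The names dbscan, eps, and min_pts and the toy data are assumptions of this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point has no neighbors within eps and is therefore reported as noise, which is the "with Noise" part of the name.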

4. EM Clustering with GMM

Expectation-Maximization using Gaussian Mixture Models

Step 1. As in K-Means, choose the number of clusters, then initialize the Gaussian distribution parameters of each cluster.
Step 2. Compute the probability that each data point belongs to each cluster.
Step 3. Based on the probabilities (likelihood) from Step 2, compute new Gaussian parameters that maximize them.
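
A one-dimensional sketch of these steps is given below. It assumes scalar data, a fixed number of iterations in place of a convergence check, and a small variance floor to avoid degenerate components. The function names, the initial parameters, and the data are hypothetical.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mus, variances, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture; the parameter lists are updated in place."""
    k = len(mus)
    for _ in range(n_iter):
        # Step 2 (E-step): probability that each point belongs to each cluster.
        resp = []
        for x in xs:
            dens = [weights[c] * gaussian_pdf(x, mus[c], variances[c])
                    for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # Step 3 (M-step): closed-form updates weighted by those probabilities.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            mus[c] = sum(r[c] * x for r, x in zip(resp, xs)) / n_c
            variances[c] = sum(r[c] * (x - mus[c]) ** 2
                               for r, x in zip(resp, xs)) / n_c
            variances[c] = max(variances[c], 1e-6)  # guard against collapse
            weights[c] = n_c / len(xs)
    return mus, variances, weights, resp

# Step 1: two clusters with rough initial guesses (hypothetical values).
xs = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
mus, variances, weights, resp = em_gmm_1d(
    xs, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

Each row of resp holds the membership probabilities of one data point, so a hard cluster label can be obtained by taking the most probable component.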

Advantage 1. It is more flexible than K-Means.
Advantage 2. A data point may belong to several clusters, and the computed probabilities can be used to decide among them.

5. Agglomerative Hierarchical Clustering

Step 1. Treat each data point as its own cluster.
Step 2. Merge two clusters into one, chosen by the distance between their means.
Step 3. Repeat Step 2 until the root of the tree is formed, or stop once the desired number of clusters is reached.
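
A minimal sketch of these steps, assuming Euclidean distance between cluster means as the merge criterion (other linkage rules exist and would change the result). The function name agglomerative, the merge-history format, and the toy data are assumptions of this example, and the brute-force pair search is written for clarity rather than speed.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two clusters with the closest means until n_clusters remain."""
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]

    def centroid(cluster):
        return tuple(sum(points[i][d] for i in cluster) / len(cluster)
                     for d in range(len(points[0])))

    merges = []  # merge history; read bottom-up, this is the tree
    while len(clusters) > n_clusters:
        # Step 2: find the pair of clusters whose means are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Step 3: replace the pair with the merged cluster and repeat.
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges

# Hypothetical toy data: two groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, merges = agglomerative(points, n_clusters=2)
```

Calling the function with n_clusters=1 continues merging all the way to the root, and the merges list then records the full hierarchy.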

Advantage 1. The number of classes does not need to be decided in advance; instead, it can be set afterwards to whatever number of classes is wanted.
Advantage 2. It reflects the hierarchical structure of the data well.
Advantage 3. It may work well on imbalanced data (unverified; see [5]).
Drawback 1. It is more computationally expensive than K-Means or GMM.

6. Deep Clustering for Unsupervised Learning of Visual Features

The pseudo labels (cluster indices) produced by an unsupervised learning model (k-means) are used to fine-tune a pre-trained model.

6-1. Main

  1. Run a clustering algorithm (K-Means) on the output of the top conv layer
  2. Generate pseudo labels and fine-tune with them

6-2. Detail

  1. Detect edges with a Sobel filter
  2. Reduce the dimensionality of the feature map (PCA)
  3. Use uniform sampling when generating pseudo labels
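
The uniform sampling item can be read as drawing training examples so that every pseudo label is equally likely, which keeps large clusters from dominating fine-tuning. The sketch below follows that reading; the function name uniform_sample_over_clusters, the seed, and the 95/5 split are hypothetical, and the exact procedure in the original work may differ.

```python
import random
from collections import defaultdict

def uniform_sample_over_clusters(pseudo_labels, n_samples, seed=0):
    """Draw sample indices so that each non-empty cluster is equally likely."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for index, label in enumerate(pseudo_labels):
        by_cluster[label].append(index)
    cluster_ids = sorted(by_cluster)
    sampled = []
    for _ in range(n_samples):
        # First choose a cluster uniformly, then a member of that cluster.
        cluster = rng.choice(cluster_ids)
        sampled.append(rng.choice(by_cluster[cluster]))
    return sampled

# Heavily imbalanced pseudo labels: 95 items in cluster 0, 5 in cluster 1.
pseudo_labels = [0] * 95 + [1] * 5
sampled = uniform_sample_over_clusters(pseudo_labels, n_samples=1000)
share_of_cluster_1 = sum(pseudo_labels[i] == 1 for i in sampled) / len(sampled)
```

Although cluster 1 holds only 5% of the items, it should account for roughly half of the draws, because the cluster is chosen before the item.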

References

[1] https://zinniastop.blogspot.com/2019/10/5.html
[2] https://astralworld58.tistory.com/58
[3] https://www.youtube.com/watch?v=cCwzxVwfrgM
[4] http://dsba.korea.ac.kr/seminar/?mod=document&uid=28
[5] https://towardsdatascience.com/clustering-analyses-with-highly-imbalanced-datasets-27e486cd82a4