Delving into Deep Imbalanced Regression
์†์ง€์šฐ

Yang, Y., Zha, K., Chen, Y. C., Wang, H., & Katabi, D. (2021). Delving into Deep Imbalanced Regression. arXiv preprint arXiv:2102.09554.

In Short

Imbalanced regression (not classification), tackled with LDS and FDS, both built on kernel functions

1. Introduction

๋ถˆ๊ท ํ˜•๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ ํ•™์Šตํ•  ๋•Œ, ๋งŽ์€ ๊ฒฝ์šฐ์— ํšŒ๊ท€ ๋ฌธ์ œ๋ณด๋‹ค๋Š” ๋ถ„๋ฅ˜ ๋ฌธ์ œ์— ์ดˆ์ ์ด ๋งž์ถฐ์ ธ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ˜„์‹ค์—์„œ๋Š” ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถˆ๊ท ํ˜•์ธ ๊ฒฝ์šฐ๋„ ์ถฉ๋ถ„ํžˆ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์—ฐ๋ น ๋ถ„ํฌ ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ์—๋Š” ๊ฐ ๋‚˜๋ผ์— ๋”ฐ๋ผ์„œ ๋‚˜์ด๋Œ€๋ณ„ ๋ถ„ํฌ๊ฐ€ ๋‹ค๋ฅด๋‹ค. ์ด์™ธ์—๋„ ํ˜ˆ์••์ด๋‚˜ ๋งฅ๋ฐ•์ˆ˜์™€ ๊ฐ™์€ ํ™˜์ž ํ™œ๋ ฅ ์ง•ํ›„ ๋ฐ์ดํ„ฐ๋‚˜ ์‘๊ธ‰์‹ค ์ฒด๋ฅ˜์‹œ๊ฐ„๊ณผ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๋“ค๋„ ๊ทธ ์˜ˆ์‹œ๊ฐ€ ๋  ์ˆ˜ ์žˆ๊ฒ ๋‹ค.

2-1. Imbalanced Classification

  1. Data-level
  • ROS (Random Oversampling)
  • RUS (Random Undersampling)
  • SMOTE
  • GAN (CGAN, FSC-GAN, MFC-GAN)
  2. Algorithm-level
  • Inverse frequency weight
  • Square root weight
  • Focal Loss
  • Two Stage Training

2-2. Imbalanced Regression

๋ถˆ๊ท ํ˜• ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ ํŠน์ง•

DIR1

  1. There are no class boundaries.
  2. The degree of imbalance varies with the distribution of neighboring target values.
  3. Data for particular target values may be missing entirely.

์œ„์™€ ๊ฐ™์€ ํŠน์ง• ๋•Œ๋ฌธ์— ๋ถˆ๊ท ํ˜• ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ์˜ ๊ฒฝ์šฐ์—๋Š” imbalanced classification์™€ ๋‹ค๋ฅด๋‹ค. ๊ทธ๋ž˜์„œ…!

  1. Resampling and reweighting methods are hard to apply directly.
  2. The boundary between imbalanced and balanced regions is not clear-cut.
  3. Missing target values must be handled by interpolation or extrapolation from neighboring data.

DIR2

  • CIFAR-100: 100 classes
  • IMDB-WIKI: ages 0-99

์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ์˜ ํ•™์Šต ๊ฒฐ๊ณผ๋Š” ๋ฒ”์ฃผํ˜• ๋ฐ์ดํ„ฐ์˜ ํ•™์Šต ๊ฒฐ๊ณผ์™€ ๋‹ค์†Œ ๋‹ค๋ฅธ ์–‘์ƒ์„ ๋ณด์ธ๋‹ค.

  • For categorical data, the degree of imbalance tracks the distribution of the error rate closely (correlation -0.76).
  • For continuous data, by contrast, the degree of imbalance is reflected less accurately in the error-rate distribution (correlation -0.47).

3. Methods

Problem Setting

  • Exploit the similarity between adjacent data points
  • Use kernel functions to mitigate the imbalance
  • Kernel density estimation (KDE)

3-1. Label Distribution Smoothing (LDS)

๋ ˆ์ด๋ธ” ๊ณต๊ฐ„ ๊ด€์ 

Figure2์—์„œ ๋ณด์ด๋Š” ๋ฐ”์™€ ๊ฐ™์ด, ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ์™€ ๋ฒ”์ฃผํ˜• ๋ฐ์ดํ„ฐ๊ฐ€ ์ฐจ์ด๊ฐ€ ๋‚˜๋Š” ์ด์œ ๋Š” Empirical label distribution๊ณผ (unseen data๊ฐ€ ํฌํ•จ๋œ) Real label density distribution์ด ๋‹ค๋ฅด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ์‹ค์ œ ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ๋Š” ์œ„์—์„œ ์–ธ๊ธ‰๋œ ๋ฐ”์™€ ๊ฐ™์ด ์ฃผ๋ณ€ ๋ ˆ์ด๋ธ”๊ฐ„ ์—ฐ๊ด€์„ฑ์„ ๊ฐ€์ง„๋‹ค.

DIR3

๊ทธ๋ž˜์„œ LDS์˜ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ • ๊ณผ์ •์„ ํ†ตํ•ด ์ฃผ๋ณ€ ๋ฐ์ดํ„ฐ์˜ ์—ฐ์†ํ˜•์ด ๋ฐ˜์˜๋œ Effective Label Density๋ฅผ ์ถ”์ถœํ•œ๋‹ค. ์ด๋ ‡๊ฒŒ ๋˜๋ฉด, ์˜ˆ์ธก ํƒœ์Šคํฌ์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ์‹ค์ œ ๋ถˆ๊ท ํ˜• ์ •๋„๋ฅผ ์ž˜ ๋ฐ˜์˜ํ•˜๊ฒŒ ๋จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” ์ƒ๊ด€๊ณ„์ˆ˜๊ฐ€ -0.47์—์„œ -0.83๋กœ, ๊ทธ ์ ˆ๋Œ“๊ฐ’์ด ์ƒ์Šนํ–ˆ๋‹ค๋Š” ์ ์—์„œ๋„ ์ˆ˜์น˜์ ์œผ๋กœ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด๋กœ ์ธํ•ด $\tilde{p}(y')$์„ ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜ํ•œ๋‹ค๋ฉด, ์ด์˜ ์—ญ์ˆ˜๋ฅผ ์†์‹คํ•จ์ˆ˜์˜ ๊ฐ€์ค‘์น˜๋กœ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค.

$$
\tilde{p}(y') = \int_{Y}k(y, y')p(y)dy
$$

For reference, a kernel function here means a non-negative function that is symmetric about the origin and integrates to 1. Typical choices are the Gaussian and Laplacian kernels.
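As a rough illustration, the smooth-then-reweight idea above might look like the following sketch (the bin count, Gaussian kernel width, and the toy age data are all assumptions, not the paper's settings):

```python
import numpy as np

def lds_weights(labels, num_bins=100, sigma=2.0):
    """LDS sketch: smooth the empirical label histogram with a symmetric
    Gaussian kernel and use the inverse of the resulting effective
    density as per-sample loss weights."""
    hist, edges = np.histogram(labels, bins=num_bins)     # empirical p(y)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()                                # sums to 1
    eff = np.convolve(hist, kernel, mode="same")          # effective density
    bin_idx = np.clip(np.digitize(labels, edges[1:-1]), 0, num_bins - 1)
    w = 1.0 / np.maximum(eff[bin_idx], 1e-8)              # inverse as weight
    return w * len(w) / w.sum()                           # normalize to mean 1

# toy age data: a dense region near 30 and a sparse one near 70
np.random.seed(0)
ages = np.concatenate([np.random.normal(30, 5, 900), np.random.normal(70, 5, 100)])
w = lds_weights(ages)
```

Samples in the sparse region around age 70 end up with larger weights than those near 30, which is exactly the reweighting effect LDS is after.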

3-2. Feature Distribution Smoothing (FDS)

The feature-space view
Continuity in the target space should be reflected in the feature space of a well-trained model as well.

DIR5

bin = equal-width intervals dividing the target space into b pieces (e.g., age: 1 year)

์ž˜ ํ•™์Šต๋œ encoder๋ฅผ ํ†ตํ•ด์„œ ํŠน์ง• ๊ณต๊ฐ„์„ ์–ป์„ ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค. ์—ฌ๊ธฐ์„œ ์ธ๋ฌผ image์˜ ํŠน์ง•์ด ํ•™์Šต๋œ ํŠน์ง• ๊ณต๊ฐ„ z์— ์š”์•ฝํ•˜๊ธฐ ์œ„ํ•ด์„œ ๊ธฐ์ดˆํ†ต๊ณ„๋Ÿ‰์„ ๊ตฌํ•˜๊ฒŒ ๋˜๋ฉด, ๋ชจ๋“  b์— ๋Œ€ํ•ด์„œ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋ฅผ ๊ธฐ์ค€์œผ๋กœ ํŠน์ • ๊ฐ’ $b_0$๋ฅผ ๊ณ ์ •ํ•˜์—ฌ๋‘๊ณ  ๋‹ค๋ฅธ $b$์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์˜ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.

DIR4

์œ„ ๊ทธ๋ฆผ์—์„œ๋Š” ์ผ๋‹จ 30์‚ด์„ ๊ธฐ์ค€์œผ๋กœ ์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„๋ฅผ ๊ณ„์‚ฐํ•œ ๊ฒƒ์ด๋‹ค. ์ƒ์‹๊ณผ ๋น„์Šทํ•˜๊ฒŒ, 30์‚ด ์ฃผ๋ณ€์˜ ๊ฐ’๋“ค๊ณผ๋Š” ๋†’์€ ์œ ์‚ฌ๋ฅผ ๋‚˜ํƒ€๋ƒˆ๋‹ค. ํ•˜์ง€๋งŒ, ํŠน์ดํ•˜๊ฒŒ๋„ 0~6์‚ด์— ํ•ด๋‹นํ•˜๋Š” ๊ฐ’๋“ค๊ณผ ์œ ์‚ฌ๋„๊ฐ€ ๊ฝค ๋†’๊ฒŒ ๋‚˜ํƒ€๋‚˜๋Š” ์ด์ƒํ•œ ํ˜„์ƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” ํ•ด๋‹น ๋ฐ์ดํ„ฐ๊ฐ€ ์ƒ๋Œ€์ ์œผ๋กœ ์ ์–ด์„œ(few-shot region), ์ฆ‰ ๋ฐ์ดํ„ฐ ๋ถˆ๊ท ํ˜• ๋ฌธ์ œ๋กœ ์ธํ•ด ๋ฐœ์ƒํ•œ ํ˜„์ƒ์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ ์—ญ์‹œ LDS์ฒ˜๋Ÿผ ์ปค๋„ ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ํ•ด๊ฒฐํ•œ๋‹ค.

$$
\mu_b = \frac{1}{N_b}\sum_{i=1}^{N_b}z_i \;\rightarrow\; \tilde{\mu}_b = \sum_{b' \in B}k(y_b, y_{b'})\mu_{b'}
$$

$$
\Sigma_b = \frac{1}{N_b-1}\sum_{i=1}^{N_b}(z_i-\mu_b)(z_i-\mu_b)^{\top} \;\rightarrow\; \tilde{\Sigma}_b = \sum_{b' \in B}k(y_b, y_{b'})\Sigma_{b'}
$$

ํ•œ ์—ํญ์—์„œ ํ•™์Šต๋œ z์˜ ํ†ต๊ณ„๋Ÿ‰์— ์ปค๋„ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•ด์„œ calibration์‹œํ‚ค๊ณ , regression layer๋ฅผ ํ†ต๊ณผ์‹œ์ผœ์„œ ์†์‹คํ•จ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.

Unlike LDS, FDS adds one more component: the update rule. To obtain stable and accurate running estimates during training, the statistics are updated once per epoch. Concretely, once the samples of the current epoch have been processed, the running statistics are updated with a momentum-style exponential moving average (EMA).

Finally, the kernel function is applied to the current statistics, and the result is carried over to the next epoch.
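A minimal sketch of the momentum-style update (the momentum value 0.9 is an assumed placeholder, not the paper's hyperparameter):

```python
def ema_update(running, current, momentum=0.9):
    """Momentum-style EMA of the running statistics across epochs."""
    return momentum * running + (1.0 - momentum) * current

# the running mean drifts gradually toward each new epoch-level estimate
mu = 0.0
for epoch_mean in [1.0, 1.0, 1.0]:
    mu = ema_update(mu, epoch_mean)  # 0.1 -> 0.19 -> 0.271
```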

Calibration
$$\tilde{z} = \tilde{\Sigma}_{b}^{\frac{1}{2}}\Sigma_{b}^{-\frac{1}{2}}(z-\mu_b)+\tilde{\mu}_b$$
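A rough sketch of the whole smooth-then-calibrate step. For simplicity it uses per-dimension (diagonal) variances in place of the full covariance matrices; the Gaussian kernel width and the toy data are also assumptions:

```python
import numpy as np

def fds_calibrate(z, y_bins, sigma=2.0):
    """FDS sketch: smooth per-bin feature statistics with a Gaussian
    kernel over the label bins, then re-standardize each feature.
    Diagonal variances stand in for the full covariance matrices."""
    bins = np.unique(y_bins)
    mu = np.stack([z[y_bins == b].mean(axis=0) for b in bins])   # mu_b
    var = np.stack([z[y_bins == b].var(axis=0) for b in bins])   # diag(Sigma_b)
    # kernel weights k(y_b, y_b') between every pair of bins, row-normalized
    diff = bins[:, None].astype(float) - bins[None, :]
    k = np.exp(-diff ** 2 / (2 * sigma ** 2))
    k /= k.sum(axis=1, keepdims=True)
    mu_s, var_s = k @ mu, k @ var                  # smoothed statistics
    # per-dimension version of: z~ = Sigma~^(1/2) Sigma^(-1/2) (z - mu) + mu~
    idx = np.searchsorted(bins, y_bins)            # map samples to bin rows
    scale = np.sqrt(var_s[idx] / np.maximum(var[idx], 1e-8))
    return scale * (z - mu[idx]) + mu_s[idx]

# toy data: 8-dim features whose mean drifts with the (integer) label
np.random.seed(2)
y = np.random.randint(0, 10, 400)
z = np.random.randn(400, 8) + y[:, None]
z_tilde = fds_calibrate(z, y)
```

By construction, the features of each bin are shifted and rescaled so that their statistics match the kernel-smoothed estimates, which is the calibration the formula above describes.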

DIR8

FDS์˜ ๊ฒฐ๊ณผ๋Š” ์œ„์™€ ๊ฐ™๋‹ค. ์™ผ์ชฝ์€ FDS๋ฅผ ์ ์šฉํ•˜์ง€ ์•Š์€ ๊ฒƒ์ด๊ณ , ์˜ค๋ฅธ์ชฝ์€ FDS๋ฅผ ์ ์šฉํ•œ ๊ฒƒ์ด๋‹ค. FDS๋ฅผ ์ ์šฉํ•œ ์˜ค๋ฅธ์ชฝ ๊ทธ๋ž˜ํ”„๊ฐ€ ๊ธฐ๋ณธ์ ์ธ ์ƒ์‹์„ ๋ฐ˜์˜ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ ๋ณด์ธ๋‹ค.

A further advantage of FDS is that it acts as a kind of calibration layer, so it can be attached directly to any model.

4. Performance Comparison

4-1. Dataset

5๊ฐœ์˜ Dataset์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ์ง์ ‘ ๋งŒ๋“  ๊ฒƒ์œผ๋กœ ๋ณด์ธ๋‹ค.

DIR_dataset

  • IMDB-WIKI-DIR (age)
  • AgeDB-DIR (age)
  • STS-B-DIR (text similarity score)
  • NYUD2-DIR (depth)
  • SHHS-DIR (health condition score)

DIR_dataset_dist

๊ฐ ๋ฐ์ดํ„ฐ๋“ค์€ ๋ชจ๋‘ ๋ถˆ๊ท ํ˜•ํ•จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

4-2. Baseline

Methods borrowed from imbalanced classification are used as baselines.

  • Synthetic samples: (1) SmoteR (2) SMOGN
  • Error-aware loss: (3) Focal-R
  • Two-stage training: (4) regressor re-training(RRT)
  • Cost-sensitive re-weighting: (5) naive inverse(INV) (6) square-root inverse(SQINV)
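The two cost-sensitive baselines reduce to simple frequency transforms; a sketch (the normalization to mean 1 is my own convention, not specified by the paper):

```python
import numpy as np

def frequency_weights(bin_counts, scheme="INV"):
    """Naive inverse (INV) and square-root inverse (SQINV)
    frequency weights, normalized to mean 1."""
    counts = np.asarray(bin_counts, dtype=float)
    w = 1.0 / counts if scheme == "INV" else 1.0 / np.sqrt(counts)
    return w * len(w) / w.sum()

w_inv = frequency_weights([100, 10, 1], "INV")    # weight ratios 1 : 10 : 100
w_sq = frequency_weights([100, 10, 1], "SQINV")   # weight ratios 1 : sqrt(10) : 10
```

SQINV is the milder scheme: taking the square root compresses the weight ratio between frequent and rare bins, which tends to be more stable when the imbalance is extreme.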

Each baseline is compared against versions with (1) LDS, (2) FDS, and (3) LDS+FDS added. Finally, the best performer among all of these is compared with the VANILLA model.

4-3. Main Results

๋น„๊ต metrics์€ MAE(Mean Average Eror)์™€ GM(Geometric Mean Error)๊ฐ€ ์žˆ๋‹ค.

DIR_result1

IMDB-WIKI-DIR์—์„œ๋Š” ์œ„์™€ ๊ฐ™์ด Medium-Shot๊ณผ Few-Shot์—์„œ ํŠนํžˆ ์œ ์˜๋ฏธํ•œ ์„ฑ๋Šฅ ์ƒ์Šน์ด ์žˆ์—ˆ๋‹ค๋Š” ์ ์ด ํŠนํžˆ ์ฃผ๋ชฉํ•ด๋ณผ ๋งŒํ•˜๋‹ค. ์ด์™ธ์˜ ๋ฐ์ดํ„ฐ์—์„œ ์„ฑ๋Šฅ์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

DIR_result2


DIR_result3


DIR_result4


DIR_result5

4-4. Further Analysis

Extrapolation & Interpolation
This appears to evaluate performance on target values that occur in the test set but not in the training set.

DIR7


DIR-Table6

5. Conclusion

New task: Deep Imbalanced Regression(DIR)
New techniques: LDS & FDS
New benchmarks: IMDB-WIKI-DIR / AgeDB-DIR / STS-B-DIR / NYUD2-DIR / SHHS-DIR

Critical Point (MY OWN OPINION)

  1. bin์„ ๋ช‡ ๊ฐœ์˜ b๋กœ ๋‚˜๋ˆŒ์ง€์— ๋”ฐ๋ผ์„œ ์„ฑ๋Šฅ์ด ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๊ฒ ๋‹ค. ๋งŒ์•ฝ ์—„์ฒญ ์„ธ๋ถ„ํ™”ํ•˜๊ฒŒ ๋œ๋‹ค๋ฉด ์„ฑ๋Šฅ์ด ์ €ํ•˜๋  ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋˜๋Š”๋ฐ, ์ด๋ ‡๊ฒŒ ๋ณธ๋‹ค๋ฉด ์™„๋ฒฝํ•œ ์—ฐ์†ํ˜• ๋ฐ์ดํ„ฐ๋ผ๊ณ ๋Š” ๋ณด๊ธฐ ํž˜๋“ค์ง€ ์•Š์„๊นŒ?
