These are some notes on my understanding of the Variational Autoencoder (VAE), kept so that I can review this material later.

An autoencoder is a network that uses its training input as the training target (e.g., encoder(X) = Z; decoder(Z) = y ≈ X). Once the weights are trained, we can use them for unsupervised analysis, most commonly to find a lower-dimensional latent vector Z.
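A minimal PyTorch sketch of this setup (the layer sizes and the latent dimension 32 are illustrative assumptions, not something these notes prescribe):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Plain autoencoder: the training target is the training input itself."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # encoder(X) = Z: compress X into a lower-dimensional latent vector Z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        # decoder(Z) = y ≈ X: reconstruct the input from Z
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = Autoencoder()
x = torch.randn(16, 784)                    # dummy batch
loss = nn.functional.mse_loss(model(x), x)  # training target = training input
```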

  • We use the KL divergence as the loss function for the optimization.

Discrete form: \(KL(p \| q)=\sum_{x} p(x) \log \frac{p(x)}{q(x)}\)

Continuous form: \(KL(p \| q)=\int p(x) \log \frac{p(x)}{q(x)} d x\)
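A quick numeric check of the discrete form (the two example distributions here are made up):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# discrete KL(p || q) = sum_x p(x) * log(p(x) / q(x))
kl_pq = np.sum(p * np.log(p / q))
kl_qp = np.sum(q * np.log(q / p))
print(kl_pq, kl_qp)  # ~0.025 vs ~0.026: KL is not symmetric
```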

Working through the algebra for a scalar random variable x, with \(p_{1}=N\left(\mu_{1}, \sigma_{1}^{2}\right)\) and \(p_{2}=N\left(\mu_{2}, \sigma_{2}^{2}\right)\), \(KL(p_{1} \| p_{2})\) is

\[\begin{aligned}
KL(p_{1} \| p_{2}) &= \int p_{1}(x) \log \frac{p_{1}(x)}{p_{2}(x)} d x \\
&= \int p_{1}(x)\left(\log \frac{\sigma_{2}}{\sigma_{1}}+\left[\frac{\left(x-\mu_{2}\right)^{2}}{2 \sigma_{2}^{2}}-\frac{\left(x-\mu_{1}\right)^{2}}{2 \sigma_{1}^{2}}\right]\right) d x \\
&= \int\left(\log \frac{\sigma_{2}}{\sigma_{1}}\right) p_{1}(x) d x+\int \frac{\left(x-\mu_{2}\right)^{2}}{2 \sigma_{2}^{2}} p_{1}(x) d x-\int \frac{\left(x-\mu_{1}\right)^{2}}{2 \sigma_{1}^{2}} p_{1}(x) d x \\
&= \log \frac{\sigma_{2}}{\sigma_{1}}+\frac{1}{2 \sigma_{2}^{2}} \int\left(x-\mu_{2}\right)^{2} p_{1}(x) d x-\frac{1}{2 \sigma_{1}^{2}} \int\left(x-\mu_{1}\right)^{2} p_{1}(x) d x \\
&= \log \frac{\sigma_{2}}{\sigma_{1}}+\frac{\sigma_{1}^{2}+\left(\mu_{1}-\mu_{2}\right)^{2}}{2 \sigma_{2}^{2}}-\frac{1}{2}
\end{aligned}\]

Specializing to \(p_{2}=N(0,1)\) (i.e., \(\mu_{2}=0, \sigma_{2}=1\)), we get

\[KL\left(N\left(\mu_{1}, \sigma_{1}^{2}\right) \| N(0,1)\right)=-\log \sigma_{1}+\frac{\sigma_{1}^{2}+\mu_{1}^{2}}{2}-\frac{1}{2}\]

For d-dimensional multivariate Gaussians \(p_{1}=N\left(\mu_{1}, \Sigma_{1}\right)\) and \(p_{2}=N\left(\mu_{2}, \Sigma_{2}\right)\), the corresponding result is

\[KL(p_{1} \| p_{2})=\frac{1}{2}\left[\log \frac{\operatorname{det}\left(\Sigma_{2}\right)}{\operatorname{det}\left(\Sigma_{1}\right)}-d+\operatorname{tr}\left(\Sigma_{2}^{-1} \Sigma_{1}\right)+\left(\mu_{2}-\mu_{1}\right)^{T} \Sigma_{2}^{-1}\left(\mu_{2}-\mu_{1}\right)\right]\]
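A sketch of this closed form in NumPy, checked in 1-D against the univariate result above (the function name and test values are mine):

```python
import numpy as np

def gaussian_kl(mu1, cov1, mu2, cov2):
    """KL(N(mu1, cov1) || N(mu2, cov2)) via the closed form above."""
    d = mu1.shape[0]
    cov2_inv = np.linalg.inv(cov2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (logdet2 - logdet1 - d
                  + np.trace(cov2_inv @ cov1)
                  + diff @ cov2_inv @ diff)

# 1-D sanity check: KL(N(mu1, s1^2) || N(0, 1)) = -log s1 + (s1^2 + mu1^2)/2 - 1/2
mu1, s1 = 0.5, 2.0
print(gaussian_kl(np.array([mu1]), np.array([[s1**2]]),
                  np.array([0.0]), np.array([[1.0]])))
print(-np.log(s1) + (s1**2 + mu1**2) / 2 - 0.5)  # matches
```
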
  • Variational Inference

Here we use Bayes' rule to define the generative model and to look for the latent variable z:

\[p(z | X)=\frac{p(X | z) p(z)}{p(X)}\]

\[p(z | X)=\frac{p(X | z) p(z)}{\int_{z} p(X | z) p(z) d z}\]

The denominator \(\int_{z} p(X | z) p(z) d z\) is intractable in general, so, as stated at the beginning, we pick a q(z) that approximates p(z | X) as closely as possible, and use the conditional-probability identity above to expand the terms inside the log:

\[\begin{aligned}
KL(q(z) \| p(z | X)) &= \int q(z) \log \frac{q(z)}{p(z | X)} d z \\
&= \int q(z)[\log q(z)-\log p(z | X)] d z \\
&= \int q(z)[\log q(z)-\log p(X | z)-\log p(z)+\log p(X)] d z \\
&= \int q(z)[\log q(z)-\log p(X | z)-\log p(z)] d z+\log p(X)
\end{aligned}\]

Rearranging (the last step above uses \(\int q(z) d z=1\)), we finally obtain the evidence lower bound (ELBO) identity; since the KL term on the left is non-negative, the right-hand side is a lower bound on \(\log p(X)\):

\[\log p(X)-K L(q(z) \| p(z | X))=\int q(z) \log p(X | z) d z-K L(q(z) \| p(z))\]
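A numeric sanity check of this identity, using a toy conjugate model z ~ N(0, 1), X | z ~ N(z, s²) in which every term has a closed form (the model, the observed X, and the chosen q(z) are all made-up illustrations):

```python
import numpy as np

def kl_1d(m1, v1, m2, v2):
    # univariate Gaussian KL, matching the closed form derived earlier
    return 0.5 * np.log(v2 / v1) + (v1 + (m1 - m2)**2) / (2 * v2) - 0.5

s2, X = 0.5, 1.3                    # likelihood variance and one observation
post_v = 1 / (1 + 1 / s2)           # posterior p(z|X) is Gaussian (conjugacy)
post_m = post_v * X / s2
log_pX = -0.5 * (np.log(2 * np.pi * (1 + s2)) + X**2 / (1 + s2))  # X ~ N(0, 1+s2)

m, v = 0.2, 0.8                     # an arbitrary q(z) = N(m, v)
lhs = log_pX - kl_1d(m, v, post_m, post_v)
# E_q[log p(X|z)] in closed form for a Gaussian likelihood
e_loglik = -0.5 * np.log(2 * np.pi * s2) - (v + (X - m)**2) / (2 * s2)
rhs = e_loglik - kl_1d(m, v, 0.0, 1.0)
print(lhs, rhs)                     # equal up to floating-point error
```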

We then use the reparameterization trick to tie q(z) to a fixed noise distribution \(p(\varepsilon)\), so that the expectation over q(z) can be rewritten as an expectation over \(\varepsilon\):

\[z^{(i)}=g_{\phi}\left(X, \varepsilon^{(i)}\right), \quad \varepsilon^{(i)} \sim p(\varepsilon)\]

\[q\left(z^{(i)} | X\right) d z=p\left(\varepsilon^{(i)}\right) d \varepsilon\]

Substituting this in, we finally get

\[\log p(X)-KL(q(z) \| p(z | X))=\int p(\varepsilon) \log p\left(X | g_{\phi}(X, \varepsilon)\right) d \varepsilon-KL(q(z | X) \| p(z))\]
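A sketch of this reparameterization step, assuming the common Gaussian encoder that outputs a mean and a log-variance (the names mu and log_var are illustrative):

```python
import torch

def reparameterize(mu, log_var):
    """z = g_phi(X, eps) = mu(X) + sigma(X) * eps with eps ~ N(0, I).
    Sampling the noise from p(eps) instead of q(z|X) keeps z differentiable
    with respect to the encoder parameters phi."""
    eps = torch.randn_like(mu)                 # eps^(i) ~ N(0, I)
    return mu + torch.exp(0.5 * log_var) * eps

mu = torch.zeros(16, 32, requires_grad=True)
log_var = torch.zeros(16, 32, requires_grad=True)
z = reparameterize(mu, log_var)                # gradients flow into mu, log_var
```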
  • In the second term on the right, the distribution \(q(z | X)\) that generates z from X is the encoder; we assume the prior p(z) is a Gaussian with zero mean, \(N(0, I)\).
  • The first term on the right corresponds to the decoder \(p(X | z)\).

Substituting \(p_{2}=N(0, I)\) into the multivariate KL formula above gives

\[KL\left(p_{1} \| N(0, I)\right)=\frac{1}{2}\left[-\log \operatorname{det}\left(\Sigma_{1}\right)-d+\operatorname{tr}\left(\Sigma_{1}\right)+\mu_{1}^{T} \mu_{1}\right]\]

In the actual computation, with a diagonal covariance \(\Sigma_{1}=\operatorname{diag}\left(\sigma_{1 i}^{2}\right)\), this becomes

\[\begin{aligned}
KL\left(N\left(\mu_{1}, \Sigma_{1}\right) \| N(0, I)\right) &= \frac{1}{2}\left[-\sum_{i} \log \sigma_{1 i}^{2}-d+\sum_{i} \sigma_{1 i}^{2}+\mu_{1}^{T} \mu_{1}\right] \\
&= \sum_{i=1}^{d}-\frac{1}{2} \log \sigma_{1 i}^{2}+\sum_{i=1}^{d}\left(-\frac{1}{2}\right)+\sum_{i=1}^{d} \frac{1}{2} \sigma_{1 i}^{2}+\sum_{i=1}^{d} \frac{1}{2} \mu_{1 i}^{2} \\
&= \sum_{i=1}^{d}\left[-\frac{1}{2} \log \sigma_{1 i}^{2}+\frac{1}{2} \sigma_{1 i}^{2}+\frac{1}{2} \mu_{1 i}^{2}-\frac{1}{2}\right]
\end{aligned}\]
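
This last line is exactly the KL term that usually shows up in VAE training code. A minimal sketch, assuming the encoder outputs mu and log_var = log(std²) per latent dimension (names are illustrative):

```python
import torch

def kl_to_standard_normal(mu, log_var):
    """Per-sample KL(N(mu, diag(std^2)) || N(0, I)), i.e. the sum over latent
    dimensions of -0.5*log(std^2) + 0.5*std^2 + 0.5*mu^2 - 0.5."""
    return (0.5 * (torch.exp(log_var) + mu**2 - log_var - 1.0)).sum(dim=1)

# Combined with a reconstruction term this gives the negative ELBO:
# loss = recon_loss(x_hat, x) + kl_to_standard_normal(mu, log_var).mean()
```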