The Variational Autoencoder (VAE) addresses the limitations of autoencoder neural networks, which cannot generate new data on their own and struggle to determine the true distribution of the feature information in the hidden layers. By introducing a latent variable Z in the hidden layer and controlling its distribution, the VAE makes the output controllable [33]. The study explores various statistical feature extraction algorithms and ultimately focuses on the Deep Bayesian Network for analysis. Such a network can represent relationships between variables using neural networks and can effectively analyze complex structured data to identify feature information accurately.
VAE is a probabilistic model based on variational inference; its aim is to build a generative model rather than merely a reconstruction network. By extracting feature information through an approximate model function and reducing the resulting error, VAE improves computational efficiency. Constraints must be imposed on the network to ensure convergence during training and to prevent the latent variables from distorting the final prediction results. By adding these conditional restrictions, VAE overcomes the drawbacks of autoencoder neural networks and captures the relationship between the hidden variable Z and the visible variable X. Overall, VAE generates effective and reasonable feature information within the network, with the visible variable X being generated by the hidden variable Z. As shown in Figure 2, z follows a Gaussian distribution N(0, 1). Sampling z from the prior p(z), data are generated through p(x|z). Thus, the observable variable x is generated by the latent variable z, and p(x|z) represents the generative model which, from the perspective of an autoencoder, acts as the decoder; p(x|z) can be implemented with a neural network. q(z|x) is the recognition model, similar to the encoder in an autoencoder. The overall structure of a VAE, shown in Figure 3, differs from that of a standard autoencoder in that the VAE imposes additional constraints on the hidden layer, making it controllable.
Figure 2. Internal operation of the VAE.
Figure 3. The structure of the VAE.
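As a concrete illustration of the encoder/decoder structure described above, the following is a minimal sketch of a VAE in PyTorch. The framework, the class name, and all layer sizes (x_dim, h_dim, z_dim) are assumptions for illustration only; the recognition model q(z|x) plays the role of the encoder, the generative model p(x|z) the decoder, and new data are generated by sampling z from N(0, 1).

```python
# Minimal VAE sketch (PyTorch assumed); layer sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):
        super().__init__()
        # Recognition model q(z|x): maps x to the mean and log-variance of z (the "encoder").
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # Generative model p(x|z): maps a latent z back to data space (the "decoder").
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def decode(self, z):
        return self.dec(z)

    def generate(self, n=16):
        # Generation: sample z from the prior N(0, I) and push it through p(x|z).
        z = torch.randn(n, self.mu.out_features)
        return self.decode(z)
```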
After obtaining p and q, q needs to be as close to p as possible in order to achieve a good result; the key is how to measure the gap between them. The VAE uses the Kullback-Leibler (KL) divergence to measure the difference between q and p: a smaller KL value indicates that the two distributions are closer. In the VAE, the parameters of the generative model must be estimated, so an unknown distribution is assumed that satisfies the following relationship:

D_KL(q || p) = ∫ q(x) log ( q(x) / p(x) ) dx   (1)
Formula (1) above is called the relative entropy, Kullback-Leibler divergence, or KL divergence between q and p. Since the formula is not symmetric in structure, D_KL(q || p) ≠ D_KL(p || q).
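The asymmetry noted above can be checked numerically. The short sketch below evaluates Formula (1) in both directions for two made-up discrete distributions (the values are illustrative assumptions, not data from the paper):

```python
# Numerical illustration of Formula (1) and its asymmetry.
import numpy as np

def kl(q, p):
    # D_KL(q || p) = sum_i q_i * log(q_i / p_i)
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(q * np.log(q / p)))

q = [0.7, 0.2, 0.1]
p = [0.4, 0.4, 0.2]
print(kl(q, p))  # ≈ 0.184
print(kl(p, q))  # ≈ 0.192  -> D_KL(q || p) != D_KL(p || q)
```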
The derivation in the previous section shows that the VAE needs to reduce the gap between p and q. In practical applications, for most values of the latent variable z in the hidden layer, p(x|z) will be close to 0, whereas q(z|x) represents the distribution that z needs to satisfy when generating the data set x, so evaluating the expectation of p(x|z) under q(z|x) instead of under p(z) does not affect the result. The core idea of the VAE is to sample z from q(z|x) and compute p(x|z) from the sampled z; q(z|x) should therefore be as close as possible to the true posterior p(z|x). The Kullback-Leibler divergence between q(z|x) and p(z|x) is:

D_KL(q(z|x) || p(z|x)) = E_{z~q(z|x)}[ log q(z|x) − log p(z|x) ]   (2)
Applying Bayes’ rule to p(z|x), i.e. p(z|x) = p(x|z) p(z) / p(x), and substituting p(x|z) and p(z) into Formula (2) allows the KL divergence to be transformed into:

log p(x) − D_KL(q(z|x) || p(z|x)) = E_{z~q(z|x)}[ log p(x|z) ] − D_KL(q(z|x) || p(z))   (3)
Formula (3) is the foundation of the VAE. In order to make q as close to p as possible, the KL divergence on the left-hand side must be minimized, which is equivalent to maximizing the left-hand side of the equation. Once the form of q(z|x) is chosen, a stochastic gradient descent algorithm can be used to optimize the right-hand side. Therefore, instead of relying directly on z, X is predicted through training q(z|x). The VAE compresses high-dimensional data into a low-dimensional z, and the generative network then produces a distribution that is as similar as possible to the original data.
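To show how the right-hand side of Formula (3) is optimized in practice, the sketch below computes a per-batch training loss. It assumes the VAE class sketched earlier, inputs scaled to [0, 1] with a Bernoulli (binary cross-entropy) decoder, and the reparameterization trick to keep the sampling of z differentiable; these modelling choices are assumptions for illustration, not details stated in the paper.

```python
# Sketch of the training objective from Formula (3): maximize
# E_{z~q(z|x)}[log p(x|z)] - KL(q(z|x) || p(z)) via stochastic gradient descent.
import torch
import torch.nn.functional as F

def elbo_loss(model, x):
    mu, logvar = model.encode(x)
    # Reparameterization: z = mu + sigma * eps keeps the sampling step differentiable.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    x_hat = model.decode(z)
    # Reconstruction term: -E_q[log p(x|z)] for a Bernoulli decoder (inputs in [0, 1]).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian q(z|x).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this loss maximizes the right-hand side of (3)
```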