
Generative models have been shown to be extremely useful for learning features from unlabeled data. In particular, variational autoencoders are capable of modeling highly complex natural distributions such as images, while extracting natural and human-understandable features without labels. In this paper we combine two highly useful classes of models, variational ladder autoencoders and MMD variational autoencoders, to model face images. In particular, we show that the model disentangles highly meaningful and interpretable features. Furthermore, we are able to perform arithmetic operations on faces and to modify faces by adding or removing high-level features.

Generative models learn a probability density function from existing observations and can then generate new samples from it. These models have been highly successful in various tasks such as semi-supervised learning, missing-data imputation, and the generation of novel data samples.

The variational autoencoder is a very important class of generative models [

Ladder Variational Autoencoders [

It has also been observed that the evidence lower bound (ELBO) used in traditional variational autoencoders suffers from the uninformative latent feature problem [

In this paper we combine these ideas to build a variational ladder autoencoder with MMD loss instead of KL divergence, and utilize this model to analyze the structure and hidden features of human faces. As an application we use this model to perform “arithmetic” operations on faces. For example, we can perform arithmetic operations such as: men with pale skin − men with dark skin + women with dark skin = women with pale skin. We do this by performing the arithmetic operations in the feature space and transforming the results back into image space. This can be potentially useful in games and virtual reality, where arbitrary features can be added to a face through the above process of analogy. This further demonstrates the effectiveness of our model in learning highly meaningful latent features.

Generative models seek to model a distribution p_{data}(x) on some input space X. The model is usually a parameterized family of distributions p_{θ}(x) trained by maximum likelihood:

max_{θ} E_{p_{data}(x)}[log p_{θ}(x)]

Intuitively this encourages the model distribution to place probability mass where p_{data} is more likely.

The variational autoencoder (Kingma & Welling, 2013; Jimenez Rezende et al., 2014) is an important class of generative models. It models a probability distribution through a prior p(z) on a latent space Z and a conditional distribution p(x|z) on the input space X. Usually p(z) is a fixed simple distribution such as a white Gaussian N(0, I), and p(x|z) is parameterized by a deep network with parameters θ, so we denote it as p_{θ}(x|z). The model distribution is defined by

p_{θ}(x) = ∫_{Z} p_{θ}(x|z) p(z) dz

However, maximum likelihood training is intractable because p_{θ}(x) requires an integral that is very difficult to compute. The solution is to jointly define an inference distribution q_{φ}(z|x), parameterized by φ, to approximate the posterior p_{θ}(z|x). Training both jointly gives the following objective, called the evidence lower bound (ELBO):

L_{ELBO} = −KL(q_{φ}(z|x) ‖ p_{θ}(z|x)) − KL(p_{data}(x) ‖ p_{θ}(x)) = −KL(q_{φ}(z|x) ‖ p(z)) + E_{q_{φ}(z|x)}[log p_{θ}(x|z)]

where KL denotes the Kullback-Leibler divergence. Intuitively, this model achieves its goal by first applying an “encoder” q_{φ}(z|x) to the input, then “decoding” the resulting latent code with p_{θ}(x|z) and comparing the generated result with the original data x using the cost function log p_{θ}(x|z).
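For a Gaussian encoder and a standard normal prior, the KL(q_{φ}(z|x) ‖ p(z)) term of the ELBO has a well-known closed form. As a minimal illustration (not the paper's code), assuming the encoder outputs a mean `mu` and a log-variance `log_var` per dimension:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # i.e. the KL(q_phi(z|x) || p(z)) term for a diagonal-Gaussian encoder.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - log_var - 1.0, axis=-1)
```

When `mu = 0` and `log_var = 0` the encoder already matches the prior and the divergence is zero, as expected.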

Ladder variational autoencoders [

It has been observed that the KL(q_{φ}(z|x) ‖ p(z)) term in the ELBO criterion results in under-used latent features (Chen et al., 2016; Zhao et al., 2017a). A solution is to use MMD(q(z), p(z)) instead, which is defined by

MMD(q(z), p(z)) = E_{q(z), q(z′)}[k(z, z′)] + E_{p(z), p(z′)}[k(z, z′)] − 2 E_{p(z), q(z′)}[k(z, z′)]

where k(z, z′) is a kernel function, such as the Gaussian kernel k(z, z′) = e^{−‖z − z′‖_{2}^{2}/σ^{2}}. Intuitively, k(z, z′) measures the similarity between z and z′, and E_{p(z), q(z′)}[k(z, z′)] measures the average similarity between samples from the distributions p(z) and q(z′). If the two distributions are identical, then the average similarity within samples from p, within samples from q, and across samples from p and q should all be the same, so the MMD distance is zero. This can be used to replace KL(q_{φ}(z|x) ‖ p(z)) in the ELBO of the VAE to achieve better properties.
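The definition above can be estimated directly from two sets of latent samples. A minimal NumPy sketch (function names and array shapes are our own, not the paper's):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel k(z, z') = exp(-||z - z'||_2^2 / sigma^2)
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma ** 2)

def mmd(q_samples, p_samples, sigma=1.0):
    # E_{q,q'}[k] + E_{p,p'}[k] - 2 E_{p,q'}[k], estimated from samples
    return (gaussian_kernel(q_samples, q_samples, sigma).mean()
            + gaussian_kernel(p_samples, p_samples, sigma).mean()
            - 2.0 * gaussian_kernel(p_samples, q_samples, sigma).mean())
```

For identical sample sets the three averages coincide and the estimate is exactly zero; for well-separated distributions the cross term vanishes and the estimate is positive.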

We apply MMD regularization to the variational ladder autoencoder. In particular, we regularize the latent features at each layer separately:

L_{MMD−VLAE} = E_{q_{φ}(z|x)}[log p_{θ}(x|z)] − MMD(p(z_{0}), q_{φ}(z_{0})) − MMD(p(z_{1}), q_{φ}(z_{1}))

This combines the advantages of both models and learns meaningful hierarchical features.
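A minimal sketch of the combined objective, assuming squared reconstruction error stands in for −E[log p_{θ}(x|z)] and each ladder layer's prior is N(0, I); all names here are illustrative, and the kernel estimator is repeated for self-containment:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel k(z, z') = exp(-||z - z'||^2 / sigma^2)
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma ** 2)

def mmd(q_samples, p_samples, sigma=1.0):
    # Biased sample estimator of MMD^2; nonnegative for a PSD kernel
    return (gaussian_kernel(q_samples, q_samples, sigma).mean()
            + gaussian_kernel(p_samples, p_samples, sigma).mean()
            - 2.0 * gaussian_kernel(p_samples, q_samples, sigma).mean())

def mmd_vlae_loss(x, x_recon, z_layers, rng, sigma=1.0):
    # Squared reconstruction error stands in for -E[log p_theta(x|z)];
    # each ladder layer z_i is pushed toward its N(0, I) prior via MMD.
    recon = np.mean((x - x_recon) ** 2)
    reg = sum(mmd(z, rng.standard_normal(z.shape), sigma) for z in z_layers)
    return recon + reg
```

In a real training loop `x_recon` and the per-layer codes `z_layers` would come from the decoder and encoder networks, and the loss would be minimized by gradient descent.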

To verify the effectiveness of our method, we performed experiments on MNIST and CelebA [

Samples from MNIST are shown in

Samples from CelebA are shown in

We observed that by adding or subtracting values from the latent code, we can modify certain properties of faces. In addition, we can blend multiple faces together by adding or subtracting latent codes from or to each other.
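The face-arithmetic procedure can be sketched as follows; the latent vectors here are made-up placeholders standing in for encoder outputs q_{φ}(z|x), not codes produced by the actual model:

```python
import numpy as np

# Hypothetical latent codes, standing in for the encoder outputs of the
# corresponding face images (values are illustrative only).
z_man_pale = np.array([1.0, 0.5, -0.2])
z_man_dark = np.array([1.0, -0.5, -0.2])
z_woman_dark = np.array([-1.0, -0.5, -0.2])

# men with pale skin - men with dark skin + women with dark skin
z_result = z_man_pale - z_man_dark + z_woman_dark

# Decoding z_result with p_theta(x|z) would then render the analogous
# face ("women with pale skin").
```

The arithmetic cancels the shared attributes and keeps the transferred one, which is why the decoded result shows the analogous face.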

We observed convincing results from these experiments (as shown in

In this paper we proposed the MMD variational ladder autoencoder and its applications to various tasks, especially facial recognition and modification on the CelebA dataset. It is capable of disentangling various features of human faces, and also of modifying or blending different faces.

Possible future work includes further discussion of the accuracy and interpretability of the latent code, the model's tendency to overfit, and its application to more unlabeled datasets.

Xu, H.J. (2018) Generate Faces Using Ladder Variational Autoencoder with Maximum Mean Discrepancy (MMD). Intelligent Information Management, 10, 108-113. https://doi.org/10.4236/iim.2018.104009