Generative Stochastic Networks Trainable by Backprop
Invited talk by Yoshua Bengio
for RepLearn 2013
Recent work showed that denoising auto-encoders can be interpreted as
generative models. We generalize these results to arbitrary
parametrizations that learn to reconstruct their input and in which
noise is injected not only into the input but also into intermediate
computations. We show that, under reasonable assumptions (the
parametrization is rich enough to provide a consistent estimator, and
it prevents the learner from simply copying its input to its output and
producing a Dirac output distribution), such models are consistent
estimators of the data-generating distribution, and that they define
the estimated distribution through a Markov chain that, at each step,
re-injects sampled reconstructions as inputs into the unfolded
computational graph. As a consequence, one
can define deep architectures similar to deep Boltzmann machines, in
that units are stochastic, the model can learn to generate a
distribution similar to its training distribution, and it can easily
handle missing inputs, but without the troubling problems of an
intractable partition function and intractable inference as stumbling
blocks for both training and using these models. In particular, we
argue that if the underlying latent variables of a graphical model
have a highly multimodal posterior (given the input), none of the
currently known training methods can appropriately deal with this
multimodality (when the number of modes is much greater than the number
of MCMC samples one is willing to draw, and when the structure of
the posterior cannot be easily approximated by some tractable
variational approximation). In contrast, the proposed models can
simply be trained by back-propagating the reconstruction error (the
negative log-likelihood of the reconstruction) into the parameters,
benefiting
from the power and ease of training recently demonstrated for deep
supervised networks with dropout noise.
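
To make this concrete, here is a minimal sketch (in NumPy; illustrative,
not the authors' code) of the two ingredients described above: a
tied-weight denoising auto-encoder with noise injected both in the input
and in the hidden layer, trained by back-propagating the reconstruction
negative log-likelihood, and then sampled from by repeatedly re-injecting
a sample of its reconstruction as the next input of the Markov chain. The
layer sizes, the noise types (salt-and-pepper on the input, dropout-style
on the hidden units) and the toy two-pattern dataset are assumptions made
only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_visible, n_hidden = 8, 16
    W = rng.normal(0.0, 0.1, size=(n_hidden, n_visible))  # tied weights
    b = np.zeros(n_hidden)   # hidden bias
    c = np.zeros(n_visible)  # reconstruction bias

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def corrupt(x, p=0.3):
        # input noise: randomly flip a fraction p of the binary inputs
        flip = rng.random(x.shape) < p
        return np.where(flip, 1.0 - x, x)

    def forward(x_clean):
        # noise is injected in the input AND in the intermediate computation
        x_tilde = corrupt(x_clean)
        z1 = W @ x_tilde + b
        s = sigmoid(z1)
        mask = (rng.random(n_hidden) > 0.5) / 0.5   # dropout-style hidden noise
        h = s * mask
        x_hat = sigmoid(W.T @ h + c)                # reconstruction probabilities
        return x_tilde, s, mask, h, x_hat

    # toy binary training data: two prototype patterns
    protos = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)

    lr = 0.1
    for step in range(5000):
        x = protos[rng.integers(len(protos))]
        x_tilde, s, mask, h, x_hat = forward(x)
        # back-propagate the reconstruction negative log-likelihood
        # (cross-entropy between the clean input and the reconstruction)
        dz2 = x_hat - x                   # gradient at the output pre-activation
        dW = np.outer(h, dz2)             # tied weights: reconstruction side
        dc = dz2
        dh = W @ dz2
        dz1 = dh * mask * s * (1.0 - s)   # back through hidden noise and sigmoid
        dW += np.outer(dz1, x_tilde)      # tied weights: encoding side
        db = dz1
        W -= lr * dW
        b -= lr * db
        c -= lr * dc

    # Sampling: a Markov chain that re-injects a sample of the reconstruction
    # as the next input of the (noisy) unfolded computational graph.
    x = (rng.random(n_visible) > 0.5).astype(float)  # arbitrary starting point
    samples = []
    for t in range(200):
        _, _, _, _, x_hat = forward(x)
        x = (rng.random(n_visible) < x_hat).astype(float)  # sample reconstruction
        samples.append(x)
    # after burn-in, samples should concentrate near the two training prototypes

After burn-in, the chain's samples should concentrate near the training
patterns; the same re-injection principle applies to deeper unfolded
computational graphs with noise injected at several levels of the
computation.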