How exactly is unsupervised pre-training done? I can't understand how an unsupervised pre-training step could possibly help a different, supervised task. From what I gather from the paper, the pre-training is done only for the first few layers. So my question is: what exactly is the structure of the unsupervised network used in pre-training? Is it a sparse autoencoder? That would make sense, given that the first few layers encode low-level features anyway.
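To make the question concrete, here is a minimal sketch of what greedy layer-wise pre-training with an autoencoder could look like, assuming the setup guessed at above (a plain, non-sparse autoencoder; all shapes, names, and hyperparameters are illustrative): the hidden layer is trained to reconstruct unlabeled inputs, and the learned encoder weights are then reused to initialize the first layer of the supervised network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Unlabeled data: 200 samples, 20 features (synthetic, for illustration).
X = rng.random((200, 20))

n_hidden = 8
W1 = rng.normal(0, 0.1, (20, n_hidden))  # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, 20))  # decoder weights
b2 = np.zeros(20)

# Reconstruction error before any training, for comparison.
initial_error = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - X) ** 2)

lr = 0.5
for epoch in range(200):
    # Forward pass: encode the input, then decode it back.
    H = sigmoid(X @ W1 + b1)
    X_hat = sigmoid(H @ W2 + b2)

    # Backprop of the mean squared reconstruction error
    # through the decoder, then the encoder.
    err = X_hat - X
    dZ2 = err * X_hat * (1 - X_hat)
    dW2 = H.T @ dZ2 / len(X)
    db2 = dZ2.mean(axis=0)
    dH = dZ2 @ W2.T
    dZ1 = dH * H * (1 - H)
    dW1 = X.T @ dZ1 / len(X)
    db1 = dZ1.mean(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

final_error = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - X) ** 2)

# After pre-training, W1/b1 would initialize the first layer of the
# supervised network; the decoder (W2, b2) is discarded. A sparse
# autoencoder would differ only in adding a sparsity penalty on H.
```

The intuition behind the supervised benefit is that the encoder weights start in a region that already captures structure in the input distribution, rather than at a random initialization.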
Hi, could someone point me to other regularization methods used in deep networks?