Tuesday, August 26, 2014

Discussions - Unsupervised Learning


Unsupervised Learning
  • AutoEncoders [C]
    • Olshausen and Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? PDF
  • Restricted Boltzmann Machines [C]
    • Hinton, G. E., Osindero, S. and Teh, Y. (2006)
      A fast learning algorithm for deep belief nets. PDF
  • Quoc Net [X]
    • Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In Proc. ICML, 2012. PDF


  1. A quick and simple reading about RBMs:
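To complement the Hinton et al. paper, here is a minimal numpy sketch of one step of contrastive divergence (CD-1), the fast learning rule at the core of RBM training. This is an illustrative sketch under my own naming and hyperparameter assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0:    (batch, n_vis) binary visible data
    W:     (n_vis, n_hid) weight matrix
    b_vis: (n_vis,) visible biases; b_hid: (n_hid,) hidden biases
    """
    # Positive phase: hidden probabilities given the data, then a sample
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling back to the visibles
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    # Approximate gradient: <v h>_data - <v h>_model
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid
```

Stacking such RBMs greedily, layer by layer, is what the paper turns into a deep belief net.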

  2. After reading the autoencoder paper, the question is what to do with the set of overcomplete basis functions? I suspect the case will be made that these are good filters for extracting features, since they describe independent characteristics of an image. But the generative process was completely task independent. Why should we believe that these are important or good for any task?

    1. Well, by definition the "task" is finding coefficients a_i that play nicely with the cost function S. The encoding depends on the choice of S. It would be interesting to see what would happen if we modified S for some particular task (object recognition, depth estimation). Are there cases where we try to autoencode features that are sparse, reconstructive, and informative for a particular task?

    2. I guess Dey's question would be why achieving sparsity is the right task... I wonder if there is ever a "right" task under which visual representation is learned, with all other tasks being adaptations :)
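To make the cost function S discussed above concrete, here is a sketch of an Olshausen & Field-style sparse coding objective: reconstruction error plus a sparseness penalty on the coefficients, with the coefficients inferred by plain gradient descent. The penalty choice S(a) = log(1 + a^2) is one of the options from the paper; the function names, step size, and iteration count are my own illustrative assumptions.

```python
import numpy as np

def sparse_coding_cost(I, Phi, a, lam=0.1):
    """Reconstruction error plus sparseness penalty on the coefficients.

    I:   (d,) flattened image patch
    Phi: (d, k) basis functions as columns, possibly overcomplete (k > d)
    a:   (k,) coefficients
    """
    residual = I - Phi @ a                 # how well the patch is reconstructed
    sparsity = np.log1p(a ** 2).sum()      # S(a) = log(1 + a^2), one choice from the paper
    return residual @ residual + lam * sparsity

def infer_coefficients(I, Phi, lam=0.1, lr=0.01, steps=500):
    """Find coefficients a for a fixed basis by gradient descent on the cost."""
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        # d/da of the cost: reconstruction term plus derivative of log(1 + a^2)
        grad = -2 * Phi.T @ (I - Phi @ a) + lam * 2 * a / (1 + a ** 2)
        a -= lr * grad
    return a
```

Swapping S, or adding a task-specific term to the cost, is exactly the kind of modification being proposed above.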

  3. My belief is that the philosophy behind unsupervised learning is that there is a task-independent visual representation in this world... the autoencoder, deep belief net, or Quoc Net are just ways to extract that task-independent representation.

    Once you get a task-independent representation, the next step is to transfer it, or at best fine-tune/adjust it for a task. But most of the learning is done prior to the fine-tuning step.

    1. Michael Jordan recently had a Reddit AMA, which is interesting in and of itself: http://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ .

      But I found his comments on unsupervised learning really interesting in light of the conversation we have been having here in class. For convenience I am quoting the relevant paragraphs here directly: "My understanding is that many if not most of the "deep learning success stories" involve supervised learning (i.e., backpropagation) and massive amounts of data. Layered architectures involving lots of linearity, some smooth nonlinearities, and stochastic gradient descent seem to be able to memorize huge numbers of patterns while interpolating smoothly (not oscillating) "between" the patterns; moreover, there seems to be an ability to discard irrelevant details, particularly if aided by weight-sharing in domains like vision where it's appropriate. There's also some of the advantages of ensembling. Overall an appealing mix. But this mix doesn't feel singularly "neural" (particularly the need for large amounts of labeled data).
      Indeed, it's unsupervised learning that has always been viewed as the Holy Grail; it's presumably what the brain excels at and what's really going to be needed to build real "brain-inspired computers". But here I have some trouble distinguishing the real progress from the hype. It's my understanding that in vision at least, the unsupervised learning ideas are not responsible for some of the recent results; it's the supervised training based on large data sets.
      One way to approach unsupervised learning is to write down various formal characterizations of what good "features" or "representations" should look like and tie them to various assumptions that seem to be of real-world relevance. This has long been done in the neural network literature (but also far beyond). I've seen yet more work in this vein in the deep learning work and I think that that's great. But I personally think that the way to go is to put those formal characterizations into optimization functionals or Bayesian priors, and then develop procedures that explicitly try to optimize (or integrate) with respect to them. This will be hard and it's an ongoing problem to approximate. In some of the deep learning work that I've seen recently, there's a different tack---one uses one's favorite neural network architecture, analyses some data and says "Look, it embodies those desired characterizations without having them built in". That's the old-style neural network reasoning, where it was assumed that just because it was "neural" it embodied some kind of special sauce. That logic didn't work for me then, nor does it work for me now."

    2. So it seems that while good reconstruction is a good assumption to make as a first order of business there can be more task dependent assumptions as well which we can leverage. In vision the whole idea of having category-independent (and some category dependent) proposal regions or segmentations is an example of this kind of domain knowledge built into the unsupervised stage itself. Whether you use deep learning or something else for this is a separate question. So visual representations can indeed be adapted for the task at hand without just resorting to reconstruction.

  4. Another interesting question on Quoc Net is how many such objects [faces, cats, people] they were able to extract in a completely unsupervised manner. I remember Quoc's presentation showing 10-15 such clusters.

    I did not really like the grandmother-cell argument, since I believe most neurons in a deep network have a distributed representation.

  5. There was a question regarding the length of the basis vectors in the autoencoders paper. The authors mention a constraint on the length (norm) of the basis functions, but I can't find the exact form of that constraint.
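For what it's worth, a common way this length constraint is implemented in practice (my assumption, not necessarily the paper's exact formulation, which adapts the basis norms so that coefficient variance stays near a target) is to rescale each basis vector to unit L2 norm after every learning update:

```python
import numpy as np

def renormalize_basis(Phi):
    """Constrain basis length: rescale each column of Phi to unit L2 norm.

    Without some such constraint, learning can drive the sparsity penalty
    toward zero by growing the basis vectors while shrinking the
    coefficients a_i, since only the product Phi @ a matters for
    reconstruction. Fixing the norms removes that degenerate solution.
    """
    norms = np.linalg.norm(Phi, axis=0, keepdims=True)
    return Phi / np.maximum(norms, 1e-12)  # guard against zero-norm columns
```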