Sunday, September 14, 2014

Supervised Learning

  • Convolutional Networks (MNIST) [I]
    • Handwritten digit recognition with a back-propagation network, Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel (NIPS 1989) [PDF]
  • AlexNet (ImageNet Challenge) [I]
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS 2012. [PDF]
  • Visualizing Deep Networks [D]

12 comments:

  1. I think the question of whether things are a distributed code or grandmother cells is a quite interesting topic for discussion.

    Another paper to read in addition to the Zeiler one: http://www.cs.berkeley.edu/~rbg/papers/cnn-analysis.pdf
    Analyzing the Performance of Multilayer Neural Networks for Object Recognition. Pulkit Agrawal, Ross Girshick, Jitendra Malik. ECCV 14.

    Specifically, Section 4 suggests that there are grandmother-like cells for only a few classes and argues that the top-K visualizations are somewhat misleading in suggesting the presence of highly specific object detectors (e.g., the dog faces in Figure 2 of the Zeiler paper). It further suggests that the elements form a distributed code of sorts.

    I'm not entirely convinced of this argument -- whether things are a distributed code or grandmother cell depends really on how you break down the world into classes. For instance, is a wheel (Fig 2, layer 5, last row) a grandmother cell or not? It's shared by wheel-chair, old-style cars, and unicycles, but it's also a class of its own.

    Replies
    1. I think this is a false dichotomy. If the representation is too distributed, then it's going to be difficult to bin the data into distinct "things." If the representation is too sparse, then it's not going to handle variability and noise well. What I see here is a very gradual transition from an absolutely distributed representation (pixels) to representations which are still somewhat distributed but more invariant and are useful for the task at hand.

      Would it be incorrect to say that this is an iterative, HOG-like process where the input gradients are chosen through supervision? HOG over HOG over HOG ..., albeit gradually.

      In other words (a rough code sketch follows these steps):

      1. Take data inputs, specifically their variation (gradients), and let the task at hand (classification) guide which gradients to emphasize.

      2. Pool them in order to achieve a level of invariance

      3. Normalize the inputs.

      4. Repeat the process on the outputs.
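
      To make the four steps concrete, here is a rough numpy sketch of one such stage repeated a few times. The filter values are random placeholders rather than learned weights, so only the structure (filter -> rectify -> pool -> normalize -> repeat), not the numbers, is meaningful; in a real CNN the kernels are learned by back-propagation, which is where the supervision in step 1 enters.

      import numpy as np

      def conv2d(image, kernel):
          # Valid 2-D cross-correlation of a single-channel image with one filter.
          kh, kw = kernel.shape
          h, w = image.shape
          out = np.zeros((h - kh + 1, w - kw + 1))
          for i in range(out.shape[0]):
              for j in range(out.shape[1]):
                  out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
          return out

      def stage(x, kernel, pool=2):
          # One "HOG-like" stage: filter responses -> rectify -> max-pool -> normalize.
          r = np.maximum(conv2d(x, kernel), 0.0)
          h, w = (r.shape[0] // pool) * pool, (r.shape[1] // pool) * pool
          r = r[:h, :w].reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
          return (r - r.mean()) / (r.std() + 1e-8)

      rng = np.random.default_rng(0)
      x = rng.standard_normal((32, 32))        # stand-in for an input image
      for layer in range(3):                   # step 4: repeat the process on the outputs
          x = stage(x, rng.standard_normal((3, 3)))
          print("after stage", layer + 1, "shape:", x.shape)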

      I think it would be interesting to see, given infinite data, where the power of the representation saturates as the number of layers increases.

      This paper http://arxiv.org/pdf/1409.1556v2.pdf shows that improvements can be made even with 16 layers!

  2. I suggest also reading the paper - http://yann.lecun.com/exdb/publis/psgz/lecun-98.ps.gz
    This outlines the history of conv nets from the 1960s and goes into a lot of detail about the design decisions and architectural choices. It tries to give many motivations for these choices, though not all of them are convincing.
    I suggest going through mainly Section 2 of the paper.

  3. I generally hear that we could not train neural networks effectively in the past because datasets were too small. Do we really need ImageNet to do this task? I mean, YouTube has 150M videos; can we not sample frames from some of those videos instead?

    Replies
    1. The frames sampled from a YouTube video aren't labelled, or at best are poorly labelled.

    2. But couldn't those frames still be used to learn the initial layers, after which we could fine-tune on some specialized dataset for the task we actually need to do?
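
      As a rough illustration of that workflow (written against PyTorch/torchvision names purely for the sketch; neither library is discussed in this thread): keep the pre-trained convolutional layers fixed and re-learn only a new classifier head on the small, task-specific dataset.

      import torch
      import torch.nn as nn
      from torchvision import models

      model = models.alexnet(weights=None)        # pretend these weights came from large-scale pre-training
      for p in model.features.parameters():       # freeze the early convolutional layers
          p.requires_grad = False
      model.classifier[-1] = nn.Linear(4096, 20)  # new head for a hypothetical 20-class target task

      optimizer = torch.optim.SGD(
          [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)
      criterion = nn.CrossEntropyLoss()
      # ...then train as usual on the small labelled dataset; only the unfrozen classifier
      # parameters are updated, so the pre-trained layers act as a fixed feature encoder.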

    3. My impression is that if you could do really great unsupervised pre-training and learn a powerful representation and then train on a small dataset, you'd have a Marr prize and a lot of money. Basically: nobody knows how to get really good representations.

      I'm not entirely convinced by the unsupervised story of simply looking at random images and learning a really great representation. Unsupervised by humans or semantics, sure, but I think there has to be some signal of some variety to learn something interesting. This could be time, context (see Carl's paper at ECCV '14 for a very interesting idea), motion, interaction, physics, etc.

  4. I feel that CNNs are considerably hacked to give the best results. There seems to be little generality in the approach taken by AlexNet. Of course it gives a great result on ImageNet, but there seems to be little guarantee that it would work on a different dataset.
    So how does one choose the best kernel size for each convolutional layer, or the optimal parameters for data augmentation?
    Isn't there a huge dependence of the optimal parameters on the architecture?

    Replies
    1. The paper mentions choosing parameters using validation data.
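
      As a minimal sketch of that kind of selection (the candidate settings and the train_and_score function here are hypothetical stand-ins for the real training code): train once per setting and keep whichever setting scores best on a held-out validation split.

      def select_by_validation(candidates, train_and_score):
          # candidates: list of hyper-parameter dicts, e.g. {"kernel_size": 3, "crop_jitter": 4}
          # train_and_score: trains a model with the given settings, returns validation accuracy
          best, best_score = None, float("-inf")
          for params in candidates:
              score = train_and_score(params)
              if score > best_score:
                  best, best_score = params, score
          return best, best_score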

    2. I think one reason why it looks that way is that all the parameters, from features, to mid-level, to the final classifiers, are exposed. In comparison with "extract superpixels, do dense SIFT, Fisher-vectorize, throw it into a linear SVM", this seems like a lot of parameters. But there are tons of parameters that were just tuned by other people. For instance, if you take a look at the HOG paper again, there was serious brute-force parameter tuning:

      http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf

      The same sorts of normalization questions are actually really important for SIFT too -- if you don't do the right normalizations in the right order, it really doesn't work (from what I've been told).

      -------

      As for generalization, what we've seen is that they actually do work quite well across datasets -- I'll cover some experiments in class today.

  5. I think someone (perhaps Krishna?) asked today whether it's possible to somehow leverage the effect that "greying out" certain regions in an image has on a classification network. I just saw a paper that seems to use exactly this to learn how to localise objects without bounding box annotations. http://arxiv.org/abs/1409.3964

    Replies
    1. Yeah, I was wondering along similar lines: to get the top 5 classes, would one get better results by occluding the region corresponding to the highest-scoring class and then sequentially occluding more of the image to surface the next most prominent class?
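
      A toy sketch of the basic occlusion measurement (classify here is a stand-in for any trained network's forward pass, not a specific API): slide a grey patch over the image and record how much the top class's score drops at each position.

      import numpy as np

      def occlusion_map(image, classify, patch=8, stride=4, fill=0.5):
          # classify(image) -> 1-D array of class scores
          scores = classify(image)
          top = int(np.argmax(scores))
          h, w = image.shape[:2]
          heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
          for i in range(heat.shape[0]):
              for j in range(heat.shape[1]):
                  occluded = image.copy()
                  occluded[i * stride:i * stride + patch, j * stride:j * stride + patch] = fill
                  heat[i, j] = scores[top] - classify(occluded)[top]  # drop in the top class's score
          return top, heat

      Regions with the largest drop are the ones the network relies on for its current top prediction; the sequential idea above would then grey out those regions and re-run the classifier to see what rises to the top next.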
