I think the bag of features is a good enough representation for image matching. Of course, it is not perfect and still leaves plenty of room for improvement, but it captures many of the ideas the computer vision community was working on in the years before it became popular.
For instance, consider the problem of image matching using SIFT features. The procedure goes like this (a minimal code sketch follows the list):
- Extract SIFT features from images A and B.
- Build a list of descriptors for each image.
- For each descriptor in image A, compute the distance to every descriptor in image B.
- Identify the minimum distance. If it is less than a threshold, count one match.
- Repeat.
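Here is a minimal sketch of that loop, assuming OpenCV's SIFT implementation and plain Euclidean distances; the threshold value is a hypothetical placeholder you would tune for your data:

```python
import cv2
import numpy as np

def count_matches(path_a, path_b, threshold=250.0):
    """Count one-directional SIFT matches between two images."""
    sift = cv2.SIFT_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)

    matches = 0
    for d in desc_a:
        # distance from this descriptor in A to every descriptor in B
        dists = np.linalg.norm(desc_b - d, axis=1)
        # accept the closest candidate only if it is near enough
        if dists.min() < threshold:  # hypothetical threshold
            matches += 1
    return matches
```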
At the end of the procedure, we have the total number of matches between both images. As you can imagine, this gets expensive if we are supposed to compute the number of matches over a large set of images (not just between two images). This is where the bag-of-features (BoF) appears and introduces an intermediate layer to save some computations: the dictionary of visual patterns. Actually, instead of saving those computations, the BoF moves the effort to an off-line stage, where we can afford to wait (the dictionary construction).
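As a sketch of that off-line stage, assuming scikit-learn's k-means and a hypothetical `all_descriptors` array stacking SIFT descriptors sampled from a training collection (the dictionary size `k` is also a made-up choice):

```python
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, k=1000):
    """Cluster training descriptors into k visual patterns (the dictionary)."""
    # Each cluster centroid becomes one entry of the dictionary.
    kmeans = KMeans(n_clusters=k, n_init=10).fit(all_descriptors)
    return kmeans.cluster_centers_
```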
Later, when we are indexing images with the BoF, we pre-compute, for each image, the matches between its descriptors and the reference dictionary. That gives us the histogram of occurrences of visual patterns for that image. Afterwards, for a query image, we can estimate the total number of matches with respect to a previously indexed image just by computing the histogram intersection between the two histograms.
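A sketch of those two steps, under the same assumptions as above (each descriptor is quantized to its nearest dictionary pattern, and intersection approximates the match count):

```python
import numpy as np

def bof_histogram(descriptors, dictionary):
    """Histogram of occurrences of each visual pattern in one image."""
    hist = np.zeros(len(dictionary), dtype=np.int64)
    for d in descriptors:
        # nearest visual pattern; note there is no rejection step here
        nearest = np.argmin(np.linalg.norm(dictionary - d, axis=1))
        hist[nearest] += 1
    return hist

def histogram_intersection(hist_a, hist_b):
    """Approximate number of descriptor matches between two images."""
    return np.minimum(hist_a, hist_b).sum()
```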
In other words, if image A shows the pattern "x" 5 times and image B shows the same pattern 3 times, guess what the number of common matches would be if we looked directly from image to image without using a dictionary: 3, because approximately 3 of the patterns in A would match those in B, i.e., the minimum between 5 and 3. Sound familiar? Of course, this is an approximation of the matching process described above using the histogram intersection metric. However, it mimics reasonably well what was previously done.
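As a toy check of that arithmetic (bin 0 is the pattern "x" from the paragraph above; the other bin counts are made up just to complete the example):

```python
import numpy as np

hist_a = np.array([5, 0, 2])  # image A shows "x" 5 times
hist_b = np.array([3, 1, 4])  # image B shows "x" 3 times
print(np.minimum(hist_a, hist_b).sum())  # 3 + 0 + 2 = 5; "x" contributes min(5, 3) = 3
```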
Notice that one important parameter in the direct matching process is the threshold to accept or reject a match. This is somewhat relaxed in the BoF approach by using k-means to group similar patterns. I say relaxed because there is no rejection when using BoF: every descriptor is assigned to its nearest pattern, no matter how far away it is (unless the representation were sparse enough). So, here is the trick: the larger the number of patterns in the dictionary, the smaller the number of matches, and possibly the better the approximation.
I would argue that when the number of patterns goes to infinity, it becomes equivalent to a matching process where the threshold is set to zero (a match requires identical descriptors). This sort of justifies the use of large dictionaries with BoF. And it becomes even more important when we want to use it in large-scale setups, simply because as we introduce more images, it is likely that more unseen visual patterns appear in the collection.