Friday, December 30, 2011

Embarrassingly Parallel

I remember we found this expression in one of the papers we were studying during the Large Scale Machine Learning seminar. Here is a definition, taken from Wikipedia:

In parallel computing, an embarrassingly parallel workload (or embarrassingly parallel problem) is one for which little or no effort is required to separate the problem into a number of parallel tasks. This is often the case where there exists no dependency (or communication) between those parallel tasks.
Full article: http://en.wikipedia.org/wiki/Embarrassingly_parallel

Tuesday, December 13, 2011

Synthetic Training Data

One of the recent breakthroughs in computer vision was the use of synthetic data to train effective recognition systems. The most remarkable example is the work of Shotton et al. from Microsoft Research Cambridge, who trained the Kinect human pose recognition system using more than a million synthetic images, generated by rendering artificial humans with 3D software.

That is possible mainly because the computer graphics industry has been working on generating 3D humans with realistic appearance for cinematographic productions and games. Clothing materials, hair, shapes and so on are easily simulated in 3D. So, generating 3D human poses is a well-established and well-understood procedure, and it makes sense to generate synthetic data for this kind of problem.

What about medical images? Can we generate synthetic images simulating various medical conditions? It sounds a bit weird, right? Well, during SIPAIM 2011 in Bucaramanga, Juan Antonio, one of the invited speakers from Spain, gave a tutorial on how medical images are captured and generated, especially using Magnetic Resonance Imaging (MRI). This is also a very well understood physical phenomenon that could be "simulated" using 3D software. Perhaps it's not a conventional 3D rendering procedure, because the intensities observed in an MRI scan are not responses to light, as in conventional photography, but to magnetic resonance, as its name suggests.

Anyway, we could simulate various tissues and their responses to different machine configurations (intensity of the magnetic field, for instance) and render quite good MRI scans by playing with tissue parameters. Then, we could produce a really large dataset of images with quite precise labelling at the pixel level, useful for training a recognition system for medical images. It could eventually work for X-rays and other imaging modalities.

On the other hand, I wonder: if we can simulate that data, why can't we use the simulation function directly inside a learning algorithm? In other words, instead of generating the data to train a learning algorithm, the recognition system might also be ready to take that sort of simulation function as prior knowledge to make more effective predictions... does it make sense?

Monday, November 28, 2011

Big problems in Computer Science and Machine Learning

One of the most intriguing questions that Richard Hamming raised in his talk "You and Your Research" (which I'm sure most of you have read) is: What are the most important problems in your area?
On the one hand, it is obvious that one can only focus on a single well-defined thing to do "great" stuff, but on the other, as Feynman states:

"... You have to keep a dozen of your favorite problems constantly present in your mind, although by and large they will lay in a dormant state. Every time you hear or read a new trick or a new result, test it against each of your twelve problems to see whether it helps. ..."

So I decided to do a quick search about this and found a Wikipedia entry, a StackOverflow entry and a very interesting blog post by Andreas Zwinkau, who describes an experiment, the "Fantasy Research Lab", proposed by Philip Greenspun.
The Fantasy Research Lab consists of

 " ... [you] pretending that you are the lab director for computer science at a brand-new research university and to come up with a plan for how you'd populate your lab with projects."

I thought it was an interesting experiment, so I asked Prof. Fabio to try it out; he quickly came up with the interesting problem of Automatic Programming.
Now I'm interested in knowing the point of view of some very bright people like you, not only regarding the entire Computer Science field (which by itself extends to Operating Systems, Software Engineering, Security, Databases, Hardware and so on, as Microsoft Academic Search shows) but also the more specific Machine Learning, Information Retrieval and HPC views, which are central to this blog.

Thanks for reading ;)

Monday, November 21, 2011

Bag of Features and Image Matching

I think bag of features is a good enough representation for image matching. Of course, it is not perfect and still requires a lot of improvement, but it captures many ideas that the computer vision community had been working on in the years before it became popular.

For instance, consider the problem of image matching using SIFT features. The procedure goes like this:

  1. Extract SIFT features from images A and B.
  2. Build a list of descriptors for each image.
  3. For each descriptor in image A, compute the distance to every descriptor in image B.
  4. Identify the minimum distance. If it is less than a threshold, count one match.
  5. Repeat for the remaining descriptors in A.

At the end of the procedure, we have the total number of matches between both images. As you can imagine, this is rather expensive if we have to compute the number of matches for a large set of images (not just between two). That is where the bag of features (BoF) comes in: it introduces an intermediate layer to save some computations, the dictionary of visual patterns. Actually, rather than saving those computations, the BoF moves the effort to an offline stage where we can afford to wait (the dictionary construction).
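The direct matching procedure above can be sketched in a few lines. This is a minimal illustration assuming the SIFT descriptors of each image are already available as rows of a NumPy array; the `threshold` value here is purely hypothetical and would need tuning in practice.

```python
import numpy as np

def count_matches(desc_a, desc_b, threshold=200.0):
    """Count direct matches between two sets of SIFT descriptors.

    desc_a, desc_b: arrays of shape (n, 128), one descriptor per row.
    threshold: maximum Euclidean distance to accept a match (hypothetical value).
    """
    matches = 0
    for d in desc_a:
        # Distance from this descriptor in A to every descriptor in B.
        dists = np.linalg.norm(desc_b - d, axis=1)
        # Count a match only if the nearest descriptor is close enough.
        if dists.min() < threshold:
            matches += 1
    return matches
```

Note that the inner loop over all descriptors of B is exactly what makes direct matching expensive for large collections.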

Later, when we are indexing images with the BoF, we pre-compute the number of matches between the descriptors of one image and a reference dictionary. That gives the histogram of visual-pattern occurrences for that image. Afterwards, for a query image, we can estimate the total number of matches with respect to a previously indexed image just by computing the intersection between both histograms.
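The indexing step can be sketched as follows: each descriptor is assigned to its nearest visual pattern in the dictionary and the occurrences are counted. Here the dictionary is simply an array of cluster centers (e.g. from k-means); all names are illustrative.

```python
import numpy as np

def bof_histogram(descriptors, dictionary):
    """Histogram of visual-pattern occurrences for one image.

    descriptors: (n, 128) array of SIFT descriptors.
    dictionary: (k, 128) array of visual patterns (e.g. k-means centers).
    """
    hist = np.zeros(len(dictionary), dtype=int)
    for d in descriptors:
        # Assign the descriptor to its nearest visual pattern (no rejection).
        word = np.argmin(np.linalg.norm(dictionary - d, axis=1))
        hist[word] += 1
    return hist
```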

In other words, if image A shows the pattern "x" 5 times and image B shows the same pattern 3 times, guess what the number of common matches would be if we matched directly from image to image without using a dictionary: approximately 3, because at least 3 of the patterns in A would match those patterns in B, i.e., the minimum between 5 and 3. Sounds familiar? Of course, this is an approximation of the matching process described above, using the histogram intersection metric. However, it closely mimics what was previously done.
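The histogram intersection itself is almost a one-liner; as a quick sketch, with the toy counts from the example above (the pattern "x" seen 5 times in A and 3 times in B) it recovers the 3 approximate matches:

```python
import numpy as np

def histogram_intersection(h_a, h_b):
    """Approximate number of matches as the sum of bin-wise minima."""
    return int(np.minimum(h_a, h_b).sum())

# Toy example from the text: one visual pattern, seen 5 times in A and 3 in B.
h_a = np.array([5])
h_b = np.array([3])
print(histogram_intersection(h_a, h_b))  # prints 3
```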

Notice that one important parameter in the direct matching process is the threshold to accept or reject a match. This is somewhat relaxed in the BoF approach by using k-means to group similar patterns. I think it's relaxed because there is no rejection when using BoF (unless the representation were sparse enough). So, here is the trick: the larger the number of patterns in the dictionary, the smaller the number of matches, and possibly the better the approximation.

I would argue that when the number of patterns goes to infinity, it is equivalent to a matching process where the threshold is set to zero (a match requires identical descriptors). This sort of justifies the use of large dictionaries with BoF. It becomes even more important in large-scale setups, simply because as we introduce more images, more unseen visual patterns are likely to appear in the collection.

Thoughts on Multimodal Fusion

There are some arguments that can help us build a discussion about multimodal learning (or retrieval):

  • Assume we have a common unit of information from which we can observe two states (or modalities).
  • Each modality is an incomplete view of the actual information there.
  • Also, each observed modality is corrupted or noisy.
  • Modalities are not independent; they have relationships, dependencies, joint probabilities.
  • Use multimodal fusion to complement the representation of the true information unit, to make it more accurate with respect to the original content, to reconstruct the missing information.

These are some ideas that Prof. Fabio and I were discussing this morning. I think they make perfect sense from a global perspective, even though they still require some formalization.

Saturday, November 12, 2011

Welcome....

Hi everybody,

Welcome to our new virtual meeting space. This is a space to share ideas (our own or from someone else), insights, intuitions, excitement about something, or even frustration. It is a space to discuss profound and abstract ideas, but also to share down-to-earth, cynical, humorous or frivolous comments. It is also a space to share news, papers, presentations, videos, cartoons, and whatever you think could be interesting for you and/or the rest of us.

The title of the blog includes some of our current interests as a research group; however, who knows where our quest will take us, and that's what the word 'beyond' stands for. So, do not feel restricted by the title and feel free to post about whatever topic you want. Changing the title is also a possibility, so suggestions are welcome.

I definitely think interaction is very important, and face-to-face contact can hardly be replaced. However, things are changing very fast, and our current physical, synchronous, slow-paced world is being replaced by a virtual, asynchronous, fast-paced world. There are many advantages in having a space like this, and complex enterprises (such as Linux) have been accomplished mainly through virtual interaction. That said, I want to highlight that this blog is a complement to other interaction scenarios, and we should keep looking for spaces to discuss ideas in formal (our meetings and/or seminars) or informal settings, e.g. a 'tinto' at the C&T's top floor or a beer at Anthrum (we have to do it more frequently!).

The blog is a community blog, i.e., all of us are authors and readers. Initially I made it private, but we can discuss whether to make it public or not. We can post in Spanish and/or English. English is kind of the universal language nowadays, so it is a good idea to practice it, but it is up to you. Don't feel intimidated by your lack of proficiency; none of us is a native speaker, so all of us make mistakes. I suggest the following mechanism to collectively improve our skills: if somebody spots a mistake in a post, send a private message to the poster so she/he can correct it.

Let the posting start...!

Fabio G.