Monday, November 21, 2011

Thoughts on Multimodal Fusion

There are some premises that can help us frame a discussion about multimodal learning (or retrieval):

  • Assume we have a common unit of information from which we can observe two states (or modalities).
  • Each modality is an incomplete view of the underlying information.
  • In addition, each observed modality is corrupted by noise.
  • Modalities are not independent: they have relationships, dependencies, and joint probabilities.
  • Multimodal fusion can be used to complement the representation of the true information unit: to make it more faithful to the original content and to reconstruct the missing information.
These are some of the ideas we were discussing with Prof. Fabio this morning. I think they make perfect sense from a global perspective, even though they still need some formalization. A small numeric sketch of the reconstruction idea follows.
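As a minimal numeric sketch of the points above (the dimensions, the linear views, and the least-squares fusion are all illustrative assumptions of mine, not part of the original discussion): a hidden information unit z is observed through two incomplete, noisy views, and stacking both views before solving for z recovers it better than either view alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "common unit of information" (illustrative assumption).
d = 8
z = rng.normal(size=d)

# Each modality is an incomplete, noisy view: a random projection plus noise.
A1 = rng.normal(size=(5, d))   # modality 1 observes 5 linear measurements
A2 = rng.normal(size=(5, d))   # modality 2 observes 5 different measurements
x1 = A1 @ z + 0.1 * rng.normal(size=5)
x2 = A2 @ z + 0.1 * rng.normal(size=5)

def reconstruct(As, xs):
    """Least-squares estimate of z from stacked views (a simple 'fusion')."""
    A = np.vstack(As)
    x = np.concatenate(xs)
    z_hat, *_ = np.linalg.lstsq(A, x, rcond=None)
    return z_hat

err1 = np.linalg.norm(reconstruct([A1], [x1]) - z)               # modality 1 alone
err2 = np.linalg.norm(reconstruct([A2], [x2]) - z)               # modality 2 alone
err_fused = np.linalg.norm(reconstruct([A1, A2], [x1, x2]) - z)  # fused

print(f"error, modality 1 only: {err1:.3f}")
print(f"error, modality 2 only: {err2:.3f}")
print(f"error, fused:           {err_fused:.3f}")
```

Since each view alone provides 5 measurements for an 8-dimensional z, it is underdetermined, while the fused system is overdetermined, so the fused error should be markedly smaller.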

3 comments:

  1. Could it be possible to perform something like a Principal Modalities Analysis? I mean, is there any way of measuring the extent to which each modality contributes to the common unit of information? How can we know how much information there is in a single view of an object, i.e., how much information SIFT adds compared to DCT in an image representation problem?

  2. That's an interesting question... Actually, the problem of validating that a new modality brings more information depends on the underlying task. So, assuming the availability of different modalities, we also have to identify the goal of mixing them: sometimes it can be just for fun (unsupervised fusion), and other times we actually want to solve a problem. I would say it depends; does that make sense?

  3. Santiago poses an interesting question, and it has to do with the general problem of feature selection. Some alternatives to address it (sketches follow after this list):

    1. Use PCA and analyse the contribution of each modality. This has the problem of being unsupervised and dependent on the scale of the different features.

    2. Use a supervised alternative to PCA, such as Linear Discriminant Analysis, which looks for class-discriminative directions rather than plain directions of maximum variance.

    3. Use a traditional feature selection method (this is also a supervised alternative).

    4. Use information theory. Here I'm not very sure how to approach it, but it is likely that some work has already been done. The Principle of Maximum Entropy (http://en.wikipedia.org/wiki/Principle_of_maximum_entropy) could be related.

    5. Another possibility would be to use an algorithm such as Convex NMF: on an Object × Feature matrix, CNMF will find a factorization X ≈ XWH, and H will tell us which features are most important for representing the information in X.
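As a rough sketch of alternative 1, assuming two synthetic feature blocks standing in for, say, SIFT and DCT descriptors: standardize, concatenate, run PCA, and sum the squared loadings of the top components over each modality's block as a crude contribution score. The data and the scoring rule are my assumptions, purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic "modalities" for the same 200 objects (stand-ins for SIFT / DCT).
n = 200
latent = rng.normal(size=(n, 3))
mod_a = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(n, 10))  # informative
mod_b = rng.normal(size=(n, 6))                                             # mostly noise

# Standardize before concatenating so scale differences do not dominate.
X = StandardScaler().fit_transform(np.hstack([mod_a, mod_b]))

pca = PCA(n_components=3).fit(X)

# Crude contribution score: squared loadings of the top components,
# summed over each modality's feature block.
loadings = pca.components_ ** 2          # shape (3, 16)
score_a = loadings[:, :10].sum()
score_b = loadings[:, 10:].sum()
print(f"modality A contribution: {score_a:.2f}")
print(f"modality B contribution: {score_b:.2f}")
```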
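A companion sketch for alternatives 2-4, assuming class labels y are available; LDA is used here as one possible supervised projection and mutual information as one possible information-theoretic feature score, both choices being assumptions of mine rather than anything fixed in the comment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import mutual_info_classif

# Synthetic labelled data: 12 features, only a few of them informative.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)

# Alternative 2: a supervised projection (LDA) instead of unsupervised PCA.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
print("LDA direction (per-feature weights):", np.round(lda.coef_.ravel(), 2))

# Alternatives 3-4: score each feature by its mutual information with the label.
mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]
print("features ranked by mutual information:", ranking)
```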
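And a sketch in the spirit of alternative 5. Convex NMF itself is not available in scikit-learn, so this uses standard NMF as a stand-in (Convex NMF additionally constrains the basis to convex combinations of the data); with X ≈ WH on a non-negative Object × Feature matrix, the column norms of H give a rough, scale-sensitive hint of how much each feature contributes to reconstructing X.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Non-negative Object x Feature matrix: 100 objects, 8 features,
# where the last 3 features are low-magnitude noise.
n, k = 100, 3
W_true = rng.random((n, k))
H_true = rng.random((k, 5))                                  # informative block
X = np.hstack([W_true @ H_true, 0.05 * rng.random((n, 3))])  # append noise features

model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)
H = model.components_                      # shape (k, 8)

# Rough feature-importance score: column norms of H.
importance = np.linalg.norm(H, axis=0)
print("feature importance:", np.round(importance, 3))
```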
