Abstract
We describe a general approach to integrating the information produced by different visual modules, with the goal of generating a quantitative 3D reconstruction of the observed scene and estimating the reconstruction errors. The integration is achieved in two steps. First, several different visual modules analyze the scene in terms of a common data representation: planar patches, which the modules use to communicate and to represent the 3D structure of the scene. We show how this simple data structure can be used to share and integrate information from different visual modalities, and how it can support the needs of the great majority of visual modules known in the literature. Second, we devise a communication scheme that merges and improves the description of the scene in terms of planar patches. The application of state-of-the-art fusion algorithms allows information affected by an unknown degree of correlation to be fused while still guaranteeing conservative error estimates. Tests on real and synthetic scenes show that our system produces a consistent and marked improvement over the results of the single visual modules, with error reductions of up to a factor of ten and typical reductions by a factor of 2-4.
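The abstract does not name the fusion algorithm, but guaranteeing conservative error estimates under an unknown degree of correlation is the defining property of covariance intersection; the following is therefore a minimal sketch under that assumption, not the paper's stated method. It fuses two modules' estimates of the same planar patch, hypothetically parameterized as a plane normal n and offset d with an associated covariance; the function name, the parameterization, and the grid search over the weight are illustrative choices.

```python
import numpy as np

def covariance_intersection(a, Pa, b, Pb, n_grid=101):
    """Fuse two estimates (a, Pa) and (b, Pb) of the same quantity
    whose cross-correlation is unknown. Any convex combination of the
    information matrices yields a consistent (conservative) fused
    covariance for every possible true correlation; the weight w is
    chosen here to minimize the trace of the fused covariance."""
    Ia, Ib = np.linalg.inv(Pa), np.linalg.inv(Pb)
    best_x, best_P = None, None
    for w in np.linspace(0.0, 1.0, n_grid):
        # Convex combination of information matrices stays positive
        # definite, so the inverse below is well defined.
        P = np.linalg.inv(w * Ia + (1.0 - w) * Ib)
        if best_P is None or np.trace(P) < np.trace(best_P):
            best_x = P @ (w * Ia @ a + (1.0 - w) * Ib @ b)
            best_P = P
    return best_x, best_P

# Two modules report the same patch (nx, ny, nz, d) with
# complementary uncertainty profiles.
a = np.array([0.00, 0.00, 1.0, 2.0]); Pa = np.diag([0.50, 0.50, 0.01, 2.0])
b = np.array([0.02, -0.01, 1.0, 2.1]); Pb = np.diag([0.01, 0.01, 0.50, 0.1])
x, P = covariance_intersection(a, Pa, b, Pb)
print("fused estimate:", x)
print("fused std devs:", np.sqrt(np.diag(P)))
```

Unlike a Kalman-style update, which assumes independent errors and can become overconfident when modules share inputs, this scheme never reports a covariance smaller than the data justify, which matches the abstract's emphasis on conservative error estimates.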