Towards semantic knowledge propagation from text corpus to web images
Abstract
In this paper, we study the problem of transfer learning from text to images in the context of network data, in which link-based bridges are available to transfer knowledge between the two domains. Classifying image data is often much more challenging than classifying text data, for two reasons: (a) Labeled text data is widely available for classification purposes, whereas image data, although plentiful from many sources, is often unlabeled. (b) Image features are not directly related to the semantic concepts inherent in class labels. Text data, by contrast, tends to have natural semantic interpretability (because of its human origins) and is therefore often more directly related to class labels. The semantic challenges of image features become glaringly evident when we attempt to recognize complex abstract concepts, which visual features often fail to discriminate. However, the copious bridging relationships between text and images in web and social network data can be used to design effective classifiers for image data. The relationships between image and text features (which may be derived from such web-centered bridges) provide additional hints for the classification process, in terms of which image feature transformations yield the most effective results. One of our goals in this paper is to develop a mathematical model for the functional relationships between text and image features, so as to indirectly transfer semantic knowledge through feature transformations. This feature transformation is accomplished by mapping instances from the different domains into a common space of unspecified topics, which serves as a bridge to semantically connect the two heterogeneous spaces.
We evaluate our knowledge transfer techniques on an image classification task with labeled text corpora and show their effectiveness relative to competing algorithms. Copyright © 2011 by the Association for Computing Machinery, Inc. (ACM).
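
As an illustration of the idea of mapping text and image instances into a common latent space, the following is a minimal sketch and not the model developed in this paper. It assumes a simple stand-in technique: a truncated SVD of the link-weighted cross-correlation between text and image features, followed by a centroid classifier in the shared space. The function names (`fit_shared_space`, `classify_images`), the `links` co-occurrence matrix, and the synthetic data are all hypothetical and introduced only for this example.

```python
import numpy as np


def fit_shared_space(X_text, X_img, links, k):
    """Learn linear maps from text and image features into a shared k-dim space.

    X_text: (n_text, d_text) text feature matrix
    X_img:  (n_img, d_img) image feature matrix
    links:  (n_text, n_img) co-occurrence weights (hypothetical bridge matrix)
    """
    # Cross-domain feature correlation, weighted by the link structure.
    M = X_text.T @ links @ X_img                  # shape (d_text, d_img)
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    W_text = U[:, :k]                             # shape (d_text, k)
    W_img = Vt[:k].T                              # shape (d_img, k)
    return W_text, W_img


def classify_images(X_img, W_img, X_text, W_text, y_text):
    """Assign each image the class whose labeled-text centroid is most similar."""
    Z_text = X_text @ W_text                      # text in the shared space
    Z_img = X_img @ W_img                         # images in the shared space
    classes = np.unique(y_text)
    centroids = np.stack([Z_text[y_text == c].mean(axis=0) for c in classes])
    scores = Z_img @ centroids.T                  # inner-product similarity
    return classes[np.argmax(scores, axis=1)]


# Synthetic data, for illustration only (assumed sizes, random features).
rng = np.random.default_rng(0)
n_text, n_img, d_text, d_img, k = 40, 30, 12, 8, 2
X_text = rng.normal(size=(n_text, d_text))
X_img = rng.normal(size=(n_img, d_img))
links = (rng.random((n_text, n_img)) < 0.1).astype(float)
y_text = np.arange(n_text) % 2                    # two text classes

W_text, W_img = fit_shared_space(X_text, X_img, links, k)
pred = classify_images(X_img, W_img, X_text, W_text, y_text)
```

In this sketch only the text documents carry labels; the images receive labels indirectly, through the shared space learned from the links. With random features the predictions carry no meaning, so the example shows the data flow only.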