Understanding documents with hyperknowledge specifications
Abstract
Finding concepts in a document corpus according to their meaning and semantic relations is an important and challenging task. In this paper, we present our contributions on how to understand unstructured data present in one or multiple documents. The current literature generally concentrates on structuring knowledge by identifying semantic entities in the data. We test the hypothesis that hyperknowledge specifications are capable of defining rich relations among documents and extracted facts. The main evidence supporting this hypothesis is that hyperknowledge was built on top of hypermedia fundamentals, which eases the specification of rich relationships between different multimodal components (i.e., multimedia content and knowledge entities). The key challenge tackled in this paper is how to structure and correlate these components considering their meaning and semantic relations.