Enterprise information search systems for heterogeneous content repositories
Abstract
In larger enterprises, business documents are typically stored in disparate, autonomous content repositories with various formats. Efficient search and retrieval mechanisms are needed to deal with the heterogeneousness and complexity of this environment. This paper presents a general architecture and two industrial implementations of a service-based information system to perform search in Lotus Notes databases and data sources with Web service interfaces. The first implementation is based on a federated database system that maps the various schemas of the sources into a common interface and aggregates information from their native locations. This implementation offers the advantages of scalability and accessibility to real-time information. The second one is based on a one-index enterprise-scale search engine that crawls, parses and indexes the document contents from the sources. This latter implementation offers the ability of scoring the relevance ranking of documents and eliminating duplications in search results. The relative merits and limitations of both implementations will be presented.