Advanced technology for managing XML document collection
Abstract
Organizing large document collections so that information can be found easily and quickly has always been a challenging problem. In the last few years, XML has become the de facto standard for content publishing and data exchange. The proliferation of XML documents and data has created new challenges and opportunities for managing document collections. Existing technologies for automatically organizing document collections are either imprecise or based only on simple grouping criteria. Because XML documents are self-describing, it is possible to categorize them automatically and precisely according to their content. With the availability of standard XML query languages such as XQuery, much more powerful folder and categorization technologies are now feasible. To address these challenges and exploit these opportunities, this paper describes a new and powerful categorization technology that fully exploits the rich data model and semantic information embedded in XML documents to categorize XML document collections dynamically and precisely. Besides supporting directory-like document look-up operations, the technology also provides advanced operations such as multi-path navigation and document traversal across multiple collections. A preliminary performance study shows that the new categorization technology is both efficient and scalable, which makes it well suited to automating the organization and categorization of XML documents. © 2005 IEEE.
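The general idea of content-based categorization can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes that each folder is defined by a path predicate over the document content, and it substitutes the limited XPath subset of Python's standard `xml.etree.ElementTree` module for XQuery. The folder names, the rule table `FOLDER_RULES`, the function `categorize`, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical folder definitions: folder name -> path predicate.
# ElementTree supports only a small XPath subset, used here in place of XQuery.
FOLDER_RULES = {
    "databases": ".//keyword[.='database']",
    "year-2005": "./venue[@year='2005']",
}

# Hypothetical sample collection: document id -> XML text.
SAMPLE_DOCS = {
    "a": "<paper><venue year='2005'/><keyword>database</keyword></paper>",
    "b": "<paper><venue year='2004'/><keyword>xml</keyword></paper>",
    "c": "<paper><venue year='2005'/><keyword>xml</keyword></paper>",
}


def categorize(documents, rules=FOLDER_RULES):
    """Place each document in every folder whose predicate it satisfies."""
    folders = {name: [] for name in rules}
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for name, path in rules.items():
            # A folder "contains" a document if its predicate matches any node.
            if root.find(path) is not None:
                folders[name].append(doc_id)
    return folders


if __name__ == "__main__":
    print(categorize(SAMPLE_DOCS))
```

Because membership is computed from the content rather than stored, a document can appear in several folders at once, and changing a predicate regroups the collection without moving any document.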