A framework for the selective dissemination of XML documents based on inferred user profiles
Abstract
As the amount of data available online grows, along with the number of pervasive applications that exploit it, systems that support selective dissemination of information are becoming more popular. At the same time, XML is becoming the standard for document exchange over the Internet. A key capability of emerging information dissemination systems is therefore the effective filtering of a continuous stream of XML data items according to user preferences. In this paper we propose a model for information dissemination that integrates profile inference with data dissemination and takes advantage of the structured content of XML documents. Starting from the assumption that explicitly stating one's information interests is an inconvenient and error-prone process, we aim to construct user profiles automatically. We do this by clustering the items a user has previously deemed valuable according to a novel similarity measure that exploits the semantic content of XML. Furthermore, we index the profiles of all users in a multi-level index structure whose nodes naturally correspond closely to the subject areas present in the document collection. Such an approach is both intuitive and efficient, since the index structure is not primarily affected by growth in the number of users. To support these claims, we validate our method experimentally and report on its effectiveness and efficiency.
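To make the profile inference step concrete, the following is a minimal sketch of clustering previously valued XML items into a profile. It is an illustration under stated assumptions, not the method evaluated in the paper: the similarity used here is a plain Jaccard overlap of (element path, word) pairs, standing in for the paper's similarity measure, and the function names `path_terms`, `similarity`, and `infer_profile`, as well as the threshold value, are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_terms(xml_text):
    """Return the set of (element path, lowercase word) pairs in a document."""
    root = ET.fromstring(xml_text)
    features = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        # Pair each word with the path of the element that directly contains it.
        for word in (node.text or "").lower().split():
            features.add((path, word))
        for child in node:
            walk(child, path)

    walk(root, "")
    return features


def similarity(a, b):
    """Jaccard overlap of two feature sets (an assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_profile(documents, threshold=0.3):
    """Greedy single-pass clustering; each cluster models one user interest."""
    clusters = []  # each cluster is a list of feature sets
    for doc in documents:
        feats = path_terms(doc)
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Compare against the union of features already in the cluster.
            sim = similarity(feats, set().union(*cluster))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(feats)
        else:
            clusters.append([feats])
    return clusters


liked = [
    "<article><topic>databases</topic><title>xml indexing</title></article>",
    "<article><topic>databases</topic><title>xml filtering</title></article>",
    "<recipe><name>apple pie</name></recipe>",
]
profile = infer_profile(liked)
```

In this sketch the two articles fall into one cluster and the recipe into another, so the inferred profile holds two interests. Because each feature pairs a word with its element path, the same word under different elements counts as a different feature, which is one simple way of letting document structure influence similarity.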