SemantiClean: Cleaning noisy data using semantic technology
Abstract
In our research on using information extraction (IE) to help populate semantic web resources, we have encountered significant obstacles to interoperability between the two technologies. We believe these obstacles are endemic to the basic paradigms rather than quirks of the specific implementations we have worked with. In particular, we identify five dimensions of interoperability that must be addressed in order to employ information extraction systems successfully to populate semantic web resources that are suitable for reasoning. We call the task of transforming IE data into knowledge-based resources knowledge integration, and we report results of experiments in which the knowledge integration process uses the deeper semantics of OWL ontologies to improve the precision of relation extraction from text by between 8% and 13%. © 2009 Springer Science+Business Media B.V.