Enabling enterprise mashups over unstructured text feeds with Infosphere Mashuphub and SystemT
Abstract
Enterprise mashup scenarios often involve feeds derived from data created primarily for eye consumption, such as email, news, calendars, blogs, and web feeds. These data sources can test the capabilities of current data mashup products, as the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations. Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.