StreamWeb: Real-time Web monitoring with stream computing
Abstract
Web services such as Twitter have begun to publish streaming Web APIs that let partners and end users retrieve streaming data. By combining such push-based Web services with existing pull-based Web services, it is now possible to understand the current status and trends of the world closer to real time, for example through real-time tracking of infectious disease, real-time crime prediction, or real-time marketing, which in turn makes a variety of innovative business services possible. Such services are normally built from scratch, so their performance and scalability depend on the engineers' skills. In this paper we propose a real-time Web monitoring system called "StreamWeb", built on top of System S, a stream computing system developed by IBM Research. StreamWeb allows developers to easily describe their analytical algorithms for many kinds of Web streaming data without worrying about performance and scalability, and provides real-time, scalable Web monitoring for massive amounts of data. As an experimental proof of concept, we built an application that monitors a list of keywords in the Twitter streaming data and displays any message containing the specified keywords on a map (from Google) at the physical location where the message was posted. Our system can handle nearly 30 thousand Twitter messages per second on a system with 8 computing nodes. This prototype application confirms that real-time Web monitoring systems can be built while satisfying the needs for both high software productivity and system scalability. © 2011 IEEE.
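
The keyword-monitoring step of the proof-of-concept application can be illustrated with a minimal sketch. This is not the StreamWeb implementation and does not use System S or the Twitter API: the `Message` type, the `monitor` function, and the sample data are hypothetical names introduced here for illustration. The sketch assumes that a match is a case-insensitive substring match on any one of the keywords, and that messages without a known location are skipped because they cannot be placed on a map.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Message:
    """Hypothetical stand-in for one message from a streaming Web API."""
    text: str
    location: Optional[Tuple[float, float]]  # (latitude, longitude), if known


def monitor(stream: Iterable[Message], keywords: Iterable[str]) -> Iterator[Message]:
    """Lazily yield messages that mention a keyword and carry a location.

    Assumption: matching is a case-insensitive substring test, and a
    message qualifies if it contains at least one keyword.
    """
    wanted = [k.lower() for k in keywords]
    for msg in stream:
        text = msg.text.lower()
        # Messages without coordinates cannot be drawn on a map, so skip them.
        if msg.location is not None and any(k in text for k in wanted):
            yield msg


if __name__ == "__main__":
    sample = [
        Message("Flu outbreak reported downtown", (35.68, 139.69)),
        Message("Nice weather today", (40.71, -74.01)),
        Message("Caught the flu again", None),
    ]
    for hit in monitor(sample, ["flu"]):
        # A real application would hand these coordinates to a map widget.
        print(hit.location, hit.text)
```

Because `monitor` is a generator, it processes one message at a time and never holds the whole stream in memory, which mirrors the push-based style of processing described above, though on a single process rather than a cluster of computing nodes.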