Multiple stream job performance optimization with source operator graph transformations
Abstract
Multiple distributed stream queries executing on a stream processing system must be tuned to the compute cluster to harness the full potential of the hardware they run on. In this paper, we describe an automatic technique for conducting such stream query optimization in the presence of multiple stream jobs. During this autotuning process, we identify the structure of each program and apply automatic program transformations to generate optimized, unified streaming jobs. The operators of the unified secondary sample application are grouped into processing elements, taking into account their performance characteristics and the topology of the stream graph, to produce a high-performance stream query network. We implemented this multiple stream query optimization technique in a mechanism called Tahitica. We demonstrate our approach's ability to produce optimized stream query performance by comparing it with naive deployments of two real-world stream processing applications from the domains of health care and search advertising. Our stream query optimization approach achieved a 7.1% throughput improvement over a naive deployment.