Abstract
Cognitive applications are complex, composed of multiple components that exhibit diverse workload behavior. Efficient execution of these applications requires systems that can handle this diversity effectively. In this paper, we show that IBM POWER9™ shared-memory systems have the compute capacity and memory throughput to handle the broad spectrum of computing requirements of cognitive workloads efficiently. We first review the GraphBLAS interface defined for supporting cognitive applications, particularly whole-graph analytics. We show that this application programming interface effectively separates the concerns of the analytics application developer from those of the system developer while enabling good performance by permitting system developers to make platform-specific optimizations. A linear algebra formulation and execution of the betweenness centrality kernel in the High-Performance Computing Scalable Graph Analysis Benchmark, for graphs with 256 million vertices and 2 billion edges, delivers a sixfold reduction in execution time over a reference implementation. We then present the results of benchmarking the forward propagation step of deep neural networks (DNNs) written in GraphBLAS and executed on POWER9. We present the rationale and evidence for the weight matrices of large DNNs being sparse and show that, for sparse weight matrices, GraphBLAS/POWER® has a two-orders-of-magnitude performance advantage over dense implementations. Applications that analyze graphs larger than several tens of billions of vertices require distributed computing environments such as Apache Spark to provide resilience and parallelism. We show that linear algebra techniques implemented in an Apache Spark environment can leverage the parallelism available in POWER9 servers.
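To illustrate the forward propagation step mentioned above, the following minimal Python sketch computes one layer, ReLU(Wy + b), while visiting only the stored nonzeros of the weight matrix W. It is not the GraphBLAS implementation benchmarked in this paper. The dictionary-based storage format, the function name `sparse_forward_layer`, the choice of ReLU as the activation, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical storage: {row: [(col, weight), ...]}; zero entries are omitted.
SparseMatrix = Dict[int, List[Tuple[int, float]]]


def sparse_forward_layer(weights: SparseMatrix, bias: List[float],
                         activations: List[float]) -> List[float]:
    """Compute ReLU(W @ y + b), touching only the stored nonzeros of W."""
    out = []
    for row, b in enumerate(bias):
        total = b
        # Work is proportional to the nonzeros in this row, not the row width.
        for col, w in weights.get(row, []):
            total += w * activations[col]
        out.append(max(0.0, total))  # ReLU activation (assumed)
    return out


# Illustrative 3x4 layer with 4 nonzeros out of 12 possible entries.
W = {0: [(0, 1.0), (3, -2.0)], 1: [(2, 0.5)], 2: [(1, -1.0)]}
b = [0.0, 0.25, 0.5]
y = [1.0, 2.0, 3.0, 4.0]
result = sparse_forward_layer(W, b, y)
```

In this sketch the cost of a layer scales with the number of stored weights rather than with the full matrix dimensions, which is the property that a sparse formulation exploits relative to a dense one.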