Low-latency and high bandwidth TCP/IP protocol processing through an integrated HW/SW approach
Abstract
Ultra-low-latency networking is critical in many domains, such as high-frequency trading and high-performance computing (HPC), and highly desirable in many others, such as VoIP and online gaming. In closed systems, such as those found in HPC, InfiniBand, iWARP or RoCE are common choices, as system architects are free to choose the best host configurations and networking fabric. However, the vast majority of networks are built upon Ethernet, with nodes exchanging data using the standard TCP/IP stack. On such networks, achieving ultra-low latency while maintaining compatibility with a standard TCP/IP stack is crucial. To date, most efforts toward low-latency packet transfers have focused on three main areas: (i) avoiding context switches, (ii) avoiding buffer copies, and (iii) off-loading protocol processing. This paper describes IBM PowerEN™ and its networking stack, showing that an integrated system design, which treats Ethernet adapters as first-class citizens that share the system bus with CPUs and memory rather than as peripheral PCI Express-attached devices, is an effective way to achieve minimal latency. The work presents performance figures including a wire-to-wire latency of 1.30 μs for UDP, usually the protocol of choice for latency-sensitive applications, as well as latency and bandwidth figures for the more complex TCP. © 2013 IEEE.