A journey to enable generative AI on a new hardware platform with PyTorch 2.0
Abstract
This talk explains our journey to enable generative AI applications on a new hardware (HW) platform. We are working on running generative AI applications on IBM z, with a focus on both correctness and runtime performance. We share our experience so that developers can adapt PyTorch and its ecosystem to a new HW platform. IBM z is unusual in that it uses big-endian byte order. While most HW platforms are little-endian, big-endian was not well supported in PyTorch and its pip packages. By fixing test and application failures, we added support for both byte orders so that, for example, pre-trained models can be exchanged across platforms. With our 32 PRs, PyTorch now passes all tests on IBM z, and ecosystem libraries such as the Hugging Face (HF) Transformers framework work well. We will share our experience enabling CI for a new HW platform to keep the main branch healthy. We also enabled HW acceleration features, such as SIMD, in the PyTorch runtime and TorchInductor, and will briefly explain how to exploit an in-core AI accelerator. Here are the takeaways:
- Enabling a new HW platform without test failures in PyTorch and its ecosystem, such as HF Transformers
- Adding CI for a new HW platform in the upstream
- Enabling performance features for a new HW platform
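
As a minimal illustration of the class of byte-order bug discussed above (a hedged sketch, not code from the talk or from PyTorch itself), the following Python snippet shows why serialized model data must be read with an explicit endianness rather than the host's native byte order:

```python
import struct
import sys

# Host byte order: 'little' on most platforms, 'big' on IBM z.
host = sys.byteorder

# A hypothetical weight value as it might appear in a model file.
weight = 1.5

# Serialize with an explicit byte order ('<' = little-endian), so the
# bytes are identical no matter which host wrote them.
buf = struct.pack("<f", weight)

# Reading back with the matching explicit order is correct on any host.
ok = struct.unpack("<f", buf)[0]

# Naively assuming the other order garbles the value; bugs of this kind
# appear only when the file and the host disagree on endianness.
wrong = struct.unpack(">f", buf)[0]

print(host, ok, wrong)
```

The same principle applies to tensor data in pre-trained model files: a format must either fix the byte order or record it, and the loader must swap bytes when the host differs.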