
How IBM is tailoring generative AI for enterprises

At IBM, we’re developing generative foundation models that are trustworthy, energy efficient, and portable, allowing enterprises to move AI workloads seamlessly between public and private clouds.


ChatGPT has exposed the promise and the pitfalls of generative AI. OpenAI’s chatbot can write legal opinions and college admissions essays well enough to fool some experts. But it can also be lured into generating hateful comments and made-up information. For many companies, the risks of leveraging this powerful technology may seem to outweigh the rewards.

But it is possible to build useful applications for enterprise on the cutting edge of AI today. IBM is innovating at each stage of the pipeline to help companies create value for their customers, whether that’s improving user experience or helping employees boost their productivity and creativity.  

Trustworthiness is one of our main areas of focus. If companies can’t have confidence in the predictions and content these models generate, few of the other benefits matter. Most of today’s largest foundation models, including the large language model powering ChatGPT, have been trained on examples culled from the internet. That has created predictable problems not just for ChatGPT, but earlier chatbots as well. To reduce the chance of an AI insulting a customer or hallucinating facts, we are curating domain-specific datasets to train our models.

We are also focused on developing energy-efficient methods to train, tune, and run AI models to lower costs and reduce AI’s enormous carbon footprint. By some estimates, training a large-scale model can emit as much carbon as operating five cars for their lifespan. Customizing and serving a large model adds further computational costs — in dollars and carbon emissions. We can do better by making models smaller and using computing resources more efficiently across the stack.

Our third area of emphasis is on portability. We want enterprises to be able to seamlessly and safely move their AI workflows between public and private clouds. By making it easier for companies to securely process and store data on servers they own or lease, we hope to lower the barriers to AI adoption.

The rise of transformers, deep-learning models pre-trained on vast amounts of raw, unlabeled data, has led to a paradigm shift in AI. Instead of training many models on labeled, task-specific data, we can pre-train one big model built on a transformer and redeploy it repeatedly, with additional fine-tuning. These multi-purpose models have come to be known as foundation models. They are now widely used for traditional ML tasks like classification and entity extraction, in addition to generative AI tasks like translation, summarization, and creating realistic content that looks like an expert made it.
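
As a rough illustration of that reuse, the sketch below points the open-source Hugging Face transformers pipelines (not IBM's stack) at classification, entity extraction, and summarization in turn. The library's default checkpoints stand in for an enterprise foundation model; this is an assumption for demonstration only.

```python
# Illustrative only: one pre-trained transformer family redeployed across
# several tasks via Hugging Face pipelines. The default models the library
# downloads are placeholders for an enterprise foundation model.
from transformers import pipeline

classifier = pipeline("text-classification")            # e.g. sentiment
ner = pipeline("ner", aggregation_strategy="simple")     # entity extraction
summarizer = pipeline("summarization")                   # generative task

ticket = "My order #1234 arrived damaged and I would like a replacement."
print(classifier(ticket))
print(ner(ticket))
print(summarizer(ticket, max_length=20, min_length=5))
```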

But transformers aren’t without risks, especially for generative tasks. For these technologies to succeed in enterprise settings, they will have to be more trustworthy and energy efficient than they are today. Enterprises will also need to be able to easily and securely move AI workloads around, especially across modern and legacy software and hardware systems. Here’s how IBM is addressing these challenges.

Trustworthy generative models

Generative models like BLOOM or PaLM with hundreds of billions of parameters are great for open-ended tasks. But businesses may want to base their decisions on enterprise-relevant data rather than random threads on Reddit. Data curation is the first step in building a trustworthy model.

Feeding the model high-quality data at the outset, preferably enterprise data, can prevent problems later. That includes data cleansing: IBM is developing techniques to filter data for hate, profanity, and biased language before training.
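
A minimal sketch of that kind of pre-training filter is below. The blocklist and the threshold are invented placeholders; a production pipeline would use trained hate-speech and profanity classifiers rather than keyword matching.

```python
# Minimal sketch of pre-training data filtering. BLOCKLIST and the threshold
# are illustrative placeholders, not IBM's production pipeline.
import re

BLOCKLIST = {"slur1", "slur2"}          # hypothetical terms to filter
TOXICITY_THRESHOLD = 0.5                # hypothetical classifier cutoff

def toxicity_score(text: str) -> float:
    """Stand-in for a trained toxicity / hate-speech classifier."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(t in BLOCKLIST for t in tokens) / max(len(tokens), 1)

def clean_corpus(documents):
    """Keep only documents scored below the toxicity threshold."""
    return [d for d in documents if toxicity_score(d) < TOXICITY_THRESHOLD]

raw_docs = ["A helpful enterprise FAQ answer.", "slur1 slur1 slur1"]
print(clean_corpus(raw_docs))   # -> ["A helpful enterprise FAQ answer."]
```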

When there isn’t enough real data to choose from, synthetic data can be a valuable substitute. It comes automatically labeled and can augment or replace real data to improve the model’s accuracy and reliability. It can serve as a stand-in for health care records, financial data, and other content protected by privacy or copyright laws. Deployed as adversarial examples, it can also show where an AI model is likely to make mistakes or unfair decisions.
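
The toy generator below sketches why synthetic data arrives pre-labeled: because each record is produced by a known process, the label comes along for free. The schema and the "fraud rule" are made up for illustration; real synthetic-data tools model the statistics of the source data far more carefully.

```python
# Toy synthetic-data generator with automatic labels. The schema and the
# fraud rule are invented for illustration only.
import random

random.seed(0)

def synth_transaction():
    amount = round(random.lognormvariate(4, 1), 2)     # synthetic amount
    hour = random.randint(0, 23)                        # synthetic timestamp
    # The label is free because we generated the record ourselves.
    label = int(amount > 500 and hour < 6)              # toy fraud rule
    return {"amount": amount, "hour": hour, "fraud": label}

synthetic_set = [synth_transaction() for _ in range(1000)]
print(sum(r["fraud"] for r in synthetic_set), "synthetic fraud examples")
```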

Training is the next step in building a trustworthy model. Reinforcement learning with human feedback, or RLHF, is one way to make models safer by aligning them with human values. RLHF played a key role in fine-tuning the language model powering ChatGPT, helping to endow it with conversational skills. RLHF was also designed to reduce hallucinations and anti-social behavior.

Humans rate the quality of the model’s responses, and the model incorporates that feedback to iteratively produce more humanlike answers. RLHF can improve the quality of the dialogue, but it can also build in guardrails to make sure the model won’t teach you how to build a bomb or make up facts.
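
The sketch below captures only the shape of that loop, with toy stand-ins: a "reward model" encoding human preferences and a "policy" nudged toward the responses the reward model prefers. Real RLHF trains a reward model on human preference comparisons and updates the policy with a gradient method such as PPO; every name and number here is illustrative.

```python
# Conceptual sketch of the RLHF loop with toy stand-ins for both models.
import random

random.seed(1)

def reward_model(prompt, response):
    """Stand-in for a model trained on human preference comparisons."""
    score = 1.0 if "helpful" in response else -1.0   # prefer grounded answers
    if "made-up fact" in response:
        score -= 2.0                                 # penalize hallucination
    return score

def policy(prompt, helpful_bias):
    """Stand-in for the language model being fine-tuned."""
    candidates = ["a helpful, grounded answer", "a made-up fact"]
    weights = [1.0 + helpful_bias, 1.0]
    return random.choices(candidates, weights=weights)[0]

helpful_bias = 0.0
for step in range(200):
    response = policy("Summarize this contract clause.", helpful_bias)
    r = reward_model("Summarize this contract clause.", response)
    # Crude stand-in for a policy-gradient (e.g. PPO) update.
    helpful_bias = max(0.0, helpful_bias + 0.1 * r)

print(f"learned bias toward the preferred answer: {helpful_bias:.1f}")
```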

Tuning is the last step in building a trustworthy model. IBM has developed several easy-to-use methods to find and fix AI biases before these models are turned loose on enterprise tasks. One method, FairIJ, identifies biased data points in the training data and lets you edit them out. Another, FairReprogram, counteracts bias in an already-trained classifier by adjusting its inputs rather than retraining the model.
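
The brute-force sketch below illustrates the idea behind data-debiasing methods like FairIJ: score how much each training point contributes to a fairness gap, drop the worst offenders, and retrain. FairIJ itself estimates these effects with influence-function approximations rather than the leave-one-out retraining shown here, which is only feasible for toy models; the data and fairness metric are synthetic.

```python
# Illustrative sketch of dropping gap-inducing training points and retraining.
# Brute-force leave-one-out stands in for FairIJ's influence approximations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n)                    # protected attribute
x = rng.normal(size=(n, 2)) + group[:, None]     # features shifted by group
y = (x[:, 0] + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

def parity_gap(model, x, group):
    """Demographic parity gap: difference in positive rates between groups."""
    pred = model.predict(x)
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

base = LogisticRegression().fit(x, y)
base_gap = parity_gap(base, x, group)

# Score each point by how much removing it shrinks the gap.
scores = []
for i in range(n):
    keep = np.arange(n) != i
    m = LogisticRegression().fit(x[keep], y[keep])
    scores.append(base_gap - parity_gap(m, x, group))

worst = np.argsort(scores)[-20:]                 # most gap-inducing points
keep = np.setdiff1d(np.arange(n), worst)
debiased = LogisticRegression().fit(x[keep], y[keep])
print(f"gap before: {base_gap:.3f}, after: {parity_gap(debiased, x, group):.3f}")
```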

We are also working on ways to make the process more transparent. To be truly trustworthy, a model must not only produce accurate, reliable decisions, it must be able to explain to humans how it got there.

Train, tune, and serve AI models efficiently

For years, the size of a model strongly predicted its performance. Bigger models trained on more data did a better job than smaller models at things like summarizing or translating text. But their edge came with huge financial and environmental costs: OpenAI reportedly spent millions training GPT-3, the model family on which the original ChatGPT was built.

As AI evolves, IBM and other tech companies are developing ways to use computing resources more efficiently to shrink AI’s carbon footprint. With sustainability as a major focus, we are streamlining each step of the AI pipeline, from training and validation to tuning and inference.

One encouraging finding is that effective AI models can be a lot smaller than they are today. DeepMind recently showed that a smaller model trained on more data could outperform a model four times larger trained on less data. The secret, they found, was to scale parameters and data proportionally: Double the size of your model, then double the size of your training data for optimal performance. We expect models to shrink in size, speeding up tuning and inference.
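
A back-of-the-envelope version of that proportional-scaling rule is below. It relies on two commonly used approximations, that training compute is roughly 6 × parameters × tokens in FLOPs and that the compute-optimal point sits near 20 tokens per parameter; these constants are rough, not DeepMind's exact fitted values.

```python
# Back-of-the-envelope compute-optimal scaling in the spirit of Chinchilla:
# parameters (N) and training tokens (D) grow together. The constants are
# common approximations, not exact published values.
TOKENS_PER_PARAM = 20          # approximate compute-optimal ratio

def compute_optimal(flop_budget: float):
    """Split a FLOP budget into a model size and a token count."""
    n_params = (flop_budget / (6 * TOKENS_PER_PARAM)) ** 0.5
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Quadrupling the budget roughly doubles both the model and the data,
# i.e. "double the model, double the tokens" costs about 4x the compute.
for budget in (1e21, 4e21):
    n, d = compute_optimal(budget)
    print(f"budget {budget:.0e} FLOPs -> ~{n/1e9:.1f}B params, ~{d/1e9:.0f}B tokens")
```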

Another potential energy-saver lies in scaling models iteratively, rather than starting from scratch. Researchers at the MIT-IBM Watson AI Lab found that a model can be trained four times faster simply by recycling the parameters of its smaller, earlier version. Their “LiGO” algorithm analyzes the small model’s weights to learn how to grow it, then uses those learned mappings to initialize the larger model.
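
The sketch below shows the shape of that idea: instead of random initialization, the larger layer's weights are built as a linear expansion of the smaller trained layer, W_big = A · W_small · B. Here A and B are fixed tiling matrices purely for illustration; LiGO learns these growth operators from the small model's weights.

```python
# Sketch of LiGO-style growth: warm-start a larger layer from a smaller one
# via linear expansion operators. The tiling matrices here are placeholders;
# LiGO learns the operators A and B.
import numpy as np

rng = np.random.default_rng(0)
d_small, d_big = 4, 8
W_small = rng.normal(size=(d_small, d_small))   # trained small-model layer

def tiling_expander(small, big):
    """Map each large dimension onto a small dimension by simple tiling."""
    E = np.zeros((big, small))
    E[np.arange(big), np.arange(big) % small] = 1.0
    return E

A = tiling_expander(d_small, d_big)             # (d_big, d_small)
B = tiling_expander(d_small, d_big).T           # (d_small, d_big)
W_big = A @ W_small @ B                         # (d_big, d_big) warm start
print(W_big.shape)                              # (8, 8), ready for fine-tuning
```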

We are also developing techniques to quickly customize foundation models for new tasks. Rather than run one algorithm over and over to find the best prompts for each new application, we’ve devised an automated method for finding a universal prompt for all the tasks. Our multi-prompt tuning technique is an economical way for users to rapidly experiment and redeploy models.
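
The sketch below shows the general mechanics that such prompt-based customization builds on: the foundation model stays frozen and only a small block of soft-prompt embeddings is trained, which can then be reused or shared across tasks. The tiny Transformer layer is a placeholder for a real frozen model; none of this is IBM's specific multi-prompt tuning code.

```python
# Minimal soft-prompt tuning sketch with a frozen backbone (a placeholder
# for a real foundation model). Only `soft_prompt` would be trained.
import torch
import torch.nn as nn

d_model, prompt_len, vocab = 64, 8, 1000

backbone = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
embed = nn.Embedding(vocab, d_model)
for p in list(backbone.parameters()) + list(embed.parameters()):
    p.requires_grad = False                      # foundation model stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

def forward(token_ids):                          # token_ids: (batch, seq)
    x = embed(token_ids)                         # (batch, seq, d_model)
    prompt = soft_prompt.expand(x.size(0), -1, -1)
    return backbone(torch.cat([prompt, x], dim=1))

tokens = torch.randint(0, vocab, (2, 16))
out = forward(tokens)
print(out.shape)                                 # (2, prompt_len + 16, d_model)
# Only the 8 x 64 soft_prompt values are optimized for a new task.
```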

While the energy cost of training AI models rightly gets lots of attention, there are other concerns, too. Validating and running these behemoth models can be expensive and time-consuming, creating an agonizingly slow experience for users. To improve inference speeds, we’re leveraging our deep expertise in quantization, or shrinking models from 32-bit floating-point arithmetic to much smaller bit formats.

Reducing AI-model precision brings huge training and inference benefits without sacrificing accuracy. We have devised methods to train and serve models with 4-bit integer arithmetic, significantly cutting computation and storage costs. We hope to soon run these compressed models on our AI-optimized chip, the IBM AIU.
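
A minimal sketch of the underlying arithmetic is below: symmetric quantization maps FP32 weights onto 4-bit signed integers with a single scale factor. Production pipelines, and hardware like the AIU, use more refined schemes such as per-channel scales and quantization-aware training; this is only a toy per-tensor example.

```python
# Toy symmetric 4-bit quantization of a weight tensor (per-tensor scale).
import numpy as np

def quantize_int4(w):
    scale = np.abs(w).max() / 7.0                # 4-bit signed range: [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int4(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean absolute quantization error: {err:.4f}")
# Storage drops from 32 to 4 bits per weight (8x smaller), and integer
# matrix multiplies are far cheaper than FP32 on supporting hardware.
```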

Moving AI workflows seamlessly between clouds

Running AI workloads is computationally intensive. It’s a big reason why so much of this work is handled on distributed servers in the cloud. The ability to rent data storage and computing infrastructure, rather than own it, gives businesses the flexibility to rapidly scale, and experiment with, AI applications.

Another benefit of AI computing in the cloud is speed. The modern cloud is organized into modules that make workloads easier to split and run in parallel for faster training and inference. But not all enterprises can take advantage of this modular, cloud-native setup. Many are still running workloads on private systems, especially if they have sensitive data to protect.

IBM recently set out a new path for AI training with Vela, our first AI supercomputer in the cloud. Vela is designed to make AI research and development as productive as possible. IBM researchers from around the globe are training advanced models using our cloud-native stack, which runs on OpenShift on top of Vela. While the work was done in the context of a public cloud, the architecture could also be adopted for on-premises AI system design. If companies have their data in the cloud, we can bring our stack to them; if their data is on-premises, they can train and tune their models there and have the same cloud-native, user-friendly experience.

Redesigning the entire stack took more than a year and required several innovations. IBM researchers worked with the open-source Ray and PyTorch communities to automate and simplify workflows at each stage of the AI pipeline — training, tuning, testing, and serving.

We added new technologies to Ray, an open-source distributed computing framework for machine learning applications, to make training and testing easier — including pre-processing tasks like removing duplicate data and toxic content. We also worked with the PyTorch platform to make it possible to train AI models on affordable cloud networking hardware like standard Ethernet equipment.
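
As a rough illustration of that kind of pre-processing, the sketch below uses Ray Data to filter a toy corpus. The documents, blocklist, and exact-match dedup are placeholders, and real pipelines rely on trained toxicity classifiers and distributed, approximate deduplication; this is not IBM's actual pipeline code.

```python
# Illustrative Ray Data sketch of corpus pre-processing: dedup plus a
# toxicity filter. All inputs and rules here are placeholders.
import ray

ray.init(ignore_reinit_error=True)

raw_docs = [
    {"text": "How do I reset my enterprise password?"},
    {"text": "How do I reset my enterprise password?"},   # exact duplicate
    {"text": "a document containing a slur"},
]

# Exact dedup done locally for simplicity; large corpora need a distributed,
# often approximate (e.g. MinHash-based), approach.
deduped, seen = [], set()
for row in raw_docs:
    if row["text"] not in seen:
        seen.add(row["text"])
        deduped.append(row)

BLOCKLIST = {"slur"}                                       # placeholder terms

def is_clean(row: dict) -> bool:
    return not any(term in row["text"].lower() for term in BLOCKLIST)

ds = ray.data.from_items(deduped)     # distribute the cleaned corpus
clean = ds.filter(is_clean)           # toxicity filter runs in parallel
print(clean.take_all())               # -> only the password-reset document
```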

In collaboration with Red Hat, we built features into the OpenShift platform itself to ensure that each Vela GPU in the system operates at full capacity, for maximum efficiency. We also made improvements to how OpenShift manages Kubernetes containers, making it easier to share code and data within a container.

This new era of foundation models, including those for generative tasks, holds great potential for enterprise. We look forward to working with businesses to put in the right guardrails and infrastructure so that their full value can be unlocked.
