The 2024 IBM Research annual letter

Deep Dive
17 minute read

This past year underscored exactly what only a place like IBM can achieve. At IBM Research, we made massive advances in AI, quantum computing, semiconductors, and fundamental research — much of which is either heading into products solving real problems IBM customers have, or will be in the near future.

IBM Research houses some 3,000 researchers in 14 labs across the world. By pushing the boundaries of what’s possible in such a wide range of scientific fields, we’re able to envision what’s next in computing and scale our ideas in ways that previously may have seemed impossible. Whether that’s revolutionizing the next generation of IBM mainframes with new chips and networking solutions, building AI models and agents that are solving real problems IBM Consulting clients have, or tackling intractable, fundamental problems through cutting-edge quantum supercomputing.

In this letter, we’ll highlight the major breakthroughs that came out of IBM Research over just the last 12 months. But as important as these innovations are individually, it’s what they represent when taken together. Our engine of research and development across our labs is leading to commercial products for IBM, and meaningful results for our partners and clients — faster than ever before.

AI

We have all seen how quickly AI has become integral to so many aspects of our lives. In a few short years, just about every piece of public data available on the internet has been used to train a foundation model. And yet, much of the world’s enterprise data — information that can be used to build tools to solve real business needs — remains locked in PDFs, slide decks, emails, and spreadsheets. Over the course of 2024, IBM Research was the engine of innovation behind IBM’s largest releases that are now bringing AI to enterprise, at the scale needed when end users measure in the millions.

At Think in May, IBM and Red Hat unveiled InstructLab, a new open-source project designed to lower the cost of fine-tuning LLMs, by allowing people to collaboratively add new knowledge and skills to any model. This project was conceived of and incubated within IBM Research, and a close working relationship between Research and Red Hat led to the powerful tool that has now launched on Red Hat Enterprise Linux AI. What makes InstructLab so impressive is that it gives communities the tools to create and merge changes to LLMs, without having to retrain the model from scratch. By making LLMs more like any other open-source software project, it’s considerably easier for collaborators from around the world to find new ways to make LLMs solve problems that people are struggling to solve. IBM Research has used InstructLab to generate synthetic data to improve its open-source Granite models for language and code. In October, IBM launched and open-sourced its third generation of Granite models.

These powerful new enterprise models were designed and trained within IBM Research’s Data and Model Factory. The release included the new Granite 8B and 2B models, designed to be the workhorse models for enterprise AI. These models were trained on 12 trillion tokens in 12 languages, from a combination of public and enterprise data sources. The models perform at the same level as considerably larger foundation models, at a fraction of the cost, on tasks that are important to businesses — things like RAG, classification, summarization, entity extraction, and tool use.

Subsequently in December, we released our Granite 3.1 models, each with an expanded context length of 128K. They were trained on more than 12 trillion tokens of high-quality, curated data, and open-sourced with full transparency on data sources. The Granite 3.1 8B Instruct model delivered significant performance improvements over Granite 3.0, and its average score across the Hugging Face OpenLLM Leaderboard benchmarks is among the highest of any open model in its weight class.

Granite 3-1 performance.png

We also released a new family of embedding models. These new models, like their generative counterparts, offer multilingual support in 12 languages.

As part of the earlier Granite 3.0 launch, we also open-sourced a new family of models called Granite Guardian. These let developers implement safety guardrails to their apps, by checking user prompts and LLM responses for several types of risks, including social bias, hate, toxicity, profanity, violence, and jailbreaking attempts. And when our models are coupled with RAG, they can check for groundedness, context relevance, and answer relevance. We released a new 8B version with Granite 3.1 that we believe offers the most comprehensive set of risk and harm detection capabilities available today.

Granite 3-1 performance with RAG and harm detection(1).jpg

We also announced new versions of our Granite Time Series models, which outperform models 10 times their size. For businesses, being able to accurately forecast future events based on historical data can have major impacts. Traditional LLMs struggle with these sorts of tasks, and that’s why we’ve pursued these models. This includes Granite TTM (or TinyTimeMixers), IBM’s series of compact and high performing time-series models, which are now available in watsonx.ai through the beta release of watsonx.ai Timeseries Forecasting API and SDK. We believe the open-source community has seen the value in these models, with TTM having been downloaded on Hugging Face more than 5 million times so far.

Also in October, we released the next generation of watsonx Code Assistant. Powered by Granite code models, wCA offers general-purpose coding help for languages like C, C++, Go, Java, and Python. In our own software development pipeline, we’ve improved our speed with code explanation by 90% in some cases. The new 8B code model also features support for agentic capabilities. Our Granite code models are also powering new use cases in IT and business automation through products like Instana and watsonx Orchestrate.

In addition to their open-source release on Hugging Face, and commercial availability on watsonx and Red Hat products, Granite models are now available across a wide swathe of partners including Ollama, LM Studio, AWS, Nvidia, Google Vertex, and Samsung, among others.

Building off the Granite 3 series of models, we’re now working to realize a future where AI agents can easily solve business needs. We have released an open-source framework called Bee that enables rapid development of agents to power business applications. These new lightweight models are small enough to be deployed on a laptop or other edge devices.

We also made new models that our partners are already using to make a difference in the world. Our climate and weather model, developed in partnership with NASA, is now being used to track the flood damage in Spain, deforestation in the Amazon, and heat islands in US cities.

This year also marked the one-year anniversary of the AI Alliance, a group co-founded by IBM and Meta to drive open and responsible AI development. Over the last year, it’s grown to 140 members in 23 countries, including the likes of AMD, Cleveland Clinic, Intel, Anyscale, UPenn, Hugging Face, and Yale. The organization has formed several working groups to develop responsible and accessible foundation models, AI hardware, and safety initiatives.

Semiconductors

With the burgeoning demand for AI, which has become reliant on increasingly complex models, it’s clear that traditional CPUs and GPUs are struggling to provide the necessary combination of speed and energy efficiency. We need to create new devices that were designed from the outset to handle AI workloads at scale. This has been a large part of the focus of the semiconductors and infrastructure research teams in 2024, with several major breakthroughs taking place.

In August, IBM unveiled Spyre, a new AI accelerator chip for future generations of Z and Power systems. Spyre was inspired by the work out of IBM Research on the AIU prototype design, first shown off in 2022, and the Telum chip from 2021. In each case, researchers had taken inspiration from the structure of the human brain, to create devices where both CPU and AI cores were tightly integrated on the same chip. This breakthrough came after the realization that AI workflows require a system with extremely low latency for AI inferencing.

aiu spyre.jpg

Spyre has 32 individual accelerator cores and contains 25.6 billion transistors connected by 14 miles of wire. It is produced using 5 nm node process technology, and cards can be clustered together. A cluster of eight cards adds 256 additional accelerator cores to a single IBM Z system.

Roughly 70% of the entire world’s transactions by value run through IBM mainframes. With Spyre, there’s now an effective solution for bringing generative AI to these mission-critical machines. Every day, new enterprise use cases for generative AI are springing up, from solutions for automating business processes, to generative systems for app modernization. With the Spyre Accelerator, businesses can deploy cutting-edge AI software on Z, while still benefiting from the security and reliability IBM Z offers.

northpole_prototype_chip.jpg
One of the prototype NorthPole devices used in the large language model experiments.

We’re also looking at new ways to serve models more efficiently. Last year, IBM Research unveiled its brain-inspired AIU NorthPole chip, which co-locates memory and processing units together, effectively removing the von Neumann bottleneck found in just about every chip design over the past 60 years. This year, in a collaboration between NorthPole’s hardware researchers and AI researchers, the team created a new research system using NorthPole for generative models.

1_cards_energy_vs_latency_ALL_GPUs_fig.jpg
NorthPole's efficiency compared with other devices with similar uses.

The team took 16 IBM NorthPole chips, installed them on a standard server blade, and ran a 3-billion parameter LLM. The team was able to achieve a latency of below 1 millisecond per token — nearly 47 times faster than the next most energy-efficient GPU — while using nearly 73 times less energy than the next lowest-latency GPU.

A similar sort of cross-collaboration resulted in another major breakthrough this year in the field of co-packaged optics. A team within IBM Research’s semiconductors division produced the world’s first successful polymer optical waveguide, which brings the bandwidth of optics to the very edge of chips. This device makes it possible to line up high-density bundles of optical fibers at the edge of a silicon chip, so it can communicate directly through the polymer fibers. High-fidelity optical connections require exacting tolerances of half a micron or less between a fiber and connector. The team has demonstrated the viability of a 50-micron pitch for optical channels. This represents an 80% size reduction from the conventional 250-micron pitch, which they believe can shrink even further, to 20 or 25 microns, leading to a 1,000% to 1,200% increase in bandwidth for a given chip. which brings the bandwidth of optics to the very edge of chips.

CO_PACK_1.jpg
IBM Research's co-packaged optics device breakthrough.

Again working with Research’s AI division, the team used these new prototype devices to calculate that it would be possible to cut the time it takes to train a 70 billion parameter LLM from three months to three weeks, when compared to using industry-standard GPUs and interconnects. That would save the energy equivalent of 5,000 U.S. homes’ annual power consumption per AI model trained.

The semiconductors team has also continued its breakthrough work in shrinking transistors and delivering 2 nm process devices with Rapidus in the next five years. Researchers from IBM and Japanese chipmaker Rapidus have announced that they reached a critical milestone in consistently constructing chips with a 2-nanometer process. Using selective nanosheet layer reduction, they can now build nanosheet gate-all-around transistors with multiple threshold voltages (or multi-Vt), which allows for chips that can perform complex computations without requiring as much energy. These advances further strengthen the nanosheet multi-Vt technology as the replacement for current FinFET devices.

The team also has been focused on research and development of EUV lithography using High NA EUV systems — a critical technology that allows for designing high-performance logic devices with the potential to extend the Nanosheet era and enable future vertically-stacked transistors beyond the 1 nm node.

IBM has already achieved an early demonstration of metallization of lines down to 21 nm pitch that enables the continuation of copper damascene interconnects integration, unlocking the needs of semiconductor designs below the 2 nm node and simplifying future Nanosheet node technology. The single-print 24, 23, and 21 nm pitch interconnects have demonstrable and consistent electrical functionality. This demonstrates that IBM’s development platform can continue to leverage further process co-optimization to enable mature yields in these smallest of wires.

EUVBlog_1.jpg
Researchers in Albany testing out chip designs on the EUV lithography machines.

These breakthroughs are impressive in their own right, but perhaps what’s most impactful is that these projects will not just remain as research endeavors. Through the reach of IBM and its partners, we’re already seeing these devices grow from ideas into products that can be deployed at scale to help solve real business problems. IBM Spyre is already available and will be integral to the next generation of IBM Power 11 when it is released later in 2025. AIU NorthPole and the co-packaged optics devices were tested and hardened at IBM’s facility in Bromont, Canada, where countless companies rely on the site’s services to test and package their chips. And IBM Research’s 2 nm node technology is now being made ready for large-scale production with Rapidus.

Quantum

In 2024, we made great strides advancing our mission of bringing useful quantum computing to the world. A big part of that was solidifying our vision for quantum-centric supercomputing — weaving quantum and classical computing together to solve problems beyond the ability of either compute paradigm alone — and turbocharging the performance of Qiskit, the world’s favorite quantum software development kit.

Back in 2020, IBM released an aggressive roadmap charting the course for scaling quantum computers. Last year we extended that roadmap out to 2033, detailing the hardware, software, and innovations we have planned to realize fault-tolerant quantum computing at scale. At this year’s Quantum Developer Conference, we demonstrated our progress — including the achievement of accurate results from quantum circuits on Heron out to 5,000 gate operations.

Europe’s first IBM Quantum Data Center is now open
The latest IBM Quantum Heron chip.

We unveiled a more performant Heron chip, now with 156 qubits, for our clients. Heron is shattering records for performance with superconducting quantum processors, with error rates continuing to fall. Heron’s best 2Q gate error rates are now 8x10^-4, as measured by Randomized Benchmarking. We also now have achieved a 240 times improvement in speed in two years, now at 240K circuit layer operations per second (CLOPS).

We also achieved two critical milestones on tour innovation roadmap: m-couplers with Crossbill and l-couplers with Flamingo. Crossbill’s m-couplers are short-range chip-to-chip couplers that allow us to scale chips within packages modularly. Our Flamingo demonstration introduced l-couplers, longer-range package-to-package couplers that allow us to link chips over larger distances inside the fridge. Together, these advances provide new technologies aligned to our roadmap goals of scalable fault-tolerant quantum computers with Starling and Bluejay.

But perhaps the most significant release of the year was the first stable version of Qiskit SDK, Qiskit v1.0. Based on the results of our benchmarking, we feel confident in saying that it’s the world’s most performant quantum software development kit.

A formal shift to semantic versioning with longer support cycles, coupled with a removal of the metapackage architecture, offers the stability that Qiskit’s more than 600,000 developers need to build more complex algorithms and map their most difficult challenges to quantum circuits. And a new algorithm for transpiling circuits, called LightSABRE brings significant improvements over the previous SABRE algorithm.

Furthermore, we wanted to show its performance in a fair and accurate way. We compiled and released an open-source collection of benchmarks called Benchpress, comprising over 1,000 different tests to measure a quantum SDK’s ability to generate, manipulate, and transpile circuits. Almost all these tests are standardized benchmarking tests from open-source libraries used by other members of the quantum community. We benchmarked Qiskit against other quantum software, including TKET, BQSKit, Cirq, and more. Qiskit was the clear winner in terms of performance, completing more tests than any other quantum SDK. On average, Qiskit was 29 times faster at transpiling and transpiled the circuits with 54% fewer two-qubit gates than TKET, the second highest-performing SDK.

But Qiskit is more than just a performant SDK. This year, we redefined Qiskit as a software toolset to work alongside our quantum computing hardware, taking users through the entire journey of running utility-scale quantum workloads from writing code to post-processing the results and everything in between. Today, Qiskit encompasses both the open-source SDK, as well as the software and middleware tools necessary for executing utility-scale workloads. That includes the new Qiskit Transpiler Service, the updated Qiskit Runtime Service featuring an overhaul of the Qiskit Primitives, our new Qiskit AI Code Assistant Service, Qiskit Serverless, and Qiskit Functions, an IBM-hosted instantiation of Qiskit Serverless for running workloads on IBM quantum processors and IBM Cloud.

Of those new features, Qiskit Functions has the potential to bring quantum computing to a considerably broader audience. It’s a new programming service we designed that enables access to high-performance quantum hardware and software at a higher abstraction level. After importing the Qiskit Function Catalog and passing it their API token, users add the function to their code and pass it the required inputs — such as classical data they’d like to map and run on quantum circuits. The service runs the code on a quantum computer and applies error suppression and mitigation, then the user receives their results.

Combining these software and hardware breakthroughs allowed us this year to produce the first true demonstration of quantum-centric supercomputing. Working with RIKEN, we published a new paper where we defined the paradigm as supercomputing that optimizes and orchestrates work across quantum computers and advanced classical compute clusters, either co-located or over the cloud, to accelerate both.

In that paper, the RIKEN and IBM team incorporated quantum computations of chemistry in a quantum-centric supercomputing architecture, using up to 6,400 nodes of the Fugaku supercomputer to assist an IBM Heron QPU. The computation was a simulation of the triple-bond breaking of molecular nitrogen, plus the active-space electronic structure of two iron-sulfur clusters, 2Fe–2S and 4Fe–4S. Together, the computers produced approximate solutions for problems beyond exact classical methods by incorporating the sample-based quantum diagonalization (SQD) method now included in Qiskit as an add-on.

image of an industrial space with several people milling about or sitting at tables
An IBM Quantum System Two in the Think Lab at IBM Research, among other computing clusters, is part of a vision for quantum-centric supercomputing.

As quantum-centric supercomputing comes to fruition, we envision quantum computers will assist classical computers (and vice versa) in some of the hardest computing tasks. And we believe our development roadmap will push us toward that future. And, as we said at the SC2024 conference, we feel confident that we can achieve quantum advantage in the next two years — if we work together with the classical HPC community.

Impact at a national level

When it comes to solving the biggest problems the world is facing, no one company or organization will solve them on their own. It will take collaboration and cooperation, something that nation states can organize. Through 2024, IBM was involved in several major initiatives that are having an impact at a national scale. The success of Rapidus is a critical mission for the nation of Japan, and in 2024, the 2 nm partnership has expanded to include chiplets and advanced packaging, and AI-powered fab automation.

Beyond the major initiative with Rapidus, IBM has been working with several countries to aid them in securing the future of computing in their states.

Phoenix Technologies IBM Vela AI Supercomputer Muenchenstein Basel 24082024 L1002138©Phoenix2024.jpg
The AI system installed at Phoenix Technologies's location in Switzerland. (Courtesy: Phoenix Technologies)

In Switzerland, we worked with Phoenix, the organization building the Swiss sovereign cloud, to install an end-to-end, on-premises cloud-native AI supercomputer, from the system up through the AI platform and software stack. The architecture can scale from dozens to hundreds or thousands of GPUs. It contains IBM breakthroughs, like a flexible, scalable, and affordable RDMA-enabled Ethernet-based network, and a high-performance storage system based on IBM Storage Scale, one of the industry’s fastest and most flexible file systems. The cloud-native AI platform was built using the OpenShift Container Platform and OpenShift AI, with access to watsonx.ai as needed. Our software stack provides tools needed to productively innovate with generative AI. The first phase came online at Phoenix in August and will help power sovereign AI cloud solution, kvant AI, which aims to provide sovereign AI applications for every industry.

We also strengthen our partnership with the Canadian and Quebecois governments through investments in our Bromont facility, solidifying the future of the chip supply chain in North America by advancing the assembly, testing, and packaging capabilities at this vital IBM plant.

Additionally in the semiconductor sphere, the U.S. Department of Commerce announced that the new NSTC EUV Accelerator will be based at the Albany NanoTech Complex, where the biggest breakthroughs in IBM semiconductor research have taken place, including Nanosheet technology, and inventing 2 nm nodes — the world’s smallest process technology for semiconductors currently being manufactured. And in December, this agreement secured an $825 million federal investment under the bipartisan CHIPS and Science Act.

IBM Research Albany Exterior.jpg
The Albany NanoTech Complex will house the new NSTC EUV Accelerator.

We’ve worked to expand quantum computing’s reach across the globe as well. We opened the first IBM Quantum Data Center in Europe in October, with German Chancellor Olaf Scholz in attendance. We also partnered with RIKEN, the Japanese national research laboratory, to install an IBM Quantum System Two at the same location as the Fugaku supercomputer in Kobe. And we’re bringing IBM systems to the republic of Korea, with an IBM Quantum System Two that will be installed alongside an IBM AIU cluster at Korea Quantum Computing, plus we installed the first IBM Quantum System One in the country at Yonsei University in November. We’re also working with President Emmanuel Macron and the French government to invest in quantum technologies in the county.

In each case, the goal is to expand the quantum community in each region, while fostering a new generation of quantum developers who will grow up in a world where they can call upon quantum computing like any other computing asset to solve pressing issues.

In AI, we’re also working with nations to ensure their cultures and languages are not left behind in an industry dominated by the English language. We’re working with the Spanish government to build Spanish-language LLMs for the more than 600 million Spanish speakers across the world. We’ve collaborated with the Saudi Data and Artificial Intelligence Authority (SDAIA) to build and open-source ALLaM, a capable Arabic-language LLM. We’ve also worked with the Kenyan government to use IBM and NASA’s geospatial foundation model to carefully monitor their reforestation program efforts.

Eighty years of inventing what’s next in computing

When Thomas Watson incorporated disparate businesses into IBM in 1924, he saw how automation could reshape the world. Even in those early days of tabulating machines, scales, and punch clocks, the company’s goal was to make products that made work easier. And Watson knew that to continue succeeding as a company, IBM would need to continuously uncover new ideas and design new tools to meet the changing needs of industry.

Watson saw the value in investing in research from the outset. IBM partnered with Columbia University in 1928 to devise new ways to process information. And under the leadership of Columbia professor (and Apollo lunar mission calculator) Wallace Eckert, they founded what would become the predecessor to IBM Research.

The Watson Scientific Computing Laboratory opened on Columbia’s campus in 1945. Unlike other labs with corporate funding, the goal was to push the boundaries of modern science — rather than incrementally improve existing products. It was the first corporate research facility in the US focused solely on science. Over time, the lab grew, and IBM set up a dedicated research division in 1956. It soon became clear that they needed a larger space to explore the burgeoning world of computing. That led to the IBM Research headquarters in Yorktown Heights, New York, which we still occupy today.

Four women and two men at work in the Watson Scientific Computing Laboratory, which was housed on the campus of New York's Columbia University between 1949 and 1959
The early days of the Watson Scientific Computing Laboratory at Columbia University.

This year marks 80 years since those early beginnings of IBM Research. In that time, thousands have come through IBM’s labs to explore fundamental scientific research, which has led to some of the biggest breakthroughs in computing history. Over the years, we have invented the floppy disk, the hard disk drive, the magnetic stripe card, the scanning tunneling microscope, DRAM, relational databases, RISC, Fortran, and the ATM, to name just a few major breakthroughs. We’ve published more than 110,000 research papers, and our researchers have won six Nobel Prizes, ten Medals of Technology, five National Medals of Science, and six Turing Awards.

These achievements are worth celebrating as we look towards the next century of progress at IBM, which we will do over the course of 2025. But as we look back over the last 80 years, it’s difficult not to consider the impact we have made as the research division of a major technology company. With a focus on fundamental research, which we can then apply to real products that solve real problems faced by businesses and governments, we can achieve things that academic institutions and research organizations struggle to do on their own.

From the outset, all of the work that we’ve carried out throughout our history has been in service of creating the next great computing products for IBM and its clients. This year was no exception. But we believe we’re on the precipice of a computing revolution unlike any we’ve seen since the dawn of the industry.

The world is changing rapidly, and challenges are getting increasingly complex. That’s why we’ve invested so much in new paradigms like quantum and neuromorphic computing. The traditional way we have used computers to solve problems won’t cut it anymore. What we see coming is a convergence of the bits of classical computing, the neurons of AI systems, and the qubits. Each of these computing paradigms is groundbreaking on its own. But when they complement each other, the results will be even more transformative. It’s the concept of accelerated discovery that IBM Research has pursued in recent years.

It’s why we built the new Think Lab at our headquarters in Yorktown Heights, New York. An IBM Quantum System Two sits next to a cluster of AIU chips, and when the next version of Z is released, one will be installed in the space as well. We’re starting at home when we think about how we build big things that we believe will shape the future of computing. In our Think Lab, we’ve built a space where researchers can interact with other professionals they may never have encountered before to build things that may have been impossible before. We use our own tools to make our future products better. Our own Granite models are being used for internal agents for HR, IT, and myriad other applications. Our agentic frameworks are being used to build code-building agents for Qiskit. Our AIU cluster was used to help train our Granite models.

In this way of working, we’ve started to see how these combinations can lead to new discoveries even quicker. Much as Thomas Watson did 80 years ago, we are working on the tools that will power the future. This is the challenge before us, as we look ahead to the next 80 years of innovation.