IBM’s Uncertainty Quantification 360 toolkit boosts trust in AI
Building on the socially responsible tradition of other open source efforts released by IBM Research in the field of trustworthy AI, UQ360 is the first comprehensive toolkit of its kind.
Would you feel safe in a self-driving car that confidently misidentifies the side of a tractor-trailer as a brightly lit sky and refuses to brake or warn the human driver? Probably not.
Unfortunately, such mishaps have indeed taken lives [1]. AI systems based on deep learning have a reputation for making overconfident predictions even when they are wrong, sometimes with serious consequences.
This is where uncertainty quantification (UQ) comes in: it is the technology that lets an AI system express that it is unsure, giving it intellectual humility and making it safer to deploy. And this is what our Uncertainty Quantification 360 (UQ360) open-source toolkit is all about.
Released at the 2021 IBM Data & AI Digital Developer Conference, it’s aimed at giving data science practitioners and developers state-of-the-art algorithms to streamline the process of quantifying, evaluating, improving, and communicating uncertainty of machine learning models.
Building on the socially responsible tradition of other open source efforts released by IBM Research in the field of trustworthy AI (AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, and AI FactSheets 360), UQ360 is the first comprehensive toolkit of its kind.
We invite you to use it and contribute to it.
It’s not just about self-driving cars. There are many other applications where it is safety-critical for AI to express uncertainty. For example, a chatbot that is unsure when a pharmacy closes but gives a wrong answer anyway may leave a patient without the medication they need.
Take sepsis. Many people die each year from complications of this disease. Early detection of sepsis is important, and AI can help, but only when its predictions are accompanied by meaningful uncertainty estimates. Only then can doctors immediately treat the patients the AI has confidently flagged as at risk and order additional diagnostics for those about whom it has expressed low certainty. If the model produces unreliable uncertainty estimates, patients may die.
Common explainability techniques shed light on how AI works, but UQ exposes limits and potential failure points. Users of a house price prediction model would like to know the margin of error of the model predictions to estimate their gains or losses. Similarly, a product manager may notice that an AI model predicts a new feature A will perform better than a new feature B on average, but to see its worst-case effects on KPIs, the manager would also need to know the margin of error in the predictions.
High-quality uncertainty estimates and effective uncertainty communication can also improve human-AI collaboration.
Consider the following scenario: a nurse practitioner uses an AI system to help diagnose skin disease. If the AI's confidence (the uncertainty estimate for a given diagnosis) is high, the nurse practitioner accepts the AI decision; otherwise, the AI recommendation is discarded, and the patient is referred to a dermatologist. Uncertainties serve as a form of communication between the AI system and the human user to achieve the best accuracy, robustness, and fairness.
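The accept-or-refer rule in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name triage, the 0.9 threshold, and the example probabilities are all hypothetical, the model's top-class probability stands in for its confidence, and nothing here is part of the UQ360 API.

```python
import numpy as np

def triage(probabilities, threshold=0.9):
    """Accept the model's top class when its confidence clears the threshold.

    probabilities: array of shape (n_cases, n_classes), one row per patient.
    threshold: hypothetical cut-off; a real one would be set with clinicians.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    confidence = probabilities.max(axis=1)      # top-class probability per case
    predicted = probabilities.argmax(axis=1)    # index of the top class
    accept = confidence >= threshold            # True: keep the AI decision
    return predicted, confidence, accept

# Made-up class probabilities for three cases.
probs = [[0.97, 0.02, 0.01],
         [0.40, 0.35, 0.25],
         [0.05, 0.92, 0.03]]

predicted, confidence, accept = triage(probs, threshold=0.9)
for i, (p, c, a) in enumerate(zip(predicted, confidence, accept)):
    action = "accept AI diagnosis" if a else "refer to dermatologist"
    print(f"case {i}: class {p}, confidence {c:.2f} -> {action}")
```

A rule like this is only as good as the confidence values it receives, which is why the quality of the uncertainty estimates matters as much as the threshold.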
And then there is drug and material design.
UQ could mean the difference between discovering a new material in months rather than years. By conducting experiments where the uncertainty is greatest, we have reduced the number of experiments in the material discovery process.
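One common way to "conduct experiments where the uncertainty is greatest" is uncertainty sampling. The sketch below assumes, purely for illustration, that uncertainty is measured as the disagreement (standard deviation) among an ensemble of models; the function name pick_next_experiment and the numbers are hypothetical, and this is not presented as the procedure used in the work described above.

```python
import numpy as np

def pick_next_experiment(ensemble_predictions):
    """Return the index of the candidate on which ensemble members disagree most.

    ensemble_predictions: array of shape (n_models, n_candidates), where each
    row holds one model's predicted property value for every candidate.
    """
    preds = np.asarray(ensemble_predictions, dtype=float)
    uncertainty = preds.std(axis=0)             # spread across models, per candidate
    return int(uncertainty.argmax()), uncertainty

# Made-up predictions from three models for three candidate materials.
ensemble_predictions = [
    [1.0, 2.0, 3.0],
    [1.1, 2.9, 3.0],
    [0.9, 1.1, 3.1],
]

next_index, uncertainty = pick_next_experiment(ensemble_predictions)
print("run the experiment on candidate", next_index)
```

After the chosen experiment is run, its result would be added to the training data and the loop repeated, so each experiment is spent where the models know the least.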
AI is also a promising tool for pre-screening drug candidates prior to laboratory and real-world testing. The drug candidates that receive a high confidence score from the AI system advance to the next stage. As a result, UQ reduces the time, money, and effort expended for AI-assisted drug discovery without missing potentially life-saving candidates.
The choice of a UQ method depends on many factors: the underlying model, type of machine learning task (regression vs. classification), characteristics of the data, and the user’s goal. Sometimes a chosen UQ method may not produce high-quality uncertainty estimates and could mislead users. Therefore, it is crucial for model developers to always evaluate the quality of UQ, and improve the quantification quality if necessary, before deploying an AI system.
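As a rough illustration of what evaluating the quality of UQ can mean for a regression model, the sketch below computes two widely used interval metrics with plain NumPy: the fraction of true values that fall inside the predicted intervals (often called PICP) and the average interval width (often called MPIW). The function name interval_quality and the data are hypothetical, and the code does not use the UQ360 package.

```python
import numpy as np

def interval_quality(y_true, lower, upper):
    """Return (coverage, mean_width) for a set of prediction intervals.

    coverage: fraction of true values inside [lower, upper].
    mean_width: average interval width; narrower is better at equal coverage.
    """
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)
    return float(covered.mean()), float((upper - lower).mean())

# Made-up targets and predicted interval bounds for four test points.
y_true = [10.0, 12.0, 15.0, 20.0]
lower = [8.0, 11.0, 16.0, 17.0]
upper = [12.0, 14.0, 18.0, 22.0]

coverage, mean_width = interval_quality(y_true, lower, upper)
print(f"coverage={coverage:.2f}, mean width={mean_width:.2f}")
```

If intervals that are meant to cover 95% of outcomes cover far fewer on held-out data, the estimates are overconfident and should be improved, for example through recalibration, before deployment.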
This is where UQ360 can help. With the toolkit, you’ve got the estimation, measurement, improvement and communication of UQ all in one place.
As the first open-source toolkit of its kind, UQ360 provides a comprehensive set of algorithms to quantify uncertainty, as well as capabilities to measure and improve UQ to streamline the development process. We provide a taxonomy and guidance for choosing these capabilities based on your needs.
Also, a high-quality UQ estimate needs to be effectively communicated. UQ360 makes the communication method of UQ an integral part of development choices in an AI lifecycle. For every UQ algorithm provided in the UQ360 Python package, a developer can choose an appropriate style of communication by following our guidance on communicating UQ estimates, from concise descriptions to detailed visualizations.
And UQ360 is not just a Python package. We developed it with the hope of making it a universal platform for transparently communicating uncertainties and limitations of AI. For that, we have created an interactive experience that provides a gentle introduction to producing high-quality UQ and ways to use UQ in a house price prediction application. We've also created a number of in-depth tutorials to demonstrate how to utilize UQ across the AI lifecycle.
The toolkit has been engineered with a common interface for all the different UQ capabilities and is extensible to accelerate innovation by the community advancing trustworthy and responsible AI. Beyond the initial release, we encourage contributions of other algorithms and techniques from the broader research community.
We are open sourcing it to help create a community of practice for researchers, data scientists, and other practitioners who need to understand or communicate the limitations of algorithmic decisions.
References
[1] McAllister, R., Gal, Y., Kendall, A., et al. Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, AI and Autonomy Track, 4745-4753 (2017).