IBM at NeurIPS 2023

  • New Orleans, LA, USA
This event has ended.

About

Neural Information Processing Systems (NeurIPS) is a leading machine learning and computational neuroscience conference. IBM Research is excited to return to NeurIPS this year as a Platinum sponsor. We invite all attendees to visit us during the event at booth number 1209, from Monday, Dec 11 through Thursday, Dec 14.

We look forward to meeting you and telling you more about our latest work and career opportunities at IBM Research. At our booth we’ll be demoing projects on a broad range of AI topics such as foundation models, trustworthy AI, natural language processing and understanding, knowledge and reasoning, AI automation, human-centered AI, and federated learning.

Presentation times of conference workshops, demos, papers, and tutorials are listed in the agenda section at the bottom of this page. Note: all times are displayed in your local time.

IBM Booth Demo & Staff Schedule

Keep up with emerging research and scientific developments from IBM Research. Subscribe to the Future Forward Newsletter.

Read our accepted papers at NeurIPS 2023

Career opportunities

Visit us at the IBM booth to meet IBM researchers and recruiters and to discuss future job opportunities or 2024 summer internships.

Featured positions you can learn more about at NeurIPS:

Full Time Positions:

2024 Internships:



Sign up to be notified of future openings by joining our Talent Network.

Explore all IBM Research openings

Agenda

  • Today, enterprises of all sizes operate in highly competitive markets. To deliver on business expectations, IT environments continuously become more flexible and dynamic. Contemporary microservices architecture has simplified the scope of software developers' work, but the roles of IT Operations and Site Reliability Engineers (SREs) have become even more complex. Today's IT environments can generate millions of transactions a day and can change every few seconds. The sheer scale and dynamic nature of these distributed hybrid environments is difficult to fully comprehend, and the gap between IT complexity and the human ability to manage it is widening. This complexity threatens resiliency and reliability. One solution, already adopted by many organizations, is AIOps: employing artificial intelligence to assist IT Operations and SREs.

    In some cases, SREs analyze incoming events or symptoms before deciding on investigative actions. Operations teams or SREs perform problem determination, diagnosis, and resolution based on the symptoms' information. In interviews we conducted, SREs identified diagnosis as the most difficult task; being able to troubleshoot a problem and arrive at a diagnosis is often considered an innate skill [1]. A great deal of effort has been spent on developing methodologies for specifying and reasoning about symptoms and signals obtained by monitoring systems, be they hardware or software. The PyRCA and Merlion libraries, for example, implement methods from recent research in metric-based anomaly detection and root cause analysis, and they can be quite helpful for researchers seeking to try these published algorithms. We, however, have developed novel methods that in our experiments proved more powerful in each of these areas: our probable cause identification is based on causal learning, and for anomaly detection we use a combination of unsupervised methods. We present a demo of the methods we developed, followed by a detailed description and evaluation results.

    Fault propagation depends on the causal relations in the application, i.e., the code written by its developers. Learning these relations requires both static and dynamic analysis of the code; however, observability tools in the cloud do not have access to the code, and even when the code is available, such analysis is difficult due to the large heterogeneity of programming languages, runtimes, and third-party services. We isolate the probable cause by identifying and modeling causal dependencies between components of hybrid applications, including compute environment architectures, leveraging the request-response paths that are available at runtime. Extracting all the unique paths and their latencies from the collected data, we identify the uniquely anomalous path or pinpoint missing monitoring data. When monitoring data is missing, passive data collection is not sufficient for diagnosis, and we recommend launching probes on demand. We formulate this problem as a partially observable Markov decision process that aims to select the minimum set of probes for determining the probable cause, and we solve it with reinforcement learning (PPO). To our understanding, the combination of these three elements is novel (patent pending).

    Our anomaly detection will be demonstrated in comparison to the methods in the Merlion library. Given that our target users in IT Operations and Observability are not data scientists, and that reliable labeled data is extremely limited, we strongly favor unsupervised methods over supervised or semi-supervised ones. Using the publicly available SMD dataset, we will show that the combination of methods we use performs as well as, and in some cases outperforms, the semi-supervised methods in the library; a minimal sketch of such an unsupervised ensemble appears below.
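
    The demo's exact detectors are unpublished, but the flavor of an unsupervised ensemble can be illustrated by combining a few standard detectors with a majority vote; the detector choices and thresholds here are assumptions for illustration only:

        import numpy as np
        from sklearn.ensemble import IsolationForest

        def zscore_flags(x, thresh=3.0):
            # Flag points more than `thresh` standard deviations from the mean.
            z = np.abs((x - x.mean()) / (x.std() + 1e-9))
            return z > thresh

        def iqr_flags(x, k=1.5):
            # Classic interquartile-range rule.
            q1, q3 = np.percentile(x, [25, 75])
            return (x < q1 - k * (q3 - q1)) | (x > q3 + k * (q3 - q1))

        def iforest_flags(x, contamination=0.01):
            # Isolation Forest treats rare, easily isolated points as anomalies.
            clf = IsolationForest(contamination=contamination, random_state=0)
            return clf.fit_predict(x.reshape(-1, 1)) == -1

        def ensemble_flags(x):
            # Majority vote across the three unsupervised detectors.
            votes = (zscore_flags(x).astype(int) + iqr_flags(x).astype(int)
                     + iforest_flags(x).astype(int))
            return votes >= 2

        latencies = np.random.lognormal(sigma=0.3, size=1000)
        latencies[::200] *= 10  # inject a few anomalous spikes
        print(np.where(ensemble_flags(latencies))[0])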

    Reference to be provided on demand

    Presenter(s): Saurabh Jha

  • Traditional data integration techniques often require complex coding and a deep understanding of data architectures, which can be daunting for non-specialists. In the evolving landscape of AI, there's a growing need for tools that democratize data access and analysis. We present FlowPilot, a novel system that departs from the current one-shot text-to-SQL paradigms that often fail to answer complex queries.

    A key innovation in our work is the automated generation of the training/fine-tuning dataset by leveraging a dynamic set of inputs, including metadata from enterprise catalogs, database schemas, query logs, etc. The generated dataset is then used to fine-tune an LLM tailored to the customer, one that understands the context of enterprise data by embedding the relevant schemas, relationships, and patterns into its core knowledge.

    FlowPilot mitigates errors during both the training and inference phases by estimating the uncertainty of each query's validity and its alignment with the user's intent, and by allowing the model to execute and refine statements in a sandbox environment (a minimal sketch of this loop follows).
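
    The execute-and-refine loop can be sketched in a few lines. This is an assumption-laden illustration, not FlowPilot's implementation: `generate_sql` stands in for the fine-tuned text-to-SQL model, and an in-memory SQLite database stands in for the sandbox:

        import sqlite3

        def refine_sql(question, generate_sql, max_attempts=3):
            # In practice the sandbox would first be loaded with the target schema.
            sandbox = sqlite3.connect(":memory:")
            feedback = None
            for _ in range(max_attempts):
                sql = generate_sql(question, feedback=feedback)
                try:
                    # Validate the statement without side effects.
                    sandbox.execute(f"EXPLAIN QUERY PLAN {sql}")
                    return sql
                except sqlite3.Error as err:
                    # Feed the error back so the model can repair the statement.
                    feedback = str(err)
            raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {feedback}")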

    A coordinator seamlessly integrates fine-tuned text-to-SQL, text-to-Python, and text-to-chart models, delivering thorough answers to a spectrum of data-related questions.

    FlowPilot's user-friendly interface comprises three synchronized, AI-powered interactive views: chat, flow, and data. This arrangement provides users with the flexibility to select their preferred mode of interaction with the system throughout their conversation with the databases.

    FlowPilot offers an advanced approach to data integration, utilizing generative AI and a streamlined data pre-processing method. It introduces a novel conversational text-to-SQL feature, aiming to make data access simpler and provide reliable responses, thereby enhancing user interactions with enterprise databases.

    Presenter(s): Enrico Toniato

  • The emergence of foundation models has significantly lowered the barriers to applying AI to everyday problems, transforming the way organizations consume, customize, and build AI-enabled applications. We are also seeing the emergence of a new persona, the AI Builder, who needs dedicated tooling to harness the power of LLMs while mitigating their associated risks.

    In this demonstration, we present the Big AI Models (BAM) Laboratory, an experimental platform designed to empower AI builders in the Generative AI space. Initially created over a year ago to address the unique challenge of hosting LLMs with 100B+ parameters, the BAM Laboratory has evolved to allow experimentation for thousands of internal AI builders and researchers throughout the AI application development lifecycle.

    Some of its key current areas of incubation include improving the model selection experience by recommending the right prompt for a given use case, driving better alignment of models through tuning on human feedback, and creating AI guardrails to safeguard applications from LLM-related risks (such as hate/profanity/abuse, hallucination, and social bias).

    Presenter(s): Maya Murad

  • AI for IT Operations (AIOps) is a powerful platform for Site Reliability Engineers to automate and streamline operational workflows. Automated log analysis, a critical task in AIOps, provides key insights for identifying and addressing faults. Logs can capture a variety of information about an application, giving a deeper view of potential issues and helping to diagnose an ongoing problem. Tasks like format detection, classification, parsing, anomaly detection, and summarization are the key components of automated log analysis. These tasks ordinarily require supervised learning with massive labeled data, which poses multiple challenges given how limited labeled log data is and how diverse logs are. Large language models (LLMs) like BERT and GPT-3 are trained using self-supervision on unlabeled data, and they provide generalized representations that can be used effectively for various downstream tasks with limited labeled data. This demo will showcase an LLM for log data: BERTOps, a model for AIOps that uses the IBM Slate model as its base. Our experiments demonstrate that BERTOps, when fine-tuned with a limited amount of labeled data (a few-shot setting) tailored to each specific AIOps downstream task, surpasses the performance of state-of-the-art transformer models, underscoring its value as a cost-effective augmentation to the AIOps platform (a minimal fine-tuning sketch follows). We will also show an interactive user interface that provides a summarized view of the log data and the detected anomalous log windows to help diagnose a fault. The demo uses a framework incorporating the various fine-tuned BERTOps models, and we will demonstrate why this framework is useful when domain experts are required for log diagnosis in a complex industrial application setting, significantly reducing manual effort and visual overload. The demo will highlight specific use cases and applications of the framework in IBM Software Support, IBM Automation, and IBM Consulting.
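
    Since BERTOps itself is not publicly released, the few-shot fine-tuning recipe can only be illustrated with a generic encoder checkpoint standing in for it; the toy labels and hyper-parameters below are assumptions:

        import torch
        from transformers import (AutoModelForSequenceClassification,
                                  AutoTokenizer, Trainer, TrainingArguments)

        logs = ["Connection refused by host db-01",
                "User login succeeded for admin",
                "OutOfMemoryError in worker pool"]
        labels = [1, 0, 1]  # 1 = anomalous, 0 = normal (toy few-shot set)

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2)
        enc = tok(logs, truncation=True, padding=True, return_tensors="pt")

        class LogDataset(torch.utils.data.Dataset):
            def __len__(self):
                return len(labels)
            def __getitem__(self, i):
                return {**{k: v[i] for k, v in enc.items()},
                        "labels": torch.tensor(labels[i])}

        Trainer(model=model,
                args=TrainingArguments(output_dir="log-fewshot",
                                       num_train_epochs=5,
                                       per_device_train_batch_size=2),
                train_dataset=LogDataset()).train()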

    Presenter(s): Ruchi Mahindru

  • While foundation models (FMs) have greatly transformed AI solutions for language and vision, they often fall short on sensor and numerical time-series data, which is widely used across industries. At IBM Research, a dedicated team focuses exclusively on advancing time-series foundation models and has made significant contributions, with influential papers at top AI conferences. Our team has pioneered this space, defining the inaugural architectures for several popular time-series FM backbones: the first transformer for multivariate time-series representation learning (TST, KDD '21), the first patched time-series transformer (PatchTST, ICLR '23), the first patched MLP-Mixer for time series (TSMixer, KDD '23), and the first multimodal transfer learning for new-product time-series forecasting (NPF, KDD '20). Our line of work not only attempts to improve state-of-the-art (SOTA) accuracy but also focuses on achieving it with extremely reduced memory and compute requirements. Our latest models (PatchTST and TSMixer) are the leading SOTA in this space, with a significant (2-3x) reduction in compute and memory requirements. For effective mindshare and open collaboration, we have released our SOTA models through various open-source channels (500+ stars, 100+ forks, and several blogs written by popular LinkedIn/Medium influencers). Within a few months of being open-sourced, models like PatchTST were incorporated into almost all the well-known time-series libraries, such as GluonTS, NeuralForecast, and timeseriesAI (tsai). PatchTST and TSMixer are currently being integrated into the Hugging Face Transformers repository and will be available at the time of the demonstration. In this session, we would like to demo our SOTA models for the larger scientific community and showcase interesting applications in diverse industrial settings spanning electricity, weather, traffic, retail, and more. Through illustrative notebooks and demos, we plan to discuss best practices and the impact of the modeling approaches, design choices, and hyper-parameters that affect performance across datasets and use cases from different industries (the core patching idea is sketched below). We will also provide insights on the pretraining and fine-tuning workflow templates that we have standardized for various industrial settings to help users get started quickly. This demo session will be hands-on using our open-source libraries, and we will release the demo notebooks and associated artifacts for wider use.
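
    As a minimal sketch of the patching idea behind PatchTST, a long multivariate series is split into fixed-length patches that become the transformer's input tokens; the patch length and stride here are illustrative choices, not the published hyper-parameters:

        import torch

        def patchify(series, patch_len=16, stride=8):
            # series: (batch, n_vars, seq_len) -> (batch, n_vars, n_patches, patch_len)
            return series.unfold(dimension=-1, size=patch_len, step=stride)

        x = torch.randn(32, 7, 512)  # e.g., 7 weather variables, 512 time steps
        print(patchify(x).shape)     # torch.Size([32, 7, 63, 16])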

    Presenter(s): Nam Nguyen

  • The fast-increasing complexity of modern IT in multicloud environments brings unprecedented management challenges for Site Reliability Engineers (SREs) trying to meet Service Level Objectives (SLOs) and keep systems up and running effectively. To put this in perspective, an availability SLO of 99.99% allows for 4.3 minutes of downtime per month, hardly something that can be attained by simply reacting to incidents. In this demo, we introduce our approach to this challenge: transforming ITOps from reactive to proactive by leveraging large language models and advanced AI capabilities. The main goal of our work is to automate, as much as possible, the implementation of resolutions for upcoming IT issues before they turn into outages. Our demo consists of three steps (a skeleton of the pipeline is sketched below): (1) Issue Diagnosis, where we have developed a language-model-based log data representation and built an AI system for probable cause identification using novel causal analysis and reinforcement learning, complemented with LLM-based summarization techniques that ease consumption of diagnosis results by SREs and by downstream issue-resolution analytics; (2) Action Recommendation, which leverages state-of-the-art generative AI techniques to produce actionable recommendations; (3) Automation, where action recommendation outputs are transformed into code that can be executed to resolve the incidents.
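
    The three steps compose into a simple control flow. The names below are hypothetical placeholders for the demo's components, shown only to make the hand-offs concrete:

        from dataclasses import dataclass

        @dataclass
        class Diagnosis:
            probable_cause: str
            summary: str  # LLM-generated, for SRE consumption

        def diagnose(logs):
            # Step 1: causal analysis + RL over log/metric signals,
            # summarized by an LLM (stubbed here).
            ...

        def recommend_action(diagnosis):
            # Step 2: a generative model proposes a remediation.
            ...

        def to_automation(action):
            # Step 3: render the action as executable remediation code.
            ...

        def proactive_pipeline(logs):
            return to_automation(recommend_action(diagnose(logs)))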

    Presenter(s): Yu Deng

  • Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are actively exploring their potential to automate code translation, i.e., to generate code in a target PL from its equivalent in another PL. The prerequisite for advancing the state of LLM-based code translation is to understand its limitations. To that end, we present a large-scale empirical study investigating the ability of LLMs, including general LLMs and code LLMs, to translate code between pairs of different languages: C, C++, Go, Java, and Python. Our analysis involves the translation of 1,700 code samples from three distinct benchmarks and real-world projects, and it reveals that LLMs cannot yet be relied on to automate code translation, with incorrect translations ranging from 52.7% to 97.9% across the studied LLMs. Further manual investigation of unsuccessful translations across all PLs identifies 14 root causes of translation bugs. Based on the insights from the empirical study, we propose a prompt-crafting approach that provides additional context to LLMs (sketched below), improving the performance of LLM-based code translation by 5.5% on average across different PLs, LLMs, and benchmarks. Our study is the first of its kind, in terms of scale and breadth, to provide insights into the current limitations of LLMs in code translation and opportunities for improving them. Our extensive dataset, consisting of 1,700 code samples written in five PLs with 10K+ tests, 43K+ translated code samples, 1,725 manually labeled bugs, and 1,365 bug-fix pairs generated using LLMs, can help drive research in this area.
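
    The gist of prompt crafting is to give the model more than the bare source: the target language, a failing test, or the compiler error from a previous attempt. The template wording below is illustrative, not the exact prompts used in the study:

        def build_translation_prompt(src_code, src_lang, tgt_lang,
                                     error=None, failing_test=None):
            parts = [
                f"Translate the following {src_lang} code to idiomatic {tgt_lang}.",
                "Preserve the original behavior exactly.",
                f"{src_lang} source:\n{src_code}",
            ]
            if error:
                # Context from a prior failed attempt guides the repair.
                parts.append(f"The previous translation failed with:\n{error}")
            if failing_test:
                parts.append(f"The output must pass this test:\n{failing_test}")
            return "\n\n".join(parts)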

    Presenter(s): Rahul Krishna

  • Within enterprises, there is a growing need to intelligently navigate data lakes. Of particular importance is the ability to find related tables in data repositories; such tables can be unionable, joinable, or subsets of each other. Example applications of this type of discovery include privacy enforcement and analytical queries that span multiple tables. There are now a number of pretrained models targeting the processing of tabular data, but none that target the data discovery use case in particular, and there is a dearth of benchmark tasks for teaching neural tabular models data discovery. To help with neural tabular learning of data discovery, we developed LakeBench, a benchmark suite covering a diverse set of data discovery tasks built on government data from CKAN, Socrata, and the European Central Bank. Inspired by what has been shown to work well for data discovery tasks, we also used a novel approach based on data sketches to create TabSketchFM, a neural model for data discovery (the underlying sketch primitive is illustrated below). We contrast the sketch-based approach of TabSketchFM with the row-based approaches of other models and show that for data discovery tasks, sketch-based approaches are more effective, and we use ablation studies to examine which types of data sketches help which tasks. Finally, we perform initial experiments leveraging models such as TabSketchFM in search, showing that they can re-rank and even improve the top-k search results of existing non-neural systems.
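
    As a hedged illustration of the data-sketch primitive (TabSketchFM's actual sketches and model inputs are richer), a MinHash signature lets two columns be compared for overlap, a proxy for joinability, without scanning full tables:

        import hashlib

        def minhash(values, num_perm=64):
            # Keep, per hash function, the minimum hash over the column's values.
            sig = [float("inf")] * num_perm
            for v in values:
                for i in range(num_perm):
                    h = int.from_bytes(hashlib.blake2b(
                        f"{i}:{v}".encode(), digest_size=8).digest(), "big")
                    sig[i] = min(sig[i], h)
            return sig

        def jaccard_estimate(sig_a, sig_b):
            # The fraction of matching minima estimates Jaccard similarity.
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        col_a = minhash(range(0, 1000))
        col_b = minhash(range(500, 1500))  # true Jaccard overlap = 1/3
        print(round(jaccard_estimate(col_a, col_b), 2))  # approx. 0.33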

    Presenter(s): Kavitha Srinivas & Julian Dolby
