Automated System Guides Users on When to Collaborate with an AI Assistant

Artificial intelligence (AI) models are good at spotting patterns in images, and often do so better than the human eye. The challenge is knowing when a medical professional, such as a radiologist, should trust the AI's advice and when to ignore it.

Researchers at MIT and the MIT-IBM Watson AI Lab have a solution: a tailored onboarding process that guides users on when to collaborate with an AI assistant. This process is especially crucial in scenarios like a radiologist using AI to detect pneumonia in X-rays.

The system they've developed identifies instances where the radiologist might incorrectly trust the AI's advice. It then autonomously learns collaboration rules and expresses them in natural language. During onboarding, the radiologist practices working with the AI through training exercises based on these rules and receives feedback on her own performance and the AI's.

Results showed an accuracy gain of about 5 percent when humans and AI collaborated on an image prediction task following this onboarding process. By contrast, merely telling users when to trust the AI, without the training exercises, led to worse performance.

Crucially, the researchers' system is entirely automated, capable of tailoring the onboarding process based on the specific task performed by the human and AI. This adaptability means it can be applied across various scenarios where humans and AI collaborate, such as social media content moderation, writing, and programming.

Hussein Mozannar, a graduate student at the Institute for Data, Systems, and Society, emphasizes the need for such onboarding in AI tool usage. He notes that, unlike other tools that typically come with tutorials, AI tools often lack this guidance. The researchers believe their approach addresses this gap from both a methodological and behavioral standpoint.

Looking ahead, they envision onboarding becoming a vital component of training for medical professionals. Senior author David Sontag, a professor at MIT, suggests that doctors relying on AI for treatment decisions may undergo similar training. This shift could impact everything from medical education to the design of clinical trials, reflecting the evolving landscape of human-AI collaboration.

The paper detailing this training process is set to be presented at the Conference on Neural Information Processing Systems.

Training that evolves

Traditional onboarding methods for human-AI collaboration often rely on materials created by human experts for specific situations, which limits their scalability. Other methods rely on explanations, in which the AI communicates its confidence in each decision. However, according to Mozannar, research suggests that these explanations are not consistently helpful.

Given that AI models and user perceptions are continuously evolving, there's a need for a training approach that adapts over time. To address this, the researchers developed an onboarding method that autonomously learns from data. The process starts by creating a dataset with numerous instances of a task, like identifying a traffic light in a blurry image.

The system collects data on how humans and AI collaborate in performing this task and embeds the data points in a latent space where similar data points are grouped together. Using an algorithm, the system identifies regions in this space where the human collaborates incorrectly with the AI. These regions capture instances where the human trusted an AI prediction that was wrong, as well as instances where the human rejected an AI prediction that was right.
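As a rough illustration of the region-finding step, the sketch below flags neighborhoods of the latent space where trust is frequently misplaced. It is not the researchers' algorithm: the `Interaction` record, the `find_error_regions` function, the nearest-neighbor grouping, and the `k` and `min_error_rate` parameters are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Interaction:
    embedding: tuple          # position of the task instance in the latent space
    human_followed_ai: bool   # did the human accept the AI's prediction?
    ai_correct: bool          # was the AI's prediction right?

    @property
    def misplaced_trust(self) -> bool:
        # True when the human trusted a wrong AI or overrode a correct one.
        return self.human_followed_ai != self.ai_correct


def find_error_regions(data, k=3, min_error_rate=0.75):
    """Return (center_index, error_rate) for each k-nearest neighborhood
    whose share of misplaced-trust interactions meets the threshold."""
    regions = []
    for i, center in enumerate(data):
        # The k closest points (the center included) form a candidate region.
        neighbors = sorted(
            data, key=lambda d: dist(center.embedding, d.embedding)
        )[:k]
        rate = sum(n.misplaced_trust for n in neighbors) / len(neighbors)
        if rate >= min_error_rate:
            regions.append((i, rate))
    return regions
```

A real system would likely merge overlapping neighborhoods and work with learned embeddings. The sketch only shows the core idea of scoring areas of the space by how often the human's trust decision was wrong.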

Once these regions are identified, a second algorithm employs a large language model to articulate each region as a rule in natural language. The algorithm refines these rules iteratively by finding contrasting examples. For instance, a rule might evolve to "ignore AI when it is a highway during the night."
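The refinement loop can be sketched in a few lines, under stated assumptions. In this hypothetical version the large language model is replaced by a caller-supplied `propose` function, and `matches`, `find_contrasting_examples`, and `refine_rule` are names invented for illustration, not part of the researchers' system.

```python
def find_contrasting_examples(rule, matches, inside, outside):
    """Collect examples that contradict the current rule."""
    # Examples in the region that the rule fails to describe.
    missed = [x for x in inside if not matches(rule, x)]
    # Examples outside the region that the rule wrongly describes.
    wrongly_covered = [x for x in outside if matches(rule, x)]
    return missed, wrongly_covered


def refine_rule(rule, matches, propose, inside, outside, max_rounds=5):
    """Revise a rule until no contrasting examples remain or rounds run out.

    `propose` stands in for the language model: it receives the current
    rule plus the contrasting examples and returns a revised rule.
    """
    for _ in range(max_rounds):
        missed, wrongly_covered = find_contrasting_examples(
            rule, matches, inside, outside
        )
        if not missed and not wrongly_covered:
            break  # the rule now separates the region from everything else
        rule = propose(rule, missed, wrongly_covered)
    return rule
```

With a rule such as "highway" and a daytime highway image outside the region as a contrasting example, a reasonable `propose` step would narrow the rule toward "highway during the night."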

These rules then form the basis for training exercises. The onboarding system presents examples to the human, such as a blurry highway scene at night, along with the AI's prediction. The user responds by indicating whether the image contains traffic lights, with options to agree or disagree with the AI's prediction.

If the human's response is incorrect, the correct answer is revealed along with performance statistics for both the human and AI on those task instances. This process is repeated for each region, and at the end of the training, the exercises that the human answered incorrectly are revisited.
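The exercise flow described above (answer, get feedback on mistakes, then revisit the misses) might look roughly like the following. This is a minimal sketch rather than the researchers' implementation: the `Exercise` fields, the `run_onboarding` function, and the `answer` and `show_feedback` callbacks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exercise:
    region_rule: str        # natural-language rule describing the region
    ai_prediction: bool     # AI's claim that the image contains a traffic light
    truth: bool             # ground-truth label
    human_accuracy: float   # human accuracy on this region's task instances
    ai_accuracy: float      # AI accuracy on this region's task instances


def run_onboarding(exercises, answer, show_feedback):
    """Run every exercise once, give feedback on mistakes, then revisit them.

    `answer(exercise)` returns the user's final label; `show_feedback`
    receives the exercise and a message when the user was wrong.
    """
    missed = []
    for ex in exercises:
        if answer(ex) != ex.truth:
            missed.append(ex)
            show_feedback(
                ex,
                f"Correct answer: {ex.truth}. Region: {ex.region_rule}. "
                f"Human accuracy {ex.human_accuracy:.0%}, "
                f"AI accuracy {ex.ai_accuracy:.0%}.",
            )
    # At the end of training, revisit only the exercises answered incorrectly.
    still_wrong = [ex for ex in missed if answer(ex) != ex.truth]
    return missed, still_wrong
```

Passing the callbacks in keeps the loop independent of any particular interface, so the same flow could drive a web form or a scripted simulation.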

Mozannar explains, "After that, the human has learned something about these regions that we hope they will take away in the future to make more accurate predictions." This data-driven onboarding method aims to provide an adaptive and effective approach to improving collaboration between humans and AI over time.

Onboarding boosts accuracy

The researchers conducted tests on their system, involving users in two tasks: detecting traffic lights in blurry images and answering multiple-choice questions spanning various domains like biology, philosophy, and computer science.

Users were presented with a card detailing the AI model, its training process, and a breakdown of its performance across different categories. The participants were divided into five groups, each assigned a different experimental condition. Some only viewed the card, some underwent the researchers' onboarding process, some followed a baseline onboarding procedure, some went through the researchers' onboarding process combined with recommendations on when to trust the AI, and others received only the recommendations.

The researchers' onboarding process without recommendations significantly improved users' accuracy, raising their performance on the traffic light prediction task by about 5 percent without slowing them down. However, onboarding was less effective for the question-answering task. The researchers attribute this to the AI model used, ChatGPT, which already provides an explanation with each answer that signals whether the answer should be trusted.

On the flip side, providing recommendations without onboarding had adverse effects. Users not only performed worse but also took more time to make predictions. According to Mozannar, giving recommendations alone may confuse users and hinder their decision-making, as people may feel uncomfortable being told what to do.

Mozannar notes that recommendations alone could be detrimental if they are incorrect. In contrast, the primary limitation of onboarding lies in data availability. If there isn't enough data, the onboarding process may not be as effective.

Looking ahead, the researchers aim to conduct more extensive studies to assess the short- and long-term impacts of onboarding. They also plan to explore using unlabeled data for the onboarding process and develop methods to effectively reduce the number of regions without excluding important examples.

Dan Weld, a professor emeritus at the University of Washington, emphasizes the importance of helping humans discern when it's safe to rely on AI suggestions. He praises Mozannar et al.'s innovative method for identifying situations where the AI is trustworthy and communicating them effectively to enhance human-AI interactions.

This research is partially funded by the MIT-IBM Watson AI Lab.