Reinforcement finetuning uses reward signals to guide large language models toward desirable behavior. The method sharpens a model's ability to produce logical, structured outputs by reinforcing correct responses. However, a challenge remains: ensuring that these models also know when not to answer, particularly when faced with incomplete or misleading questions that have no definitive answer.
The problem arises when language models, after reinforcement finetuning, begin to lose their ability to refuse to answer unclear or ambiguous queries. Instead of signaling uncertainty, the models tend to produce confidently stated but incorrect answers. The paper identifies this phenomenon as the "hallucination tax," and it highlights a growing risk: as models are trained to perform better, they can also become more likely to hallucinate answers in situations where declining to respond would be more appropriate. This is especially dangerous in domains that demand high trust and precision.
The tools currently used to train large language models often neglect the importance of refusal behavior. Reinforcement finetuning frameworks tend to reward only correct answers and penalize incorrect ones, ignoring cases where the valid response is no answer at all. Because the reward systems in use do not reinforce refusal, they produce overconfident models. For example, the paper shows that refusal rates dropped to near zero across several models after standard RFT, demonstrating that current training fails to address hallucination properly.
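To make the gap concrete, here is a minimal sketch of a refusal-aware reward of the kind this line of work argues for. The function name, the keyword-based refusal check, and the exact scoring are illustrative assumptions, not the authors' implementation:

```python
def reward(prediction: str, gold_answer: str | None) -> float:
    """Assign a scalar reward to one RFT rollout.

    gold_answer is None for unanswerable problems, where the
    desired behavior is an explicit refusal. (Illustrative sketch;
    the paper's actual reward formulation may differ.)
    """
    refused = "i don't know" in prediction.lower()
    if gold_answer is None:
        # Unanswerable: reward refusal, give nothing for a confident guess.
        return 1.0 if refused else 0.0
    # Answerable: reward only the correct answer, so the model
    # is not pushed toward refusing everything.
    if not refused and prediction.strip() == gold_answer.strip():
        return 1.0
    return 0.0
```

A standard RFT reward, by contrast, would only implement the second branch, which is exactly why refusal behavior collapses under such training.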
Researchers from the University of Southern California developed the Synthetic Unanswerable Math (SUM) dataset. SUM introduces implicitly unanswerable math problems by modifying existing questions according to criteria such as removing key information or creating logical inconsistencies. The researchers used DeepScaleR as the base dataset and the o3-mini model to generate high-quality unanswerable questions. The synthetic dataset aims to teach models to recognize when a problem lacks sufficient information and to respond accordingly.
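The generation step can be pictured with a short sketch using the OpenAI Python client. The rewrite prompt below is hypothetical and is only meant to illustrate the two modification criteria described above; the paper's actual prompt and filtering may differ:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical instruction illustrating the paper's criteria
# (removing key information, introducing inconsistencies).
REWRITE_PROMPT = (
    "Rewrite the following math problem so that it becomes unanswerable, "
    "either by deleting a piece of information needed for the solution or "
    "by introducing a subtle logical inconsistency. Keep the wording "
    "natural and plausible.\n\nProblem: {problem}"
)

def make_unanswerable(problem: str) -> str:
    """Ask o3-mini to turn an answerable problem into an unanswerable variant."""
    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": REWRITE_PROMPT.format(problem=problem)}],
    )
    return response.choices[0].message.content
```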
The core SUM technique is to mix answerable and unanswerable problems during training. Questions are modified to become ambiguous or unsolvable while remaining plausible, and the training prompt instructs models to say "I don't know" for unanswerable inputs. By introducing only 10% SUM data into reinforcement finetuning, the models begin to leverage inference-time reasoning to assess uncertainty. This structure allows them to refuse answers more appropriately without degrading their performance on solvable problems.
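The 10% mixing step itself is straightforward. The following sketch shows one plausible way to build such a mixture; the function and its ratio parameter are illustrative, not the authors' pipeline:

```python
import random

def build_training_mix(answerable, unanswerable, sum_ratio=0.1, seed=0):
    """Combine answerable problems with SUM-style unanswerable ones.

    The paper reports that roughly 10% unanswerable data suffices to
    restore refusal behavior; here the ratio is simply a parameter.
    Assumes `unanswerable` contains enough items to sample from.
    """
    rng = random.Random(seed)
    # Choose n_unanswerable so it makes up sum_ratio of the final mix.
    n_unanswerable = int(len(answerable) * sum_ratio / (1 - sum_ratio))
    mix = answerable + rng.sample(unanswerable, n_unanswerable)
    rng.shuffle(mix)
    return mix
```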
Performance analysis shows significant improvements. After training with SUM, Qwen2.5-7B increased its refusal rate from 0.01 to 0.73 on the SUM benchmark and from 0.01 to 0.81 on the UMWP benchmark. On the SelfAware dataset, refusal accuracy rose from 0.01 to 0.94. LLaMA-3.1-8B-Instruct showed a similar trend, with refusal rates rising from 0.00 to 0.75 on SUM and from 0.01 to 0.79 on UMWP. Despite these gains in refusal behavior, accuracy on answerable benchmarks such as GSM8K and MATH-500 remained stable, with most changes ranging from 0.00 to -0.05. This minimal decline indicates that refusal training can be introduced without a major sacrifice in task performance.
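For readers who want to compute comparable numbers on their own outputs, a refusal rate is a trivial metric, as in the sketch below. The keyword check is an assumption standing in for whatever refusal detector the authors actually used:

```python
def refusal_rate(outputs: list[str]) -> float:
    """Fraction of model outputs on unanswerable problems that refuse.

    A simple keyword check stands in for the paper's refusal detector,
    which is not specified here.
    """
    if not outputs:
        return 0.0
    refusals = sum("i don't know" in o.lower() for o in outputs)
    return refusals / len(outputs)
```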
This study describes a clear trade-off between improved reasoning and trustworthiness. Reinforcement finetuning, while powerful, tends to suppress cautious behavior. The SUM dataset corrects this by teaching models to recognize what they cannot solve. With only a small addition to the training data, language models become better at identifying the limits of their own knowledge. This approach marks an important step toward making AI systems not only smarter but also more careful and honest.
Check out the Paper and the dataset on Hugging Face. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.
