Internal Coherence Maximization (ICM): a label-free, unsupervised training framework for LLMs

by Brenden Burgess


Post-training methods for pretrained language models (LMs) depend on human supervision through demonstrations or preference feedback to specify desired behaviors. However, this approach faces critical limits as tasks and model behaviors become more complex: human supervision grows unreliable in these scenarios, because LMs learn to imitate errors in demonstrations or to exploit flaws inherent in feedback systems. The core challenge lies in training LMs for tasks that exceed humans' ability to reliably demonstrate or evaluate. Recent research has identified various failure modes, including reward hacking of human-designed supervision signals or of the real humans providing them.

Limits of human supervision in LLM post-training

Researchers have explored several approaches to scale beyond human supervision. A standard method uses high-quality verifiable rewards, such as matching model outputs against ground-truth solutions in mathematical domains. Despite evidence that pretrained base models have strong latent capabilities for downstream tasks, with post-training adding only minimal improvements, effective elicitation remains difficult. Contrast-Consistent Search (CCS) is an unsupervised elicitation approach that uses logical consistency to find latent knowledge without supervision. However, CCS underperforms supervised approaches and often fails to identify the intended knowledge, because other prominent features also satisfy its consistency properties.
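To make the consistency idea concrete, here is a minimal sketch of the standard per-example CCS objective: a probe's probabilities for a statement and its negation should be consistent (sum to one) and confident (not both sitting at 0.5). The function and variable names are illustrative, not taken from any particular implementation.

```python
def ccs_loss(p_pos, p_neg):
    """Per-example CCS objective for a probe's probabilities on a statement
    (p_pos) and its negation (p_neg); lower is better."""
    # Consistency: the two probabilities should sum to one (p_pos ~ 1 - p_neg).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the degenerate solution p_pos = p_neg = 0.5.
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence

print(ccs_loss(0.9, 0.1))  # consistent and confident -> near zero
print(ccs_loss(0.5, 0.5))  # consistent but uninformative -> penalized
```

Because many unrelated features of the text can also satisfy these two properties, the loss does not guarantee that the recovered direction corresponds to truthfulness, which is the failure mode noted above.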

Introducing Internal Coherence Maximization (ICM)

Researchers from Anthropic, Schmidt Sciences, Independent, Constellation, New York University, and George Washington University have proposed Internal Coherence Maximization (ICM), which fine-tunes pretrained models on their own generated labels without using any provided labels. ICM does this by searching for sets of labels that are both logically consistent and mutually predictable according to the pretrained model. Since finding the optimal label set is computationally intractable, ICM uses a search algorithm inspired by simulated annealing to approximately maximize the objective. Moreover, this method matches the performance of training on golden labels on TruthfulQA and GSM8K, and surpasses training on crowdsourced human labels on Alpaca.
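A minimal sketch of what such an objective could look like follows: the score rewards label sets that the pretrained model finds mutually predictable and penalizes logical inconsistencies. The `model_logprob` callable, the `contradicts` predicate, and the `alpha` weighting are illustrative assumptions rather than the paper's exact implementation.

```python
# Assumed caller-supplied helpers (hypothetical signatures):
#   model_logprob(example, label, context) -> float  # log-prob under the pretrained model
#   contradicts(item_a, item_b) -> bool               # task-specific logical-consistency check

def mutual_predictability(labeled, model_logprob):
    """Sum of log-probs the model assigns to each label given all other labeled examples."""
    items = list(labeled.items())
    total = 0.0
    for i, (x, y) in enumerate(items):
        context = items[:i] + items[i + 1:]  # every labeled example except this one
        total += model_logprob(x, y, context)
    return total

def count_inconsistencies(labeled, contradicts):
    """Number of label pairs that violate the task's logical-consistency rules."""
    items = list(labeled.items())
    return sum(
        contradicts(a, b)
        for i, a in enumerate(items)
        for b in items[i + 1:]
    )

def icm_score(labeled, model_logprob, contradicts, alpha=50.0):
    """Higher is better: mutually predictable labels with few logical inconsistencies."""
    return (alpha * mutual_predictability(labeled, model_logprob)
            - count_inconsistencies(labeled, contradicts))
```

The key design choice this sketch tries to capture is that a label only scores well if the rest of the labeled set supports it, so the search is pushed toward globally coherent labelings rather than locally plausible ones.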

How the ICM algorithm works

The ICM algorithm follows an iterative three-step process: (a) the system samples a new unlabeled example from the dataset for potential inclusion, (b) it determines the optimal label for this example while simultaneously resolving any logical inconsistencies, and (c) it decides whether to accept the newly labeled example based on the scoring function. ICM is evaluated on three datasets: TruthfulQA for truthfulness, GSM8K verification for mathematical correctness, and Alpaca for helpfulness and harmlessness. The researchers used four baselines in their experiments: zero-shot, zero-shot (chat), golden label, and human label. In addition, the experiments used two open-weight models, Llama 3.1 8B and 70B, and two proprietary models, Claude 3 Haiku and Claude 3.5 Haiku.
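As a rough illustration of how these three steps could compose into a search loop, here is a simulated-annealing-style sketch that reuses the hypothetical `icm_score`, `model_logprob`, and `contradicts` helpers from the previous sketch. The temperature schedule is an assumption, and the explicit inconsistency-repair part of step (b) is folded into the penalty term rather than modeled directly; this is not the paper's exact procedure.

```python
import math
import random

def icm_search(unlabeled, label_space, model_logprob, contradicts,
               steps=2000, t0=10.0, cooling=0.995):
    """Simulated-annealing-style sketch of the three-step ICM loop (illustrative)."""
    labeled = {}
    temperature = t0
    for _ in range(steps):
        x = random.choice(unlabeled)                       # (a) sample an unlabeled example
        context = list(labeled.items())
        best = max(label_space, key=lambda y: model_logprob(x, y, context))
        candidate = dict(labeled)
        candidate[x] = best                                # (b) assign its most likely label
        delta = (icm_score(candidate, model_logprob, contradicts)
                 - icm_score(labeled, model_logprob, contradicts))
        # (c) accept if the score improves; occasionally accept worse sets while "hot"
        if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-9)):
            labeled = candidate
        temperature *= cooling
    return labeled
```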

Benchmark performance and model comparisons

On superhuman-capability elicitation tasks, ICM matches the accuracy of golden supervision at 80%, surpassing the estimated human accuracy of 60%. Using reward models generated by ICM, the researchers successfully trained an assistant chatbot without human supervision. The unsupervised reward model achieves 75.0% accuracy on RewardBench, compared with 72.2% for its human-supervised counterpart trained on production data. Furthermore, using both the unsupervised and the human-supervised RM, two policies are trained with RL to create helpful, harmless, and honest assistants. The policy trained with the unsupervised RM achieves a 60% win rate. However, both policies still lag behind the publicly released Claude 3.5 Haiku, which achieves a 92% win rate.

Conclusion and future perspectives

This article introduces Internal Coherence Maximization (ICM), an unsupervised fine-tuning framework that trains pretrained models on their own self-generated labels. The method consistently matches the performance of golden supervision and exceeds crowdsourced human supervision across the GSM8K, TruthfulQA, and Alpaca reward-modeling tasks. However, ICM's limitations include dependence on concept salience within pretrained models and ineffectiveness with long inputs due to context-window constraints. As LMs advance beyond human evaluation capabilities, ICM offers a promising alternative to traditional RLHF, ensuring model alignment with human intent without the limits of human supervision.


Check out the Paper. All credit for this research goes to the researchers of this project.


Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a technology enthusiast, he delves into the practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible way.
