Programmers can now use large language models (LLMs) to generate computer code more quickly. However, this only makes programmers' lives easier if that code follows the rules of the programming language and doesn't cause a computer to crash.
Some methods exist for ensuring LLMs conform to the rules of whatever language they are generating text in, but many of these methods either distort the model's intended meaning or are too time-consuming to be feasible for complex tasks.
A new approach developed by researchers at MIT and elsewhere automatically guides an LLM to generate text that adheres to the rules of the relevant language, such as a particular programming language, and is also error-free. Their method allows an LLM to allocate effort toward outputs that are most likely to be valid and accurate, while discarding unpromising outputs early in the process. This probabilistic approach boosts computational efficiency.
Due to these efficiency gains, the researchers' architecture enabled small LLMs to outperform much larger models at generating accurate, properly structured outputs for several real-world use cases, including molecular biology and robotics.
In the long run, this new architecture could help nonexperts control AI-generated content. For instance, it could enable users to write complex queries in SQL, a language for database manipulation, using only natural language prompts.
"This work has implications beyond research. It could improve programming assistants, AI-powered data analysis, and scientific discovery tools by ensuring that AI-generated outputs remain both useful and correct," says João Loula, an MIT graduate student and co-lead author of a paper on this framework.
Loula is joined on the paper by co-lead authors Benjamin LeBrun, a research assistant at the Mila-Quebec Artificial Intelligence Institute, and Li Du, a graduate student at Johns Hopkins University; co-senior authors Vikash Mansinghka '05, MEng '09, PhD '09, a principal research scientist and leader of the Probabilistic Computing Project in the MIT Department of Brain and Cognitive Sciences; Alexander K. Lew SM '20, an assistant professor at Yale University; Tim Vieira, a postdoc at ETH Zurich; and Timothy J. O'Donnell, an associate professor at McGill University and a Canada CIFAR AI Chair at Mila, who led the international team; as well as several others. The research will be presented at the International Conference on Learning Representations.
Enforcing structure and meaning
One common approach for controlling the structured text generated by LLMs involves checking an entire output, such as a block of computer code, to make sure it is valid and will run without error. If not, the user must start again, racking up computational resources.
Alternatively, a programmer could stop to check the output along the way. While this can ensure the code adheres to the programming language and is structurally valid, incrementally correcting the code may cause it to drift from the meaning the user intended, hurting its accuracy in the long run. A minimal sketch of the first baseline appears below.
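To make the check-the-whole-output baseline concrete, here is a minimal, hypothetical Python sketch. The candidate list and the `generate_code` stub are stand-ins invented for illustration (a real system would sample from a model); only the syntax check via Python's `ast` module is real.

```python
import ast
import random

# Hypothetical stand-in for an LLM: returns one candidate snippet at random.
CANDIDATES = ["print('hello'", "print('hello')", "def f(:", "x = 1 + 2"]

def generate_code(prompt: str) -> str:
    return random.choice(CANDIDATES)

def generate_valid_python(prompt: str, max_attempts: int = 100) -> str:
    """Check-the-whole-output baseline: validate after the fact,
    throw everything away, and retry on failure."""
    for _ in range(max_attempts):
        candidate = generate_code(prompt)
        try:
            ast.parse(candidate)   # structural check: is it valid Python?
            return candidate       # note: its *meaning* is still unverified
        except SyntaxError:
            continue               # wasted compute: start over from scratch
    raise RuntimeError("no syntactically valid output found")

print(generate_valid_python("add two numbers"))
```

Every failed attempt here discards all the work already done, which is exactly the inefficiency the article describes.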
"It is much easier to enforce structure than meaning. We can quickly check whether something is in the right programming language, but to check its meaning you have to execute the code. Our work is also about dealing with these different types of information," says Loula.
The researchers' approach involves engineering knowledge into the LLM to steer it toward the most promising outputs. These outputs are more likely to follow the structural constraints defined by a user, and to have the meaning the user intends.
"We are not trying to train an LLM to do this. Instead, we are engineering some knowledge that an expert would have and combining it with the LLM's knowledge. This offers a very different approach to scaling than you see in deep learning," adds Mansinghka.
They accomplish this using a technique called sequential Monte Carlo, which enables parallel generations from an LLM to compete with each other. The model dynamically allocates resources to different threads of parallel computation based on how promising their outputs appear.
Each output is assigned a weight that represents how likely it is to be structurally valid and semantically accurate. At each step of the computation, the model focuses on those with higher weights and throws out the rest. A toy sketch of this loop follows.
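Here is a minimal, hypothetical Python sketch of that sequential Monte Carlo loop. The `extend` and `weight` functions are invented stand-ins (the researchers' actual interfaces are not shown in this article): several partial outputs, or "particles," grow one token at a time, each is scored by a user-supplied check, and resampling concentrates compute on the high-weight particles.

```python
import random

def extend(prefix: str) -> str:
    """Stand-in for one LLM sampling step: append one token."""
    return prefix + random.choice(["a", "b", "(", ")"])

def weight(prefix: str) -> float:
    """Stand-in for a user-supplied check: higher means the partial
    output looks more likely to be valid. Toy structural constraint:
    parentheses must never close more than they open."""
    depth = 0
    for ch in prefix:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return 0.0             # definitely invalid: prune early
    return 1.0 / (1 + depth)       # prefer prefixes closer to balanced

def smc_generate(n_particles: int = 8, n_steps: int = 10) -> str:
    particles = [""] * n_particles
    for _ in range(n_steps):
        # 1. Extend every particle in parallel.
        particles = [extend(p) for p in particles]
        # 2. Weight each partial output.
        weights = [weight(p) for p in particles]
        if sum(weights) == 0:
            raise RuntimeError("all particles became invalid")
        # 3. Resample: copies of high-weight particles replace
        #    the unpromising ones, focusing compute where it pays off.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=weight)

print(smc_generate())
```

The key design choice is that invalid partial outputs are discarded mid-generation rather than after the fact, so no thread wastes steps extending a prefix that can no longer succeed.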
In a sense, it is as if the LLM has an expert looking over its shoulder to ensure it makes the right choices at each step, while keeping it focused on the overall goal. The user specifies their desired structure and meaning, as well as how to check the output, and then the researchers' architecture guides the LLM to do the rest.
"We've worked out the hard math so that, for any kinds of constraints you'd like to incorporate, you are going to get the proper weights. In the end, you get the right answer," says Loula.
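The article does not reproduce that math, but in a textbook sequential Monte Carlo setup (an assumption here, not necessarily the paper's exact formulation), the "proper weight" of particle i after generating token x_t would follow the standard importance-weight update:

```latex
% Hypothetical standard SMC weight update (not quoted from the paper):
% p_LM is the language model, phi_t scores the user's structural and
% semantic checks on the partial output, and q is the distribution
% the tokens are actually sampled from.
w_t^{(i)} = w_{t-1}^{(i)} \cdot
  \frac{ p_{\mathrm{LM}}\big(x_t^{(i)} \mid x_{1:t-1}^{(i)}\big)\,
         \phi_t\big(x_{1:t}^{(i)}\big) }
       { q\big(x_t^{(i)} \mid x_{1:t-1}^{(i)}\big) }
```

Resampling in proportion to weights of this form is what lets a method discard unpromising partial outputs while still, in principle, targeting the constrained distribution the user asked for.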
Boosting small models
To test their approach, they applied the framework to LLMs tasked with generating four types of outputs: Python code, SQL database queries, molecular structures, and plans for a robot to follow.
Compared to existing approaches, the researchers' method performed more accurately while requiring less computation.
In Python code generation, for instance, the researchers' architecture enabled a small, open-source model to outperform a specialized, commercial closed-source model that is more than double its size.
"We are very excited that we can allow these small models to punch way above their weight," says Loula.
Moving forward, the researchers want to use their technique to control larger chunks of generated text, rather than working one small piece at a time. They also want to combine their method with learning, so that as the outputs a model generates are controlled, it learns to be more accurate.
In the long run, this project could have broader applications for non-technical users. For instance, it could be combined with systems for automated data modeling, and querying generative models of databases.
The approach could also enable machine-assisted data analysis systems, where the user can converse with software that accurately models the meaning of the data and the questions the user asks, adds Mansinghka.
"One of the fundamental questions of linguistics is how the meaning of words, phrases, and sentences can be grounded in models of the world, taking into account uncertainty and vagueness in meaning and reference. LLMs, which predict likely token sequences, don't address this problem. Our paper shows that, in narrow symbolic domains, it is technically possible to map from words to distributions on grounded meanings. It's a step toward the deeper questions in linguistics and artificial intelligence we need to answer to understand how machines can communicate about the world like we do," says O'Donnell.
This research is funded and supported, in part, by the Canada CIFAR AI Chairs Program, the MIT Quest for Intelligence, and Convergent Research.
