A prominent area of exploration is enabling large language models (LLMs) to operate collaboratively. Multi-agent systems powered by LLMs are now being examined for their potential to coordinate on difficult problems by dividing tasks and working simultaneously. This direction has drawn attention for its potential to increase efficiency and reduce latency in real-time applications.
A core problem in collaborative LLM systems is their reliance on sequential, turn-based communication. In such systems, each agent must wait for the others to finish their reasoning steps before continuing. This slows processing, especially in situations requiring rapid responses. In addition, agents often duplicate effort or produce inconsistent outputs, since they cannot see their peers' evolving reasoning during generation. This latency and redundancy reduce the practicality of deploying multi-agent LLMs, especially when time and compute are limited, such as on edge devices.
Most current solutions have relied on sequential or independent parallel sampling techniques to improve reasoning. Methods such as Chain-of-Thought prompting help models solve problems in a structured manner but often come with increased inference time. Approaches such as Tree-of-Thoughts and Graph-of-Thoughts expand reasoning by branching into multiple paths. However, these approaches still do not allow real-time mutual adaptation between agents. Multi-agent setups have explored collaborative methods, but mostly through alternating message exchanges, which again introduce delays. Some advanced systems propose complex dynamic scheduling or role-based configurations, which are not optimized for efficient inference.
Researchers at MediaTek Research introduced a new method called Group Think. This approach allows multiple reasoning agents within a single LLM to operate concurrently, each observing the others' partial outputs at the token level. Each reasoning thread adapts to the evolving thoughts of the others mid-generation. This mechanism reduces duplication and lets agents shift direction if another thread is better positioned to continue a particular line of reasoning. Group Think is implemented through a token-level attention mechanism that lets each agent attend to the previously generated tokens of all agents, supporting real-time collaboration.
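To make the idea concrete, here is a minimal toy sketch of the round-robin dynamic described above. It is an assumption-laden illustration, not the paper's implementation: the real Group Think shares partial outputs through token-level attention inside one LLM, whereas this toy replaces each agent with a simple Python function (`agent_step`, a hypothetical name) that reads a shared context of all agents' tokens and avoids duplicating what a peer has already produced.

```python
# Toy sketch of Group Think-style concurrent decoding (illustrative only;
# the actual method works via attention inside a single LLM).

def agent_step(agent_id, shared_context):
    """Toy agent: emits the next candidate not already claimed by any agent."""
    claimed = {tok for _, tok in shared_context}
    for candidate in range(10):
        if candidate not in claimed:
            return candidate
    return None

def group_think_decode(num_agents, steps):
    shared = []  # interleaved (agent_id, token) pairs, visible to all agents
    for _ in range(steps):
        for aid in range(num_agents):  # round-robin: one token per agent per step
            tok = agent_step(aid, shared)
            if tok is not None:
                shared.append((aid, tok))
    return shared

# Because every agent sees the shared context mid-generation,
# no token is ever produced twice.
print(group_think_decode(2, 3))
```

The key property the sketch demonstrates is that visibility into peers' partial outputs is what prevents redundant work, which is the behavior the paper attributes to token-level mutual attention.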
The method works by assigning each agent its own sequence of token indices, allowing their outputs to be interleaved in memory. These interleaved tokens are stored in a shared cache accessible to all agents during generation. This design enables efficient attention across reasoning threads without architectural changes to the transformer model. The implementation works both on personal devices and in data centers. On local devices, it makes effective use of otherwise idle compute by batching multiple agents' outputs, even with a batch size of one. In data centers, Group Think allows multiple requests to be processed together, interleaving tokens across agents while maintaining correct attention dynamics.
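A small sketch of the shared-cache layout described above, under a simplifying assumption: tokens from all agents are interleaved in one cache in generation-step order, and each new token may attend to every token produced at a strictly earlier step, regardless of which agent owns it. The function name and the exact layout are illustrative, not taken from the paper.

```python
import numpy as np

def interleaved_mask(num_agents, steps_per_agent):
    """Build a boolean attention mask over an interleaved shared cache.

    Cache layout (assumed): step 0 of every agent, then step 1 of every
    agent, and so on. mask[i, j] is True if cached token i may attend to
    cached token j.
    """
    n = num_agents * steps_per_agent
    step_of = np.arange(n) // num_agents   # generation step of each cached token
    agent_of = np.arange(n) % num_agents   # owning agent (for per-agent positions)
    idx = np.arange(n)
    # A token attends to all tokens from earlier steps (any agent) and to itself.
    mask = (step_of[None, :] < step_of[:, None]) | (idx[None, :] == idx[:, None])
    return mask, agent_of

mask, owners = interleaved_mask(num_agents=2, steps_per_agent=3)
print(mask.astype(int))
```

The point of the sketch is that cross-agent visibility comes purely from the mask over one interleaved cache, so no change to the transformer architecture itself is needed.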
Performance tests show that Group Think considerably improves both latency and output quality. In enumeration tasks, such as listing 100 distinct names, it reached near-complete results faster than conventional Chain-of-Thought approaches. The speedup was proportional to the number of thinkers; for example, four thinkers reduced latency by a factor of about four. In divide-and-conquer problems, using the Floyd–Warshall algorithm on a five-node graph, four thinkers cut completion time to half that of a single agent. In code generation challenges, Group Think solved programming tasks more effectively than baseline models. With four or more thinkers, the model produced correct code segments substantially faster than traditional reasoning models.
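For reference, the divide-and-conquer benchmark task mentioned above is Floyd–Warshall all-pairs shortest paths on a five-node graph. A plain single-threaded reference implementation is sketched below; the edge weights are illustrative and not taken from the paper.

```python
# Reference Floyd-Warshall on a five-node graph (weights are illustrative).
INF = float("inf")

def floyd_warshall(w):
    """All-pairs shortest paths; w is an adjacency matrix with INF for no edge."""
    n = len(w)
    d = [row[:] for row in w]  # copy so the input matrix is untouched
    for k in range(n):         # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

graph = [
    [0,   3,   INF, 7,   INF],
    [8,   0,   2,   INF, INF],
    [5,   INF, 0,   1,   INF],
    [2,   INF, INF, 0,   4],
    [INF, INF, INF, 6,   0],
]
dist = floyd_warshall(graph)
print(dist[0][2])  # shortest 0 -> 1 -> 2 path: 3 + 2 = 5
```

The structure of this algorithm, with many independent relaxations per intermediate node, is what makes it a natural fit for splitting across multiple concurrent reasoning threads.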
This research shows that existing LLMs, though not explicitly trained for collaboration, can already exhibit emergent group reasoning behavior under the Group Think configuration. In experiments, agents naturally diversified their work to avoid redundancy, often dividing tasks by topic or area of focus. These findings suggest that Group Think's efficiency and sophistication could be further strengthened with dedicated training on collaborative data.
Check out the Paper. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
