The coordination of complex interactive systems, whether the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers. Now, MIT researchers have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep-learning models.
They say the new method makes addressing these complex tasks so simple that the solution can be reduced to a drawing that would fit on the back of a napkin.
The new approach is described in the journal Transactions on Machine Learning Research, in a paper by doctoral student Vincent Abbott and Professor Gioele Zardini of the MIT Laboratory for Information and Decision Systems (LIDS).
“We designed a new language to talk about these new systems,” Zardini says. This new diagram-based “language” is heavily based on something called category theory, he explains.
It all has to do with designing the underlying architecture of computer algorithms, the programs that will ultimately sense and control the various parts of the system being optimized. “The components are different pieces of an algorithm, and they have to talk to each other, exchange information, but also account for energy usage, memory consumption, and so on.” Such optimizations are notoriously difficult because each change in one part of the system can in turn cause changes in other parts, which can further affect still other parts, and so on.
The researchers decided to focus on the particular class of deep-learning algorithms, which are currently a hot topic of research. Deep learning is the basis of large artificial intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney. These models manipulate data through a “deep” series of matrix multiplications interspersed with other operations. The numbers within the matrices are parameters, and are updated during long training runs, allowing complex patterns to be found. Models consist of billions of parameters, making computation expensive, and hence making improved resource usage and optimization invaluable.
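The “deep series of matrix multiplications interspersed with other operations” described above can be illustrated with a minimal sketch. This toy two-layer network is purely for illustration (the sizes and the ReLU nonlinearity are assumptions, not from the paper); the numbers inside `W1` and `W2` are the parameters that training would update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: in real models these weight matrices
# hold billions of parameters, which is what makes compute expensive.
W1 = rng.standard_normal((4, 8))   # layer-1 parameters
W2 = rng.standard_normal((8, 3))   # layer-2 parameters

def forward(x):
    """A 'deep' series of matrix multiplications interspersed with
    other operations (here, a ReLU nonlinearity between layers)."""
    h = np.maximum(x @ W1, 0.0)    # matrix multiply, then ReLU
    return h @ W2                  # another matrix multiply

x = rng.standard_normal((2, 4))    # a batch of two inputs
print(forward(x).shape)            # (2, 3)
```

Each added layer repeats this multiply-then-transform pattern, which is why the cost of a model scales with the sizes of its matrices.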
Diagrams can represent the details of the parallelized operations that deep-learning models consist of, revealing the relationships between algorithms and the parallelized graphics processing unit (GPU) hardware they run on, supplied by companies such as Nvidia. “I'm very excited about this,” says Zardini, because “we seem to have found a language that very nicely describes deep-learning algorithms, explicitly representing all the important things, which are the operators you use,” for example energy consumption, memory allocation, and any other parameter you are trying to optimize for.
Much of the progress within deep learning has stemmed from resource-efficiency optimizations. The latest DeepSeek model showed that a small team can compete with top models from OpenAI and other major labs by focusing on resource efficiency and the relationship between software and hardware. Typically, in deriving these optimizations, he says, “people need a lot of trial and error to discover new architectures.” For example, a widely used optimization program called FlashAttention took more than four years to develop, he says. But with the new framework they have developed, “we can really approach this problem in a more formal way.” And all of this is represented visually in a precisely defined graphical language.
But the methods that have been used to find these improvements “are very limited,” he says. “I think this shows that there's a major gap, in that we don't have a formal systematic method of relating an algorithm to its optimal execution, or even of really understanding how many resources it will take to run.” But now, with the new diagram-based method they have devised, such a system exists.
Category theory, which underlies this approach, is a way of mathematically describing the different components of a system and how they interact, in a generalized, abstract manner. Different perspectives can be related: for example, mathematical formulas can be related to the algorithms that implement them and use resources, or descriptions of systems can be related to “monoidal string diagrams.” These visualizations allow you to directly play around with and experiment with how the different parts connect and interact. What they have developed, he says, amounts to “string diagrams on steroids,” incorporating many more graphical conventions and many more properties.
“Category theory can be thought of as the mathematics of abstraction and composition,” Abbott says. “Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied.” The algebraic rules that are typically associated with functions can also be represented as diagrams, he says. “Then, a lot of the visual things we can do with diagrams, we can relate to algebraic manipulations and functions. So it creates this correspondence between these different systems.”
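As a minimal sketch of what “compositional system” means here (this is an illustration of the general idea, not the authors' formalism): plain functions compose, and because composition is associative, a chain of operations can be drawn as a diagram with no parentheses at all.

```python
# Sequential composition of two functions: first f, then g,
# mirroring a string diagram read left to right.
def compose(f, g):
    return lambda x: g(f(x))

double = lambda x: 2 * x
inc = lambda x: x + 1
square = lambda x: x * x

# Associativity: (double ; inc) ; square == double ; (inc ; square).
# This algebraic rule is exactly what lets a diagram drop parentheses.
left = compose(compose(double, inc), square)
right = compose(double, compose(inc, square))
print(left(3), right(3))  # both give ((2*3)+1)**2 = 49
```

The correspondence Abbott describes runs in both directions: rearranging a diagram corresponds to an algebraic rewriting of the composed functions, and vice versa.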
As a result, he says, “this solves a very important problem, which is that we have these deep-learning algorithms, but they're not clearly understood as mathematical models.” But by representing them as diagrams, it becomes possible to approach them formally and systematically, he says.
One thing this enables is a clear visual understanding of the way parallel real-world processes can be represented by parallel processing in multicore computer GPUs. “In this way,” Abbott says, “diagrams can both represent a function, and then reveal how to optimally execute it on a GPU.”
The “attention” algorithm is used by deep-learning models that require general, contextual information, and is a key phase of the serialized blocks that constitute large language models such as ChatGPT. FlashAttention is an optimization that took years to develop, but resulted in a sixfold improvement in the speed of attention algorithms.
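For reference, the attention function itself is compact; what FlashAttention optimizes is not the math but the execution. Below is a naive NumPy sketch of standard scaled dot-product attention (a textbook formulation, not code from the paper); FlashAttention computes this same function but reorganizes it into tiles so the full score matrix never has to be materialized in slow GPU memory.

```python
import numpy as np

def attention(Q, K, V):
    """Naive scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    The intermediate 'scores' matrix is n x n, which is the memory
    traffic that FlashAttention's tiled execution avoids."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (5, 8)
```

The gap between this five-line specification and its fastest hardware schedule is exactly the gap the researchers' diagrams are meant to bridge.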
Applying their method to the well-established FlashAttention algorithm, Zardini says that “here we are able to derive it, literally, on a napkin.” He then adds, “OK, maybe it's a large napkin.” But to drive home the point of just how much their new approach can simplify dealing with these complex algorithms, they titled their formal research paper on the work “FlashAttention on a Napkin.”
This method, Abbott says, “allows for optimizations to be very quickly derived, in contrast to prevailing methods.” While they initially applied this approach to the already existing FlashAttention algorithm, thereby verifying its effectiveness, “we now hope to use this language to automate the detection of improvements,” says Zardini, who in addition to being a principal investigator in LIDS is the Rudge and Nancy Allen Assistant Professor of Civil and Environmental Engineering.
The plan, he says, is that ultimately they will develop the software to the point that “the researcher uploads their code, and with the new algorithm you automatically detect what can be improved, what can be optimized, and you return an optimized version of the algorithm to the user.”
In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage allows for systematic co-design of hardware and software. This line of work integrates with Zardini's focus on categorical co-design, which uses the tools of category theory to simultaneously optimize various components of engineered systems.
Abbott says that “this whole area of optimized deep-learning models, I believe, is quite critical, and that's why these diagrams are so exciting. They open the doors to a systematic approach to this problem.”
“I am very impressed by the quality of this research. … The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step,” says Jeremy Howard, founder and CEO of Answer.AI, who was not associated with this work. “This paper is the first time I've seen such a notation used to deeply analyze the performance of a deep-learning algorithm on real-world hardware. … The next step will be to see whether real-world performance gains can be achieved.”
“This is a beautifully executed piece of theoretical research, which also aims for high accessibility to uninitiated readers, a trait rarely seen in papers of this kind,” says Petar Veličković, a senior research scientist at Google DeepMind and a lecturer at the University of Cambridge, who was not associated with this work. These researchers, he says, “are clearly excellent communicators, and I cannot wait to see what they come up with next!”
The new diagram-based language, having been posted online, has already attracted significant attention and interest from software developers. A reviewer of Abbott's earlier paper introducing the diagrams noted that “the proposed neural circuit diagrams look great from an artistic standpoint (as far as I am able to judge this).” “It's technical research, but it's also flashy!” Zardini says.
