LLMs have shown impressive capabilities across various programming tasks, but their potential for program optimization has not been fully explored. Although some recent efforts have used LLMs to improve performance in languages like C++ and Python, the broader application of LLMs to code optimization, particularly in low-level programming contexts, remains limited. Existing LLM benchmarks largely focus on code generation from natural language or on resolving GitHub issues, as HumanEval, MBPP, APPS, SWE-bench, and SWE-agent show. Moreover, models such as Codex, AlphaCode, and Code Llama mainly aim to improve the quality of code generation rather than its performance. However, select research has begun to address optimization, including parallelization and code-efficiency improvements, though many of these approaches are constrained by the need for formal verification, which limits scalability.
In contrast, some more recent methods adopt test-based validation, allowing the optimization of more complex programs with loops. Learning-based strategies for compiler optimization, such as AutoPhase, which uses reinforcement learning for pass sequencing, and Coreset, which applies graph neural networks, have shown promise in improving performance. Superoptimization techniques aim to find the most efficient version of a program but are generally limited to small-scale problems. In addition, frameworks like AutoTVM and Ansor have focused on optimizing GPU kernel code through statistical modeling and search. Recently, LLM-driven optimization has gained attention, with reinforcement learning approaches guiding LLMs using feedback from test cases. Techniques such as CodeRL and PPOCoder leverage policy optimization methods to fine-tune models for better performance, even in resource-constrained programming languages like Verilog.
Researchers from Stanford, UIUC, CMU, and Visa explore the use of LLMs to optimize the performance of assembly code, a domain traditionally handled by compilers like GCC. They introduce a reinforcement learning framework using Proximal Policy Optimization (PPO), guided by a reward that balances functional correctness and speedup over the gcc -O3 baseline. Using a dataset of 8,072 real-world programs, their model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and an average speedup of 1.47×, outperforming 20 other models, including Claude-3.7-Sonnet. Their results show that with RL training, LLMs can effectively outperform conventional compiler optimizations.
The methodology centers on optimizing compiled C programs for performance using an RL approach. Given a C program C, it is compiled into an assembly program P using gcc -O3. The objective is to generate a new assembly program P' that is functionally equivalent but faster. Correctness is verified against a test suite, and speed is measured by the improvement in execution time. Using CodeNet as the dataset, the authors apply PPO to train a language model that generates improved code. Two reward functions, speedup and correctness-guided speedup, are used to steer training based on program validity, correctness, and performance gains.
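As a rough illustration of how such a reward could be scored, the sketch below gates the speedup term on full test-suite success. The exact reward shape, partial-credit scheme, and helper signature are assumptions for exposition, not the authors' implementation.

```c
#include <stdbool.h>

/* Hypothetical correctness-guided speedup reward: a candidate that
 * fails to compile earns nothing, a partially correct one earns only
 * its test pass rate, and a fully correct one additionally earns the
 * measured speedup over the gcc -O3 baseline. The structure and
 * constants are illustrative assumptions. */
double reward(bool compiles, int tests_passed, int tests_total,
              double time_o3, double time_candidate) {
    if (!compiles)
        return 0.0;                              /* invalid assembly */
    double pass_rate = (double)tests_passed / tests_total;
    if (tests_passed < tests_total)
        return pass_rate;                        /* correctness credit only */
    return pass_rate + time_o3 / time_candidate; /* 1.0 + speedup over -O3 */
}
```

Under this shaping, a correct program that merely matches gcc -O3 scores 2.0, and any further speedup raises the reward proportionally, so the policy is pushed first toward validity and correctness and then toward faster code.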
The study evaluates various language models on assembly code optimization, revealing that most struggle, with low test pass rates and minimal speedups. However, Qwen2.5-Coder-7B-PPO, trained with reinforcement learning, significantly outperforms the others, reaching 96% accuracy and an average speedup of 1.47×. Ablation studies show that providing the gcc -O3 baseline as a reference boosts performance, while removing it leads to sharp drops. Notably, models such as Claude-3.7-Sonnet can surpass compilers by identifying hardware-specific optimizations, such as replacing a loop with a single popcnt instruction, demonstrating their ability to perform semantics-level code transformations beyond traditional compiler capabilities.
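To make the popcnt example concrete, here is a minimal, assumed illustration of the kind of loop involved: depending on compiler version and flags, gcc -O3 may retain the loop below, while the hardware-aware rewrite collapses the computation to a single instruction.

```c
#include <stdint.h>

/* Naive population count: counts set bits one at a time. */
int count_bits_loop(uint32_t x) {
    int n = 0;
    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* Hardware-aware equivalent: __builtin_popcount lowers to a single
 * popcnt instruction on x86-64 when built with -mpopcnt or
 * -march=native, matching the transformation described above. */
int count_bits_popcnt(uint32_t x) {
    return __builtin_popcount(x);
}
```

Recognizing that the whole loop computes a population count and substituting the dedicated instruction is a semantics-level rewrite rather than a local peephole change, which is why pattern-based compiler passes often miss it.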
In conclusion, the study explores the use of LLMs to optimize assembly code, an area where traditional compilers struggle because of the complexity of low-level performance tuning. The authors fine-tune Qwen2.5-Coder-7B using PPO, rewarding both correctness (via test cases) and speedup over gcc -O3. They introduce a benchmark of 8,072 real-world programs to evaluate performance. The model achieves a 96.0% test pass rate and an average speedup of 1.47×, outperforming 20 other models, including Claude-3.7-Sonnet. Although effective, the approach's limitations include the lack of formal correctness guarantees and variability in hardware performance across systems.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
