A revolutionary approach to accelerating large language model pre-training

by Brenden Burgess


Large language models (LLMs), like ChatGPT, have gained popularity and media attention. However, their development is dominated by a handful of well-funded technology giants because of the steep cost of pre-training these models, estimated at a minimum of $10 million and likely much higher.

This has limited access to LLMs for smaller organizations and academic groups, but a team of researchers at Stanford University aims to change that. Led by graduate student Hong Liu, they have developed an innovative approach called Sophia that can cut pre-training time in half.

The key to Sophia's optimization lies in two new techniques designed by the Stanford team. The first, known as curvature estimation, makes estimating the curvature of the LLM parameters more efficient. To illustrate this, Liu compares the LLM pre-training process to an assembly line in a factory. Just as a factory manager strives to optimize the steps needed to turn raw materials into a finished product, LLM pre-training involves optimizing the progression of millions or billions of parameters toward the final objective. The curvature of these parameters represents their maximum achievable speed, analogous to the workload of factory workers.
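
To make the analogy concrete: in a second-order optimizer, a per-parameter curvature estimate effectively sets each parameter's step size, so flat directions move quickly while sharply curved directions move cautiously. The minimal NumPy sketch below illustrates that general idea on a toy quadratic; it is not the exact Sophia update, and every name in it is illustrative.

```python
import numpy as np

def curvature_preconditioned_step(theta, grad, curvature, lr=0.1, eps=1e-12):
    """Illustrative diagonal second-order step: divide each gradient
    coordinate by its estimated curvature, so flat directions move fast
    and sharply curved directions move cautiously. Not the Sophia rule itself."""
    return theta - lr * grad / (curvature + eps)

# Toy quadratic loss 0.5 * (a*x^2 + b*y^2) with very different curvatures.
a, b = 100.0, 1.0
theta = np.array([1.0, 1.0])
for _ in range(20):
    grad = np.array([a * theta[0], b * theta[1]])
    curvature = np.array([a, b])  # exact diagonal curvature for this toy loss
    theta = curvature_preconditioned_step(theta, grad, curvature)
print(theta)  # both coordinates shrink at a comparable rate
```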

Although curvature estimation has historically been difficult and costly, the Stanford researchers found a way to make it more efficient. They observed that previous methods updated the curvature estimate at every optimization step, leading to potential inefficiency. In Sophia, they reduced the frequency of curvature estimation to roughly once every 10 steps, which yields significant efficiency gains.
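
The sketch below shows what such a schedule might look like in practice. It is a rough illustration, not the reference implementation: it assumes a PyTorch training loop with hypothetical model, loss_fn, and data_loader objects, and it uses a Hutchinson-style estimator (E[z * (Hz)] = diag(H) for random ±1 vectors z) as one plausible way to obtain a diagonal curvature estimate only every k = 10 steps.

```python
import torch

def rademacher_like(t):
    """Random ±1 tensor with the same shape and dtype as t."""
    return torch.empty_like(t).bernoulli_(0.5) * 2 - 1

# Hypothetical pre-training loop (model, loss_fn, data_loader are assumed).
# The costly diagonal-curvature estimate is refreshed only every k steps and
# reused in between; one Hessian-vector product per refresh suffices because
# E[z * (Hz)] = diag(H) for Rademacher z (Hutchinson-style estimate).
k = 10
params = list(model.parameters())
h_est = [torch.zeros_like(p) for p in params]
for step, (x, y) in enumerate(data_loader):
    loss = loss_fn(model(x), y)
    refresh = step % k == 0
    grads = torch.autograd.grad(loss, params, create_graph=refresh)
    if refresh:
        zs = [rademacher_like(g) for g in grads]
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs)
        h_est = [(z * hvp).detach() for z, hvp in zip(zs, hvps)]
    # ... combine grads with the cached h_est in the parameter update ...
```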

The second technique used by Sophia is called clipping. It aims to overcome the problem of an inaccurate curvature estimate. By setting a maximum on the curvature-scaled update, Sophia prevents the LLM parameters from overshooting. The team compares this to imposing a workload limit on factory employees, or to navigating an optimization landscape with the goal of reaching the lowest valley while avoiding saddle points.
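
In code, clipping simply caps each coordinate of the curvature-scaled step so that a tiny or stale curvature estimate cannot send a parameter flying. The sketch below follows the general shape of such a clipped update; the constants and the exact form are assumptions for illustration, not the official implementation.

```python
import torch

def clipped_curvature_update(param, m, h, lr=1e-4, gamma=0.01, rho=1.0, eps=1e-12):
    """Sketch of a clipped second-order step: scale the (momentum) gradient m
    by the curvature estimate h, then clip each coordinate to [-rho, rho] so
    an inaccurate h cannot cause the parameter to overshoot."""
    step = torch.clamp(m / torch.clamp(gamma * h, min=eps), -rho, rho)
    param.data.add_(step, alpha=-lr)

# Example: one update on a single tensor with hypothetical statistics.
p = torch.randn(4, requires_grad=True)
m = torch.randn(4)   # e.g. an exponential moving average of gradients
h = torch.rand(4)    # e.g. a cached diagonal curvature estimate
clipped_curvature_update(p, m, h)
```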

The Stanford team put Sophia to the test by pre-training a relatively small LLM with the same size and configuration as OpenAI's GPT-2. Thanks to the combination of curvature estimation and clipping, Sophia achieved a 50% reduction in the number of optimization steps and in the time required compared to the widely used Adam optimizer.

A notable advantage of Sophia is its adaptivity, which allows it to handle parameters with varying curvatures more effectively than Adam. This breakthrough also marks the first substantial improvement over Adam for language model pre-training in nine years. Liu believes Sophia could considerably reduce the cost of training large real-world models, with even greater benefits as models continue to grow.

Looking ahead, Liu and his colleagues plan to apply Sophia to larger LLMs and explore its potential in other areas, such as computer vision and multimodal models. Although moving Sophia to new domains will require time and resources, its open-source nature allows the wider community to contribute to it and adapt it to different fields.

In conclusion, Sophia represents a major step forward in accelerating large language model pre-training, democratizing access to these models and potentially revolutionizing various fields of machine learning.
