
Meta AI has released LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. According to the developers, LLaMA can compete with or even outperform the best existing models such as GPT-3, Chinchilla, and PaLM.
Large language models (LLMs) trained on massive text corpora have shown they can perform a wide variety of tasks, from basic ones such as summarizing text, following written instructions, and writing poetry, to more complex ones such as generating descriptions for AI art.
As the training dataset for LLaMA, the developers used a mixture of several sources: English CommonCrawl, C4, GitHub, Wikipedia, Books, arXiv, and Stack Exchange, covering a diverse set of domains. Unlike Chinchilla, PaLM, or GPT-3, LLaMA uses only publicly available data, which makes its training compatible with open release, whereas most existing models rely on data that is either not publicly available or undocumented.
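To make the mixture concrete, here is a small illustrative Python sketch of sampling training documents by source. The proportions are approximate sampling weights as reported in the LLaMA paper; the loader logic itself is a hypothetical example, not Meta's training code.

```python
import random

# Approximate pre-training data mixture from the LLaMA paper (sampling weights).
DATA_MIXTURE = {
    "CommonCrawl": 0.670,
    "C4": 0.150,
    "GitHub": 0.045,
    "Wikipedia": 0.045,
    "Books": 0.045,
    "arXiv": 0.025,
    "StackExchange": 0.020,
}


def sample_source(rng: random.Random) -> str:
    """Pick which source the next training document is drawn from."""
    sources, weights = zip(*DATA_MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {name: 0 for name in DATA_MIXTURE}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    print(counts)  # CommonCrawl should dominate, at roughly two thirds
```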
To improve training speed, the LLaMA models use an efficient implementation of the causal multi-head attention operator, which reduces memory usage and computation. To further improve training efficiency, the developers use checkpointing to reduce the number of activations that are recomputed during the backward pass.
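As a rough illustration of these two optimizations, below is a minimal PyTorch sketch (not Meta's actual implementation, which relies on more specialized kernels and a simpler architecture than shown here): `F.scaled_dot_product_attention` with `is_causal=True` applies the causal mask without materializing the full attention matrix, and `torch.utils.checkpoint` recomputes a block's activations during the backward pass instead of storing them. All module and function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim).
        q, k, v = (
            z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            for z in (q, k, v)
        )
        # Fused, memory-efficient attention; is_causal=True applies the causal
        # mask without building the full (seq x seq) attention matrix.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


class Block(nn.Module):
    """Transformer block: pre-norm attention followed by a feed-forward MLP."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = CausalSelfAttention(dim, n_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


def forward_with_checkpointing(blocks: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    # Wrapping each block in checkpoint() drops its internal activations after
    # the forward pass and recomputes them during backward, trading compute for memory.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x
```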
Unlike previous studies, Meta's research on LLaMA demonstrates that state-of-the-art performance can be achieved by training only on publicly available data, without resorting to proprietary datasets. The developers hope that releasing these models to the research community will accelerate the development of large language models, help improve their reliability, and reduce known problems such as toxicity and bias.
Read more details about the research in the paper.
