While early language models could only process text, contemporary large language models (LLMs) now perform highly diverse tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.
MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.
Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, such as visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or to reason about arithmetic, computer code, and so on. Furthermore, the researchers demonstrate that they can intervene in a model’s semantic hub by using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.
These findings could help scientists train future LLMs that are better able to handle diverse data.
“LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step to better understand how they work so we can improve upon them and better control them when needed,” says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.
His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.
Integrating diverse data
The researchers based the new study on prior work suggesting that English-centric LLMs use English to carry out reasoning processes on various languages.
Wu and his collaborators expanded this idea, launching an in-depth study into the mechanisms LLMs use to process diverse data.
An LLM, which is composed of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and to generate the next word in a sequence. In the case of images or audio, these tokens correspond to particular regions of an image or sections of an audio clip.
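For readers who want to see this step concretely, here is a minimal sketch of tokenization and per-token representations. It uses the Hugging Face transformers library and the public “gpt2” checkpoint purely as illustrative assumptions; the article does not name a specific model or toolkit.

```python
# Minimal sketch: split text into tokens and get one vector per token.
# The "gpt2" checkpoint and the transformers library are assumptions for
# illustration; the study's actual models are not specified in the article.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Language models process diverse data."
inputs = tokenizer(text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))  # sub-word tokens

with torch.no_grad():
    outputs = model(**inputs)
# One representation (a 768-dimensional vector for GPT-2) per token.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, hidden_size)
```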
The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. The LLM then converts the tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain’s semantic hub integrates diverse information.
The model assigns similar representations to inputs with similar meanings, despite their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.
For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, or even multimodal data.
To test this hypothesis, the researchers passed a pair of sentences with the same meaning, but written in two different languages, through the model. They measured how similar the model’s representations were for each sentence.
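A rough sketch of that kind of measurement is shown below; it is not the authors’ exact protocol. It mean-pools each layer’s hidden states for a translation pair and compares them with cosine similarity, using the multilingual “xlm-roberta-base” checkpoint as an illustrative assumption.

```python
# Sketch: compare layer-wise representations of a translation pair.
# Not the paper's exact setup; the model choice and mean-pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base", output_hidden_states=True)

def layer_embeddings(text):
    """Mean-pool each layer's hidden states into one vector per layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states  # tuple of (1, seq_len, dim) tensors
    return [h.mean(dim=1).squeeze(0) for h in hidden]

english = layer_embeddings("The cat sleeps on the sofa.")
chinese = layer_embeddings("猫在沙发上睡觉。")

for i, (e, c) in enumerate(zip(english, chinese)):
    sim = torch.cosine_similarity(e, c, dim=0).item()
    print(f"layer {i:2d}: cosine similarity {sim:.3f}")
```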
Then they conducted a second set of experiments in which they fed an English-dominant model text in a different language, such as Chinese, and measured how similar its internal representations were to English versus Chinese. The researchers conducted similar experiments for other data types.
They consistently found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers looked more like English-centric tokens than like the input data type.
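One common way to probe what intermediate tokens “look like” is a logit-lens-style readout, sketched below; this is an assumed technique, not a description of the paper’s own probe. It decodes a mid-depth hidden state through the model’s output embedding and lists the nearest vocabulary tokens.

```python
# Sketch of a "logit lens"-style probe (an assumed technique, not necessarily
# the paper's): decode a mid-depth hidden state through the unembedding and
# inspect which vocabulary tokens it is closest to.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)

inputs = tokenizer("Le chat dort sur le canapé", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

middle = out.hidden_states[len(out.hidden_states) // 2]  # a mid-depth layer
normed = model.transformer.ln_f(middle)                   # final layer norm, as in the logit lens
logits = model.lm_head(normed)                            # project onto the vocabulary
top_ids = logits[0, -1].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))  # nearest tokens for the last position
```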
“A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for example, mathematical or coding expressions,” Wu says.
Leveraging the semantic hub
The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.
“There are thousands of languages, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” Wu says.
The researchers also tried intervening in the model’s internal layers using English text while it was processing other languages. They found that they could predictably change the model’s outputs, even though those outputs were in other languages.
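The sketch below shows one generic way such an intervention can be implemented: “activation addition” with a steering vector derived purely from English text. This is an assumed technique and may differ from the paper’s own intervention method; the “gpt2” checkpoint, the prompts, and the layer index are illustrative choices.

```python
# Sketch of one possible intervention in this spirit: add a steering vector,
# computed from English text only, to a hidden layer while the model generates
# in another language. Assumed technique; not necessarily the paper's method.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
LAYER = 6  # which transformer block to modify (arbitrary, for illustration)

def hidden_at_layer(text):
    """Average one layer's hidden states over all tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return hs[LAYER].mean(dim=1)

# The steering direction is defined entirely from English text.
steer = hidden_at_layer("The weather is wonderful") - hidden_at_layer("The weather is terrible")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
prompt = tokenizer("Le temps aujourd'hui est", return_tensors="pt")
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=20)[0]))
handle.remove()
```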
Scientists could take advantage of this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.
But on the other hand, there could be concepts or knowledge that do not translate across languages or data types, such as culturally specific knowledge. Scientists might want LLMs to have some language-specific processing mechanisms in those cases.
“How do you maximally share whenever possible but also allow languages to have some language-specific processing mechanisms?” Wu asks.
In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language loses some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this language interference, Wu says.
“Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s intermediate layers,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work. “The hypothesis and experiments nicely tie and extend findings from previous work and could be influential for future research on creating better multimodal models and studying the links between them and brain function and cognition in humans.”
This research is funded, in part, by the MIT-IBM Watson AI Lab.
