
A team of engineers from Google presented a new music generation system called MusicLM. The model creates high-quality music from textual descriptions such as "a calming violin melody backed by a distorted guitar riff". It works in a similar way to DALL-E, which generates images from text.
MusicLM uses the multi-stage autoregressive modeling of AudioLM as its generative component, extending it to text conditioning. To address the main challenge of scarce paired data, the researchers applied MuLaN, a joint music-text model trained to project music and its corresponding text description to representations that lie close to each other in a shared embedding space.
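The joint embedding idea can be illustrated with a toy sketch. This is not the actual MuLaN implementation: the "encoders" below are random linear projections standing in for learned networks, and all names and dimensions are illustrative. The point is only the mechanics of a shared space in which audio and text vectors can be compared directly.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)

# Toy stand-ins for the learned audio and text encoders: each projects
# its modality's features into the same 64-dimensional shared space.
audio_proj = rng.standard_normal((128, 64))
text_proj = rng.standard_normal((300, 64))

def embed_audio(features):
    return normalize(features @ audio_proj)

def embed_text(features):
    return normalize(features @ text_proj)

# Training pulls matched (music, caption) pairs together in this space,
# so at inference time a text embedding can stand in for an audio one.
audio_vec = embed_audio(rng.standard_normal(128))
text_vec = embed_text(rng.standard_normal(300))
similarity = float(audio_vec @ text_vec)  # cosine similarity in [-1, 1]
```

In the real system the projections are trained with a contrastive objective, which is what lets MusicLM learn from large amounts of unlabeled music while using text only through the shared space.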
By training MusicLM on a large dataset of unlabeled music, the model casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task and generates 24 kHz music that remains consistent over several minutes. To address the lack of evaluation data, the developers released MusicCaps, a new high-quality dataset of 5,500 music-text pairs with captions written by professional musicians.
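The hierarchical generation can be sketched as a chain of autoregressive stages, each conditioned on the tokens produced by the stage before it. This is a data-flow illustration only: the function below samples tokens uniformly where the real model would run a Transformer, and the stage names, vocabulary sizes, and lengths are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def autoregressive_stage(conditioning, vocab_size, length):
    """Toy stand-in for one autoregressive stage.

    A real model would run a Transformer over `conditioning + tokens`
    to predict each next-token distribution; uniform sampling here
    just illustrates the token-by-token generation loop.
    """
    tokens = []
    for _ in range(length):
        tokens.append(int(rng.integers(vocab_size)))
    return tokens

# Stage 1: semantic tokens capture long-term structure (melody, rhythm).
semantic = autoregressive_stage([], vocab_size=1024, length=50)
# Stage 2: coarse acoustic tokens add timbre, conditioned on the semantics.
coarse = autoregressive_stage(semantic, vocab_size=1024, length=100)
# Stage 3: fine acoustic tokens add the detail needed for 24 kHz audio.
fine = autoregressive_stage(semantic + coarse, vocab_size=1024, length=200)
```

Because each stage only has to model structure at its own timescale, the coarse stages can stay coherent over minutes while the fine stages fill in high-resolution detail.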
Experiments show that MusicLM surpasses previous systems in both audio quality and adherence to the text description. In addition, MusicLM can be conditioned on both text and melody: the model can generate music in the style described in the text while following a given melody, even when that melody is whistled or hummed.
See the model demo on the website.
The AI system learned to create music by training on a dataset of five million audio clips, amounting to 280,000 hours of music. MusicLM can create songs of different lengths: for example, it can generate a quick riff or an entire song. It can even go beyond that, creating songs with alternating sections, as is often the case in symphonies, to convey a sense of narrative. The system can also handle specific requests, such as a particular instrument or genre, and it can even generate a semblance of vocals.
MusicLM is one of many applications of deep learning AI designed to reproduce human abilities such as speaking, writing articles, drawing, passing exams, or writing proofs of mathematical theorems.
For the moment, the developers have announced that Google will not release the system for public use. Tests showed that around 1% of the music generated by the model is copied directly from existing recordings, so they are wary of content appropriation and potential lawsuits.
