
Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced a new framework called Distribution Matching Distillation (DMD). This approach collapses the traditional multi-step generation process of diffusion models into a single step, addressing their main speed limitation.
Traditionally, generating an image with a diffusion model has been a slow, iterative process, requiring many denoising steps to refine the final result. The new DMD framework simplifies this process, dramatically reducing computation time while matching or even exceeding the quality of the generated images. Led by Tianwei Yin, an MIT PhD student, the research team achieved a remarkable feat: accelerating current diffusion models such as Stable Diffusion and DALL-E 3 by a factor of 30. Simply compare the output of Stable Diffusion after 50 steps (left image) with DMD after a single step (right image). The quality and detail are incredible!
The key to DMD's success lies in its innovative approach, which combines principles from generative adversarial networks (GANs) with those of diffusion models. By distilling the knowledge of a complex multi-step model into a simpler, faster one, DMD generates visual content in a single step.
But how does DMD pull off this feat? It combines two components:
1. A regression loss: it anchors the mapping between noise and images, ensuring a coarse organization of the image space during training.
2. A distribution matching loss: it aligns the probability that the student model generates a given image with that image's frequency of occurrence in the real world.
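The first component can be sketched as follows. This is a minimal illustration, not the authors' code: `student`, `teacher_multistep`, and the latent shapes are all hypothetical stand-ins for the real networks.

```python
import numpy as np

# Hypothetical stand-ins for the trained networks: in DMD these are
# deep generators; here they are simple linear maps so the sketch runs.
def student(z):
    return 0.9 * z                      # one-step student generator

def teacher_multistep(z):
    return 0.8 * z                      # precomputed multi-step teacher output

def regression_loss(z):
    # Anchors the student to the teacher's result for the same noise,
    # giving the coarse organization of the image space (component 1).
    return np.mean((student(z) - teacher_multistep(z)) ** 2)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16))        # a small batch of latent noise
print(f"regression loss: {regression_loss(z):.4f}")
```

In practice the teacher's multi-step outputs are expensive, so they are typically precomputed once for a fixed set of noise samples rather than regenerated every iteration.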
By using two diffusion models as guides, DMD minimizes the divergence between the distributions of generated and real images, resulting in faster generation without compromising quality.
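A sketch of how the two guide models drive training, under the assumption that `score_real` and `score_fake` are placeholder score functions (in DMD they are two diffusion models, one modelling the real image distribution and one continually fitted to the student's outputs):

```python
import numpy as np

# Placeholder scores: in DMD, two diffusion models estimate these
# gradients of the log-densities; simple linear stand-ins are used here.
def score_real(x):
    return -x                 # hypothetical gradient of log p_real

def score_fake(x):
    return -0.5 * x           # hypothetical gradient of log p_fake

def distribution_matching_gradient(x):
    # Approximate gradient of the divergence between the generated and
    # real distributions: the difference of the two scores tells the
    # student which direction makes its samples more "real".
    return score_fake(x) - score_real(x)

x = np.full((2, 4), 2.0)      # a batch of generated samples
step = distribution_matching_gradient(x)
x_updated = x - 0.1 * step    # one illustrative descent update
print(x_updated)
```

The update moves the samples toward regions that are more likely under the real distribution, which is the intuition behind the distribution matching loss.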
In their research, Yin and his colleagues demonstrated DMD's effectiveness across various benchmarks. In particular, DMD delivered consistent performance on popular benchmarks such as ImageNet, achieving a Fréchet Inception Distance (FID) of just 0.3, a testament to the quality and diversity of the generated images. In addition, DMD excelled at industrial-scale text-to-image generation, showcasing its versatility and real-world applicability.
Despite these remarkable achievements, DMD's performance is intrinsically tied to the capability of the teacher model used during distillation. The current version uses Stable Diffusion v1.5 as the teacher, but future iterations could benefit from more advanced models, unlocking new possibilities for high-quality real-time visual editing.
