Stable Diffusion is Midjourney’s main rival among AI image generators, largely because anyone can run it on their own PC and extend it with all kinds of external components. Months ago, its developers released Stable Diffusion 3, and now they’re introducing a more compact version called Stable Diffusion 3 Medium. But there’s a problem: It creates monsters.
What happened? Its developers released Stable Diffusion 3 on February 22, and the public API has been available since April 17. Now they’ve introduced Stable Diffusion 3 Medium, a more "compact" version that can run smoothly on a PC with a capable GPU.
All you need is a GPU with at least 5GB of memory. While SD3 Large (the original) has 8 billion parameters, SD3 Medium has 2 billion. Christian Laforte, co-CEO of Stability AI, said, “Unlike SD3 Large, SD3 Medium is smaller and can run efficiently on commodity hardware.” The developers put the minimum at 5GB of graphics memory but recommend 16GB for best results and performance. Stable Diffusion 3 Medium is also available for free via Hugging Face.
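A quick back-of-the-envelope calculation shows why the parameter counts matter for graphics memory. The sketch below is only an illustration, not Stability AI's own sizing method: it assumes the weights are stored in 16-bit precision (2 bytes per parameter) and counts the image model's weights alone, ignoring text encoders, the image decoder, and working memory. The function names are hypothetical.

```python
# Rough estimate of the memory taken by model weights alone.
# Assumption: 16-bit weights (2 bytes per parameter). Real usage is higher
# because text encoders, the image decoder, and activations also need memory.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, vram_gb: float, bytes_per_param: int = 2) -> bool:
    """Return True if the weights alone fit in the given graphics memory."""
    return weight_memory_gb(num_params, bytes_per_param) <= vram_gb


# Parameter counts quoted in the article.
MODELS = {"SD3 Medium": 2e9, "SD3 Large": 8e9}

if __name__ == "__main__":
    for name, params in MODELS.items():
        size = weight_memory_gb(params)
        verdict = "fits" if fits(params, 5) else "does not fit"
        print(f"{name}: about {size:.0f} GB of weights, {verdict} in 5 GB")
```

Under these assumptions, SD3 Medium's 2 billion parameters take roughly 4GB, which sits just under the stated 5GB minimum, while SD3 Large's 8 billion would need about 16GB for the weights alone.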
The model is promising. This version benefits from all the improvements of the large model. It offers a higher degree of photorealism in the generated images, much better typography support thanks to the Diffusion Transformer architecture, a better understanding of complex prompts, and efficient performance on “consumer” GPUs.
But it’s generating aberrant bodies. The model’s limitations are apparent in some of the images users have shared publicly. As Ars Technica reports, the model's output has popped up in Reddit threads, where users ridicule SD3 Medium and criticize its monstrous renderings of human bodies.
The hand issues are the least of the worries. The problems appear, for example, in images users created with simple prompts of women lying on the grass or in the water. The model also struggles with hands, a problem that seemed to be a thing of the past. Time and again, images that would otherwise look fantastic end up horrible because of how this AI renders bodies and limbs.
A setback in the fight against Midjourney. These problems are a setback for Stable Diffusion, which users saw as Midjourney’s main competitor alongside DALL-E 3. One Reddit user joked, “At least our [training] datasets are safe and ethical!” pointing out that Midjourney’s training data is undisclosed and allegedly includes copyrighted images.
Censorship is a possible cause. The anomalous images may be due to Stability AI’s insistence on filtering adult content out of SD3’s training data. Training data teaches the model how to generate pictures, and this kind of content is an essential source of information about human anatomy. Without it, the model struggles to interpret these requests and produces absurd, disturbing images. Something similar happened with Stable Diffusion 2.0 in 2022. The company fixed the problem with SD 2.1 and SD XL.
Internal problems at Stability AI. The company’s situation hasn’t been the best lately. A year ago, Getty Images sued it, which probably affected its development. CEO and founder Emad Mostaque resigned in March, followed by three of Stability AI’s most influential engineers. The company laid off 10% of its workforce in April. Its finances haven’t looked good in recent months, which clouds its future.
Images | Reddit