Stable Diffusion 3 Announced

Gábor Bíró · February 26, 2024
2 min read

Stability AI has officially announced the upcoming release of Stable Diffusion 3, promising a significant leap forward in the capabilities of text-to-image artificial intelligence models.

Source: Stable Diffusion

This new iteration introduces several key improvements designed to raise the model's performance and image quality, and to sharpen its ability to interpret and execute complex prompts, compared to predecessors such as SDXL.

New Architecture and Enhanced Performance

Stable Diffusion 3 is built upon a novel diffusion transformer architecture, a departure from the primarily U-Net based structures used in previous versions. This new foundation, conceptually similar to the transformer architectures powering large language models, is designed for better scalability and potentially a more nuanced understanding of text prompts. Performance is further boosted by incorporating flow matching during training. This technique can lead to faster training times, more efficient sampling (image generation), and improved overall output quality compared to earlier diffusion training methods.
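To make the flow matching idea concrete, here is a minimal sketch of how a single conditional flow-matching training pair is constructed. This is an illustration of the general technique (straight noise-to-data paths with constant-velocity regression targets), not Stability AI's actual implementation; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x_data, rng):
    """Sample one conditional flow-matching training pair:
    a point on the straight noise->data path and its target velocity."""
    noise = rng.standard_normal(x_data.shape)   # x_0 ~ N(0, I)
    t = rng.uniform(0.0, 1.0)                   # random time in [0, 1]
    x_t = (1.0 - t) * noise + t * x_data        # linear interpolation between noise and data
    velocity = x_data - noise                   # constant velocity along that straight path
    return x_t, t, velocity

x_data = rng.standard_normal((4, 4))            # stand-in for an image latent
x_t, t, v = flow_matching_pair(x_data, rng)

# Training would regress a network f(x_t, t) onto v with a mean-squared error;
# here a trivial zero predictor stands in for the network.
mse = np.mean((v - 0.0) ** 2)
```

Because the paths are straight and the velocity targets are constant in time, sampling can take larger, more accurate integration steps than classical diffusion schedules, which is where the faster-generation claim comes from.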

Expanded Range of Models

To cater to a wide spectrum of user needs and hardware capabilities, Stability AI announced that Stable Diffusion 3 will be available in multiple model sizes, ranging from 800 million to 8 billion parameters. This scalability allows users to select a model that best aligns with their priorities, whether it's maximizing image fidelity or optimizing for computational efficiency.

Improved Multi-Subject Prompts and Typography

A standout advancement highlighted for Stable Diffusion 3 is its significantly improved handling of prompts involving multiple subjects. It aims to generate images that accurately depict complex scenes with several distinct elements according to the prompt. Furthermore, the model boasts dramatically enhanced typography capabilities, addressing a well-known weakness of many previous text-to-image models. This allows for far more accurate and legible rendering of text specified within the generated images.

Safety and Accessibility

Stability AI emphasized its commitment to safe and responsible AI deployment, stating that numerous safety measures were being implemented from the outset to prevent misuse of Stable Diffusion 3. At the time of the announcement, the model was in an early preview phase and not yet widely available. The company also reaffirmed its dedication to democratizing access to generative AI: once initial testing and safety evaluations are complete, it intends to make the model weights openly available for download and local use, continuing the practice established with earlier Stable Diffusion versions.

Future Directions

While Stable Diffusion 3's initial focus is on text-to-image generation, its underlying architecture is designed with future extensibility in mind, potentially paving the way for expansion into other modalities such as 3D asset generation and video creation. This versatility underscores Stability AI's ambition to develop a comprehensive suite of generative models capable of serving a broad range of creative and commercial applications.
