Image credit: Krea.AI, Inc.
KREA Real-time is transforming the creative landscape with its cutting-edge, AI-powered rendering, showcased as a standout SIGGRAPH 2024 Real-Time Live! contribution. Leveraging advanced GPUs and generative AI, it empowers artists to create and refine work faster, from initial concepts to polished results, while pushing the boundaries of real-time performance. Diego Rodriguez shares how KREA overcomes challenges like GPU optimization and network constraints, offering a glimpse into the future of artistry and innovation at the intersection of AI and creativity.
SIGGRAPH: How does KREA Real-time utilize AI for rendering, and what specific advantages does this bring to artists over traditional methods?
Diego Rodriguez (DR): We use a collection of GPUs inside our compute cluster, and the AI runs as a sort of “extra render pass” that produces new images. Artists use this in both pre-production and post-production.
In pre-production, it is used to rapidly create images from a single text prompt, or even from another image. For example, artists can use the AI by itself to make an image from scratch in less than one second and then start working on top of it. Other artists use sketches they already have and let the AI render on top of them, since the AI understands more than just words; it also understands other images, including hand-drawn sketches.
In post-production, we’ve seen people grab an image made with any type of creative tool (e.g., Blender or Photoshop; it really does not matter) and then use our AI to reinterpret it quickly. For example, taking a picture of a certain scene and slightly refining its textures, lighting, or details.
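To make that post-production idea concrete in generic terms, here is a minimal sketch using the open-source diffusers library rather than Krea’s own stack; the model name, file paths, and strength value are illustrative assumptions. A low-strength image-to-image pass keeps the composition of an existing render and only lightly reinterprets it:

```python
# Minimal image-to-image sketch using the open-source `diffusers` library.
# Illustrative only: model choice, strength, and file paths are assumptions,
# not Krea's stack.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

# A render exported from any tool (Blender, Photoshop, ...).
source = load_image("scene_render.png").resize((512, 512))

# Low strength keeps the composition and only refines textures/lighting/details.
refined = pipe(
    prompt="same scene, refined textures, softer lighting",
    image=source,
    strength=0.35,
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
refined.save("scene_refined.png")
```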
SIGGRAPH: What kind of GPU-node optimizations are currently implemented, and how do these contribute to the rendering frame rate?
DR: An example is dropping stale frames. GPUs may get hundreds of frames to render, but since the architecture is meant to work in real-time, if a certain user has many changes pending, we know that they’re most interested in the latest one. The way I like to think about it is by imagining a circle in the left part of a scene meant to be a sun. As the user drags the sun from the left to the right, we may send five “render” requests. However, as a user, I’m mostly interested in the one with the final position of the sun and I can tolerate not receiving the middle frames during high-traffic hours as long as I get the final relevant one in time.
There are many other optimizations. Many are very similar in architecture to how video encoders work. Sometimes they contribute to better frame rate by removing work from the GPU (e.g., stale frame dropping), and other times they increase the actual throughput of the GPU (e.g., model compilation).
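As a rough illustration of stale-frame dropping (a minimal sketch, not Krea’s actual scheduler; RenderRequest, submit, and drain_to_gpu are invented names for the example), a per-user queue can simply keep the most recent request and discard superseded ones:

```python
# Sketch of stale-frame dropping: keep only the latest pending render request
# per user, since older ones are superseded anyway. All names here are
# invented for illustration; this is not Krea's actual scheduler.
from dataclasses import dataclass

@dataclass
class RenderRequest:
    user_id: str
    seq: int          # monotonically increasing per user
    payload: dict     # prompt, sketch, sun position, etc.

pending: dict[str, RenderRequest] = {}

def render_on_gpu(req: RenderRequest) -> None:
    # Placeholder for dispatching the request to a GPU worker.
    print(f"rendering seq={req.seq} for user={req.user_id}")

def submit(req: RenderRequest) -> None:
    """Overwrite any older pending request from the same user."""
    current = pending.get(req.user_id)
    if current is None or req.seq > current.seq:
        pending[req.user_id] = req   # older (stale) frames are simply dropped

def drain_to_gpu() -> None:
    """Send only the freshest request per user to the GPU workers."""
    batch = list(pending.values())
    pending.clear()
    for req in batch:
        render_on_gpu(req)

# The sun-drag example: five updates arrive, only the last one is rendered.
for i in range(5):
    submit(RenderRequest("artist-1", seq=i, payload={"sun_x": i * 0.25}))
drain_to_gpu()   # renders only seq=4
```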
SIGGRAPH: How does the data-center level rendering handle complex workloads, such as high-polygon models or demanding texture maps?
DR: As of now, we don’t deal with high-polygon models, since our rendering technology is not based on ray tracing or other traditional methods. Our challenges come instead from the high cost of each GPU node compared to traditional CPU ones. However, we do share one common challenge: high-resolution texture maps occupy a lot of memory.
SIGGRAPH: How does KREA Real-time ensure high-quality rendering results are maintained during streaming without losing fidelity?
DR: It turns out that even without compression techniques we can get a fast enough frame rate, given the high-speed internet connections at most companies and households today. With 3G-like speeds, though, the experience suffers if we keep quality high. Lag and throughput challenges arise (we had some during Real-Time Live!), but this is where good-ol’ compression techniques come to the rescue. The only difference is that video streaming solutions were not designed to receive frames generated on demand by a GPU, so there’s a bit of “engineering plumbing” to do to get a streaming-like experience when rendering with GPUs.
Internet speed is a bottleneck we have no control over. We mitigate it by carefully tuning the speed/quality tradeoff of the image encoding on both the inputs sent to the AI backend and the outputs returned from it. Lossier encoding can be used to stream inputs, as the AI model implicitly fixes light compression artifacts during the generation process.
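As a hedged illustration of that asymmetric tradeoff, the sketch below uses Pillow to encode the same frame at two JPEG quality levels, a lossier one for the input stream and a higher one for the output stream; the quality values are placeholders, not Krea’s settings:

```python
# Sketch of asymmetric encoding: lossier JPEG for inputs (the model tolerates
# light artifacts), higher quality for outputs shown to the artist.
# Quality values and file names are illustrative placeholders.
from io import BytesIO
from PIL import Image

def encode_jpeg(img: Image.Image, quality: int) -> bytes:
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

frame = Image.open("sketch_input.png")

upload = encode_jpeg(frame, quality=60)    # input to the AI backend: smaller, faster
download = encode_jpeg(frame, quality=90)  # output back to the artist: keep fidelity

print(f"input payload:  {len(upload) / 1024:.1f} KiB")
print(f"output payload: {len(download) / 1024:.1f} KiB")
```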
The primary bottleneck is the generative model itself, though. The real-time experience is possible thanks to extreme optimization of state-of-the-art models, including the following (a rough sketch of a few of these appears after the list):
- Model compilation
- World’s fastest LoRA loading infra (allowing us to swap out some of the model’s weights to quickly change the aesthetic style of the model’s output)
- Careful quantization of certain model layers running on hardware optimized for low-precision matrix multiplication
- Timestep-distillation (a technique for reducing the number of inference steps of a generative model from 25-50 steps to 1-4 steps)
- Guidance distillation (a technique for reducing the number of model evaluations per inference step from 2 to 1)
- Optimized attention kernels
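For a rough sense of how a few of these look with generic open-source tooling (diffusers and torch.compile; the model name, LoRA file, and settings are assumptions, not Krea’s infrastructure), a distilled model can be compiled and run with a handful of steps and no classifier-free guidance:

```python
# Sketch of a few of the listed ideas with generic open-source tooling:
# few-step (timestep-distilled) sampling, a single model evaluation per step,
# compilation, and optional LoRA weight swapping. Illustrative only; names
# and settings are assumptions, not Krea's infrastructure.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")

# "Model compilation": compile the denoiser for higher GPU throughput.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")

# "LoRA loading": swap in a small set of weights to change the aesthetic style.
# pipe.load_lora_weights("path/to/style_lora.safetensors")  # hypothetical file

# Timestep distillation allows 1-4 steps instead of 25-50; setting
# guidance_scale to 0 means one model evaluation per step instead of two.
image = pipe(
    "a low sun on the left of a quiet coastal scene",
    num_inference_steps=2,
    guidance_scale=0.0,
).images[0]
image.save("preview.png")
```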
SIGGRAPH: Are there plans to further explore and enhance GPU-node optimizations to unlock even higher frame rates? If so, what areas are being targeted?
DR: Of course! We just got started. We think there’s a lot of potential around offloading some (light) work from our cluster to local GPUs, network and compression optimizations, moving GPUs closer to users, optimizing CPU-bound processes that still affect GPU nodes, and applying the latest state of the art in inference optimization for diffusion and autoregressive models. And, well, of course, there’s also the “optimization” of simply buying more GPUs!
SIGGRAPH: What advice do you have for other Real-Time Live! submitters looking to showcase their project on stage?
DR: During our Real-Time Live!, we talked to Iñigo Quilez, SIGGRAPH 2025 Real-Time Live! Chair. Not only was he kind and supportive while we were nervous before jumping onto the stage, but I think he gave superb advice that I wouldn’t be so daring as to try to surpass:
“When you go there, you’ll see how time actually flies by and, before you know it, it’s over. So, really enjoy the moment because you’ll have a lot of fun and you’ll remember it fondly for many years to come. And besides, even if there are issues — there’s always issues — the Real-Time Live! audience is quite supportive around live demos. Trust me, we all know how hard it is for a live demo to go perfectly.”
He was wrong, though. The Real-Time Live! audience was not quite supportive — they were the most supportive audience our team ever had the pleasure, privilege, and chance to present to.
Feeling inspired? There’s still time to share your innovation at SIGGRAPH 2025 Real-Time Live! — submit by Tuesday, 8 April!

Erwann Millon (principal engineer responsible for our GPU-based optimizations) is a founding member of engineering staff at Krea.

Titus Ebbecke is a founding member of design staff at Krea.

Diego Rodriguez is a co-founder & CTO at Krea.
Mihai Petrescu is a founding member of technical staff at Krea. (not pictured)