‘I am AI’ Seeks to Make Video Conferencing Better

by SIGGRAPH Conferences | 12 October 2021 | Awards, Conferences, Industry Leaders, Real-Time

s2021_realtimelive_103

Over the past year, how often has a “frozen” (virtually, at least) colleague or friend caused frustration in an otherwise great meeting or discussion? NVIDIA’s SIGGRAPH 2021 Real-Time Live! Best in Show-winning demo, “I am AI: AI-driven Digital Avatar Made Easy” sought to solve that frustration and wowed our virtual audience and judges in the process. We caught up with Ming-yu Liu (director of research, NVIDIA) to learn more about the deep learning-based system.

SIGGRAPH: Share a bit about the background of this project. What inspired the initial research?

Ming-yu Liu (ML): We relied on video conferencing for idea brainstorming and collaboration, and one of my colleagues had a poor internet connection at his place. Sometimes, we’d be talking about an exciting research idea and the connection got lost, then the video feed froze. This caused some frustration. Our research was on synthesizing visual content, so we started to imagine how we could use our research to make video conferencing better. We hypothesized that if we could create a digital avatar with a single photo of a person, we would achieve very low bandwidth video conferencing because we only needed to send the information required to animate the avatar for a video call. Beyond video conferencing, it can be used for storytelling, virtual assistants, and many other applications. (Watch the demo below!)

SIGGRAPH: When it came to executing the idea for your demonstration, talk about the challenges faced and overcome. What was most difficult part to produce/practice? How did you work across locations?

ML: We faced a couple of challenges. Our demo was about using our AI digital avatar creation technology for video conferencing, and we had two main demonstrators. One played the role of the interviewer, and the other played the role of the interviewee. This setting required four important view angles: the two screen views of the two demonstrators’ computers and the two frontal views of the two demonstrators. As these two demonstrators had to be physically located in two different rooms, we had to use multiple cameras and screen captures for filming the demo. We spent a lot of time testing different camera views and transitions just to make the demo more interesting. All of the above required a large team operating in the same location. Fortunately, all the team members were very excited about the demo.

SIGGRAPH: Explain the technologies that were used to produce the end product.

ML: There are three main technologies:

The first is a text-to-speech technology called RAD-TTS. It uses an AI to convert user input texts to speech audios.
The second is an audio-to-CG-face-video technology called Audio2Face. It uses a deep network to convert speech audios to the facial motions of a CG character. The output is a CG face video.
The third is a facial motion retargeting technology that uses a GAN to transfer the facial motion in the CG face video to a still photo of a target person or cartoon avatar. The output is a face video of the target person or cartoon avatar making the motion of the CG character in the CG face video. The AI model creates realistic digital avatars from a single photo of the subject — no 3D scan or specialized training images required.

To learn more about these technologies go here.

SIGGRAPH: What’s next for “I am AI”? What do you see as the future for the technology? Will it be made available publicly?

ML: We are working on improving the technologies in the “I am AI” demo. A version of the audio-to-face conversion technology that animates a face mesh using audio inputs is available in the NVIDIA Audo2Face App.

SIGGRAPH: Tell us about a favorite SIGGRAPH memory from a past conference.

ML: My favorite SIGGRAPH memory was the “The Making of Marvel Studios’ ‘Avengers: Endgame’” session at SIGGRAPH 2019. As a big fan of Marvel superheroes — and a beginner of computer graphics — I was very excited to see the world-renowned VFX studios’ producers share their experiences and journeys in making the “Endgame” movie. I was deeply impressed and motivated to create one amazing technology that could be used in a movie one day.

SIGGRAPH: What advice do you have for someone who wants to submit to Real-Time Live! for a future SIGGRAPH conference?

ML: Do dry runs and try to make each one better than the previous one. Real-Time Live! is the best platform to demonstrate your coolest real-time computer graphics technology. People expect unseen, brand-new technologies, and you only get a few minutes to present it live. As new technologies are often unstable, the dry runs will help you identify issues so that you can improve the robustness of your demo. You also need to work hard on making your audience understand the tech and why it is important from the demo. Watching the recordings of the dry runs helps improve the presentation.

Registration to access the virtual SIGGRAPH 2021 conference closes next Monday, 18 October. Don’t miss your chance to check out other award winners like “I Am AI”! Plus, for free you can hear more from Liu and his team when you sign up for GTC, 8-11 November.

Ming-Yu Liu is a distinguished research scientist and a director of research at NVIDIA. His research focuses on deep generative models and their applications, and his ambition is to enable machine super-human imagination capability so that machines can better assist people in creating content and expressing themselves. Liu likes to put his research into people’s daily lives — and NVIDIA Canvas/GauGAN and NVIDIA Maxine are two products enabled by his research. He also strives to make the research community better and frequently serves as an area chair for various top-tier AI conferences, including NeurIPS, ICML, ICLR, CVPR, ICCV, and ECCV, as well as organize tutorials and workshops in these conferences. Empowered by many, Liu has won several major awards in his field, including winning a SIGGRAPH Best in Show award two times.

Artificial Intelligence Awards Best in Show Deep learning NVIDIA Real Time Live! Real-time demos Real-Time Live! SIGGRAPH 2021 video conferencing Virtual

Related Posts

SIGGRAPH Spotlight: Episode 88 – In Conversation With Women of SIGGRAPH Conversations

SIGGRAPH Spotlight: Episode 88 – In Conversation With Women of SIGGRAPH Conversations

28 March 2025 • by SIGGRAPH ConferencesConferences, Industry Leaders

In this special Women’s History Month episode of SIGGRAPH Spotlight, we welcome Kalina Borkiewicz, Eveline Falcão, and Dawn Fidrick from the Women of SIGGRAPH Conversations (WOSC). They share insights on the impact of women in the SIGGRAPH community, the role of mentorship, and how WOSC has shaped their careers. Tune in for an inspiring discussion on advocacy and empowerment in the industry!

ACM SIGGRAPH, Dawn Fidrick, Eveline Falcão, Kalina Borkiewicz, podcast, SIGGRAPH Spotlight, WOSC

A Path to Smarter, More Effective Designs

A Path to Smarter, More Effective Designs

27 March 2025 • by SIGGRAPH ConferencesConferences, Software, Visual Effects

As we see Gaussian Splatting becoming more important in computer graphics and interactive techniques, we caught up with SIGGRAPH 2024 Birds of a Feather presenter Kedar Deshpande.

Birds of a Feather, Freeman, Gaussian splatting, Kedar Deshpande

MTSU SIGGRAPH Student Chapter: Growing, Connecting, and Shaping the Future of Animation

MTSU SIGGRAPH Student Chapter: Growing, Connecting, and Shaping the Future of Animation

26 March 2025 • by ACM SIGGRAPHACM SIGGRAPH, Conferences, Education, Students

The MTSU SIGGRAPH Student Chapter has grown since its 2012 founding, offering students industry connections, events, and hands-on experiences in animation. The group continues to shape future animators through networking and professional opportunities.

Computer Graphics, SIGGRAPH, Students

Top