Propel Academic and Industrial Research With Open Datasets

by | 10 July 2024 | Conferences

What if you could train a bot to explore a game map with human-like curiosity? SEED’s AI research uses reinforcement learning to train bots to navigate and play games. Learn more.

This article was authored by Konrad Tollmar, Machine Learning Research Director, Electronic Arts.

Artificial intelligence technologies are famously data hungry. However, the practical and legal challenges of acquiring high-quality data, especially specialized and curated datasets, mean that AI research is often hamstrung.

For Electronic Arts, AI is poised to provide unprecedented advancements in the complexity and richness of video games. AI can revolutionize the speed of iterative development and the creation of new gaming experiences. The emergence of generative AI models is particularly exciting as it can accelerate the creation of game assets, including textures, models, and animations. For example, large language models (LLMs) such as ChatGPT can accelerate development processes and enable players to interact with characters in innovative ways.

Regardless of the specific application, access to comprehensive datasets is essential for research and development. For example, in computer vision, the creation of datasets such as ImageNet has significantly advanced image recognition capabilities by allowing benchmarking and comparison of results. Today, open-source AI models, such as Llama 3 by Meta, contribute to the development of LLMs.

Significant advancement in computer game development crucially requires access to diverse datasets that include 3D meshes, textures, animations, rigging, sound, and voice modalities. Notable examples of such datasets include the Ubisoft La Forge Animation Dataset (LAFAN1) and the Trinity Speech-Gesture Dataset. In computer graphics, the recent CGIQA-6k dataset enables developers to intelligently assess the image quality of NSIs and CGIs. Pixar has also shared the dataset for Motunui island from the movie Moana”, which is intended to allow developers to test new rendering algorithms.

Both academia and industry are increasingly developing and sharing data, emphasizing the importance of high-quality and diverse datasets for training robust AI models. Platforms such as Kaggle and Hugging Face have provided a venue for data scientists and machine learning researchers to share models and datasets. However, the availability of high-quality data is still limited, and the demand for better specialized and curated datasets remains a challenge.

Many AI conferences now have dedicated tracks using shared high-quality datasets. For example, the premiere AI/ML conference NeurIPS has organized its Datasets and Benchmarks track to improve dataset development.

Over the last few years, we at Electronic Arts have become involved in the GENEA Challenge at ICMI. This competition involves teams that build speech-driven gesture-generation systems, followed by a joint evaluation. The power of community-driven research and development is evident in the 2023 edition of the GENEA Challenge, which featured 12 submissions and evaluations in large-scale user studies and more than 10 person-years of work in creating and animating believable characters.

Comparison of animations from the GENEA Challenge 2023

Looking ahead, we see the potential for SIGGRAPH to continue to be the premier venue for sharing advancements in computer graphics and interactive techniques. We hope to see SIGGRAPH continue to share AI/ML content, including data sharing and benchmarking innovations and thought leadership.

If you are interested in these areas of AI and ML, please contact me at ktollmar@ea.com to discuss potential future collaborations on contributing content in these areas to further position SIGGRAPH as the go-to venue for computer graphics and AI and shaping the future of entertainment technologies.

Related Posts