Benchmarking Ray Tracing

Real-time ray tracing promises to bring new levels of realism to in-game graphics. Image source: UL (Underwriters Laboratories).

Ray tracing, as experts know, is a conceptually simple algorithm that can totally consume a processor. But exactly how much a given processor is consumed is an unanswerable question, since it depends on the scene and, of course, on the processor itself. So approximations must be made and parameters fixed to get a consistent comparison. It is then left to the buyer to extrapolate the results to their own situation.
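To see why the cost depends so heavily on the scene, consider a minimal sketch of the core loop (this is an illustrative toy, not any benchmark's actual code): every pixel spawns a ray, and every ray is tested against the scene's objects, so even this naive version does pixels-times-objects intersection tests before any shading, reflection, or refraction is added.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return distance t to the nearest hit, or None for a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c  # direction is unit-length, so the quadratic's a == 1
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

def render(width, height, spheres):
    """Shoot one primary ray per pixel; cost grows as pixels x objects."""
    tests = 0
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel to a ray through a simple pinhole camera.
            dx = (x + 0.5) / width - 0.5
            dy = (y + 0.5) / height - 0.5
            norm = math.sqrt(dx * dx + dy * dy + 1.0)
            d = (dx / norm, dy / norm, 1.0 / norm)
            hit = None
            for center, radius in spheres:
                tests += 1
                t = intersect_sphere((0.0, 0.0, 0.0), d, center, radius)
                if t is not None and (hit is None or t < hit):
                    hit = t
            row.append(1 if hit is not None else 0)
        image.append(row)
    return image, tests

image, tests = render(64, 64, [((0.0, 0.0, 3.0), 1.0)])
print(tests)  # 64 * 64 pixels x 1 sphere = 4096 intersection tests
```

Real renderers replace the inner loop with acceleration structures, but the workload still scales with scene complexity, which is exactly why a benchmark must fix the scene before scores are comparable.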

Ray tracing can be done on three platforms — soon four — and runs on servers, workstations, or PCs, and has been demonstrated on tablets. Non-geometric-based ray tracing is also run on supercomputers in field simulations ranging from optical analysis to nuclear explosions and fusion reactions.

History

At the workstation and server level, in what is known as the professional graphics market, the Standard Performance Evaluation Corporation, or SPEC, has provided professional graphics-level benchmarks since 1988. SPEC is a non-profit corporation whose membership is open to any company or organization that is willing to support the group’s goals (and pay nominal dues). Originally a group of engineers from workstation vendors devising CPU metrics, SPEC has evolved into an umbrella organization encompassing four diverse groups. Though SPEC does not have a benchmark that focuses solely on ray tracing, there are many tests within its application-based benchmarks (especially SPEC/GWPG benchmarks) that test ray tracing functionality.

In 2006, the organization produced its CPU-based ray tracing benchmark built on the Persistence of Vision Raytracer (POV-Ray): 453.povray. The 453.povray benchmark renders a 1280- by 1024-pixel anti-aliased image of a chessboard with all the pieces in their starting positions. The objects were selected to show the various geometry primitives available in POV-Ray. The input file also generates a height field from a simple fractal function, and provides the function for an isosurface object. All objects in the scene have procedural textures that determine their surface appearance. Many of these textures use a variant of the Perlin noise function. Further, some objects refract light, others are highly reflective, and these surface attributes are procedurally defined, too.
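Procedural textures of the kind described above are computed per hit point rather than read from an image, which is part of what makes such a workload CPU-bound. The sketch below shows the general idea with simple value noise and a marble-like pattern; it is a hedged illustration of the technique, not POV-Ray's actual noise implementation (POV-Ray uses a Perlin-noise variant).

```python
import math

def lattice(ix, iy):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return (h & 0xFFFF) / 65536.0

def smooth(t):
    """Smoothstep fade curve, so the interpolation has no visible grid seams."""
    return t * t * (3.0 - 2.0 * t)

def value_noise(x, y):
    """Bilinearly interpolate lattice values with a smooth fade (value noise)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    v00, v10 = lattice(ix, iy), lattice(ix + 1, iy)
    v01, v11 = lattice(ix, iy + 1), lattice(ix + 1, iy + 1)
    sx, sy = smooth(fx), smooth(fy)
    top = v00 + sx * (v10 - v00)
    bottom = v01 + sx * (v11 - v01)
    return top + sy * (bottom - top)

def marble(x, y):
    """Procedural 'marble' shade in [0, 1]: a sine stripe perturbed by noise octaves."""
    turbulence = sum(value_noise(x * 2**o, y * 2**o) / 2**o for o in range(4))
    return 0.5 + 0.5 * math.sin(4.0 * x + 4.0 * turbulence)
```

Because every shaded sample re-evaluates functions like these, a scene full of procedural textures stresses the CPU in a way a simple image-mapped scene would not.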

“Tribute to Myrna Loy” © 2008 Ive

The figure in the above image is Vicky 4.1 from DAZ. Its author, Ive, created it with Blender, using all the images of her he could find as reference. Rendered with POV-Ray beta 25, it uses seven light sources and the area_illumination feature. You can learn more about the POV-Ray free software tool for benchmarking here. Source code and a Windows installer can be found in the project’s GitHub repository, and ongoing development is discussed in the povray.beta-test newsgroup. An unofficial Mac version is also available here.

Phasing Out and In

After several years, SPEC CPU 2006 was phased out in favor of SPEC CPU 2017. Then, in 2018, SPEC released the SPECworkstation 3 benchmark, which includes a totally redesigned storage workload based on traces of nearly two dozen applications — workloads that reflect changes in updated versions of the Blender, HandBrake, Python, and LuxRender applications, including GPU-accelerated workloads based on LuxRender.

Applications that were traced for the new storage workload include 7-Zip, Adobe Media Encoder, Adobe Premiere Pro, Ansys Icepak, Ansys Mechanical, Autodesk 3ds Max, Autodesk Maya, Autodesk Revit, Blender, CalculiX, Dassault Systèmes SolidWorks, HandBrake, LAMMPS, Microsoft Visual Studio 2015, NAMD, the SPECviewperf 13 energy viewset, and the SPECworkstation 3 WPCcfd workload. Accurately representing GPU performance for a wide range of professional applications poses a unique set of challenges for benchmark developers. Applications behave very differently, so producing a benchmark that measures a variety of application behaviors and runs in a reasonable amount of time presents difficulties.

A scene from the updated LuxRender workload.

Even within a given application, different models and modes can produce very different GPU behavior, so ensuring sufficient test coverage is a key to producing a comprehensive performance picture.

Another major consideration is recognizing the differences between CPU and GPU performance measurement. Generally speaking, the CPU has an architecture with many complexities that allow it to execute a wide variety of codes quickly. The GPU, on the other hand, is purpose-built to execute pretty much the same set of operations on many pieces of data, such as shading every pixel on the screen with the same set of operations.
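The GPU's strength, as described above, is applying one identical operation across a large amount of data. A toy illustration of that access pattern (written here as plain Python for clarity; a GPU would run the same kernel across thousands of pixels in parallel):

```python
def shade(pixel):
    """One shading 'kernel': identical instructions for every pixel.

    Here it is a simple gamma adjustment; a GPU executes the same
    operation on many pixels at once, which is what it is built for.
    """
    r, g, b = pixel
    return tuple(round(255 * (c / 255) ** (1 / 2.2)) for c in (r, g, b))

# A tiny uniform "framebuffer": the same kernel is mapped over every pixel,
# with no data-dependent branching between pixels.
framebuffer = [(128, 64, 32)] * 8
shaded = list(map(shade, framebuffer))
```

CPU-oriented code, by contrast, tends to be branchy and irregular, which is why CPU and GPU scores have to be measured and interpreted separately.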

SPECworkstation 3 includes a dedicated suite for measuring GPU compute performance. The ray-tracing test uses LuxMark, a benchmark based on the new LuxCore physically based renderer, to render a chrome sphere resting on a grid of numbers in a beach scene (above).

Ray tracing doesn’t happen in a void. Even in a SPEC test that is predominantly centered on ray tracing, there is a lot of other stuff happening that impacts performance — application overhead, housekeeping, and implementation peculiarities. These need to be considered for any performance measurement to be representative of what happens in the real world.

SPEC also offers viewsets. For example, the maya-05 viewset was created from traces of the graphics workload generated by the Maya 2017 application from Autodesk. The viewset includes numerous rendering modes supported by the application, including shaded mode, ambient occlusion, multi-sample antialiasing, and transparency. All tests are rendered using Viewport 2.0.

Supplier Benchmarks

In addition to some ray tracing software, suppliers offer their own benchmark programs. For example, Chaos Group has a V-Ray benchmark. The V-Ray Benchmark is a free stand-alone application that helps users test how fast their hardware renders. The benchmark includes two test scenes, one for GPUs and another for CPUs, depending on the processor type you’d like to measure, and it does not require a V-Ray license to run.

After tests are complete, users can share the results online and see how their hardware compares to others at benchmark.chaosgroup.com. Chaos Group recommends noting any special hardware modifications that have been made, such as water cooling or overclocking. They also suggest that users looking to benchmark a render farm or cloud try the command-line interface and test without a GUI. V-Ray Benchmark runs on Windows, Mac OS, and Linux.

V-Ray benchmark tests. Image source: Chaos Group.

For PCs, the leading benchmark supplier is the UL (Underwriters Laboratories) Futuremark team. Finland-based Futuremark has been making PC graphics benchmarks since 1997 and, in 2018, announced its ray-tracing benchmark, 3DMark Port Royal, the first dedicated real-time ray-tracing benchmark for gamers. Port Royal can be used to test and compare the real-time ray-tracing performance of any graphics AIB that supports Microsoft DirectX Raytracing, and it uses DirectX Raytracing to enhance reflections, shadows, and other effects that are difficult to achieve with traditional rendering techniques.

As well as benchmarking performance, 3DMark Port Royal is a realistic and practical example of what to expect from ray tracing in upcoming games — ray-tracing effects running in real time at reasonable frame rates with a 2560- by 1440-pixel resolution. It was developed with input from AMD, Intel, Nvidia, and other leading technology companies, and UL worked especially closely with Microsoft to create an implementation of the DirectX Raytracing API.

Port Royal will run on any graphics AIB whose drivers support DirectX Raytracing. As with any new technology, there are limited options for early adopters, but more AIBs are expected to gain DirectX Raytracing support.

Summary

Benchmarking will always be a challenge. There are two classes of benchmarks: synthetic (or simulated) and application-based. SPEC uses application-based benchmarks and UL uses synthetic ones. The workload or script of a benchmark is always subject to criticism, especially from suppliers whose products don’t do well in the tests. The complaint is that the script (of actions in an application-based test) or the simulation (in a synthetic test) doesn’t reflect real-world workloads or usage. There is some statistical truth to that. However, SPEC benchmarks either run on top of actual applications or are developed from traces of applications performing the same work as in the real world. Also, the organizations developing these benchmarks have been doing this work, and only this work, for over two decades, longer than the careers of some of their critics, and with that much accumulated experience they can fairly be considered experts.
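Whether synthetic or application-based, any credible benchmark must make its scores repeatable: a fixed workload, warm-up runs, and a robust statistic across repetitions. A minimal harness sketching that common practice (an illustration of the general method, not any vendor's actual harness):

```python
import time
import statistics

def benchmark(workload, runs=5, warmup=1):
    """Time a fixed workload several times and report the median runtime.

    The warm-up runs let caches and any lazy initialization settle so the
    first measurement does not skew the score; the median resists outliers
    better than the mean.
    """
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Example: a stand-in compute-bound workload (a real benchmark would run
# a fixed rendering scene here).
score = benchmark(lambda: sum(i * i for i in range(100_000)))
```

The disagreements described above are rarely about this mechanical part; they are about whether the fixed workload itself resembles what users actually do.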


Jon Peddie Research is a technically oriented marketing, research, and management consulting firm. Based in Tiburon, California, JPR provides specialized services to companies in high-tech fields including graphics hardware development, multimedia for professional applications and consumer electronics, entertainment technology, high-end computing, and Internet access product development.
