While you may know Ubisoft Massive as the studio behind The Division 2, and the upcoming Star Wars and Avatar games, they are also the creators of the Snowdrop engine. As one of Ubisoft’s most versatile in-house engines, Snowdrop helped create games like The Division 2, Mario + Rabbids Kingdom Battle, Starlink: Battle for Atlas, and South Park: The Fractured But Whole in addition to powering Massive’s two aforementioned upcoming projects.
To find out how the engine is evolving and changing with a new generation of consoles, we spoke with Snowdrop Audio Architect Robert Bantin, whose presentation, “Snowdrop Audio: Latest Tech Developments” at the Ubisoft Developers Conference shed some light on how games could sound much more impactful in the future.
You’re doing a lot of exciting work with sound in the Snowdrop engine, but in layman’s terms, what exactly is a game engine? What are the benefits to having your own proprietary engine?
RB: The best way I can describe it is that a good game engine is like a Lego set. Legos are engineered in a way so that they fit together perfectly. You can give someone a set of Legos and they can piece them together in so many ways. Having our own engine means that not only can you give developers these Legos to build our games with, but they can also develop their own Lego pieces that can then be fed back into the Snowdrop ecosystem for everyone to use.
Oftentimes, proprietary engines are made for a specific type of game, and so what you end up with is a customizable version of that particular game. Snowdrop was built against that tendency: to be truly versatile. Players would never guess that the same engine The Division 2 was made with also made The Settlers, Starlink: Battle for Atlas, and Mario + Rabbids Kingdom Battle. The aesthetics of a game can change with assets, that’s simple enough, but elements like traversal, combat, and navigation can all be underlying systems that persist in an engine and can be transferred from game to game. That way we don’t have to rebuild everything each time – that would not be a great use of our resources.
How have next-generation consoles changed how Snowdrop is able to produce and process audio?
RB: Designers always have to work within certain budgets. How much processing power can we use? How much memory do we have available to us? The increase in all of those obviously allows us to do more, but perhaps the biggest change comes thanks to the solid-state drives that both the PlayStation 5 and Xbox Series X|S are using. With the previous generation of consoles, when you entered into a new area, we would have to pre-emptively load assets into memory, and part of that time spent loading was getting the audio cues and sound effects into the system memory so it could be accessed quickly – sounds that need to play back instantly, like impacts, gunshots and so on. The only things we could safely stream from disk were sounds that didn’t need low latency, like music tracks and ambiences.
When you have an SSD like these consoles have, you begin to realize that you don’t have to load most things into the system memory anymore, because the drives themselves are so fast. What you can end up doing is essentially stream a lot of sounds from the SSD on a need basis. Of course, some sounds that are getting accessed constantly, like gunshots or footsteps, may still be loaded in their entirety into system memory because of how often they occur, but a lot more sounds won’t need to be loaded in fully.
Think of it like Netflix: Instead of having to download an entire movie before watching it, you’re able to stream the movie with only a small amount kept in memory at one time. The main difference in our case is that our sound assets are constituent parts of the whole sound field, and most of those parts won’t need to be fully loaded in now.
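The preload-versus-stream decision Bantin describes can be sketched as a simple policy: latency-critical sounds that fire constantly stay resident in system memory, while everything else is streamed from the SSD on demand. This is an illustrative sketch only; the names, fields, and threshold here are hypothetical and not Snowdrop’s actual API.

```python
# Hypothetical sketch of the hybrid load/stream policy described above.
# All names and the threshold are illustrative, not Snowdrop's real API.

from dataclasses import dataclass

@dataclass
class SoundAsset:
    name: str
    size_bytes: int
    triggers_per_minute: float  # how often gameplay fires this sound
    latency_critical: bool      # e.g. gunshots, impacts, footsteps

def load_policy(asset: SoundAsset, hot_threshold: float = 30.0) -> str:
    """Decide whether a sound stays resident in RAM or streams from SSD."""
    if asset.latency_critical and asset.triggers_per_minute >= hot_threshold:
        return "resident"  # fully loaded into system memory
    return "stream"        # pulled from the SSD in chunks on demand

gunshot = SoundAsset("smg_fire", 200_000, 120.0, True)
ambience = SoundAsset("mall_hvac_loop", 8_000_000, 0.5, False)

print(load_policy(gunshot))   # resident
print(load_policy(ambience))  # stream
```

On spinning disks every latency-critical sound had to be resident; with an SSD the threshold can be pushed much higher, so only the genuinely “hot” assets occupy memory.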
“Ray-tracing” normally refers to how light is reflected and bounced off of environmental surfaces in a game, but when it comes to your work I’ve heard the term “ray-tracing audio” used before. What exactly does that mean? How does that affect player experience?
RB: Well, in the past, we’ve used ray-tracing for generating physically accurate reverberation in indoor spaces. There are thousands of instances of that in The Division 2. I think we shipped with around 2,200, but with a scope to increase that to 5,000 if necessary. All the acoustic ray-tracing was done in a non-real-time process from inside the main Snowdrop development tool and written into the game data that we shipped to players. So those rooms were imprinted with their reverberation impulse data, and the only real-time part was our custom reverb engine that applied the local and neighboring room-reverberation data to the un-reverberated sounds playing out in those rooms.
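The baked approach Bantin describes — each room shipping with precomputed reverberation impulse data that a runtime reverb engine applies to dry sounds — amounts to convolving the un-reverberated signal with the room’s impulse response. This is a minimal pure-Python illustration of that idea, not Snowdrop’s custom reverb engine; a real engine would use partitioned FFT convolution on audio blocks.

```python
# Minimal illustration of baked reverberation: each room carries a
# precomputed impulse response (IR), and at runtime the dry sound is
# convolved with that IR. Pure-Python convolution for clarity only.

def convolve(dry, impulse_response):
    """Discrete convolution: out[n] = sum over k of dry[k] * ir[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for k, d in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[k + j] += d * h
    return out

# A click (unit impulse) played in a room whose (toy) IR contains the
# direct path plus a decaying early reflection:
room_ir = [1.0, 0.0, 0.0, 0.4, 0.0, 0.16]
wet = convolve([1.0], room_ir)
print(wet)  # the click now carries the room's reverberant tail
```

The expensive part — tracing rays to derive each room’s impulse response — was done offline in the Snowdrop tool; only this comparatively cheap application step ran on players’ machines.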
More recently, though, we’ve been given access to the real-time ray-tracing data generated by the GPUs of these latest graphic cards. The fidelity is not quite there for reverberation, but certainly for sound obstruction testing, or window/door propagation, it’s almost free, because the information we need has already been worked out by the graphics renderer. That means we can probably do most of that work now without taxing the CPU at all.
Then there’s Snowdrop audio-specific technology, like the “Slapback” system. It currently works by using physics ray-casts on the CPU to trace where player sound can travel and bounce, and this allows us to have sounds echo off of the environment so that the same actions can sound dramatically different depending on where you are.
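The core of a ray-cast echo like this can be sketched from first principles: the cast reports the distance to the reflecting surface, and the echo arrives after the round trip at the speed of sound. This is a hypothetical back-of-the-envelope sketch, not the actual Slapback implementation.

```python
# Deriving an echo delay from a ray-cast hit distance. Hypothetical
# sketch, not Snowdrop's actual Slapback system.

SPEED_OF_SOUND_MS = 343.0  # metres per second in air at ~20 degrees C

def slapback_delay_seconds(surface_distance_m: float) -> float:
    """Delay between the direct sound and its first reflection,
    assuming source and listener are co-located (a player's gunshot)."""
    return (2.0 * surface_distance_m) / SPEED_OF_SOUND_MS

# A wall 17.15 m away echoes back after about a tenth of a second:
print(round(slapback_delay_seconds(17.15), 3))  # 0.1
```

Casting several such rays in different directions gives a set of delays and directions, which is why the same gunshot can sound tight in a corridor and cavernous in a canyon.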
But this is not done just for “production value”. There are legitimate reasons why we want unique visual spaces to be unique acoustic spaces as well.
A lot of our minds are on movies at the moment, and let’s say our heroes need to swing across a large chasm. Often, the way those scenes are shot, you never actually see the full scope of the setting, but you have a sense that it is an incredibly deep fall because of the echoes you hear in their voices. Their audio is responding to, and in this case, even informing the viewer of their environment to let you know that it’s a dangerous situation. It’s easy to imagine what this might look like in a wide-open hangar or even a dense jungle setting.
The difference for us, working in games, is that the player is in charge of the camera, so we need to do a lot of extra work to marry what you’re seeing with what you’re hearing. If there starts to be a disconnect between those sensory inputs, then your brain tells you it’s not helpful information and you stop paying attention to the less dominant one. Then we may not be able to use sound to imply imagery.
Is this something that you’ll need surround sound or headphones to be able to appreciate?
RB: There are levels of improvement that can be achieved by upgrading to a decent set of headphones, or going all-out on a surround speaker system, but so long as you aren’t relying on the built-in speakers of your computer monitor, most of this audio processing will carry through to a certain degree. What has really helped recently is 3D audio encoding for headphones (HRTF), which is something both Microsoft and Sony now offer out of the box. This allows the player to get some very impressive 3D sound immersion at a very accessible price, and for us, this technology slots into what we’re doing very neatly.
What was the goal in sharing this presentation with other Ubisoft developers at UDC?
RB: Now that we have a certain number of teams using Snowdrop at Ubisoft, it’s important for us to keep sharing internally and show other audio teams that we keep actively improving things with new features they can use to directly upgrade their own projects. That’s what UDC is for!
What excites you most about the capabilities of sound design in the future?
RB: If you look at how the new consoles have been built, each has, in its own way, pushed for better audio facilities.
On the Xbox Series X, this has been through continuity of Microsoft’s Spatial Audio API (which supports Dolby Atmos, for instance) that they added to the Xbox One during its lifetime, but now with loads more horsepower behind it. And this Xbox Series X|S audio tech works identically on Windows 10, so we get a lot of coverage with a relatively small amount of effort.
With the PlayStation 5, Sony has almost duplicated the audio processing pipeline we designed for The Division 2, only on dedicated hardware called the Tempest Engine. This means that we’re likely going to move a lot of CPU-based audio processing over to that system, and of course that then frees up the CPU to do other things. Initially, I noted that one processing block in the middle wasn’t handled by Tempest – the ray-casting part – and that had me confused. But then once I saw that ray-tracing data was available, it suddenly made sense. Sony isn’t forcing you to use ray-tracing data – you can still use very bespoke physics ray-casts on the CPU, and every developer can now decide how they approach that part on a case-by-case basis.
On all next-gen platforms, audio has more options, and now it’s up to us to leverage it. This is what I dream about most nights.