Wwise Optimization: Sample Rate CPU vs. Memory
People say ‘lower sample rates are more efficient’, but what does that mean exactly? And is it true? Or is it only partly true, since an optimization in one area might cost you in another? Let’s find out!
I have always seen a mix of sample rates in audio source files, whether in games that have been around for a very long time or in a client that serves as a game’s launcher, and one reason for this is probably optimization. Another reason might be that the team had no conventional way of implementing sounds, with no guidelines or specs for sample rates. Or perhaps legacy content originally authored at 44,100Hz was never replaced with newer content at a higher rate as 48,000Hz became the standard most platforms now use.
Before I begin, I would like to point out that Wwise was using my system configuration, which outputs at 48,000Hz.
I also tried to test both down-sampling and up-sampling, but Wwise doesn’t seem to recognize 96,000Hz: anything higher than 48,000Hz gets a forced sample rate conversion down to 48,000Hz. So we’ll only be looking at 48,000Hz, 44,100Hz and 24,000Hz (Appendix A).
I guess in the future I could try setting my system output to 44,100Hz and comparing the CPU cost against 48,000Hz, but I would still expect the results to be similar regardless.
My findings were that playing 48,000Hz sources is cheaper on the CPU than playing 44,100Hz or 24,000Hz sources. Although in normal circumstances a sample rate of 44,100Hz would use less CPU than 48,000Hz, since there are fewer samples to process, Wwise most likely performs real-time sample rate conversion to match its 48,000Hz output and avoid pitch shifting, and that conversion carries a CPU cost.
There is, however, a clear difference in how much each sample rate costs in memory.
I used 200 sounds for each playback test. Although the sounds are technically the same asset (a footstep, in this instance), each of the 200 sounds is a unique .wav rather than a copy of the same source .wav, so the setup replicates a bank with 200 unique .wav files of whatever content that may be.
For these tests I made a new Unreal Engine 4 Project and integrated it with a new Wwise project.
I played the events from inside the editor (by pressing the space bar on the .uasset, not in a ‘play in editor’ session with other audio assets running). There was a fluctuating CPU usage of between 0.04% and 0.10% (which I assume is Wwise idling in the editor), so each sample technically includes this fluctuating value.
I used the Profiler for each test to monitor the Audio Thread CPU shortly after the events were called, and the results were collected after each session by connecting to the capture log of each test and taking the CPU reading per Event (Appendix B). I found the CPU results very unreliable when collected at the instant the event was called, as sometimes there would be a CPU usage of 0.03% and sometimes 1.8% (for example), so I collected them right after the command queue size was back to 0KB and all sounds were actively playing. I assume the blend container plays them all consecutively (although very, very quickly, not instantly at the same time?).
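The sampling procedure above can be sketched as a small script. This is only an illustration of the filtering logic, not Wwise code: the reading and queue-size values are made up for the example, and how you would actually export per-frame readings from the capture log is an assumption.

```python
# Average per-event CPU readings, discarding samples captured while the
# command queue was still draining (readings taken at the instant the
# Event fires fluctuate wildly, e.g. 0.03% one frame and 1.8% the next).
def average_cpu(readings, queue_sizes_kb):
    """readings: Audio Thread CPU %, one per capture frame.
    queue_sizes_kb: command queue size (KB) for the same frames."""
    stable = [cpu for cpu, q in zip(readings, queue_sizes_kb) if q == 0.0]
    if not stable:
        raise ValueError("no stable frames captured")
    return sum(stable) / len(stable)

# Example: two frames while the queue drains, then two stable frames.
print(average_cpu([1.8, 0.03, 2.0, 1.96], [4.0, 2.0, 0.0, 0.0]))  # 1.98
```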
I did 100 Events per test as my sample size, and whilst 200 sounds probably wouldn’t trigger at the exact same time in normal gameplay circumstances, I thought this would give better results than comparing just a couple of sounds, or even 10, which would arguably be negligible.
All sounds are unique .wav files that are then built into Wwise banks with the default conversion settings (Appendix C).
AMD Ryzen 5 3600X
nVidia GTX 1660Ti
Unreal Engine 4.22
For Test One I fire 1 Event that has 1 play action that plays a blend container with 200 unique sounds at 48,000Hz. The total size of the converted .wavs is 11,720KB.
For Test Two I fire 1 Event that has 1 play action that plays a blend container with 200 unique sounds at 44,100Hz. The total size of the converted .wavs is 10,760KB.
For Test Three I fire 1 Event that has 1 play action that plays a blend container with 200 unique sounds at 24,000Hz. The total size of the converted .wavs is 5,860KB.
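Those bank sizes track the sample rate linearly, which is what you would expect if the default conversion leaves the audio as uncompressed PCM (an assumption here). A quick sanity check, using only the measured 48,000Hz total as input:

```python
# Uncompressed PCM size is proportional to sample rate, so each bank's
# size can be predicted from the measured 48,000 Hz baseline above.
BASELINE_KB = 11720  # measured total for 200 converted .wavs at 48,000 Hz

def predicted_size_kb(sample_rate_hz, baseline_rate_hz=48000, baseline_kb=BASELINE_KB):
    return baseline_kb * sample_rate_hz / baseline_rate_hz

print(round(predicted_size_kb(44100)))  # 10768, close to the measured 10,760 KB
print(round(predicted_size_kb(24000)))  # 5860, exactly half, matching Test Three
```

The small gap at 44,100Hz (10,768 predicted vs. 10,760 measured) is plausibly header overhead and per-file rounding, but that is a guess.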
Test One reports an Audio Thread CPU average of 1.98% and a total bank size of 11,720KB.
Test Two reports an Audio Thread CPU average of 2.90% and a total bank size of 10,760KB.
An increase of 0.92 percentage points of CPU (a 46.5% relative increase) and a saving of 960KB (8%) when compared to Test One.
Test Three reports an Audio Thread CPU average of 2.78% and a total bank size of 5,860KB.
An increase of 0.8 percentage points of CPU (a 40% relative increase) and a saving of 5,860KB (50%) when compared to Test One.
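The comparisons above can be reproduced directly from the measured averages and bank sizes:

```python
# Relative CPU increase and memory saving vs. the 48,000 Hz baseline (Test One).
tests = {
    48000: {"cpu": 1.98, "kb": 11720},
    44100: {"cpu": 2.90, "kb": 10760},
    24000: {"cpu": 2.78, "kb": 5860},
}

base = tests[48000]
for rate in (44100, 24000):
    t = tests[rate]
    cpu_increase = (t["cpu"] - base["cpu"]) / base["cpu"] * 100
    mem_saving = (base["kb"] - t["kb"]) / base["kb"] * 100
    print(f"{rate} Hz: +{cpu_increase:.1f}% CPU, -{mem_saving:.1f}% memory")
# 44100 Hz: +46.5% CPU, -8.2% memory
# 24000 Hz: +40.4% CPU, -50.0% memory
```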
We can see that when using 44,100Hz sample rate files there is a memory saving of 8% but a CPU increase of 46.5% to process the assets in real time. If the title isn’t too memory-heavy, it might be worth pushing back against the conversion given the CPU trade-off; if CPU usage simply isn’t a problem, then the conversion could be worth it.
We can also see that when using 24,000Hz sample rate files there is a memory saving of 50% with a CPU increase of 40%. This is a much larger memory saving at roughly the same CPU cost as 44,100Hz, but be aware of the frequencies that will be lost when converting to this sample rate. I assume the CPU results of Test Two and Test Three are close because it probably doesn’t matter which sample rate Wwise is converting in real time: it still needs to process them regardless, and any real-time sample rate conversion will incur a similar cost.
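As a reminder of what "frequencies lost" means here: the highest frequency a sample rate can represent is half that rate (the Nyquist limit), so converting down removes everything above it. A minimal check:

```python
# Nyquist limit: the highest representable frequency is half the sample rate.
def nyquist_hz(sample_rate_hz):
    return sample_rate_hz / 2

for rate in (48000, 44100, 24000):
    print(f"{rate} Hz keeps content up to {nyquist_hz(rate):.0f} Hz")
# 24,000 Hz files lose everything above 12,000 Hz, which is audible on
# bright content (footsteps on gravel, foliage) but far less so on low
# rumbles or muffled ambiences.
```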