Introduction
Immersive audio is a big part of video games, but it often comes with a performance cost. In this blog I explore different possibilities to reduce resource consumption and keep your end users' memory and CPU happy!
But first ⬇️
Why Audio Optimization Matters
🎮 Player Experience
- High-quality, well-optimized audio enhances player immersion.
- Poorly managed audio, such as delays or artifacts, can completely break the player experience.
⚙️ Technical Constraints
- Storage: Limited disk space demands efficient file management.
- Performance: Audio decoding and playback require CPU and memory. Improper optimization can lead to audio glitches and frame rate drops.
Audio Optimization Strategies
🗂️ Using Efficient File Formats
Different audio file formats offer trade-offs between CPU load, memory usage, and file size. Choosing the right format ensures efficient audio playback and resource management.
⚡ PCM
PCM (Pulse Code Modulation) is an uncompressed audio format that prioritizes CPU efficiency.
- Pro: Fast for CPU
- Con: High memory usage due to large file size
Since the audio is not compressed, it can be played back with minimal processing, making it ideal for quick-play sounds where latency is critical. However, the large file sizes mean it consumes more memory, so it is best reserved for short, frequently triggered audio like UI sounds or short effects.
🎚️ Audio Channels
Does your audio file really need quad or stereo channels? 🤔
Sounds are often placed spatially in the game engine, which means the engine already handles the panning. Because of this, you can export or convert a lot of your audio assets to mono!
🔊 Mono vs. Stereo Comparison:
- Mono: Saves memory and works well with spatial placement.
- Stereo: Doubles the data size but can add depth when needed.
💡 Tip: Test your sound in mono first—you might find that it works perfectly fine without the extra overhead of stereo!
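To make the mono conversion concrete, here is a minimal sketch of a stereo-to-mono downmix that averages each left/right pair. It assumes interleaved, signed 16-bit samples (L, R, L, R, ...), and the function name is made up for illustration. In practice you would usually let your DAW or middleware conversion settings do this for you.

```python
from array import array


def downmix_to_mono(interleaved: array) -> array:
    """Average each left/right pair of interleaved 16-bit stereo samples."""
    left = interleaved[0::2]   # even indices hold the left channel
    right = interleaved[1::2]  # odd indices hold the right channel
    # The average of two 16-bit values always fits in 16 bits again.
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))


# Hypothetical sample data: three stereo frames.
stereo = array("h", [100, 200, -50, 50, 32767, 32767])
mono = downmix_to_mono(stereo)
print(len(stereo), "samples ->", len(mono), "samples")
```

The mono result holds half as many samples, which is where the memory saving comes from.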
🗑️ Deleting Files
This one is kind of obvious but often overlooked. Does your specific type of door that is only in one level really need 15 variations for opening and closing?
Okay, maybe that's a bit exaggerated...
But maybe 2 variations together with some pitch and volume randomization would've already done the trick!
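As a rough sketch of that idea, the snippet below picks one of two variations and applies a small random pitch and volume offset on every trigger. The clip names, the ranges, and the function itself are assumptions for illustration. Middleware like FMOD and Wwise offers this kind of randomization as built-in settings, so you rarely need to code it yourself.

```python
import random


def randomized_playback(variations, rng, pitch_range_semitones=2.0, volume_range_db=3.0):
    """Pick a variation plus a random pitch ratio and linear gain for one trigger."""
    clip = rng.choice(variations)
    semitones = rng.uniform(-pitch_range_semitones, pitch_range_semitones)
    pitch_ratio = 2 ** (semitones / 12)             # semitones -> playback-rate ratio
    volume_db = rng.uniform(-volume_range_db, 0.0)  # only attenuate, never boost
    gain = 10 ** (volume_db / 20)                   # decibels -> linear gain
    return clip, pitch_ratio, gain


rng = random.Random()
door_sounds = ["door_open_01", "door_open_02"]  # hypothetical asset names
for _ in range(3):
    clip, pitch_ratio, gain = randomized_playback(door_sounds, rng)
    print(f"{clip}: pitch x{pitch_ratio:.3f}, gain {gain:.3f}")
```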
🔀 Streaming vs. Preloading
Preloading loads sounds into memory for quick access, while streaming plays them directly from storage in manageable chunks, balancing performance and memory usage.
🚀 Preloading
What It Does: Loads audio data (from SoundBanks) from the storage drive into memory.
How It Works: Preloaded sounds are available in memory, allowing the CPU to have quick access.
When to Use: Best for short, frequently triggered sounds that require immediate playback, such as gunshots, footsteps, or UI clicks.
Trade-Off: Uses more memory, so careful management is needed to avoid running out of space.
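The memory difference between preloading and streaming can be illustrated with a small sketch. It uses an in-memory byte buffer as a stand-in for an audio file on disk, and the function names and chunk size are assumptions. Real engines and middleware handle this for you, usually through a per-asset setting.

```python
import io


def preload(source) -> bytes:
    """Read the whole asset into memory at once."""
    return source.read()


def stream(source, chunk_size=4096):
    """Yield the asset in fixed-size chunks so only one chunk is held at a time."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


fake_asset = bytes(10_000)  # stand-in for 10,000 bytes of audio data

preloaded = preload(io.BytesIO(fake_asset))
print("preloaded bytes held in memory:", len(preloaded))

largest_chunk = max(len(chunk) for chunk in stream(io.BytesIO(fake_asset)))
print("largest streamed chunk held in memory:", largest_chunk)
```

The preloaded version keeps the full asset resident, while the streamed version never holds more than one chunk, at the cost of repeated reads from storage.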
Audio Middleware Optimization
🧳 Compressing Audio Assets
Compressing audio reduces file sizes and memory usage by encoding sounds into compressed formats such as Vorbis. Middleware like FMOD and Wwise use algorithms to lower bitrates or apply encoding techniques, balancing quality and performance.
It works best for large assets like background music or ambience where minor quality loss is acceptable. However, compression increases CPU usage during playback due to real-time decoding and, if overdone, can result in noticeable quality degradation.
🧰 Key Features:
- Bitrate Control: Adjust bitrates to shrink files while keeping quality loss to a minimum.
- Format Support: Use Vorbis or AAC for efficient playback.
- Platform Optimization: Customize settings for each platform.
- Testing Workflow: Compare compressed and original files.
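To get a feel for what bitrate control buys you, here is a rough estimate comparing an uncompressed clip to the same clip at a fixed bitrate. It assumes a constant bitrate of 128 kbps, which is only an approximation, since encoders like Vorbis typically use a variable bitrate. The function names and example values are made up for illustration.

```python
def pcm_size_mb(bit_depth, sample_rate_hz, duration_s, channels):
    """Uncompressed PCM size in megabytes (1 MB = 1,000,000 bytes)."""
    return bit_depth * sample_rate_hz * duration_s * channels / 8 / 1_000_000


def compressed_size_mb(bitrate_kbps, duration_s):
    """Estimated size at a constant bitrate, in megabytes."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


# A 30-second, 24-bit, 48 kHz stereo clip versus the same clip at 128 kbps.
pcm = pcm_size_mb(24, 48_000, 30, 2)
compressed = compressed_size_mb(128, 30)
print(f"PCM: {pcm:.2f} MB, 128 kbps: {compressed:.2f} MB")
```

With these assumed numbers the compressed clip takes 0.48 MB instead of 8.64 MB, which is why compression pays off most on long assets like music and ambience.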
🎙️ Virtual Voices
Middleware tracks inaudible sounds in a lightweight state (virtualized) until they become audible or important in the scene. This reduces resource usage by avoiding full processing of non-critical sounds. It's an efficient solution for managing distant sounds, ambient effects, or audio that doesn't need constant playback.
Why Use It?
- Reduces CPU usage for ambient and distant sounds.
- Ensures critical sounds always take priority.
🔍 Key Features:
- Voice Tracking: Tracks inactive sounds without processing.
- Voice Prioritization: Ensures critical sounds play first.
- Voice Stealing: Demotes low-priority sounds when needed.
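The idea behind these features can be sketched in a few lines. This is a simplified illustration and not the FMOD or Wwise API: the `Voice` class, the `update_voices` function, the threshold, and the voice limit are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    name: str
    priority: int          # higher means more important
    volume: float          # 0.0 to 1.0 after distance attenuation
    virtual: bool = False  # True means tracked but not processed


def update_voices(voices, max_real_voices, audible_threshold=0.01):
    """Virtualize inaudible voices, then keep only the top-priority ones real."""
    candidates = []
    for voice in voices:
        if voice.volume < audible_threshold:
            voice.virtual = True  # voice tracking: too quiet to be worth processing
        else:
            candidates.append(voice)
    # Voice prioritization: most important (then loudest) voices come first.
    candidates.sort(key=lambda v: (v.priority, v.volume), reverse=True)
    for index, voice in enumerate(candidates):
        # Voice stealing: anything beyond the limit is demoted to virtual.
        voice.virtual = index >= max_real_voices
    return [v for v in voices if not v.virtual]


voices = [
    Voice("gunshot", priority=10, volume=0.9),
    Voice("footstep", priority=5, volume=0.5),
    Voice("distant_bird", priority=1, volume=0.001),
    Voice("wind", priority=2, volume=0.3),
]
real = update_voices(voices, max_real_voices=2)
print("real voices:", [v.name for v in real])
print("virtual voices:", [v.name for v in voices if v.virtual])
```

Calling `update_voices` again every frame lets a virtual voice become real as soon as it gets loud enough or a slot frees up.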
🌳 Procedural Audio
Instead of using long static ambience loops, consider generating soundscapes with shorter, randomized sounds placed dynamically around the player.
Example: Rather than looping a forest ambience, you could play bird calls, rustling leaves, and distant water sounds at varying intervals and spatial positions.
This approach reduces memory usage by avoiding large looped files while adding variety and realism to the environment. Layering or randomizing these smaller sound elements can also make the audio feel more alive and reactive, enhancing player immersion.
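A minimal sketch of such a scheduler is shown below. It only generates a list of events (time, sound, position around the player) and leaves the actual playback to your engine. The sound names, gap range, and radius are assumptions for illustration.

```python
import math
import random


def schedule_ambience(sounds, duration_s, rng, min_gap_s=1.0, max_gap_s=5.0, radius_m=20.0):
    """Return (time, sound, (x, y)) events at random intervals around the player."""
    events = []
    t = rng.uniform(min_gap_s, max_gap_s)
    while t < duration_s:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        distance = rng.uniform(radius_m * 0.25, radius_m)
        position = (distance * math.cos(angle), distance * math.sin(angle))
        events.append((t, rng.choice(sounds), position))
        t += rng.uniform(min_gap_s, max_gap_s)  # wait a random time before the next sound
    return events


rng = random.Random()
forest = ["bird_call", "rustling_leaves", "distant_water"]  # hypothetical one-shots
for time_s, sound, (x, y) in schedule_ambience(forest, 20.0, rng):
    print(f"{time_s:5.1f}s  {sound:<16} at ({x:6.1f}, {y:6.1f})")
```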
If you are interested in this topic, you can read more about it on another blog I wrote: How to Eliminate Repetitive Ambience (Coming Soon...)
Tips and Tricks
🖥️ Profiling / Monitoring
Using middleware profiling tools to track CPU, memory, and polyphony during gameplay is very powerful. This gives you good insight into which events are taking up a lot of performance. FMOD and Wwise both have very capable profiling tools.
🧮 Calculating your file size
Fun fact: You can actually calculate the size of your uncompressed PCM files by doing this calculation:
- Audio file size (in bits) = Bit Depth × Sample Rate × Duration of Audio × Number of Channels.
- 69,120,000 bits = 24 bits/sample × 48,000 Hz × 30 seconds × 2 channels.
Convert from bits to MB:
- Audio file size = 69,120,000 bits × (1 byte / 8 bits) × (1 Megabyte / 1,000,000 bytes)
- Audio file size = 8.64 MB (Megabytes)
So your 30-second stereo ambience WAV file would be about 8.64 MB!
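If you want to run this calculation for many assets, it fits in a small helper. This is a minimal sketch: the function name is made up, and it ignores the WAV header, which adds only a few dozen bytes.

```python
def pcm_size_mb(bit_depth, sample_rate_hz, duration_s, channels):
    """Uncompressed PCM size in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bit_depth * sample_rate_hz * duration_s * channels
    size_bytes = bits / 8
    return size_bytes / 1_000_000


stereo_mb = pcm_size_mb(24, 48_000, 30, 2)
mono_mb = pcm_size_mb(24, 48_000, 30, 1)
print(f"stereo: {stereo_mb:.2f} MB")  # stereo: 8.64 MB
print(f"mono:   {mono_mb:.2f} MB")    # mono:   4.32 MB
```

The mono version of the same file comes out at half the size, which ties back to the point about audio channels.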
Conclusion
Audio optimization is about finding the right balance between quality and efficiency. By understanding formats like PCM and Vorbis, leveraging middleware tools, and adopting smart strategies, you can deliver immersive audio experiences without compromising performance.