The 3D Sound Revolution: Why the Road to Ambisonics Starts with Binaural Audio

Headtracking technology, also known as positional tracking, is essential to creating virtual reality environments. It detects the position and orientation of your head in space, so that when you move, your view of the VR environment updates to match. Although headtracking first hit the consumer market two decades ago, we have only just begun to create and consume headtracking VR experiences with (more or less) affordable VR cameras and VR headsets.

When it comes to 3D audio, the most popular headtracking audio solution employs a technique called ambisonics. An ambisonic recorder captures sound arriving from every direction, so the resulting soundfield can be reoriented on playback. Some ambisonic microphones use four capsules arranged in a tetrahedron; others integrate as many as 60 capsules. The problem is that these mics are prohibitively expensive for the average consumer. Even as the visual component of VR steadily inches towards the mainstream, creators and consumers alike are still several years away from adopting headtracking audio en masse.
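For the curious, here is a rough sketch of what happens to the four signals from a tetrahedral mic. The raw capsule signals (often called A-format) are summed and differenced into four first-order channels (B-format): one omnidirectional channel W and three directional channels X, Y and Z. The capsule names, the 0.5 scaling and the function name below are assumptions made for illustration; real microphones also apply frequency-dependent correction filters that this sketch leaves out.

```python
from typing import List, Sequence, Tuple


def a_to_b_format(
    flu: Sequence[float],  # front-left-up capsule (assumed layout)
    frd: Sequence[float],  # front-right-down capsule
    bld: Sequence[float],  # back-left-down capsule
    bru: Sequence[float],  # back-right-up capsule
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Combine four tetrahedral capsule signals into W, X, Y, Z.

    This is only the textbook sum/difference matrix. The 0.5 gain is one
    common convention, not a universal standard.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # omnidirectional pressure
        x.append(0.5 * (a + b - c - d))  # front minus back
        y.append(0.5 * (a - b + c - d))  # left minus right
        z.append(0.5 * (a - b - c + d))  # up minus down
    return w, x, y, z
```

A sound that reaches all four capsules equally ends up only in W, while a sound picked up by a single capsule shows up in all four channels, which is how direction gets encoded.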

WHY THE ROAD TO AMBISONICS STARTS WITH BINAURAL AUDIO

Can you name the first virtual reality headset? Probably not. Released in 1996 and discontinued two years later, the Sony Glasstron went the way of the dodo because the everyday consumer usually isn’t ready to adopt a new technology when it first hits the market. We tend to require an intermediate step or two by way of an introductory product. In the case of VR, that validating first step was the GoPro HERO. Priced to move and incredibly easy to use, the GoPro offers a wide-angle view designed to encapsulate the recordist’s POV and spawns content that anyone can consume on a desktop or smartphone, just like the 2D videos we’re used to watching. By the time consumer-friendly VR headsets and cameras began hitting the market, users had already been combining multiple GoPros to shoot 360 video for six years.

Before diving into ambisonics, we’re going to need a GoPro of Sound. Released this past summer, the Hooke Verse represents that crucial first step on the road to ambisonics. The Verse is the world’s first Bluetooth binaural microphone, which captures sound as you actually hear it by employing two microphones spaced to approximate the distance between your ears. When you listen to binaural audio with any ordinary pair of stereo headphones, it produces the incredibly immersive sensation of being in the same exact place where the recording was made. Recording in binaural audio used to be reserved for professionals with big rigs and big budgets. But the Verse is a pair of easy-to-use headphones priced for the everyday user. All you have to do is pair it with your smartphone and press record. You can also connect the Verse to a DSLR, a GoPro, a field recorder, or a mixing board. Whoever listens to your recording will hear exactly what you heard, as though they were listening through your ears.

Binaural audio is far superior to the two-track stereo sound that you’re used to. Granted, ambisonics is even more powerful than binaural, but it requires a VR headset in order to be experienced. The Verse is the perfect introductory product to the new frontier of immersive audio because it seamlessly integrates with the consumer tech products that you already own — a smartphone and a pair of headphones.

Like the GoPro, the Verse constrains the consumer to the recorder’s POV. This is what’s known as passive VR. But if you want to create an interactive VR experience with headtracking audio, there are several software programs that enable you to upmix and render binaural audio recordings into an ambisonic soundfield. They include:

  • Songbird, an open source spatial audio encoding engine from Google that works in any web browser.
  • Facebook 360 Spatial Workstation, which includes plugins for popular audio workstations and a time-synchronized 360 video player.
  • Rondo360, an audio toolset for cinematic VR & 360 video.
  • Longcat, which offers custom audio solutions development for applications ranging from authoring tools and plug-ins to VR and industrial simulation.
  • Superpowered SDK, a completely mobile audio engine for smartphone games, VR and audio apps.
  • Waves Nx, a monitoring plug-in that turns your headphones into a virtual mix room.

And yet, all of these software programs require varying degrees of extra work and pro-level know-how. Placing sounds in space to create a 3D audio experience is kind of like adding color to a black-and-white photograph, only harder. Unlike the Verse, which downloads your binaural recordings directly to a consumer-friendly app, Google’s open source spatial software is entirely code-based. For most of us, using Songbird would literally require learning a new language. Plus, once you’ve rendered your binaural recording into headtracking VR audio, who will get to experience it? Only people who own headsets, which aren’t always readily at hand. Even they still mostly consume VR content on two-dimensional screens, navigating 360 visuals with mouse and thumb.
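To give a feel for what "placing sounds in space" means in code, here is a minimal sketch in Python. It is not the API of Songbird or any of the tools listed above; the function names are hypothetical, and it assumes first-order ambisonics with the traditional 1/√2 weighting on the W channel and azimuth measured counterclockwise from straight ahead. It encodes one mono sample at a chosen direction, then counter-rotates the soundfield when the listener turns their head, which is the core idea behind headtracking audio.

```python
import math
from typing import Tuple

BFormat = Tuple[float, float, float, float]  # (W, X, Y, Z) for one sample


def encode_mono(sample: float, azimuth: float, elevation: float = 0.0) -> BFormat:
    """Place a mono sample at a direction (angles in radians).

    Azimuth is counterclockwise from the front, so positive values are
    to the listener's left. The 1/sqrt(2) gain on W is an assumed convention.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z


def rotate_for_head_yaw(field: BFormat, head_yaw: float) -> BFormat:
    """Counter-rotate the soundfield when the head turns left by head_yaw.

    Only X and Y change for a turn about the vertical axis. W and Z are
    unaffected.
    """
    w, x, y, z = field
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    return w, x * c + y * s, -x * s + y * c, z
```

For example, a sound encoded straight ahead with `encode_mono(1.0, 0.0)` and passed through `rotate_for_head_yaw(..., math.radians(90))` ends up with X near 0 and Y near -1: the listener has turned to the left, so the sound now sits to their right. A full renderer would still need to decode these channels to headphones, which this sketch does not attempt.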

The Verse enables amateurs, professionals and semi-pros to record everything from live concerts to podcasts in immersive 3D audio, and consumers can enjoy that content with the tools that they’re already using on a daily basis. Ambisonics is undoubtedly the wave of the future, but the 3D sound revolution has only just begun. If you want to create an ambisonic experience, start with binaural audio and upmix your recordings with a spatial audio software program first. It might save you the expense of an ambisonic mic that you don’t actually need.