96kHz & The Music Industry's Next Digital Supply Chain
June 25th, 2016
Most modern songs are created in digital audio workstations that default to 24-bit wav file format and a 44.1kHz sampling rate. The 44.1kHz sampling rate has been the de facto standard for music distributors since the first commercial CD was released in August 1982 by the Dutch technology company Philips. In 2016, 24-bit wav at a 96kHz sampling rate is becoming the high resolution audio standard for the music industry's new digital supply chain.
The 44.1kHz sample rate was originally chosen for the CD because it is just above the minimum sampling rate necessary to satisfy the Nyquist–Shannon theorem. The Nyquist–Shannon theorem states that in order to faithfully digitize a sound, the sample rate must be at least twice the highest recorded frequency. Technically, the human ear can hear frequencies up to 20kHz. Therefore, the minimum sampling rate must be 40kHz in order to properly reconstruct the signal. In addition, a steep low-pass filter must be placed near 20kHz to remove frequencies above half the sample rate, which a converter would otherwise reproduce incorrectly. This incorrect reproduction of frequencies above the Nyquist limit is known as aliasing. Anti-aliasing low-pass filters for 44.1kHz converters have to be applied very close to the top end of the human hearing range (20-22kHz). Keep in mind that for 96kHz files, anti-aliasing filters are set at 48kHz, far above the limits of human hearing.
For reference, aliasing sounds like a gritty, ghost-note-like resonance or ringing in the high frequencies.
The red source signal requires four samples across the two wave cycles shown (two samples per cycle) in order to be captured properly. The blue line represents the alias: a ringing tone at a lower frequency than the source signal.
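To make the fold-back behavior concrete, here is a minimal sketch (Python with numpy; the tone and rates are just illustrative) that samples a 30kHz tone at 44.1kHz and shows it reappearing as a 14.1kHz alias:

```python
# Sample a tone above the Nyquist limit and watch it fold back.
import numpy as np

fs = 44100              # sample rate in Hz
f_source = 30000        # source tone, above the 22050 Hz Nyquist limit
t = np.arange(fs) / fs  # one second of sample times
x = np.sin(2 * np.pi * f_source * t)

# The folded (aliased) frequency predicted by sampling theory:
f_alias = abs(f_source - fs * round(f_source / fs))
print(f"A {f_source} Hz tone sampled at {fs} Hz aliases to {f_alias} Hz")

# Confirm by locating the FFT peak of the sampled signal.
spectrum = np.abs(np.fft.rfft(x))
print(f"FFT peak lands at {np.argmax(spectrum)} Hz")  # N == fs, so 1 bin == 1 Hz
```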
“As a matter of practice, I always return files to the client at the sample rate that they were created with, regardless of the sample rate I choose to use during mixing and mastering.”
In an ideal world, we would all record using 32-bit floating point broadcast wav files at 96kHz sample rate from beginning to end. If your computer is fast enough, I suggest you give 32/96 a try. However, as an engineer who works primarily in post production (mixing and mastering), I do not have control over the sample rates of the digital audio files that my clients create.
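As a rough guide to what those bit depths buy you, fixed-point PCM yields about 6.02dB of dynamic range per bit; the sketch below (Python, a back-of-the-envelope calculation rather than a measurement) shows the familiar figures:

```python
# Theoretical dynamic range: roughly 6.02 dB per bit for fixed-point PCM.
for bits in (16, 20, 24):
    print(f"{bits}-bit PCM: ~{6.02 * bits:.1f} dB of dynamic range")

# 32-bit float carries a 24-bit mantissa per sample, but its moving
# exponent makes the internal mix bus effectively impossible to clip.
```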
“Up-sampling track files for mixing or mastering is not a substitute for true High Resolution Audio recordings that were created, mixed and mastered at 96kHz, from beginning to end.”
Historically, it was common practice to work at the sampling rate of the source files. If someone sent a 44.1kHz wav file to master, I would master the project at 44.1kHz. If someone sent a 48kHz file, I would master the project at 48kHz. This was primarily due to the fact that sample rate conversion and filtering was less than perfect and created digital artifacts during the conversion process. However, sample rate conversion algorithms have advanced since early digital audio workstations. Modern algorithms are capable of up-sampling files to 96kHz with near-zero artifacts.
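To illustrate how routine this conversion has become, a modern polyphase resampler does the 44.1kHz-to-96kHz trip in a couple of lines. A minimal sketch using Python's scipy and soundfile (the file names are hypothetical):

```python
# Up-sample a 44.1 kHz wav file to 96 kHz with a polyphase resampler.
import soundfile as sf
from scipy.signal import resample_poly

audio, fs = sf.read("source_44k1.wav")   # assumes fs == 44100
up, down = 320, 147                      # 96000/44100 reduced to lowest terms
audio_96k = resample_poly(audio, up, down, axis=0)
sf.write("source_96k.wav", audio_96k, 96000, subtype="PCM_24")
```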
“If a project is delivered to a mixing or mastering engineer at 44.1kHz or 48kHz, should the engineer up-sample the project to 96kHz for mixing?”
For a better perspective on this question, let's look at digital music delivery systems and work our way back to the studios and production rooms that primarily work at 44.1kHz. In February of 2012, the Recording Academy and Apple iTunes (the largest digital retailer of music at the time) worked together to create the “Mastered for iTunes” digital delivery standard. This standard is largely misunderstood, but it creates a method for the mastering engineer to compare what he or she hears in the studio with what the consumer will hear.
Digital distribution chain
First, the standard protects against peak distortion that can be created during the format conversion process used by streaming services and digital retailers like iTunes or Amazon MP3 (WAV -> AAC or MP3). A common approach to protecting against this distortion is to mandate 0.5dB of unused headroom in the master digital audio file. Meaning, the final limiter's peak output level must be set to -0.5dBFS or lower (dependent on program material), leaving empty space at the top of the loudest section of the digital master. This 'empty space' is used to protect against peak distortion that can be encoded into the consumer files created by digital retailers. If your limiter is set to a maximum output level of 0.0dBFS, or even -0.3dBFS, peak distortion can be created in the consumer file. By leaving 0.5dB of headroom in the final master, most digital retailers' encoding software will stay within the maximum loudness (0.0dBFS) of the encoded file, reducing the chance of peak distortion in the consumer file.

The "Mastered for iTunes" applet allows you to perform the conversion process and hear the AAC file before it hits retail. It also includes a terminal command that reports the number of peak distortions that exist in the consumer file. With a limiter set to -0.1dBFS, you will most likely receive thousands of peak distortion points; with the limiter set to -0.5dBFS or lower (dependent on program material), you will receive almost zero.

The picture below shows a wav file with a limiter's output set to a maximum loudness of -0.5dBFS. When the master file is encoded to an MP3 or AAC by the retailer, the codec will create new peaks above the maximum loudness of -0.5dBFS. However, the newly created peaks do not result in distortion because the codec is printing into the remaining headroom of the master file.
Peak distortion created during the format conversion process performed by digital music retailers. The photo above shows amplitude (vertical axis) and time (horizontal axis).
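If you want to sanity-check a master against that 0.5dB of headroom before delivery, a simple sample-peak measurement is a good first pass. A minimal sketch with Python's soundfile and numpy (the file name is hypothetical); keep in mind this reads sample peaks only, and lossy codecs can create inter-sample peaks that a sample-peak meter understates:

```python
# Report the sample peak of a master and flag files above -0.5 dBFS.
import numpy as np
import soundfile as sf

audio, fs = sf.read("final_master.wav")   # floats in the range [-1.0, 1.0]
peak = np.max(np.abs(audio))
peak_dbfs = 20 * np.log10(peak)
print(f"Sample peak: {peak_dbfs:.2f} dBFS")
if peak_dbfs > -0.5:
    print("Less than 0.5 dB of headroom: lossy encoding may create peak distortion.")
```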
Second, the MfiT protocol prefers 24-bit wav, 96kHz sample rate files for AAC encoding. Technically, you can deliver a 24-bit wav, 44.1kHz file to your distributor and it will still be considered "Mastered for iTunes", but 24-bit, 96kHz is preferred.
In my opinion, the MfiT protocol works extremely well across the entire digital supply chain, not just the iTunes marketplace.
I must mention that the MfiT program does not recommend up-sampling a final master that was created at 44.1kHz or 48kHz. However, it does not mention whether one should mix and master at a higher resolution than the source files. I and other engineers (Tom Volpicelli of The Mastering House) have found advantages in the way that plugin algorithms perform when up-sampling to 96kHz for mixing and mastering. I will go further into this topic in my next post.
Mastered for iTunes logo
In February of 2016, The Consumer Technology Association created a classification for “High Resolution Audio” as “better than CD quality”. Therefore, a minimum of 20-bit, 48kHz sampling rate is now considered “High Resolution Audio”. You may ask, why 20-bit and not 24-bit resolution? 20-bit was chosen because of the vast archives of classic songs recorded to 20-bit, 48kHz digital tape in the early 80s. If 24-bit had been chosen as the minimum standard, then a decade's worth of early digital recordings would not qualify as "better than CD quality". Technically, these files are higher fidelity than "CD quality."
Master Quality Authenticated Logo
The High Resolution Audio logo will be available for streaming services in late 2016. It is expected that streaming services will turn the logo on if a song was submitted to the digital supply chain at "better than CD quality" and turn the logo off if the file was submitted at 16-bit, 44.1kHz (CD quality). In addition to the High Resolution Audio standard, streaming services are slowly moving toward high resolution delivery by incorporating the “Master Quality Authenticated” (MQA) encoding and decoding process developed by Bob Stuart of Meridian Audio.
The MQA process allows streaming services to encode and decode 96kHz, 24-bit files at a fraction of the file size. Tidal is expected to adopt the technology by the end of 2016, and other streaming services are showing interest in Meridian's breakthroughs. MQA audio streaming will require a hardware decoder to play back the full-bandwidth 96kHz, 24-bit stream. However, normal playback devices such as an iPhone or laptop will inherently support "CD quality" MQA streams without an MQA decoder. According to the limited info that has been released, MQA will be capable of delivering CD quality audio using roughly the same bandwidth currently used by Apple and Spotify. For an explanation of how it works, read this article or AES Journal #9178. Rumor has it that Meridian has also built a software decoder that will decode an MQA stream into a true 96kHz, 24-bit stream.
At the time this article was written, Apple was considering purchasing Tidal.
High Resolution Audio Logo created by the Consumer Technology Association
As you can see, the largest supplier of music (iTunes) has incorporated a high resolution audio standard with its “Mastered for iTunes” program. Apple is currently amassing the largest database of 24-bit, 96kHz music in the world. The Consumer Technology Association has designated a minimum standard and logo for High Resolution Audio, and it is in the process of licensing the logo to appear dynamically within streaming services. And finally, it appears that Master Quality Authenticated will be incorporated into a modern streaming service in the near future, delivering 96kHz, 24-bit digital audio to the world.
“While many modern productions are still being created at 44.1kHz or 48kHz, digital distribution services are moving to 24-bit, 96kHz as the High Resolution Audio standard.”
Below, I will run a quick test to look into the effects of modern sample rate conversion. My hypothesis is that up-sampling does not negatively affect the sound quality of your 44.1kHz or 48kHz audio; it only moves the anti-aliasing filter out of our hearing range and raises the precision of the data available to the algorithms in your plugins.
For many years, engineers would work at 44.1kHz or 88.2kHz because “it was easy math” to convert from the 88.2kHz recording format to the 44.1kHz delivery format (CD). It was assumed that converting from 44.1kHz to 96kHz would create additional sample rate conversion errors. The conversion from 44.1kHz to 96kHz is known as a “fractional resampling ratio” conversion, while conversion from 44.1kHz to 88.2kHz is known as an “integer ratio” conversion. However, sample rate conversion algorithms have made significant advances in recent years. After extensive testing, I have come to the conclusion that up-sampling to 96kHz provides more sonic benefits for audio processing (mixing and mastering) than drawbacks from the conversion process.
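The ratio arithmetic behind that “easy math” argument is simple to verify; a short Python sketch:

```python
# Integer vs. fractional resampling ratios, reduced to lowest terms.
from fractions import Fraction

print(Fraction(88200, 44100))   # 2       -> integer ratio (interpolate by 2)
print(Fraction(96000, 44100))   # 320/147 -> fractional ratio (interpolate by
                                #            320, then decimate by 147)
# Modern polyphase converters handle the 320/147 case cheaply and cleanly,
# which is why the old preference for "easy math" rates matters far less today.
```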
“Does sample rate conversion alter the sound of digital audio files?”
To test the effects of sample rate conversion, I will take a 24-bit, 44.1kHz wav file called “Source” and put it in a 24-bit, 44.1kHz Pro Tools 12.5 session. I will bounce that single stereo 24-bit, 44.1kHz wav file down to a 24-bit, 44.1kHz wav file and save it as the “CD SR Quality” file. This bounce will not include any plugin processing or volume manipulation. Then, I will take the same “Source” file and import (up-sample) it into a 24-bit, 96kHz Pro Tools 12.5 session. I will bounce to disk as a 24-bit, 96kHz sample rate file and label it “HD SR Quality”. This bounce will not include any plugin processing or volume manipulation, either.
Essentially, I have two identical files bounced from the same source audio file, but at different sample rates. The “CD SR Quality” file was imported and bounced to disk at the source sample rate (44.1kHz), and “HD SR Quality” was up-sampled and bounced to disk at 96kHz. I then imported the “CD SR Quality” and “HD SR Quality” files into a 44.1kHz session. The “CD SR Quality” file did not go through any sample rate conversion, but the “HD SR Quality” file went through a second sample rate conversion from 96kHz back down to 44.1kHz. We are testing the artifacts that are created when up-sampling a 44.1kHz wav file to a 96kHz session and then reducing the sample rate back down to 44.1kHz.
Now that both the “CD SR Quality” and the “HD SR Quality” files are in the same 44.1kHz session, we can phase invert one of the files, sum them and see what remains. If any audio remains, it will be the result of the sample rate conversion process.
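For anyone who wants to reproduce the null test outside of Pro Tools: subtracting one file from the other is mathematically the same as inverting its phase and summing. A minimal sketch with Python's numpy and soundfile (the file names are hypothetical; the two files must be sample-aligned and the same length):

```python
# Null test: subtract one bounce from the other and measure the residual.
import numpy as np
import soundfile as sf

a, fs_a = sf.read("CD_SR_Quality.wav")
b, fs_b = sf.read("HD_SR_Quality_back_to_44k1.wav")
assert fs_a == fs_b and a.shape == b.shape

residual = a - b                 # equivalent to phase-invert and sum
peak = np.max(np.abs(residual))
peak_dbfs = 20 * np.log10(peak) if peak > 0 else float("-inf")
print(f"Residual peak: {peak_dbfs:.1f} dBFS")
```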
As you can see, the remaining audio is an extremely low amplitude sound (-107dBFS) around 22kHz. Through phase inversion amplitude testing, I was able to determine that the 22kHz high frequency sound was in the “CD SR Quality” file, not the “HD SR Quality” file. The 22kHz sound is NOT in the “HD SR Quality” file because that file went through a second conversion process from 96kHz to 44.1kHz and passed through the anti-aliasing filter, which slightly reduced the high frequencies. However, the “CD SR Quality” file did not go through any sample rate conversion or aliasing filters and retains the high frequency information. It appears that the up-sampling process has near-zero effect on the amplitude and frequency distribution of the sound file, but the anti-aliasing filtering that takes place on the down-sample slightly attenuates content around 22kHz.
High frequency noise at 22kHz, -107dBFS
It appears that the up-sampling process simply increases the number of data points in the audio file, does not increase or decrease the precision of the file, and moves the effects of the anti-aliasing filter further above our hearing range.
So, why should we up-sample 44.1kHz or 48kHz source files for mixing and mastering if the up-sampled 96kHz “HD SR Quality” file is nearly identical to the source?