An In-Depth Conversation About Plugins With Nikolay Georgiev from Acustica Audio

Nikolay Georgiev is a freelance sound engineer and producer based in London. He also represents Acustica Audio in the UK and has worked on many of their plugins, such as Navy, Lime, EQ A in Pink and the newly released Cream. He is heavily involved with the UK section of the AES (Audio Engineering Society); from 2011 until February 2017 he was a member of the British Executive Committee, which he chaired in 2015.

Nikolay also develops his own third-party plugins for Acustica Audio and is currently sampling the most interesting spaces and outboard gear he can find.

This interview takes a slightly different route from the others featured here and focuses on the technical side of making plugins. You will learn how plugins are made, what an impulse response is, how dynamic convolution works, how Acustica Audio compares to Slate, and much more. It's really interesting and I hope you learn as much from it as I did.



In broad terms, how do you sample a room/space or a piece of outboard gear, as in, how does it get from test tones to the end-user being able to put their music in the plugin?

You first need to excite the system in a specific way and capture its behaviour. The software we have allows you to sample the character of almost any audio system, as long as you can record some specific test tones through it and there is no pitch-shifting present. We can sample circuit distortions, EQs, reverbs, compressors, flangers, panners, tremolos and others, even software. If you are after the frequency and phase responses, and any resonances created by the system, you can simply use an impulse. You record the impulse through the system and on the output you get the result, which is called an impulse response (IR). This new tone contains the information on how the system has affected the original pure impulse. Then all you need to do is convolve your music with the captured impulse response. This is the basis of how classic convolution works (e.g., your common convolution reverb). This approach can deliver very good results for linear systems that exhibit no harmonic distortion, and if you are happy to ignore any changes resulting from different gain staging. You can use this method to sample an EQ curve, the phase/frequency response of a circuit, or a reverb. If what you want to sample is software, you can simply bounce your test tones through it.
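The classic static convolution described here can be sketched in a few lines of numpy. The IR below is a made-up toy example, not a real measurement:

```python
import numpy as np

def convolve_with_ir(signal, ir):
    """Classic static convolution: filter audio with a captured IR."""
    return np.convolve(signal, ir)

# Toy IR: a direct spike plus one quieter reflection 4 samples later.
ir = np.zeros(8)
ir[0] = 1.0
ir[4] = 0.5

# Feeding a pure impulse through the "system" returns the IR itself,
# which is exactly what measuring an impulse response relies on.
dry = np.zeros(8)
dry[0] = 1.0
wet = convolve_with_ir(dry, ir)
```

In a real measurement, `dry` would be the test tone played through the hardware and `ir` the recorded result; here the roles are reversed purely to show the principle.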

If you are sampling a real space, you go out of your converter to a loudspeaker, which transduces the electrical energy to mechanical energy and the mechanical energy to acoustic energy. With the help of a microphone you go back to electrical energy, and finally the converter saves the signal as numbers. Because you are working within an acoustic medium you can actually avoid the use of a loudspeaker, giving you a greater choice of test tones (such as balloon bursts or a starter pistol, which approximate a pure impulse). But the test tones vary depending on what you are sampling. For compressors, for example, the test tones can be of numerous different types and the procedure can be quite complex. However, the most common test tone for many devices is the sine sweep: a pure sine wave that rises in frequency, starting from, say, 1 Hz and going up to 20-30 kHz or more. Ultimately, this sweep will be transformed into an impulse response. When you run the sweep through a system it changes, and it will give you information about the frequency response, phase deviations and any resonance effects added by the system. On top of this, if any harmonics are generated, they will create individual sweeps that lie separately on top of the original fundamental sweep. For example, for a 1 kHz fundamental you will see harmonics at 2 kHz, 3 kHz, 4 kHz etc., and each will be part of a separate sweep. Through a process called deconvolution, you turn these sweeps into individual impulse responses and apply their properties to an input signal. In our case, we use Volterra series nonlinear convolution, which allows us to model the harmonics dynamically, so you need lots of impulses sampled at various gain levels.
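A sweep of this kind is easy to generate. This is a sketch of an exponential (logarithmic) sine sweep; the start/end frequencies, length and sample rate are illustrative values, not anyone's production settings:

```python
import numpy as np

def exp_sweep(f1, f2, duration, sr):
    """Exponential sine sweep rising from f1 to f2 Hz over `duration` seconds."""
    t = np.arange(int(duration * sr)) / sr
    k = np.log(f2 / f1)
    # Phase of an exponentially rising instantaneous frequency.
    phase = 2 * np.pi * f1 * duration / k * (np.exp(t / duration * k) - 1)
    return np.sin(phase)

# 5-second sweep from 20 Hz to 20 kHz at 48 kHz (illustrative settings).
sweep = exp_sweep(20.0, 20000.0, 5.0, 48000)
```

Played through the device under test and recorded back, a sweep like this is what later gets deconvolved into an impulse response.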

Sine sweep spectrogram of a valve circuit: from left to right is time in seconds, and from top to bottom is the frequency spectrum in Hz/kHz. Where you see brighter colours there is more energy; darker means less. You can clearly see the original sweep at the bottom and the generated harmonics above it, with plenty of them in the low-frequency range.

Spectrum analysis at 1 kHz of the same sweep as shown above.

What is an impulse? 

If you think about the digital domain, you have a bunch of samples at minus infinity (pure silence), followed by a single sample at 0 dBFS, and then the next sample is again at minus infinity. That's a pure impulse: nothing, a burst of energy, then nothing again. If you measure a pure impulse you will see that you have a full frequency response up to Nyquist, perfect phase at every single frequency, and no ringing or echo before or after the impulse. And all frequencies will be at equal level.
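This property is easy to verify numerically. A minimal numpy sketch, checking that a single full-scale sample has a perfectly flat spectrum:

```python
import numpy as np

n = 1024
impulse = np.zeros(n)
impulse[0] = 1.0           # one sample at full scale, silence elsewhere

# Magnitude spectrum from DC up to Nyquist: identical energy in every bin.
spectrum = np.abs(np.fft.rfft(impulse))
```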

A pure impulse – you can think of it as a perfect transient. In this case the frequency bandwidth will be limited by the sample rate of the digital system (44.1 kHz, 96 kHz, etc.)

From the spectrogram of the impulse it is clearly visible that it has equal energy through the entire audio range.

In case you are not interested in the harmonic distortion of a system, and in having separate IRs for the harmonics, you can use a single IR to sample an EQ, a circuit or a room. However, the problem is that you can't record an impulse at a high enough amplitude without clipping, so the signal-to-noise ratio will often be poor. For this reason you don't usually use impulses to sample a space or an EQ; instead, you use a long sweep and derive the impulse response of the system from that. But it all depends – for rooms you need a really high SPL so that you can get well above the surrounding noise, so instead of a speaker you can, for example, use a starter pistol. You could also burst a balloon or a condom. The idea is to excite the space with something really short, flat in frequency and with a really high SPL. For spaces it should ideally also radiate the sound wave equally in all directions – that is, it should be an omnidirectional source. Each method has advantages and disadvantages; balloons, for example, are not ideal because they work by rupture and have resonances, but they are safe to use in public spaces.

That’s why, especially with electrical systems, using sine sweeps is more beneficial: it allows you to go slowly through the entire spectrum, and through the process of deconvolution you can create an impulse response with the added benefit of a lower noise floor.

Furthermore, sine sweeps will give you the frequency and phase response of a device and also any added echo or resonance – not just from rooms, but also from devices such as EQs, valve circuits and transformers. Single impulse responses work well for reverbs, but for devices where you want the harmonic distortion you want to use sine sweeps.
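In the simplest (linear) case, deconvolution can be sketched as a regularised spectral division: divide the spectrum of the recorded output by the spectrum of the test tone to recover the system's IR. The toy "system" and the regularisation constant below are illustrative, and real sweep deconvolution (e.g. Farina's inverse-filter method) is more involved:

```python
import numpy as np

def deconvolve(recorded, test_tone, eps=1e-12):
    """Estimate a system's IR from its recorded response to a test tone."""
    n = len(recorded) + len(test_tone) - 1
    Y = np.fft.rfft(recorded, n)
    X = np.fft.rfft(test_tone, n)
    # Regularised spectral division guards against near-zero bins.
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)

# Sanity check against a known toy system: a direct path plus one echo.
true_ir = np.zeros(16)
true_ir[0], true_ir[5] = 1.0, 0.3
tone = np.random.default_rng(0).standard_normal(256)   # broadband test tone
recorded = np.convolve(tone, true_ir)
estimated = deconvolve(recorded, tone)
```

For a linear system the recovered `estimated` matches the original IR; harmonics, as described above, show up as extra pseudo-IRs that a linear model like this one simply cannot separate.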

Spectrogram of an IR of an audio transformer. It can clearly be seen that there is an echo added by the resonances on the transformer. In this case it is about 200ms long.

What separates a good plugin from a bad plugin?

The way Acustica does it is through the method of Volterra series nonlinear convolution, and no one else does it this way. Others mathematically model the behaviour of capacitors, inductors, transformers, transistors, etc., and then build a system where these components interact with each other in a similar way to how they would interact within a real circuit. This method is much less CPU-demanding and, to my knowledge, is therefore preferred by most companies. However, your code needs to be very optimised to do what Acustica is doing. It's not an easy thing. Similar systems have also existed in the past, such as Sintefex™ and Focusrite's Liquid Channel™. These were based on dynamic convolution™, which works in a similar way to what Acustica does, but is actually different and sounds different.

To explain dynamic convolution™, let's first look at normal static convolution – think of Logic's Space Designer, which is built using this method. It processes each incoming sample of your music with an impulse response, but it's static; that is, it doesn't matter what the input level of your music is, it will always use the exact same impulse response. So, if one sample is at -10 dBFS and the next one is at 0 dBFS, there will be a 10 dB difference in input and output level, but the impulse response used to process these two samples will be identical.

Dynamic convolution™, on the other hand, will consider the input level and use a different impulse response for different input levels of the processed signal. Also, the level and character of the harmonics will vary depending on the input level, in the same way as on the device I'm sampling right now. You can see that the bass frequency response changes depending on the input level. When you run it hot (not clipping) the bass frequencies get squashed and flattened out, but when you run it at lower input levels you get a bump in the lower frequencies. If you want to replicate this and you are replicating 10 harmonics, you need a system that knows that at this specific input level, -10 dBFS, the plug-in should use this exact set of 10 impulse responses. At the next sample, at for example 0 dBFS, there's another set of impulse responses that you will need to use.

However, the problem here is that if your music changes rapidly in level and you do the convolution sample by sample, the transitions between these sets of impulse responses can create another kind of distortion, which is something you don't want because it can deteriorate the sound. So, instead of switching instantaneously between different samples and sets of impulse responses, we work in blocks. For example, 30 samples will be processed through one set of impulse responses and the next 30 samples through another set. Between these two blocks, there is a way to interpolate and make a smooth transition.
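The block-based idea can be sketched as follows. This is my own simplified illustration, not Acustica's implementation: each block picks the IR "measured" nearest to its level, and overlapping windowed blocks approximate the smooth transition. The block size, levels and single-tap "IRs" are invented placeholders:

```python
import numpy as np

def pick_ir(chunk, irs_by_level, levels):
    """Choose the IR measured at the level nearest the chunk's peak (dBFS)."""
    peak = np.max(np.abs(chunk)) + 1e-12
    level_db = 20 * np.log10(peak)
    return irs_by_level[levels[np.argmin(np.abs(levels - level_db))]]

def dynamic_convolve(signal, irs_by_level, block=30):
    levels = np.array(sorted(irs_by_level))
    ir_len = max(len(ir) for ir in irs_by_level.values())
    out = np.zeros(len(signal) + ir_len - 1)
    window = np.hanning(block)          # crossfades the 50%-overlapped blocks
    for start in range(0, len(signal) - block + 1, block // 2):
        chunk = signal[start:start + block]
        wet = np.convolve(chunk * window, pick_ir(chunk, irs_by_level, levels))
        out[start:start + len(wet)] += wet
    return out

# Invented example: "IRs" sampled at two input levels (here plain gains).
irs = {0.0: np.array([1.0]), -20.0: np.array([0.5])}
music = np.random.default_rng(1).standard_normal(300)
processed = dynamic_convolve(music, irs)
```

The overlapping Hann windows are what stand in for the interpolation between blocks; switching IRs on hard block boundaries would click, which is exactly the distortion described above.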

Slate put out this video comparing Acustica Audio’s Volterra Kernel technology to their algorithmic approach, trying to prove that both are equally good and equally capable as modelling methods. What are your thoughts on this? 

First of all, the two methods sound different and each one has its advantages and disadvantages. Our method is heavier on the CPU, but I think it's more accurate and sounds better. There's another thing I'd like to mention – you are not going to hear me say that any plugin represents 100% the sound of the hardware it is modelled after. OK, convolution in theory should be able to reproduce a linear system perfectly, but most devices are nonlinear. So the sound of a plug-in is always slightly different from the original system sampled, and therefore any plugin is different from the hardware. We try to stay open about this and admit it rather than telling people they got a perfect replica of the hardware. I think plugins are getting closer and closer to the hardware, but I still think the hardware has a little bit of an edge when it comes to sound quality. Again, there are advantages and disadvantages, such as workflow and other things, but if we are talking purely about sound, unless you are doing classical music, specific types of jazz or mastering, where you may want the cleanest possible sound, analogue gear still sounds better to me.

Is the discussion of analogue vs digital still relevant?

Yes, it is. In terms of sound I think it's getting less and less relevant, but sometimes analogue is still better. For example, Acustica released a replica of the D.W. Fearn VT-5 EQ, and the plugin sounds absolutely amazing, but then you try the hardware and it's just a tiny, tiny bit better.

Another thing that separates hardware from plugins is something I discovered when working on the Acustica Cream plugin, for which I sampled 24 channels of a vintage British console. I found that some of the channels have an almost identical frequency response and harmonic distortion levels, but their phase differs at certain frequencies. For one, this makes each channel sound different, and these differences may be obvious to a trained ear. Some might wonder if this is bad, but let's say you have a kick drum on channel 1 and a bass guitar on channel 2 – does it matter that there is a phase difference? The channels will be coloured in slightly different ways, but your signals are not correlated, as they come from two different sources. However, with a stereo source there is a very big difference, because all of a sudden you have correlated material. Your left and right channels share a lot of the same information, so when there is a difference in phase between the two channels, you suddenly de-correlate the left and right channels further. The result, and you can hear it, is that if you have a mono bass panned right in the centre, that bass is no longer a dot in the middle; it expands a little bit to the sides of the stereo spectrum and sounds a little bit “stereo”. And if you have a stereo source, such as overheads, all of a sudden it starts to sound a bit wider. People might say, “You know what, I ran my music through this analogue thing and now it sounds bigger”, and I think this is one of the reasons for it.

So, when we are modelling our plugins we extract this from the hardware and we deliver the same phase response. As stated above, this has a big impact on the sound quality, and I think that's why our technology has a slight edge over others.
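The widening effect of an inter-channel phase difference is easy to demonstrate numerically. In this sketch a first-order allpass filter (flat in magnitude, frequency-dependent in phase) stands in for one console channel's phase deviation; the coefficient 0.4 is an arbitrary illustrative value:

```python
import numpy as np

def allpass(x, a):
    """First-order allpass: y[n] = a*x[n] + x[n-1] - a*y[n-1].
    Leaves the magnitude spectrum untouched; shifts only the phase."""
    y = np.zeros_like(x)
    xm1 = ym1 = 0.0
    for n in range(len(x)):
        y[n] = a * x[n] + xm1 - a * ym1
        xm1, ym1 = x[n], y[n]
    return y

rng = np.random.default_rng(0)
mono = rng.standard_normal(8000)        # a dual-mono "centre" source
left = mono
right = allpass(mono, 0.4)              # phase-shift one channel only

c_identical = np.corrcoef(left, mono)[0, 1]   # fully correlated
c_shifted = np.corrcoef(left, right)[0, 1]    # noticeably de-correlated
```

The per-channel energy is essentially unchanged, but the lower inter-channel correlation is exactly the "no longer a dot in the middle" effect described above.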

You sampled almost the entire Acustica Audio library, such as Navy, Lime, EQ A in Pink and Cream. Could you elaborate a bit more on what your work entailed?

There’s a lot of testing. Let’s say you want to sample a channel strip from a console, and let’s say you are only interested in a stereo version, that is, sampling two channels. The first thing you want to do is run short test tones and measure all the channels to see which ones have too much distortion and noise, and then find the ones that are in good shape. After that, you want to run music through them and listen to every single channel that is in good shape, and make a decision on which ones you want to sample. Maybe you are going to use the compressor from one, the circuit distortion from another, the high-shelf EQ from a third and the low-shelf from a fourth channel. You need to do a lot of listening, so it’s not just measurements.

The gain staging is also very important. If you want to sample a console, you have lots of gain stages. You have your line-level input gain, then maybe a makeup gain on the compressor, the channel fader and the mix-bus fader. Also, if you are using the groups, you have the group trim or gain. If you are sampling the entire path you have to decide where you want to bring the level up and trim it down so it doesn’t clip the converter. Is it going to be at the channel fader or the mix out? You have to do a lot of listening, and it comes down to an aesthetic decision.

State of the Ark Studios in Richmond

Why are compressors much more difficult to sample?

There are more things to sample and more things that can go wrong. Some are not so difficult, but some are very program-dependent, as in their attack and release may vary depending on how you set them up, so just because you set the attack time to 30 ms doesn’t mean it will be 30 ms. I have seen behaviours of compressors that have just shocked me. For example, you set a 3-second release, which is a very long release, but depending on the circumstances you can get up to 30 seconds of release, which you would think must be a joke. But I’ve seen it and measured it. What you need to do is build a dynamic map of all kinds of levels and settings, as well as how the attack and release curves behave and how they change.

You said when working with low-end in reverbs, “It is one thing to use an HPF, and a very different matter to reduce the length of the lows, by manipulating the Impulse response”. What’s the benefit of this approach?

If you have seen a spectrogram of a handclap, impulse response or snare hit recorded in an acoustically live space, you will know that almost every room naturally resonates longer in the low frequencies and shorter in the high frequencies. You may have an RT60 in the high frequencies, for example at 5 kHz, of 0.2 seconds, but at 50 Hz you may have an RT60 of 2 seconds. Sometimes it’s worse than that. For example, I just recorded some music in a lovely wooden space where the RT60 at 1 kHz was roughly 1 second, which is pretty good and allows for great flexibility. But at 100 Hz it was something like 5 seconds. That can be a problem, because as soon as you record drums there, or as we did, a cajón, you get a lovely close-miked sound, but when we recorded the room sound (by having speakers play back the close recording and miking the room) the low-frequency reverberation just completely flooded the mix and blurred the definition of the recording.

But if you want to keep a bit of the low-end reverberation, because it gives nice body to the instrument, how can you solve this? If you high-pass it, you lose the low end, but if you don’t high-pass you still have this blur in the low end, so you have to compromise. What I do when I work with real spaces is split my reverb (room mics) into two tracks using a crossover – one for the highs and one for the lows. Then I put an expander on the low-frequency channel to duck the tail of the low frequencies. Doing it this way gives me a fader to control how loud the low end of the reverb is in the mix. When I make the impulse responses I can achieve the same thing, so you don’t have to do it in your mix. Again, I use a crossover to split the low from the mid and high frequencies of the impulse response, and that allows me to fade out the low end. The sound does change a bit because you go through a crossover, but the low frequencies are shorter, and in this way you can get away with adding a lot more reverb to your source without cluttering your mix.
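The IR-manipulation version of this trick can be sketched as follows. This is my own simplified illustration, not Nikolay's actual process: a brick-wall FFT crossover splits the IR into two bands, and a linear fade shortens only the low band. The crossover frequency and fade time are illustrative values:

```python
import numpy as np

def split_bands(ir, sr, xover_hz):
    """Split an IR into low/high bands with a brick-wall FFT crossover."""
    spectrum = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(len(ir), 1 / sr)
    low_spec = np.where(freqs < xover_hz, spectrum, 0)
    high_spec = spectrum - low_spec        # the two bands sum back exactly
    return np.fft.irfft(low_spec, len(ir)), np.fft.irfft(high_spec, len(ir))

def shorten_lows(ir, sr, xover_hz=200.0, fade_s=0.3):
    """Fade out only the low band of the IR, leaving the highs untouched."""
    low, high = split_bands(ir, sr, xover_hz)
    t = np.arange(len(ir)) / sr
    env = np.clip(1.0 - t / fade_s, 0.0, 1.0)   # linear fade on the low tail
    return low * env + high

# Toy IR: one second of exponentially decaying noise as a stand-in reverb.
sr = 8000
rng = np.random.default_rng(2)
ir = rng.standard_normal(sr) * np.exp(-np.arange(sr) / (0.5 * sr))
shortened = shorten_lows(ir, sr)

# The low band of the processed IR dies out faster than the original's.
low_before, _ = split_bands(ir, sr, 200.0)
low_after, _ = split_bands(shortened, sr, 200.0)
```

A real implementation would use a gentler crossover (the brick-wall split rings in time), but the principle is the same: the low band gets its own, shorter decay envelope instead of being high-passed away entirely.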

Will this be a part of the new set of your plugins you are working on?

Yes, this is one of the features I envisioned for the plugin I want to create. I have sampled a lot of spaces already, but to do it the way I want takes a lot of work and time. However, I am getting there and there will be something really nice out soon. Best follow me on Twitter or check my website regularly.

More info and other cool stuff at Nikolay's personal website.
And don't forget to follow him on Twitter!

Let me know what you think, do you prefer Acustica over Slate? Do you use both? Will you start using Acustica after reading this? Get the discussion going in the comments below!