Saturday, 11 May 2013

æ - ʊ - ʌ - i - a - ɜ

Formant Synthesis

I'm currently working on quite a large project which brings together the ideas from the two previous posts, so I thought this would serve as an interesting sound design interlude until I get that finished.
Formant: (Acoustic Phonetics) One of the regions of concentration of energy, prominent on a sound spectrogram, that collectively constitute the frequency spectrum of a speech sound. The relative positioning of the first and second formants, whether periodic or aperiodic, as of the o of hope at approximately 500 and 900 cycles per second, is usually sufficient to distinguish a sound from all others.

That definition of formant points to some interesting information about how we listen and communicate, and it is potentially very useful for the purposes of sound design: different vocal sounds are constructed from combinations of formants at different frequencies. This is one of the key factors that allows us to distinguish between different vocal sounds and words, an ability which has developed organically alongside our capacity to communicate. Whilst speech synthesis is the most obvious application, a potentially more interesting task for a sound designer is creating vocalisations for fictional creatures (for example in the fantasy or science fiction genres), and it is reasonable to assume that if a creature is organic and has developed as we have, then similar rules will apply.

Here is some interesting reading on constructed language (conlang) and creature sound design courtesy of Darren Blondin.

Formant Synthesiser

So here is a device for formant synthesis built in Max/MSP. This is heavily based on a patch from Andy Farnell's excellent book Designing Sound, so all credit for the basic design goes to him. If you're interested in real-time synthesis of non-musical sounds there is (to my knowledge) no better book. There is an introductory chapter to the book available as a free PDF, which also makes a great introduction to pd. Even if you intend to do all your patching in Max, the ideas and patches from the book are easily transferable.

Farnell's example patch "Schwa box" is built in pd (as are all those in the book), so I've adapted the patch to work in Max, and have added in a few extra features such as adjustable pitch and vibrato. It currently uses a basic synth patch as its sound source, but could easily be adapted to use audio recordings. This would then make it possible to add human vowel-like resonances to other sounds.

Download the patch here: 

As ever, you will need either a full version of Max/MSP 6 or the Max/MSP 6 runtime. Both are available from Cycling '74 here. The runtime version of Max allows you to run patches but not edit them.

The speech formants are modelled with [reson~], a resonant bandpass filter which is one of the standard MSP objects. As soon as you load the patch it should start making sound, and you can see the frequencies of each formant as it cycles through the vowel sounds.
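To make the idea concrete outside of Max, here is a minimal Python/NumPy sketch of what a [reson~]-style formant stage does: a buzzy source is passed through a resonant bandpass filter per formant, and the outputs are summed. The filter is a standard RBJ-cookbook bandpass biquad, and the three formant frequencies are approximate textbook values for an "ah"-like vowel, not values taken from the patch itself.

```python
# Sketch of formant filtering: a resonant bandpass per formant,
# applied to a buzzy glottal-style source (cf. [reson~] in MSP).
import numpy as np

SR = 44100  # sample rate in Hz

def bandpass(signal, freq, q):
    """RBJ-cookbook bandpass biquad (constant 0 dB peak gain)."""
    w0 = 2 * np.pi * freq / SR
    alpha = np.sin(w0) / (2 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    b, a = b / a[0], a / a[0]          # normalise so a0 == 1
    out = np.zeros_like(signal)
    x1 = x2 = y1 = y2 = 0.0
    for n, x in enumerate(signal):     # direct-form I difference equation
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out[n] = y
    return out

# Buzzy source: naive sawtooth at 110 Hz (one second)
t = np.arange(SR) / SR
source = 2 * (t * 110 % 1.0) - 1.0

# Three formants for an "ah"-like vowel (rough textbook values)
vowel = sum(bandpass(source, f, q=10) for f in (730, 1090, 2440))
```

Swapping the frequency list for another vowel's formants changes the perceived vowel without touching the source, which is exactly the behaviour the patch exploits as it cycles through vowels.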

Below is a chart detailing the individual frequencies. Note how they are not at harmonic intervals and do not have any regular spacing.

(Compiled by Tim Carmell, spectral database at the Center for Spoken Language Understanding, Oregon University)

These frequencies are defined by our anatomy, specifically the size and shape of the human supralaryngeal vocal tract (SVT). As we speak, the SVT continually changes shape to create the different formants needed for speech, producing a frequency pattern which changes over time. In the patch, we are using bandpass filters to physically model the resonant characteristics of this space. 

Interestingly, the specific anatomical traits necessary for human speech did not develop until the Paleolithic period (around 50,000 years ago), so both Neanderthals and earlier humans were physically incapable of what we consider human speech. Nor do we develop the ideal SVT dimensions until around 6-8 years of age, as the mouth shortens, the tongue changes shape and the neck lengthens during this period.

If we need to be scientifically accurate with our approach to designing creature sounds, we first need to ask some questions about the creature: 

  • Is it intelligent enough to speak?
  • Does it live in social groups, and therefore have a need for speech?
  • How will the anatomy of the creature affect the sounds it creates?
  • What is its native habitat, and how will this affect its vocalisations?

So the formant synthesiser in this post addresses point three on that list, and covers part of a setup which could be used for creating creature vocalisations. There is also room to expand the system: changing the list of resonant frequencies would model larger or smaller creatures (lower frequencies for larger ones). 
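The size-scaling idea can be sketched very simply. The acoustic intuition is that the resonant frequencies of a tube scale inversely with its length, so a creature with a vocal tract twice as long as ours would get formants at roughly half the frequency. The helper below is hypothetical (it does not exist in the patch), and the "ah" formant values are rough textbook figures:

```python
# Hypothetical helper: rescale a formant list for creature size.
# Resonances of a tube vary inversely with its length, so
# size_ratio > 1 (a longer vocal tract) lowers every formant.
def scale_formants(formants, size_ratio):
    """Return formant frequencies scaled for a creature size_ratio
    times larger than human (larger creature -> lower formants)."""
    return [f / size_ratio for f in formants]

human_ah = [730, 1090, 2440]            # rough textbook "ah" formants, Hz
giant_ah = scale_formants(human_ah, 2.0)
imp_ah = scale_formants(human_ah, 0.5)  # half-size creature, higher formants
```

Feeding the scaled list into the filter frequencies (in place of the human values) is all it would take to shift the same vowel shapes up or down for a differently sized creature.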

On a side note, some VST users amongst you may have already encountered formant synthesis in what must be the most conceptually important plug-in ever created, the Delay Lama:

It doesn't get any better than that.