Formant

In speech science and phonetics, a formant is the broad spectral maximum that results from an acoustic resonance of the human vocal tract.[1][2] In acoustics, a formant is usually defined as a broad peak, or local maximum, in the spectrum.[3][4] For harmonic sounds, under this second definition, the formant frequency is sometimes taken as that of the harmonic partial that is most augmented by a resonance. The two definitions differ in whether "formants" characterise the production mechanism of a sound or the produced sound itself. In practice, the frequency of a spectral peak differs from the associated resonance frequency, except when harmonics happen to be aligned with the resonance frequency.

Spectrogram of American English vowels [i, u, ɑ] showing the formants F1 and F2

A room can be said to have formants characteristic of that particular room, due to the way sound reflects from its walls and objects. Room formants of this nature reinforce themselves by emphasizing specific frequencies and absorbing others, as exploited, for example, by Alvin Lucier in his piece I Am Sitting in a Room.

History

From an acoustic point of view, phonetics had a serious problem with the idea that the effective length of the vocal tract determined the vowel. It was unclear how vowels could depend on particular frequencies when everyone from bass to soprano can produce the same vowels. There had to be some way to normalize the frequencies. Hermann suggested a solution to this problem in 1894, coining the term “formant”. A vowel, according to him, is a special acoustic phenomenon, depending on the intermittent production of a special partial, or “formant”, or “characteristique”. The frequency of the “formant” may vary a little without altering the character of the vowel. For the vowel a, for example, the “formant” may vary from 350 to 440 Hz even in the same person.[5]

Phonetics

Average vowel formants for a male voice[6]

Vowel (IPA)   Formant F1 (Hz)   Formant F2 (Hz)   Difference F2 − F1 (Hz)
i             240               2400              2160
y             235               2100              1865
e             390               2300              1910
ø             370               1900              1530
ɛ             610               1900              1290
œ             585               1710              1125
a             850               1610              760
ɶ             820               1530              710
ɑ             750               940               190
ɒ             700               760               60
ʌ             600               1170              570
ɔ             500               700               200
ɤ             460               1310              850
o             360               640               280
ɯ             300               1390              1090
u             250               595               345

Average vowel formants in a diagram

Formants are distinctive frequency components of the acoustic signal produced by speech, musical instruments[7] or singing. The information that humans require to distinguish between speech sounds can be represented purely quantitatively by specifying peaks in the frequency spectrum. Most of these formants are produced by tube and chamber resonance, but a few whistle tones derive from periodic collapse of Venturi effect low-pressure zones. The formant with the lowest frequency is called F1, the second F2, and the third F3. Most often the first two formants, F1 and F2, are sufficient to identify the vowel. The relationship between the perceived vowel quality and the first two formant frequencies can be appreciated by listening to "artificial vowels" that are generated by passing a click train (to simulate the glottal pulse train) through a pair of bandpass filters (to simulate vocal tract resonances).
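The artificial-vowel demonstration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference synthesiser: the two-pole resonator that stands in for each bandpass filter, the parallel connection of the two filters, the 80 Hz bandwidth, the 100 Hz click rate and every function name are choices made for this example. Only the F1 and F2 values would come from the table of average formants.

```python
import math
import struct
import wave


def click_train(f0_hz, duration_s, sample_rate):
    """Unit impulses one pitch period apart: a crude stand-in for glottal pulses."""
    period = round(sample_rate / f0_hz)  # samples between clicks
    n_samples = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Two-pole resonant filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * centre_hz / sample_rate      # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def artificial_vowel(f1_hz, f2_hz, f0_hz=100.0, duration_s=0.5,
                     sample_rate=16000, bandwidth_hz=80.0):
    """Click train through two parallel resonators, peak-normalised to [-1, 1]."""
    source = click_train(f0_hz, duration_s, sample_rate)
    band1 = resonator(source, f1_hz, bandwidth_hz, sample_rate)
    band2 = resonator(source, f2_hz, bandwidth_hz, sample_rate)
    mixed = [b1 + b2 for b1, b2 in zip(band1, band2)]
    peak = max(abs(v) for v in mixed)
    return [v / peak for v in mixed]


def write_wav(path, samples, sample_rate=16000):
    """Save samples in [-1, 1] as 16-bit mono PCM so the result can be heard."""
    frames = b"".join(struct.pack("<h", int(32767 * s)) for s in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```

For instance, write_wav("i.wav", artificial_vowel(240, 2400)) and write_wav("u.wav", artificial_vowel(250, 595)) would produce two buzzes that share the same pitch but differ in vowel colour. Because the resonators are summed without any gain matching, the lower resonance dominates the mix; a more careful sketch would balance the two bands.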

Nasal consonants usually have an additional formant around 2500 Hz. The liquid [l] usually has an extra formant at 1500 Hz, whereas the English "r" sound ([ɹ]) is distinguished by a very low third formant (well below 2000 Hz).

Plosives (and, to some degree, fricatives) modify the placement of formants in the surrounding vowels. Bilabial sounds (such as /b/ and /p/ in "ball" or "sap") cause a lowering of the formants; velar sounds (/k/ and /ɡ/ in English) almost always show F2 and F3 coming together in a 'velar pinch' before the velar and separating from the same 'pinch' as the velar is released; alveolar sounds (English /t/ and /d/) cause less systematic changes in neighbouring vowel formants, depending partially on exactly which vowel is present. The time course of these changes in vowel formant frequencies is referred to as 'formant transitions'.

If the fundamental frequency of the underlying vibration is higher than a resonance frequency of the system, then the formant usually imparted by that resonance will be mostly lost. This is most apparent in the example of soprano opera singers, who sing high enough that their vowels become very hard to distinguish.
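The effect can be made concrete with a small numerical sketch. A voiced sound only carries energy at whole-number multiples of its fundamental, so a resonance is "sampled" by whichever harmonic lies closest to it. The bell-shaped gain curve, the 100 Hz bandwidth and the function names below are assumptions chosen for illustration; they are not a model of any particular vocal tract.

```python
def resonance_gain(freq_hz, centre_hz, bandwidth_hz):
    """Assumed bell-shaped resonance: gain 1 at the centre, about 0.71 at the band edges."""
    detuning = (freq_hz - centre_hz) / (bandwidth_hz / 2.0)
    return 1.0 / (1.0 + detuning ** 2) ** 0.5


def strongest_harmonic(f0_hz, centre_hz, bandwidth_hz=100.0, n_harmonics=40):
    """Return (frequency, gain) of the harmonic of f0 that the resonance boosts most."""
    harmonics = [k * f0_hz for k in range(1, n_harmonics + 1)]
    best = max(harmonics, key=lambda f: resonance_gain(f, centre_hz, bandwidth_hz))
    return best, resonance_gain(best, centre_hz, bandwidth_hz)


if __name__ == "__main__":
    f1 = 610  # first formant of [ɛ] in the table of average formants, in Hz
    for f0 in (110, 220, 440, 880):
        freq, gain = strongest_harmonic(f0, f1)
        print(f"f0 = {f0:3d} Hz: strongest harmonic {freq} Hz, relative gain {gain:.2f}")
```

With a 110 Hz fundamental, a harmonic falls within 50 Hz of the 610 Hz resonance and is strongly boosted. With an 880 Hz fundamental, the lowest partial already lies 270 Hz above the resonance, so the resonance leaves almost no trace in the radiated spectrum.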

Control of resonances is an essential component of the vocal technique known as overtone singing, in which the performer sings a low fundamental tone, and creates sharp resonances to select upper harmonics, giving the impression of several tones being sung at once.

Spectrograms may be used to visualise formants. In spectrograms, it can be hard to distinguish formants from naturally occurring harmonics when one sings. However, one can hear the natural formants in a vowel shape through atonal techniques such as vocal fry.

Formant estimation

Whether they are seen as acoustic resonances of the vocal tract or as local maxima in the speech spectrum, formants are defined, like band-pass filters, by their frequency and by their spectral width.

Different methods exist to obtain this information. Formant frequencies, in their acoustic definition, can be estimated from the frequency spectrum of the sound, using a spectrogram (as in the figure) or a spectrum analyzer. However, to estimate the acoustic resonances of the vocal tract (i.e. the speech definition of formants) from a speech recording, one can use linear predictive coding. An intermediate approach consists of extracting the spectral envelope by neutralizing the fundamental frequency,[8] and only then looking for local maxima in the spectral envelope.
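The linear-prediction route can be sketched as follows. This is a minimal, standard-library-only illustration of one common variant (autocorrelation method, Levinson-Durbin recursion, then peak picking on the resulting all-pole envelope); the function names, the 10 Hz search grid and the choice to pick envelope peaks rather than solve for the roots of the predictor polynomial are assumptions of this example. Steps usually applied to real recordings, such as pre-emphasis, windowing and choosing the model order, are deliberately left out.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    error = r[0]  # a silent frame (r[0] == 0) is not handled in this sketch
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        previous = a[:]
        for j in range(1, i):
            a[j] = previous[j] + k * previous[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a


def lpc_envelope(a, freqs_hz, sample_rate):
    """Magnitude of the all-pole model 1 / A(z) at the given frequencies."""
    envelope = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / sample_rate
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = -sum(c * math.sin(w * k) for k, c in enumerate(a))
        envelope.append(1.0 / math.hypot(re, im))
    return envelope


def estimate_formants(frame, sample_rate, order, step_hz=10):
    """Frequencies (Hz) of local maxima of the LPC envelope, lowest first."""
    a = levinson_durbin(autocorrelation(frame, order), order)
    freqs = list(range(0, sample_rate // 2 + 1, step_hz))
    env = lpc_envelope(a, freqs, sample_rate)
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]
```

Here frame is a list of samples from a short stretch of a vowel and sample_rate is an integer number of samples per second. Each resonance needs two predictor coefficients, so an order of roughly twice the expected number of formants, plus a few extra, is a common starting point. Peaks in the envelope are candidate formants only: closely spaced resonances can merge into a single peak.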

Formant plots

The first two formants are important in determining the quality of vowels, and are frequently said to correspond to the open/close and front/back dimensions (which have traditionally, though not entirely accurately, been associated with the shape and position of the tongue). Thus the first formant F1 has a higher frequency for an open vowel (such as [a]) and a lower frequency for a close vowel (such as [i] or [u]); and the second formant F2 has a higher frequency for a front vowel (such as [i]) and a lower frequency for a back vowel (such as [u]), as can be seen in Fig. 1.[9][10]

Fig. 1 Schematic diagram of formant plot

Vowels will almost always have four or more distinguishable formants; sometimes there are more than six. However, the first two formants are most important in determining vowel quality, and this is often displayed in terms of a plot of the first formant against the second formant,[11] though this is not sufficient to capture some aspects of vowel quality, such as rounding.[12] An example of how the vowels of a language or dialect may be plotted on a traditional auditory vowel chart and also on a formant plot may be seen in the case of Norwegian.
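A plot of F1 against F2 can also be read in reverse: given a measured pair of formant frequencies, one can look for the closest reference vowel. The sketch below does this with the average male-voice values listed in the Phonetics section. The plain Euclidean distance in hertz and the function name are assumptions made for illustration; real classification would need speaker normalisation and, as noted above, more than two formants to separate some vowels.

```python
# (F1, F2) in Hz, copied from the table of average vowel formants for a male voice.
VOWEL_FORMANTS = {
    "i": (240, 2400), "y": (235, 2100), "e": (390, 2300), "ø": (370, 1900),
    "ɛ": (610, 1900), "œ": (585, 1710), "a": (850, 1610), "ɶ": (820, 1530),
    "ɑ": (750, 940), "ɒ": (700, 760), "ʌ": (600, 1170), "ɔ": (500, 700),
    "ɤ": (460, 1310), "o": (360, 640), "ɯ": (300, 1390), "u": (250, 595),
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the IPA symbol whose tabulated (F1, F2) point lies closest."""
    def squared_distance(symbol):
        ref_f1, ref_f2 = VOWEL_FORMANTS[symbol]
        return (f1_hz - ref_f1) ** 2 + (f2_hz - ref_f2) ** 2
    return min(VOWEL_FORMANTS, key=squared_distance)


if __name__ == "__main__":
    for f1, f2 in [(260, 600), (740, 950), (240, 2400)]:
        print(f"F1 = {f1} Hz, F2 = {f2} Hz -> [{nearest_vowel(f1, f2)}]")
```

Unrounded and rounded partners such as [a] (850, 1610) and [ɶ] (820, 1530) sit close together in this space, which illustrates why a two-formant plot does not capture rounding well.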

While Daniel Jones's attempts at capturing vowel articulation resulted in the International Phonetic Association plotting vowels in a trapezoid, actual formant space may be more triangular. Shown is an idealized plot of the formants of Jones and John Wells pronouncing the cardinal vowels of the IPA.[13]

Many writers have addressed the problem of finding an optimal alignment of the positions of vowels on formant plots with those on the conventional vowel quadrilateral. The pioneering work of Ladefoged[14] used the Mel scale because this scale was claimed to correspond more closely to the auditory scale of pitch than does the acoustic measure of frequency expressed in hertz, as used in Fig. 1. Two alternatives to the Mel scale are the Bark scale and the ERB-rate scale. A comparison of these three scales is given by Hayward (p. 141), and formant plots based on the hertz scale and on the Bark scale are compared on p. 153.[15] Another widely adopted strategy for improving formant plots is to plot on the horizontal axis not the value of F2 but the difference between F1 and F2 for a given vowel.
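The three auditory scales can be compared numerically with a short sketch. The formulas below are widely quoted closed-form approximations (O'Shaughnessy's formula for mel, Traunmüller's for Bark, Glasberg and Moore's for ERB-rate); they are given here as assumptions for illustration and are not necessarily the variants used by Ladefoged or Hayward. The function names are likewise hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Mel scale, common approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Bark scale, Traunmüller's approximation: 26.81 f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hz_to_erb_rate(f_hz):
    """ERB-rate scale, Glasberg and Moore's approximation: 21.4 * log10(1 + 0.00437 f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def convert_formants(f1_hz, f2_hz, scale):
    """Apply one of the scale functions above to an (F1, F2) pair."""
    return scale(f1_hz), scale(f2_hz)


if __name__ == "__main__":
    # [i] and [u] from the table of average vowel formants for a male voice.
    for symbol, f1, f2 in [("i", 240, 2400), ("u", 250, 595)]:
        mel = convert_formants(f1, f2, hz_to_mel)
        bark = convert_formants(f1, f2, hz_to_bark)
        print(f"[{symbol}] mel: ({mel[0]:.0f}, {mel[1]:.0f})  "
              f"Bark: ({bark[0]:.2f}, {bark[1]:.2f})")
```

All three scales compress high frequencies relative to low ones, so the large F2 differences among front vowels shrink on the plot while differences in the F1 range are comparatively preserved.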

Singer's formant

Studies of the frequency spectrum of trained classical singers, especially male singers, indicate a clear formant around 3000 Hz (between 2800 and 3400 Hz) that is absent in speech or in the spectra of untrained singers. It is thought to be associated with one or more of the higher resonances of the vocal tract.[16] It is this increase in energy around 3000 Hz that allows singers to be heard and understood over an orchestra. This formant is actively developed through vocal training, for instance through so-called voce di strega or "witch's voice"[17] exercises, and is caused by a part of the vocal tract acting as a resonator.[18][19] In classical music and vocal pedagogy, this phenomenon is also known as squillo.

References

  1. Titze, I.R. (1994). Principles of Voice Production, Prentice Hall, ISBN 978-0-13-717893-3.
  2. Titze, I.R., Baken, R.J., Bozeman, K.W., Granqvist, S., Henrich, N., Herbst, C.T., Howard, D.M., Hunter, E.J., Kaelin, D., Kent, R.D., Löfqvist, A., McCoy, S., Miller, D.G., Noé, H., Scherer, R.C., Smith, J.R., Story, B.H., Švec, J.G., Ternström, S. and Wolfe, J. (2015) "Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization." J. Acoust. Soc. America. 137, 3005–3007.
  3. Jeans, J.H. (1938) Science & Music, reprinted by Dover, 1968.
  4. Standards Secretariat, Acoustical Society of America, (1994). ANSI S1.1-1994 (R2004) American National Standard Acoustical Terminology, (12.41) Acoustical Society of America, Melville, NY.
  5. McKendrick, J. G. (1903). Experimental phonetics. In Annual report of the board of regents of the Smithsonian institution for the year ending June 30, 1902 (pp. 241–259). Smithsonian Institution.
  6. Catford, J.C. (1988) A Practical Introduction to Phonetics, Oxford University Press, p. 161. ISBN 978-0198242178
  7. Reuter, Christoph (2009): The role of formant positions and micro-modulations in blending and partial masking of musical instruments. In: Journal of the Acoustical Society of America (JASA), Vol. 126,4, p. 2237
  8. Kawahara, Hideki; Masuda-Katsuse, Ikuyo; de Cheveigné, Alain (April 1999). "Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds". Speech Communication. 27 (3–4): 187–207. doi:10.1016/S0167-6393(98)00085-5.
  9. Ladefoged, Peter (2006) A Course in Phonetics (Fifth Edition), Boston, MA: Thomson Wadsworth, p. 188. ISBN 1-4130-2079-8
  10. Ladefoged, Peter (2001) Vowels and Consonants: An Introduction to the Sounds of Language, Malden, MA: Blackwell, p. 40. ISBN 0-631-21412-7
  11. Deterding, David (1997) 'The Formants of Monophthong Vowels in Standard Southern British English Pronunciation', Journal of the International Phonetic Association, 27, pp. 47–55.
  12. Hayward, Katrina (2000) Experimental Phonetics, Harlow, UK: Pearson, p. 149. ISBN 0-582-29137-2
  13. Geoff Lindsey, 2013. The Vowel Space.
  14. Ladefoged, P. (1967). Three Areas of Experimental Phonetics. Oxford. p. 87.
  15. Hayward, K. (2000). Experimental Phonetics. Longman. ISBN 0-582-29137-2.
  16. Sundberg, J. (1974). "Articulatory interpretation of the 'singing formant'", Journal of the Acoustical Society of America, 55, 838–844.
  17. Frisell, Anthony (2007). Baritone Voice. Boston: Branden Books. p. 84. ISBN 978-0-8283-2181-5.
  18. "Vocal Ring, or The Singer's Formant". The National Center for Voice and Speech. Retrieved 2008-04-07.
  19. Sundberg, Johan (1987). The science of the singing voice. DeKalb, Ill: Northern Illinois University Press. ISBN 0-87580-542-6.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.