
Voice pitch frequency detection via spectrum peaks search with additional frequency weight functions

Authors: Zhukova A.B., Maslennikov A.L.
Published in issue: #12(41)/2019
DOI: 10.18698/2541-8009-2019-12-556


Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing, Statistics

Keywords: speech recognition, voice pitch frequency, formant analysis, voice spectrum, smoothing filter, frequency weight functions, Savitzky-Golay filter, spectrum peaks
Published: 19.12.2019

In speech recognition, the two main tasks are determining the voice pitch frequency and the formant frequencies. From these frequencies, a phoneme can be recognized with a certain probability. This paper describes a method for determining the voice pitch frequency. The method builds on the well-known idea of searching for spectrum peaks and adds spectrum smoothing and frequency weight functions of two types, exponential and linear. The method was applied in a series of experiments with three men and three women. The results show that a critical cut-off frequency of the smoothing filter exists and that incorporating a frequency weight function increases the accuracy of voice pitch frequency determination.
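The abstract does not give the exact smoothing parameters or the formulas of the two weight functions. The Python sketch below shows one possible way to assemble the described pipeline: a magnitude spectrum, Savitzky-Golay smoothing, a frequency weight, and a search for the strongest peak. It rests on several assumptions. It uses a Hann-windowed FFT frame. The Savitzky-Golay coefficients are computed by least squares with NumPy instead of a library call. The weight functions are hypothetical: w(f) = exp(-f / f_ref) and w(f) = max(0, 1 - f / f_zero). All function names, default values, and the 60 to 400 Hz search band are illustrative and do not come from the paper.

```python
import numpy as np


def savgol_coeffs(window: int, polyorder: int) -> np.ndarray:
    """Savitzky-Golay smoothing coefficients for an odd window length."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    # Column k of the Vandermonde matrix holds x**k (k = 0..polyorder).
    a = np.vander(x, polyorder + 1, increasing=True)
    # The fitted polynomial's value at x = 0 is its constant term, i.e. the
    # first row of the least-squares pseudo-inverse applied to the samples.
    return np.linalg.pinv(a)[0]


def savgol_smooth(y: np.ndarray, window: int, polyorder: int) -> np.ndarray:
    """Smooth y with a Savitzky-Golay filter, padding edges by repetition."""
    coeffs = savgol_coeffs(window, polyorder)
    padded = np.pad(y, window // 2, mode="edge")
    # The kernel is symmetric, so convolution equals correlation here.
    return np.convolve(padded, coeffs, mode="valid")


def frequency_weight(freqs: np.ndarray, kind: str = "exp",
                     f_ref: float = 400.0, f_zero: float = 800.0) -> np.ndarray:
    """Hypothetical weight functions that favour low frequencies."""
    if kind == "exp":
        return np.exp(-freqs / f_ref)                   # exponential decay
    if kind == "linear":
        return np.clip(1.0 - freqs / f_zero, 0.0, 1.0)  # linear ramp to zero
    if kind == "none":
        return np.ones_like(freqs)                      # no weighting
    raise ValueError(f"unknown weight kind: {kind!r}")


def estimate_pitch(signal: np.ndarray, fs: float, weight: str = "exp",
                   window: int = 9, polyorder: int = 3,
                   fmin: float = 60.0, fmax: float = 400.0):
    """Return a pitch estimate in Hz, or None if no peak lies in [fmin, fmax]."""
    n = len(signal)
    # Magnitude spectrum of the Hann-windowed frame.
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Smooth the spectrum first, then apply the frequency weight.
    weighted = savgol_smooth(spectrum, window, polyorder) * frequency_weight(freqs, weight)
    # Interior local maxima of the smoothed, weighted spectrum.
    is_peak = (weighted[1:-1] > weighted[:-2]) & (weighted[1:-1] >= weighted[2:])
    peaks = np.nonzero(is_peak)[0] + 1
    peaks = peaks[(freqs[peaks] >= fmin) & (freqs[peaks] <= fmax)]
    if peaks.size == 0:
        return None
    # The strongest in-band peak is taken as the pitch.
    return float(freqs[peaks[np.argmax(weighted[peaks])]])
```

A larger `window` smooths more strongly and so lowers the smoothing filter's effective cut-off frequency. The abstract reports that a critical cut-off exists but does not state its value, so `window=9` and `polyorder=3` are placeholders only. The weighting step shifts the search toward lower-frequency peaks. The intent is to keep a strong higher-frequency component inside the search band from being taken for the pitch.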
