Different segmentation criteria comparison in time-domain speech segmentation problem
Authors: Zhukova A.B., Maslennikov A.L.
Published in issue: #2(31)/2019
DOI: 10.18698/2541-8009-2019-2-436
Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing, Statistics
Keywords: speech recognition, voice control, speech segmentation, segmentation criteria, Savitzky-Golay filter, moving-average filter, moving-variance filter
Published: 07.02.2019
Speech recognition is a complex technical problem of interest to many researchers and commercial companies. Solving it in the time domain typically requires preliminary speech segmentation, i.e. the extraction of words, syllables, or phonemes. Different segmentation criteria are used for this purpose. Typically, these criteria are associated with the signal power or with the frequency of signal changes over a specified time interval. Segmentation criteria can be formulated in different ways, which leads to different algorithmic and computational complexity. This paper compares different segmentation criteria (associated with signal power) for extracting words from a speech signal.
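To make the idea of a power-based segmentation criterion concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not give the exact criteria, so the function names (`moving_average_power`, `moving_variance`, `segments_above`), the window length, the threshold, and the synthetic test signal are all assumptions chosen for illustration. The sketch computes two criteria suggested by the keywords (a moving average of the squared signal and a moving variance) and marks as a segment every run of samples where the criterion exceeds a threshold.

```python
import numpy as np

def moving_average_power(signal, window):
    """Short-time power: moving average of the squared samples."""
    kernel = np.ones(window) / window
    return np.convolve(signal ** 2, kernel, mode="same")

def moving_variance(signal, window):
    """Moving variance: windowed mean of squares minus squared windowed mean."""
    kernel = np.ones(window) / window
    mean = np.convolve(signal, kernel, mode="same")
    mean_sq = np.convolve(signal ** 2, kernel, mode="same")
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum(mean_sq - mean ** 2, 0.0)

def segments_above(criterion, threshold):
    """Return (start, end) index pairs, end exclusive, where criterion > threshold."""
    active = criterion > threshold
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))

# Hypothetical test signal: silence, a sinusoidal "word", then silence again.
n = np.arange(1000)
signal = np.zeros(1000)
signal[300:600] = np.sin(2 * np.pi * 0.05 * n[300:600])

window = 50       # assumed window length in samples
threshold = 0.1   # assumed threshold on the criterion

power = moving_average_power(signal, window)
variance = moving_variance(signal, window)
power_segments = segments_above(power, threshold)
variance_segments = segments_above(variance, threshold)
print(power_segments)
print(variance_segments)
```

The two criteria differ in cost as well as in behaviour: the moving variance needs two convolutions instead of one, which illustrates how differently formulated criteria lead to different computational complexity. With real speech, the window length and threshold would have to be tuned to the sampling rate and noise level.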