
Audio-to-video image conversion technologies

Authors: Karpov I.E., Moskalik A.A.
Published in issue: #2(97)/2025
DOI:


Category: Informatics, Computer Engineering and Control | Chapter: Information Technology. Computer technologies. Theory of computers and systems

Keywords: audio conversion technologies, visual images, hearing impairments, deep learning, natural language processing, computer vision, audio visualization, audiovisual methods, social inclusion, quality of life
Published: 09.04.2025

The paper describes a study of audio-to-video image conversion technologies, which are especially important for people with hearing impairments. The study focuses on developing a technology that accurately and completely conveys the emotional and contextual aspects of an audio message in video form. The paper reviews current audio data visualization methods and their limitations, and presents a new approach that combines deep learning, natural language processing, and computer vision. It also addresses practical applications of these developments, including educational and communication scenarios, and reports the results of experiments with volunteers, which confirm a significant improvement in audio visualization compared with existing technologies.

