
Generation of text containers using the neural network apparatus

Authors: Nazarenko N.V., Bekasov D.E.
Published in issue: #2(55)/2021
DOI: 10.18698/2541-8009-2021-2-673


Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing, Statistics

Keywords: generative neural networks, GPT-2, token, tokenizer, language model, context, text, container, corpus
Published: 01.03.2021

The paper describes an implementation of a method for generating text containers using neural networks and a language model that was trained on a corpus of Russian-language texts and then fine-tuned on a corpus of classical literature. For the developed method, different algorithms for encoding the embedded message were compared across a number of parameters, and the best one was selected for possible use in data embedding tasks and the subsequent transmission of meaningful information over available communication channels. Based on an analysis of the results obtained with the method, possible directions for its further development are proposed.
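The core idea behind such generation can be sketched briefly: at each step the language model ranks candidate next tokens, and the choice among the top candidates is driven by the bits of the hidden message, so the resulting fluent text serves as the container. The Python sketch below illustrates a simple fixed-block (top-2^k candidate) encoding scheme; it is an illustration of the general principle rather than the algorithm developed in the paper, and it assumes the HuggingFace transformers library, PyTorch, and the public English "gpt2" checkpoint (in the setting of the paper a Russian-language GPT-2 checkpoint would be used instead).

    # A minimal sketch of block-based linguistic steganography with a GPT-2
    # language model; an illustration of the general idea only, not the exact
    # algorithm developed in the paper. Assumes the HuggingFace transformers
    # library, PyTorch and the public "gpt2" checkpoint.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    BITS_PER_TOKEN = 2  # each generated token hides 2 bits (top-4 candidates)

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def embed_bits(context: str, bits: str) -> str:
        """Generate a text container whose token choices encode the bit string."""
        input_ids = tokenizer.encode(context, return_tensors="pt")
        with torch.no_grad():
            for i in range(0, len(bits), BITS_PER_TOKEN):
                block = bits[i:i + BITS_PER_TOKEN].ljust(BITS_PER_TOKEN, "0")
                logits = model(input_ids).logits[0, -1]            # next-token scores
                top = torch.topk(logits, 2 ** BITS_PER_TOKEN).indices
                next_id = top[int(block, 2)]                       # candidate picked by the bits
                input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
        return tokenizer.decode(input_ids[0])

    # Example: hide a 10-bit message in a short continuation of a cover context.
    print(embed_bits("It was a quiet autumn evening, and", "1011010001"))

The recipient, who holds the same model, cover context and block size, recovers the message by recomputing the top candidates at each generation step and reading off the index of the token actually present in the received text.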

