Classification of leasing documentation using machine learning methods
Authors: Nasibullin D.I.
Published in issue: #2(97)/2025
Category: Informatics, Computer Engineering and Control
Chapter: Information Technology. Computer technologies. Theory of computers and systems
Keywords: machine learning, text documentation classification, error matrix, document flow automation, leasing documentation, decision tree method, nearest neighbor method, support vector machine, Bayesian classifier
Published: 16.04.2025
The paper addresses the automated classification of documents, a relevant problem driven by the need to reduce processing time and the number of errors when handling large volumes of documents. A set of leasing documents was collected and divided in advance into several types. The paper identifies the main machine learning algorithms used for data classification. It presents plots of model performance on the training and test samples, which are used to select the most suitable hyperparameters and thus obtain the best predictions from the trained model. The trained models are then compared on the studied data. The comparison indicates that the naive Bayes classifier is the most suitable machine learning model for classifying leasing documentation. Its advantage over the other models lies in its high training and prediction speed and in predicting the document type with an accuracy above 90%.
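To make the approach concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing, evaluated with an error (confusion) matrix. It is not the author's implementation: the toy corpus, the document type labels ("contract", "invoice", "act"), and the names `NaiveBayesTextClassifier` and `confusion_matrix` are hypothetical, and only the Python standard library is used so that the example stays self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (text, document type); not data from the paper.
TRAIN = [
    ("lease agreement between lessor and lessee for vehicle", "contract"),
    ("agreement on financial lease of equipment signed by lessee", "contract"),
    ("invoice for monthly lease payment amount due", "invoice"),
    ("payment invoice total amount including vat", "invoice"),
    ("acceptance certificate vehicle transferred to lessee", "act"),
    ("transfer and acceptance certificate of leased equipment", "act"),
]
TEST = [
    ("lease agreement for equipment", "contract"),
    ("invoice amount due for payment", "invoice"),
    ("certificate of acceptance and transfer", "act"),
]
LABELS = ["contract", "invoice", "act"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (a hyperparameter)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_class_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_class_docs / self.n_docs)
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(true_labels, predicted_labels, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(true_labels, predicted_labels):
        matrix[index[true]][index[pred]] += 1
    return matrix


train_docs, train_labels = zip(*TRAIN)
test_docs, test_labels = zip(*TEST)

model = NaiveBayesTextClassifier(alpha=1.0).fit(train_docs, train_labels)
predictions = [model.predict(doc) for doc in test_docs]
matrix = confusion_matrix(test_labels, predictions, LABELS)
accuracy = sum(t == p for t, p in zip(test_labels, predictions)) / len(test_labels)

print("predictions:", predictions)
print("confusion matrix:", matrix)
print("accuracy:", accuracy)
```

Training here amounts to a single pass that counts words per class, and prediction is a sum of logarithms per class, which is consistent with the high training and prediction speed attributed to the naive Bayes classifier above. A result on six toy documents says nothing about accuracy on real leasing documentation.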
