Scientific Journal

Herald of Advanced Information Technology


This article considers the architecture of a machine learning system that identifies robots on a web resource by behavioral factors, and discusses how to build the software architecture for a system whose task is to classify the behavior of anonymous users. The behavioral factors used for identification form a set of features describing various components of user activity, each of which may be characteristic of robot behavior. The Weka software provides the mechanism for training on data models designed to describe human and robot behavior. The learning algorithm, the nearest neighbours method, builds class images from the largest number of combinations of factors that describe one of the models. The training data models are stored in a file on the hard disk as matrices of feature descriptions for each type of behavior. The article discusses software and algorithmic solutions that help combat click fraud, spam and distributed multi-session (DDoS) attacks on the server, and help keep search engines from lowering their trust in the website. A large volume of illiquid and malicious traffic lowers search positions and reduces the TIC (thematic citation index) and PR (page rank) of the site, which in turn reduces the profitability of the web resource. The results of this article are the proposed behavior analysis system, a description of its technical implementation and a model for training the system.
Statistics comparing malicious traffic after the system was connected to a website are also given. Java was selected as the implementation language, which allows cross-platform deployment of the system on both Linux and Windows. The data used to determine the role of the user is collected from the site by JavaScript modules located on the web resource. All data collection algorithms and user information storage periods are implemented within the framework of the European data protection regulation. The system also preserves complete user anonymity: identification is carried out exclusively by means of fingerprint tags.
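
The nearest neighbours method named in the abstract can be illustrated with a minimal sketch in plain Java. It is not the article's Weka-based implementation: the class name `KnnBehaviorSketch`, the three features (pause between clicks, mouse activity, page request rate), their scaling to the range 0..1 and all sample values are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative k-nearest-neighbours classifier; hypothetical, not the article's code. */
final class KnnBehaviorSketch {

    /** One labelled row of a feature-description matrix. */
    public static final class Sample {
        public final double[] features;
        public final String label;

        public Sample(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance between two feature vectors of equal length. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the majority label among the k training samples closest to the query. */
    public static String classify(List<Sample> training, double[] query, int k) {
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Sample s) -> distance(s.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Sample s : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(s.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = s.label;
            }
        }
        return best;
    }

    /**
     * Invented training data. Assumed feature order, each scaled to 0..1:
     * pause between clicks, mouse activity, page request rate.
     */
    public static List<Sample> exampleTrainingSet() {
        return Arrays.asList(
            new Sample(new double[] {0.80, 0.90, 0.10}, "human"),
            new Sample(new double[] {0.70, 0.80, 0.20}, "human"),
            new Sample(new double[] {0.90, 0.70, 0.15}, "human"),
            new Sample(new double[] {0.05, 0.00, 0.90}, "robot"),
            new Sample(new double[] {0.10, 0.05, 0.95}, "robot"),
            new Sample(new double[] {0.02, 0.10, 0.80}, "robot"));
    }

    public static void main(String[] args) {
        List<Sample> training = exampleTrainingSet();
        // A session with short pauses, almost no mouse activity and a high request rate.
        double[] session = {0.08, 0.02, 0.85};
        System.out.println(classify(training, session, 3));
    }
}
```

In this sketch the features are assumed to be normalised beforehand, because an unscaled feature with a large numeric range would otherwise dominate the Euclidean distance. A Weka-based system would load the stored feature matrices and delegate the neighbour search to the library instead of sorting the whole training set.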

[ © Odessa National Polytechnic University, 2018.]