Scientific Journal

Herald of Advanced Information Technology

OPTIMIZATION OF ANALYSIS AND MINIMIZATION OF INFORMATION LOSSES IN TEXT MINING
Abstract:
Information is one of the most important resources in today's business environment. It is difficult for any company to succeed without sufficient information about its customers, employees, and other key stakeholders. Every day, companies receive structured and unstructured text from a variety of sources, such as survey results, tweets, call center notes, online customer reviews, recorded interactions, emails, and other documents. These sources provide raw text that is difficult to interpret without the right text analysis tool. Text analytics can be done manually, but the manual process is inefficient. Traditional systems rely on keywords and cannot read and understand the language in emails, tweets, web pages, and text documents. For this reason, companies use text analysis software to analyze large amounts of text data. The software helps users retrieve textual information and act on it. Manual annotation is currently the most common approach, which can be attributed to its high quality and “meaningfulness”. The typical disadvantages of manual annotation systems and textual information analysis systems are high material costs and inherently low speed. Therefore, this article explores methods for effectively annotating reviews of various products from the largest marketplace in Ukraine. The following tasks are to be solved: to analyze modern approaches to data analysis and processing; to study basic algorithms for data analysis and processing; to build a program that collects data, designing its architecture for efficient use on the basis of the latest technologies; to clean the data using techniques that minimize information loss; to analyze the collected data using data analysis and processing approaches; and to draw conclusions from the results of this work. Each of these tasks has many variants, as do the methods for solving them.
This again confirms the importance and relevance of the chosen topic. The purpose of the study is to examine the methods and means by which information losses can be minimized when analyzing and processing textual data. The object of the study is the process of minimizing information losses in the analysis and processing of textual data. In the course of the study, recent research on the analysis and processing of textual information was reviewed, and methods of textual information processing and Data Mining algorithms were analyzed.
Authors:
Keywords
DOI
10.15276/hait01.2020.4

Received 02.02.2020
Received after revision 16.02.2020
Accepted 19.02.2020
[ © Odessa National Polytechnic University, 2018.]