USING MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION TO IMPROVE PERFORMANCE OF BIG DATA PROCESSING (КОРИСТЕЊЕ НА АЛГОРИТМИ ЗА КЛАСИФИКАЦИЈА НА ГОЛЕМИ ПОДАТОЦИ ЗА ПОДОБРУВАЊЕ НА ПЕРФОРМАНСИТЕ ПРИ НИВНА ОБРАБОТКА)

Nasteski, Vladimir (2018) USING MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION TO IMPROVE PERFORMANCE OF BIG DATA PROCESSING (КОРИСТЕЊЕ НА АЛГОРИТМИ ЗА КЛАСИФИКАЦИЈА НА ГОЛЕМИ ПОДАТОЦИ ЗА ПОДОБРУВАЊЕ НА ПЕРФОРМАНСИТЕ ПРИ НИВНА ОБРАБОТКА). Doctoral thesis, St. Kliment Ohridski University - Bitola.

Full text not available from this repository.

Official URL: http://www.fikt.uklo.edu.mk/assets/uploads/2018/10...

Abstract

The term Big Data does not refer only to large-scale data. Heterogeneity, scalability,
complexity and privacy are just some of the challenges that Big Data presents.
Understanding the value of Big Data requires new analysis tools and methods that differ
from those used in traditional systems. Such tools overcome the shortcomings of traditional
analysis systems, which cannot fully cope with the challenges of large data. Data science is
the new discipline that addresses these challenges. Machine learning algorithms have proved
quite effective in processing large data, because they can be combined into a powerful
system for learning and data analysis.
This doctoral dissertation examines several machine learning algorithms by building models
for specific case studies. The models are tested, analyzed, optimized and improved in a test
environment developed in a visual framework for creating Apache Spark applications. The
models are based on several classification algorithms, and, depending on the nature of the data
and the case study, each model is improved and optimized with a particular optimization method.
For some models, this leads to better predicted values. Finally, the results of the models are
evaluated and compared using different evaluation metrics in order to determine their accuracy.
For each model, recommendations for improvement are made and directions for future work are given.
Key words: Big Data, machine learning, MLlib, Apache Spark
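The abstract does not name the optimization method applied to each model. One common way to optimize a classifier is a grid search over hyperparameters scored by k-fold cross-validation, which Spark MLlib supports through `ParamGridBuilder` and `CrossValidator`. The following is a minimal plain-Python sketch of that idea under stated assumptions, not the thesis's code and not the Spark API: the k-nearest-neighbours classifier, the round-robin fold split and the helper names (`knn_predict`, `cross_val_accuracy`, `grid_search`) are hypothetical choices made for illustration.

```python
# Illustrative sketch (assumption): grid search + k-fold cross-validation in plain Python.
# It is not the thesis's code and not Spark MLlib's API.
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Hypothetical classifier: majority vote among the k nearest training points."""
    # Sort (distance, label) pairs; ties in distance are broken by label.
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=3):
    """Accuracy of knn_predict with hyperparameter k under round-robin k-fold CV."""
    correct = 0
    for f in range(folds):
        # Point i belongs to test fold f when i % folds == f; the rest is training data.
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_X = [X[i] for i in range(len(X)) if i % folds != f]
        train_y = [y[i] for i in range(len(X)) if i % folds != f]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    # Every point is tested exactly once, so divide by the dataset size.
    return correct / len(X)


def grid_search(X, y, k_values, folds=3):
    """Score every candidate k and return the best one together with all scores."""
    scores = {k: cross_val_accuracy(X, y, k, folds) for k in k_values}
    # max() keeps the first k on ties, i.e. the earliest candidate in k_values.
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    best_k, scores = grid_search(X, y, [1, 3, 5])
    print(best_k, scores)  # prints: 1 {1: 1.0, 3: 1.0, 5: 0.5}
```

On this toy data, k = 5 covers almost the whole of each small training fold, so the vote is dominated by the majority class of that fold and the score drops to 0.5. Spark's `CrossValidator` applies the same idea to an estimator or pipeline over a grid of parameter maps, averaging the chosen evaluator's metric across folds.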
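The abstract states that the models are compared with different evaluation metrics but does not list them. For a binary classifier, the usual metrics are derived from the confusion matrix; in Spark MLlib, evaluators such as `BinaryClassificationEvaluator` and `MulticlassClassificationEvaluator` compute related metrics (MLlib's `"f1"` is a label-weighted average, so it can differ from the positive-class F1 computed here). The sketch below computes them in plain Python to show what each metric measures; the function name `binary_metrics` and the choice of metrics are assumptions, not taken from the thesis.

```python
# Illustrative sketch (assumption): confusion-matrix metrics in plain Python,
# not the thesis's evaluation code and not Spark MLlib's API.
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 of a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    # Build the confusion matrix for the chosen positive class.
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    total = tp + fp + fn + tn
    # Guard each ratio against division by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

In this example the classifier makes two false positives and one false negative: accuracy (0.625) summarizes both kinds of error in one number, while precision (0.6) and recall (0.75) separate them, which is why comparing models on several metrics is more informative than accuracy alone.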

Item Type:	Thesis (Doctoral)
Subjects:	Scientific Fields (Frascati) > Natural sciences > Computer and information sciences
Divisions:	Doctoral Dissertations
Depositing User:	Mr Vladimir Milevski
Date Deposited:	25 Feb 2019 13:07
Last Modified:	28 Feb 2019 10:37
URI:	https://eprints.uklo.edu.mk/id/eprint/1779
