High Speed Classification of Massive Data Streaming Using Spark
| Author(s) | : | M.JALASRI, AARTHIKA.K, VISHNU PRIYA.A |
| Institution | : | Asst. Professor, Information Technology, Jeppiaar Maamallan Engineering College |
| Published In | : | Vol. 5, Issue 2 — February 2018 |
| Page No. | : | 669-673 |
| Domain | : | Engineering |
| Type | : | Research Paper |
| ISSN (Online) | : | 2348-4470 |
| ISSN (Print) | : | 2348-6406 |
Big data analytics involves mining massive, high-speed data streams, which raises contemporary challenges. In this paper we present an efficient nearest neighbor solution for classifying massive, high-speed data streams using Apache Spark. A distributed metric tree organizes the case-base and thereby speeds up the neighbor searches. DS-RNGE, an instance selection method, selects the instances (objects) of the case-base that the nearest neighbor searches operate on. Resilient Distributed Datasets (RDDs), Spark's core data abstraction, hold the records that are checked during the searches. Smart partitioning of the incoming data streams parallelizes the proposed algorithm; the streams are ingested through Apache Kafka, a distributed streaming platform that integrates with Spark, to cope with the large volume of data. Spark can load data into memory and query it repeatedly, which makes it suitable for iterative processes (e.g., machine learning algorithms). We partition the data in a pseudo-random mode, which is more effective than the referenced approaches. A hashing algorithm is used to detect duplicate records. Our approach processes live streaming records sequentially, classifying them in real time with nearest neighbor searches.
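The abstract does not say which metric tree is used, so the following is only a minimal single-machine sketch of the underlying idea, assuming Euclidean distance and a vantage-point (VP) tree as a stand-in for the paper's distributed metric tree. It indexes a labelled case-base, answers k-nearest-neighbor queries while pruning subtrees with the triangle inequality, and classifies by majority vote. The names `build`, `knn` and `classify` are hypothetical and not taken from the paper.

```python
import heapq
import math
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Item = Tuple[Point, Hashable]  # (feature vector, class label)


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


class VPNode:
    """Vantage-point tree node: a pivot instance and the radius that splits its subtree."""

    def __init__(self, point: Point, label: Hashable, radius: float,
                 inside: Optional["VPNode"], outside: Optional["VPNode"]) -> None:
        self.point, self.label, self.radius = point, label, radius
        self.inside = inside    # instances with euclidean(point, x) < radius
        self.outside = outside  # instances with euclidean(point, x) >= radius


def build(items: Sequence[Item], rng: Optional[random.Random] = None) -> Optional[VPNode]:
    """Recursively index a labelled case-base in a vantage-point tree (hypothetical helper)."""
    if not items:
        return None
    if rng is None:
        rng = random.Random(0)
    rest = list(items)
    point, label = rest.pop(rng.randrange(len(rest)))  # random vantage point
    if not rest:
        return VPNode(point, label, 0.0, None, None)
    dists = [euclidean(point, p) for p, _ in rest]
    radius = sorted(dists)[len(dists) // 2]  # median distance splits the remaining instances
    inside = [it for it, d in zip(rest, dists) if d < radius]
    outside = [it for it, d in zip(rest, dists) if d >= radius]
    return VPNode(point, label, radius, build(inside, rng), build(outside, rng))


def knn(root: Optional[VPNode], query: Point, k: int) -> List[Tuple[float, Hashable]]:
    """Return the k nearest (distance, label) pairs, sorted by distance."""
    heap: List[Tuple[float, int, Hashable]] = []  # max-heap of (-distance, tiebreak, label)
    counter = 0

    def tau() -> float:
        # Distance of the current k-th best candidate (infinite until k are found).
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node: Optional[VPNode]) -> None:
        nonlocal counter
        if node is None:
            return
        d = euclidean(query, node.point)
        if d < tau():
            if len(heap) == k:
                heapq.heappop(heap)  # evict the current farthest candidate
            heapq.heappush(heap, (-d, counter, node.label))
            counter += 1
        # Search the side holding the query first; visit the other side only if
        # the triangle inequality says it could still contain a closer instance.
        if d < node.radius:
            visit(node.inside)
            if d + tau() >= node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() < node.radius:
                visit(node.inside)

    visit(root)
    return sorted(((-nd, label) for nd, _, label in heap), key=lambda t: t[0])


def classify(root: Optional[VPNode], query: Point, k: int = 3) -> Hashable:
    """Majority vote among the labels of the k nearest instances."""
    votes = Counter(label for _, label in knn(root, query, k))
    return votes.most_common(1)[0][0]
```

A VP-tree is chosen here only because it needs nothing but a distance function, which is the defining property of a metric tree; a distributed version would shard the case-base across workers and merge their local k-nearest candidates.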
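DS-RNGE is described only as an instance selection method. The sketch below assumes it builds on Relative Neighborhood Graph Editing (RNGE) and shows just the static, single-machine editing rule, not the distributed streaming variant. Two instances are graph neighbors unless a third instance is closer to both of them than they are to each other, and an instance is discarded when the majority of its graph neighbors carry a different label. The tie-breaking rule and the names `rng_adjacency` and `rnge_edit` are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]
Item = Tuple[Point, Hashable]  # (feature vector, class label)


def rng_adjacency(points: Sequence[Point]) -> List[List[int]]:
    """Adjacency lists of the relative neighbourhood graph (naive O(n^3))."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # i and j are graph neighbours unless some third instance r
            # is closer to both of them than they are to each other.
            blocked = any(max(d[i][r], d[j][r]) < d[i][j]
                          for r in range(n) if r not in (i, j))
            if not blocked:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def rnge_edit(items: Sequence[Item]) -> List[Item]:
    """Drop every instance misclassified by a majority vote of its graph neighbours."""
    labels = [y for _, y in items]
    adj = rng_adjacency([p for p, _ in items])
    kept: List[Item] = []
    for i, (p, y) in enumerate(items):
        votes = Counter(labels[j] for j in adj[i])
        if not votes:
            kept.append((p, y))  # isolated instance: nothing to vote against it
            continue
        top = max(votes.values())
        # Assumption: on a tie the instance is kept if its own label is among the leaders.
        if votes[y] == top:
            kept.append((p, y))
    return kept
```

Editing decisions are all taken against the graph of the original case-base, so the result does not depend on the order in which instances are examined.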
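The abstract does not define its "pseudo random mode" of partitioning. One plausible reading, used here purely as an assumption, is that each incoming record is sent to a partition chosen by a seeded pseudo-random generator, which balances load across workers while keeping the split reproducible. The function `pseudo_random_partition` and its `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def pseudo_random_partition(records: Iterable[T], num_partitions: int,
                            seed: int = 0) -> List[List[T]]:
    """Assign each record to a partition chosen by a seeded pseudo-random generator."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    rng = random.Random(seed)  # fixed seed: the same stream is always split the same way
    parts: List[List[T]] = [[] for _ in range(num_partitions)]
    for record in records:
        parts[rng.randrange(num_partitions)].append(record)
    return parts
```

Pseudo-random assignment keeps partitions roughly equal in size even when record keys are skewed; the trade-off is that records sharing a key may land in different partitions, so hash partitioning on the key is the usual alternative when such records must be co-located.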
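The abstract states that a hashing algorithm detects duplicate records but does not name it. As a minimal sketch, assuming records are JSON-serializable dictionaries and using SHA-256 as the hash, each record can be reduced to a digest of a canonical encoding, and any record whose digest was already seen is skipped. The helpers `record_digest` and `drop_duplicates` are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Set

Record = Dict[str, Any]


def record_digest(record: Record) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_duplicates(stream: Iterable[Record],
                    seen: Optional[Set[str]] = None) -> Iterator[Record]:
    """Yield each record the first time its digest appears; skip later copies."""
    if seen is None:
        seen = set()
    for record in stream:
        digest = record_digest(record)
        if digest not in seen:
            seen.add(digest)
            yield record
```

Passing the same `seen` set to successive calls carries duplicate detection across micro-batches. In a long-running stream that set grows without bound, so a real deployment would typically expire old digests, for example by time window.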
M.JALASRI, AARTHIKA.K, VISHNU PRIYA.A, “High Speed Classification of Massive Data Streaming Using Spark”, International Journal of Advance Engineering and Research Development (IJAERD), Vol. 5, Issue 2, pp. 669-673, February 2018.