A Study: Hadoop Framework
Keywords:
Big Data, Apache Hadoop, HDFS, MapReduce, Distributed storage

Abstract
In recent years, the information extracted from very large datasets has come to be known as Big Data. Files at this scale are
difficult to transfer, so they often need to be manipulated (e.g. edited, split, or restructured) to make them easier to move
and work with. The Apache Hadoop software library addresses this problem: it is a framework that allows for the distributed
processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering local computation and storage, and it is built around the Hadoop
Distributed File System (HDFS). HDFS is written in Java and designed for portability across diverse hardware and software
platforms. Hadoop is well suited to storing high volumes of data, and it provides high-speed access to that data for the
applications that use it. However, Hadoop is not really a database: it stores data and allows data to be retrieved, but there
are no queries involved, SQL or otherwise. Hadoop is more of a data warehousing system, so it needs a processing model such
as MapReduce to actually process the data.
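
To make the storage side concrete, the short sketch below uses Hadoop's Java FileSystem API to copy a local file into HDFS.
The file paths are hypothetical, and the cluster configuration is assumed to be supplied by the standard core-site.xml and
hdfs-site.xml files on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopyExample {
      public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // Handle to the file system named in the configuration (HDFS here).
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("/tmp/bigfile.csv");    // hypothetical local file
        Path remote = new Path("/data/bigfile.csv");  // hypothetical HDFS destination

        // HDFS splits the file into blocks and replicates them across the cluster.
        fs.copyFromLocalFile(local, remote);
        System.out.println("Exists in HDFS: " + fs.exists(remote));
        fs.close();
      }
    }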
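
To illustrate the processing side, the following is the classic word-count job written against Hadoop's MapReduce Java API,
along the lines of the official Hadoop tutorial: the mapper emits a count of one for every token in its input split, and the
reducer (also used as a combiner) sums the counts per word.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
           extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: sums the counts collected for each word.
      public static class IntSumReducer
           extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Once packaged into a jar, such a job is submitted to the cluster with the hadoop jar command, passing an HDFS input
directory and an output directory (hypothetical paths), e.g. hadoop jar wordcount.jar WordCount /data/in /data/out.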