An Overview of Hadoop YARN and its Advantages
In 2012, YARN, which stands for ‘Yet Another Resource Negotiator’, became a subproject of Hadoop within the Apache Software Foundation (ASF). As one of the key features of the Hadoop 2.0 release, it has changed how Hadoop works and has significantly expanded the range of workloads Hadoop can handle. Read on to find out more about what YARN involves.
Shortcomings of Hadoop v1.0 which gave rise to YARN
Hadoop 1.0 had two major components: HDFS (Hadoop Distributed File System) and MapReduce, a batch processing framework tightly coupled to HDFS. Because Hadoop relied solely on MapReduce, it had several shortcomings:
- MapReduce had to handle both resource management and data processing.
- The JobTracker was heavily loaded, because it had to take care of many functions at once, such as scheduling, monitoring job progress, and allocating resources.
- Because there was only a single JobTracker, it became a scalability bottleneck.
- Overall, resource utilization was inefficient: each node had a fixed number of map slots and reduce slots, so capacity reserved for one task type sat idle whenever only the other type had work.
- Systems with Hadoop 1.0 could only run MapReduce applications.
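To make the slot problem concrete, here is a toy model in Python; it is not Hadoop code. It assumes, as in Hadoop 1.0, that each TaskTracker has a fixed number of map slots and reduce slots and that a map task can never use a reduce slot (or vice versa). The class, the function names, and the slot counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical Hadoop 1.0-style worker with fixed, typed slots."""
    map_slots: int
    reduce_slots: int


def busy_slots(tracker: TaskTracker, pending_maps: int, pending_reduces: int) -> int:
    # Map tasks may only occupy map slots and reduce tasks only reduce slots,
    # so surplus work of one type cannot use idle slots of the other type.
    return min(pending_maps, tracker.map_slots) + min(pending_reduces, tracker.reduce_slots)


def utilization(tracker: TaskTracker, pending_maps: int, pending_reduces: int) -> float:
    # Fraction of all slots on the node that are doing work.
    total = tracker.map_slots + tracker.reduce_slots
    return busy_slots(tracker, pending_maps, pending_reduces) / total


# During a job's map phase: 10 map tasks are waiting and no reduce tasks are ready yet.
tracker = TaskTracker(map_slots=4, reduce_slots=4)
print(utilization(tracker, pending_maps=10, pending_reduces=0))  # 0.5
```

Even with six map tasks still queued, half of this node's slots sit idle because they are reserved for reduce tasks.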
Introducing Hadoop YARN
All these difficulties led to the introduction of YARN in Hadoop 2.0. The idea behind YARN was to move resource management and job scheduling out of MapReduce and into a separate layer. Before it received its present name, YARN was called NextGen MapReduce or MapReduce 2, because it took much of this burden off MapReduce.
But YARN did more than relieve MapReduce: it introduced a new approach that enabled Hadoop to support many types of processing and a much wider range of applications.
With YARN, different data processing engines, for batch, graph, stream, and interactive processing, can run on the same cluster and process data stored in HDFS. Together with the dynamic resource allocation that YARN introduced, this makes the system highly efficient, even when processing large volumes of data.
The efficiency and versatility that YARN introduced into Hadoop are attributable to the following components:
Resource Manager: runs as the master daemon and is responsible for accepting jobs from users, scheduling them, and allocating resources to them. By allocating resources from a shared pool, it keeps the cluster efficiently utilized, which was a shortcoming of the previous version of Hadoop.
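The Resource Manager's core task, matching resource requests against free capacity on the cluster's nodes, can be sketched as follows. This is a hypothetical and heavily simplified model: requests are expressed in memory (MB) and virtual cores, as YARN requests are, but the first-fit placement and every name here are assumptions for illustration. Real YARN delegates this decision to a pluggable scheduler such as the CapacityScheduler or FairScheduler, which also handles queues, priorities, and data locality.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the Resource Manager's scheduling role."""
    nodes: List[Node]
    allocations: Dict[str, List[str]] = field(default_factory=dict)  # app id -> node names

    def allocate(self, app_id: str, mem_mb: int, vcores: int) -> Optional[str]:
        # First-fit: place the container on the first node with enough free capacity.
        for node in self.nodes:
            if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mem_mb -= mem_mb
                node.free_vcores -= vcores
                self.allocations.setdefault(app_id, []).append(node.name)
                return node.name
        # No node can satisfy the request right now; a real scheduler would queue it.
        return None


rm = ToyResourceManager([Node("node1", 4096, 4), Node("node2", 8192, 8)])
print(rm.allocate("app_1", mem_mb=3072, vcores=2))   # node1
print(rm.allocate("app_2", mem_mb=2048, vcores=1))   # node2 (node1 has only 1024 MB left)
print(rm.allocate("app_1", mem_mb=16384, vcores=1))  # None
```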
Application Master: meets the resource needs of a single application; there is one Application Master per application. It negotiates resources from the Resource Manager, tracks the application's progress, and monitors its status. It works closely with the Node Managers to execute and monitor the assigned tasks.
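The Application Master's role can be pictured as a loop: ask for containers for the work still pending, launch tasks in whatever containers are granted, and report progress. The sketch below is a hypothetical simplification: it assumes one container per task, that the Resource Manager grants at most two containers per round, and that each launched task finishes within its round. In real YARN this conversation happens through heartbeats to the Resource Manager (the Java client library provides `AMRMClient` for it), and tasks are started through Node Managers.

```python
from typing import List


class ToyApplicationMaster:
    """Hypothetical per-application master: requests containers and tracks progress."""

    def __init__(self, tasks: List[str]):
        self.pending = list(tasks)
        self.completed: List[str] = []
        self.total = len(tasks)

    def container_requests(self) -> int:
        # Ask for one container per task that has not run yet.
        return len(self.pending)

    def on_containers_granted(self, granted: int) -> None:
        # Launch one pending task per granted container; in this toy each
        # launched task completes before the next round.
        launched, self.pending = self.pending[:granted], self.pending[granted:]
        self.completed.extend(launched)

    def progress(self) -> float:
        return len(self.completed) / self.total if self.total else 1.0


am = ToyApplicationMaster([f"map-{i}" for i in range(5)])
rounds = 0
while am.pending:
    granted = min(2, am.container_requests())  # assumed Resource Manager limit per round
    am.on_containers_granted(granted)
    rounds += 1
    print(f"round {rounds}: progress {am.progress():.0%}")
```

With five tasks and two containers per round, this prints progress of 40%, 80%, and 100% over three rounds.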
Node Manager: runs as a slave (worker) daemon on each node in the Hadoop cluster and is responsible for executing tasks on that node. It essentially acts as a monitoring and reporting agent, tracking resource usage on its node and reporting it to the Resource Manager.
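A Node Manager's bookkeeping can be sketched in the same spirit: it starts and stops containers on its own node, refuses work beyond the node's capacity, and summarizes its state in the periodic heartbeat it sends to the Resource Manager. Everything below is hypothetical and tracks memory only; a real Node Manager also monitors CPU, container health, and more, and its heartbeat uses a different format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyNodeManager:
    """Hypothetical sketch of a Node Manager's per-node bookkeeping."""
    node: str
    capacity_mem_mb: int
    containers: Dict[str, int] = field(default_factory=dict)  # container id -> memory (MB)

    def used_mem_mb(self) -> int:
        return sum(self.containers.values())

    def start_container(self, container_id: str, mem_mb: int) -> bool:
        # Refuse to start a container that would exceed this node's capacity.
        if self.used_mem_mb() + mem_mb > self.capacity_mem_mb:
            return False
        self.containers[container_id] = mem_mb
        return True

    def finish_container(self, container_id: str) -> None:
        self.containers.pop(container_id, None)

    def heartbeat(self) -> dict:
        # Simplified status report for the Resource Manager.
        return {
            "node": self.node,
            "running_containers": sorted(self.containers),
            "used_mem_mb": self.used_mem_mb(),
            "available_mem_mb": self.capacity_mem_mb - self.used_mem_mb(),
        }


nm = ToyNodeManager("node1", capacity_mem_mb=8192)
nm.start_container("container_1", 2048)
nm.start_container("container_2", 4096)
print(nm.start_container("container_3", 4096))  # False: 10240 MB would exceed 8192 MB
nm.finish_container("container_1")
print(nm.heartbeat())
```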
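Container: a collection of physical resources, such as memory and CPU cores, on a single node. Containers are controlled by Node Managers in association with the Application Master, and each container is started using a Container Launch Context (CLC), which describes what should run inside it.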
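To illustrate what a container and its launch context hold, the sketch below models both as plain data. YARN's real `ContainerLaunchContext` record carries, among other things, the commands to run, environment variables, and local resources (files such as JARs that the Node Manager makes available to the container); the classes below only loosely mirror those fields. The `$VAR` expansion via Python's `string.Template`, the file path, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


@dataclass
class LaunchContext:
    """Hypothetical, loose analogue of YARN's ContainerLaunchContext."""
    commands: List[str]
    environment: Dict[str, str] = field(default_factory=dict)
    local_resources: Dict[str, str] = field(default_factory=dict)  # link name -> source path


@dataclass
class Container:
    """A slice of one node's resources plus what should run inside it."""
    container_id: str
    node: str
    mem_mb: int
    vcores: int
    launch_context: LaunchContext

    def resolved_commands(self) -> List[str]:
        # Expand $VAR placeholders in each command from the container's environment;
        # unknown variables are left untouched.
        env = self.launch_context.environment
        return [Template(cmd).safe_substitute(env) for cmd in self.launch_context.commands]


ctx = LaunchContext(
    commands=["java -Xmx$HEAP_SIZE ExampleTask"],
    environment={"HEAP_SIZE": "768m"},
    local_resources={"job.jar": "hdfs:///apps/example/job.jar"},  # hypothetical path
)
container = Container("container_01", "node1", mem_mb=1024, vcores=1, launch_context=ctx)
print(container.resolved_commands())  # ['java -Xmx768m ExampleTask']
```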
YARN’s Contribution to Hadoop v2.0
Not only did YARN eliminate the shortcomings of Hadoop 1.0, it also allowed Hadoop to accomplish much more and broadened the range of services Hadoop provides.
- Alongside MapReduce batch jobs, Hadoop YARN clusters can now run stream processing and interactive queries.
- Hadoop can now support a wide variety of processing approaches and an extensive array of applications.
- YARN has greatly improved Hadoop's cluster utilization by allocating resources dynamically, rather than through the static map and reduce slots that MapReduce used in Hadoop 1.0.
- Because YARN remains compatible with MapReduce, existing MapReduce applications written for Hadoop 1.0 can be moved to Hadoop 2.0 with little or no change.
- Last but not least, YARN can run applications that do not fit the MapReduce model at all, something Hadoop 1.0 could not do.
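The difference between static slots and dynamic allocation can be quantified with a toy comparison. It assumes a hypothetical cluster of 16 equal-sized units, split 8/8 into map and reduce slots in the Hadoop 1.0 case and treated as one shared pool of containers in the YARN case, and a workload of 12 map tasks plus 3 tasks of a non-MapReduce application (for example, a stream-processing job). The numbers are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    map_tasks: int
    reduce_tasks: int
    other_tasks: int  # tasks of non-MapReduce applications, e.g. stream processing


def static_utilization(map_slots: int, reduce_slots: int, d: Demand) -> float:
    # Hadoop 1.0 style: typed slots, and non-MapReduce work cannot run at all.
    used = min(d.map_tasks, map_slots) + min(d.reduce_tasks, reduce_slots)
    return used / (map_slots + reduce_slots)


def dynamic_utilization(capacity: int, d: Demand) -> float:
    # YARN style: any application's containers can use any free capacity.
    used = min(d.map_tasks + d.reduce_tasks + d.other_tasks, capacity)
    return used / capacity


demand = Demand(map_tasks=12, reduce_tasks=0, other_tasks=3)
print(static_utilization(8, 8, demand))  # 0.5
print(dynamic_utilization(16, demand))   # 0.9375
```

Under these assumptions, the static layout leaves half the cluster idle and cannot host the third application at all, while the shared pool runs all 15 tasks.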
Hadoop users have benefited greatly from the versatility and efficiency that YARN introduced. What began as a resource manager split out of MapReduce now serves as a general platform for efficiently processing Big Data.