Apache YARN, which stands for 'Yet Another Resource Negotiator', is Hadoop's cluster resource management system.
YARN provides APIs for requesting and working with Hadoop's cluster resources. These APIs are usually used by components of Hadoop's distributed frameworks such as MapReduce, Spark, Tez etc. which are build on top of YARN. User applications typically do not use the YARN APIs directly. Instead, they use higher level APIs provided by the framework (MapReduce, Spark, etc.) which hide the resource management details from the user.
Apache YARN, which stands for 'Yet Another Resource Negotiator', is Hadoop's cluster resource management system.
YARN provides APIs for requesting and working with Hadoop's cluster resources. These APIs are usually used by components of Hadoop's distributed frameworks such as MapReduce, Spark, Tez etc. which are build on top of YARN. User applications typically do not use the YARN APIs directly. Instead, they use higher level APIs provided by the framework (MapReduce, Spark, etc.) which hide the resource management details from the user.
The basic idea of YARN is to split the functionality of resource management and job scheduling/monitoring into separate daemons. YARN consists of the following different components
ResourceManager - The ResourceManager is a global component or daemon, one per cluster, which manages the requests to and resources across the nodes of the cluster.
NodeManager - NodeManger runs on each node of the cluster and is responsible for launching and monitoring containers and reporting the status back to the ResourceManager
ApplicationMaster is a per-application component that is responsible for negotiating resource requirements for the resource manager and working with NodeManagers to execute and monitor the tasks.
Container Container is YARN framework is a unix process running on the node that executes an application-specific process with a constrained set of resources (Memory, CPU, etc.)
The basic idea of YARN is to split the functionality of resource management and job scheduling/monitoring into separate daemons. YARN consists of the following different components
ResourceManager - The ResourceManager is a global component or daemon, one per cluster, which manages the requests to and resources across the nodes of the cluster.
NodeManager - NodeManger runs on each node of the cluster and is responsible for launching and monitoring containers and reporting the status back to the ResourceManager
ApplicationMaster is a per-application component that is responsible for negotiating resource container requirements from the ResourceManager, and working with NodeManagers to execute and monitor the container tasks.
Container Container is YARN framework is a unix process running on the node that executes an application-specific process with a constrained set of resources (Memory, CPU, etc.). Container in YARN is an abstract notion and is not a physical component.
The YARN ResourceManager is a global component or daemon, one per cluster, which manages the requests to and resources across the nodes of the cluster.
The ResourceManager has two main components - Scheduler and ApplicationsManager
Scheduler - The scheduler is responsible for allocating resources to and starting applications based on the abstract notion of resource containers having a constrained set of resources.
ApplicationManager - The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure.
The YARN ResourceManager is a global component or daemon, one per cluster, which manages the requests to and resources across the nodes of the cluster.
The ResourceManager has two main components - Scheduler and ApplicationsManager
Scheduler - The scheduler is responsible for allocating resources to and starting applications based on the abstract notion of resource containers having a constrained set of resources.
ApplicationManager - The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure.
YARN scheduler is responsible for scheduling resources to user applications based on a defined scheduling policy. YARN provides three scheduling options - FIFO scheduler, Capacity scheduler and Fair scheduler.
FIFO Scheduler - FIFO scheduler puts application requests in queue and runs them in the order of submission.
Capacity Scheduler - Capacity scheduler has a separate dedicated queue for smaller jobs and starts them as soon as they are submitted.
Fair Scheduler - Fair scheduler dynamically balances and allocates resources between all the running jobs.
You can configure the ResourceManager to use CapacityScheduler by setting the value of property 'yarn.resourcemanager.scheduler.class' to 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler' in the file 'conf/yarn-site.xml'.
You can configure the ResourceManager to use FairScheduler by setting the value of property 'yarn.resourcemanager.scheduler.class' to 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler' in the file 'conf/yarn-site.xml'.
ResourceManager is responsible for scheduling applications and tracking resources in a cluster. Prior to Hadoop 2.4, the ResourceManager does not have option to be setup for HA and is a single point of failure in a YARN cluster.
Since Hadoop 2.4, YARN ResourceManager can be setup for high availability. High availability of ResourceManager is enabled by use of Active/Standby architecture. At any point of time, one ResourceManager is active and one or more of ResourceManagers are in the standby mode. In case the active ResourceManager fails, one of the standby ResourceManagers transitions to a active mode