Hadoop - YARN

Yarn

Brief History

Hadoop 1.x

  • Core components: MapReduce and HDFS
  • Structure:
    HDFS:
      NameNode + DataNodes
    
    MapReduce (MRv1):
      JobTracker + TaskTrackers
    
    • JobTracker did everything: resource management, job scheduling, tracking tasks, failure handling
      • Also a single point of failure and a scaling bottleneck
    • MapReduce = computation engine + resource manager in one
    • Only MapReduce jobs were supported

Hadoop 2.x

  • Now Hadoop has three logical layers:
    HDFS: storage layer
      NameNode + DataNodes
    
    YARN: resource management layer
      ResourceManager + NodeManagers + ApplicationMasters
    
    MapReduce (MRv2): computation layer
      Runs on top of YARN as one of many application frameworks
    
  • Key improvements
    • Separated resource management from computation
    • MapReduce no longer monopolizes the cluster
    • Allowed more engines: Spark, Tez, Flink, Storm, etc.
    • Better scalability and cluster utilization

Hadoop 3.x

  • Structure:
    Namespace Federation:
        Multiple NameNodes managing independent namespaces
        Shared pool of DataNodes
    
    YARN:
        Handles multi-framework distributed scheduling
    
  • Improvements:
    • Federation & NameNode high availability improvements
    • Erasure coding (reduces replication overhead)
    • More NameNodes with shared DataNodes (HDFS federation)

Definition

  • Yet Another Resource Negotiator
  • It is the orchestrator of the Hadoop MapReduce jobs starting from Hadoop 2.x.
  • As the amount of data increases and 4k nodes are not enough, starting from 2.x. the two main components of Hadoop changed from MapReduce and HDFS to Yarn and HDFS.

Daemons

: provides resource management

Two Long Running

yarn architecture
  • Resource manager - 1 / cluster
    • Scheduler + ApplicationsManager
    • Manages resource and schedules across the cluster, especially large clusters
    • runs on Master node
  • Node managers - 1 / node, similar to kubectl
    • launches and monitors job containers
    • runs on worker nodes

Dynamic Components

(They are only created when a job have to run)

  • Daemons create two main types of Yarn components:

    • Containers (to host tasks)
      • Created by the RM upon request
      • Runs with a certain amount of resources (memory, CPU) on a worker node
      • An application can run as one or more tasks in one or more containers.
    • MapReduce Application Master (MRAppMaster)
      • One per application
      • Framework/application specific
      • Requests more containers to run application tasks

Anatomy of MapReduce job

5 Phases:

  1. Job Submission
  2. MRAppMaster Initialization
  3. MapReduce Job Initialization
  4. MapReduce Task Execution
  5. Job Completion
yarn mapreduce anatomy

Job submission

  • step 1-4

  • When you run MapReduce Job, it creates an internal JobSubmitter instance and calls submitJobInternal(). This function does the following:

    1. submit python code, a library convert it to a java file
    2. Asks the RM for a new **App ID**, used as MapReduce Job ID (step 2)
    3. Checks the job configurations, i.e., output, input paths, etc before it copies anything (like if there existing the same output folder) as a part of job submission
    4. Compute input splits from block information returned from RM
    5. Copies the resources needed to run the job to the shared file system (step 3)
      1. Job jar file (or source files), configuration file, input splits(when the input is small)
    6. Submits the job by calling submitApplication() on the RM (step 4)

MRAppMaster Initialization

  • Step 5a, 5b
  • RM receives a call to submitApplication() and hands off the request to the YARN scheduler
  • The scheduler allocates a container
    • scheduler is very parallel and close to the resource manager and people sometimes consider them as the same thing
  • The RM launches the MRAppMaster Process through the Node Manager (step 5a, b)

MapReduce Job Initialization

  • by AppMaster: Step 6, 7, 8
  • MRAppMaster initializes the MapReduce job by creating bookkeeping objects to keep track of the job’s progress (step 6)
  • Retrieves the input splits (not copying, but retrieving their locations) computed in the client from the shared file system (step 7)
  • Decides how to run the tasks
  • Example of resource request:
    • Resource name (hostname, rack)
    • Priority (within this application, usually map task is higher)
    • Resource requirements: memory, CPU, etc
    • Number of containers
  • Subtopics:
    • IF the tasks are small?
      • MRAppMaster will run the task in the same memory as itself - Uberization: uber the data from HDFS to the memory as the file size is small.
      • Because Overhead of allocating and running tasks in the new containers outweighs the parallel computation benefit
      • Small job: less than 10 mappers, one reducer, and input size < the size of HDFS block
    • If the tasks are big?
      • If the job is not uberized, MRAppMAster requests containers for all the map and reduce tasks from the RM.
      • Request specifies:
        • Priority (map > reduce)
        • Memory (default 1GB)
        • CPU (default 1 virtual core)
      • Map tasks are assigned based on data locality

MapReduce task execution

  • Implemented for Each Task in the job(step 9ab - 10)
  • Example of Launch context:
    • Container ID
    • Commands (to start the app)
    • Environment( configuration)
    • Local Resources (app binary, HDFS files)
  • Once a request is fulfilled – a container on a particular node is assigned to a specific task. (Step 9a)
  • MRAppMaster starts a MapReduce application whose main class is YarnChild in that task via the Node Manager (Step 9b)
  • Localizes the resources needed by the tasks including the job configuration and the source file(s) (Step 10)
  • Requests for reduce tasks are not made by RM until 5% completion of map tasks

MapReduce Job Completion

  • Once the MRAppMaster receives a completion notification from the last task, it changes the job status to “successful” (the _Success file in the output folder on GCP)
  • Job polls for status and learns that the job has completed successfully
  • MRAppMaster prints a message, some statistics, and counters to the console
  • MRAppMaster and its tasks clean up their working state

Edit YARN Daemon Config

  • Check the hadoop.env file and look for YARN configurations starting from line number 11.
  • Can change things like:
    • Your scheduling algorithm that defines the priority of execution for your job tasks. By default, it uses Capacity Scheduler.
    • Whether your data should be compressed
    • Memory allocated for your resource manager, your node manager, etc
    • The minimum and maximum size of your input split by adding -D mapred.min.split.size -D mapred.max.split.size (to control the number of mappers that will be created)

Spark on Yarn: Client mode

spark on yarn client mode
  • Because the driver runs on a client, so it requires:
    • interactive components like spark shell or pyspark
    • good for debugging and testing
  • The driver runs on the cluster in the YARN application master
    • The entire of program runs in the cluster
    • production jobs

Reading

Hadoop YARN Architecture

Hadoop YARN Commands


Last modified on 2025-11-29