Use Case

Type: Big Data / Distributed Processing Stack

Typical Use Case: Batch processing, data lakes, ETL pipelines, large-scale analytics

Famous Usage: Yahoo, Facebook data warehouses, enterprise data lakes

🐘 Hadoop Stack

HDFS · YARN · MapReduce

MapReduce

Distributed Computation Framework

Map phase applies a transformation to each input split in parallel
Reduce phase aggregates the map output grouped by key
Fault tolerance through task retries and speculative execution of slow tasks
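
The map, shuffle, and reduce steps can be illustrated with a single-process word count. This is a minimal sketch, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample splits are assumptions made for illustration, and task retries and speculative execution are not modeled.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key across every map output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(splits):
    # On a real cluster each split is handled by an independent map task;
    # here the splits are processed one after another in a single process.
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(mapped))

# Hypothetical input: two splits, each a list of text lines.
splits = [["big data big cluster"], ["data lake", "big lake"]]
counts = run_job(splits)
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1, 'lake': 2}
```

Because map_phase only looks at its own split, the map calls are independent of each other, which is what lets a cluster run them in parallel and rerun a single failed task without restarting the job.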

↓

YARN

Resource Manager & Scheduler

ResourceManager allocates cluster resources to applications across nodes
NodeManagers launch and monitor containers and report resource usage per node
Supports MapReduce, Spark, Tez, and other frameworks
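
The division of labor between the ResourceManager and the NodeManagers can be sketched with a toy allocator. This is a simplified model, not YARN code: the class and method names, the memory-only resource model, and the "most free memory first" policy are all assumptions for illustration. Real YARN uses pluggable schedulers such as the Capacity Scheduler and the Fair Scheduler.

```python
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    """Tracks the free memory and running containers of one node."""
    name: str
    free_memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id, memory_mb):
        # Reserve memory on this node and record the new container.
        self.free_memory_mb -= memory_mb
        self.containers.append((app_id, memory_mb))

class ResourceManager:
    """Decides which node receives each requested container."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # Toy policy (an assumption): pick the node with the most free memory.
        candidates = [n for n in self.nodes if n.free_memory_mb >= memory_mb]
        if not candidates:
            return None  # no node can host the container right now
        node = max(candidates, key=lambda n: n.free_memory_mb)
        node.launch(app_id, memory_mb)
        return node.name

# Hypothetical two-node cluster and four 2 GB container requests.
rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 2048)])
placements = [rm.allocate("mapreduce-job", 2048) for _ in range(4)]
print(placements)  # ['node-1', 'node-1', 'node-2', None]
```

The fourth request returns None because both nodes are full, which mirrors how an application waits for containers until capacity is released.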

↓

HDFS

Distributed File System

NameNode manages file system metadata, block locations, and replication
DataNodes store the actual data blocks, each replicated across several nodes
Optimized for large files, high throughput, fault tolerance
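
How a file maps onto replicated blocks can be sketched as follows. The 128 MB block size and replication factor of 3 are the HDFS defaults in recent releases; everything else (function names, DataNode names, and the round-robin placement) is an assumption for illustration. Real HDFS placement is rack-aware rather than round-robin.

```python
import math

BLOCK_SIZE_MB = 128   # HDFS default block size in recent releases
REPLICATION = 3       # HDFS default replication factor

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the number of blocks a file of the given size occupies."""
    return math.ceil(file_size_mb / block_size_mb)

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    block_map = {}
    for block_id in range(num_blocks):
        # Simplified placement: consecutive nodes, wrapping around the list.
        block_map[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return block_map

# Hypothetical 300 MB file on a four-node cluster.
datanodes = ["dn1", "dn2", "dn3", "dn4"]
block_map = place_replicas(split_into_blocks(300), datanodes)
for block_id, nodes in block_map.items():
    print(block_id, nodes)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

The block_map dictionary plays the role of the NameNode's metadata: it records where each block lives, while the blocks themselves would sit on the DataNodes. Losing any single node still leaves two copies of every block.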

Flow: MapReduce jobs are submitted to YARN → YARN schedules their tasks in containers on cluster nodes → tasks read from and write to HDFS, which provides replicated distributed storage.