Home
Use Case
Type:
Big Data / Distributed Processing Stack
Typical Use Case:
Batch processing, data lakes, ETL pipelines, large-scale analytics
Famous Usage:
Yahoo, Facebook data warehouses, enterprise data lakes
🐘 Hadoop Stack
HDFS · YARN · MapReduce
MR
MapReduce
Distributed Computation Framework
Map phase parallelizes data transformation tasks
Reduce phase aggregates and merges results
Fault tolerance with task retries and speculative execution
↓
Y
YARN
Resource Manager & Scheduler
ResourceManager schedules jobs across cluster nodes
NodeManagers manage containers and resources per node
Supports MapReduce, Spark, Tez, and other frameworks
↓
HD
HDFS
Distributed File System
NameNode manages metadata, file blocks, replication
DataNodes store actual data blocks with redundancy
Optimized for large files, high throughput, fault tolerance
Flow:
MapReduce jobs run on YARN → YARN schedules containers → HDFS provides distributed storage with replication.