Home
Use Case
Type: Big Data / Distributed Processing Stack
Typical Use Case: Batch processing, data lakes, ETL pipelines, large-scale analytics
Famous Usage: Yahoo, Facebook data warehouses, enterprise data lakes

🐘 Hadoop Stack

HDFS · YARN · MapReduce

MR

MapReduce

Distributed Computation Framework
  • Map phase parallelizes data transformation tasks
  • Reduce phase aggregates and merges results
  • Fault tolerance with task retries and speculative execution
Y

YARN

Resource Manager & Scheduler
  • ResourceManager schedules jobs across cluster nodes
  • NodeManagers manage containers and resources per node
  • Supports MapReduce, Spark, Tez, and other frameworks
HD

HDFS

Distributed File System
  • NameNode manages metadata, file blocks, replication
  • DataNodes store actual data blocks with redundancy
  • Optimized for large files, high throughput, fault tolerance
Flow: MapReduce jobs run on YARN → YARN schedules containers → HDFS provides distributed storage with replication.