Home
Use Case
Type:
Big Data / Real-Time Analytics Stack
Typical Use Case:
Stream processing, batch analytics, machine learning pipelines, ETL
Famous Usage:
Netflix recommendations, Uber trip analytics, Airbnb pricing
⚡ Spark Stack
Apache Spark · Scala · Hadoop
S
Apache Spark
Unified Analytics Engine
In-memory processing for speed, RDD/DataFrame/Dataset APIs
Spark SQL, Streaming, MLlib, GraphX modules
Runs on YARN, Kubernetes, Mesos, standalone cluster
↓
Sc
Scala
Functional Programming Language
Native language for Spark, functional + OO paradigms
Type-safe transformations, immutability, concise syntax
sbt build tool, JVM interop for Java libraries
↓
H
Hadoop
Distributed Storage & Resource Manager
HDFS for distributed file storage, fault-tolerant replication
YARN schedules Spark jobs across cluster nodes
Integrates with HBase, Hive, Pig for ecosystem workflows
Flow:
Spark jobs written in Scala → Spark runs on YARN/Hadoop → HDFS provides distributed data storage for input/output.