Apache Hadoop

Hadoop Modules

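  1. Hadoop Common (also called Hadoop Core) – shared libraries and utilities used by the other Hadoop modules
  2. Hadoop Distributed File System (HDFS) – distributed, replicated data storage
  3. YARN (Yet Another Resource Negotiator) – cluster resource management and job scheduling
  4. MapReduce – programming model for large-scale parallel data processing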
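The MapReduce model splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The sketch below shows those three steps for the classic word-count job. It is a single-process Python simulation for illustration only, not Hadoop's actual Java API (org.apache.hadoop.mapreduce), and the function names map_phase, shuffle, reduce_phase and word_count are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper (hypothetical name): emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all intermediate values by key, ordered by key.
    # In Hadoop this happens across the network between map and reduce tasks.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer (hypothetical name): combine all values for one key into one result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Run map over every line, shuffle the intermediate pairs, then reduce per key.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the map and reduce calls run as many parallel tasks on different nodes, and YARN allocates the resources for them; here everything runs sequentially in one process.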

Apache Spark

How Spark Works


Spark Components