Apache Hadoop

Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45284871

Hadoop Modules

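  1. Hadoop Common (Core) – shared libraries & utilities used by the other modules
  2. Hadoop Distributed File System (HDFS) – distributed, fault-tolerant data storage across cluster nodes
  3. YARN (Yet Another Resource Negotiator) – manages cluster resources and schedules jobs
  4. MapReduce – programming model for large-scale parallel data processing (map phase, then reduce phase)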
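
To make the MapReduce module more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word count as the example. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel on many nodes, with input read from HDFS and resources allocated by YARN.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (illustrative only, no parallelism)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["hadoop stores data", "spark processes data"])
print(counts)
```

The shuffle step is what lets each reduce call see every value for its key, which is why the reduce logic can stay as simple as a sum.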

Apache Spark

Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45284871

How Spark Works

[Figure: image.png]