How to Build a Data Processing Pipeline


What is a data processing pipeline?

What does a data processing pipeline architecture look like?

What is Lambda Architecture?

What is ETL?

What components does the architecture consist of?

How does Kafka work in this picture?

How does Spark work in this picture?

How does a distributed file system (Hadoop HDFS) work in this picture?

Why not Hadoop nowadays?

Reading list:
https://github.com/igorbarinov/awesome-data-engineering#data-ingestion


Author: Zijun Zhou
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Zijun Zhou.