What is a data processing pipeline?
What does a data processing pipeline architecture look like?
What is the Lambda Architecture?
What is ETL?
What components does the architecture consist of?
How does Kafka fit into this picture?
How does Spark fit into this picture?
How does a distributed file system (Hadoop HDFS) fit into this picture?
Why not Hadoop nowadays?
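To ground the ETL and pipeline questions above, here is a minimal sketch of the extract-transform-load idea, using a toy in-memory source and target; all names are illustrative and not tied to any specific framework:

```python
def extract():
    # Extract: pull raw records from a source (an in-memory list standing
    # in for a database, API, or Kafka topic).
    return [
        {"user": "alice", "amount": "42.50"},
        {"user": "bob", "amount": "17.00"},
        {"user": "alice", "amount": "8.25"},
    ]

def transform(records):
    # Transform: parse string amounts into floats and aggregate per user.
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + float(rec["amount"])
    return totals

def load(totals, target):
    # Load: write the transformed result into the target store
    # (a dict standing in for a warehouse table).
    target.update(totals)

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # aggregated spend per user
```

In a real pipeline each stage would be a separate, independently scalable component (e.g. Kafka for ingestion, Spark for transformation, HDFS or object storage for the sink), but the stage boundaries are the same.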
Reading list:
https://github.com/igorbarinov/awesome-data-engineering#data-ingestion