CleanCodeNZ

Big Data Analytics and Engineering

2021-12-02

A Simple Datalake Solution in AWS

Here is a simple data lake stack on AWS:

Data storage: S3
Ingestion: Glue catalog (streaming, JDBC, or S3 file sources) + Glue jobs
Transformation: Glue jobs (Spark SQL)
Load: Athena, Redshift, or Redshift Spectrum
Data catalog: Glue catalog
Job scheduling/automation: Airflow (using BashOperators that run AWS CLI commands)
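
As a rough illustration of the scheduling layer, here is a minimal sketch of an Airflow DAG whose BashOperators shell out to the AWS CLI. It assumes Airflow 2.x and an AWS CLI that is already configured on the worker. The job names (ingest_orders, transform_orders), the bucket path, and the DAG id are hypothetical placeholders, not part of the stack described above.

```python
import json
import shlex


def glue_start_job_cmd(job_name, arguments=None):
    """Build an `aws glue start-job-run` command line for a BashOperator."""
    cmd = ["aws", "glue", "start-job-run", "--job-name", job_name]
    if arguments:
        # Job arguments are passed to the CLI as a JSON map of string to string.
        cmd += ["--arguments", json.dumps(arguments)]
    # Quote every token so the resulting string is safe to hand to bash.
    return " ".join(shlex.quote(token) for token in cmd)


def build_dag():
    """Create the DAG. Airflow is imported lazily so the helper above
    can be used and tested without Airflow installed."""
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="simple_datalake",          # hypothetical DAG id
        start_date=datetime(2021, 12, 1),
        schedule_interval="@daily",        # Airflow 2.x parameter name
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest",
            bash_command=glue_start_job_cmd(
                "ingest_orders",           # hypothetical Glue job name
                {"--source_path": "s3://my-bucket/raw/orders/"},
            ),
        )
        transform = BashOperator(
            task_id="transform",
            bash_command=glue_start_job_cmd("transform_orders"),
        )
        # Run the Spark SQL transformation only after ingestion is triggered.
        ingest >> transform
    return dag


# In a real DAG file, expose the DAG at module level so the scheduler finds it:
# dag = build_dag()
```

Note that `aws glue start-job-run` returns as soon as the run is submitted, so each task here succeeds once the Glue job has started, not when it has finished. A production DAG would likely need to poll `aws glue get-job-run` or use a sensor before moving on to the next step.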

Tags: airflow, athena, aws, datalake, glue, glue catalog, glue job, redshift spectrum, s3
© 2022 CleanCodeNZ