dbt is a very good tool when the backend is a database or data warehouse and the transformations can be expressed as SQL statements. dbt supports most traditional databases, the cloud data warehouses such as Redshift and Snowflake, and the big data warehouses such as Hive and Databricks.
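Most dbt models are plain SQL SELECT statements that dbt materializes as tables or views; on Spark-based warehouses such as Databricks, dbt also accepts Python models. A minimal sketch of what such a transformation looks like, with hypothetical model and column names:

```python
# models/order_totals.py -- a dbt Python model (Spark-based adapters)
def model(dbt, session):
    dbt.config(materialized="table")
    # dbt.ref() resolves an upstream model to a Spark DataFrame.
    orders = dbt.ref("stg_orders")  # hypothetical upstream model
    # Equivalent to: SELECT customer_id, SUM(amount) FROM stg_orders GROUP BY customer_id
    return orders.groupBy("customer_id").agg({"amount": "sum"})
```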
The lakehouse stack on AWS, with dbt handling the transformations:
Data storage: S3
Ingestion: Glue catalog (streaming, JDBC, or S3 files) + Glue job
Transformation: dbt
Data warehouse: Redshift or Redshift Spectrum
Data catalog: Glue catalog
Job scheduling/Automation: Airflow (using BashOperators or the generic AWS operators); see the DAG sketch after this list
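A minimal sketch of the scheduling piece, assuming an Airflow 2.x deployment with dbt installed on the worker; the DAG id, project path, and schedule are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="lakehouse_dbt",          # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Build all dbt models against the warehouse (e.g. Redshift).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/lakehouse && dbt run",  # hypothetical project path
    )
    # Validate the freshly built models with dbt's tests.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/lakehouse && dbt test",
    )
    dbt_run >> dbt_test
```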
An alternative stack that replaces dbt with Glue jobs:
Data storage: S3
Ingestion: Glue catalog (streaming, JDBC, or S3 files) + Glue job
Transformation: Glue jobs (Spark SQL); see the job sketch after this list
Load: Athena, Redshift, or Redshift Spectrum
Data catalog: Glue catalog
Job scheduling/Automation: Airflow (using BashOperators that run AWS CLI commands)
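A minimal sketch of a Glue job doing the same aggregation as the dbt sketch above, this time with Spark SQL. It assumes the job is configured to use the Glue Data Catalog as its Hive metastore; the database, table, and bucket names are hypothetical:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Aggregate a table registered in the Glue catalog with plain Spark SQL.
df = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM raw_db.orders              -- hypothetical catalog table
    GROUP BY customer_id
""")

# Write the result to S3 as Parquet so Athena or Redshift Spectrum can query it.
df.write.mode("overwrite").parquet("s3://my-bucket/curated/order_totals/")

job.commit()
```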
The reason this site is not hosted on S3 is that S3 static website hosting does not provide a static IP address: you can only route traffic to the hosted files through AWS CloudFront, and to serve over SSL you also need AWS Certificate Manager.
The current solution instead hosts the files on GitHub Pages, with Cloudflare acting as the nameserver and SSL certificate provider; this is much simpler than hosting the static pages on S3.
The idea is from here.