Here is a simple data lake stack on AWS:
Data storage: S3
Ingestion: Glue Data Catalog (streaming, JDBC, or S3 file sources) + Glue jobs
Transformation: Glue jobs (Spark SQL)
Load: Athena, Redshift, or Redshift Spectrum
Data catalog: Glue Data Catalog
Job scheduling/automation: Airflow (using BashOperator tasks that run AWS CLI commands)
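
To make the scheduling layer concrete, here is a minimal sketch of how the AWS CLI commands for such a pipeline might be built in Python before being handed to Airflow. The job names, S3 paths, table name, and helper functions are hypothetical placeholders, not part of any real setup. It only assumes the `aws glue start-job-run` and `aws athena start-query-execution` CLI commands.

```python
import json
import shlex


def glue_start_job_command(job_name, arguments=None):
    """Build an `aws glue start-job-run` command line for one Glue job."""
    parts = ["aws", "glue", "start-job-run", "--job-name", job_name]
    if arguments:
        # Job arguments are passed as a JSON map of string keys to string values.
        parts += ["--arguments", json.dumps(arguments, sort_keys=True)]
    # shlex.join quotes each part so the shell treats it as a single token.
    return shlex.join(parts)


def athena_query_command(sql, output_location):
    """Build an `aws athena start-query-execution` command line."""
    parts = [
        "aws", "athena", "start-query-execution",
        "--query-string", sql,
        "--result-configuration", f"OutputLocation={output_location}",
    ]
    return shlex.join(parts)


# Hypothetical pipeline: ingest, transform, then query the result.
PIPELINE_COMMANDS = [
    glue_start_job_command(
        "ingest_orders",  # assumed job name
        {"--target_path": "s3://example-datalake/raw/orders/"},
    ),
    glue_start_job_command("transform_orders"),  # assumed job name
    athena_query_command(
        "SELECT count(*) FROM orders_curated",  # assumed table name
        "s3://example-datalake/athena-results/",
    ),
]

if __name__ == "__main__":
    for command in PIPELINE_COMMANDS:
        print(command)
```

Each string could then be passed as the `bash_command` of a BashOperator task, with the tasks chained in order. Note that `start-job-run` only submits the run and returns a run ID, so a real DAG would probably also need to poll the job status (for example with `aws glue get-job-run`) before starting the next step.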