The customer's data was spread across multiple disparate source systems and formats, which resulted in duplicated data and high latency in data processing. They needed a single centralized data platform covering three geographical regions. However, their existing data processing methods and reporting tools had limited compute capacity.
Use of different data ingestion and ETL techniques for streaming and historical data
Due to GDPR and other compliance requirements, the data lake had to be built in the AWS Ireland region
The business had around 1,600 tables and a large number of attributes to work with for reporting
Absence of a proper data dictionary and data lineage tracking
Power BI
AWS Lambda
Amazon EC2
Amazon Redshift Spectrum
AWS Glue
Amazon Athena
Apache Kafka
Amazon S3
Elastic Load Balancing
AWS IAM
Amazon VPC
AWS DMS
The lack of modern visualization tools further limited their users' ability to provide proactive and meaningful business insights. Therefore, they wanted to migrate data from their legacy source systems to a highly scalable cloud system for more efficient ETL, data processing, and reporting jobs, while keeping customer records intact.
S&M Tech Solutions built a scalable, centralized data lake that can be continuously optimized as the customer's data requirements evolve. The solution was developed using Postgres RDS as the backend database with an in-house Kafka connector. Kafka and AWS DMS were used to ingest streaming data and historical data, respectively, from the backend database into S3 buckets. A self-service reporting layer was built using Power BI over a Redshift data warehouse to cater to the customer's daily reporting needs.
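As an illustration of the ingestion into S3 described above, the following is a minimal sketch of how incoming records might be grouped into partitioned S3 object prefixes before upload. It is not the in-house connector itself. The field names (`table`, `country`, `event_ts`), the `raw` prefix, and both function names are hypothetical, and the sketch deliberately stops short of calling Kafka or S3 so that it runs on its own. The only fact taken from AWS is that the Ireland region is `eu-west-1`.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# AWS region code for Ireland, where the data lake has to reside.
DATA_LAKE_REGION = "eu-west-1"


def s3_key(record: dict, prefix: str = "raw") -> str:
    """Build a partitioned key prefix of the form
    <prefix>/<table>/country=<code>/dt=<YYYY-MM-DD>/ (hypothetical layout)."""
    # event_ts is assumed to be a Unix timestamp in seconds (UTC).
    ts = datetime.fromtimestamp(record["event_ts"], tz=timezone.utc)
    return f"{prefix}/{record['table']}/country={record['country']}/dt={ts:%Y-%m-%d}/"


def group_by_partition(records: list) -> dict:
    """Group records by key prefix and serialize each group as
    newline-delimited JSON, ready to be written as one object per prefix."""
    batches = defaultdict(list)
    for record in records:
        batches[s3_key(record)].append(json.dumps(record, sort_keys=True))
    return {key: "\n".join(lines) for key, lines in batches.items()}


if __name__ == "__main__":
    noon = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc).timestamp()
    sample = [
        {"table": "orders", "country": "IE", "event_ts": noon, "id": 1},
        {"table": "orders", "country": "ES", "event_ts": noon, "id": 2},
    ]
    for key, body in group_by_partition(sample).items():
        print(DATA_LAKE_REGION, key, len(body.splitlines()), "record(s)")
```

Partitioning by country and date in this way is one common layout that lets query engines such as Athena or Redshift Spectrum skip irrelevant objects. Whether the actual solution used this layout is not stated in the source.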
Faster data processing, ETL jobs, and reporting improve business efficiency
More time spent on analytics than on data discovery and preparation
Consolidated monitoring and reporting for 3 geographical regions (Ireland, Spain, and Portugal)