Shruthi Madumbu

Data Engineer

Plano, Texas, United States · 13 yrs 4 mos experience

Key Highlights

  • Expert in building scalable data pipelines.
  • Proficient in multiple cloud platforms including GCP, Azure, and AWS.
  • Strong background in data quality and ETL processes.

Skills

Core Skills

Data Engineering · Cloud Data Warehousing · Data Quality · Big Data Processing · Risk Management · Cloud Migration · Application Development · ETL Development

Other Skills

Apache Spark · Google Cloud Platform (GCP) · Dataproc · Google BigQuery · Snowflake · Delta Lake · Scala · Python (Programming Language) · Apache Airflow · Great Expectations · Hive · Hadoop · Big Data · Extract, Transform, Load (ETL) · Technical Design

About

Data Engineering - Data Analytics - Data Pipeline Design and Implementation - GCP - Azure - AWS - Snowflake

Experience

EmpiRx Health

Senior Data Engineer

May 2025 – Present · 10 mos · United States · Remote

Digimarc

Staff Data Engineer

Mar 2023 – Mar 2025 · 2 yrs · United States · Remote

  • Part of Digimarc’s Data Engineering team, responsible for creating data infrastructure and building an enterprise data lake in GCP and a warehouse in Snowflake to execute data strategies, with complete ownership of both platforms.
  • Created reusable, config-driven, highly scalable data pipelines that process CDC logs to build an ODS, using a Spark/Scala framework on Dataproc and creating Delta Lake tables in GCS.
  • As Snowflake admin, set up SSO, users, and all necessary RBAC and integrations, along with objects such as warehouses, databases, and schemas.
  • Automated data ingestion into Snowflake from external stages in GCS using Airflow (see the sketch after the skills list below).
  • Created data marts in Snowflake, including facts and dimensions, optimized for reporting applications.
  • Implemented data quality at scale using the Great Expectations framework, integrated with Airflow to run daily and generate DQ reports.
  • Worked on several data migration projects from on-prem legacy systems to GCP, reducing legacy infrastructure cost by 50%.
Apache Spark · Google Cloud Platform (GCP) · Dataproc · Google BigQuery · Snowflake · Delta Lake +5
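
The Airflow-driven ingestion above could look something like the minimal sketch below. The DAG name, connection id, stage, and table are hypothetical placeholders, and the operator import assumes the Airflow Snowflake provider; this is an illustration of the pattern, not the actual pipeline.

```python
# Minimal sketch: daily COPY INTO Snowflake from a GCS external stage.
# The DAG id, connection id, stage, and table below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="gcs_to_snowflake_ingest",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Load newly staged Parquet files into an ODS table.
    copy_orders = SnowflakeOperator(
        task_id="copy_orders_from_gcs_stage",
        snowflake_conn_id="snowflake_default",  # assumed Airflow connection
        sql="""
            COPY INTO ods.orders
            FROM @gcs_ext_stage/orders/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
        """,
    )
```

One convenient property of this pattern: Snowflake's COPY INTO tracks per-file load history, so retrying the task does not reload files that already landed.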

Visa

Staff Data Engineer

Nov 2021 – Jan 2023 · 1 yr 2 mos

  • Part of Visa’s Payment and Risk team; led a team of 5 to build a Risk Alert Monitoring Framework enabling users to create reports with self-service capabilities.
  • Processed large volumes (petabytes) of Visa’s transactional data using distributed computing with PySpark on Visa’s on-prem systems.
  • Integrated a YAML-based data quality framework into data pipelines to capture data quality issues and alert users (see the sketch after the skills list below).
  • Worked with cross-functional teams to set up infrastructure for robust CI/CD frameworks for pipeline deployments in production.
  • As a tech lead, collaborated with product owners to create and drive initiatives; involved in sprint planning, defining architecture, and establishing best practices.
Data Engineering · Hive · Apache Spark · Apache Airflow · Hadoop · Scala +4
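
A YAML-based quality framework like the one described above typically maps declarative rules onto DataFrame checks. The sketch below is a guess at the shape of such a framework, with an assumed rule schema, table, and rule names; it is not Visa’s actual implementation.

```python
# Sketch of a YAML-driven data quality check in PySpark.
# The rule schema, table, and column names below are illustrative assumptions.
import yaml
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

RULES_YAML = """
table: transactions
checks:
  - column: txn_id
    rule: not_null
  - column: amount
    rule: non_negative
"""

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
rules = yaml.safe_load(RULES_YAML)
df = spark.table(rules["table"])  # assumes the table exists in the catalog

failures = []
for check in rules["checks"]:
    col, rule = check["column"], check["rule"]
    if rule == "not_null":
        bad = df.filter(F.col(col).isNull()).count()
    elif rule == "non_negative":
        bad = df.filter(F.col(col) < 0).count()
    else:
        raise ValueError(f"unknown rule: {rule}")
    if bad:
        failures.append(f"{col}: {rule} failed for {bad} rows")

# In a real pipeline the failure list would feed an alerting hook, not stdout.
print("\n".join(failures) or "all checks passed")
```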

Honeywell

Senior Data Engineer

Oct 2019 – Nov 2021 · 2 yrs 1 mo

  • Part of Honeywell’s centralized analytics team (HIA), responsible for building enterprise-scale applications such as Data Legos for Sales, Procurement, and CRM.
  • Built data pipelines using Apache Spark, integrating ERP systems from sources such as SAP, HANA, Every Angle, Snowflake, raw files, and Salesforce APIs (see the sketch after the skills list below).
  • Worked on migration from on-prem to Azure cloud; involved in end-to-end implementation, including migrating Ranger access policies, databases, tables, and code, and integrating the new platform with Databricks.
  • Involved in several POCs, including a new data quality framework using Amazon Deequ libraries and Scala APIs to fetch data from third-party services.
Data Engineering · Big Data · Extract, Transform, Load (ETL) · Technical Design · Azure Databricks · Azure Data Factory +7
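
Integrating several ERP-style sources in Spark usually reduces to reading each source through its connector and conforming the results. Here is a minimal sketch under assumed connection details; the HANA-style JDBC URL, tables, paths, and join key are placeholders.

```python
# Sketch: combining a JDBC (HANA-style) source with raw files in Spark.
# Connection details, table names, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("erp-integration").getOrCreate()

# Relational source pulled over JDBC (driver jar must be on the classpath).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sap://erp-host:30015")  # hypothetical HANA endpoint
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Flat-file source landed by an upstream extract.
customers = spark.read.option("header", True).csv("/landing/crm/customers/")

# Conform and join into a single analytics-ready dataset.
enriched = orders.join(customers, on="customer_id", how="left")
enriched.write.mode("overwrite").parquet("/curated/sales_orders/")
```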

IBM

Big Data Developer

Sep 2018 – Sep 2019 · 1 yr

  • Worked for ANZ Bank; developed ETL pipelines in Spark using AWS EMR for regulatory reporting applications.
  • Created jobs that import data from Teradata to S3 daily using Sqoop and build reporting datasets in Redshift (see the sketch after the skills list below).
  • Optimized data processing in Teradata, which holds the business logic required for regulatory reporting of Risk and Finance returns.
  • Wrote several automation scripts in Unix and Python to perform Teradata database migrations without impacting existing users and workloads.
Data Engineering · Big Data · Extract, Transform, Load (ETL) · ETL Development
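
A daily Sqoop import like the one described might be wrapped for scheduling roughly as below. The Teradata host, credentials path, table, and S3 bucket are hypothetical, and the target path assumes EMR’s s3a filesystem.

```python
# Sketch: daily Sqoop import from Teradata to S3, wrapped in Python for scheduling.
# Host, credentials, table, and bucket are hypothetical placeholders.
import subprocess
from datetime import date

# Partition the landing path by run date (hypothetical bucket).
TARGET = f"s3a://regulatory-landing/orders/dt={date.today():%Y-%m-%d}"

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://td-host/DATABASE=FINANCE",  # hypothetical source
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.td_password",
    "--table", "ORDERS",
    "--target-dir", TARGET,
    "--num-mappers", "8",        # parallel split over the source table
    "--as-parquetfile",          # columnar output for downstream Spark/Redshift loads
]
subprocess.run(cmd, check=True)  # fail the job if the import fails
```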

Cognizant

ETL Developer

Sep 2015 – Sep 2018 · 3 yrs

  • Worked for a retail client; migrated ETL workflows from Informatica to Storm, producing flat files for further processing.
  • Worked intensively on Teradata utilities such as TPT (stream, load, export) and BTEQ.
  • Replicated data from Teradata to Cassandra and Hive databases using Sqoop for data analytics (see the sketch after the skills list below).
  • Involved in Teradata DBA activities: database migration, user and database role creation, and tagging roles for batch and consumer use.
Data Engineering · Extract, Transform, Load (ETL) · Data Modeling · ETL Development
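
For the Teradata-to-Hive replication, the analogous Sqoop invocation uses --hive-import. Again, the connection details and table names below are placeholders, not the client’s actual configuration.

```python
# Sketch: Sqoop replication of a Teradata table into Hive, wrapped in Python.
# Connection details and table names are hypothetical placeholders.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://td-host/DATABASE=RETAIL",  # hypothetical source
    "--username", "repl_user",
    "--password-file", "hdfs:///user/repl/.td_password",
    "--table", "DAILY_SALES",
    "--hive-import",                      # create/load the target Hive table
    "--hive-table", "analytics.daily_sales",
    "--num-mappers", "4",
]
subprocess.run(cmd, check=True)
```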

Tata Consultancy Services

Teradata Developer

Jun 2012 – Aug 2015 · 3 yrs 2 mos

  • Worked for a retail client, using the Teradata utilities FastLoad, MultiLoad, and FastExport to transfer data between DB2, flat files, and Teradata.
  • Worked with databases such as Oracle and MySQL, writing PL/SQL scripts.
  • Received an On the Spot Award for suggesting successful fine-tuning techniques.
Extract, Transform, Load (ETL) · ETL Development

Education

G. Pulla Reddy Engineering College

Bachelor of Technology, Electronics and Communication Engineering

Jan 2008 – Jan 2012

Rotary high school

Secondary Education

Jan 1995 – Jan 2006

Sri Chaitanya Junior College
