Tanya Diwan

Data Engineer

Bengaluru, Karnataka, India · 2 yrs 5 mos experience

Key Highlights

Expert in building cloud-native data pipelines.
Proficient in AWS services and Snowflake.
Strong background in data quality and analytics.

Stackforce AI infers this person is a Data Engineer specializing in Retail Analytics with expertise in cloud-native data solutions.

Skills

Core Skills

Data Warehousing, Python, AWS, SQL, Snowflake, PySpark, Apache Airflow

Other Skills

Python (Programming Language), AWS Glue, Microsoft SQL Server, AWS Lambda, Snowpipe, AWS S3, Docker, Data Structures, Data Quality, Data Pipelines, Git, Amazon QuickSight, AWS Identity and Access Management (AWS IAM), GitHub, Data Analysis

About

I design and build cloud-native data pipelines and dimensional models that power retail analytics across 7 countries, working end-to-end from business requirement to deployed pipeline.
At Capgemini, my core work spans:
→ ELT pipeline development on AWS (Glue, S3, Lambda) using PySpark and Python
→ Unified MDM solution consolidating 7 regional retail sources into one global data model
→ 50+ SQL-based data quality checks covering anomaly detection, business rule validation, and completeness testing
→ Fact and dimension table modelling using star schema in Snowflake
→ SQL KPI queries powering 30+ production dashboard pages used by global business teams
→ Automated report refresh via Snowflake stored procedures, improving data freshness
Core stack: Python, PySpark, Advanced SQL, AWS (Glue, S3, Lambda), Snowflake, Spark SQL
Currently learning: dbt, Apache Airflow, Databricks, Kafka

Experience

Total Experience: 2 yrs 5 mos
Average Tenure: 2 yrs 5 mos
Current Experience: 0 mo

Deloitte

Data Engineer I

May 2026 – Present · 0 mo · Bengaluru, Karnataka, India · Hybrid

Capgemini

2 roles

Senior Analyst

Promoted

Jul 2025 – May 2026 · 10 mos · Bengaluru, Karnataka, India

Domain SPOC for Product & Promotion modules: sole point of contact across business and engineering for requirement gathering, transformation logic, and data validation spanning 7 countries and 8 domains (Product, Promotion, Price Card, Contracts, Suppliers, Financial Contracts, Listings, REA).
Designed and delivered a unified MDM pipeline consolidating 7 regional retail data sources into one global data model, enabling consistent cross-market analytics and eliminating manual reconciliation across country teams.
Built and maintained end-to-end ELT pipelines on AWS (Glue, S3, Lambda) using PySpark and Python, ingesting high-volume retail datasets from an S3 data lake into Snowflake for downstream analytics and data science workloads.
Developed 50+ SQL-based data quality checks covering business rule validation, anomaly detection, and completeness testing, reducing data incidents in production and supporting root cause analysis when pipeline issues arose.
Modelled fact and dimension tables using star schema in Snowflake, improving query performance and enabling self-service analytics for business users without needing to engage data engineering each time.
Wrote optimised SQL KPI queries powering 30+ production dashboard pages used by business stakeholders across globally distributed teams, translating reporting requirements directly into trusted analytical datasets.
Automated weekly-to-daily report refresh using Snowflake stored procedures, eliminating a recurring manual step, improving data freshness, and freeing the team for higher-priority engineering work.

Data Warehousing, Python (Programming Language), Python

Analyst

Dec 2023 – Jul 2025 · 1 yr 7 mos · Bengaluru, Karnataka, India

Built scalable PySpark ETL pipelines in AWS Glue to transform multi-country retail data across Raw → Staging → Curated zones in S3 for Snowflake-based reporting.
Automated data ingestion into Snowflake using Snowpipe triggered by AWS Lambda, ensuring timely availability for BI dashboards.
Designed and managed Airflow DAGs in Python for workflow orchestration, including integrated alerting for failure detection.
Led the creation of 50+ SQL-based data quality rules in SQL Server, improving data accuracy by 65% across regional and global pipelines.
Collaborated with the AWS DMS migration team to validate raw data loads and supported downstream Glue transformations.
Worked alongside DevOps and BI teams to align on Terraform-based infrastructure and reporting layer integration.

SQL, Microsoft SQL Server