Suvin Shah

Data Engineer

San Jose, California, United States · 9 yrs 11 mos experience
Most Likely To Switch

Key Highlights

  • Automated 1TB+ daily ETL pipelines, cutting runtime by 40%.
  • Unified customer datasets, driving $3.5M retention revenue.
  • Built fraud detection model achieving 99.9% accuracy.
Stackforce AI infers this person is a Fintech Data Engineer specializing in scalable data pipelines and analytics platforms.

Skills

Core Skills

Data Engineering · ETL Pipelines · Data Visualization · Data Quality Management · Machine Learning · Data Analysis

Other Skills

AWS Glue · SQL · Python · Tableau · Power BI · Google Optimize · scikit-learn · Apache Airflow · Excel · Apache ZooKeeper · Cassandra · Relational Databases · Data Validation · Reporting · Confluence

About

Senior Data Engineer with 10 years building production-grade data platforms across financial services. I design, optimize, and scale the pipelines that turn raw data into governed, trusted assets, from real-time streaming ingest to terabyte-scale batch ETL.

What I build:
➔ Cloud-native ETL/ELT pipelines on AWS (Glue, Redshift, S3, EMR, Lambda, Kinesis) and Snowflake
➔ Real-time streaming architectures with Kafka, Spark Structured Streaming, and Delta Lake
➔ Orchestration frameworks using Airflow and dbt for reproducible, tested transformations
➔ Data warehouse and lakehouse designs serving analytics, ML, and regulatory reporting

Impact at scale:
➔ Automated 1TB+ daily ETL pipelines: cut runtime 40%, shifted reporting from weekly to daily
➔ Unified customer datasets across 8 global business units: drove $3.5M retention revenue
➔ Built fraud detection model (XGBoost) achieving 99.9% accuracy on credit card transactions
➔ Reduced infrastructure costs through Spark tuning and cloud-native optimization

Stack: Python · SQL · PySpark · Apache Spark · Kafka · Airflow · dbt · AWS (Glue, Redshift, S3, EMR, Lambda, Kinesis) · Snowflake · Databricks · GCP (BigQuery) · Terraform · Docker · Tableau · Power BI

Credentials: M.S. Data Science, Pace University · AWS Cloud Practitioner · 10K+ LinkedIn followers

Open to Senior Data Engineer, Staff Data Engineer, Analytics Engineer, and Data Analyst roles.

Let’s talk: suvin.shah94@gmail.com

Experience

Total Experience: 9 yrs 11 mos
Average Tenure: 2 yrs 5 mos
Current Experience: 2 yrs 11 mos

Citi

2 roles

Senior Data Engineer

Jun 2023 - Present · 2 yrs 11 mos · New York, United States

  • Architected and automated AWS Glue ETL pipelines ingesting 1TB+ of financial data daily across S3, Redshift, and Snowflake, reducing batch runtime by 40% and accelerating executive reporting from a weekly to a daily cadence.
  • Unified fragmented customer datasets across 8 global business units using SQL-based analytics and Python transformations, identifying churn drivers that increased retention 15% and delivered $3.5M in incremental revenue.
  • Designed and maintained interactive Tableau and Power BI dashboards surfacing churn, retention, and compliance KPIs, shortening stakeholder decision cycles by 25% and increasing engagement 17% across quarterly reviews.
  • Built end-to-end A/B testing infrastructure with Google Optimize, running controlled experiments across 500K+ users that lifted engagement 15% and retention 10% for cross-sell marketing programs.
  • Implemented data quality frameworks for 1M+ financial transactions using Python and SQL validation pipelines, improving data accuracy 20% and strengthening compliance audit reporting by 30%.
AWS Glue · SQL · Python · Tableau · Power BI · Data Engineering · +1

Data Engineer Intern

Aug 2022 - May 2023 · 9 mos · New York, United States

  • Designed and implemented data architecture in response to stakeholder requests to support recurring ML models, improving the consistency of predictive analytics.
  • Participated in design and SQL code reviews, providing feedback and recommendations for improvement before production deployment.
  • Designed end-to-end Power BI dashboards from data preparation to visualization to streamline reporting and reduce manual analysis for the marketing team.
SQL · Power BI · Data Engineering

Radix software services pvt ltd

Data Engineer

Nov 2018 - Jul 2021 · 2 yrs 8 mos · Ahmedabad, Gujarat, India · Hybrid

  • Developed churn prediction models using Python and scikit-learn, improving prediction accuracy by 20% and reducing churn across 3 enterprise client portfolios through machine learning optimization.
  • Optimized SQL queries through indexing and partitioning, reducing data processing time by 35% and enabling real-time analytics for 12 monthly reporting cycles.
  • Built Power BI dashboards tracking CLTV and credit risk metrics for 10K+ customers, enabling 15% faster decision-making and enhancing visibility for regional managers overseeing $20M portfolios through data visualization.
  • Collaborated with data engineering and analytics teams to integrate machine learning forecasts into executive dashboards, improving forecast accuracy by 25% and aligning predictive analytics with quarterly strategic planning.
Python · scikit-learn · SQL · Power BI · Machine Learning · Data Engineering

Softage information technology limited

Data Analyst

Jan 2016 - Oct 2018 · 2 yrs 9 mos · Ahmedabad, Gujarat, India · Hybrid

  • Developed Apache Airflow ETL pipelines, automating the intake of financial transactions to reduce operational delays by 30% and enhance reporting efficiency for 5 financial clients in 3 time zones.
  • Enhanced credit risk model accuracy by 25% and reduced portfolio default exposure by $1.2M annually through hyperparameter tuning and Python-based ML algorithms.
  • Created Tableau dashboards visualizing KPIs across 100K+ transactions, improving visibility by 20% and supporting data-driven oversight for executive teams through advanced analytics visualization.
  • Partnered with project leads, aligning analytics frameworks with Agile delivery sprints, reducing sprint delivery delays by 25% and improving inter-team communication efficiency by 18% through coordinated project analytics.
Apache Airflow · Python · Tableau · Data Engineering

Datacrops software private limited

Data Analyst

May 2014 - Dec 2015 · 1 yr 7 mos · Ahmedabad, Gujarat, India

  • Wrote complex SQL queries to extract and analyze structured data from relational databases for daily operational reporting.
  • Used MS Excel (Pivot Tables, VLOOKUP, IF formulas, Charts) to prepare weekly and monthly performance reports for sales and operations teams.
  • Performed data cleansing and data validation using SQL and Excel to remove duplicates, correct inconsistencies, and improve data accuracy.
  • Assisted with ETL tasks, extracting, transforming, and loading flat files into database tables.
  • Performed exploratory data analysis using Excel and SQL to identify trends and insights.
  • Created summary dashboards using Excel charts and reporting templates to visualize KPIs and support management decision-making.
  • Ensured data accuracy by performing quality checks and reconciling source files with database records.
  • Supported ad-hoc requests by creating custom reports using SQL joins, aggregations, and filters.
  • Documented report logic, SQL queries, and data definitions for knowledge sharing and standardization.
  • Collaborated with cross-functional teams to gather requirements and deliver structured analytical outputs using database querying and spreadsheet modeling techniques.
SQL · Excel · Data Analysis

Education

Pace University - Seidenberg School of Computer Science and Information Systems

Master's, Data Science

Sep 2021 - Apr 2023

SILVER OAK UNIVERSITY

BE - Bachelor of Engineering, Computer Engineering

Apr 2011 - May 2015
