Vaibbhav B Vasa

Engineering Manager

Bengaluru, Karnataka, India · 9 yrs 8 mos experience

Key Highlights

  • Led migration to Lakehouse architecture using Databricks.
  • Optimized data processing costs by 20% through innovative strategies.
  • Built financial data pipelines ensuring privacy and compliance.


Skills

Core Skills

Data Engineering · Databricks · Cost Optimization · AWS · GCP · ETL

Other Skills

Advanced Java · Airflow · Amazon Web Services (AWS) · Apache Airflow · Apache Flink · Apache Kafka · Apache Oozie · Apache Spark · Apache Superset · Caching · Databases · Docker · Google Cloud Platform (GCP) · Graviton Machines · Hadoop

About

Computer Science graduate with a strong background in CS fundamentals and experience developing applications across a variety of languages and technology stacks. Currently managing the Data Platform team at Zepto Technologies, handling data at huge scale. Software Engineer | Data Engineer | Java | Python | Spark | Hadoop | Kafka | Flink | Presto | Docker | Kubernetes | Helm | Databricks

Experience

Zepto

2 roles

Engineering Manager

Promoted

Apr 2025 – Present · 11 mos · Bengaluru, Karnataka, India · On-site

Lead Data Engineer

Nov 2023 – May 2025 · 1 yr 6 mos · Bengaluru, Karnataka, India · On-site

  • Led the entire migration from a warehouse (Redshift & Hevo) to a lakehouse (Databricks).
  • Deployed open-source Airflow, Superset & Kafka Connect on K8s.
  • Built a CDC layer using Kafka & Debezium to load data into Databricks Delta tables.
  • Created multiple frameworks using Airflow & Databricks Job Compute for the following use cases:
    a. Bronze-to-Silver framework to load backend Postgres & Mongo tables into Databricks via CDC.
    b. Flats framework for analysts to create Gold tables.
    c. Notebook-scheduling framework for scheduling jobs in Databricks.
    d. Exporter framework for exporting data from Databricks to Google Sheets, S3, etc.
  • Carried out the following cost-optimization activities to improve utilization, reducing overall cost by ~20%:
    a. Used Graviton machines for all workloads.
    b. Clubbed multiple table runs (20) into a single job to save driver cost.
    c. Liquid-clustered tables < 500 GB; partitioned & Z-ordered those > 500 GB.
    d. Used pools to save on compute spin-up time/cost and improve reusability.
    e. Leveraged caching and optimized PySpark code in notebooks for long-running jobs.
    f. Used Spot machines with a low frequency of interruption.
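The size-based layout rule above (liquid clustering below ~500 GB, partitioning plus Z-ordering above it) could be sketched as a small helper that emits the corresponding Databricks SQL. This is an illustrative assumption, not Zepto's actual code: the table names, key columns, and the exact threshold are hypothetical, and in practice partitioning for large tables is declared at CREATE TABLE time, so only the recurring Z-order compaction is emitted here.

```python
SIZE_THRESHOLD_GB = 500  # the ~500 GB cut-off mentioned above (assumed)

def layout_statements(table: str, size_gb: float, keys: list[str]) -> list[str]:
    """Return the Databricks SQL maintenance statements for one Delta table."""
    cols = ", ".join(keys)
    if size_gb < SIZE_THRESHOLD_GB:
        # Smaller tables: liquid clustering avoids rigid partition boundaries.
        return [f"ALTER TABLE {table} CLUSTER BY ({cols})"]
    # Larger tables: partitioning is fixed at table creation, so the
    # recurring job only runs the Z-order compaction on the sort keys.
    return [f"OPTIMIZE {table} ZORDER BY ({cols})"]

print(layout_statements("silver.orders", 120, ["order_date", "city_id"]))
# → ['ALTER TABLE silver.orders CLUSTER BY (order_date, city_id)']
```

A rule like this lets one maintenance job cover every table regardless of size, which matches the clubbing of many table runs into a single job described above.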
Apache Superset · Airflow · PySpark · Apache Airflow · Databricks · Data Engineering

Navi

Data Engineer III

Dec 2022 – Oct 2023 · 10 mos · Bengaluru, Karnataka, India · Hybrid

  • As part of Navi's Data Platform team, served the Data Analytics & Data Science teams' data needs. Worked mostly on the components below in the past 6 months:
  • Primarily worked on financial data, especially credit-score related, and built pipelines to target customers based on their credit score while ensuring privacy and security norms.
  • Deployed daily jobs to Presto on Spark via a self-serve analytics approach; modified the open-source Presto codebase to make it compatible with AWS Graviton instances.
  • Automated several Airflow workflows for external data sources; automated table partitioning with Hudi using Airflow to avoid manual errors.
  • Built Flink pipelines in Scala for real-time use cases.
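The partition automation mentioned above amounts to generating each day's partition spec programmatically instead of adding it by hand. A minimal sketch of the kind of helper an Airflow task might call, assuming a `dt=YYYY-MM-DD` partition layout (the function name and path convention are hypothetical, not Navi's actual code):

```python
from datetime import date, timedelta

def daily_partitions(start: date, end: date) -> list[str]:
    """Partition specs for every day in [start, end], e.g. 'dt=2023-06-01'."""
    days = (end - start).days
    return [f"dt={(start + timedelta(d)).isoformat()}" for d in range(days + 1)]

print(daily_partitions(date(2023, 6, 1), date(2023, 6, 3)))
# → ['dt=2023-06-01', 'dt=2023-06-02', 'dt=2023-06-03']
```

Deriving the specs from the schedule date removes the manual step that caused errors: the same code produces the same partition names on every run.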
Airflow · Apache Spark · Amazon Web Services (AWS) · Scala · Presto · Pinot +4

Glance

Data Engineer II

Nov 2020 – Dec 2022 · 2 yrs 1 mo · Bangalore Urban, Karnataka, India

  • As part of the Glance Data Platform team, worked on planning, deploying, and upgrading the entire software stack on GCP and Azure using microservices and distributed systems, and managed the underlying ETL jobs to process and store data efficiently. As an individual contributor and as part of the team, achieved the milestones below:
  • Consumed, stored & managed petabyte-scale data while ensuring quality & consistency.
  • Improved SLA by 4 hours by migrating various Hadoop jobs to Spark using Scala.
  • Onboarded the entire stack of Roposo & Shop101 to Trino, reducing infra cost by almost 45%.
  • Deployed self-managed Airflow for scheduling ETL jobs that process data in Spark & Trino.
  • Wrote SQL queries & custom UDFs to aggregate raw data for business needs.
  • As the SME for Trino/Presto across Glance, researched, experimented with, and deployed various Trino configuration and session properties to utilize resources efficiently.
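Trino session-property tuning like that described above is typically applied per workload rather than cluster-wide. A hedged sketch of rendering a workload profile as `SET SESSION` statements: the property names shown (`query_max_memory`, `spill_enabled`) do exist in Trino, but the values, the profile, and the helper itself are illustrative assumptions, not Glance's actual tooling.

```python
def session_statements(props: dict) -> list[str]:
    """Render a dict of session properties as Trino SET SESSION statements."""
    out = []
    for key, value in props.items():
        # Strings are quoted; booleans/numbers are rendered bare, lowercased.
        rendered = f"'{value}'" if isinstance(value, str) else str(value).lower()
        out.append(f"SET SESSION {key} = {rendered}")
    return out

# Hypothetical profile for a memory-heavy ETL query.
heavy_etl = {
    "query_max_memory": "50GB",  # cap total distributed memory for the query
    "spill_enabled": True,       # spill to disk instead of failing on memory
}
print(session_statements(heavy_etl))
```

Keeping profiles as data makes it easy to experiment with one property at a time and roll a change back, which fits the research-and-deploy loop described above.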
Google Cloud Platform (GCP) · Kafka Streams · Apache Superset · Airflow · Trino · Hive +6

Adobe

2 roles

Data Engineer

Promoted

Oct 2019 – Nov 2020 · 1 yr 1 mo

  • Reprocessed billions of records to cleanse the data in Adobe Analytics.
  • Migrated legacy Apache Pig ETL processes to WFE (Apache Airflow) to scale and process higher data volumes.
Hadoop · MapReduce · Airflow · Apache Oozie · PySpark · Amazon Web Services (AWS) +2

Associate Data Engineer

Nov 2016 – Sep 2019 · 2 yrs 10 mos

  • Created custom solutions for features not available OOTB in Adobe products.
  • Created big data pipelines in Spark on AWS.
  • Designed and developed a pipeline that reads data from Adobe Live Streams.
Hadoop · MapReduce · Apache Oozie

Deloitte Digital

2 roles

Business Technology Analyst

Aug 2016 – Oct 2016 · 2 mos · Greater Hyderabad Area

Business Technology Analyst-Intern

Jan 2016 – Jul 2016 · 6 mos · Greater Hyderabad Area

Chalkstreet

Product Development Architect-Intern

May 2015 – Jun 2015 · 1 mo · Greater Coimbatore Area

Education

Birla Institute of Technology and Science, Pilani

Post Graduate Program in Big Data Engineering

Jan 2020 – Jan 2021

Great Lakes Institute of Management

Post Graduate Program in Data Science Engineering

Jan 2018 – Jan 2018

KIIT - Kalinga Institute of Industrial Technology

Bachelor of Technology (B.Tech.) — Computer Science and Engineering

Jan 2012 – Jan 2016
