Varun Asija

Software Engineer

Gurugram, Haryana, India · 10 yrs 6 mos experience
Key Highlights

  • Expert in building scalable lakehouse architectures.
  • Led teams to optimize data processing and reduce costs.
  • Proven track record in real-time data engineering solutions.

Skills

Core Skills

Data Engineering · Lakehouse · Customer Segmentation

Other Skills

API Development · AWS · Airflow · Amazon Web Services (AWS) · Apache Airflow · Apache Beam · Apache Hudi · Apache Kafka · Apache Spark · Apache Spark Streaming · Azure Data Factory · Big Data Analytics · Change Data Capture · Data Analysis · Data Extraction

About

As a results-driven Data Engineer with 10+ years of hands-on experience across cloud platforms (AWS, GCP, Microsoft Fabric), I specialize in building scalable, real-time lakehouse architectures and data-driven systems that power intelligent business decisions. Currently at Extreme Networks, I lead the design and development of a near real-time customer interaction lakehouse using Apache Hudi, PySpark, and EMR—driving cost-efficient, automated, and secure data pipelines at scale. My background spans large-scale customer segmentation, personalization systems, and streaming architectures that have delivered billions of notifications and supported deep analytics at companies like Tokopedia and Network18. I'm passionate about infrastructure-as-code, CI/CD automation, and observability, and have a proven track record of reducing costs and boosting data processing efficiency across complex environments. Whether building data lakes from scratch, optimizing pipelines, or mentoring data teams, I thrive on solving high-impact problems with clean, scalable engineering solutions.

Experience

Extreme Networks

Staff Data Engineer

Oct 2024 – Present · 1 yr 5 mos · India · Remote

  • At Extreme Networks, I lead the design and implementation of a scalable, near real-time lakehouse architecture built on AWS using Apache Hudi and PySpark, enabling efficient, secure, and high-performance analytics across diverse business domains.
  • Key Responsibilities & Achievements:
  • Built and maintained a near real-time customer interaction lakehouse, leveraging S3, Apache Hudi, and EMR for streaming and batch data processing.
  • Designed and deployed highly automated, cost-efficient EMR workflows, using custom Docker containers, bootstrap optimizations, and Terraform-based infrastructure-as-code.
  • Implemented event-driven pipelines with SQS, Lambda, and PySpark to process new data in near real-time with failover and retry mechanisms.
  • Developed automated data deletion workflows for customer opt-out compliance, with dynamic table discovery and parallel partition handling.
  • Created and pushed custom CloudWatch metrics and dashboards for observability across services, including job performance and data pipeline health.
  • Architected and enforced multi-environment infrastructure strategies (dev, staging, production), separating destroyable and permanent resources for compliance and safety.
  • Optimized query and transformation logic using broadcast joins, partition pruning, and EMR task tuning for jobs processing millions of records.
  • Integrated CI/CD workflows using GitHub Actions to streamline deployment and testing across all environments.
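The failover-and-retry behaviour described above can be sketched in minimal form. This is a hypothetical stdlib illustration, not the production implementation (which, per the bullets, ran on SQS, Lambda, and PySpark): a handler is retried with exponential backoff, and a message that exhausts its attempts is parked for later replay.

```python
import time

def process_with_retry(handler, message, dead_letter,
                       max_attempts=3, base_delay=0.01):
    """Invoke handler(message), retrying with exponential backoff.

    On the final failure the message is parked in dead_letter for
    later replay (the failover path). Returns True on success,
    False once the message has been dead-lettered.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt == max_attempts:
                dead_letter.append(message)  # park for manual replay
                return False
            # back off before the next attempt: 0.01s, 0.02s, 0.04s, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In the real pipeline the dead-letter list would presumably be an SQS dead-letter queue and the handler a PySpark job invocation; the names here are illustrative only.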
Apache Spark Streaming · Apache Hudi · Amazon Web Services (AWS) · Terraform · Lakehouse · Python +1

Tokopedia

4 roles

Senior Lead Data Engineer

Promoted

Jan 2023 – Jun 2024 · 1 yr 5 mos

  • Worked as a Senior Tech Lead on the core data engineering team at Tokopedia, one of Indonesia's biggest e-commerce companies.
  • My major project was building customer segmentation and personalisation services to re-target more than 150M customers. The segmentation services include customer journey, realtime triggers, broadcast campaigns, conversion tracking, and realtime reporting, and they help send more than a billion notifications a month.
  • My responsibilities include, but are not limited to:
  • Lead and manage a team of data engineers, provide mentorship and support to ensure project success
  • Design and implement data architectures, including datalake, datawarehouse and service designs
  • Develop and maintain robust ETL (Extract, Transform, Load) pipelines to move and transform data from various sources and share with third parties as per requirements
  • Reduced costs from $100K to $70K by optimising the storage underlying Bigtable and BigQuery
  • Develop service APIs and scripts using Python
  • Identify and resolve performance bottlenecks in data pipelines & services to improve response time & data processing efficiencies
  • Collaborate with other departments to understand their requirements and develop pipelines to support their data science models and other applications
  • Develop performance monitoring dashboards and help with root cause analysis in case of any issue
  • Create and maintain documentation for data engineering processes, services, and best practices
  • Data governance & access management; automated access through Jira ticket approvals
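The segmentation services above can be sketched, in heavily simplified form, as a rule-based bucketing step. The segment names and thresholds below are entirely hypothetical (the production system operated on behavioural data for 150M+ customers); the sketch only shows the shape of the idea, with segments grouped the way a broadcast-campaign fan-out would consume them.

```python
def segment_customer(total_orders, days_since_last_order):
    """Assign a customer to a re-targeting segment.

    Hypothetical rule-based thresholds; a production system would
    derive segments from behavioural features at far larger scale.
    """
    if total_orders == 0:
        return "prospect"
    if days_since_last_order > 90:
        return "churn_risk"
    if total_orders >= 10:
        return "loyal"
    return "active"

def build_segments(customers):
    """Group customer ids by segment: the shape a broadcast
    campaign service would consume when fanning out notifications.

    `customers` maps id -> (total_orders, days_since_last_order).
    """
    segments = {}
    for cid, (orders, recency) in customers.items():
        segments.setdefault(segment_customer(orders, recency), []).append(cid)
    return segments
```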
Python (Programming Language) · Google Cloud Platform (GCP) · Apache Beam · Apache Airflow · Google BigQuery · Apache Spark +3

Lead Data engineer

Jul 2021 – Apr 2023 · 1 yr 9 mos

Senior Data Engineer

Jan 2020 – Sep 2021 · 1 yr 8 mos

Data Engineer

May 2018 – Jan 2020 · 1 yr 8 mos

Shore Infotech

Senior Data Engineer

Nov 2017 – Apr 2018 · 5 mos · Hyderabad Area, India · On-site

  • My major task at Shore Infotech was to help build a data warehouse in the healthcare domain. We gathered data from different sources, including databases and data crawled from healthcare websites.
  • My responsibilities include:
  • Create ETL data pipelines using Airflow
  • Identify performance issues in data pipelines and help with root cause analysis
  • Build and monitor dashboards
Python · Apache Airflow · SQL · Big Data Analytics · Data Engineering

Aptara

Data Engineer

Apr 2015 – Nov 2017 · 2 yrs 7 mos · Noida · On-site

  • Create pipelines in Python for automatic data extraction from the websites of more than 200 financial companies, capturing the required metadata and storing it in a SQL database
  • Data analysis
  • Scheduling and workflow management of the pipelines
  • Maintain daily/weekly reports
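The extraction step above can be illustrated with a minimal stdlib sketch. The field names and regex patterns here are hypothetical; the real pipelines matched site-specific markup across 200+ company websites before loading the captured metadata into SQL.

```python
import re

# Hypothetical patterns; real pipelines used site-specific markup rules.
FIELD_PATTERNS = {
    "company": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S),
    "ticker": re.compile(r"Ticker:\s*([A-Z.]+)"),
    "fiscal_year_end": re.compile(r"Fiscal Year End:\s*([A-Za-z]+ \d{1,2})"),
}

def extract_metadata(html):
    """Pull the required metadata fields from a fetched page.

    Fields that fail to match come back as None, so the downstream
    SQL insert can record the gap instead of crashing the run.
    """
    return {name: (m.group(1).strip() if (m := pat.search(html)) else None)
            for name, pat in FIELD_PATTERNS.items()}
```

The resulting dict maps one-to-one onto the columns of a parameterised SQL INSERT, which is roughly how the storage step described above would consume it.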
Data Analysis · Python · SQL · MySQL · Airflow · Data Engineering

Education

PPIMT, Kurukshetra University

Bachelor of Technology, Mechanical Engineering

Jan 2010 – Jan 2014

Govt Senior Secondary School

Jan 2009 – Jan 2010
