Varun Asija

Software Engineer

Gurugram, Haryana, India · 10 yrs 6 mos experience
Key Highlights

  • Expert in building scalable lakehouse architectures.
  • Led teams to optimize data processing and reduce costs.
  • Proven track record in real-time data engineering solutions.

Skills

Core Skills

Data Engineering · Lakehouse · Customer Segmentation

Other Skills

API Development · AWS · Airflow · Amazon Web Services (AWS) · Apache Airflow · Apache Beam · Apache Hudi · Apache Kafka · Apache Spark · Apache Spark Streaming · Azure Data Factory · Big Data Analytics · Change Data Capture · Data Analysis · Data Extraction

About

As a results-driven Data Engineer with 10+ years of hands-on experience across cloud platforms (AWS, GCP, Microsoft Fabric), I specialize in building scalable, real-time lakehouse architectures and data-driven systems that power intelligent business decisions. Currently at Extreme Networks, I lead the design and development of a near real-time customer interaction lakehouse using Apache Hudi, PySpark, and EMR—driving cost-efficient, automated, and secure data pipelines at scale. My background spans large-scale customer segmentation, personalization systems, and streaming architectures that have delivered billions of notifications and supported deep analytics at companies like Tokopedia and Network18. I'm passionate about infrastructure-as-code, CI/CD automation, and observability, and have a proven track record of reducing costs and boosting data processing efficiency across complex environments. Whether building data lakes from scratch, optimizing pipelines, or mentoring data teams, I thrive on solving high-impact problems with clean, scalable engineering solutions.

Experience

Extreme Networks

Staff Data Engineer

Oct 2024 – Present · 1 yr 5 mos · India · Remote

  • At Extreme Networks, I lead the design and implementation of a scalable, near real-time lakehouse architecture built on AWS using Apache Hudi and PySpark, enabling efficient, secure, and high-performance analytics across diverse business domains.
  • Key Responsibilities & Achievements:
  • Built and maintained a near real-time customer interaction lakehouse, leveraging S3, Apache Hudi, and EMR for streaming and batch data processing.
  • Designed and deployed highly automated, cost-efficient EMR workflows, using custom Docker containers, bootstrap optimizations, and Terraform-based infrastructure-as-code.
  • Implemented event-driven pipelines with SQS, Lambda, and PySpark to process new data in near real-time with failover and retry mechanisms.
  • Developed automated data deletion workflows for customer opt-out compliance, with dynamic table discovery and parallel partition handling.
  • Created and pushed custom CloudWatch metrics and dashboards for observability across services, including job performance and data pipeline health.
  • Architected and enforced multi-environment infrastructure strategies (dev, staging, production), separating destroyable and permanent resources for compliance and safety.
  • Optimized query and transformation logic using broadcast joins, partition pruning, and EMR task tuning for jobs processing millions of records.
  • Integrated CI/CD workflows using GitHub Actions to streamline deployment and testing across all environments.
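The failover-and-retry behaviour described above can be sketched in minimal form. This is a hypothetical stdlib illustration, not the production implementation (which, per the bullets, ran on SQS, Lambda, and PySpark): a handler is retried with exponential backoff, and a message that exhausts its attempts is parked for later replay.

```python
import time

def process_with_retry(handler, message, dead_letter,
                       max_attempts=3, base_delay=0.01):
    """Invoke handler(message), retrying with exponential backoff.

    On the final failure the message is parked in dead_letter for
    later replay (the failover path). Returns True on success,
    False once the message has been dead-lettered.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt == max_attempts:
                dead_letter.append(message)  # park for manual replay
                return False
            # back off before the next attempt: 0.01s, 0.02s, 0.04s, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In the real pipeline the dead-letter list would presumably be an SQS dead-letter queue and the handler a PySpark job invocation; the names here are illustrative only.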
Apache Spark Streaming · Apache Hudi · Amazon Web Services (AWS) · Terraform · Lakehouse · Python +1

Tokopedia

4 roles

Senior Lead Data Engineer

Promoted

Jan 2023 – Jun 2024 · 1 yr 5 mos

  • Worked as a Senior Tech Lead on the core data engineering team at Tokopedia, one of Indonesia's biggest e-commerce companies.
  • My major project was building customer segmentation and personalisation services to re-target more than 150M customers. The segmentation services include customer journey, realtime triggers, broadcast campaigns, conversion tracking, and realtime reporting, and they help send more than a billion notifications a month.
  • My responsibilities include, but are not limited to:
  • Lead and manage a team of data engineers, provide mentorship and support to ensure project success
  • Design and implement data architectures, including datalake, datawarehouse and service designs
  • Develop and maintain robust ETL (Extract, Transform, Load) pipelines to move and transform data from various sources and share with third parties as per requirements
  • Reduced costs from $100K to $70K by optimising the storage underlying Bigtable and BigQuery
  • Develop service APIs and scripts using Python
  • Identify and resolve performance bottlenecks in data pipelines & services to improve response time & data processing efficiencies
  • Collaborate with other departments to understand their requirements and develop pipelines to support their data science models and other applications
  • Develop performance monitoring dashboards and help with root cause analysis in case of any issue
  • Create and maintain documentation for data engineering processes, services, and best practices
  • Data governance & access management; automated access through Jira ticket approvals
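The segmentation services above can be sketched, in heavily simplified form, as a rule-based bucketing step. The segment names and thresholds below are entirely hypothetical (the production system operated on behavioural data for 150M+ customers); the sketch only shows the shape of the idea, with segments grouped the way a broadcast-campaign fan-out would consume them.

```python
def segment_customer(total_orders, days_since_last_order):
    """Assign a customer to a re-targeting segment.

    Hypothetical rule-based thresholds; a production system would
    derive segments from behavioural features at far larger scale.
    """
    if total_orders == 0:
        return "prospect"
    if days_since_last_order > 90:
        return "churn_risk"
    if total_orders >= 10:
        return "loyal"
    return "active"

def build_segments(customers):
    """Group customer ids by segment: the shape a broadcast
    campaign service would consume when fanning out notifications.

    `customers` maps id -> (total_orders, days_since_last_order).
    """
    segments = {}
    for cid, (orders, recency) in customers.items():
        segments.setdefault(segment_customer(orders, recency), []).append(cid)
    return segments
```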
Python (Programming Language) · Google Cloud Platform (GCP) · Apache Beam · Apache Airflow · Google BigQuery · Apache Spark +3

Lead Data engineer

Jul 2021 – Apr 2023 · 1 yr 9 mos

Senior Data Engineer

Jan 2020 – Sep 2021 · 1 yr 8 mos

Data Engineer

May 2018 – Jan 2020 · 1 yr 8 mos

Shore Infotech

Senior Data Engineer

Nov 2017 – Apr 2018 · 5 mos · Hyderabad Area, India · On-site

  • My major task at Shore Infotech was to help build a data warehouse in the healthcare domain. We gathered data from different sources, including databases and data crawled from healthcare websites.
  • My responsibilities include:
  • Create ETL data pipelines using Airflow
  • Identify performance issues in data pipelines and help with root cause analysis
  • Build and monitor dashboards
Python · Apache Airflow · SQL · Big Data Analytics · Data Engineering

Aptara

Data Engineer

Apr 2015 – Nov 2017 · 2 yrs 7 mos · Noida · On-site

  • Create pipelines in Python for automatic data extraction from the websites of more than 200 financial companies, capturing the required metadata and storing it in a SQL database
  • Data analysis
  • Scheduling and workflow management of the pipelines
  • Maintain daily/weekly reports
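The extraction step above can be illustrated with a minimal stdlib sketch. The field names and regex patterns here are hypothetical; the real pipelines matched site-specific markup across 200+ company websites before loading the captured metadata into SQL.

```python
import re

# Hypothetical patterns; real pipelines used site-specific markup rules.
FIELD_PATTERNS = {
    "company": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S),
    "ticker": re.compile(r"Ticker:\s*([A-Z.]+)"),
    "fiscal_year_end": re.compile(r"Fiscal Year End:\s*([A-Za-z]+ \d{1,2})"),
}

def extract_metadata(html):
    """Pull the required metadata fields from a fetched page.

    Fields that fail to match come back as None, so the downstream
    SQL insert can record the gap instead of crashing the run.
    """
    return {name: (m.group(1).strip() if (m := pat.search(html)) else None)
            for name, pat in FIELD_PATTERNS.items()}
```

The resulting dict maps one-to-one onto the columns of a parameterised SQL INSERT, which is roughly how the storage step described above would consume it.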
Data Analysis · Python · SQL · MySQL · Airflow · Data Engineering

Education

PPIMT, Kurukshetra University

Bachelor of Technology, Mechanical Engineering

Jan 2010 – Jan 2014

Govt Senior Secondary School

Jan 2009 – Jan 2010
