Keshav Khandelwal

Data Engineer

Bengaluru, Karnataka, India

2 yrs 4 mos experience

Most Likely To Switch

Key Highlights

Engineered scalable data pipelines for multi-terabyte solutions.
Optimized ETL processes, reducing runtime by 35%.
Migrated legacy systems to cloud, enhancing efficiency.

Stackforce AI infers this person is a Data Engineer specializing in Big Data solutions for SaaS applications.

Skills

Core Skills

Spark, ETL, Big Data, Data Integration, Database Management, Software Development

Other Skills

API, API Optimization, AWS Glue, AWS Lambda, Agile Methodologies, Airflow, Algo, Algorithms, Amazon Web Services (AWS), Apache Airflow, Apache Spark, Application Development, Azure Data Factory, Azure Databricks, Back-End Web Development

About

Data Engineer with 2.5 years of overall experience, specializing in designing and building large-scale data pipelines, robust ETL processes and efficient data warehouse solutions. Proficient in Java, Python, SQL, Spark, Databricks, Data Modeling, Snowflake and Kafka. Delivered multi-terabyte big data solutions for leading organizations such as National Stock Exchange (NSE) and Expedia Group.

Experience

Total Experience: 2 yrs 4 mos

Average Tenure: 1 yr 2 mos

Current Experience: 1 yr 11 mos

Expedia Group

Software Development Engineer 1

Jul 2024 – Present · 1 yr 11 mos · Bengaluru · Hybrid

Engineered a scalable Dataproc PySpark-driven Property and Destination Master pipeline processing 2 TB of multi-source data (TSV, SQLite, partner feeds) daily; implemented branching logic, validation, joins, and modular refresh steps (DSA/DM/PM), improving data consistency and reducing manual interventions.
Engineered and optimized Spark-based ETL pipelines to ingest and process 700-900 GB of daily Meta partner conversation data, including ad performance metrics and user engagement logs; reduced runtime by 35% via partitioning and caching, enabling near-real-time dashboards for accelerated business decisions.
Developed Spark ML feature pipelines for 1.4-2 TB of social/app reviews, integrating text cleaning, language detection and topic modeling, and delivering optimized datasets and aggregate views via clustering and Streams Tasks to empower model iteration and stakeholder insights.
Revamped the traveler clickstream pipeline in Spark by migrating to a new data source and deriving 100% of column mappings for full integrity; collaborated with upstream teams to standardize schemas, boosting efficiency by 30%.
Migrated and optimized 23 Hive SQL queries to Trino SQL, resulting in improved query performance and faster root cause analysis of dataset issues in production.

Dataproc, PySpark, Spark, ETL, Data Processing, Data Consistency, +2

Onelab ventures

2 roles

Software Development Engineer 1

Jan 2024 – Jun 2024 · 5 mos · Pune

Delivered a project to migrate legacy on-premise processes to the cloud using Big Data technologies (Spark), reducing processing time by 20%.
Conducted in-depth data analysis using Hive, Trino, and Spark SQL, providing SIT/UAT fixes and ensuring smooth operations in the production environment.
Worked on a data ingestion pipeline that loads flat files into the data lake.

Big Data, Spark, Data Analysis, Hive, Trino, Data Lake

Software Engineer Intern

Jul 2023 – Jan 2024 · 6 mos · Pune

Simulas

Software Engineer Intern

Oct 2021 – Jun 2022 · 8 mos · Ahmedabad · On-site

Analysed more than 30,000 customer sales records to find bottlenecks in the business process.
Optimized MongoDB aggregation queries in backend, reducing API response times.
Implemented CI/CD pipelines for automated deployments, eliminating manual errors.

MongoDB, CI/CD, Data Analysis, Database Management