Raghavendra H S

Product Engineer

Bengaluru, Karnataka, India · 14 yrs 1 mo experience

Key Highlights

  • Led modernization of 200+ data pipelines.
  • Achieved $350K/year in cloud savings.
  • Improved data reliability by ~40%.

Skills

Core Skills

Databricks · Apache Spark · Data Quality · Amazon EMR · ETL · AWS

Other Skills

Agile Methodologies · Airflow · Apache Pig · Big Data Analytics · C · ClearCase · Couchbase · Data pipeline design · Delta Lake · EMR · EMR Graviton · Flume · HBase · HDFS · Hadoop

About

Vision-driven Data Engineering leader and hands-on senior individual contributor with 14+ years building scalable, cost-efficient Lakehouse platforms and data pipelines for global enterprises. I combine deep technical ownership with proven people leadership to deliver measurable business outcomes.

I design and deliver robust, modular data pipelines focused on reusability, observability, and performance. Core technologies I work with include Databricks, Delta Lake, Apache Spark (PySpark), EMR Graviton, Airflow, Kafka, S3, and Snowflake.

As a technical lead and IC, I've replatformed 200+ data pipelines, led Databricks upgrades to v14.3 LTS, and driven performance tuning and partitioning strategies that improved processing throughput by ~40% and supported pipelines processing ~5 TB/day. I build architecture and implementation patterns that teams can adopt and extend.

As an engineering manager, I've built and scaled squads, hiring and mentoring engineers, owning roadmaps, and partnering with Product, SRE, and data consumers to align platform capabilities with business SLAs. I manage cross-functional delivery, run technical reviews, and institutionalize best practices for code quality, observability, and incident response.

I'm results-oriented on cost and reliability: I've delivered >$350K/year in cloud savings through cluster right-sizing, Graviton migrations, EMR optimizations, and incremental load strategies, and rolled out Spark Expectations-based observability that improved data reliability by ~40%. Certifications include Databricks Certified Data Engineer, Databricks Spark Developer, AWS Big Data Specialty, and CCA 175.
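As one concrete example of the incremental load strategies mentioned above, here is a minimal merge-based load sketch in PySpark, assuming a Spark session configured with the delta-spark package; the bucket paths, table, and column names are hypothetical.

```python
# Minimal sketch of an incremental (merge-based) load into a Delta table.
# Assumes the delta-spark package is available; all paths and column
# names below are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("incremental-load")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read only the newly arrived slice of source data, not the full history.
updates = (
    spark.read.parquet("s3://bucket/raw/orders/")
    .where(F.col("ingest_date") == "2024-01-01")
)

target = DeltaTable.forPath(spark, "s3://bucket/lakehouse/orders")

# Upsert: update rows whose keys match, insert the rest. Only files
# containing matched keys get rewritten, which is what makes this
# cheaper than a full reload.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```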

Experience

Nike

Lead Data Engineer, Data Platform

Oct 2022 – Present · 3 yrs 5 mos · Bengaluru, Karnataka, India · Hybrid

  • I lead two cross‑functional squads (14–16 engineers) responsible for Supply Chain & Planning data platform modernization, reliability, and cost optimization. I combine hands‑on architecture with senior technical leadership to drive Lakehouse strategy, cross‑team design, and delivery.
  • Led Databricks Lakehouse modernization and replatformed 200+ pipelines from EMR to Databricks (Unity Catalog), standardizing patterns and reducing operational complexity.
  • Drove evaluation and adoption of Databricks Serverless for selected pipelines to reduce management overhead and improve cost/performance tradeoffs.
  • Directed Databricks upgrade v9.1 → v14.3 LTS and EMR upgrades with zero production defects.
  • Delivered >$150K/year compute savings via cluster right‑sizing and Graviton migration and $45K/year storage savings through lifecycle policies and partitioning (part of >$350K/year modernization impact).
  • Designed and rolled out a Spark-based observability & data-quality framework (Spark Expectations) with reusable alerting and dashboards, improving data reliability by ~40% and reducing incident MTTR (an illustrative sketch of this style of check follows this role).
  • Implemented Spark tuning, Z-Ordering, Liquid Clustering, and architectural simplifications that reduced pipeline runtimes by ~43%.
  • Acted as principal‑level technical lead across teams: influenced platform design strategy, led architecture discussions with principal data engineers, and resolved cross‑team technical tradeoffs.
  • Built a centralized pipeline framework with YAML templates for faster onboarding; introduced AI-assisted development tools (GitHub Copilot, Cursor AI) to increase delivery velocity by ~25%.
  • Hired, mentored, and enabled 18+ engineers; partnered with Product, SRE, and data consumers to align platform capabilities with SLAs and business KPIs.
  • Tech: Databricks (Unity Catalog, Serverless, Workflows) · Delta Lake · Apache Spark / PySpark · EMR (Graviton) · S3 · Airflow · Kafka · Spark Expectations · Z‑Ordering · Liquid Clustering · GitHub · Jenkins
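The data-quality framework and layout optimizations above can be illustrated with a short sketch. This is a hand-rolled approximation in plain PySpark, not the Spark Expectations API itself; the table and column names are hypothetical, and the OPTIMIZE ... ZORDER statement assumes a Delta table (Databricks or open-source Delta Lake 2.0+).

```python
# Illustrative rule-based data-quality checks in plain PySpark, in the
# spirit of the framework described above (not its actual API).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-demo").getOrCreate()

df = spark.read.table("supply_chain.shipments")  # hypothetical table

# Rule 1: the primary key must be non-null and unique.
null_keys = df.where(F.col("shipment_id").isNull()).count()
dupe_keys = df.groupBy("shipment_id").count().where(F.col("count") > 1).count()

# Rule 2: quantities must be non-negative.
bad_qty = df.where(F.col("quantity") < 0).count()

failures = {"null_keys": null_keys, "dupe_keys": dupe_keys, "bad_qty": bad_qty}
if any(v > 0 for v in failures.values()):
    # In a production framework these results would feed alerting and
    # dashboards; here we simply fail fast.
    raise ValueError(f"Data-quality checks failed: {failures}")

# Z-Ordering: co-locate frequently filtered columns to cut scan time.
spark.sql("OPTIMIZE supply_chain.shipments ZORDER BY (region, ship_date)")
```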

Epsilon India

3 roles

Software Engineering Manager

Promoted

Apr 2022 – Oct 2022 · 6 mos

  • Managed Amazon EMR clusters running Apache Spark 2.4.0, PySpark, Hive, Oozie, Hue, Sqoop, HDFS, and Tez.
  • Designed and developed data pipelines to load data from Amazon S3 into HDFS, creating Hive databases and managed/external Hive tables to ingest data from external systems.
  • Tuned Tez memory parameters to fix performance issues in Hive data loads.
  • Built Oozie workflows to automate data loading and processing.
  • Developed PySpark scripts to transform and aggregate data stored in Hive tables (see the sketch after this list).
  • Performance-tuned Spark applications.
  • Wrote UNIX shell scripts to automate copying data files between S3 buckets.
  • Designed and developed Hive queries to derive insights from gigabytes of data in Hive tables.
  • Adjusted Spark and Hive parameters in the EMR environment to resolve issues.
  • Administered the AWS EMR environment, configuring and restarting services such as Hive, Tez, and HiveServer2.
  • Designed and developed data-extract interfaces in Oracle PL/SQL and SQL to transfer data to external systems.
  • Designed and developed data migration strategies from a legacy application to an enterprise cloud-based application.
  • Owned technical deliverables and led the team from a technical-delivery perspective.
  • Cloudera Certified Associate Hadoop and Spark Developer (CCA175).
  • AWS Certified Solutions Architect – Associate.
  • AWS Certified Big Data – Specialty.
  • Strong understanding of AWS services including Kinesis Streams, Kinesis Firehose, Kinesis Analytics, DynamoDB, S3, Redshift, EMR, and AWS Glue.
  • Designed and developed data warehouses, data marts, and databases.
  • Designed and developed interfaces to load data from external systems into an Oracle database using Oracle PL/SQL and SQL.
  • Designed and developed database objects such as tables, procedures, functions, and views.
  • Performance-tuned application interfaces.
Amazon EMR · Apache Spark · PySpark · Hive · Oozie · Sqoop
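As referenced in the PySpark bullet above, here is a minimal sketch of the transform-and-aggregate pattern over Hive tables; the database, table, and column names are hypothetical.

```python
# Minimal PySpark transform/aggregate over Hive tables; database, table,
# and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("hive-aggregation")
    .enableHiveSupport()  # EMR cluster with a configured Hive metastore
    .getOrCreate()
)

orders = spark.table("sales_db.orders")

# Transform: derive a revenue column, then aggregate per day and region.
daily_revenue = (
    orders
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
    .groupBy("order_date", "region")
    .agg(
        F.sum("revenue").alias("total_revenue"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Write back to a partitioned Hive table for downstream reporting.
(
    daily_revenue.write.mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("sales_db.daily_revenue")
)
```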

Lead Software Engineer

Jan 2019 – Mar 2022 · 3 yrs 2 mos

Senior Software Engineer

Jul 2017 – Dec 2018 · 1 yr 5 mos

Tesco HSC

3 roles

Senior Software Engineer

Promoted

Sep 2013 – Jun 2017 · 3 yrs 9 mos

  • Over 5 years of professional software-industry experience spanning all phases of the software development lifecycle, from requirements analysis through project maintenance.
  • Senior Software Engineer at Tesco HSC, Bangalore, having joined in September 2011.
  • Engineered big data infrastructure on AWS that scaled from hundreds of terabytes to over 2 petabytes, doubling the system's capacity to absorb ~200% annual data growth.
  • Directed high-efficiency ETL pipelines on AWS EMR with PySpark, processing terabytes of data daily into S3, cutting transformation time by 45% and enabling real-time analytics.
  • Defined Hive tables in HiveQL, managing and analyzing more than 5 TB of data across these tables to support data-driven decision-making.
  • Optimized data processing workflows across Spark, HDFS, and Hive that handled over 25 petabytes of data annually, increasing operational efficiency and reducing costs by 15%.
  • Managed a Hive data warehouse of 50+ tables, optimizing data distribution through partitioning and bucketing and improving query performance by 30% via HiveQL optimizations.
  • Migrated 20+ MapReduce programs to Spark transformations in PySpark, reducing processing time by 40% and improving job performance and scalability (a before/after sketch follows this list).
  • Developed and implemented applications using UNIX shell scripting and Oracle PL/SQL.
  • Extensive experience working with backend data via PL/SQL queries, procedures, and shell scripts.
  • Experienced at delivering under tight deadlines.
  • Received multiple client appreciations and recognitions for work within and across teams.
AWS · ETL · Apache Spark · Hive · HDFS · Oracle PLSQL
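A hypothetical before/after for the MapReduce-to-Spark migrations mentioned above: the mapper and reducer of a classic per-key aggregation job collapse into a few lines of PySpark. The input path and field layout are illustrative assumptions.

```python
# Spark equivalent of a per-key-sum MapReduce job; the input path and
# tab-separated field layout are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr-to-spark").getOrCreate()
sc = spark.sparkContext

# Mapper equivalent: parse each line into (key, value) pairs.
pairs = (
    sc.textFile("hdfs:///data/clickstream/")
    .map(lambda line: line.split("\t"))
    .map(lambda fields: (fields[0], int(fields[2])))  # (page_id, clicks)
)

# Reducer equivalent: sum values per key with a single shuffle.
totals = pairs.reduceByKey(lambda a, b: a + b)

totals.saveAsTextFile("hdfs:///output/clicks_per_page")
```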

Software Engineer

Promoted

Sep 2012 – Aug 2013 · 11 mos

Associate Software Engineer

Sep 2011 – Aug 2012 · 11 mos

Education

VTU (Visvesvaraya Technological University)

Bachelor of Engineering (B.E.)

Jan 2007 – Jan 2011
