Rakesh Raushan

CEO

Bengaluru, Karnataka, India · 6 yrs 9 mos experience

Key Highlights

Active contributor to Apache Spark and Iceberg.
Expert in building scalable data platforms.
Achieved 4x performance gain in data pipelines.

Stackforce AI infers this person is a Backend-heavy Fullstack Engineer specializing in Data Infrastructure and Big Data solutions.

Skills

Core Skills

Apache Spark, Data Infrastructure

Other Skills

Java, Scala, Distributed Systems, Data Analytics, Python, Spring Boot, Apache Spark Streaming, PySpark, Apache Airflow, dbt, Big Data, Hive, C++, SQL

About

Software engineer with 6+ years of experience in building core data platforms and optimizing execution engines, with a deep focus on Apache Spark and Table Formats. I enjoy solving complex scalability problems and building tools that automate data engineering at scale. Active Apache Spark contributor and currently focused on Apache Iceberg. Spark PRs: https://github.com/apache/spark/pulls?q=is%3Apr+author%3AiRakson+is%3Aclosed+sort%3Acomments-desc

Experience

Total Experience: 6 yrs 9 mos

Average Tenure: 1 yr 9 mos

Current Experience: 1 yr 6 mos

Oracle

Principal Member of Technical Staff

Nov 2024 – Present · 1 yr 6 mos · Bengaluru, Karnataka, India · Hybrid

Building and scaling a self-serve ETL tool.
Designed a YAML-driven, self-serve aggregation framework that reduced ad-hoc requests for custom aggregations.
Built an AIDP connector for our ETL tool.
Designed and implemented a pipeline used to determine the pricing strategy for the Fusion Data Intelligence team. The pipeline ingests various resource-usage events and computes the cost from them.
Refactored existing pipelines, resulting in a 4x performance gain (12 hrs -> 3 hrs).
Orchestrated a zero-data-loss migration strategy for large-scale fact tables during complex schema evolutions.

Apache Spark, Java, Data Infrastructure

Prophecy

Data Engineer

Feb 2024 – Oct 2024 · 8 mos · Bengaluru, Karnataka, India · Hybrid

Helped customers migrate their legacy data solutions to modern data solutions.
Developed and optimized Prophecy Spark pipelines for customers.

Scala, Apache Spark, Data Infrastructure

Visa

Senior Software Engineer

Sep 2022 – Feb 2024 · 1 yr 5 mos · Bengaluru, Karnataka, India · Hybrid

As part of the data platform team, helped build a data fabric solution for users.
Implemented a job submission module that submits queries to Spark/Hive/Presto in sync or async mode.
Added a Spark config tuner that generates application configs from past execution statistics and from heuristics based on table/partition stats.
Helped multiple teams migrate their Spark applications to Spark 3; migrated 40+ applications.

Scala, Apache Spark, Data Infrastructure

Huawei Technologies India

Software Engineer

Jun 2019 – Aug 2022 · 3 yrs 2 mos · Bangalore

At Huawei, I was part of the Spark team for Huawei's cloud distribution.
Spark SQL and Catalyst were my major focus areas:
Introduced incremental statistics to Spark. Currently, users need to run the expensive ANALYZE TABLE command after data-changing queries to keep statistics up to date. With incremental updates, stats are updated automatically after every data-changing command, which would improve cost-based optimizer (CBO) performance.
Dynamic UDF: allows users to update their UDF definitions without restarting the session.
Upgraded Spark's built-in Hive version to 3.1 at a time when open-source Spark was still using 2.3.
Contributed to the open-source Spark project: 25+ commits, including two built-in functions, a refactoring of the pagination framework, 10+ bug fixes, and some documentation changes.

Scala, Apache Spark, Data Infrastructure