Saurabh Chawla

Software Engineer

Bengaluru, Karnataka, India · 13 yrs 4 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Expert in optimizing Big Data systems and Spark SQL.
Proven track record in cloud migration and performance tuning.
Significant contributions to open source Apache Spark.

Stackforce AI infers this person is a Big Data and Cloud Computing expert with strong experience in SaaS environments.

Contact

s_saurabh100@yahoo.co.in LinkedIn

Skills

Core Skills

Big Data Systems, Data Engineering, Master Data Management

Other Skills

Agile Methodologies, Apache Spark, Autoscaling, Cloud Computing, Core Java, Data Migration, Data Modeling, GCP, Groovy Script, Hibernate, JSP, Java, Java Enterprise Edition, Java Web Services, JavaServer Pages (JSP)

About

Experienced Software Developer with a demonstrated history of working on projects involving the internals of Big Data engines, data lakes, distributed systems, Master Data Management, and Enterprise Application Integration.

Areas of Interest
Big Data Systems - Building, scaling, and optimizing Big Data engines (Apache Spark) on the cloud; distributed systems; data engineering; data lakes.
Master Data Management Products - Building and integrating PIM/CDM using a Big Data engine.

Experience

Total Experience: 13 yrs 4 mos

Average Tenure: 2 yrs 8 mos

Current Experience: 3 yrs 11 mos

Oracle

Principal Member of Technical Staff

Jul 2022 – Present · 3 yrs 11 mos · Bengaluru, Karnataka, India

Working as a member of the Data Flow SQL Endpoints team.
Data Flow SQL Endpoints is designed for developers, data scientists, and advanced analysts to interactively query data directly where it lives in the data lake. It supports major Business Intelligence (BI) tools through ODBC or JDBC connections with IAM credentials.
(https://blogs.oracle.com/cloud-infrastructure/post/introducing-oci-data-flow-sql-endpoints,
https://docs.oracle.com/en-us/iaas/data-flow/using/sql-endpoints-before-begin.htm#sql-endpoints-before-you-begin)

Salesforce

Senior Member Of Technical Staff

Mar 2021 – Jun 2022 · 1 yr 3 mos · Bengaluru, Karnataka, India

Worked as a member of the Unified Intelligence Platform (UIP) team.
Optimised Spark SQL performance: tuned joins in Spark SQL and created a framework that optimises the deduplication process using Bloom filters in Spark.
Optimised cloud cost: created a framework in Spark in which compute runs in the on-premise environment while the data is stored in cloud storage (S3 in AWS).
Added support for Map type columns in “Group By” in Spark, which was needed to migrate workloads from Hive to Spark.
Onboarded Apache Iceberg (an open table format) in UIP for reads and writes through Spark, Presto, and Hive.
Migration from on-premise systems to the cloud (AWS): helped the team migrate workloads by tuning various processes in the pipelines.
Contributed to open source Apache Spark: fixed various bugs and added improvements (https://github.com/SaurabhChawla100).

Apache Spark, Cloud Computing, Performance Optimization, Data Migration, Open Source Contribution, Big Data Systems

Qubole

Member Of Technical Staff 3

Dec 2017 – Nov 2020 · 2 yrs 11 mos · Bengaluru, Karnataka, India

Worked as a member of the Spark Engineering team at Qubole.
Enabled Spark jobs to use Preemptible VMs (PVMs) in GCP and Spot nodes (https://www.qubole.com/tech-blog/spark-cluster-optimization-for-cost-reliability-and-performance/).
Worked on autoscaling in Spark: optimised the autoscaling algorithms in Qubole Spark (Stage Level Autoscaling, Lag Aware Adaptive Autoscaling).
Worked on performance optimisation of Spark SQL: Dynamic Filtering, Dynamic Partition Pruning, Intersect optimisation, and other optimisations.
Worked on the Spark History Server: added optimised storage for event log files, a persistent Spark History Server, and other optimisations.
Worked on various Spark Core modules, such as task scheduling and shuffle handling.
Contributed to open source Apache Spark: fixed various bugs and submitted PRs for improvements (https://github.com/SaurabhChawla100).
Brought up each new version of Qubole Spark as soon as it was released in open source Apache Spark, and handled the CI/CD pipeline for the new version (https://www.qubole.com/tech-blog/introducing-apache-spark-3-0-on-qubole/).
Back-ported bug fixes and performance patches from the master branch of open source Spark to the lower versions of Qubole Spark used by various customers.
Mentored interns and newly joined team members.
Helped the Support team, Solution Architects, and CSMs with customer issues and POCs involving Spark applications running on Qubole.
Coordinated and worked with other engine teams, such as Hadoop and Hive.

Apache Spark, Performance Optimization, Autoscaling, Open Source Contribution, Big Data Systems

Wipro Limited

Technical Lead

Dec 2015 – Nov 2017 · 1 yr 11 mos

Worked as Technical Lead on the TIBCO stack and Java.
Worked with Senior Architects and Business Analysts to identify business rules and requirements, and re-engineered them using TIBCO Master Data Management, TIBCO BW, and Java.
Architected and built the data model and its integration with third-party services.
Designed and built the Rule Base, Workflow, BW Process, and custom Java functions for various modules.
Optimized SQL queries by examining their SQL plans to improve performance.

TIBCO, Java, Master Data Management

TIBCO Software

MDM/EAI Consultant

Jul 2012 – Nov 2015 · 3 yrs 4 mos · Pune Area, India

Worked as a Consultant on TIBCO Master Data Management, TIBCO BW, TIBCO BE, and Java.
Created the Rule Base, Workflow, BW Process, BE Rules, and custom Java functions for various modules.
Met with Business Analysts and Senior Architects to identify business rules and requirements, and re-engineered them using TIBCO MDM, TIBCO BE, and TIBCO BW features.
Designed the data model and other artifacts.

TIBCO, Java, Master Data Management