SAURAV KUMAR JHA

Data Engineer

Bengaluru, Karnataka, India · 3 yrs 8 mos experience

Key Highlights

  • Reduced data processing costs by 10%
  • Improved data accessibility and usability by 20%
  • Boosted data processing efficiency by 25%
Stackforce AI infers this person is a Data Engineer specializing in cloud-native data solutions and ETL processes.

Skills

Core Skills

Data Engineering · AWS · ETL

Other Skills

AWS Athena · AWS CodePipeline · AWS Glue · AWS Lambda · AWS Redshift · AWS S3 · AWS Services · Airflow · Amazon Athena · Amazon EC2 · Amazon Redshift · Amazon S3 · Amazon Web Services (AWS) · Apache Airflow · Apache Livy

About

Data Engineer with 3+ years of experience in designing, implementing, and optimizing data pipelines, ETL processes, and data warehousing solutions on AWS using Apache Spark and Python. Skilled at driving measurable impact: reduced data processing costs by 10% and improved data accessibility and usability by 20%. Proficient in cloud platforms (AWS, GCP, Azure) and big data technologies, with strong problem-solving and collaboration skills.

At Algonomy, I led the creation of a comprehensive ETL pipeline on AWS, orchestrating Lambda-driven workflows that boosted data processing efficiency by 25%. I also spearheaded the development of an agile data ingestion framework using Python, building 20+ custom connectors from scratch, cutting pipeline development time by 30%, and conducting PoC testing for Fivetran connectors to enhance scalability and integration.

Driven by a passion for scalable, cloud-native data solutions, I focus on delivering systems that enable organizations to make smarter, faster, data-driven decisions.

Experience

Total Experience: 3 yrs 8 mos
Average Tenure: 1 yr 10 mos
Current Experience: 1 yr 10 mos

Algonomy

Big Data Engineer

Aug 2024 - Present · 1 yr 10 mos · Bengaluru, Karnataka, India · Remote

  • Big Data Engineer
  • This Databricks project cleans, transforms, and analyzes online retail data, including customer behavior analysis, sales forecasting, and clear visualizations for non-technical stakeholders.
  • Created a centralized data lake for multiple business units using AWS S3, enabling efficient data storage and access.
  • Migrated data from SQL Server to MySQL on Data-sync, ensuring a smooth transition and minimal downtime.
  • Implemented Slowly Changing Dimensions (SCD1 and SCD2) using fact and dimension tables for retail data, enhancing reporting and analytics.
  • Optimized data processing using AWS Redshift and Athena, reducing query times by 30%.
AWS S3 · Data Lake · SQL Server · MySQL · Slowly Changing Dimensions · AWS Redshift · +3

Tiger Analytics

2 roles

Senior Software Engineer

Promoted

Jan 2023 - Oct 2023 · 9 mos · Bengaluru, Karnataka, India · Remote

  • Big Data Engineer
  • Ensured data accuracy and completeness in the end-to-end pipeline.
  • Played a pivotal role in testing the end-to-end pipeline in Databricks, meticulously checking dataset count and format, and analyzing source data for quality concerns throughout the ETL process.
  • Executed comprehensive test cases to validate the ETL process, ensuring the integrity and reliability of data transformations.
  • Identified defects and issues in the ETL process, collaborating closely with cross-functional teams to rectify them promptly, thereby enhancing the efficiency and effectiveness of data pipelines.
  • Developed transformation code and robust test cases to guarantee data accuracy and completeness, leveraging expertise in Databricks, Google Cloud Storage, BigQuery, and Hive.
Databricks · Google Cloud Storage · BigQuery · Hive · Data Engineering · ETL

Software Engineer

Nov 2021 - Dec 2022 · 1 yr 1 mo · Bengaluru, Karnataka, India · Remote

  • Big Data Engineer
  • Led the creation of an end-to-end ETL pipeline using AWS Services, from ingesting raw data to processing and moving it to staging zones.
  • Implemented masking and casting transformations of specified fields, with field names dynamically read from a configuration file.
  • Designed the pipeline to trigger upon the upload of raw data in the landing zone, ensuring seamless automation of data processing.
  • Utilized Airflow to orchestrate DAGs, triggering Lambda functions for pre- and post-validation checks on the code.
  • Improved data processing efficiency by 25% through optimization techniques and pipeline enhancements.
  • Spearheaded the development of a comprehensive data ingestion framework leveraging Python libraries, including Spark, to efficiently extract and load data from diverse sources into centralized data lakes.
  • Engineered over 20 custom connectors from scratch using Python libraries and APIs, facilitating seamless data ingestion from sources such as Google Analytics, Instagram, etc.
  • Reduced development time for new data pipelines by 30% through strategic implementation of optimized processes and automation techniques.
  • Conducted Proof of Concept (POC) testing for Fivetran connectors, evaluating their performance and usability to inform integration decisions.
  • Designed and implemented RESTful APIs for data connectors using the FastAPI framework, ensuring efficient communication between various components of the data infrastructure.
  • Developed a user-friendly UI platform using React.js, enabling intuitive data transfer operations from source to destination, enhancing accessibility.
AWS Services · Airflow · Python · Spark · RESTful APIs · FastAPI · +3

Education

Asansol Engineering College (AEC)

Bachelor of Technology - BTech — Computer Science

Indian Institute of Technology, Patna

Master of Technology - MTech — Artificial Intelligence and Data Science
