Sachin Gupta

Data Engineer

Bengaluru, Karnataka, India · 11 yrs 9 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Over 10 years of experience in Big Data project development.
Proven track record of optimizing Spark applications.
Expertise in both batch and real-time processing.

Stackforce AI infers this person is a Data Engineering expert with a focus on Big Data solutions in Fintech and SaaS industries.

Contact

sachinguptajobs@gmail.com LinkedIn

Skills

Core Skills

Apache Spark · Data Engineering · Cloud Computing · Fraud Detection · Data Accessibility · Compliance Engineering · Data Processing · Security Engineering · User Interface Development · Business Intelligence

Other Skills

Java · Azure Databricks · Maven · Scala · Python (Programming Language) · SQL · AWS Glue · Amazon S3 · Amazon Athena · AWS EMR · Electronic Medical Record (EMR) · Google Cloud Platform (GCP) · Apache Ranger · Kerberos · MapReduce

About

Experienced Data Engineer with over 10 years in Big Data project development.
Proficient in Hadoop, Spark, MapReduce, Java, Scala, and Python.
Skilled in both batch and real-time processing, with expertise in on-premises and cloud computing (AWS, Azure, GCP). Proven track record of optimizing Spark applications, leading to a 50% performance improvement.
Adept at using orchestration tools (Airflow, Azkaban, Oozie) and CI/CD pipelines (GitLab, Jenkins). Committed to automating processes and continuously learning new technologies.

Experience

Total Experience: 11 yrs 9 mos

Average Tenure: 1 yr 11 mos

Current Experience: 3 yrs 9 mos

Adobe

Senior Data Engineer

Aug 2022 – Present · 3 yrs 9 mos · Bengaluru, Karnataka, India · Hybrid

Spearheaded the development of a self-service software platform for creating and managing data pipelines, enabling users ranging from business analysts to developers to independently build, deploy, and maintain data workflows.
Led a high-performing team to deliver optimized data pipelines across both on-premise and cloud environments, ensuring scalability, efficiency, and reliability for diverse business needs.
Architected a comprehensive solution for migrating workloads from on-premise systems to the cloud using Databricks, streamlining data processing and enhancing performance while reducing operational overhead.

Java · Apache Spark · Azure Databricks · Maven · Scala · Python (Programming Language) · +2

Paytm

Senior Data Engineer

Dec 2020 – Aug 2022 · 1 yr 8 mos · Bangalore Urban, Karnataka, India

Collaborated closely with data scientists on multiple use cases, contributing to the development of data-driven solutions that enhanced decision-making processes.
Built a fraud detection network using Apache Spark, leveraging calculated metrics to identify potential fraud before loan disbursals, which significantly reduced fraudulent activities and improved the loan approval process for merchants.
Created a data mart for business analysts, enabling easy exploration of data, and built multiple OLAP cubes based on user requests to enhance data accessibility and analysis.
Conducted code reviews to ensure code quality, consistency, and adherence to best practices, fostering a culture of excellence within the team.
Developed a scaffolding framework for data pipelines, improving pipeline maintenance, scalability, and reliability for long-term operational efficiency.

AWS Glue · Amazon S3 · Maven · Amazon Athena · Apache Spark · AWS EMR · +5

Citi

Assistant Vice President (Apps Dev Sr. Programmer Analyst)

Nov 2019 – Nov 2020 · 1 yr · Belfast, United Kingdom

Worked on a compliance project where all company communications were monitored to detect misconduct and issues related to trading, ensuring adherence to regulations.
Applied Natural Language Processing (NLP) rules in Apache Spark to enhance the processing and analysis of unstructured data.
Optimized existing Spark applications, resulting in significant performance improvements and more efficient data processing.

Maven · Apache Spark · SQL · Java · Compliance Engineering

Exadatum Software Services Pvt. Ltd.

Technical Lead

Jul 2018 – Oct 2019 · 1 yr 3 mos · Pune, Maharashtra, India · On-site

Worked with a client to migrate data from IoT devices, such as smart TVs, smart refrigerators, and smart doorbells, into S3 using Apache Spark, enabling efficient storage and analysis.
Developed data pipelines to support machine learning models, ensuring smooth data flow and readiness for model training and deployment.
Worked as an individual contributor (IC) across the entire project lifecycle, from setting up the scaffolding to deploying production pipelines, ensuring seamless operations.
Worked on an in-house Configuration Management tool, streamlining configuration processes and improving overall system management efficiency.

Amazon S3 · Google Cloud Platform (GCP) · Apache Spark · SQL · Scala · Java · +2

LTI

Senior Software Engineer

Jun 2017 – Jul 2018 · 1 yr 1 mo · Pune Area, India

Worked on a UI-based product that simplified the creation of data pipelines, enhancing user accessibility and operational efficiency.
Contributed to multiple aspects of the product, including developing an import/export utility to streamline data integration processes.
Implemented encryption for data at rest on HDFS, ensuring compliance with security standards and safeguarding sensitive information.
Enabled security within the Hadoop cluster using Ranger, providing robust access control and data protection.
Set up and configured a Spark standalone cluster, optimizing data processing and performance for the team.
Explored the Apache Spark source code to understand its internal workings and successfully implemented a feature for the product.

Apache Ranger · Amazon S3 · Maven · Kerberos · Apache Spark · AWS EMR · +4

Datametica Solutions Private Limited

BigData Software Engineer

Mar 2014 – Mar 2017 · 3 yrs · Pune Area, India

Built a data pipeline for Tableau reports, integrating clickstream and transactional data to deliver valuable business insights.
Collaborated with the community to resolve issues related to corrupt ORC blocks, ensuring data integrity and reliability.
Developed a utility to audit job execution, improving monitoring and tracking of data pipeline processes.
Worked on multiple use cases involving clickstream data, enhancing analytics capabilities and providing actionable insights.
Developed a web crawler to capture data from two websites and compare product prices, enabling competitive analysis and market insights.
Contributed to the development of an in-house product for metadata management and data governance, streamlining data organization and compliance.

MapReduce · Spring Boot · Maven · SQL · Apache Pig · Hive · +3