Nikhil Chole

Data Engineer

Mumbai, Maharashtra, India · 5 yrs 7 mos experience

Key Highlights

  • Designed CI/CD pipelines reducing deployment time significantly.
  • Developed a risk profiling framework recognized with awards.
  • Engineered real-time fraud detection enhancing security.

Skills

Core Skills

Data Pipeline Architecture · Distributed Data Processing · Data Warehouse Architecture · Data Engineering · Data Integration

Other Skills

PySpark · Big Data · Jenkins · Apache Iceberg · Control-M · Hive · Data Modeling · Apache Spark · Performance Optimization · Hadoop · Python (Programming Language) · Kafka · Spark Streaming · ETL · Data Warehousing

About

An accidental Data Engineer who fell in love with the world of data systems :D

Over the last 5+ years, I've built reliable data pipelines and platforms to fuel analytics and ML systems.

Core Strengths:

  • Data pipeline architecture experience with on-premises and cloud platforms (AWS).
  • Deep understanding of distributed data processing.
  • Proven experience in data modeling and warehouse design.
  • Data structures, algorithms, and performance optimization for large-scale datasets.

Tech Stack: Python • Apache Spark • Kafka • SQL • Hive • Iceberg • S3 • CI/CD • Git • Informatica PowerCenter • Control-M • Oracle Exadata • MPP Databases • Neo4j • Cypher

I am particularly interested in distributed systems, data platform architecture, and building reliable data infrastructure at scale.

Experience

Axis Bank

2 roles

Senior Data Engineer

Promoted

May 2024 – Present · 1 yr 10 mos · Mumbai · Remote

  • Designed and implemented a Jenkins CI/CD pipeline with a reusable shell script to standardize PySpark deployments across all Spark jobs. This removed the need for job-specific scripts and centralized JAR dependency management, reducing deployment time and saving several hours of manual effort in each release cycle.
  • Migrated 100+ legacy Hive tables to Apache Iceberg and built an automated maintenance pipeline. This modernization of the lakehouse architecture significantly reduced HDFS storage usage and removed the need for manual table maintenance.
  • Built a Credit Card Customer Feature Store that consolidated customer-level attributes into a single source of truth. This enabled reuse across multiple analytics, campaign, and ML pipelines, reducing duplicate development work for Data Science and Analytics teams.
  • Implemented a centralized source control framework integrated with Control-M file watcher jobs. Spark pipelines now trigger automatically only when all upstream data sources are ready, eliminating manual dependency monitoring.
  • Optimized existing Spark jobs by tuning configurations and restructuring processing logic, achieving an 80% reduction in processing time and lowering compute costs across the data platform.
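The dependency-gated triggering described above (Spark jobs launching only once every upstream source is ready) can be sketched in plain Python. This is a minimal illustration using hypothetical `_SUCCESS` marker files; the actual implementation used Control-M file watcher jobs rather than this code.

```python
import os
import tempfile

def maybe_trigger(marker_paths, launch):
    """Invoke `launch` (e.g. a spark-submit wrapper) only when every
    upstream source has written its readiness marker; otherwise do
    nothing and report what is still pending. Marker-file naming is
    illustrative, not the production convention."""
    missing = [p for p in marker_paths if not os.path.exists(p)]
    if missing:
        return f"waiting on {len(missing)} upstream source(s)"
    launch()
    return "triggered"

# Usage: two upstream feeds, only one ready, so the job must wait.
with tempfile.TemporaryDirectory() as d:
    feeds = [os.path.join(d, "cards", "_SUCCESS"),
             os.path.join(d, "loans", "_SUCCESS")]
    os.makedirs(os.path.dirname(feeds[0]))
    open(feeds[0], "w").close()  # only the first feed has landed
    print(maybe_trigger(feeds, launch=lambda: None))  # waiting on 1 upstream source(s)
```

The benefit over polling by hand is that the gate is declarative: each pipeline lists its upstream markers once, and no one has to watch for late-arriving feeds.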
PySpark · Big Data · Jenkins · Apache Iceberg · Control-M · Data Pipeline Architecture +1

Data Engineer

Aug 2020 – May 2024 · 3 yrs 9 mos · Mumbai · Remote

  • Developed a unified customer risk profiling framework integrating data from 10+ banking products for ~30M customers. This enabled centralized risk visibility across banking channels and was recognized with the Economic Times CIO Award 2024 for Excellence in Technology Implementation – Business Resilience Impact and an internal BIU Star Award.
  • Engineered real-time fraud detection pipelines using Kafka and Spark Streaming to process high-velocity transaction data. This enabled instant identification of suspicious transactions and strengthened fraud prevention across digital banking channels.
  • Implemented advanced data validation and sanitization processes within data pipelines. This reduced data anomalies by 30% and improved data quality, which increased fraud detection effectiveness by 20%.
  • Built a centralized fraud data mart and optimized ETL pipelines. This reduced processing time by ~80% and enabled faster fraud investigations and reporting for analytics and risk teams.
  • Collaborated closely with risk, fraud analytics, and compliance teams to design scalable data solutions. These systems helped prevent ₹12+ crore in financial losses and improved customer trust in digital banking platforms.
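The validation-and-sanitization step described above can be sketched as a small pure-Python function. Field names and rules here are hypothetical stand-ins; the production checks ran inside Spark pipelines rather than this code.

```python
from datetime import datetime

# Mandatory fields are an assumption for illustration.
REQUIRED = ("txn_id", "amount", "timestamp")

def sanitize(records):
    """Split raw transaction records into clean rows and anomalies:
    missing mandatory fields, unparsable values, or non-positive
    amounts are quarantined instead of flowing downstream."""
    clean, anomalies = [], []
    for rec in records:
        if any(rec.get(f) in (None, "") for f in REQUIRED):
            anomalies.append(rec)   # missing mandatory field
            continue
        try:
            amount = float(rec["amount"])
            datetime.fromisoformat(rec["timestamp"])
        except (TypeError, ValueError):
            anomalies.append(rec)   # unparsable amount or timestamp
            continue
        if amount <= 0:
            anomalies.append(rec)   # non-positive amount
            continue
        clean.append({**rec, "amount": amount})
    return clean, anomalies

rows = [
    {"txn_id": "t1", "amount": "120.50", "timestamp": "2024-01-05T10:00:00"},
    {"txn_id": "t2", "amount": "-5", "timestamp": "2024-01-05T10:01:00"},
    {"txn_id": "t3", "amount": "oops", "timestamp": "2024-01-05T10:02:00"},
]
clean, bad = sanitize(rows)
print(len(clean), len(bad))  # 1 2
```

Quarantining anomalies (rather than silently dropping them) is what makes the anomaly-rate reduction measurable and keeps bad rows out of downstream fraud scoring.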
Hadoop · Python (Programming Language) · Kafka · Spark Streaming · Data Engineering · Data Integration

Education

Indian Institute of Technology, Kanpur

Bachelor of Technology — Chemical Engineering

Jan 2015 – Jan 2019

Indian Institute of Technology, Kanpur

Bachelor's Degree — Economics

Jan 2019 – Jan 2020
