Rohit Pandey

Backend Engineer

Bengaluru, Karnataka, India · 10 yrs 3 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Reduced data latency to 1.5 minutes and storage costs by 25%
Increased data processing efficiency by over 50% using PySpark
Built a GPT-based RAG system for intelligent document search

Stackforce AI infers this person is a Data Engineering expert with a focus on cloud-native solutions in Fintech and Energy sectors.

Contact

+917975364620 LinkedIn

Skills

Core Skills

Data Engineering · Cloud Architecture · AI-ready Pipelines · Software Development

Other Skills

AWS API Gateway · AWS Athena · AWS CloudWatch · AWS EMR · AWS Glue · AWS Lambda · AWS RDS · AWS S3 · AWS Step Functions · Airflow · Amazon CloudWatch · Amazon EC2 · Amazon Elastic MapReduce (EMR) · Amazon Relational Database Service (RDS) · Amazon Step Functions

About

Lead Software Engineer – Data & Analytics with 10+ years of experience designing and building large-scale, cloud-native data platforms across AWS and Azure. I specialize in Snowflake, Databricks, Kafka, PySpark, Delta / Iceberg, and streaming/CDC architectures, with hands-on experience building high-performance, cost-optimized, and governed data systems.

I have delivered measurable impact across organizations, including:
• Reduced data latency to 1.5 minutes while cutting storage costs by 25%
• Improved query performance by 30% through optimized Delta / Iceberg strategies
• Increased data processing efficiency by 50%+ using PySpark and Databricks
• Built a GPT-based RAG system using FAISS and PyMuPDF for intelligent document search

My expertise spans:
• Data Engineering & Streaming (PySpark, Kafka, CDC)
• Lakehouse architectures (Delta, Iceberg, Snowflake, Databricks)
• Cloud platforms (AWS, Azure)
• AI-ready pipelines & vector search systems

I’m passionate about building centralized, governed, and scalable data platforms that enable analytics, reporting, and future AI/ML capabilities.

Experience

Total Experience: 10 yrs 3 mos

Average Tenure: 2 yrs

Current Experience

LTIMindtree

Staff Data Engineer

Jul 2024 – Sep 2025 · 1 yr 2 mos · Noida, Uttar Pradesh, India · Hybrid

Built a high-performance financial data pipeline using Snowflake-Databricks, cutting latency to 1.5 minutes and reducing storage cost by 25%.
Developed a Databricks control table system to orchestrate workflows between shared Snowflake datasets and writable Delta tables, ensuring data integrity.
Designed a secure architecture that transformed read-only financial data into actionable analytics datasets through effective RBAC implementation.
Implemented Azure Databricks Auto Loader for real-time data ingestion from S3, boosting ingestion speed by 45% and lowering compute costs by 30%.

Snowflake · Databricks · Azure Databricks · Data Pipelines · Real-time Analytics · Data Engineering

Nagarro

Staff Data Engineer

Aug 2021 – Jun 2024 · 2 yrs 10 mos · Bengaluru, Karnataka, India · Hybrid

Project 1: RAG-Based Intelligent Search System for Chemical Agriculture Data Platform
Built a GPT-3.5 chatbot prototype for information retrieval, using Retrieval-Augmented Generation (RAG) to ground responses and the ChatGPT API for Q&A functionality.
Implemented a PyMuPDF-based intelligent search system that uses FAISS to store and retrieve text embeddings, letting users quickly find relevant information within documents through dynamic GPT prompts and vector similarity search.
Project 2: Chemical Agriculture Data Platform
Implemented efficient data pipelines using PySpark and Kedro to extract, transform, and load agricultural data into the Azure Data Lakehouse on Azure Databricks.
Optimized Delta table data models and partitioning strategies, resulting in a 30% improvement in query performance and a 20% reduction in storage costs for feature store operations.
Used the Snowpark API to build data pipelines for extracting trade data from Snowflake, increasing trade profitability by 23%.
Spearheaded Spark job deployment optimization on Azure Databricks, reducing deployment time by 57%.
Project 3: Cumulus Media
Migrated large datasets from AWS S3 to AWS Glue ETL pipelines, reducing migration time by 51%. Also executed SQL queries using AWS Athena, improving query response time by 43% and enabling more efficient data management.
Integrated a WebSocket API using AWS API Gateway and AWS Lambda, reducing latency in real-time communication by 87% and enhancing data transfer efficiency.
Designed and developed a comprehensive data model to support a data warehouse initiative, resulting in a 40% improvement in data accessibility and a 25% reduction in data retrieval time.

PySpark · Kedro · Retrieval-Augmented Generation (RAG) · ChatGPT API · FAISS · Data Engineering

Nokia

Senior Data Engineer

Apr 2020 – Aug 2021 · 1 yr 4 mos · Bengaluru, Karnataka, India · Hybrid

Project: DNA IBUS Platform
Implemented robust REST APIs using Python and Flask to extract data from multiple data lakes and load it into PostgreSQL for the DNA IBUS Platform, resulting in a 43% increase in data accessibility.
Analyzed large datasets using Pandas and in-memory data structures, uncovering hidden patterns that contributed to a 27% revenue increase through data-driven decision-making.
Developed a real-time data monitoring and alerting system that proactively identified and resolved production issues before they affected business operations.

Python · Flask · PostgreSQL · Pandas · Data Engineering

Autogrid

2 roles

Senior Software Engineer

Promoted

Mar 2018 – Mar 2020 · 2 yrs · Bengaluru, Karnataka, India · On-site

Project 3: Real-Time Communication Controller for Energy Management Systems
Developed ETL pipelines using PySpark, Kafka, and HBase to enhance data processing efficiency and reduce processing time by 57%.
Integrated IoT devices from Ecobee, Honeywell, LG Smart, Emerson’s Sensi, and Tesla Powerwall for real-time data ingestion and control, enhancing monitoring and decision-making capabilities.
Utilized Oozie workflow to schedule PySpark jobs, resulting in a 33% improvement in job scheduling and execution.

PySpark · Kafka · HBase · IoT · Data Engineering

Software Engineer

Jan 2017 – Mar 2018 · 1 yr 2 mos · Bengaluru, Karnataka, India · On-site

Project 1: Weather Data Ingestion for Renewable Energy
Optimized energy production for renewable energy systems by writing code for weather forecasting that generated accurate predictions, enabling informed decision-making.
Collaborated with data scientists and domain experts to identify and implement data quality checks and validation processes for weather data ingestion.
Project 2: Connector Factory for Time Series Data Analysis (Renewable Energy)
Established REST APIs using Python, Flask, and SQLAlchemy data models to improve the efficiency of complex prediction/forecasting algorithms on time series data, resulting in a 43% improvement in data retrieval and analysis.
Containerized the environment with Docker and deployed it through Kubernetes, resulting in a 71% reduction in deployment time and a smoother development experience.

Python · Flask · Docker · Kubernetes · Software Development

Tas Information Intelligence - India

Software Engineer

Oct 2014 – Jul 2016 · 1 yr 9 mos · Bengaluru, Karnataka, India · On-site

Built custom web scraping plugins to extract valuable data from diverse online sources, capturing over 10,000 data points per day.
Optimized existing data scraping models by 69% by porting them from Python to C++, resulting in a 43% reduction in system resource utilization.
Implemented a distributed processing system and a plugin from scratch using XPCOM components to speed up execution.
Conducted thorough testing and debugging of data, ensuring accuracy and reliability, and delivering a high-quality product with a 73% accuracy rate.

Python · C++ · Web Scraping · Software Development