Bitan Sarkar

Data Engineer

Pune, Maharashtra, India · 4 yrs 10 mos experience

Most Likely To Switch

Key Highlights

Expert in optimizing data pipelines for real-time analytics.
Skilled in building scalable data solutions using Azure and Databricks.
Passionate about bridging data engineering and machine learning.

Stackforce AI infers this person is a Data Engineer specializing in Fintech and Data Architecture.

Skills

Core Skills

Data Engineering, Performance Optimization, Data Integration

Other Skills

API Data Ingestion, Algorithms, Arduino, Azure Data Factory, Azure Data Lake, Big Data, C, C++, Data Analysis, Data Governance, Data Lakes, Data Pipeline Automation, Data Quality, Data Reliability, Data Structures

About

I’m a Data Engineer with 4+ years of experience designing, building, and optimizing data platforms, ETL/ELT pipelines, and lakehouse architectures across the Azure and Databricks ecosystems. I specialize in using Python, SQL, and PySpark to create scalable, high-performance data solutions that power analytics and business decisions.

At Mastercard, I focus on optimizing large-scale data pipelines, reducing query latency, and improving data reliability for analytics teams. Previously, I built data warehouses and lakehouses using Azure Data Factory, dbt, and PostgreSQL, integrating data modeling, governance, and automation best practices.

I’m passionate about data architecture, streaming pipelines, and big data frameworks like Apache Spark, Kafka, and Airflow. Recently, I developed a RAG-based AI agent that integrates knowledge graphs, vector databases (Pinecone, FAISS), and LLMs to deliver contextual insights from financial data.

I thrive on solving complex data problems end to end, from design to deployment, and on mentoring teams to build robust, future-ready data systems. I’m always exploring ways to bridge data engineering, machine learning, and GenAI for intelligent, automated decision-making.

Experience

Total Experience: 4 yrs 10 mos
Average Tenure: 1 yr 7 mos
Current Experience: 1 yr 11 mos

Mastercard

Data Engineer 2

Jun 2024 – Present · 1 yr 11 mos · Pune, Maharashtra, India · Hybrid

Optimized complex SQL queries with partitioning, indexing, and caching techniques, reducing query execution time by 25% and enabling near-real-time analytics on multi-billion row datasets.
Enhanced platform capabilities by refactoring Python feature modules, achieving a 20% reduction in resource utilization and improving scalability. Used libraries such as DuckDB and PyCaret to gain insights into the data.
Leveraged PySpark to automate large-scale synthetic dataset generation, cutting generation time by 85%, which accelerated analytical testing and reduced dependency on production data.

ETL development, SQL, Python, PySpark, Data Pipeline Automation, Performance Optimization (+1 more)

Finarb

Data Engineer

Dec 2022 – Jun 2024 · 1 yr 6 mos · Kolkata, West Bengal, India

Developed ETL pipelines in Azure Data Factory to migrate on-premises databases to Azure SQL DB, establishing a robust Data Lakehouse infrastructure with an incremental load solution that improved data processing speed and reduced storage costs.
Engineered custom PySpark code for EDA plots and ML data preparation. Integrated FastAPI for seamless data access.
Constructed a tailored Data Warehouse using dbt and PostgreSQL for reporting, including Incentive Optimization and data extraction, reducing client spend significantly.
Conducted EDA, sensitivity analysis, and statistical tests using Pandas, NumPy, and Plotly. Created Power BI reporting dashboards that increased data visibility across teams.

Azure Data Factory, dbt, PostgreSQL, Python, FastAPI, Pandas (+5 more)

LTI - Larsen & Toubro Infotech

Data Engineer

Jul 2021 – Dec 2022 · 1 yr 5 mos · Mumbai, Maharashtra, India

Spearheaded the development of Python-based ETL pipelines for API-driven ingestion, reducing manual processing by 40% and increasing reliability with automated validation.
Developed optimized dynamic procedures and triggers in MS SQL, resulting in reduced processing time across project units. Reusable solutions improved efficiency and reduced errors.

Python, SQL, ETL development, API Data Ingestion, Data Engineering