Sumit Bajaj

Data Engineer

Bengaluru, Karnataka, India · 6 yrs 1 mo experience
Highly Stable

Key Highlights

  • Expert in building scalable data pipelines.
  • Proficient in modernizing legacy ETL systems.
  • Strong focus on data quality and governance.
Stackforce AI infers this person is a Data Engineer specializing in EdTech and Industrial IoT data solutions.

Skills

Core Skills

Data Engineering · Cloud Data Solutions · Data Visualization · Real-time Data Processing

Other Skills

ADLS Gen2 · AWS SQS · Amazon EC2 · Amazon S3 · Amazon Simple Notification Service (SNS) · Amazon Web Services (AWS) · Azure Data Factory · Azure Data Lake · Azure Databricks · Cassandra · Data Governance · Data Pipelines · Databases · Databricks SQL · Delta Lake

About

Experienced Data Engineer with 6 years of expertise in building scalable data pipelines, modernizing legacy ETL systems, and enabling real-time analytics across domains like EdTech and Industrial IoT. Proficient in Python, SQL, PySpark, Azure, and Databricks, with a strong focus on data quality, governance, and performance optimization.

Experience

Total Experience: 6 yrs 1 mo
Average Tenure: 3 yrs
Current Experience: --

Tarento Group

2 roles

Senior Data Engineer

Promoted

Nov 2022 - Sep 2025 · 2 yrs 10 mos · Bangalore Urban, Karnataka, India

  • Migrated complex Scala-based legacy ETL to PySpark on Azure Databricks, enhancing scalability, error handling, and maintainability.
  • Ingested incremental learner data from Cassandra into Delta Lake following the Medallion Architecture (Bronze → Silver → Gold).
  • Centralized all datasets into ADLS Gen2, establishing a unified and governed data lake with proper directory structuring.
  • Implemented Unity Catalog to manage fine-grained access control, table-level permissions, and schema governance for secure collaboration.
  • Enabled real-time stakeholder visibility by building Power BI dashboards on top of Databricks SQL endpoints.
  • Integrated third-party content usage data from Amazon S3 using Databricks Auto Loader, handling schema drift and late-arriving data efficiently.
  • Adopted event-driven ingestion to track user engagement and course completions and to improve analytics granularity.
PySpark · Azure Databricks · Cassandra · Delta Lake · ADLS Gen2 · Unity Catalog · +3

Data Engineer

Jun 2021 - Oct 2022 · 1 yr 4 mos · Bangalore Urban, Karnataka, India

  • Designed and implemented real-time streaming pipelines using Kafka, AWS SQS, and Spark Structured Streaming to capture learner interactions such as quiz attempts, course completions, and time spent.
  • Built robust batch processing workflows on Azure Databricks to ingest and process large volumes of telemetry data (IMPRESSION, START, END events) from MongoDB.
  • Adopted Medallion Architecture (Bronze → Silver → Gold) to enable clean, performant, and reliable data layers using Delta Lake with schema enforcement, Z-Ordering, and OPTIMIZE operations.
  • Managed data storage in ADLS Gen2 and ensured secure, governed access using Unity Catalog with fine-grained controls and audit capabilities.
  • Delivered real-time and scheduled dashboards using Databricks SQL and Power BI to empower decision-making for national and state-level monitoring teams.
Kafka · AWS SQS · Spark Structured Streaming · Azure Databricks · MongoDB · Delta Lake · +4

VA Tech Wabag Ltd.

Graduate Engineering Trainee: Data

Jul 2019 - Jun 2021 · 1 yr 11 mos

  • Centralized operational data from SCADA systems, lab reports, and Excel sheets into a unified SQL Server database.
  • Built end-to-end ETL pipelines using Python and SQL to automate daily ingestion of water quality and equipment performance metrics.
  • Developed Power BI dashboards for real-time monitoring of critical KPIs such as turbidity, pH levels, and filtration efficiency.
  • Implemented data validation and anomaly detection rules to flag abnormal chemical dosages and sensor readings.
  • Collaborated with process engineers to align data models with operational workflows, enabling timely and data-driven decision-making.
SQL Server · Python · Power BI · Data Engineering

Education

PSIT Kanpur (Pranveer Singh Institute of Technology)

Bachelor of Technology (BTech), Mechanical Engineering

Jan 2015 - Jan 2019
