Aditya Mishra

Data Engineer

Bengaluru, Karnataka, India · 5 yrs 10 mos experience

Key Highlights

Architected a real-time event streaming platform saving over $50K monthly.
Developed a self-service framework enhancing SQL performance by 3-5x.
Built a centralized Data Lakehouse ensuring comprehensive data governance.

Stackforce AI infers this person is a Data Engineering expert specializing in real-time data processing and cloud-based architectures.

Skills

Core Skills

Data Engineering, Real-time Data Processing, Data Platform Development, Software Development

Other Skills

Kafka, ClickHouse, Databricks, Airflow, Spark, Python, SQL, PySpark, Apache Flink, Java, EMR, Scala, S3, Postgres, MongoDB

About

Passionate data engineer with expertise in developing robust and scalable data solutions. I specialize in leveraging cutting-edge technologies such as Apache Spark, Scala, Python, Airflow, and Kafka to build efficient ETL pipelines and optimize data workflows. With a solid background in SQL, I excel at designing and implementing data models and database solutions. Proficient in Snowflake and experienced in working with AWS services like EMR, S3, EC2, Glue, Redshift, and DMS, I have a strong understanding of cloud-based architectures and their integration with data processing frameworks. I am adept at deploying and managing large-scale data processing clusters, ensuring high availability and performance. My skills extend to working with databases such as Postgres and MongoDB, where I employ my expertise in data modeling, schema design, and query optimization. I have successfully implemented database solutions to support various business requirements, ensuring data integrity and efficient data retrieval.

Experience

Total Experience: 5 yrs 10 mos

Average Tenure: 1 yr 5 mos

Current Experience: 1 yr 6 mos

Zepto

Data Engineer 2

Dec 2024 – Present · 1 yr 6 mos · Bengaluru, Karnataka, India · On-site

Platform Architecture & Cost Savings: Architected and deployed an in-house real-time event streaming platform using Kafka and ClickHouse, directly replacing Mixpanel. The platform processes terabytes of events daily with sub-second latency and saves over $50K per month.
Self-Service Enablement & Optimization: Developed the self-service "Intelligent Flat Framework" on Databricks, Airflow, Spark, Python, Django, and SQL, significantly reducing dependency on the core Data Team. Improved business analytics SQL query performance by 3-5x, accelerating dashboard load times and reducing compute costs by more than $25K per month.
Data Lakehouse Construction & Unified Ingestion: Built and governed a centralized Data Lakehouse on Databricks using a Delta Lake Medallion Architecture (Bronze/Silver/Gold), establishing comprehensive data governance. Developed a generic CDC-based ingestion framework (Databricks, Airflow, PySpark) that handles both high-volume real-time event data and OLTP sources (Postgres, MongoDB), managed centrally by the Data Platform team.
Real-Time Order and Inventory Monitoring: Engineered a mission-critical, real-time system using ClickHouse for mother hubs and dark stores, tracking delivery riders' call logs and order status. This gave the Order Management System (OMS) real-time visibility, with query response times under 500 ms.
Performance and Infrastructure Optimization: Single-handedly managed ClickHouse cluster optimization (sharding, replication, performance tuning, hybrid storage), cutting infrastructure costs by 40%.
Storage Efficiency: Reduced Delta Lake storage costs by 60% by standardizing on ZSTD compression over Snappy without affecting query run time.

Kafka, ClickHouse, Databricks, Airflow, Spark, Python, +3

Gokwik

Data Engineer

Nov 2022 – Dec 2024 · 2 yrs 1 mo · Remote

Built a real-time data platform for OLAP data that processes more than 200M records per day, replacing the paid Snowplow tool and reducing cost from $10,000 to ~$3,000 per month. Tech stack: Kafka, Apache Flink, Java, EMR, ClickHouse.
As a Data Platform Engineer, I played a pivotal role in transforming data accessibility for the organization. I initiated and executed the development of the organization's Data Platform, establishing it as a single source of truth from the ground up.
Key Achievements:
Successfully addressed data availability challenges by architecting and building a Data Lake from inception, encompassing Landing Zone, Raw (Hudi) Tables, and Models.
Led the entire lifecycle, including requirements gathering, architectural design, code development, testing, and production deployment.
Utilized a tech stack including Spark, Scala, EMR, S3, Airflow, EC2, Python, Athena, Glue, and adopted the Parquet format.
Managed data integration from diverse sources such as Postgres, MongoDB, GCP, S3, and Kafka.
Collaborated closely with cross-functional teams including tech, product, analytics, and data scientists to comprehend business use cases and independently deliver end-to-end solutions.
Implemented proactive monitoring with alerts on pipeline failures, data flow delays, and data sanity checks, ensuring swift resolution of production issues to mitigate recurrence.
Implemented Workload Management (WLM) activities to enhance Redshift efficiency.
Automated manual and ad-hoc tasks, streamlining workflows and increasing operational efficiency.
Conducted periodic reviews of the existing infrastructure, optimizing it for performance and cost-effectiveness.
Actively engaged in setting up efficient processes with various teams, fostering a collaborative and responsive work environment.

Kafka, Apache Flink, Java, EMR, ClickHouse, Spark, +6

Byju's

Software Engineer - Data Platform

Mar 2022 – Nov 2022 · 8 mos · Bengaluru, Karnataka, India

Built the organization's centralized Data Platform
Built and handled OLAP data pipelines (clickstream data) for multiple products
Built streaming and batch ETL/ELT data pipelines
Databases: Postgres, MongoDB
Data migration from cloud/enterprise to cloud
Transformed and managed the Data Warehouse (Snowflake) and Data Lake
Worked on AWS: S3, Glue, Redshift, AppFlow, Kinesis, EC2, MSK, CloudWatch
Technologies: Apache Spark, Kafka, Python, Hive, SQL

Postgres, MongoDB, AWS, S3, Glue, Redshift, +7

Tata Consultancy Services

Data Engineer

Aug 2020 – Mar 2022 · 1 yr 7 mos · Noida, Uttar Pradesh, India

Worked as a Data Engineer for Walgreens Boots Alliance
Worked on the data ingestion framework
Worked on data transformation
Led the Data Ingestion team (team size 15) and data ingestion operations
Led and developed two COVID-related pipelines using Databricks, Apache Spark, Scala, Azure, Blob, ADLS (data lake), and Azure Service Bus (team size 3)

Databricks, Apache Spark, Scala, Azure, Blob, ADLS, +1