Aditya Mishra

Data Engineer

Bengaluru, Karnataka, India · 5 yrs 10 mos experience

Key Highlights

  • Architected a real-time event streaming platform saving over $50K monthly.
  • Developed a self-service framework enhancing SQL performance by 3-5x.
  • Built a centralized Data Lakehouse ensuring comprehensive data governance.
Stackforce AI infers this person is a Data Engineering expert specializing in real-time data processing and cloud-based architectures.

Skills

Core Skills

Data Engineering · Real-time Data Processing · Data Platform Development · Software Development

Other Skills

Kafka · ClickHouse · Databricks · Airflow · Spark · Python · SQL · PySpark · Apache Flink · Java · EMR · Scala · S3 · Postgres · MongoDB

About

Passionate data engineer with expertise in developing robust and scalable data solutions. I specialize in leveraging cutting-edge technologies such as Apache Spark, Scala, Python, Airflow, and Kafka to build efficient ETL pipelines and optimize data workflows. With a solid background in SQL, I excel at designing and implementing data models and database solutions. Proficient in Snowflake and experienced in working with AWS services like EMR, S3, EC2, Glue, Redshift, and DMS, I have a strong understanding of cloud-based architectures and their integration with data processing frameworks. I am adept at deploying and managing large-scale data processing clusters, ensuring high availability and performance. My skills extend to working with databases such as Postgres and MongoDB, where I employ my expertise in data modeling, schema design, and query optimization. I have successfully implemented database solutions to support various business requirements, ensuring data integrity and efficient data retrieval.

Experience

Total Experience: 5 yrs 10 mos
Average Tenure: 1 yr 5 mos
Current Experience: 1 yr 6 mos

Zepto

Data Engineer 2

Dec 2024 - Present · 1 yr 6 mos · Bengaluru, Karnataka, India · On-site

  • Platform Architecture & Cost Savings: Architected and deployed an in-house real-time event streaming platform using Kafka and ClickHouse, directly replacing Mixpanel. The platform processes terabytes of events daily with sub-second latency and saves over $50K per month.
  • Self-Service Enablement & Optimization: Developed the self-service "Intelligent Flat Framework" using Databricks, Airflow, Spark, Python, Django, and SQL, significantly reducing dependency on the core Data Team. Improved business analytics SQL query performance by 3-5x, accelerating dashboard load times and reducing compute costs by more than $25K per month.
  • Data Lakehouse Construction & Unified Ingestion: Built and governed a centralized Data Lakehouse on Databricks using a Delta Lake Medallion Architecture (Bronze/Silver/Gold), establishing comprehensive data governance. Developed a generic CDC-based ingestion framework (Databricks, Airflow, PySpark), managed centrally by the Data Platform team, that handles both high-volume real-time event data and OLTP sources (Postgres, MongoDB).
  • Real-Time Order and Inventory Monitoring: Engineered a mission-critical, real-time system using ClickHouse for mother hubs and dark stores, tracking delivery riders' call logs and order status. This gave the Order Management System (OMS) real-time visibility, with query response times under 500 ms.
  • Performance and Infrastructure Optimization: Single-handedly managed ClickHouse cluster optimization (sharding, replication, performance tuning, hybrid storage), cutting infrastructure costs by 40%.
  • Storage Efficiency: Reduced Delta Lake storage costs by 60% by standardizing on ZSTD compression instead of Snappy, without affecting query run time.
Kafka · ClickHouse · Databricks · Airflow · Spark · Python · +3

Gokwik

Data Engineer

Nov 2022 - Dec 2024 · 2 yrs 1 mo · Remote

  • Built a real-time data platform for OLAP data that processes more than 200M records per day, replacing the paid Snowplow tool and reducing cost from $10,000 to ~$3,000 per month. Tech stack: Kafka, Apache Flink, Java, EMR, ClickHouse.
  • As a Data Platform Engineer, I played a pivotal role in transforming data accessibility for the organization. I initiated and executed the development of the organization's Data Platform, establishing it as a single source of truth from the ground up.
  • Key Achievements:
  • Successfully addressed data availability challenges by architecting and building a Data Lake from inception, encompassing Landing Zone, Raw (Hudi) Tables, and Models.
  • Led the entire lifecycle, including requirements gathering, architectural design, code development, testing, and production deployment.
  • Utilized a tech stack including Spark, Scala, EMR, S3, Airflow, EC2, Python, Athena, Glue, and adopted the Parquet format.
  • Managed data integration from diverse sources such as Postgres, MongoDB, GCP, S3, and Kafka.
  • Collaborated closely with cross-functional teams including tech, product, analytics, and data scientists to comprehend business use cases and independently deliver end-to-end solutions.
  • Implemented proactive monitoring with alerts on pipeline failures, data flow delays, and data sanity checks, ensuring swift resolution of production issues to mitigate recurrence.
  • Implemented Workload Management (WLM) activities to enhance Redshift efficiency.
  • Automated manual and ad-hoc tasks, streamlining workflows and increasing operational efficiency.
  • Conducted periodic reviews of the existing infrastructure, optimizing it for performance and cost-effectiveness.
  • Actively engaged in setting up efficient processes with various teams, fostering a collaborative and responsive work environment.
Kafka · Apache Flink · Java · EMR · ClickHouse · Spark · +6

Byju's

Software Engineer - Data Platform

Mar 2022 - Nov 2022 · 8 mos · Bengaluru, Karnataka, India

  • Built the organization's centralized Data Platform
  • Built and handled OLAP data pipelines (clickstream data) for multiple products
  • Built streaming and batch ETL/ELT data pipelines
  • Databases: Postgres, MongoDB
  • Data migration from cloud/enterprise to cloud
  • Transformed and managed the Data Warehouse (Snowflake) and Data Lake
  • Worked on AWS: S3, Glue, Redshift, AppFlow, Kinesis, EC2, MSK, CloudWatch
  • Technologies: Apache Spark, Kafka, Python, Hive, SQL
Postgres · MongoDB · AWS · S3 · Glue · Redshift · +7

Tata Consultancy Services

Data Engineer

Aug 2020 - Mar 2022 · 1 yr 7 mos · Noida, Uttar Pradesh, India

  • Worked as a Data Engineer for Walgreens Boots Alliance
  • Worked on the data ingestion framework
  • Worked on data transformation
  • Led the Data Ingestion Team (team size: 15) and data ingestion operations
  • Led and developed two COVID-related pipelines (team size: 3) using Databricks, Apache Spark, Scala, Azure, Blob, ADLS (data lake), and Azure Service Bus
Databricks · Apache Spark · Scala · Azure · Blob · ADLS · +1

Jaza Software OPC Pvt. Ltd.

Software Development Engineer, Intern

Jan 2020 - Apr 2020 · 3 mos · Greater Bengaluru Area

  • Worked as a Java backend developer
  • Developed Spring Boot-based web applications for the apparel industry to track manufacturing stages
Java · Spring Boot · Maven · Software Development

Education

Inderprastha Engineering College

Bachelor of Technology — Information Technology

Jan 2016 - Jan 2020
