Amit Bugalia

Software Engineer

Bengaluru, Karnataka, India · 10 yrs 3 mos experience

Key Highlights

  • Over 9 years of IT experience with Big Data specialization.
  • Expert in architecting scalable data pipelines and optimizing data workflows.
  • Proficient in managing terabytes of complex data daily.

Skills

Core Skills

Big Data · Data Engineering · Data Visualization · Software Development

Other Skills

Amazon Elastic MapReduce (EMR) · Apache Hudi · Apache Kafka · Apache Oozie · Apache Ranger · Apache Spark · Apache Spark Streaming · Core Java · Data Structures · Databricks · Datadog · Delta Lake · Eclipse · Extract, Transform, Load (ETL) · HBase

About

Over 9 years of IT experience, with 8+ years specializing in Big Data technologies, including the Hadoop ecosystem. Proficient Big Data developer with extensive hands-on experience in Spark Streaming and batch processing, Apache Hudi, Kafka, Delta tables, HDFS, Apache Ranger, MapReduce, YARN, HBase, ZooKeeper, Airflow, and multiple file formats such as Avro, Parquet, JSON, and XML. Skilled in managing and processing terabytes of complex, high-volume data daily using tools such as SQL, Git, Bitbucket, Maven, and AWS.

Experience

Eightfold AI

Staff Software Engineer

Oct 2024 – Present · 1 yr 5 mos

Teikametrics

Lead Software Engineer, Data Platform

Oct 2021 – Oct 2024 · 3 yrs · Bangalore Urban, Karnataka, India

  • Architected and implemented a highly scalable data pipeline leveraging Delta Lake, efficiently handling 5TB of daily data ingestion through Spark Structured Streaming and batch processing on Databricks.
  • Optimized Looker dashboards by designing a read-optimized presentation layer on Unity Catalog, improving performance and migrating complex queries from Snowflake to Databricks.
  • Developed a comprehensive cluster utilization and monitoring framework for Databricks, seamlessly integrating with Datadog and OpenSearch to ensure efficient resource management and real-time insights.
  • Engineered advanced partition pruning and Delta table parsing strategies, reducing infrastructure costs and significantly enhancing execution times for large-scale data workflows.
Delta Lake · Spark Structured Streaming · Databricks · Looker · Datadog · OpenSearch
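The partition-pruning work above can be illustrated with a minimal sketch. This is not the author's code: the `dt=` partition layout, bucket name, and date range are assumed for illustration, and the idea is simply that selecting partition paths up front lets downstream readers skip irrelevant data entirely.

```python
# Illustrative sketch of date-based partition pruning over
# Hive/Delta-style partitioned paths (layout and names assumed).
from datetime import date

def prune_partitions(paths, start, end):
    """Keep only paths whose dt= partition falls inside [start, end]."""
    kept = []
    for p in paths:
        # Parse key=value segments out of the path, e.g. dt=2024-01-01.
        parts = dict(kv.split("=") for kv in p.strip("/").split("/") if "=" in kv)
        d = date.fromisoformat(parts["dt"])
        if start <= d <= end:
            kept.append(p)
    return kept

paths = [
    "s3://bucket/events/dt=2024-01-01/",
    "s3://bucket/events/dt=2024-01-02/",
    "s3://bucket/events/dt=2024-02-01/",
]
# Only January partitions survive the pruning step.
jan = prune_partitions(paths, date(2024, 1, 1), date(2024, 1, 31))
```

In a real Spark/Delta pipeline this selection happens inside the engine when filters align with partition columns; the sketch only makes the mechanism visible.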

Delhivery

2 roles

Lead Data Engineer

Jan 2021 – Sep 2021 · 8 mos

  • Managed a data warehouse handling 500 GB of daily volume, leading a team of 6 engineers responsible for the data platform’s ETL and streaming operations.
  • Integrated Apache Hudi into a large-scale data pipeline using Spark Streaming over S3 and Kafka, alongside developing a robust health monitoring framework for pipeline stability.
  • Implemented a query parsing framework to enhance data module quality and optimize cloud infrastructure costs.
  • Led cross-functional collaboration with various teams to establish a comprehensive data lake pipeline for application data.
Apache Hudi · Spark Streaming · Kafka · Big Data · Data Engineering
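The pipeline health monitoring mentioned above typically centers on consumer lag. A hedged sketch, with all offsets and the threshold invented for illustration: compare end offsets with committed offsets per Kafka partition and flag the laggards.

```python
# Minimal sketch (values assumed) of a streaming-pipeline health check:
# flag Kafka partitions whose consumer lag exceeds a threshold.
def lag_report(end_offsets, committed, threshold=1000):
    """Return {partition: lag} for partitions lagging past the threshold."""
    return {
        p: end_offsets[p] - committed.get(p, 0)
        for p in end_offsets
        if end_offsets[p] - committed.get(p, 0) > threshold
    }

# Partition 1 is 3000 messages behind and gets flagged; 0 and 2 are healthy.
unhealthy = lag_report(
    end_offsets={0: 5000, 1: 12000, 2: 800},
    committed={0: 4900, 1: 9000},
)
```

In production the offsets would come from the Kafka admin/consumer APIs and the report would feed an alerting system, but the comparison itself is this simple.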

Senior Software Engineer

May 2019 – Jan 2021 · 1 yr 8 mos

  • Designed and owned the Cost Per Shipment (CPS) system, implementing an activity-based costing model using batch-based historical data analysis with Spark DataFrames for precise shipment-level cost allocation.
  • Resolved complex Spark issues including concurrent writes, memory management, shuffle inefficiencies, and write failures, gaining deep expertise in tuning Spark internals for optimal performance.
  • Introduced predictive analytics leveraging historical cost data, and implemented an outlier detection framework to ensure data accuracy and reliability.
  • Managed financial dashboards for PnL and other key finance metrics, using processed data from CPS jobs to provide actionable insights.
Spark DataFrames · Predictive Analytics · Big Data · Data Engineering
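The outlier-detection framework referenced above can be sketched with the common interquartile-range rule; the cost values, quartile approximation, and `k` multiplier here are illustrative, not taken from the CPS system.

```python
# Hedged sketch of outlier detection for shipment-level costs using the
# IQR rule: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged.
def iqr_outliers(costs, k=1.5):
    s = sorted(costs)
    n = len(s)
    # Crude quartile picks by index; fine for an illustration.
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [c for c in costs if c < lo or c > hi]

# One anomalous cost stands out against an otherwise tight distribution.
flagged = iqr_outliers([10, 11, 12, 10, 13, 11, 95])
```

At Spark scale the same rule would run as an aggregation over DataFrames (e.g. `approxQuantile`), but the flagging logic is unchanged.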

Nagarro

3 roles

Senior Associate

Promoted

Jan 2018 – Apr 2019 · 1 yr 3 mos · Gurugram, Haryana, India

  • Created a 360-degree customer database for the client from multiple data sources, consolidating unique customer records for marketing use; the database supports customer segmentation and the derivation of RFM, churn, and customer lifetime value metrics with SAS analytics.
  • Implemented the business logic for customer ID generation and customer-360 variable generation.
  • Stored normalized and filtered raw data from multiple sources in the NoSQL database HBase, and persisted intermediate-stage data as Parquet files in HDFS.
SAS · HBase · HDFS · Data Engineering
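The RFM metrics mentioned above have simple definitions. The original work used SAS; this Python sketch (with invented customer IDs, dates, and amounts) only shows what is computed: Recency (days since last purchase), Frequency (purchase count), and Monetary (total spend) per customer.

```python
# Illustrative RFM computation; all data is made up for the example.
from datetime import date

def rfm(transactions, today):
    """transactions: iterable of (customer_id, purchase_date, amount)."""
    out = {}
    for cid, d, amt in transactions:
        r = out.setdefault(cid, {"recency": None, "frequency": 0, "monetary": 0.0})
        r["frequency"] += 1
        r["monetary"] += amt
        days = (today - d).days
        # Recency is the smallest gap, i.e. the most recent purchase.
        if r["recency"] is None or days < r["recency"]:
            r["recency"] = days
    return out

scores = rfm(
    [("c1", date(2024, 1, 10), 50.0),
     ("c1", date(2024, 2, 1), 20.0),
     ("c2", date(2023, 12, 1), 5.0)],
    today=date(2024, 2, 15),
)
```

Segmentation then typically bins each of the three values into quantiles and concatenates the bin labels into an RFM score.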

Associate

Sep 2016 – Dec 2017 · 1 yr 3 mos · Gurugram, Haryana, India

  • Developed a system to generate CSV files from SDP files using an N-ary tree structure in XML, optimizing data transformation and storage.
  • Parsed complex XML trees with a DOM parser, converting them into N-ary POJO trees in Core Java for efficient data manipulation.
  • Optimized CSV file size by implementing dynamic virtual links to break down N-ary trees in a streamlined manner.
  • Engineered algorithms to simplify tree structures into linear representations, enabling seamless CSV export, and designed JSON configuration files to handle diverse XML parsing requirements.
XML · Core Java · Software Development
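The tree-to-linear flow described above (parse an XML tree, then flatten it into rows for CSV export) can be sketched in a few lines. The original was Core Java with a DOM parser; this stdlib Python version uses hypothetical element names purely to show the traversal.

```python
# Sketch of flattening an N-ary XML tree into linear (path, value) rows
# suitable for CSV export; element names are hypothetical.
import xml.etree.ElementTree as ET

def flatten(elem, path=""):
    """Walk the tree depth-first, yielding (path, text) rows."""
    here = f"{path}/{elem.tag}" if path else elem.tag
    if elem.text and elem.text.strip():
        yield (here, elem.text.strip())
    for child in elem:
        yield from flatten(child, here)

doc = ET.fromstring("<order><id>42</id><items><item>pen</item></items></order>")
rows = list(flatten(doc))
```

Each row can then be written with `csv.writer`; the path column plays the role of the "linear representation" of the tree.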

Junior Associate

Aug 2015 – Aug 2016 · 1 yr · Gurugram, Haryana, India

Education

Indian Institute of Technology (Banaras Hindu University), Varanasi

Bachelor's degree — Electrical and Electronics Engineering

Jul 2011 – May 2015

Gudha Public School, Jhunjhunu (Rajasthan), India

Jan 2007 – Jan 2010
