Saketh Mukkanti

Data Engineer

Bengaluru, Karnataka, India · 3 yrs 6 mos experience

Key Highlights

  • Expert in designing end-to-end data pipelines.
  • Proven track record in data lakehouse implementations.
  • Strong collaboration with cross-functional teams.

Skills

Core Skills

Data Engineering · Cloud Technologies · Data Lakehouse · Data Processing

Other Skills

Apache Airflow · Snowflake Cloud · Apache Spark · Kubernetes · Amazon S3 · AWS Glue Data Catalog · Trino · Node.js · Python · Snowflake · MySQL · Data Pipelines · PySpark · Apache Hive · ETL

About

I have 3.7 years of experience as a data engineer. I design and develop end-to-end data pipelines for areas such as product and growth analytics, using Flink, Kafka, Spark, Iceberg, and other technologies. I collaborate cross-functionally with business teams and stakeholders to understand their needs, provide insights, and deliver value. I have also obtained multiple certifications in big data analytics, Hadoop, and Apache Spark from LinkedIn, deepening my skills and knowledge in the domain. I graduated from the Indian Institute of Technology (IIT) Guwahati with a Bachelor's in Computer Science and Engineering, where I learned the fundamentals of data structures, algorithms, databases, and software engineering. I am passionate about solving complex problems, optimizing performance, and automating processes using data-driven and innovative approaches. Outside of work, I enjoy traveling, watching movies, anime, and football.

Skills: Apache Flink, Apache Kafka, Apache Spark, Apache Airflow, Kubernetes, Docker, AWS, Snowflake, Microsoft Azure Cloud, Apache Hive, Apache HBase, Apache Iceberg, NoSQL databases, MySQL, ETL (Extract, Transform, Load), Python, dbt, XML, Jinja

Experience

3 yrs 6 mos
Total Experience
1 yr 3 mos
Average Tenure
10 mos
Current Experience

Apple

Cloud Data Engineer

Jul 2025 – Present · 10 mos · Bengaluru, Karnataka, India · Hybrid

Apache Airflow · Snowflake Cloud · Data Engineering · Cloud Technologies

New Relic

Data Engineer

Jul 2024 – Jun 2025 · 11 mos · Bengaluru, Karnataka, India · Hybrid

  • Designed and implemented a Data Lakehouse using Apache Spark on Kubernetes, Amazon S3 (Iceberg format), Apache Airflow, AWS Glue Data Catalog, and Trino, enabling scalable and cost-efficient data processing.
  • Built a Node.js application to streamline competitor data ingestion, processing account samples and storing JSON outputs in Amazon S3, enabling real-time insights into competitor usage trends.
  • Established a robust data validation framework using Apache Spark for seamless data lake migration from Snowflake to Amazon S3 (Apache Iceberg format), ensuring stakeholder alignment.
  • Automated daily and hourly Slack alerts for new product capability opt-ins using Airflow, Snowflake, and Python, improving the customer onboarding experience and increasing user opt-in conversion.
  • Collaborated with teams to resolve data issues, ensuring stakeholder alignment on objectives and timelines.
Apache Spark · Kubernetes · Amazon S3 · Apache Airflow · AWS Glue Data Catalog · Trino

American Express

2 roles

Data Engineer II

May 2024 – Jul 2024 · 2 mos · Bengaluru, Karnataka, India

MySQL · Data Pipelines · Data Engineering

Data Engineer | PySpark, Hive, HBase, ETL

Aug 2022 – May 2024 · 1 yr 9 mos · Bengaluru, Karnataka, India

  • Framework Building
  • Developed a software application end to end using PySpark to generate spreadsheet and XML outputs for around 200 predefined rules detecting potential money laundering activity within AMEX transaction data.
  • Implemented a new feature in the data processing pipeline for 7 new data feeds to generate reports on high-volume data in XML format, facilitating easier investigation and in-depth analysis in downstream systems.
  • Provided data modeling, data solutions, and new capabilities for business requirements from Product Managers; performed root cause analysis (RCA) and maintenance for pipeline issues, automated manually intensive processes, and collaborated cross-functionally with business, product, tech, and data science teams.
  • Spark Optimization & Automation
  • Applied Spark optimization techniques and converted shell scripts to run on PySpark, reducing the application's overall execution time by 40% and the count of failures by 80%.
  • Created utility programs aimed at enhancing self-service support for customers by automating the reprocessing of failed pipeline executions on an hourly basis. This initiative led to a 40% decrease in system downtime.
  • Integrated the Sisense Business Intelligence tool into the application framework to build an automated job tracking system, enabling real-time monitoring of job statuses, data analytics, and performance metrics, and reducing manual effort for status verification by 75%.
  • Near Real Time Data Pipeline - POC
  • Completed proof of concept for near real-time presentation of transaction data to the financial investigations team.
  • The solution used HBase as the backend database and ETL data pipelines that load data into HBase tables, allowing investigators to query and analyze transaction records with minimal latency.
PySpark · Apache Hive · ETL · Data Engineering · Data Processing

Education

Indian Institute of Technology, Guwahati

Bachelor's degree — Computer Science

Jul 2018 – May 2022
