Manali Verma

Data Engineer

Bengaluru, Karnataka, India · 12 yrs 3 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

7+ years of experience in data engineering.
Expert in building scalable data pipelines.
Passionate about data quality and automation.

Stackforce AI infers this person is a Data Engineering expert in E-commerce and Cloud Computing.

Skills

Core Skills

Data Engineering · Cloud Infrastructure · Data Analytics · Business Intelligence · DevOps

Other Skills

AWS · AWS Step Functions · Airflow · Amazon Elastic MapReduce (EMR) · Amazon Web Services (AWS) · Apache · Apache Flink · Apache Kafka · Apache Spark · Artificial Intelligence (AI) · Automation · Big Data · C · C++ · CI/CD

About

I’m a Senior Data Engineer with 7+ years of experience building large-scale, real-time data pipelines and analytics platforms at Amazon. I’ve handled pipelines scaling from 500M to 14B records/day using PySpark, AWS EMR, and Redshift. Passionate about improving decision-making through data quality, automation, and low-latency insights. Now seeking challenging opportunities to build scalable data systems and deliver actionable insights at high-growth, data-driven companies.

Experience

Total Experience: 12 yrs 3 mos

Average Tenure: 3 yrs

Current Experience: 7 yrs 1 mo

Self.

Interview Guider

May 2025 – Present · 1 yr 1 mo · Remote

Connect with me at https://topmate.io/manali_verma/

Amazon

6 roles

Senior Data Engineer

Promoted

Jun 2024 – Present · 2 yrs

Leading a team that owns the end-to-end infrastructure for customer behaviour and Amazon Retail Business services at petabyte scale (10 Billion), enabling product teams to analyze adoption, usage, and revenue trends and to assess the impact of new feature launches. Leveraged SQL and PySpark to create scalable data models and pipelines that efficiently process large datasets.
Built and maintain a data pipeline on AWS that handles 500 million records per day, rising to 14 billion (300 TB) during sales events for Amazon Retail & Business, in <10 min. Provided data to the Sciences team through AWS DDB and S3 to produce low-latency analytics and insights for failure detection, time-critical service reactions, root cause analysis and business decisions, and to shape product selling and growth strategy through QuickSight and Tableau dashboards.
Designed an API-based data quality detection and analysis capability that identifies data quality issues using Glue catalog scripts, with the aim of improving data quality and increasing the population and accuracy of key fields. Provided a generic pipeline to 100+ Amazon internal teams, with 10M+ data transformation jobs executed every day.
Designed and developed monitoring KPIs to help sales in Subscribe & Save make data-driven decisions.

Extract, Transform, Load (ETL) · Problem Solving · Continuous Integration and Continuous Delivery (CI/CD) · Data Engineering · Apache Spark · Cloud Infrastructure · +16

Senior Application Developer

Apr 2023 – Jul 2024 · 1 yr 3 mos

Developed advanced QuickSight dashboards to empower product managers, finance teams, and sales with actionable insights, driving data-informed decision-making and improving customer engagement strategies.
Redesigned the data quality architecture, implementing a streamlined alarms system that significantly enhanced report reliability and accuracy for leadership, and reduced review time.
Optimized SQL data processing, improving report delivery times by 2 hours and ensuring SLAs were consistently met, supporting critical business operations.
Contributed to the development and growth of the team through active mentoring, on-call support, and participating in recruitment efforts, fostering a collaborative and high-performance environment.

Problem Solving · Java · Amazon Web Services (AWS) · Big Data · Data Build Tool (DBT) · Computer Programming · +4

Application Developer

May 2022 – Mar 2023 · 10 mos

Managed a worldwide expansion across 5 marketplaces in the Europe region to develop key business performance and control insights through a diverse suite of advanced analytics and data visualization tools. Built KPIs and metrics to automate evidence gathering and compliance testing processes, reducing manual collection time by ~25%. Developed an advanced analytics tool to actively monitor and evaluate usage of systems and services for a developer team of 37 people, reducing their manual hours in on-call activities by 67% and notifying them beforehand of any drop in system CPU usage by 75%.

Apache Flink · Amazon Web Services (AWS) · Big Data · Apache Kafka · Google Cloud Platform (GCP) · Computer Programming · +5

Data Engineer

Promoted

Apr 2021 – May 2022 · 1 yr 1 mo

Collaborated with PE and SDM to conduct an in-depth data mining analysis of customer return patterns. Utilized association rule mining to uncover complex relationships between product attributes and return rates. Applied clustering algorithms to segment customers based on their return behaviors. Implemented decision tree models to identify key factors influencing returns. Conducted time series analysis to detect seasonal trends in return rates. Leveraged anomaly detection techniques to identify unusual return patterns. Analyzed customer purchase sequences using sequential pattern mining. These advanced data mining techniques revealed critical insights, including 80% returns in the last quarter with continued purchase-return cycles this quarter. The analysis also highlighted correlations between price differences, Prime Day sales, and repeat return-purchase behavior. Employed text mining on customer reviews to understand return reasons. Utilized dimensionality reduction techniques like PCA to identify the most impactful features driving returns. The comprehensive findings were incorporated into the PRFAQ by PM and PE, leading to engagement with the Andon team who successfully addressed the high return rates issue by implementing targeted interventions based on the data-driven insights.

Data Engineering · Amazon Web Services (AWS) · Big Data · Computer Programming · Python (Programming Language) · SQL · +1

Business Intelligence Engineer

Apr 2020 – Apr 2021 · 1 yr

Retail business teams calculate drop-offs for specific numbers of browse nodes and product types (PTs), mapping them to root causes and providing detailed insights illustrated with customer behaviors. The main goal of this experiment was to conduct a POC across all product types to select high-opportunity PTs. I implemented an N-gram based approach (with N ranging from 1 to 3) to capture sequential patterns in customer behavior. Utilizing PySpark's MLlib, I engineered features from customer interaction sequences, including search terms, filter applications, and click patterns. I gathered requirements for identifying custom metrics at customer search, click, and purchase levels, implementing them using PySpark's SQL and DataFrame APIs. The PySpark code was optimized to handle 500 MM rows across all PTs for one day, utilizing partitioning strategies and broadcast joins to improve performance. I leveraged PySpark's window functions to analyze time-based customer behaviors and RDD operations for complex data transformations. To scale the analysis, I implemented distributed computing techniques using PySpark's cluster computing capabilities. I further increased the dataset to identify patterns and relationships where customers had chosen refinements, using PySpark's ML Pipeline for feature extraction and transformation. The results, processed using PySpark's aggregation functions and UDFs (User Defined Functions), revealed a list of low-performance XX% cluster IDs per customer query. This analysis was integrated into Sherlock+'s data pipeline, specifically in the sampling part, using PySpark's streaming capabilities for real-time processing. The implementation included error handling and data quality checks using PySpark's exception handling and schema validation features.

Data Engineering · Tableau · Computer Programming · RESTful WebServices · Business Intelligence · Data Analytics

DevOps Engineer III

Apr 2019 – Apr 2020 · 1 yr

Analysed, reviewed and finalised the design for tool implementation and for build and deployment automation across 30+ AWS services
Designed DevOps infrastructure from scratch for product lines (Hardlines, Consumables and Softlines) in Amazon
Achieved 99.99% uptime for services on SLA
Frequent testing and incremental releases
Programming language skills: JavaScript, Bash, Python
Datastores: MySQL, PostgreSQL, Oracle
CI/CD deployment pipeline tools: Ansible, CloudFormation, Docker, Kubernetes
Cloud: AWS

Jenkins · Data Security · Extract, Transform, Load (ETL) · Continuous Integration and Continuous Delivery (CI/CD) · Amazon Web Services (AWS) · Data Pipelines · +4

Oracle

Senior Member Of Technical Staff

Nov 2017 – Mar 2019 · 1 yr 4 mos · Bengaluru, Karnataka, India

Worked on Oracle Exalogic Private Cloud Software.
Oracle Exalogic Elastic Cloud is an Engineered System, consisting of software, firmware and hardware, on which enterprises may deploy Oracle business applications, Oracle Fusion Middleware or software products provided by Oracle partners.
Installation, configuration and administration of Exalogic Cloud, and security system activity monitoring, reporting and analysis.
Designed and developed transactional and analytical data structures on Oracle ZFS and Xen platforms.
Defined strategic technology directions for releases: storage, backups and disaster recovery.
Created a Big Data framework using Scala and Spark; performed data sourcing, loading and quality checks.
Built and managed data infrastructure using automated procedures in Scala, Python and SQL.
Information security: worked on host-based security, IPSec, VPN, encryption and SSL certification.
Responsible for troubleshooting various network and system problems in the rack, and for reporting requirements.
Worked closely with Product and Engineering teams to architect and build custom data integration solutions to meet the utility requirements.
Worked on designing ETL as part of the data ingest pipeline for a new data spec that captures more granular data from utilities.
Tested and provided feedback on the new import pipeline developed by the software engineering team to import files that match the new Oracle data spec.
Developed a data validation tool (using Python pandas) to automate and improve the data acquisition process, saving 48 hours when handling 20 GB to 50 GB of data.
Customized some of the existing Oracle services to meet utility requirements. This involved making config changes and overriding Java functions to support custom data scenarios.

Data Loading · Database Consulting · Apache Spark · Cloud Infrastructure · Scala · Data Architects · +5

Tata Consultancy Services

2 roles

System Engineer

Dec 2016 – Sep 2017 · 9 mos

Project 1
Client: CVS Pharmacy
Role: Senior Data Analyst
Responsibilities:
Streamlined data from complex sources (Teradata and Netezza databases, SQL queries) using Sqoop.
Participated in the analysis, design and development phases of report development and generation, performance tuning and production rollout for every report of the Information Technology Department.
Created technical specification documents based on the functional design document for the ETL coding to build the data mart, OLTP, OLAP, ODS and AAS.
Wrote PL/SQL functions and stored procedures for front-end developers to manipulate the database.
Designed the migration process of Prod jobs from dev environments using UNIX.
Performed data integration, data cleaning, manipulation and data validation.

Assistant System Engineer

Aug 2015 – Nov 2016 · 1 yr 3 mos

Project 2
Client: CVS Retail
Role: Data Analyst
Responsibilities:
Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database and Oracle GoldenGate.
Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
Developed complex mappings in Informatica to load the data from various sources.
Implemented performance tuning logic on targets, sources, mappings and sessions to provide maximum efficiency and performance.
Created procedures in PL/SQL and wrote Unix shell scripts.
Provided knowledge of data normalization processes.
Environment: Informatica PowerCenter 9.6, Oracle 11g, Toad, HP Quality Center, Windows 7 and MS Office Suite, PL/SQL Developer, Oracle GoldenGate, NetBeans
Achievements:
1. Best Critical Project Holder Employee 2017

Goldman Sachs

WeTech Mentoring Program Member

Jan 2015 – Mar 2015 · 2 mos · New Delhi Area, India

Elastic Search, Microservice, MVC

Oil and Natural Gas Corporation Ltd

Intern

Jun 2014 – Aug 2014 · 2 mos · New Delhi Area, India

Automation of Database Recovery in Databases and SAP
This project developed an automation system for the switchover and switchback process of Oracle databases to ensure business continuity. The system allows a seamless, efficient and smooth transition to a remote site where data is backed up.
Built, tested and deployed scalable, highly available and modular software products.
Drove continual improvement to system architecture by refactoring old legacy code.
Regularly performed health checks of the database.
Wrote and implemented scripts to enhance user experience and integrated scripts with the SAP Management Tool.
Implemented backup/restore procedures in ARCHIVELOG mode using RMAN.
Environment: PuTTY, HP-UX, Oracle 9i, Oracle Data Guard, SAP Basis
Achievements:
1. Best Project in ONGC 2014