Ashish sh

Data Engineer

Bengaluru, Karnataka, India
13 yrs experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Led Hadoop to GCP migration saving $8M+ annually.
  • Built real-time fraud detection pipelines reducing latency to <8 seconds.
  • Established best practices for data quality and CI/CD.
Stackforce AI infers this person is a Data Engineering expert in the Fintech and Consulting sectors.

Skills

Core Skills

Data Engineering · Cloud Computing · Real-time Processing · Cloud Migration · Data Analytics · ETL

Other Skills

Airflow · Apache Beam · Apache Impala · Apache Kafka · Apache NiFi · Apache Spark · Apache Spark Streaming · BigQuery · Bigtable · Blockchain · Blockchain Analysis · Blockchain Architecture · Cassandra · Core Java

About

Staff Data Engineer @ PayPal (MTS1) | Ex-EY Global Delivery

I own petabyte-scale batch and real-time data platforms that power fraud detection, risk analytics, and reporting for 400M+ global users.

Key impact at PayPal:

  • Led Hadoop → GCP migration → $8M+ annual compute savings, 99.99% uptime
  • Built real-time fraud pipelines (Kafka → Bigtable → BigQuery) → latency cut from minutes to <8 seconds
  • Established SLIs/SLOs across 50+ pipelines → SLA adherence improved from 74% to 99.2%
  • Additional $3.5M/year savings via resource right-sizing & optimization
  • Tech lead for 10 engineers; built an org-wide Spark/GCP Center of Excellence

Previously delivered $2–10M data platform engagements as Senior Consultant (Manager-equivalent) at EY and led a 5+ PB Teradata-to-GCP migration at Capgemini.

Core stack: Spark (Scala | PySpark) • Kafka • BigQuery • Dataflow • Airflow • Bigtable • Pub/Sub • Terraform • Kubernetes

Certifications: Google Cloud Professional Data Engineer | Cloud Architect | ML Engineer | Databricks Spark | Terraform Associate

Experience

PayPal

MTS1

Sep 2022 – Present · 3 yrs 6 mos · Bengaluru, Karnataka, India · On-site

  • Staff Data Engineer (MTS 1) — PayPal
  • Key Responsibilities & Impact:
  • Owned and led the design & delivery of petabyte-scale batch & near-real-time data platforms processing 400 M+ daily transactions using Spark (Scala/Python), Airflow, and BigQuery.
  • Architected reusable data framework adopted by 15+ teams, reducing pipeline development time by 50 % and eliminating duplicate work across org.
  • Drove performance initiatives that cut average ETL runtime 35 % (saving ~$1.2 M annually in compute) and reduced data-latency SLAs from hours to < 30 min.
  • Served as tech lead for cross-functional initiatives with Risk, Fraud, and Finance — delivering critical data sets that directly influenced $X00 M fraud-detection models.
  • Mentored junior engineers and established best practices for schema evolution, data quality, and CI/CD in the Hadoop → GCP migration.
  • Primary on-call owner for mission-critical pipelines with 99.99 % uptime.
Spark (Scala/Python) · Airflow · BigQuery · GCP · Hadoop · Data Engineering

EY

Big Data Consultant

Jun 2021 – Aug 2022 · 1 yr 2 mos · Remote

  • Led end-to-end delivery of petabyte-scale data platforms for Fortune-500 clients in banking, retail, and telecom on GCP and on-prem Hadoop clusters.
  • Architected and implemented high-throughput Spark (Scala/Python) pipelines processing 100 TB+ daily, achieving 99.99 % SLA and reducing client processing windows from days to hours.
  • Designed reusable data ingestion & transformation frameworks adopted by 8+ concurrent client engagements, cutting average project delivery time by 40 %.
  • Owned technical solution design and client stakeholder relationships on $2–10 M deals — from requirements to production handover.
  • Introduced modern data stack practices (Airflow orchestration, BigQuery + dbt, schema registry, CI/CD for data) that became EY’s internal gold standard for 2020–2022 deliveries.
  • Mentored 20+ mid-level and junior consultants and conducted internal Spark/GCP training bootcamps.
Spark (Scala/Python) · GCP · Airflow · Data Engineering · Cloud Computing

Capgemini

Lead Data Engineer – Teradata-to-GCP Migration

Jul 2019 – Jun 2021 · 1 yr 11 mos · On-site

  • Owned end-to-end design & delivery of a multi-petabyte Teradata-to-GCP migration for a global banking client, migrating 8 PB+ of risk & finance data with zero business downtime.
  • Architected and implemented a greenfield modern data platform on GCP using BigQuery, Dataflow, Dataproc, Cloud Composer (Airflow), and Apache NiFi, replacing a 15-year-old legacy EDW.
  • Built 200+ high-throughput ingestion & transformation pipelines processing 50 TB+ daily, achieving 99.99 % SLA.
  • Led a cross-vendor team of 25 engineers (Capgemini + client + Google PS); established CI/CD, schema registry, and data-quality frameworks that became the client’s global template.
  • Drove performance tuning initiatives that accelerated query latency from hours to < 10 seconds on critical regulatory reports.
  • Primary liaison between business stakeholders, Teradata COE, and Google Cloud.
  • Tech Stack: GCP (BigQuery • Dataflow • Dataproc • Composer • Pub/Sub), Apache Gobblin, Flume, Hive, Hadoop, Teradata, Python, Scala, Terraform
GCP · BigQuery · Dataflow · Apache NiFi · Data Engineering · Cloud Migration

Zebra Technologies

Data Engineer

Apr 2018 – Jul 2019 · 1 yr 3 mos · India

  • Designed and delivered high-volume Spark (Scala + PySpark) pipelines processing 5–10 TB of daily transaction data for fraud analytics and regulatory reporting.
  • Built end-to-end data flows from raw ingestion (Cassandra → Hadoop) → transformation (Spark Core + Spark SQL) → publishing to Hive/Elasticsearch for downstream BI and data-science teams.
  • Owned critical production pipelines with daily runs serving 200+ analysts and ML models, achieving 99.98 % reliability and sub-6-hour end-to-end latency.
  • Implemented performance optimizations (partition tuning, broadcast joins, caching) that reduced average job runtime by 55 %.
  • Introduced code modularization and unit-testing standards that became the team template for all new Spark workloads.
  • Technologies: Apache Spark (Scala | PySpark), Hadoop, Hive, HDFS, Cassandra, Elasticsearch, Apache Hue, SQL, Python
Spark (Scala/PySpark) · Cassandra · Hadoop · Elasticsearch · Data Engineering · Data Analytics

Accenture

Data Engineer (Associate → Analyst)

Feb 2013 – Apr 2018 · 5 yrs 2 mos · India · On-site

  • Started career as SQL Server developer maintaining enterprise data warehouses and complex ETL workflows for Fortune-100 clients in banking and insurance.
  • Self-taught and pioneered the team’s adoption of big-data technologies (Hadoop, Hive, HDFS) during the 2016–2018 wave — built the first Hive-based analytical datasets that replaced slow SQL Server reporting jobs.
  • Designed and delivered 20+ high-impact SQL → Hive migrations, cutting average monthly report generation time from 18 hours to < 2 hours.
  • Owned end-to-end delivery of critical regulatory and risk reports serving C-level stakeholders, consistently meeting 100 % of SLA deadlines.
  • Mentored new hires on SQL performance tuning and became the go-to person for complex joins and query optimization.
  • Technologies: Microsoft SQL Server, T-SQL, SSIS, Hadoop, Hive, HDFS, Tableau.
SQL Server · Hadoop · Hive · Data Engineering · ETL

Education

ABES Engineering College

Bachelor of Technology (BTech) — Computer Science

Jan 2008 – Jan 2012
