Ashish sh

Data Engineer

Bengaluru, Karnataka, India
13 yrs experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Led Hadoop to GCP migration saving $8M+ annually.
  • Built real-time fraud detection pipelines reducing latency to <8 seconds.
  • Established best practices for data quality and CI/CD.
Stackforce AI infers this person is a Data Engineering expert in the Fintech and Consulting sectors.

Skills

Core Skills

Data Engineering · Cloud Computing · Real-time Processing · Cloud Migration · Data Analytics · ETL

Other Skills

Airflow · Apache Beam · Apache Impala · Apache Kafka · Apache NiFi · Apache Spark · Apache Spark Streaming · BigQuery · Bigtable · Blockchain · Blockchain Analysis · Blockchain Architecture · Cassandra · Core Java

About

Staff Data Engineer @ PayPal (MTS1) | Ex-EY Global Delivery

I own petabyte-scale batch and real-time data platforms that power fraud detection, risk analytics, and reporting for 400M+ global users.

Key impact at PayPal:

  • Led Hadoop → GCP migration → $8M+ annual compute savings, 99.99% uptime
  • Built real-time fraud pipelines (Kafka → Bigtable → BigQuery) → latency cut from minutes to <8 seconds
  • Established SLIs/SLOs across 50+ pipelines → SLA adherence improved from 74% to 99.2%
  • Additional $3.5M/year savings via resource right-sizing & optimization
  • Tech lead for 10 engineers; built an org-wide Spark/GCP Center of Excellence

Previously delivered $2–10M data platform engagements as Senior Consultant (Manager-equivalent) at EY and led a 5+ PB Teradata-to-GCP migration at Capgemini.

Core stack: Spark (Scala | PySpark) • Kafka • BigQuery • Dataflow • Airflow • Bigtable • Pub/Sub • Terraform • Kubernetes

Certifications: Google Cloud Professional Data Engineer | Cloud Architect | ML Engineer | Databricks Spark | Terraform Associate

Experience

PayPal

MTS1

Sep 2022 – Present · 3 yrs 6 mos · Bengaluru, Karnataka, India · On-site

  • Staff Data Engineer (MTS 1) — PayPal
  • Key Responsibilities & Impact:
  • Owned and led the design & delivery of petabyte-scale batch & near-real-time data platforms processing 400 M+ daily transactions using Spark (Scala/Python), Airflow, and BigQuery.
  • Architected reusable data framework adopted by 15+ teams, reducing pipeline development time by 50 % and eliminating duplicate work across org.
  • Drove performance initiatives that cut average ETL runtime 35 % (saving ~$1.2 M annually in compute) and reduced data-latency SLAs from hours to < 30 min.
  • Served as tech lead for cross-functional initiatives with Risk, Fraud, and Finance — delivering critical data sets that directly influenced $X00 M fraud-detection models.
  • Mentored junior engineers and established best practices for schema evolution, data quality, and CI/CD in the Hadoop → GCP migration.
  • Primary on-call owner for mission-critical pipelines with 99.99 % uptime.
Spark (Scala/Python) · Airflow · BigQuery · GCP · Hadoop · Data Engineering

EY

Big Data Consultant

Jun 2021 – Aug 2022 · 1 yr 2 mos · Remote

  • Led end-to-end delivery of petabyte-scale data platforms for Fortune-500 clients in banking, retail, and telecom on GCP and on-prem Hadoop clusters.
  • Architected and implemented high-throughput Spark (Scala/Python) pipelines processing 100 TB+ daily, achieving 99.99 % SLA and reducing client processing windows from days to hours.
  • Designed reusable data ingestion & transformation frameworks adopted by 8+ concurrent client engagements, cutting average project delivery time by 40 %.
  • Owned technical solution design and client stakeholder relationships on $2–10 M deals — from requirements to production handover.
  • Introduced modern data stack practices (Airflow orchestration, BigQuery + dbt, schema registry, CI/CD for data) that became EY’s internal gold standard for 2020–2022 deliveries.
  • Mentored 20+ mid-level and junior consultants and conducted internal Spark/GCP training bootcamps.
Spark (Scala/Python) · GCP · Airflow · Data Engineering · Cloud Computing

Capgemini

Lead Data Engineer – Teradata-to-GCP Migration

Jul 2019 – Jun 2021 · 1 yr 11 mos · On-site

  • Owned end-to-end design & delivery of a multi-petabyte Teradata-to-GCP migration for a global banking client, migrating 8 PB+ of risk & finance data with zero business downtime.
  • Architected and implemented a greenfield modern data platform on GCP using BigQuery, Dataflow, Dataproc, Cloud Composer (Airflow), and Apache NiFi, replacing a 15-year-old legacy EDW.
  • Built 200+ high-throughput ingestion & transformation pipelines processing 50 TB+ daily, achieving 99.99 % SLA.
  • Led a cross-vendor team of 25 engineers (Capgemini + client + Google PS); established CI/CD, schema registry, and data-quality frameworks that became the client’s global template.
  • Drove performance tuning initiatives that accelerated query latency from hours to < 10 seconds on critical regulatory reports.
  • Primary liaison between business stakeholders, Teradata COE, and Google Cloud.
  • Tech Stack: GCP (BigQuery • Dataflow • Dataproc • Composer • Pub/Sub), Apache Gobblin, Flume, Hive, Hadoop, Teradata, Python, Scala, Terraform
GCP · BigQuery · Dataflow · Apache NiFi · Data Engineering · Cloud Migration

Zebra Technologies

Data Engineer

Apr 2018 – Jul 2019 · 1 yr 3 mos · India

  • Designed and delivered high-volume Spark (Scala + PySpark) pipelines processing 5–10 TB of daily transaction data for fraud analytics and regulatory reporting.
  • Built end-to-end data flows from raw ingestion (Cassandra → Hadoop) → transformation (Spark Core + Spark SQL) → publishing to Hive/Elasticsearch for downstream BI and data-science teams.
  • Owned critical production pipelines with daily runs serving 200+ analysts and ML models, achieving 99.98 % reliability and sub-6-hour end-to-end latency.
  • Implemented performance optimizations (partition tuning, broadcast joins, caching) that reduced average job runtime by 55 %.
  • Introduced code modularization and unit-testing standards that became the team template for all new Spark workloads.
  • Technologies: Apache Spark (Scala | PySpark), Hadoop, Hive, HDFS, Cassandra, Elasticsearch, Apache Hue, SQL, Python
Spark (Scala/PySpark) · Cassandra · Hadoop · Elasticsearch · Data Engineering · Data Analytics

Accenture

Data Engineer (Associate → Analyst)

Feb 2013 – Apr 2018 · 5 yrs 2 mos · India · On-site

  • Started career as SQL Server developer maintaining enterprise data warehouses and complex ETL workflows for Fortune-100 clients in banking and insurance.
  • Self-taught and pioneered the team’s adoption of big-data technologies (Hadoop, Hive, HDFS) during the 2016–2018 wave — built the first Hive-based analytical datasets that replaced slow SQL Server reporting jobs.
  • Designed and delivered 20+ high-impact SQL → Hive migrations, cutting average monthly report generation time from 18 hours to < 2 hours.
  • Owned end-to-end delivery of critical regulatory and risk reports serving C-level stakeholders, consistently meeting 100 % of SLA deadlines.
  • Mentored new hires on SQL performance tuning and became the go-to person for complex joins and query optimization.
  • Technologies: Microsoft SQL Server, T-SQL, SSIS, Hadoop, Hive, HDFS, Tableau.
SQL Server · Hadoop · Hive · Data Engineering · ETL

Education

ABES Engineering College

Bachelor of Technology (BTech) — Computer Science

Jan 2008 – Jan 2012
