Shashank Mishra 🇮🇳

Data Engineer

Bengaluru, Karnataka, India · 8 yrs 10 mos experience

Most Likely To Switch

Key Highlights

Expert in building scalable data pipelines.
Co-founder of GrowDataSkills for data education.
Active contributor to the data engineering community on YouTube.

Stackforce AI infers this person is a Data Engineer specializing in FinTech and Big Data solutions.

Contact

shashank@prophecy.io · LinkedIn

Skills

Core Skills

Real-time Data Processing, Data Engineering, Data Migration, AWS Solutions, Data Ingestion, Pipeline Optimization, Automation, Feature Engineering, Big Data Solutions, Web Development, Data Query Optimization, Data Analytics, Data Aggregation

Other Skills

AWS, AWS CLI, Airflow, Algorithms, Amazon Web Services (AWS), Android Development, Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, Apache Sqoop, AppFlow, Azkaban, Azure Cloud, Big Data

About

Experienced Data Engineer with a demonstrated history of solving complex data problems across various domains like Aviation, Pharmaceutical, FinTech, Telecom, and Employee Services. I've designed and built scalable data pipelines to handle vast amounts of data, both in batch and real-time environments. With a passion for taking ownership, I thrive on collaborating with business teams and stakeholders to drive impactful solutions. Beyond my professional journey, I am dedicated to empowering the next generation of data professionals through GrowDataSkills, a platform I co-founded to provide quality, hands-on learning at the most affordable rates. Since 2020, I've been actively contributing to the data engineering community by creating insightful content on my YouTube channel, E-Learning Bridge, where I share my experiences and knowledge through podcasts and practical lessons. LinkedIn has been instrumental in my growth, and I continue to use it as a platform to share my daily thoughts and ideas ❤️

Experience

Total Experience: 8 yrs 10 mos

Average Tenure: 1 yr 9 mos

Current Experience: 2 yrs 3 mos

Prophecy

Data Engineer

Mar 2024 – Present · 2 yrs 3 mos · Bengaluru, Karnataka, India · Hybrid

Expedia Group

Data Engineer - III

Nov 2021 – Feb 2024 · 2 yrs 3 mos · Gurugram, Haryana, India

➡️ One Checkout Platform – Real-time streaming application for financial data
✅ Tech Stack – Flink, Python, Kafka, Oracle, Kafka Connect, Docker, AWS, Airflow
Built the One Checkout Platform for Expedia's business domains, i.e., Vrbo, Hotels, Car Rentals, and Lodging
The real-time streaming platform captures & processes financial data for Accounting and helps suppliers track the total amount to be paid via the Auto Pay or Request Pay model
Crafted a scalable streaming solution using Apache Flink & Remote functions to handle ~800k booking streams each day

Flink, Python, Kafka, Oracle, Kafka Connect, Docker, +4 more

Amazon

Data Engineer

Mar 2020 – Nov 2021 · 1 yr 8 mos · Bengaluru, Karnataka, India

➡️ Salesforce to Redshift Ingestion – Migration from Informatica to Native AWS
✅ Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
Crafted a generic, scalable native AWS solution for Salesforce-to-Redshift ingestion
The solution moved ingestion pipelines off the third-party tool Informatica, saving heavy license fees
The generic framework let other business units smoothly ingest newly onboarded Salesforce objects into the Redshift data lake
➡️ Incremental Ingestion Pipeline – Employee Benefits Data
✅ Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
Built a generic & optimized ingestion pipeline for highly critical & confidential Employee Benefits data
Designed the pipeline to handle GBs of daily & weekly data together for use cases such as Audit, Payroll, Reimbursement, and Education Reimbursement
Took complete ownership and worked closely with business teams to understand requirements & deliver insightful dashboards
➡️ Pipeline Optimization & Enhancement
✅ Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark, Lambda
Enhanced & optimized multiple pipelines built for business units such as PeopleSoft, Audit, AEM, Immigration, Accurate, MyDocs, and Background Verification
Reduced execution time by 50% and improved the alerting system to cover various edge cases
Amazon's Leadership Principles, such as Customer Obsession, Earn Trust, and Think Big, helped me keep improving existing systems
➡️ Automated Alerting System for Job Monitoring
✅ Tech Stack – Python, AWS CLI, QuickSight
Created an automated alerting system for Redshift load metrics and job monitoring
It saved each team member 1.5 hours/day of manual effort previously spent monitoring jobs & preparing the daily job status

Salesforce, Informatica, S3, Lambda, Glue, AppFlow, +8 more

QuantumBlack, AI by McKinsey

Data Engineer

Dec 2019 – Mar 2020 · 3 mos · Gurgaon, Haryana, India

➡️ Feature Engineering For Telecom Client
✅ Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
Created large-scale, optimized pipelines for telecom data using PySpark & the Kedro framework
Worked closely with the client to gather business requirements
Implemented business logic to prepare clean & aggregated data for customer churn analysis

PySpark, Kedro, Azure Cloud, Databricks, Data Engineering, Feature Engineering

Paytm

Software Engineer (BigData & DWH)

Jan 2019 – Dec 2019 · 11 mos · Noida Area, India

➡️ Data Ingestion & Sync Process
✅ Tech Stack – Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins
Crafted data-sync logic that prioritizes datasets (High/Medium/Low tags) by criticality to meet SLOs
Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
Designed a REST API in the data ingestion layer to manage retention of GA data and optimize cluster space
Added exception handling to the data-sync logic, fixing multiple bugs
Fixed missing PG data from Kafka for the UMP panel: created a new pipeline to ingest the missing data from HDFS into ElasticSearch in case of cluster failure
➡️ Near-Real-Time Data Pipeline – POC
✅ Tech Stack – Java, Spark, Kafka, DataStax Cassandra, DataStax Studio, ZooKeeper, Maven
Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on production MySQL. The objective was to move business users off production to overcome data leak & security issues
Interacted with business users to learn their use cases, ingestion tables, and PII data, and built data models accordingly for faster inserts/updates
Set up the DataStax Studio web interface so users could query real-time data from Cassandra using LDAP authentication
➡️ Hive Query Parser
✅ Tech Stack – Django, Python, NGINX
Created a Django web application to parse and validate users' Hive queries
For a bad query (missing partition columns, unbalanced joins), it also suggests improvements
PII Detector – Built a Django web application to detect all running Hive queries that fetch PII data

Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, +6 more

Opera Solutions

3 roles

Software Engineer II (BigData & Analytics)

Dec 2018 – Jan 2019 · 1 mo

➡️ Procurement Spend Optimization (Pharmaceutical)
Developed a CXO-level insights engine to manage USD 60Bn in procurement spend; the engine enabled cost optimization using smart categorisation, benchmarking, and anomaly detection
Crafted a Big Data-based solution to organise structured & unstructured data
Built the solution using the Hadoop ecosystem (HDFS, YARN), Spark, and Python
Built a Google Translate API-based solution to automate the legacy translation engine; improved record aggregation accuracy by 50% and saved the team 120 hours/month

Hadoop, Spark, Python, Big Data Solutions, Data Analytics

Software Engineer I (BigData & Analytics)

Jul 2017 – Nov 2018 · 1 yr 4 mos

➡️ Trip Narrative Platform (Aviation)
Deployed an end-to-end solution for a leading US airline, aggregating a 360° view of each customer's engagement throughout the trip life cycle
Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
The solution powers a web application used by 1000+ CSRs and decision makers
Built the application on RESTful APIs using the Hadoop ecosystem (HDFS, YARN), DataRush Applications (distributed processing engine), SQL, and Python

Hadoop, DataRush Applications, SQL, Python, Data Engineering, Data Aggregation