Shashank Mishra 🇮🇳

Data Engineer

Bengaluru, Karnataka, India · 8 yrs 10 mos experience
Most Likely To Switch

Key Highlights

  • Expert in building scalable data pipelines.
  • Co-founder of GrowDataSkills for data education.
  • Active contributor to data engineering community on YouTube.
Stackforce AI infers this person is a Data Engineer specializing in FinTech and Big Data solutions.

Skills

Core Skills

Real-time Data Processing, Data Engineering, Data Migration, AWS Solutions, Data Ingestion, Pipeline Optimization, Automation, Feature Engineering, Big Data Solutions, Web Development, Data Query Optimization, Data Analytics, Data Aggregation

Other Skills

AWS, AWS CLI, Airflow, Algorithms, Amazon Web Services (AWS), Android Development, Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, Apache Sqoop, AppFlow, Azkaban, Azure Cloud, Big Data

About

Experienced Data Engineer with a demonstrated history of solving complex data problems across various domains like Aviation, Pharmaceutical, FinTech, Telecom, and Employee Services. I've designed and built scalable data pipelines to handle vast amounts of data, both in batch and real-time environments. With a passion for taking ownership, I thrive on collaborating with business teams and stakeholders to drive impactful solutions. Beyond my professional journey, I am dedicated to empowering the next generation of data professionals through GrowDataSkills, a platform I co-founded to provide quality, hands-on learning at the most affordable rates. Since 2020, I've been actively contributing to the data engineering community by creating insightful content on my YouTube channel, E-Learning Bridge, where I share my experiences and knowledge through podcasts and practical lessons. LinkedIn has been instrumental in my growth, and I continue to use it as a platform to share my daily thoughts and ideas ❤️

Experience

Total Experience: 8 yrs 10 mos
Average Tenure: 1 yr 9 mos
Current Experience: 2 yrs 3 mos

Prophecy

Data Engineer

Mar 2024 – Present · 2 yrs 3 mos · Bengaluru, Karnataka, India · Hybrid

Expedia Group

Data Engineer - III

Nov 2021 – Feb 2024 · 2 yrs 3 mos · Gurugram, Haryana, India

  • ➡️ One checkout Platform – Real time streaming application for Financial data
  • ✅ Tech Stack – Flink, Python, Kafka, Oracle, Kafka Connect, Docker, AWS, Airflow
  • Building one checkout platform for different business domains of Expedia, i.e., Vrbo, Hotels, Car Rentals, Lodging
  • Real-time streaming platform captures and processes financial data for Accounting and helps suppliers track the total amount to be paid via the Auto Pay or Request Pay model
  • Crafted a scalable streaming solution using Apache Flink & Remote functions to handle ~800k booking streams each day
Flink, Python, Kafka, Oracle, Kafka Connect, Docker, +4

Amazon

Data Engineer

Mar 2020 – Nov 2021 · 1 yr 8 mos · Bengaluru, Karnataka, India

  • ➡️ Salesforce to Redshift Ingestion - Migration from Informatica to Native AWS
  • ✅ Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
  • Crafted a generic, scalable native AWS solution for Salesforce to Redshift ingestion
  • It helped move ingestion pipelines off the third-party tool Informatica and saved heavy license fees
  • This generic framework helped other business units smoothly ingest newly onboarded Salesforce objects into the Redshift Datalake
  • ➡️ Incremental Ingestion pipeline – Employee Benefits Data
  • ✅ Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
  • Built a generic & optimized ingestion pipeline for highly critical & confidential Employee Benefits Data
  • Pipeline is designed to handle GBs of daily & weekly data together for different use cases like Audit, Payroll, Reimbursement, Education Reimbursement, etc.
  • Took complete ownership and worked closely with business teams to understand the requirements & deliver enriching dashboards
  • ➡️ Pipeline Optimization & Enhancement
  • ✅ Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark, Lambda
  • Enhanced & optimized multiple pipelines built for different business units like Peoplesoft, Audit, AEM, Immigration, Accurate, MyDocs, Background Verification, etc.
  • Reduced execution time by 50% and improved the alerting system for different edge cases
  • Leadership principles like Customer Obsession, Earn Trust and Think Big helped me keep improving existing systems
  • ➡️ Automated Alerting System for Job Monitoring
  • ✅ Tech Stack – Python, AWS CLI, QuickSight
  • Created an automated alerting system for Redshift load metrics and job monitoring
  • It saved each team member 1.5 hours/day of manual effort spent monitoring jobs and preparing the Daily Job Status
Salesforce, Informatica, S3, Lambda, Glue, AppFlow, +8

QuantumBlack, AI by McKinsey

Data Engineer

Dec 2019 – Mar 2020 · 3 mos · Gurgaon, Haryana, India

  • ➡️ Feature Engineering For Telecom Client
  • ✅ Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
  • Created large-scale & optimized pipelines for Telecom data using PySpark & the Kedro framework
  • Worked closely with the client to gather business requirements
  • Implemented business logic to prepare clean & aggregated data for Customer Churn Analysis
PySpark, Kedro, Azure Cloud, Databricks, Data Engineering, Feature Engineering

Paytm

Software Engineer (BigData & DWH)

Jan 2019 – Dec 2019 · 11 mos · Noida Area, India

  • ➡️ Data Ingestion & Sync Process
  • ✅ Tech Stack – Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins
  • Crafted data-sync logic by prioritizing datasets (High/Medium/Low tag) based upon criticality to meet SLO
  • Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
  • Designed a REST API in data ingestion for retention of GA data in order to optimize cluster space
  • Added exception handling scenarios in data sync logic to fix multiple bugs
  • Fix for missing PG data from Kafka for UMP panel - Created a new pipeline to ingest missing data from HDFS to ElasticSearch in case of cluster failure
  • ➡️ Near Real Time Data Pipeline – POC
  • ✅ Tech Stack – Java, Spark, Kafka, Datastax Cassandra, Datastax studio, Zookeeper, Maven
  • Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on production MySQL. The objective was to move business users off production to address data leaks & security issues
  • Interacted with different business users to understand their use cases, ingestion tables, and PII data, and built data models accordingly for faster inserts and updates
  • Set up the Datastax Studio web interface for users to query real-time data from Cassandra using LDAP authentication
  • ➡️ Hive Query Parser
  • ✅ Tech Stack – Django, Python, NGINX
  • Created a Django web application to parse and validate users' Hive queries.
  • In case of a bad query (missing partition columns/unbalanced joins), it also provides suggestions to improve the query
  • PII detector – Built a Django web application to detect all running Hive queries that fetch PII data.
Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, +6

Opera Solutions

3 roles

Software Engineer - II (BigData & Analytics)

Dec 2018 – Jan 2019 · 1 mo

  • ➡️ Procurement Spend Optimization (Pharmaceutical)
  • Developed a CXO-level insights engine to manage USD 60Bn; the engine enabled cost optimization using smart categorisation, benchmarking and anomaly detection
  • Crafted a Big Data-based solution; organised structured & unstructured data
  • Built the solution using Hadoop Ecosystem (HDFS, YARN), Spark and Python
  • Built a Google translator API-based solution to automate the legacy translation engine; improved record aggregation accuracy by 50% and saved the team 120 hours/month
Hadoop, Spark, Python, Big Data Solutions, Data Analytics

Software Engineer - I (BigData & Analytics)

Jul 2017 – Nov 2018 · 1 yr 4 mos

  • ➡️ Trip Narrative Platform (Aviation)
  • Deployed an end-to-end solution for a leading US airline; aggregated a 360 view of the customer's engagement throughout the life cycle of the trip
  • Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
  • The solution powers a web application used by 1000+ CSRs and decision makers
  • Built the application on RESTful APIs using Hadoop Ecosystem (HDFS, YARN), DataRush Applications (Distributed Processing Engine), SQL and Python
Hadoop, DataRush Applications, SQL, Python, Data Engineering, Data Aggregation

Software Intern

Jan 2017 – Jun 2017 · 5 mos

Education

Motilal Nehru National Institute Of Technology

Master of Computer Applications (M.C.A.) — Computer Science and Engineering

Jan 2014 – Jan 2017

University of Lucknow

B.Sc. in Computer Science

Jan 2011 – Jan 2014
