Soumitra Shukla

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 9 mos experience

Most Likely To Switch · Highly Stable

Key Highlights

Expert in building scalable data pipelines.
Proficient in Python and Django for backend development.
Strong experience in data warehousing and ETL processes.

Stackforce AI infers this person is a Fintech Data Engineer with expertise in building scalable data solutions.

Skills

Core Skills

Python, Django, Data Warehousing, ETL, AWS Glue, Data Engineering

Other Skills

AWS S3, Adobe Photoshop, Airflow, Amazon Elastic MapReduce (EMR), Amazon S3, Apache Airflow, Apache Iceberg, Creative Entrepreneurship, Database tuning, Django REST Framework, HTML, Integration, Kafka, Keras, Leadership

About

I am a competent developer, well versed in algorithms and data structures (Google Foobar level 3), and confident enough to show my skills in the interview process.
Backend tech stack: Django, Python
Distributed tech stack: Kafka, PySpark, Apache Iceberg, Apache Airflow
Languages: Java and Python

Experience

Total Experience: 4 yrs 9 mos

Average Tenure: 4 yrs 9 mos

Current Experience: 4 yrs 9 mos

Bright Money

3 roles

SDE - II / Data Engineer

Promoted

Oct 2022 – Present · 3 yrs 7 mos

Data Warehousing solution: [Debezium, Kafka Connect, PySpark, Django]
Designing and implementing Kafka Connect Transforms for data enrichment
Designing and implementing data pipelines and data cubes
ETL pipelines on Iceberg tables for better indexing, compaction and queries
Data validation and missing-data computation for CDC replication flows of tables
Implemented a business-specific wrapper on the Databricks Iceberg Sink
Designing and implementing a microservice for credit report data aggregation in real time and in batch
Integration of TransUnion credit report products, with Metro 2 handling
Refresh pipelines using Airflow jobs to keep data for 2.5M users recent for downstream consumers
Extensive analysis of data flow and product usage to guide pipeline upgrades and cost optimisations
Data Backfill and Enrichment (ETL jobs):
Account and user data enrichment and backfill for 2M accounts from a monolith to 9 microservices
[using pandas, Airflow, AWS S3]
Transaction data for 502M transactions
[using PySpark, AWS EMR, AWS Glue Jobs, AWS S3, Airflow]

Django, Python, Kafka, PySpark, Apache Iceberg, Apache Airflow, +2

SDE

Aug 2021 – Oct 2022 · 1 yr 2 mos

Implementation and end-to-end ownership of the account aggregation service, built as 16 microservices, to support fetching financial data in real time and in batch.
Building scalable refresh pipelines using Airflow jobs to keep 4M accounts up to date within a short period.
Developing AWS Glue Jobs (PySpark) to provide data to various other services via S3.
Integration with aggregators: Plaid, Fiserv, Capital One (direct integration), Finicity, Teller
Database and query-plan tuning for large tables with >2B records
Designing and implementing a service for real-time identity (KYC) data validation.
Integration with products such as LexisNexis