Shubhashis Sinha

Data Engineer

Bengaluru, Karnataka, India
10 yrs 5 mos experience

Key Highlights

  • Expert in Data Engineering and Data Science
  • Led innovative Generative AI projects
  • Proficient in real-time data processing with PySpark

Skills

Core Skills

Data Engineering · Generative AI · Big Data Engineering

Other Skills

AWS Bedrock · AWS SageMaker · Amazon Redshift · Amazon Web Services (AWS) · Analytics · Apache Hudi · Apache Kafka · Apache NiFi · Apache Spark · Apache Spark Streaming · Artificial Intelligence (AI) · Automation · Big Data · Continuous Integration and Continuous Delivery (CI/CD) · Data Analytics

About

I am a seasoned Senior Data Engineer at GE Healthcare, currently contributing to the Central Data Engineering team. In my current role, I have the unique opportunity to wear dual hats as both a Data Engineer and a Data Scientist, combining my skills to tackle diverse data challenges.

As a Data Engineer, I've been at the forefront of developing automation accelerators, creating frameworks, and pioneering innovative patterns to address intricate data engineering issues. I am also leading the Low-Code/No-Code (LCNC) initiative, driving innovation and efficiency in data engineering processes. My expertise extends to real-time data processing with PySpark and Kafka, managing unstructured data, and the end-to-end development of data pipelines.

As a Data Scientist, I am actively involved in building multiple Generative AI solutions for internal use cases, leveraging advanced capabilities of LLMs with platforms such as Bedrock and Co-Pilot Studio. Alongside this, I continue to utilize enterprise data for tasks such as forecasting, classification, anomaly detection, and predictive modeling. Collaborating closely with cross-functional teams, I focus on translating algorithms into practical, impactful solutions, further enriching my knowledge in the data analytics domain.

Experience

GE Healthcare

2 roles

Staff Data Engineer

May 2025 – Present · 10 mos · Bengaluru, Karnataka, India

Senior Data Engineer

Dec 2021 – Aug 2025 · 3 yrs 8 mos · Bengaluru, Karnataka, India

  • Summary
  • As a Senior Data Engineer at GE Healthcare's Central Data Engineering team, I bring a unique blend of expertise in data engineering and data science. My work involves contributing to Generative AI development within the team, designing and implementing transformative solutions for internal use cases. By combining data engineering excellence with advanced data analytics, I help deliver impactful insights and build automation frameworks that drive innovation across healthcare domains.
  • Responsibilities
  • Generative AI Development: Building cutting-edge solutions using LLMs with tools like Bedrock and Co-Pilot Studio to address diverse internal use cases.
  • LCNC Leadership: Spearheading the Low-Code/No-Code (LCNC) initiative to drive innovation and efficiency in data engineering processes.
  • Data Engineering Solutions: Designing accelerators, frameworks, and patterns to automate processes, reduce costs, and address complex data engineering challenges.
  • Real-Time Data Processing: Handling unstructured data and building real-time processing pipelines using PySpark and Kafka.
  • End-to-End Pipeline Development: Developing and optimizing pipelines to provide value-added services at every stage of data processing.
  • Data Analytics Expertise: Implementing forecasting, classification, prediction, and anomaly detection solutions while analyzing customer behavior and system upgrades.
  • Machine Learning Models: Developing, training, and deploying models using AWS Sagemaker and Python.
  • Cross-Functional Collaboration: Partnering with engineers to translate algorithms into practical, scalable products and services.
  • AWS Solutions: Architecting and implementing complex solutions leveraging AWS services like Glue, Step Functions, Lambda, and Redshift.
  • Tech Stack
  • Programming: Python, SQL, Polars
  • Data Processing: Kafka, PySpark
  • AWS Services: Glue, Step Functions, Lambda, Athena, Redshift, RDS, OpenSearch
  • Gen AI: SageMaker, Bedrock, Co-Pilot Studio
Predictive Analytics · Data Fabric · Extract, Transform, Load (ETL) · AWS SageMaker · Natural Language Processing (NLP) · MLOps · +21 more

Bank of America

Big Data Engineer

Jun 2021 – Dec 2021 · 6 mos

  • I was part of the Non-Financial Regulatory Reporting (NFRR) team, which is responsible for the consistent application of interpretation, data sourcing, preparation, governance, and oversight of regulatory reports that are not produced within the CFO function. The regulatory reporting group within the NFRR center is responsible for aligning all NFRR reports to the appropriate target state, which potentially includes technology alignment, governance, control framework, preparation, and submission. NFRR reports span all lines of business within the bank, including Markets, Consumer, Anti-Money Laundering, Compliance, and Human Resources.
  • Responsibilities:
  • 1. Building new reports and flows using PySpark
  • 2. Enhancing existing applications using PySpark
  • 3. Optimizing PySpark jobs to run on a YARN cluster for faster data processing
  • 4. Designing and developing ETL integration patterns using Python on Spark
  • 5. Translating business requirements into maintainable software components and understanding their impact (technical and business)
  • 6. Ensuring that quality standards are defined and met
  • 7. Implementing a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment
  • 8. Reviewing components developed by team members
Hadoop · Extract, Transform, Load (ETL) · TensorFlow · Apache Kafka · Data Analytics · Amazon Web Services (AWS) · +11 more

Capital One

Data Engineer

Oct 2018 – May 2021 · 2 yrs 7 mos

  • Was part of building the Spark data pipeline for Loan Servicing, Marketing Horizontals, and Sales Originations.
  • Developing Spark scripts (PySpark) as per client requirements
  • Carrying out real-time, in-depth data analytics using Spark Streaming
  • Importing data from multiple sources such as AWS S3 into Spark DataFrames
  • Hands-on experience working with Amazon Web Services such as S3, EMR, and EC2
  • Extracting features from datasets using Spark SQL
  • Using Avro and other data formats to store data in HDFS
  • Worked in an AWS environment for development and deployment of custom Spark applications
  • Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances
  • Coordinating with multiple teams spread across the country and the globe
  • Used Kafka to build a customer-activity tracking pipeline as a set of real-time publish-subscribe feeds
  • Worked on the implementation of Snowflake using different data models
  • Ecosystems Used - Apache Spark [Batch & Streaming], Snowflake, AWS, MongoDB
  • Language Used - Python
  • Architecture - Lambda implementation
  • Others - Git, Jenkins, Jira
Hadoop · Extract, Transform, Load (ETL) · Data Analytics · Amazon Web Services (AWS) · Hive · Apache Spark Streaming · +6 more

Deutsche Bank

Big Data Developer

Aug 2017 – Jul 2018 · 11 mos · Pune, Maharashtra, India

  • Worked on building and supporting applications related to trading.
  • Worked on processing and analyzing large volumes of data.
  • Ecosystems Used - Hive, Apache Spark, Impala, MongoDB, PySpark
  • Language Used - Java, Scala, Python
Java · Hadoop · Extract, Transform, Load (ETL) · Hive · Big Data · Apache Spark · +2 more

Commonwealth Bank

Data Engineer

Jun 2016 – Jul 2017 · 1 yr 1 mo · Bengaluru, Karnataka, India

  • Worked on the implementation of Big Data in Lending.
  • Worked on building different modules based on requirements
  • Ecosystems Used - Hive, Pig, Sqoop, Flume, MapReduce, Apache Spark
  • Language Used - Java, Scala
Java · Hadoop · Extract, Transform, Load (ETL) · Hive · Apache Spark · Data Structures · +1 more

HCL Technologies

2 roles

Junior Hadoop Developer

Apr 2015 – May 2016 · 1 yr 1 mo

  • Responsibility - Development and analysis of customer-related data along with clickstream data (i.e., network details) using Pig, Hive, HBase, Sqoop, etc.
Java · Hadoop

Associate Software Engineer

Feb 2015 – Mar 2015 · 1 mo

  • Initially received training in Big Data as a fresher.
Java · Hadoop

T-Mobile

IT Consultant for Big Data Development

Apr 2015 – May 2016 · 1 yr 1 mo

  • Worked in a Junior Hadoop Developer role.
  • Worked on clickstream data for the organization, breaking it down into information used for future decision-making and product launches.
  • Apart from that, worked on customer-related information to find patterns and perform other analyses.
  • Ecosystems Used - Hive, Pig, HBase, MapReduce
  • Language Used - Java
Java · Hadoop · Hive · Data Structures

Webel Labs

Project Developer (ASP.NET)

Jun 2013 – Aug 2013 · 2 mos · Salt Lake, Kolkata

  • 1. Database design
  • 2. Coding
  • 3. Data access layer
  • 4. Model validation & testing
  • 5. Authorization & registration
  • 6. Site design

Bharat Sanchar Nigam Limited

Trainee

Apr 2013 – Jun 2013 · 2 mos · Salt Lake Sector - II

  • Topics learned:
  • 1. Overview of telecommunication networks
  • 2. PCM principles, CCS#7 signalling
  • 3. Introduction to the latest switches in the telecommunication industry
  • 4. Fibre-optic communication technology with PDH, SDH & DWDM
  • 5. Mobile communications - GSM, CDMA
  • 6. Broadband technologies
  • 7. Intelligent networks
  • 8. Next Generation Network
  • 9. GPRS, EDGE, EVDO, 3G, WiMAX & latest trends

Education

West Bengal University of Technology, Kolkata

Bachelor’s Degree

Jan 2010 – Jan 2014

Hooghly Collegiate School

Class 10 & 12

Jan 2006 – Jan 2010
