Yash Raut

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 10 mos experience

Key Highlights

  • Expert in building scalable data pipelines.
  • Proficient in cloud platforms like GCP and Azure.
  • Strong background in machine learning and data analytics.

Skills

Core Skills

Data Engineering · ETL · Machine Learning

Other Skills

Python (Programming Language) · SQL · PySpark · Project Management · Artificial Intelligence · Data Analyst · Google Cloud Platform (GCP) · Statistics · Data Science · Algorithms · Extract, Transform, Load (ETL) · Program Management · Agile Project Management · Deep Learning · Ethical Hacking

About

Data Engineer passionate about building scalable, reliable data pipelines and analytics solutions. Proficient in SQL, Python, and PySpark, with hands-on experience on GCP and Azure. I specialize in transforming raw data into actionable insights and optimizing ETL workflows for performance and maintainability. Always eager to tackle complex data challenges and collaborate with cross‑functional teams to drive data‑driven decision‑making. 📌 Need help with data roles or career guidance? Book a 1:1 session here 👉 https://topmate.io/yash_raut11/

Experience

4 yrs 10 mos
Total Experience
2 yrs 1 mo
Average Tenure
7 mos
Current Experience

Societe Generale

Senior Data Engineer - III (SSE)

Oct 2025 – Present · 7 mos · Bengaluru, Karnataka, India · Hybrid

  • Designed and developed data pipelines using Azure Databricks for large-scale data processing and analytics.
  • Built and optimized ETL workflows with PySpark, transforming raw data into curated datasets for reporting and analysis.
  • Integrated Databricks with Azure Data Lake (ADLS Gen2), Azure Data Factory (ADF), and Power BI for seamless end-to-end data flow.
  • Implemented Delta Lake for ACID transactions, schema enforcement, and time travel capabilities.
  • Improved pipeline performance using partitioning, caching, and efficient Spark configurations.
  • Developed and maintained Databricks notebooks for data cleansing, validation, and transformation.
  • Automated workflows with Databricks Jobs and ADF pipelines, reducing manual effort and improving reliability.
Python (Programming Language) · SQL · PySpark · Data Engineering · ETL
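The cleanse → validate → transform flow described in the bullets above can be sketched in miniature. In the actual role this would be PySpark DataFrame transformations on Databricks; the version below is a hedged pure-Python illustration, and the field names (`trade_date`, `amount`, `year_month`) are hypothetical, not taken from the real pipeline:

```python
# Illustrative sketch only: pure-Python stand-in for a Databricks/PySpark
# cleanse -> validate -> transform pipeline. Field names are made up.
from datetime import datetime

def cleanse(record: dict) -> dict:
    """Trim string fields and normalise empty strings to None."""
    return {k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in record.items()}

def validate(record: dict) -> bool:
    """Keep only rows with a parseable date and a positive amount."""
    try:
        datetime.strptime(record["trade_date"], "%Y-%m-%d")
    except (KeyError, TypeError, ValueError):
        return False
    return isinstance(record.get("amount"), (int, float)) and record["amount"] > 0

def transform(record: dict) -> dict:
    """Derive a year-month key, like a column a curated table might partition on."""
    out = dict(record)
    out["year_month"] = record["trade_date"][:7]  # "YYYY-MM"
    return out

def run_pipeline(raw: list) -> list:
    cleaned = [cleanse(r) for r in raw]
    return [transform(r) for r in cleaned if validate(r)]
```

In PySpark the same stages would typically be `withColumn`/`filter` steps, with the derived key used in `partitionBy` when writing the curated Delta table.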

HSBC

3 roles

Manager - Data Analytics

Aug 2025 – Oct 2025 · 2 mos · Bengaluru, Karnataka, India

  • Integrated Python scripts with Airflow DAGs for automated batch processing
  • Tuned Spark jobs by adjusting partitioning, caching, and broadcast joins
  • Performed large-scale data aggregation and feature engineering for ML pipelines using PySpark
  • Developed unit tests for PySpark jobs using Pytest & SparkSession
Data Engineering · Project Management
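The unit-testing pattern mentioned above usually amounts to keeping the job's transformation logic in a pure function so it can be exercised by pytest without a live cluster. A minimal sketch, assuming a hypothetical group-and-sum step (column names `acct`/`amt` are illustrative):

```python
# Sketch of the "testable transformation" pattern: the logic mirrors a
# groupBy().sum() step in a Spark job but runs on plain rows, so it can be
# verified cheaply. Names are hypothetical.

def aggregate_by_key(rows, key, value):
    """Group rows by `key` and sum `value`, like df.groupBy(key).sum(value)."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals

# In a pytest module this function would be collected automatically;
# the assertions are identical either way.
def test_aggregate_by_key():
    rows = [{"acct": "a", "amt": 10},
            {"acct": "a", "amt": 5},
            {"acct": "b", "amt": 1}]
    assert aggregate_by_key(rows, "acct", "amt") == {"a": 15, "b": 1}

test_aggregate_by_key()
```

With real PySpark jobs, the same idea is applied via a shared `SparkSession` pytest fixture and small input DataFrames.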

Senior Data Engineer - WPB Risk Analytics

Promoted

May 2024 – Aug 2025 · 1 yr 3 mos · Bengaluru, Karnataka, India

  • Designed and built ETL pipelines using Apache Airflow, Spark, and SQL
  • Worked extensively on data modeling and schema design for structured & semi-structured data
  • Developed real-time streaming pipelines using Kafka and PySpark
  • Optimized queries on Snowflake, BigQuery, and PostgreSQL to enhance performance
  • Deployed pipelines on cloud platforms like GCP and AWS (Glue, S3, Lambda)
  • Managed large volumes of data using Hive and HDFS in a Hadoop ecosystem
  • Implemented data quality checks and data validation frameworks
  • Collaborated with cross-functional teams (Data Scientists, Analysts, Product) for scalable data delivery
  • Contributed to end-to-end data warehousing projects
Artificial Intelligence · Data Analyst · Google Cloud Platform (GCP) · Machine Learning · PySpark · Data Engineering +1
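A data-quality framework like the one mentioned above is typically a set of composable rules run against each batch. This is a hedged sketch of that shape, not the project's actual framework; the check names and the `score` column are invented for illustration:

```python
# Illustrative rule-based data-quality check: each rule returns its name plus
# the indices of failing rows. Column names and rules are hypothetical.

def not_null(column):
    def check(rows):
        bad = [i for i, r in enumerate(rows) if r.get(column) is None]
        return (f"not_null({column})", bad)
    return check

def in_range(column, lo, hi):
    def check(rows):
        bad = [i for i, r in enumerate(rows)
               if r.get(column) is not None and not (lo <= r[column] <= hi)]
        return (f"in_range({column})", bad)
    return check

def run_checks(rows, checks):
    """Return {check_name: [failing row indices]} for every check that failed."""
    results = {}
    for check in checks:
        name, bad = check(rows)
        if bad:
            results[name] = bad
    return results
```

Libraries such as Great Expectations (used in a later role on this profile) package the same idea with declarative expectation suites and reporting.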

Data Analyst - WPB Risk Analytics

Mar 2022 – Apr 2024 · 2 yrs 1 mo · Bengaluru, Karnataka, India

  • Migrated legacy data pipelines from SAS to Google Cloud Platform (GCP) using BigQuery, Cloud Composer (Airflow), and Dataflow
  • Converted SAS ETL logic to Python & SQL scripts for compatibility with GCP-native tools
  • Rewrote complex PROC SQL and SAS macros into optimized BigQuery SQL and Dataform
  • Implemented GCS-based ingestion pipelines replacing FTP-based SAS loads
  • Ensured data quality validation using Great Expectations and custom GCP validation scripts
  • Reduced pipeline cost & runtime by leveraging serverless architectures in GCP
  • Built and optimized distributed data processing jobs using PySpark for millions of records daily
  • Created reusable modular PySpark functions for data cleansing, transformation, and joins
Machine Learning · Data Analyst · PySpark · Statistics · Python (Programming Language) · Google Cloud Platform (GCP) +1
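The SAS-macro-to-BigQuery rewrite described above often means turning a parameterised macro into a small query builder. A minimal sketch under assumptions: the macro, table, and column names below (`%monthly_extract`, `txn_date`) are hypothetical, while `FORMAT_DATE` is a real BigQuery SQL function:

```python
# Hedged sketch: roughly what a SAS macro like %monthly_extract(table, month)
# might become after a rewrite to BigQuery SQL. Table/column names are made up.

def monthly_extract_sql(table: str, month: str) -> str:
    """Build a BigQuery query selecting one month of rows (month = 'YYYY-MM')."""
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE FORMAT_DATE('%Y-%m', txn_date) = '{month}'"
    )

print(monthly_extract_sql("proj.ds.txns", "2023-04"))
```

In a production rewrite the query would live in a Dataform/Composer-managed model with proper parameter binding rather than string interpolation.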

Capgemini Engineering

Software Engineer - Data Analytics

Apr 2021 – Mar 2022 · 11 mos · Bengaluru, Karnataka, India

  • Developed and maintained end-to-end ETL pipelines to ingest, transform, and load banking data from multiple legacy systems into a centralized data lake
  • Migrated workflows from on-prem SAS systems to cloud-based architecture (GCP/BigQuery)
  • Translated business requirements into data transformation logic using PySpark, SQL, and Python
  • Worked with sensitive financial data including transaction records, credit scoring, and fraud indicators – ensured data masking & encryption standards were met
  • Designed and implemented data validation frameworks to ensure regulatory compliance (e.g., SOX, Basel)
  • Collaborated with cross-functional teams including the Capgemini offshore team, client SMEs, and business analysts across time zones
  • Automated SAS reports into dashboard visualizations using Looker/Tableau integrated with BigQuery
  • Optimized ETL jobs to reduce SLA breach risk by 30% and improved job monitoring with Airflow and Stackdriver
  • Documented technical workflows and created data dictionaries for reusable assets
Artificial Intelligence · Machine Learning · Data Science · Algorithms · Python (Programming Language) · Google Cloud Platform (GCP) +1
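The data-masking step mentioned above can be illustrated with a common keep-last-four policy. This is a generic sketch, not necessarily the exact standard applied on the project:

```python
# Illustrative masking helper: hide all but the last `keep` characters of a
# sensitive identifier. The policy shown is a common convention, assumed here.

def mask_account(account: str, keep: int = 4) -> str:
    """Return the identifier with all but the trailing `keep` chars starred out."""
    if len(account) <= keep:
        return "*" * len(account)  # too short to reveal anything safely
    return "*" * (len(account) - keep) + account[-keep:]
```

In a pipeline this would run as a column-level transformation before data lands in any zone accessible to analysts, alongside encryption at rest.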

Ctronics Infotech Private Limited

Data Science Intern

Sep 2019 – Feb 2020 · 5 mos · Maharashtra, India

  • Built and trained ML models for classification, regression, and clustering tasks using Scikit-learn, XGBoost, and TensorFlow
  • Developed end-to-end data pipelines for model training and evaluation — from data ingestion to deployment
  • Cleaned and preprocessed large datasets using Pandas, NumPy, and feature engineering techniques
  • Conducted exploratory data analysis (EDA) and generated actionable insights using Seaborn, Matplotlib, and Plotly
  • Applied NLP techniques like sentiment analysis, TF-IDF vectorization, and topic modeling for text datasets
  • Deployed machine learning models via Flask, Streamlit, or FastAPI for interactive web-based demos
  • Tuned hyperparameters using GridSearchCV and evaluated model performance using metrics like F1-score, AUC, and RMSE
  • Collaborated with cross-functional teams to translate business problems into data-driven solutions
  • Documented model decisions, assumptions, and created visual reports for stakeholders
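One of the evaluation metrics named above, the F1-score, is simple enough to compute by hand; this self-contained version mirrors what scikit-learn's `f1_score` returns for binary labels:

```python
# F1-score for binary predictions: harmonic mean of precision and recall.
# Self-contained re-implementation for illustration; in practice one would
# call sklearn.metrics.f1_score.

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives -> precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with 2 true positives, 1 false positive, and 1 false negative, precision and recall are both 2/3, so F1 is 2/3.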

Verzeo

Machine Learning Intern

Jun 2019 – Nov 2019 · 5 mos · Greater Bengaluru Area

  • Built multiple ML models – classification, regression, clustering – using Scikit-learn, TensorFlow, and XGBoost
  • Completed hands-on projects such as:
      • Spam Email Detection
      • House Price Prediction
      • Customer Churn Prediction
  • Worked with real datasets from Kaggle, UCI, and open-source APIs
  • Deployed models via Flask & Streamlit on Heroku
  • Applied NLP techniques for sentiment analysis and topic modeling
  • Performed EDA & feature engineering using Pandas, NumPy, and Seaborn
  • Version-controlled projects using Git & GitHub
  • Completed certifications (e.g., Google, Microsoft)
Python (Programming Language)

Berry9 IT Services (B9ITS)

Cyber Security Intern

Jan 2018 – Jun 2018 · 5 mos · Greater Hyderabad Area

  • Assisted in performing vulnerability assessments and penetration testing on web applications and internal systems
  • Conducted network traffic analysis using Wireshark to identify potential threats and anomalies
  • Supported implementation of firewall rules, intrusion detection systems (IDS), and SIEM dashboards
  • Participated in incident response simulations, documented root causes, and proposed mitigation plans
  • Automated security checks using Python scripts for log analysis and malware signature detection
  • Helped design and run phishing awareness campaigns and internal cybersecurity training for employees
  • Reviewed and updated security policies and compliance checklists based on ISO 27001 best practices
  • Collaborated with senior security analysts on real-time threat monitoring and log triage
  • Documented security incidents and maintained weekly audit logs and risk assessment reports
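The log-analysis automation described above typically boils down to scanning log lines against a set of known-bad patterns. A minimal sketch under assumptions — the two signatures below are invented examples, not the scripts from the internship:

```python
# Illustrative log scanner: flag lines matching simple attack signatures.
# The signature set is made up for demonstration.
import re

SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select|or\s+1=1"),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def scan_logs(lines):
    """Return (line_number, signature_name) for every matching line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A real deployment would feed these hits into a SIEM dashboard or alerting pipeline rather than returning them in-process.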

Education

Indian Institute of Remote Sensing (IIRS), Indian Space Research Organization (ISRO)

PGDM — Data Science & ML

Oct 2020 – Aug 2021

Sant Gadge Baba Amravati University, Amravati

Bachelor of Engineering - BE — Computer Science & Engineering

Jan 2016 – Jan 2020

Golden Kids Junior College Amravati

HSC — PCM

Mar 2014 – Feb 2016
