Deepak K

Data Scientist

Bengaluru, Karnataka, India · 6 yrs 7 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Over 6 years of data engineering experience.
  • Expert in building scalable data ingestion pipelines.
  • Proficient in modern data engineering tools and platforms.
Stackforce AI infers this person is a Data Engineer specializing in Fintech with expertise in ETL and data pipeline automation.

Skills

Core Skills

PySpark, Databricks, ETL, Data Engineering, API Development, Data Quality Assurance, Data Migration, Data Aggregation, Data Automation, Data Visualization

Other Skills

Flask, Python, REST APIs, HTML/CSS, Vanilla JS, GitHub, AWS CLI, S3, Delta Lake, Databricks SDK, Kafka, MariaDB, Kafka Connect, Debezium, AWS

About

Data Engineer with over 6 years of experience in the information technology and financial services domains. Skilled in Big Data and distributed processing technologies including PySpark, Apache Spark, Spark Streaming, Spark SQL, Hadoop, Hive, and Python. Proficient in ETL/ELT development, data ingestion pipelines, data modeling, and workflow orchestration. Experienced with modern data engineering tools and platforms such as Databricks, Delta Lake, Snowflake, Airflow, Kafka, and DBX-based CI/CD. Holds a strong academic foundation with a Master’s Degree in Computer Applications from the National Institute of Technology Karnataka (NITK).

Experience

Total Experience: 6 yrs 7 mos
Average Tenure: 1 yr 7 mos
Current Experience: 3 yrs 7 mos

Digicert

2 roles

Senior Data Engineer

May 2025 – Present · 1 yr 1 mo · Bengaluru, Karnataka, India

  • ➡️ MaverickFlow – Pipeline Configuration Application
  • ✅ Tech Stack – Flask, Python, PySpark, REST APIs, HTML/CSS, Vanilla JS, GitHub, AWS CLI, S3, Delta Lake, Databricks SDK.
  • Built a Flask-based UI to manage Databricks pipeline configurations without manual edits, enabling seamless updates to Kafka topics, config files, job parameters, cluster settings, schedules, and task counts.
  • Integrated Databricks REST APIs to automate workflow creation and enable new pipeline provisioning from the UI.
  • Cut pipeline setup time from 12–14 hours to 2–4 hours, significantly improving developer productivity, reducing errors, enhancing monitoring, and accelerating migration cycles.
Flask, Python, PySpark, REST APIs, HTML/CSS, Vanilla JS, +6 more

Data Engineer

Oct 2022 – Apr 2025 · 2 yrs 6 mos · Bengaluru, Karnataka, India

  • ➡️ Data Ingestion Pipeline - On-prem Data Sources to AWS.
  • ✅ Tech Stack – PySpark, Python, Kafka, MariaDB, Kafka Connect, Debezium, Databricks, AWS (S3), Airflow
  • Developed a robust and optimized data ingestion pipeline to efficiently process data, ensuring high performance and scalability.
  • The pipeline supports 30+ databases across multiple regions, including Europe, North America, and Asia.
  • Designed the pipeline to handle gigabytes of data in daily and weekly loads, supporting key use cases such as audit, ETL processing, and Business Intelligence dashboards that inform management decisions.
  • Took full ownership of the project, collaborating closely with ETL teams to gather requirements and deliver a seamless solution.
  • ➡️ Slack Integration With Databricks
  • ✅ Tech Stack - Databricks, Slack Incoming Webhooks, AWS Secrets, AWS CLI
  • Streamlined job monitoring by sending alerts to Slack channels for failed jobs, long-running tasks, and exceeded resource limits.
  • Enhanced team collaboration through Slack, facilitating faster troubleshooting and issue resolution by keeping everyone informed in real time.
  • ➡️ Raven API Development & Enhancement
  • ✅ Tech Stack - Delta Lake, PySpark, AWS Lambda, AWS API Gateway
  • Built a standalone API to fetch selective columns from multiple Databricks Unity Catalog tables, improving data accessibility and efficiency.
  • Enhanced performance by enabling bulk data loading in chunks, optimized for handling large datasets.
  • ➡️ Data Quality Assurance & Dashboard for Databricks Pipeline
  • ✅ Tech Stack – Databricks, PySpark, Delta Lake, AWS S3, SQL
  • Created a Data QA script to verify schema and record accuracy across migrated datasets.
  • Designed a Databricks Dashboard to present QA results, consolidating key metrics like record volumes, columnar counts, schema structures, and field-to-field mappings across table structures.
  • Streamlined QA and monitoring, boosting accuracy and reliability while cutting manual effort.
PySpark, Python, Kafka, MariaDB, Kafka Connect, Debezium, +5 more
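
The schema and record checks described under the Data QA project can be illustrated with a minimal sketch in plain Python. This is not the actual script, which the profile says ran on Databricks with PySpark and Delta Lake. The `TableSummary` type and the `compare_tables` function are hypothetical names introduced only for illustration, and the sketch assumes row counts and column types have already been collected for the source and the migrated table.

```python
from dataclasses import dataclass, field


@dataclass
class TableSummary:
    """Hypothetical snapshot of one table: row count plus column -> type."""
    name: str
    row_count: int
    schema: dict[str, str] = field(default_factory=dict)


def compare_tables(source: TableSummary, target: TableSummary) -> list[str]:
    """Return readable mismatches; an empty list means the check passed."""
    issues = []

    # Record check: the migrated table should hold the same number of rows.
    if source.row_count != target.row_count:
        issues.append(
            f"row count: source={source.row_count}, target={target.row_count}"
        )

    # Schema check: columns dropped or added during migration.
    for column in sorted(source.schema.keys() - target.schema.keys()):
        issues.append(f"missing in target: {column}")
    for column in sorted(target.schema.keys() - source.schema.keys()):
        issues.append(f"unexpected in target: {column}")

    # Schema check: columns present on both sides but with different types.
    for column in sorted(source.schema.keys() & target.schema.keys()):
        if source.schema[column] != target.schema[column]:
            issues.append(
                f"type mismatch on {column}: "
                f"{source.schema[column]} -> {target.schema[column]}"
            )
    return issues
```

A list of mismatches like this could feed a results table for a dashboard, with one row per table and issue.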

GEP Worldwide

Data Engineer

May 2022 – Oct 2022 · 5 mos · Hyderabad, Telangana, India

  • ➡️ Incremental Ingestion pipeline – Customers and Financial Data
  • ✅ Tech Stack - Databricks, Spark Streaming, Kafka, Azure Data Lake, SQL, GitHub, PySpark, Python
  • Implemented ETL scripts to load data from multiple data sources into Azure Delta Lake using Azure Databricks, and carried out data migration activities.
  • Built an automated data ingestion pipeline that migrates data through Apache Kafka and Elastic into Delta Lake.
Databricks, Spark Streaming, Kafka, Azure Data Lake, SQL, GitHub, +4 more

Citi

2 roles

Data Engineer

Sep 2020 – May 2022 · 1 yr 8 mos · Pune, Maharashtra, India

  • ➡️ SAS to PySpark Migration for Digital Sales Insights
  • ✅ Tech Stack - SAS, Tableau, Hive, SQL, PySpark, Bash, Bitbucket
  • Converted SAS scripts to PySpark on the analytical platform, with Hive as the data source.
  • Ensured that all business logic, calculations, and data manipulations in the original SAS scripts were accurately replicated in PySpark, optimizing performance for large datasets in the analytical environment.
  • Worked on a digital sales project that generates metrics on Citi offers made to eligible customers, such as impressions, clicks, submits, and reach, and created a final output table in Hive for use in Tableau visualisation through Arcadia connectivity.
  • Collaborated with business stakeholders to align metrics with KPIs.
  • ➡️ Data Aggregation Pipeline
  • ✅ Tech Stack – PySpark, Python, SQL, Hive Data Lakes, MySQL, HDFS, Jenkins, Curl, Cloudera Manager
  • Built an ingestion pipeline for integrating 3+ data sources into Hive.
  • Implemented business logic to prepare clean & aggregated data for Analysis.
SAS, Tableau, Hive, SQL, PySpark, Bash, +3 more

Summer Intern

Feb 2020 – Aug 2020 · 6 mos · Pune, Maharashtra, India

  • ➡️ Data Automation for Streamlined Reporting
  • ✅ Tech Stack - SQL, SAS, Python, Hue, Hive, PySpark, Tableau
  • Worked on report automation using SAS and Data Visualisation using Tableau.
  • Automated the migration of SAS datasets to PySpark; the process was used by a team of 17 people to run their reports.
  • Ran knowledge-sharing sessions for a team of 57 people on transforming legacy SAS reports to PySpark.
SQL, SAS, Python, Hue, Hive, PySpark, +3 more

National Institute of Technology Karnataka

Tech Lead NIMCET 2019

Feb 2019 – Jan 2020 · 11 mos · Mangalore Area, India

  • ✅ Tech Stack - CSS, Ruby on Rails, JavaScript, HTML, MySQL, NGINX
  • NIMCET is a national-level common entrance test conducted by one of the NITs for admission to their MCA programmes. Admission to the programme is based on the rank obtained in the NIMCET exam.
CSS, Ruby on Rails, JavaScript, HTML, MySQL, NGINX

Education

National Institute of Technology Karnataka

Master of Computer Applications - MCA — Computer Science

Jan 2017 – Jan 2020

Panjab University

Bachelor of Computer Applications — Computer Science

Jan 2014 – Jan 2017

Jawahar Navodaya Vidyalaya - JNV

Non Medical

Jan 2007 – Jan 2014
