Vikesh Malik

Data Engineer

Glasgow, Scotland, United Kingdom · 4 yrs 11 mos experience

Key Highlights

Achieved 50-70% improvement in data processing speed.
Delivered high-priority projects on time and within scope.
Expert in building efficient data pipelines using PySpark.

Stackforce AI infers this person is a Data Engineer specializing in Big Data solutions for SaaS and Fintech industries.

Skills

Core Skills

Data Engineering, Big Data

Other Skills

Bitbucket, C (Programming Language), Cassandra, Cloudera, Continuous Integration and Continuous Delivery (CI/CD), Data Analysis, Data Extraction, Data Modeling, Data Solutions, Databricks, ETL, HBase, HQL, HTML, Hadoop

About

I have always found joy in structured data sets and in transforming them into meaningful, actionable insights. In my previous role at EXL, I developed new PySpark scripts to reduce overall batch processing time and spearheaded the development of a new data processing pipeline that improved processing speed by 50-70%. This dramatically decreased the time it took to analyze data, which in turn allowed us to make faster, data-driven decisions.

Skills: Big Data, Hadoop, MySQL, Apache Spark, Apache Sqoop, Hive, Git, Bitbucket, Linux, Data Warehouse, ETL, AWS (S3, Athena, EMR), Presto, JIRA, SQL, Cloudera, Databricks.

Experience

JPMorgan Chase

Associate Data Engineer - II

Jan 2025 – Present · 1 yr 2 mos · Glasgow, Scotland, United Kingdom · On-site

Migrated legacy Hive-based data processing workflows to the Databricks cloud platform, modernizing data infrastructure and improving scalability.
Developed high-performance data pipelines using PySpark, reducing batch processing time by 8 hours and achieving a 50% efficiency improvement in system operations.
Performed end-to-end code testing and validation across multiple environments (dev, UAT, production), ensuring robust application performance and user experience.
Performed comprehensive data validation across multiple business logic scenarios to ensure data accuracy, integrity, and compliance with business requirements.
Investigated and mapped business logic workflows to identify optimization opportunities and ensure accurate system behaviour.
Successfully delivered two high-priority projects on time and within scope, meeting all stakeholder requirements and business objectives.
Tech Stack: Spark, MySQL, Cloudera, Hive, Python, Bitbucket, Jira, IntelliJ, Data Pulse, Databricks, Jules, Airflow

Databricks, PySpark, MySQL, Cloudera, Hive, Bitbucket, +3 more

EXL

Software Engineer-II (Data Engineer)

Mar 2024 – Dec 2024 · 9 mos · India · Remote

Designed, built, and optimized the data architecture, applying suitable design patterns and effective data models to data pipelines so that Business Data Analysts, Data Scientists, and business users could access the data for data-driven decision making.
Gathered, defined, and refined requirements; led project design and implementation.
Designed data models for complex analysis needs.
Optimized data pipelines and data storage to improve performance and scalability.
Wrote complex SQL views on which Data Analysts and Data Scientists could build their models.
Brought best practices to the team, proactively and continuously building its data-related practices.
Built data pipelines in PySpark to reduce the system's batch processing time, increasing efficiency.
Maximised performance benefits for clients, delivering testable, maintainable, and modern data solutions in HQL and PySpark.
Created reports from sales and fraud data.
Migrated the existing HQL-based system to PySpark, reducing batch processing time by 30-50%.

PySpark, SQL, HQL, Data Modeling, ETL, Data Engineering, +1 more

Iris Software Inc.

Associate Engineer (Data Engineer)

Sep 2022 – Jan 2024 · 1 yr 4 mos · Noida, Uttar Pradesh, India · Hybrid

Citi Bank - Anti-Money Laundering Project
Built data pipelines in PySpark to reduce the system's batch processing time, leading to a 43% efficiency increase.
Maximised performance benefits for clients, delivering testable, maintainable, and modern data solutions in HQL and PySpark.
Worked with an AML system that processes years of warehouse data to track the legal activities of customers.
Developed a data pipeline to identify records eligible for deletion based on their retention period.

PySpark, HQL, Data Solutions, Data Engineering, Big Data

IBM

Associate System Engineer

Jan 2021 – Sep 2022 · 1 yr 8 mos · Bengaluru, Karnataka, India · Remote

Customer Inventory
Monitored system performance using recognised and agreed criteria.
Modified current systems to enhance workflows and meet new needs.
Partnered with users to understand and define system requirements.
Extracted maximum value from existing data by leveraging new open data sources.
Built data pipelines in PySpark to reduce system processing time, leading to a 20% efficiency increase.

PySpark, System Monitoring, Data Extraction, Data Engineering, Big Data