Ekamdeep Kaur

Data Engineer

London, Ontario, Canada · 8 yrs 1 mo experience

Most Likely To Switch: Highly Stable

Key Highlights

7+ years of experience in data engineering.
Expertise in real-time data processing and ETL optimization.
Proven track record of enhancing data reliability and operational efficiency.

Stackforce AI infers this person is a Data Engineering expert specializing in real-time data processing and ETL optimization.

Skills

Core Skills

Data Engineering · ETL · Real-time Data Processing

Other Skills

AWS Glue · Airflow · Alteryx · Amazon Redshift · Amazon Web Services (AWS) · Apache Impala · Apache Kafka · Apache Kudu · Apache Oozie · Apache Spark · Apache Spark Streaming · Athena · Bash · Big Data · Big Data Analytics

About

Data Engineering Professional with 7+ years of experience in designing and building scalable, data-intensive applications across cloud platforms like GCP and AWS. Proven expertise in developing robust ETL and real-time Spark pipelines, managing big data ecosystems, and optimizing workflows for high performance and efficiency. Skilled in tools such as Spark, Kafka, Parquet, and Airflow, with hands-on experience in both batch and streaming architectures. Passionate about data quality, automation, and building reliable, scalable systems that enable advanced analytics and business intelligence.

Experience

Total Experience: 8 yrs 1 mo

Average Tenure: 2 yrs

Current Experience: 2 yrs 7 mos

Brambles

Data Engineer

Oct 2023 – Present · 2 yrs 7 mos · Mississauga, Ontario, Canada

Develop AWS Glue data ingestion pipelines to process data from S3 files, transforming it into Parquet format across three layers: raw, cleansed, and curated, with Athena tables created for querying.
Collaborate with multiple teams to enhance data flows, optimize integration, resolve issues, and improve data reliability and operational efficiency across various projects.
Optimized Python notebooks in Databricks by improving joins, changing SQL queries to PySpark code, and eliminating redundant table reads to enhance performance and minimize resource consumption.
Enhanced and optimized the PySpark Glue job for the location dimension table by decommissioning redundant fields and removing unnecessary joins, improving efficiency and reducing resource usage.

AWS Glue · Parquet · Python · Databricks · ETL · Data Engineering

Verse Innovation

2 roles

Senior Software Engineer (Data)

Promoted

Apr 2021 – Aug 2023 · 2 yrs 4 mos · Hybrid

Architected and optimized real-time Spark data pipelines to process event data at 100K RPS from Kafka, storing outputs in GCS (Parquet) and Apache Kudu, reducing ETL lag from 2 hours to under 5 seconds and integrating with BigQuery.
Led the initiative to develop a process for generating real-time user locations to enhance targeted delivery of ad campaigns, resulting in a 15% increase in revenue.
Implemented a process using Prometheus, etcd, and a RESTful API built in Flask to compute near real-time RPS for ad campaigns on the Dailyhunt and Josh apps, improving campaign delivery accuracy by 20%.
Analysed and optimised ad campaign delivery by introducing new factors for pacing and rebooking.
Revamped and fine-tuned multi-processing Python processes to parallelise data processing of millions of events, resulting in a 50% decrease in processing time.
Created Superset dashboards on GCP, Kudu and Hive tables for data visualisation and analysis for business stakeholders.
Orchestrated the migration of 50+ jobs on Airflow and automated monitoring by setting up alerts and reporting.
Led technical onboarding for new data engineers, delivering comprehensive training sessions on team projects and coding best practices.
Collaborated effectively within Agile environments using Scrum practices, contributing to iterative development and timely achievement of project milestones.
Developed and implemented Bash scripts to automate tasks and streamline workflows.
Worked extensively in Unix and Linux environments, with strong command-line proficiency.

Spark · Kafka · GCS · BigQuery · Python · ETL · +2

Software Engineer (Data)

Oct 2019 – Apr 2021 · 1 yr 6 mos · Hybrid

Analysed and optimised ETL processes by developing a real-time data pipeline that sources data from Kafka topics and pushes it to Apache Kudu tables using Spark and Scala, reducing the lag from 2 hours to less than 5 seconds.
Collaborated with the Ad Operations team to develop a User Monetisability module for revenue computation from impressions, clicks, installs, and leads.
Enhanced the computation process for install numbers by extracting data from Kudu and Hive and loading it into Mongo.
Developed an Impala view to integrate data from Kudu and Hive.

Spark · Scala · Kafka · Apache Kudu · ETL · Data Engineering

Caastle

Software Engineer (Data)

Jul 2018 – Oct 2019 · 1 yr 3 mos · Bengaluru, Karnataka, India · On-site

Owned, developed, and maintained Hadoop-based ETL jobs in Cascading (Java) for data transformation in a horizontally scalable architecture, supporting a multi-tenant data warehouse.
Created a Django server that publishes DAGs showing ETL jobs’ dependencies on one another and their current status, fetching states from MySQL, which improved the team’s productivity.
Migrated data from Hive to Amazon S3, created build and deployment plans using Bamboo, and scheduled the scripts using Oozie.

Hadoop · Cascading · Java · Django · MySQL · Data Engineering · +1

McKinsey & Company

Data Analytics Intern

Jan 2017 – Jun 2017 · 5 mos · Gurgaon, Haryana, India

Prepared a benchmarking document comparing Big Data technologies by functionality.
Created a use case to mine Twitter data using Spark Streaming and perform sentiment analysis on tweets using Natural Language Processing.
Created charts for data visualization in Tableau.

Tableau · Spark Streaming · Natural Language Processing