Gunjit Arora

Data Engineer

Toronto, Ontario, Canada · 7 yrs 9 mos experience

Key Highlights

  • Expert in building data lakes and ETL processes.
  • Proven track record in data engineering and analytics.
  • Strong experience with AWS and big data technologies.

Skills

Core Skills

Data Engineering, AWS, Data Science

Other Skills

AWS CloudFormation, AWS Glue, AWS Lambda, Amazon Athena, Amazon Managed Workflows for Airflow, Amazon Redshift, Amazon S3, Amazon Web Services (AWS), Apache Airflow, Apache Spark, Athena, Automation, Big Data, Big Data Analytics, Continuous Integration and Continuous Delivery (CI/CD)

About

Gunjit Arora is currently pursuing a Master of Science in Information Systems at Northeastern University and is a Data Engineer II at SSENSE; he formerly worked at Amazon and Qubole. At Amazon, he created Spark ETLs using Glue, built a data lake in S3, managed Redshift clusters supporting more than 250 users, and implemented data sharing to reduce redundant data. At Qubole, he created a cost explorer for clients, built a catalog of every command fired in an account, and conducted data analysis to address data sanity and correctness. He also worked as a Data Scientist at Lybrate, where he built suitable data for recommenders and performed time series analysis to forecast lab orders and appointments.

Experience

Qapital

Senior Data Engineer

Sep 2024 – Present · 1 yr 6 mos · Toronto, Ontario, Canada · Remote

Microsoft Azure, Systems Design, Data Engineering

SSENSE

Data Engineer II

Jan 2023 – Aug 2024 · 1 yr 7 mos · Toronto, Ontario, Canada · Remote

  • Developed a comprehensive data mesh at SSENSE to address data needs across various domains (sales, marketing, etc.) and serve as a single source of truth.
  • Ingested data from multiple sources, including SQL databases, REST APIs, CSV files, and BigQuery, and exported data to S3 and BigQuery.
  • Designed and built the Data Platform product across multiple AWS domain accounts using Infrastructure as Code (IaC) with AWS CloudFormation for resource creation and management.
  • Collaborated with stakeholders and business users to understand their needs and define product requirements.
  • Created observability dashboards in Datadog and a notification system to monitor system health and prevent silent errors.
  • Managed Docker images and Jenkins pipelines for deploying the product and its instances within the AWS environment.
AWS Glue, Automation, Jenkins, Infrastructure as Code (IaC), Athena, Reporting Requirements +18

Amazon

Data Engineer

Feb 2021 – Dec 2022 · 1 yr 10 mos · Bengaluru, Karnataka, India

  • Scoped and created Amazon Managed Workflows for Airflow (MWAA) to run EMR jobs.
  • Created Spark ETLs using Glue, orchestrated with Step Functions and triggered by Lambda.
  • Built a data lake in S3 for Glue jobs and to enable data access for external teams.
  • Created and managed two Redshift clusters supporting more than 250 users, running 1,500 scheduled jobs and 19k+ weekly queries.
  • Scoped and implemented data sharing to reduce redundant data loads between the clusters.
  • Migrated all business-critical jobs and created all supporting tables, views, and functions.
  • Created a weekly report on current cluster usage, covering 30+ metrics with week-over-week and month-over-month comparisons.
  • Identified and optimized long-running queries on the cluster, saving over 1,000 compute minutes weekly.
AWS Glue, Automation, Reporting Requirements, Big Data Analytics, Data Warehousing, ETL Tools +9

Qubole

Member Of Technical Staff

Sep 2019 – Dec 2020 · 1 yr 3 mos

  • Created a cost explorer for clients, enabling them to monitor, manage, and control costs at a query and user level and track usage and per-unit spend, resulting in ~25% cost reduction across use cases.
  • With the cost explorer, customers can track costs, monitor showback, justify business plans, prepare budgets, and build ROI analyses.
  • Built a catalog of every command fired in an account and collected resource-utilization metrics to create a normalization model that associates a cost with each query, so costs can be aggregated at the user level for chargeback and other use cases.
  • Conducted analysis to address data sanity and correctness, which led to a custom data validation framework.
  • Automated the backfilling of data in final and intermediate tables, reducing time taken by ~50% while ensuring tables and data are complete and in order.
  • Blog: https://www.qubole.com/blog/qubole-cost-explorer/
Structured Data, Reporting Requirements, Data Warehousing, Data Processing, Data Engineering, Big Data +2

Lybrate

2 roles

Data Scientist

Jul 2018 – Sep 2019 · 1 yr 2 mos

  • Made data visible for data-driven decisions by combining data from various sources using ETL scripts and performing sanity checks; reported data through dashboards and email reports.
  • Built suitable data for the recommender, using NLP to process free-text data, gain meaningful insights, and tag data.
  • Performed time series analysis to forecast the number of lab orders and appointments.
Data Warehousing, Data Processing, Data Engineering, Data Pipelines, Amazon Web Services (AWS), Data Science

Data Science Intern

Jan 2018 – Jun 2018 · 5 mos

SRISTI under IIM Ahmedabad

Research Assistant

May 2015 – Jun 2015 · 1 mo · Ahmedabad Area, India

Education

Northeastern University

Master of Science in Information Systems

Thapar Institute of Engineering & Technology

Bachelor of Technology (BTech) — Electronics and Communications Engineering
