Kartik Chhabra

Software Engineer

Agra, Uttar Pradesh, India · 8 yrs 6 mos experience

Key Highlights

  • Expert in architecting scalable data pipelines.
  • Proficient in cloud platforms like AWS and GCP.
  • Certified in AWS Solutions Architect and Databricks.

Skills

Core Skills

Data Engineering · Cloud Computing · ETL

Other Skills

AWS · Agile Project Management · Airflow · Amazon Web Services (AWS) · Android Development · Apache Spark · Automation · C · Data Ingestion · Data Structures · Data Warehousing · Extract, Transform, Load (ETL) · GCP · GWT · Google Cloud Platform (GCP)

About

Accomplished Senior Big Data Engineer and Tech Lead with over 6 years of experience architecting scalable data pipelines for the retail and life-science industries. Expert in ETL, data warehousing, and big data technologies (PySpark, Hive, Python), with deep proficiency in cloud platforms (AWS, GCP). Certified in AWS (Solutions Architect, Developer) and Databricks. Drove innovation by reducing new-client onboarding time through automation and by migrating workloads to Kubernetes and modern cloud technologies. Well versed in data governance and security. Skilled in agile methodologies, including sprint planning, PI planning, and cross-functional collaboration, I deliver impactful data solutions. Passionate about continuous learning and mentoring to advance data engineering excellence.

Experience

Dunnhumby

2 roles

Senior Big Data Engineer & Tech Lead

Promoted

Oct 2023 – Present · 2 yrs 5 mos · Gurugram, Haryana, India · Hybrid

  • Lead technical initiatives for retail clients (MAF, Coop-DK, Coop-Se, Shoprite, Foodstuff, McDonald’s France, Secoma Japan), onboarding new clients, optimizing ETL processes and data pipelines.
  • Migrate workloads from GCP Dataproc to Kubernetes, enhancing scalability and cost-efficiency.
  • Develop automation tools that reduce Data Engineering time for client onboarding through parallel processing and streamlined processes.
  • Drive agile processes, including sprint planning, PI planning, story creation, and effort estimation, ensuring project alignment.
  • Collaborate with product, client, platform, and data science teams to prioritize objectives and monitor metrics like burndown charts.
  • Leverage AWS services (Glue, Athena, Redshift) for advanced data processing and analytics.
  • Lead Graduate Training Programs with case studies to enhance onboarding and skill development.
Tech Lead · Agile Project Management · Cloud Computing · Python (Programming Language) · Data Engineering

Big Data Engineer

Apr 2022 – Oct 2023 · 1 yr 6 mos · Gurugram, Haryana, India · Hybrid

  • Design and implement data pipelines using PySpark, SQL, Hive, and Airflow for retailers such as Shoprite, Coop-Se, and Majid Al Futtaim.
  • Lead data migration projects for Coop-Se and Coop-Dk, rewriting transformation logic and recreating pipelines.
  • Develop standardized data ingestion processes with quality checks for data lakes.
  • Analyze, transform, and cleanse data to meet business requirements for data models.
  • Automate workflows using Airflow for efficient data processing and scheduling.
Extract, Transform, Load (ETL) · PySpark · Data Warehousing · Google Cloud Platform (GCP) · Data Engineering · ETL
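The "standardized data ingestion processes with quality checks" described above usually amount to a validation gate in front of the data lake load. A minimal stdlib sketch of that pattern follows; the column names and rules here are illustrative assumptions, not taken from the actual pipelines:

```python
# Minimal sketch of an ingestion quality gate: rows that pass the checks
# go to the load batch, failures are quarantined with their reasons.
# The REQUIRED fields and the quantity rule are hypothetical examples.

REQUIRED = {"store_id", "sku", "quantity"}

def check_row(row: dict) -> list[str]:
    """Return the list of quality-check failures for one record."""
    errors = []
    missing = REQUIRED - row.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "quantity" in row and not isinstance(row["quantity"], int):
        errors.append("quantity is not an integer")
    return errors

def ingest(rows: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split incoming rows into (clean, quarantined-with-reasons)."""
    clean, quarantined = [], []
    for row in rows:
        errors = check_row(row)
        if errors:
            quarantined.append((row, errors))
        else:
            clean.append(row)
    return clean, quarantined
```

In a real PySpark pipeline the same split is typically expressed as two filtered DataFrames, with the quarantine side written to a review location.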

Tata Consultancy Services

5 roles

Data Security and Governance | Python Development

Sep 2021 – May 2022 · 8 mos

  • To safeguard Pfizer's data, evaluated several data security tools and selected and implemented Immuta for universal cloud data access control, hosted on EKS. Implemented attribute-based access control, policy enforcement, and data classification with masking.
  • Built a highly efficient multithreaded batch job that syncs the details, attributes, and training records of 200,000 Pfizer employees and contractors into Immuta.
  • Worked on jobs that load billions of data sources from cataloging tools into Immuta, and on APIs (AWS API Gateway and AWS Lambda) that grant users time-limited access to data sources. Used EKS to host the batch jobs.
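A multithreaded batch sync of the kind described above can be sketched with a thread pool: per-user upserts are I/O-bound API calls, so running them concurrently is what makes the batch fast. In this stdlib sketch, `push_user` is a hypothetical stand-in for the real API client, which the source does not show:

```python
from concurrent.futures import ThreadPoolExecutor

def push_user(record: dict) -> str:
    # Placeholder for the real call that upserts one user's attributes
    # into the access-control service (hypothetical stand-in).
    return record["id"]

def sync_users(records: list[dict], workers: int = 8) -> list[str]:
    """Fan per-user upserts out across a thread pool.

    ThreadPoolExecutor.map preserves input order, so the returned ids
    line up with the input records even though calls overlap.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(push_user, records))
```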

Big Data Developer [TCS Digital]

Jan 2021 – Aug 2021 · 7 mos

  • Owing to the batch-based, periodic nature of the cluster-computing workload, qualified and implemented Databricks at Pfizer. Worked with Pfizer's AWS, Ping SSO, and external metastore (RDS) to securely provide Databricks as a multi-tenant service. Wrote batch jobs to sync Pfizer's PXED with Databricks, manage cluster-level ACLs according to user groups, and more.
  • Made AWS Athena operational for Pfizer's ad hoc query workloads on data lakes (S3), giving Pfizer's compliance teams the ability to perform regular checks on valuable data.
  • Built jobs to monitor and report the cost of Databricks and various AWS services, using tags to segregate projects in the multi-tenant environment. Rolled out automatically generated weekly, monthly, and yearly reports.
  • Worked on a POC applying machine learning and deep learning to study trends in stock indexes and their correlation with vaccine demand, using exploratory data analysis and models such as regression and LSTM.
PySpark · Data Engineering

Big Data Platform Architect and Automation Engineer

Promoted

Jan 2020 – Dec 2020 · 11 mos

  • Built data analytics pipelines for Pfizer using AWS services such as Kinesis, S3, AWS Glue (ETL), Athena, Redshift, QuickSight, and AWS Lambda to transfer and analyze COVID-related data. Performed the POC and implemented the complete architecture on short notice due to the sudden outbreak of COVID-19.
  • Qualified AWS Glue and supported the service to Pfizer's GxP/SOX standards; created principal IAM roles and policies to manage application teams through role separation. Developed a Python script deployed on AWS Lambda that uses Boto3 API calls against AWS resources to automate and enforce compliance standards, emailing the affected service users.
  • Created the platform architecture design for Dremio, a data lake engine. Tested various architectures from a platform perspective, such as EKS and HA via EC2 with ELB. Integrated data sources including AWS S3, Redshift, Elasticsearch, and Snowflake, along with BI tools. Wrote KBAs and created architecture diagrams and flow charts.
Data Ingestion · Data Engineering
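The Lambda-based compliance automation mentioned above typically reduces to scanning resources for required tags and flagging violations before notifying owners. A minimal stdlib sketch of that check follows; the resource dicts mimic the shape of a Boto3 `describe_instances` response, and the required tag keys are an assumed policy, not one from the source:

```python
# Assumed tagging policy for the sketch; the real standards would come
# from the organization's compliance requirements.
REQUIRED_TAGS = {"Owner", "Project", "Environment"}

def non_compliant(resources: list[dict]) -> dict[str, set[str]]:
    """Map each resource id to the set of required tags it is missing.

    Each resource dict is assumed to carry "InstanceId" and a "Tags"
    list of {"Key": ..., "Value": ...} pairs, as Boto3 returns them.
    """
    violations = {}
    for res in resources:
        tags = {t["Key"] for t in res.get("Tags", [])}
        missing = REQUIRED_TAGS - tags
        if missing:
            violations[res["InstanceId"]] = missing
    return violations
```

In the Lambda itself, the violations map would then drive the notification step (e.g. mailing each resource's service user).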

Hadoop and Cloud Operations

Jun 2019 – Dec 2019 · 6 mos

  • Worked with the Hadoop stack (MapR), configuring Spark, Hive, Impala, ZooKeeper, Oozie, the metastore, and more in multi-node environments. Responsible for commissioning and decommissioning data nodes, cluster monitoring, CA certificates, and troubleshooting. Implemented code deployments, configuration management, backups, and a DR strategy. Provisioned AWS EC2 instances of various types for application teams per their requirements, with S3 and S3 CRR for cluster backups. Performed periodic access and activity reviews to uphold GxP standards for Orange, CI, and PI data.
Data Ingestion

Campus Ambassador

Jan 2016 – Jan 2018 · 2 yrs · Lucknow Area, India

Education

Anand Engineering College (SGI)

Bachelor of Technology - BTech — Computer Science

Jan 2015 – Jan 2019

St Peter's College, Agra

High School and Intermediate — PCM with Computers

Jan 2001 – Jan 2015
