Karan Sawlani — AI Researcher

A professional Data & AI Engineer and DevOps enthusiast who builds data recipes with coding hands topped with GenAI fingers, filling the tummy with insights so that the mind can make decisions and unlock new possibilities.

Tech Stack:
- Programming Languages: Python (DSA), Bash, HCL
- Databases: MySQL, Postgres
- Data Warehouse: Vertica, Snowflake, Hive, Databricks
- Data Modelling: DBeaver, PyCharm Big Data Tools
- Data Documentation: Draw.io, Mermaid, Lucidchart, Miro, etc.
- Version Control: GitHub, GitLab, Bitbucket
- IDE: PyCharm, Sublime
- Deployments: Docker, Kubernetes (EKS, AKS, GKE), Helm, Terraform
- Operating Systems: Linux (commands and scripting), Windows, macOS
- Data Engineering: PySpark, Hadoop, Pig, Sqoop, Kafka, Data Warehousing, Data Modelling, Data Lake Design, Lakehouse Design, Delta Lake, Oozie, Flume, Apache NiFi, Apache Airflow
- Clouds:
  - GCP (VPN, Firewalls, VM, GCS, IAM, Google Functions, GKE, GCR, Cloud NAT, Dataproc)
  - Microsoft Azure (VNet, NSG, VM, ADLS/ABFS, IAM, Azure Functions, AKS, ACR, AD)
  - AWS (VPC, NAT, SG, NACL, EC2, S3, IAM, EMR, Glue, Lambda, QuickSight, Athena, Redshift, EKS, ECR)
- GenAI: LLMs (GPT, Llama), OpenAI APIs, LangChain, LangGraph, Langfuse, Chainlit, Streamlit, Hugging Face

Certifications:
- Databricks Certified GenAI Associate
- Databricks Certified Machine Learning Associate
- Cloudera Certified Spark & Hadoop Developer
- Databricks Certified Associate Spark 3.0 Developer
- Databricks Certified Associate Data Analyst
- Databricks Certified Associate Data Engineer
- Databricks Certified Professional Data Engineer
- Databricks Certified Lakehouse Fundamentals Accreditation
- Databricks GenAI Fundamentals
- Gold Badge in Python (from HackerRank)
- Gold Badge in SQL (from HackerRank)

Stackforce AI infers this person is a Data Engineering expert with strong capabilities in cloud and GenAI technologies.

Location: Dubai, United Arab Emirates

Experience: 7 yrs 3 mos

Skills

Data Engineering
GenAI
Cloud Engineering

Career Highlights

Expert in building GenAI-powered data engineering pipelines.
Proven track record in implementing data governance strategies.
Skilled in creating cloud-agnostic data solutions.

Work Experience

Ras Al Khaimah Economic Zone (RAKEZ)

Data & AI Engineer (1 yr 7 mos)

BCG X

Senior Data Engineer (1 yr 1 mo)

Data Engineer (2 yrs 6 mos)

TO THE NEW

Data Engineer (1 yr 8 mos)

Big Data Trainee (5 mos)

Education

Bachelor of Technology - BTech at Raj Kumar Goel Institute of Technology, Ghaziabad

Science (Physics) at Puranchandra Vidyaniketan (PCVN), Barra, Kanpur (U.P.)

AI Enabled · Highly Stable

Other Skills

ACR, AWS, AWS Identity and Access Management (AWS IAM), AWS RDS, AWS S3, Airflow, Amazon EC2, Amazon EKS, Amazon Elastic Container Registry (ECR), Amazon Elastic MapReduce (EMR), Amazon Relational Database Service (RDS), Amazon S3, Amazon VPC, Amazon Web Services (AWS), Apache Flume

Experience

Total Experience: 7 yrs 3 mos
Average Tenure: 2 yrs 9 mos
Current Experience: 1 yr 7 mos

Ras Al Khaimah Economic Zone (RAKEZ)

Data & AI Engineer

Nov 2024 – Present · 1 yr 7 mos · Dubai, United Arab Emirates · On-site

Overview:
Leading and executing the enterprise-wide DAMA (Data Management Association) strategy
Developing a cloud-agnostic, metadata-driven Python framework for data ingestion and data quality, integrating with diverse sources such as APIs, SAP, Salesforce, RDBMS, and cloud platforms
Building GenAI-powered data engineering pipelines to unlock business value and generate insights based on specific use cases
Managing and leading the data team to meet the day-to-day business data requirements of data analysts and scientists
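As a rough illustration of what "metadata-driven" ingestion and data quality can mean, the sketch below drives quality checks from a configuration dictionary rather than from hard-coded logic. It is a minimal example under stated assumptions, not the framework described above: the rule names, the `PIPELINE_METADATA` layout, and the `run_quality_checks` function are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule registry: rule name -> predicate applied to a single value.
RULES: Dict[str, Callable[[Any], bool]] = {
    "not_null": lambda value: value is not None,
    "positive": lambda value: isinstance(value, (int, float)) and value > 0,
}

# Hypothetical metadata entry describing one source and the checks to apply.
PIPELINE_METADATA = {
    "source": "orders_api",
    "checks": [
        {"column": "order_id", "rule": "not_null"},
        {"column": "amount", "rule": "positive"},
    ],
}


def run_quality_checks(
    rows: List[dict], metadata: dict
) -> Tuple[List[dict], List[dict]]:
    """Split rows into (passed, failed) according to the metadata checks."""
    passed, failed = [], []
    for row in rows:
        ok = all(
            RULES[check["rule"]](row.get(check["column"]))
            for check in metadata["checks"]
        )
        (passed if ok else failed).append(row)
    return passed, failed
```

Adding a new source or check then means editing metadata only, which is one plausible way such a framework reduces manual pipeline work.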
Details:
Leading the implementation of the DAMA strategy encompassing Data Governance, Data Modeling, Data Warehousing, Data Transformation, Data Cleansing, Master Data Management (MDM), Data Monitoring, Data Security, and the development of scalable, production-grade ETL pipelines
Building a reusable Python-based Data Ingestion and Quality framework to automate pipeline creation and reduce manual effort by 40%
Designing a Python logging utility to automate logging and enhance observability across codebases and data pipelines
Actively contributing to Data Architecture decisions for constructing a Data Lake by integrating data from systems like SAP ERP, SAP HANA, Salesforce, Google BigQuery, AWS AppFlow, and more
Developing GenAI-powered data engineering pipelines to overcome business challenges and derive actionable insights tailored to specific use cases
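The logging utility mentioned above could take many forms. One common pattern, shown here as a minimal sketch, is a decorator that logs the start, end, duration, and failure of each pipeline step. The names `get_logger`, `log_step`, and `drop_null_ids` are hypothetical and are not taken from the actual utility.

```python
import functools
import logging
import time


def get_logger(name: str) -> logging.Logger:
    """Return a logger with a single stream handler and a consistent format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def log_step(logger: logging.Logger):
    """Decorator factory: log start, end, duration, and errors of a step."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            logger.info("start step=%s", func.__name__)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception records the traceback, then we re-raise.
                logger.exception("failed step=%s", func.__name__)
                raise
            elapsed = time.perf_counter() - start
            logger.info("end step=%s seconds=%.3f", func.__name__, elapsed)
            return result
        return wrapper
    return decorator


pipeline_logger = get_logger("pipeline")


@log_step(pipeline_logger)
def drop_null_ids(rows):
    """Example step: keep only rows that carry an id."""
    return [row for row in rows if row.get("id") is not None]
```

Decorating each step keeps logging out of the business logic, so observability is added uniformly without editing every function body.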

Python, Data Governance, Data Modeling, Data Warehousing, Data Transformation, Data Quality, +2

BCG X

2 roles

Senior Data Engineer

Promoted

Oct 2023 – Nov 2024 · 1 yr 1 mo

Overview:
Created a cloud-agnostic software/data engineering Python library to submit Spark jobs remotely on any cloud platform (AWS/Azure/GCP or Databricks)
Details:
Created a software/data engineering Python library that submits and monitors Spark jobs remotely on any cloud platform, mainly AWS, Azure, GCP, and Databricks. It also controls the end-to-end Spark cluster lifecycle: cluster start state, end state, job submission state, error state, bootstrapping, installation of additional libraries, etc.
The library can run in any environment, such as a local machine or an edge node, or be triggered from Airflow, AWS Lambda, Azure Step functions, etc.
Created unit and integration tests to reach 90% code coverage and added pytest HTML reports to present the results visually
Implemented coding best practices, including configuring a pyproject.toml file, integrating pre-commit hooks, and using Black, Flake8, and isort for code formatting and linting. Set up unit tests and generated test reports stored in artifacts. Created versioned wheel and tar files for installation on remote clusters. Added code coverage information to GitHub badges. Automated the entire process as part of the CI/CD pipeline. Finally, published documentation on GitHub Pages
Created GitHub Actions CI/CD pipelines to auto-deploy code changes by building the wheel/tar files, deploying them to the artifact repositories, and running the jobs synchronously
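The cluster lifecycle management described above (start, job submission, error, and end states) resembles a small state machine. The sketch below is an illustrative assumption of how such states might be tracked. The state names, the allowed transitions, and the `ClusterLifecycle` class are hypothetical and do not reflect the library's real API.

```python
from enum import Enum


class ClusterState(Enum):
    PENDING = "pending"
    BOOTSTRAPPING = "bootstrapping"
    RUNNING = "running"
    JOB_SUBMITTED = "job_submitted"
    TERMINATED = "terminated"
    ERROR = "error"


# Hypothetical transition table: current state -> states reachable from it.
ALLOWED = {
    ClusterState.PENDING: {ClusterState.BOOTSTRAPPING, ClusterState.ERROR},
    ClusterState.BOOTSTRAPPING: {ClusterState.RUNNING, ClusterState.ERROR},
    ClusterState.RUNNING: {
        ClusterState.JOB_SUBMITTED,
        ClusterState.TERMINATED,
        ClusterState.ERROR,
    },
    ClusterState.JOB_SUBMITTED: {ClusterState.RUNNING, ClusterState.ERROR},
    ClusterState.ERROR: {ClusterState.TERMINATED},
    ClusterState.TERMINATED: set(),
}


class ClusterLifecycle:
    """Track one cluster's state and reject illegal transitions."""

    def __init__(self):
        self.state = ClusterState.PENDING
        self.history = [self.state]

    def transition(self, new_state: ClusterState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)
```

In a cloud-agnostic design, a provider-specific backend (EMR, Dataproc, Databricks, and so on) would report events that map onto these shared states, so the calling code stays the same across platforms.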

Python, AWS, Azure, GCP, Databricks, CI/CD, +3

Data Engineer

Apr 2021 – Oct 2023 · 2 yrs 6 mos

Overview:
Created Customer 360 (a data model for customer insights and analytics) for a fashion retail client
Details:
Created a Customer 360 data model to analyze 5 TB of customer data in full: email engagement (CTR, CTOR, OR, CR), RFM analysis, and channel-based analysis (online, offline, cross-platform, etc.)
Performed EDA (exploratory data analysis) using waterfall models, funnels, etc.
Built visualizations with pivot tables and charts in Excel, etc.
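For readers unfamiliar with the email engagement abbreviations above, the sketch below computes them from raw campaign counts. The definitions are common conventions and are assumptions here: OR as opens over delivered, CTR as clicks over delivered, CTOR as clicks over opens, and CR read as conversion rate (conversions over delivered). The project may have defined them differently, and `email_engagement` is a hypothetical helper.

```python
def email_engagement(
    delivered: int, opens: int, clicks: int, conversions: int
) -> dict:
    """Compute common email engagement ratios from campaign counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Guard against empty campaigns to avoid ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "open_rate": ratio(opens, delivered),         # OR
        "ctr": ratio(clicks, delivered),              # click-through rate
        "ctor": ratio(clicks, opens),                 # click-to-open rate
        "conversion_rate": ratio(conversions, delivered),  # CR (assumed)
    }
```

CTOR isolates how compelling the email content is once opened, while CTR also reflects subject-line performance, which is why the two are usually reported together.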
Overview:
Created the end-to-end infrastructure layer to deploy the Data Engineering and DevOps tech stack for the personalization use case, automating setup so that a 2-month project could be kick-started in 4-5 days
Details:
Wrote Terraform code to auto-deploy AWS resources such as EC2, VPC, ECR, and EMR with their networking stack; maintained connectivity between the public and private subnets by configuring security groups; installed Docker on the EC2 instance in the private subnet; and installed Airflow in the Docker image
Recreated the same environment in Azure (VM, VNet, Databricks) and in GCP (VM, VPN, Dataproc, etc.)
Wrote Terraform code to auto-deploy K8s infrastructure in AWS (EKS, ECR), Azure (AKS, ACR), and GCP (GKE, GCR), including security configurations for public and private K8s clusters; installed Apache Airflow in these clusters using the official Helm charts
Created CI/CD pipelines using GitHub Actions to run linting and unit tests, deploy the software engineering tar files to artifact storage (DBFS/GCS/S3/ABFS/ADLS), run Veracode security scans, etc.

Data Modeling, ETL, Data Analysis, Python, Hadoop, Data Engineering

TO THE NEW

2 roles

Data Engineer

Aug 2019 – Apr 2021 · 1 yr 8 mos

Overview:
Created a production-grade Data Warehousing/Modelling solution for a retail client
Details:
Created an ETL pipeline that fetched raw data from the landing S3 data lake, transformed it, wrote it to the staging S3 data lake, and then loaded it into the Vertica data warehouse. Tasks included orphan-record handling, metadata management, logging-table creation, data cleaning, and creation of data marts and analytics/reporting tables, etc.
Created a data warehouse for global customer records, implementing SCD Type-1 and SCD Type-2 with star and snowflake schemas; the warehouse was populated from different sources by various Airflow ETL pipelines
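To make the SCD Type-2 technique mentioned above concrete, here is a minimal in-memory sketch: when a tracked attribute changes, the current row is closed out and a new current row is appended, so history is preserved. The column names (`valid_from`, `valid_to`, `is_current`) and the `apply_scd2` function are illustrative assumptions. A production version would typically run as SQL or Spark against the warehouse.

```python
from datetime import date
from typing import List


def apply_scd2(
    dimension: List[dict],
    incoming: List[dict],
    key: str,
    tracked: List[str],
    today: date,
) -> List[dict]:
    """Return a new dimension table with Type-2 history applied."""
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for record in incoming:
        existing = current.get(record[key])
        if existing and all(existing[c] == record[c] for c in tracked):
            continue  # nothing changed, keep the current version
        if existing:
            # Close out the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        new_row = {
            key: record[key],
            **{c: record[c] for c in tracked},
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        }
        result.append(new_row)
        current[record[key]] = new_row
    return result
```

SCD Type-1, by contrast, would simply overwrite the tracked columns in place and keep no history.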

ETL, Data Warehousing, Airflow, Python, Data Engineering

Big Data Trainee

Feb 2019 – Jul 2019 · 5 mos

Overview:
Created production-grade data pipelines for a hospitality client
Details:
Created an ETL pipeline that migrated all tables and schemas from the staging Hive environment to the prod Hive environment, handling edge cases such as partitioned, bucketed, and external tables using regex, etc.
Created ETL pipelines to ingest data from various business APIs, transform it, and load it into data warehouses
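The source says only that regex was used to handle partitioned, bucketed, and external Hive tables during migration. It does not say which patterns. As one possible approach, the sketch below classifies a table from its `SHOW CREATE TABLE` output and rewrites the database name in the DDL. The function names and the sample database names are hypothetical.

```python
import re


def classify_hive_table(ddl: str) -> dict:
    """Detect table traits from the text of a Hive CREATE TABLE statement."""
    flags = re.IGNORECASE
    return {
        "external": bool(re.search(r"^\s*CREATE\s+EXTERNAL\s+TABLE", ddl, flags)),
        "partitioned": bool(re.search(r"\bPARTITIONED\s+BY\s*\(", ddl, flags)),
        "bucketed": bool(re.search(r"\bCLUSTERED\s+BY\s*\(", ddl, flags)),
    }


def retarget_database(ddl: str, source_db: str, target_db: str) -> str:
    """Rewrite the database qualifier in the CREATE TABLE header only."""
    pattern = rf"(TABLE\s+`?){re.escape(source_db)}(`?\.)"
    # A function replacement avoids backslash-escape surprises in target_db.
    return re.sub(
        pattern,
        lambda m: m.group(1) + target_db + m.group(2),
        ddl,
        count=1,
        flags=re.IGNORECASE,
    )
```

Partitioned tables usually need an extra step after the DDL is replayed (for example, re-registering partitions), and external tables need their data location handled separately. Neither step is shown here.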

ETL, Hive, Data Pipelines, Data Engineering