Karan Sawlani

AI Researcher

Dubai, United Arab Emirates · 7 yrs 3 mos experience
AI Enabled · Highly Stable

Key Highlights

  • Expert in building GenAI-powered data engineering pipelines.
  • Proven track record in implementing data governance strategies.
  • Skilled in creating cloud-agnostic data solutions.
Stackforce AI infers this person is a Data Engineering expert with strong capabilities in cloud and GenAI technologies.

Skills

Core Skills

Data Engineering · GenAI · Cloud Engineering

Other Skills

ACR · AWS · AWS Identity and Access Management (AWS IAM) · AWS RDS · AWS S3 · Airflow · Amazon EC2 · Amazon EKS · Amazon Elastic Container Registry (ECR) · Amazon Elastic MapReduce (EMR) · Amazon Relational Database Service (RDS) · Amazon S3 · Amazon VPC · Amazon Web Services (AWS) · Apache Flume

About

A professional Data and AI Engineer and DevOps enthusiast who builds data recipes with coding hands topped with GenAI fingers, filling the tummy with insights so that the mind can make decisions and unlock new possibilities.

Tech Stack:

  • Programming Languages: Python (DSA), Bash, HCL
  • Databases: MySQL, Postgres
  • Data Warehouse: Vertica, Snowflake, Hive, Databricks
  • Data Modelling: DBeaver, PyCharm Big Data Tools
  • Data Documentation: Draw.io, Mermaid, Lucidchart, Miro, etc.
  • Version Control: GitHub, GitLab, Bitbucket
  • IDE: PyCharm, Sublime
  • Deployments: Docker, Kubernetes (EKS, AKS, GKE), Helm, Terraform
  • Operating Systems: Linux (commands and scripting), Windows, macOS
  • Data Engineering: PySpark, Hadoop, Pig, Sqoop, Kafka, Data Warehousing, Data Modelling, Data Lake Design, Lakehouse Design, Delta Lake, Oozie, Flume, Apache NiFi, Apache Airflow
  • Clouds:
  • GCP: VPN, Firewalls, VM, GCS, IAM, Google Functions, GKE, GCR, Cloud NAT, DataProc
  • Microsoft Azure: VNet, NSG, VM, ADLS/ABFS, IAM, Azure Functions, AKS, ACR, AD
  • AWS: VPC, NAT, SG, NACL, EC2, S3, IAM, EMR, Glue, Lambda, QuickSight, Athena, Redshift, EKS, ECR
  • GenAI: LLMs (GPT, Llama), OpenAI APIs, LangChain, LangGraph, Langfuse, Chainlit, Streamlit, Hugging Face

Certifications:

  • Databricks Certified GenAI Associate
  • Databricks Certified Machine Learning Associate
  • Cloudera Certified Spark & Hadoop Developer
  • Databricks Certified Associate Spark 3.0 Developer
  • Databricks Certified Associate Data Analyst
  • Databricks Certified Associate Data Engineer
  • Databricks Certified Professional Data Engineer
  • Databricks Certified Lakehouse Fundamentals Accreditation
  • Databricks GenAI Fundamentals
  • Gold Badge in Python (from HackerRank)
  • Gold Badge in SQL (from HackerRank)

Experience

Total Experience: 7 yrs 3 mos
Average Tenure: 2 yrs 9 mos
Current Experience: 1 yr 7 mos

Ras Al Khaimah Economic Zone (RAKEZ)

Data & AI Engineer

Nov 2024 - Present · 1 yr 7 mos · Dubai, United Arab Emirates · On-site

  • Overview:
  • Leading and executing the enterprise-wide DAMA (Data Management Association) strategy
  • Developing a cloud-agnostic, metadata-driven Python framework for data ingestion and data quality, integrating with diverse sources such as APIs, SAP, Salesforce, RDBMS, and cloud platforms
  • Building GenAI-powered data engineering pipelines to unlock business value and generate insights based on specific use cases
  • Managing and leading the data team to meet the day-to-day data requirements of data analysts and data scientists
  • Details:
  • Leading the implementation of the DAMA strategy encompassing Data Governance, Data Modeling, Data Warehousing, Data Transformation, Data Cleansing, Master Data Management (MDM), Data Monitoring, Data Security, and the development of scalable, production-grade ETL pipelines
  • Building a reusable Python-based Data Ingestion and Quality framework to automate pipeline creation and reduce manual effort by 40%
  • Designing a Python logging utility to automate logging and enhance observability across codebases and data pipelines
  • Actively contributing to Data Architecture decisions for constructing a Data Lake by integrating data from systems like SAP ERP, SAP HANA, Salesforce, Google BigQuery, AWS AppFlow, and more
  • Developing GenAI-powered data engineering pipelines to overcome business challenges and derive actionable insights tailored to specific use cases
Python · Data Governance · Data Modeling · Data Warehousing · Data Transformation · Data Quality · +2
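
To give a rough idea of what a logging utility like the one mentioned above might look like, the sketch below wraps a pipeline step in a decorator that logs its start, duration, and any failure. The names `log_step` and `load_orders` are hypothetical and are not taken from the actual codebase; only the Python standard library is assumed.

```python
import functools
import logging
import time

logger = logging.getLogger("pipeline")


def log_step(func):
    """Log start, outcome, and duration of a pipeline step (hypothetical helper)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("step=%s status=started", func.__name__)
        started = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback; the error is then re-raised
            logger.exception("step=%s status=failed", func.__name__)
            raise
        elapsed = time.perf_counter() - started
        logger.info("step=%s status=succeeded seconds=%.3f", func.__name__, elapsed)
        return result

    return wrapper


@log_step
def load_orders(rows):
    # Placeholder transformation: keep only rows with a positive amount
    return [row for row in rows if row["amount"] > 0]
```

Decorating each step keeps the logging format consistent across pipelines without repeating log calls in every function.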

BCG X

2 roles

Senior Data Engineer

Promoted

Oct 2023 - Nov 2024 · 1 yr 1 mo

  • Overview:
  • Created a cloud-agnostic software/data engineering Python library to submit Spark jobs remotely on any cloud platform (AWS, Azure, GCP, or Databricks)
  • Details:
  • Created a software/data engineering Python library that submits and monitors Spark jobs remotely on any cloud platform, mainly AWS, Azure, GCP, and Databricks. It also controls the end-to-end Spark cluster lifecycle: cluster start and end states, job submission state, error state, bootstrapping, installation of additional libraries, etc
  • The library can run in any environment, such as a local machine or an edge node, or be triggered from Airflow, AWS Lambda, Azure Step functions, etc
  • Created unit and integration tests to reach 90% code coverage and added pytest HTML reports to present the results visually
  • Implemented coding best practices including configuring a pyproject.toml file, integrating pre-commit hooks, and using tools such as Black, Flake8, and isort for code formatting and linting. Set up unit tests and generated test reports stored in artifacts. Created versioned wheel and tar files for installation on remote clusters. Added code coverage information to GitHub badges. Automated the entire process as part of the CI/CD pipeline, ensuring seamless integration and deployment. Finally, published documentation on GitHub Pages
  • Created GitHub Actions CI/CD pipelines to auto-deploy code changes by building the wheel/tar files, deploying them to the artifact repositories, and running the jobs synchronously
Python · AWS · Azure · GCP · Databricks · CI/CD · +3

Data Engineer

Apr 2021 - Oct 2023 · 2 yrs 6 mos

  • Overview:
  • Created a Customer 360 data model (for customer insights and analytics) for a fashion retail client
  • Details:
  • Created a Customer 360 data model to analyze 5 TB of customer data in terms of email engagement (CTR, CTOR, OR, CR), RFM analysis, and channel-based analysis (online, offline, cross-platform, etc.)
  • Performed EDA (exploratory data analysis) using the waterfall model, funnels, etc.
  • Built visualizations using pivot tables and charts in Excel, etc
  • Overview:
  • Created the end-to-end infrastructure layer to deploy the Data Engineering and DevOps tech stack for the personalization use case, to help automate and kick-start the 2-month-long project in 4-5 days
  • Details:
  • Created Terraform code to auto-deploy AWS resources such as EC2, VPC, ECR, and EMR with their related networking stack; maintained connectivity between the public and private subnets by configuring security groups; installed Docker on the private-subnet EC2 instance; and installed Airflow in the Docker image
  • Created the same environment in Azure using VM, VNet, and Databricks, and in GCP using VM, VPN, Dataproc, etc.
  • Created Terraform code to auto-deploy K8s infrastructure on AWS (EKS, ECR), Azure (AKS, ACR), and GCP (GKE, GCR), including security-related configurations for public and private K8s clusters; installed Apache Airflow inside these clusters using the official Helm charts
  • Created CI/CD pipelines using GitHub Actions to run linting and unit tests, deploy the software engineering tar files to artifact storage (DBFS/GCS/S3/ABFS/ADLS), run Veracode security scans, etc
Data Modeling · ETL · Data Analysis · Python · Hadoop · Data Engineering
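
For readers unfamiliar with the RFM analysis mentioned in the Customer 360 work, the sketch below shows the basic calculation: recency (days since the last order), frequency (number of orders), and monetary value (total spend) per customer. It is plain Python on a toy dataset purely to show the logic; the function name `rfm_summary` and the input shape are assumptions, and at the 5 TB scale described above this would more plausibly run in a distributed engine such as Spark.

```python
from datetime import date


def rfm_summary(orders, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `orders` is an iterable of (customer_id, order_date, amount) tuples;
    this input shape is an assumption made for the example.
    """
    summary = {}
    for customer_id, order_date, amount in orders:
        last_date, count, total = summary.get(customer_id, (order_date, 0, 0.0))
        summary[customer_id] = (max(last_date, order_date), count + 1, total + amount)
    return {
        cid: ((as_of - last_date).days, count, total)
        for cid, (last_date, count, total) in summary.items()
    }


orders = [
    ("c1", date(2023, 1, 10), 40.0),
    ("c1", date(2023, 3, 1), 60.0),
    ("c2", date(2022, 12, 25), 15.0),
]
result = rfm_summary(orders, as_of=date(2023, 3, 31))
# result["c1"] is (30, 2, 100.0): last order 30 days ago, 2 orders, 100.0 spent
```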

To The New

2 roles

Data Engineer

Aug 2019 - Apr 2021 · 1 yr 8 mos

  • Overview:
  • Created a production-grade Data Warehousing/Modelling solution for a retail client
  • Details:
  • Created an ETL pipeline that fetched raw data from the landing S3 data lake, transformed it, wrote it to the staging S3 data lake environment, and then loaded it into the Vertica data warehouse. This included tasks such as orphan record handling, metadata management, logging table creation, data cleaning, and creating data marts and analytics/reporting tables
  • Created a data warehouse for global customer records, implementing techniques such as SCD Type 1 and SCD Type 2 and schemas such as star and snowflake; the warehouse was populated from different sources by various Airflow ETL pipelines
ETL · Data Warehousing · Airflow · Python · Data Engineering
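
As a brief illustration of the SCD Type 2 technique named above, the sketch below keeps history by closing the current version of a row and appending a new one when a tracked attribute changes (SCD Type 1 would simply overwrite the value). It is a minimal in-memory example, not the warehouse implementation; the function name `apply_scd2` and the columns `valid_from`, `valid_to`, and `is_current` are assumptions.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Apply a Type 2 change to `dimension` (a list of dict rows) in place.

    Column names `valid_from`, `valid_to`, and `is_current` are assumptions.
    """
    current = next(
        (row for row in dimension if row[key] == incoming[key] and row["is_current"]),
        None,
    )
    if current is not None:
        if all(current[col] == incoming[col] for col in tracked):
            return dimension  # nothing changed, keep the current version
        # Close the old version instead of overwriting it
        current["valid_to"] = load_date
        current["is_current"] = False
    dimension.append(
        {**incoming, "valid_from": load_date, "valid_to": None, "is_current": True}
    )
    return dimension


dim = []
apply_scd2(dim, {"customer_id": 1, "city": "Delhi"}, "customer_id", ["city"], date(2020, 1, 1))
apply_scd2(dim, {"customer_id": 1, "city": "Dubai"}, "customer_id", ["city"], date(2021, 1, 1))
# dim now holds two versions of customer 1; only the "Dubai" row is current
```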

Big Data Trainee

Feb 2019 - Jul 2019 · 5 mos

  • Overview:
  • Created production-grade data pipelines for a hospitality client
  • Details:
  • Created an ETL pipeline that migrated all tables and schemas from the staging Hive environment to the prod Hive environment, handling edge cases such as partitioned, bucketed, and external tables using regex, etc
  • Created ETL pipelines to ingest data from various business-related APIs, transform it, and load it into data warehouses
ETL · Hive · Data Pipelines · Data Engineering

Education

Raj Kumar Goel Institute of Technology, Ghaziabad

Bachelor of Technology - BTech — Computer Science

Jan 2015 - Jan 2019

Puranchandra Vidyaniketan (PCVN), Barra, Kanpur (U.P.)

Science(Physics

Jan 2011 - Jan 2015
