Nishchay Agrawal 🇮🇳 — AI Researcher

Experienced Senior Data Engineer at Walmart with a demonstrated history of working in the information technology and services industry. Strong engineering professional with a B.Tech in CSE, and 𝐆𝐨𝐥𝐝 𝐌𝐞𝐝𝐚𝐥𝐢𝐬𝐭 𝐢𝐧 𝐁.𝐭𝐞𝐜𝐡 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐫 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 & 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠, 𝐔𝐭𝐭𝐚𝐫𝐚𝐤𝐡𝐚𝐧𝐝 𝐓𝐞𝐜𝐡𝐧𝐢𝐜𝐚𝐥 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 ❤️
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞𝐬: DB2, PostgreSQL, MySQL, Aurora Database
𝐁𝐢𝐠 𝐃𝐚𝐭𝐚 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 & 𝐓𝐞𝐜𝐡 𝐒𝐭𝐚𝐜𝐤𝐬: Hadoop, Spark, Zeppelin, Hive, PySpark, HDFS, Kafka Connect, Spark Structured Streaming, Confluent
𝐎𝐫𝐜𝐡𝐞𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧 𝐓𝐨𝐨𝐥𝐬: Airflow, Autosys Job Scheduler, Azkaban
𝐎𝐧-𝐂𝐥𝐨𝐮𝐝 𝐃𝐚𝐭𝐚𝐋𝐚𝐤𝐞𝐡𝐨𝐮𝐬𝐞 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤: AWS Databricks, Azure Databricks, Metabase
𝐎𝐩𝐞𝐧 𝐒𝐨𝐮𝐫𝐜𝐞 𝐓𝐨𝐨𝐥𝐬: Spline Data Lineage, Presto, SQLGlot
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: GenAI LLMs (GPT-3.5, Gemini), Agentic AI, RAG, LangChain, LangGraph, Google ADK, Chroma, Milvus vector database, Chainlit, MCP Server
𝐂𝐥𝐨𝐮𝐝 𝐒𝐞𝐫𝐯𝐢𝐜𝐞𝐬: AWS, Azure, Docker, GCP, Spark Serverless
𝐃𝐚𝐭𝐚 𝐖𝐚𝐫𝐞𝐡𝐨𝐮𝐬𝐞 & 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠: Snowflake, Redshift, Data Model, No Code Analytics, DataHub Cataloging Tool, Druid OLAP, Superset, BigQuery
𝐂𝐚𝐭𝐚𝐥𝐨𝐠𝐢𝐧𝐠 𝐓𝐨𝐨𝐥: DataHub, Data Observability, Data Governance, Data Discovery, Data Modeling
𝐃𝐚𝐭𝐚 𝐋𝐚𝐤𝐞: AWS Delta Lake, Hudi Data Lake
𝐍𝐨𝐒𝐐𝐋: MongoDB, Cassandra
𝐎𝐎𝐏𝐒 & 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞𝐬: Python, SQL, Java Spring Boot, JPA, Hibernate, Flask, React
𝐃𝐞𝐯𝐎𝐩𝐬 𝐓𝐨𝐨𝐥𝐬: Kubernetes, AWS EKS, Docker, Helm Chart, CI/CD Jenkins, Terraform, SonarQube Code Coverage
I share insights about my journey in data engineering on LinkedIn, including how to crack top product companies, data engineering roadmaps, and more. I thoroughly enjoy this rewarding journey.
✅ 𝐂𝐨𝐧𝐧𝐞𝐜𝐭 𝐰𝐢𝐭𝐡 𝐦𝐞 𝐟𝐨𝐫 𝐋𝐨𝐧𝐠 𝐓𝐞𝐫𝐦 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐌𝐞𝐧𝐭𝐨𝐫𝐬𝐡𝐢𝐩 𝐏𝐫𝐨𝐠𝐫𝐚𝐦 https://www.preplaced.in/profile/nishchay-agrawal
✅ 1:1 𝐂𝐨𝐧𝐧𝐞𝐜𝐭 𝐰𝐢𝐭𝐡 𝐦𝐞 𝐨𝐧 𝐂𝐫𝐚𝐜𝐤 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐑𝐨𝐥𝐞 𝐢𝐧 𝐓𝐨𝐩 𝐌𝐍𝐂 𝐂𝐨𝐦𝐩𝐚𝐧𝐢𝐞𝐬 & 𝐦𝐮𝐜𝐡 𝐦𝐨𝐫𝐞 https://topmate.io/nishchay_agrawal
✅ 𝐓𝐰𝐢𝐭𝐭𝐞𝐫: https://twitter.com/NishchayAg8447
✅ 𝐈𝐧𝐬𝐭𝐚𝐠𝐫𝐚𝐦: https://www.instagram.com/dataengineeringwithnishchay/
✅ 𝐌𝐲 𝐌𝐞𝐝𝐢𝐮𝐦 𝐁𝐥𝐨𝐠𝐬 𝐨𝐧 𝐓𝐞𝐜𝐡 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐄𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐜𝐞: https://medium.com/@nishchayagrawal

Stackforce AI infers this person is a Data Engineering expert specializing in cloud solutions and big data technologies.

Location: Bengaluru, Karnataka, India

Experience: 5 yrs 1 mo

Skills

Data Engineering
Cloud Cost Optimization
Generative AI
Data Validation
Automation
Cloud Migration
Data Ingestion
Big Data
Cost Optimization
Cloud Engineering
Data Governance
Data Integration
Data Processing
Cloud Architecture
ETL Processes

Career Highlights

Gold Medalist in Computer Science Engineering
Led cost-saving initiatives saving $2M at Walmart
Developed innovative GenAI solutions for data validation

Work Experience

Walmart

Software Development Engineer IV (Senior Data Engineer) (11 mos)

Software Development Engineer III (Data Engineer-3) (1 yr 11 mos)

Meesho

Software Development Engineer II (Data Engineer-2) (1 yr 1 mo)

Morgan Stanley

Data Engineer (6 mos)

ZS

Data Engineer (8 mos)

Data Engineer Intern (4 mos)

Education

Bachelor of Technology - BTech at Uttarakhand Technical University

Intermediate at M.K.D Senior Secondary School

High School at M.K.D Senior Secondary School

Skills

Other Skills

Retrieval-Augmented Generation (RAG)
Google ADK
Airflow
GenAI
Data Visualization
SQL
Spark
Python
Milvus vector DB
Chainlit
PySpark
Spline
Apache Spark
Kafka
GCS

Experience

Total Experience: 5 yrs 1 mo
Average Tenure: 1 yr 3 mos
Current Experience: 2 yrs 10 mos

Walmart

2 roles

Software Development Engineer IV (Senior Data Engineer)

Promoted

Jun 2025 – Present · 11 mos · On-site

𝐃𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭: Data Platformisation Team
𝐑𝐨𝐥𝐞𝐬 & 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: Worked as SDE IV (Senior Data Engineer) in Walmart International Data Platformisation Team
1. Built a GenAI-powered LLM solution leveraging Airflow metadata, ServiceNow incidents, operational excellence data, storage and cluster cost optimization data. Enhanced user interaction by embedding LIDA-based GenAI visualization in the chatbot to dynamically generate and display output graphs, enabling faster incident triage and improved operational visibility.
2. Created GenAI Data Validation Assist that generates SQL from Spark logs to validate pipeline outputs, reducing QA effort by 70%.
3. Driving the cloud cost efficiency strategy for the International Data Lake, resulting in approximately $2M in savings through optimization of storage and compute costs.
4. Developed an Agentic solution for automated data pipeline creation using a config-driven framework that transformed PRDs (Product Requirement Documents) into complete end-to-end pipelines. Leveraged ADK multi-agents for orchestration, Milvus vector DB for semantic search, Chainlit, React UI for self-service configuration, and a Python API with Jinja templates for backend automation. Integrated Gemini and GPT-4 models into an ADK-based multi-agent framework to automate complex data engineering workflows.
5. Built multi-agent workflows with LangGraph, visualized orchestration in LangGraph Studio, and enhanced observability with LangSmith, delivering end-to-end monitoring, debugging, and performance tracking for multi-agent systems.
6. Worked on open-source data governance platforms, implementing Spark lineage capture using Spline.
7. Managed a team of 5 junior engineers as Technical Lead, delivering end-to-end data engineering solutions.
8. Developed a team productivity platform to capture operational excellence and engineering scores, integrating DORA metrics, Git, code quality, and Jira metrics across international domain teams.

Skills: Retrieval-Augmented Generation (RAG), Google ADK, Data Engineering, Cloud Cost Optimization

Software Development Engineer III (Data Engineer-3)

Jul 2023 – Jun 2025 · 1 yr 11 mos · On-site

𝐃𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭: Data Lake Management Team
𝐑𝐨𝐥𝐞𝐬 & 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: Worked as SDE III (Data Engineer-3) in Walmart International Data Lake Management Team
1. Built a custom ingestion framework using Apache Spark (Structured & Streaming) to integrate Kafka & API data, delivering into Hudi, BigQuery & GCS with real-time error handling & schema evolution.
2. Developed Kafka Connect self-service UI for near real-time DB → Kafka → GCS/BigQuery/Hudi pipelines on Kubernetes, replacing Sqoop & cutting ingestion latency drastically.
3. Optimized Airflow by tuning configs & DAG-level framework changes, reducing parsing time from 10→5 mins & cutting queued tasks, driving 100% operational improvement.
4. Achieved $29K annual savings by optimizing Spark History Server logs with GCS lifecycle policies.
5. Built automation utilities for GCP VM cost optimization, reducing infra spend via scaling policies.
6. Led GCP POCs (BigQuery, Dataproc, Pub/Sub, Cloud Run) to guide cost-efficient enterprise migration.
7. Designed a Config-Driven Data Platform on serverless GCP, ingesting Kafka → Hudi with MOR/COW storage; saved $400K annually.
8. Built Databricks Overwatch dashboards for job, cluster & cost visibility, enabling anomaly detection.
9. Migrated 5,000+ ETL jobs to a modern framework for international markets using automation & self-service utilities.
10. Integrated Hudi data lake for upserts/deletes, modernizing legacy Hive processes.
11. Oversaw vendor team of 30 managing 7,000+ Airflow DAGs, ensuring reliability & leadership reporting.
12. Migrated from Dataproc → Dataproc Serverless, enabling auto-scaling, faster deployments & cost savings.
13. Standardized CI/CD pipelines (Jenkins, Ansible, SonarQube) with unit test cases, improving deployment efficiency & code quality.
14. Built GenAI chatbot using OpenAI, LangChain, Milvus vector database, Embedding models, RAG in International Data Lake, enabling conversational access to operational and platform data.

Skills: PySpark, Spline, Data Engineering, Cloud Migration

Meesho

Software Development Engineer II (Data Engineer-2)

Jun 2022 – Jul 2023 · 1 yr 1 mo · Bengaluru, Karnataka, India

𝐃𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭: Data team
𝐑𝐨𝐥𝐞𝐬 & 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: Worked as SDE II Data Engineer-2 in Data Intelligence Team
1. Worked on the Zeppelin Databricks SQL connector, so that queries users run in Zeppelin execute on a Databricks SQL Warehouse. It lets Zeppelin users impersonate queries against an RBAC-enabled cluster, with SSO enabled on the Databricks cluster.
2. Worked on the Metabase cache warmup setup, which speeds up queries by caching only active Metabase cards/questions rather than failed questions.
3. Worked on a Spring Boot service and developed a custom REST API to generate a Personal Access Token for each Zeppelin EKS user, so that whenever queries run on Zeppelin EKS through Databricks, we can track query history and maintain per-user instrumentation using a Service Principal token.
4. Worked on DataHub governance using glossary terms, tags & domains, which helps the whole organization discover or identify which domain or pod of the data team a dataset belongs to.
5. Worked on data observability tasks to identify anomalies present in tables or models.
6. Developed an anomaly detection algorithm using standard deviations, medians, effective z-scores & IQR, which helps detect flaws in any column of the model tables.
7. Worked on several POCs/comparisons of Unity Catalog, Spline Agent & MonteCarlo Spark.
8. Created dashboards showing metrics such as the number of running jobs and cluster cost at notebook, user, or command level, using Overwatch & Databricks audit logs.
9. Worked on the Experiment Metrics Store along with A/B testing, which helps increase analyst productivity.
10. Worked on Kubernetes along with DevOps services such as AWS S3 policies, Databricks Instance Profile, Assume Role, and Node Groups management for Presto.
11. Worked on the DataHub setup using RDS MySQL, Kafka Registry & ElasticSearch, and containerized the DataHub service with Docker.
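
The anomaly detection idea in item 6 can be illustrated with a small sketch. This is not the implementation used at Meesho: it assumes that "effective z-score" refers to the median/MAD-based modified z-score, and the function names, the 3.5 score threshold, and the 1.5 IQR multiplier are illustrative defaults rather than details from the profile.

```python
from statistics import median, quantiles


def modified_z_scores(values):
    """Modified z-score of each value, based on the median and the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # No spread around the median, so there is nothing to scale by.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def iqr_bounds(values, k=1.5):
    """Lower and upper fences at k * IQR outside the first/third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_anomalies(values, z_threshold=3.5, k=1.5):
    """Indices of values flagged by either the z-score rule or the IQR rule."""
    scores = modified_z_scores(values)
    low, high = iqr_bounds(values, k)
    return [
        i
        for i, (v, z) in enumerate(zip(values, scores))
        if abs(z) > z_threshold or v < low or v > high
    ]


# Hypothetical daily row counts for one table column; the last day spikes.
daily_row_counts = [100, 102, 98, 101, 99, 103, 97, 500]
print(find_anomalies(daily_row_counts))  # prints [7]
```

Median-based statistics are used here because a single extreme value shifts the mean and standard deviation far more than it shifts the median and IQR.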

Skills: PySpark, Apache Kafka, Data Engineering, Data Governance

Morgan Stanley

Data Engineer

Nov 2021 – May 2022 · 6 mos · Bengaluru, Karnataka, India

Worked as Data Engineer in Institutional Securities Technology Division.
𝐃𝐢𝐯𝐢𝐬𝐢𝐨𝐧: 𝐈𝐧𝐬𝐭𝐢𝐭𝐮𝐭𝐢𝐨𝐧𝐚𝐥 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐢𝐞𝐬 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲 𝐃𝐢𝐯𝐢𝐬𝐢𝐨𝐧
𝐒𝐮𝐩𝐞𝐫 𝐃𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭: 𝐓𝐄𝐃𝐑𝐀 (𝐓𝐫𝐚𝐝𝐞 𝐄𝐧𝐫𝐢𝐜𝐡𝐦𝐞𝐧𝐭 𝐃𝐚𝐭𝐚 𝐑𝐞𝐩𝐨𝐫𝐭𝐢𝐧𝐠 & 𝐀𝐥𝐥𝐨𝐜𝐚𝐭𝐢𝐨𝐧𝐬), a super department in the Institutional Securities Technology (IST) Division that is responsible for maintaining, distributing, processing, and reporting on trading, revenue, risk, and reference data.
𝐃𝐞𝐩𝐚𝐫𝐭𝐦𝐞𝐧𝐭: 𝐈𝐧𝐭𝐞𝐫𝐧𝐚𝐥 𝐓𝐫𝐚𝐝𝐞 𝐑𝐞𝐩𝐨𝐫𝐭𝐢𝐧𝐠 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧𝐬
𝐑𝐨𝐥𝐞𝐬 & 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬:
1. Worked as a Data Engineer responsible for ingesting data from different sources, such as OLTP DB2, Sybase, and the OLAP server Greenplum, into the data lake to generate BI reports for downstream stakeholders, using Spark processing and metadata & configuration gathering.
2. Set up the architecture of the Databricks platform on Azure using Azure Active Directory services, Service Principal, and Tenant, and managed the Azure Blob Storage services.
3. Connected Databricks to the Snowflake data warehouse using the Spark Snowflake Connector with a service account. This pulls the driving fact and dimension tables from Snowflake, ingests them into the Databricks layer, and applies transformation logic on top to create the final views using PySpark, Spark SQL, and Databricks services (Delta Lake, DBX cluster).
4. To schedule Databricks jobs (which point to Python scripts mounted on the Databricks File System), wrote REST API client code to call Databricks jobs and clusters from the on-premise Autosys job scheduler, using REST endpoints, HTTP requests, Databricks services, the Azure tenant, a Service Principal client secret & the MSAL library.
5. Created the High-Level & Low-Level Design and the documentation for the whole architectural setup of Databricks on Azure Cloud, integrated with Snowflake.
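
As a rough illustration of item 4, the sketch below builds the kind of HTTP request an external scheduler could send to trigger a Databricks job. It is not the code written at Morgan Stanley: the profile does not say which endpoint or API version was used, so the Jobs API 2.1 `run-now` endpoint is an assumption, and the function name, workspace URL, and job id are hypothetical placeholders. Token acquisition is left out; in a setup like the one described, the bearer token would come from the MSAL client-credentials flow (`ConfidentialClientApplication.acquire_token_for_client`) using the Service Principal's client secret.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, job_id, access_token):
    """Build (but do not send) a POST request that triggers a Databricks job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values only; nothing is sent over the network here.
request = build_run_now_request(
    "https://adb-0000000000000000.0.azuredatabricks.net",  # hypothetical workspace
    job_id=123,  # hypothetical job id
    access_token="<token acquired via MSAL>",
)
print(request.get_method(), request.full_url)
# A scheduler wrapper script would then call urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the wrapper easy to check without network access, which suits a script invoked from an on-premise scheduler.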

Skills: PySpark, Data Modeling, Data Engineering, Data Processing

ZS

2 roles

Data Engineer

Mar 2021 – Nov 2021 · 8 mos

𝐖𝐨𝐫𝐤𝐞𝐝 𝐚𝐬 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫 (𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐬 𝐀𝐬𝐬𝐨𝐜𝐢𝐚𝐭𝐞)
𝐓𝐨𝐨𝐥𝐬 & 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 𝐔𝐬𝐞𝐝: 𝐁𝐢𝐠 𝐃𝐚𝐭𝐚, 𝐒𝐩𝐚𝐫𝐤, 𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬, 𝐇𝐢𝐯𝐞, 𝐀𝐖𝐒, 𝐀𝐳𝐤𝐚𝐛𝐚𝐧 𝐉𝐨𝐛 𝐒𝐜𝐡𝐞𝐝𝐮𝐥𝐞𝐫, 𝐂𝐥𝐨𝐮𝐝𝐛𝐞𝐫𝐫𝐲, 𝐙𝐒 𝐈𝐧𝐭𝐞𝐫𝐧𝐚𝐥 𝐄𝐓𝐋 𝐓𝐨𝐨𝐥, 𝐆𝐢𝐭𝐥𝐚𝐛 𝐂𝐈/𝐂𝐃, 𝐏𝐨𝐬𝐭𝐠𝐫𝐞𝐒𝐐𝐋, 𝐀𝐠𝐢𝐥𝐞 𝐒𝐜𝐫𝐮𝐦
𝐑𝐨𝐥𝐞𝐬 & 𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬:
𝐓𝐞𝐚𝐦 1
1. Created the IQ (Installation Qualification) script covering all the steps required when migrating the setup from Dev to QA or from QA to Prod.
2. Created a Python script (using Matplotlib, Pandas, Jinja2 & Seaborn) to generate visualization graphs/dashboards for the number of unit test cases passed and the test coverage %.
3. Enhanced the CI/CD pipeline script by adding new features & making it more generic, such as checking the status of the Databricks cluster before calling the Databricks job from the CI/CD job.
4. Worked with a pharmaceutical client on requirement gathering and on data discrepancies in the source tables.
5. Worked on a POC for Stardog, a graph database, & SPARQL. Stardog is the consolidation layer/source of data where the Data Fabric team dumps raw data. My team consumes this data and applies transformation logic on top of the source tables to create the final transformed Delta tables.
𝐓𝐞𝐚𝐦 2
1. Created the BRE (Business Rules Engine) graphs, which show the joins and transformations across multiple tables, in the REVO ETL tool.
2. Scheduled the ETL jobs (running on AWS Elastic MapReduce) through the Azkaban scheduler.
3. Managed the Parquet files in the source S3 bucket & uploaded source data (received via FileZilla) to the S3 bucket using CloudBerry (used for managing files across local & cloud storage).
4. Applied DQM checks and row-count comparisons between the target Hive tables created through the BRE and the source tables.

Skills: PySpark, Data Modeling, Data Engineering, ETL Processes

Data Engineer Intern

Oct 2020 – Feb 2021 · 4 mos

𝐖𝐨𝐫𝐤𝐞𝐝 𝐚𝐬 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫 𝐈𝐧𝐭𝐞𝐫𝐧(𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲 𝐀𝐧𝐚𝐥𝐲𝐬𝐭)
𝐓𝐨𝐨𝐥𝐬 & 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐢𝐞𝐬 𝐔𝐬𝐞𝐝: 𝐀𝐖𝐒 𝐄𝐂2, 𝐒3, 𝐑𝐞𝐝𝐬𝐡𝐢𝐟𝐭, 𝐆𝐥𝐮𝐞 𝐄𝐓𝐋, 𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬, 𝐏𝐲𝐭𝐡𝐨𝐧, 𝐒𝐐𝐋, 𝐏𝐲𝐒𝐩𝐚𝐫𝐤, 𝐆𝐢𝐭𝐥𝐚𝐛 𝐂𝐈/𝐂𝐃 𝐎𝐫𝐜𝐡𝐞𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧 𝐓𝐨𝐨𝐥, 𝐀𝐠𝐢𝐥𝐞 𝐒𝐜𝐫𝐮𝐦, 𝐂𝐨𝐧𝐟𝐥𝐮𝐞𝐧𝐜𝐞 𝐟𝐨𝐫 𝐃𝐨𝐜𝐮𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐂𝐫𝐞𝐚𝐭𝐢𝐨𝐧
𝐑𝐨𝐥𝐞: 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐓𝐞𝐜𝐡𝐧𝐨𝐥𝐨𝐠𝐲 𝐀𝐧𝐚𝐥𝐲𝐬𝐭 helps companies define and execute their technology strategy by designing, building, and operating their business intelligence (BI), cloud, data management, dashboard, and analytics capabilities. Team members strategize, design and build custom IT solutions to improve our clients’ commercial effectiveness.
𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬:
1. Created the Vacuum utility code for Delta tables, to clean up the data files & log files of Delta tables on AWS Databricks.
2. Helped design and deliver the Cost of Goods Manufactured (COGM) and Cost of Goods Sold (COGS) solution for a pharmaceutical client of ZS Associates, implementing it using PySpark, SQL & Python.
3. Worked on the design architecture and code to create the configuration for higher-level environments (test and production) for Release-2.
4. Beyond the technical work, always ensured that proper documentation accompanied every piece of work, so that any other developer could get onboarded easily.
5. Worked on scheduling and monitoring the GitLab CI/CD pipeline, which calls the Databricks job, dumps the data into the S3 bucket, and creates the PostgreSQL entries (holding the metadata & configuration for the Databricks job).
6. Created the Source to Target Mapping (STTM) entries, which describe every column in the final table and map each column to its source table (i.e., which tables the columns of the final Delta table come from).