Nishchay Agrawal 🇮🇳

AI Researcher

Bengaluru, Karnataka, India · 5 yrs 1 mo experience

Key Highlights

  • Gold Medalist in Computer Science Engineering
  • Led cost-saving initiatives saving $2M at Walmart
  • Developed innovative GenAI solutions for data validation
Stackforce AI infers this person is a Data Engineering expert specializing in cloud solutions and big data technologies.

Skills

Core Skills

Data Engineering, Cloud Cost Optimization, Generative AI, Data Validation, Automation, Cloud Migration, Data Ingestion, Big Data, Cost Optimization, Cloud Engineering, Data Governance, Data Integration, Data Processing, Cloud Architecture, ETL Processes

Other Skills

Retrieval-Augmented Generation (RAG), Google ADK, Airflow, GenAI, Data Visualization, SQL, Spark, Python, Milvus vector DB, Chainlit, PySpark, Spline, Apache Spark, Kafka, GCS

About

Experienced Senior Data Engineer at Walmart with a demonstrated track record in the information technology and services industry. Strong engineering professional with a B.Tech in CSE, and certified Gold Medalist in B.Tech Computer Science & Engineering, Uttarakhand Technical University ❤️

Relational Databases: DB2, PostgreSQL, MySQL, Aurora Database
Big Data Technologies & Tech Stacks: Hadoop, Spark, Zeppelin, Hive, PySpark, HDFS, Kafka Connect, Spark Structured Streaming, Confluent
Orchestration Tools: Airflow, Autosys Job Scheduler, Azkaban
On-Cloud Data Lakehouse Framework: AWS Databricks, Azure Databricks, Metabase
Open Source Tools: Spline Data Lineage, Presto, SQLGlot
Generative AI: GenAI-based LLM models (GPT-3.5, Gemini), Agentic AI, RAG, LangChain, LangGraph, Google ADK, Chroma, Milvus vector database, Chainlit, MCP Server
Cloud Services: AWS, Azure, Docker, GCP, Spark Serverless
Data Warehouse & Data Modeling: Snowflake, Redshift, Data Model, No Code Analytics, DataHub Cataloging Tool, Druid OLAP, Superset, BigQuery
Cataloging Tool: DataHub, Data Observability, Data Governance, Data Discovery, Data Modeling
Data Lake: AWS Delta Lake, Hudi Data Lake
NoSQL: MongoDB, Cassandra
OOPS & Languages: Python, SQL, Java Spring Boot, JPA, Hibernate, Flask, React
DevOps Tools: Kubernetes, AWS EKS, Docker, Helm Chart, CI/CD Jenkins, Terraform, SonarQube Code Coverage

I share insights about my journey in data engineering, including how to crack top product companies, data engineering roadmaps, and more on my LinkedIn. I thoroughly enjoy this rewarding journey.

✅ Connect with me for Long Term Data Engineering Mentorship Program: https://www.preplaced.in/profile/nishchay-agrawal
✅ 1:1 Connect with me on Crack Data Engineering Role in Top MNC Companies & much more: https://topmate.io/nishchay_agrawal
✅ Twitter: https://twitter.com/NishchayAg8447
✅ Instagram: https://www.instagram.com/dataengineeringwithnishchay/
✅ My Medium Blogs on Tech Data Engineering Interview Experience: https://medium.com/@nishchayagrawal

Experience

Total Experience: 5 yrs 1 mo
Average Tenure: 1 yr 3 mos
Current Experience: 2 yrs 10 mos

Walmart

2 roles

Software Development Engineer IV (Senior Data Engineer)

Promoted

Jun 2025 – Present · 11 mos · On-site

  • Department: Data Platformisation Team
  • Roles & Responsibilities: Worked as SDE IV (Senior Data Engineer) in the Walmart International Data Platformisation Team
  • 1. Built a GenAI-powered LLM solution leveraging Airflow metadata, ServiceNow incidents, operational excellence data, and storage and cluster cost optimization data. Enhanced user interaction by embedding LIDA-based GenAI visualization in the chatbot to dynamically generate and display output graphs, enabling faster incident triage and improved operational visibility.
  • 2. Created GenAI Data Validation Assist, which generates SQL from Spark logs to validate pipeline outputs, reducing QA effort by 70%.
  • 3. Driving the cloud cost efficiency strategy for the International Data Lake, resulting in approximately $2M in savings through storage and compute cost optimization.
  • 4. Developed an agentic solution for automated data pipeline creation using a config-driven framework that transforms PRDs (Product Requirement Documents) into complete end-to-end pipelines. Leveraged ADK multi-agents for orchestration, Milvus vector DB for semantic search, Chainlit and a React UI for self-service configuration, and a Python API with Jinja templates for backend automation. Integrated Gemini and GPT-4 models into the ADK-based multi-agent framework to automate complex data engineering workflows.
  • 5. Built multi-agent workflows with LangGraph, visualized orchestration in LangGraph Studio, and enhanced observability with LangSmith, delivering end-to-end monitoring, debugging, and performance tracking for multi-agent systems.
  • 6. Worked on open-source data governance platforms, implementing Spark lineage capture using Spline.
  • 7. Managed a team of 5 junior engineers as Technical Lead, delivering end-to-end data engineering solutions.
  • 8. Developed a team productivity platform to capture operational excellence and engineering scores, integrating DORA metrics, Git, code quality, and Jira metrics across international domain teams.
Retrieval-Augmented Generation (RAG), Google ADK, Data Engineering, Cloud Cost Optimization

Software Development Engineer III (Data Engineer-3)

Jul 2023 – Jun 2025 · 1 yr 11 mos · On-site

  • Department: Data Lake Management Team
  • Roles & Responsibilities: Worked as SDE III (Data Engineer-3) in Walmart International, Data Lake Management
  • 1. Built a custom ingestion framework using Apache Spark (Structured & Streaming) to integrate Kafka & API data, delivering into Hudi, BigQuery & GCS with real-time error handling & schema evolution.
  • 2. Developed a Kafka Connect self-service UI for near real-time DB → Kafka → GCS/BigQuery/Hudi pipelines on Kubernetes, replacing Sqoop & cutting ingestion latency drastically.
  • 3. Optimized Airflow by tuning configs & making DAG-level framework changes, reducing DAG parsing time from 10 to 5 minutes & cutting queued tasks, driving 100% operational improvement.
  • 4. Achieved $29K annual savings by optimizing Spark History Server logs with GCS lifecycle policies.
  • 5. Built automation utilities for GCP VM cost optimization, reducing infra spend via scaling policies.
  • 6. Led GCP POCs (BigQuery, Dataproc, Pub/Sub, Cloud Run) to guide cost-efficient enterprise migration.
  • 7. Designed a Config-Driven Data Platform on serverless GCP, ingesting Kafka → Hudi with MOR/COW storage; saved $400K annually.
  • 8. Built Databricks Overwatch dashboards for job, cluster & cost visibility, enabling anomaly detection.
  • 9. Migrated 5,000+ ETL jobs to a modern framework for international markets using automation & self-service utilities.
  • 10. Integrated Hudi data lake for upserts/deletes, modernizing legacy Hive processes.
  • 11. Oversaw a vendor team of 30 managing 7,000+ Airflow DAGs, ensuring reliability & leadership reporting.
  • 12. Migrated from Dataproc → Dataproc Serverless, enabling auto-scaling, faster deployments & cost savings.
  • 13. Standardized CI/CD pipelines (Jenkins, Ansible, SonarQube) with unit test cases, improving deployment efficiency & code quality.
  • 14. Built a GenAI chatbot for the International Data Lake using OpenAI, LangChain, Milvus vector database, embedding models, and RAG, enabling conversational access to operational and platform data.
PySpark, Spline, Data Engineering, Cloud Migration
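As an illustration of item 4 above (expiring Spark History Server logs with GCS lifecycle policies), a lifecycle configuration could be generated roughly as follows. This is a minimal sketch under stated assumptions, not the policy used at Walmart: the `spark-history/` prefix, the 30-day age, and the function name are hypothetical, and only the standard `Delete` action with the `age` and `matchesPrefix` conditions is used.

```python
import json


def build_log_expiry_policy(prefix, max_age_days):
    """Build a GCS lifecycle configuration that deletes old objects under a prefix.

    The prefix and age are illustrative inputs, not values from the profile.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "age": max_age_days,        # days since object creation
                    "matchesPrefix": [prefix],  # limit the rule to log objects
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and retention window, printed as a JSON document.
    print(json.dumps(build_log_expiry_policy("spark-history/", 30), indent=2))
```

A JSON document of this shape can then be applied as the bucket's lifecycle configuration, so old event logs are removed without a scheduled cleanup job.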

Meesho

Software Development Engineer II (Data Engineer-2)

Jun 2022 – Jul 2023 · 1 yr 1 mo · Bengaluru, Karnataka, India

  • Department: Data team
  • Roles & Responsibilities: Worked as SDE II (Data Engineer-2) in the Data Intelligence Team
  • 1. Worked on the Zeppelin Databricks SQL connector, so that queries users run in Zeppelin execute on a Databricks SQL Warehouse. This lets Zeppelin users impersonate queries on an RBAC-enabled cluster, with SSO enabled on the Databricks cluster.
  • 2. Set up Metabase cache warm-up, which speeds up queries by caching only active Metabase cards/questions instead of caching failed questions.
  • 3. Worked on a Spring Boot service and developed a custom REST API that generates a Personal Access Token for each Zeppelin EKS user, so that queries run from Zeppelin EKS on Databricks can be tracked in Query History and instrumented per user using a Service Principal token.
  • 4. Worked on DataHub governance using glossary terms, tags, and domains, helping the whole organization discover data and identify the data team domain or pod it belongs to.
  • 5. Worked on data observability tasks to identify anomalies in tables and models.
  • 6. Developed an anomaly detection algorithm based on standard deviation, median, effective z-score, and IQR to detect flaws in any column of the model tables.
  • 7. Ran multiple POCs and comparisons covering Unity Catalog, Spline Agent, and MonteCarlo Spark.
  • 8. Built dashboards showing metrics such as the number of running jobs and cluster cost at notebook, user, and command level, using Overwatch and Databricks audit logs.
  • 9. Worked on an Experiment Metrics Store along with A/B testing, which helped increase analyst productivity.
  • 10. Worked on Kubernetes and related DevOps services such as AWS S3 policies, Databricks instance profiles, Assume Role, and node group management for Presto.
  • 11. Set up DataHub using RDS MySQL, Kafka Registry, and Elasticsearch, and containerized the DataHub service with Docker.
PySpark, Apache Kafka, Data Engineering, Data Governance
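The anomaly detection idea in item 6 above (medians, z-scores, and IQR applied to column values) can be sketched as below. This is an illustrative example only, not the algorithm built at Meesho: the function names, the 1.5 IQR multiplier, and the 3.5 cutoff are conventional defaults chosen as assumptions, and the median-based "modified z-score" is just one possible reading of the "effective z-score" mentioned in the list.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def modified_zscore_outliers(values, threshold=3.5):
    """Return values whose median-based (modified) z-score exceeds threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a robust stand-in for the standard deviation.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, nothing can be scored
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]
```

For a hypothetical column such as `[10, 11, 9, 10, 12, 10, 11, 95]`, both checks flag only the value 95. Median-based statistics are used here because a single extreme value shifts the mean and standard deviation far more than the median.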

Morgan Stanley

Data Engineer

Nov 2021 – May 2022 · 6 mos · Bengaluru, Karnataka, India

  • Worked as Data Engineer in Institutional Securities Technology Division.
  • Division: Institutional Securities Technology Division
  • Super Department: TEDRA (Trade Enrichment Data Reporting & Allocations), a super department in the Institutional Securities Technology (IST) Division that is responsible for maintaining, distributing, processing, and reporting on trading, revenue, risk, and reference data.
  • Department: Internal Trade Reporting Applications
  • Roles & Responsibilities:
  • 1. As Data Engineer, was responsible for ingesting data from different sources such as OLTP DB2, Sybase, and the OLAP server Greenplum into the data lake to generate BI reports for downstream stakeholders, using Spark processing and metadata & configuration gathering.
  • 2. Set up the architecture of the Databricks platform on Azure using Azure Active Directory services, service principal, and tenant. Managed the Azure Blob Storage services.
  • 3. Connected Databricks to the Snowflake data warehouse using the Spark Snowflake Connector with a service account. This pulls the driving fact and dimension tables from Snowflake, ingests them into the Databricks layer, and applies transformation logic on top to create the final views using PySpark, Spark SQL, and Databricks services (Delta Lake, DBX cluster).
  • 4. To schedule the Databricks job (which points to Python scripts mounted on the Databricks File System), wrote REST client code that calls Databricks jobs and clusters from the on-premise Autosys job scheduler, using REST endpoints, HTTP requests, Databricks services, the Azure tenant, a service principal client secret, and the MSAL library.
  • 5. Created high-level and low-level designs and documentation for the whole architectural setup of Databricks on Azure Cloud, integrated with Snowflake.
PySpark, Data Modeling, Data Engineering, Data Processing
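Item 4 above describes triggering a Databricks job over REST from an external scheduler. A minimal sketch of that kind of call is shown below, under stated assumptions: it targets the Jobs API `run-now` endpoint (version 2.1 is assumed here; the profile does not say which version was used), the workspace URL and job ID are hypothetical, and acquiring the Azure AD token with MSAL is deliberately left out, so the token is simply passed in.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id):
    """Build (but do not send) a POST request to the Databricks Jobs run-now endpoint.

    workspace_url, token, and job_id are caller-supplied; nothing here is
    taken from the original setup.
    """
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A scheduler-side wrapper script would pass the request to `urllib.request.urlopen` and parse the JSON response. Keeping request construction separate from sending makes the call easy to inspect and test without network access.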

ZS

2 roles

Data Engineer

Mar 2021 – Nov 2021 · 8 mos

  • Worked as Data Engineer (Business Technology Solutions Associate)
  • Tools & Technologies Used: Big Data, Spark, Databricks, Hive, AWS, Azkaban Job Scheduler, CloudBerry, ZS Internal ETL Tool, GitLab CI/CD, PostgreSQL, Agile Scrum
  • Roles & Responsibilities:
  • Team 1
  • 1. Created the IQ (Installation Qualification) script covering all the steps for Dev-to-QA and QA-to-Prod setup migrations.
  • 2. Created a Python script (using Matplotlib, Pandas, Jinja2, and Seaborn) to generate visualization graphs and dashboards for the number of unit test cases passed and the test coverage percentage.
  • 3. Enhanced the CI/CD pipeline script by adding new features and making it more generic, for example checking the status of the Databricks cluster before calling the Databricks job from the CI/CD job.
  • 4. Worked with a pharmaceutical client on requirement gathering and on data discrepancies in the source tables.
  • 5. Worked on a POC for Stardog (a graph database) and SPARQL. Stardog is the consolidation layer where the Data Fabric team lands raw data. My team consumed this data, and the same data/tables were used to create the final transformed Delta tables by applying transformation logic on top of the source tables.
  • Team 2
  • 1. Created the BRE (Business Rules Engine: graphs that show the joining or transformation of multiple tables) in the REVO ETL tool.
  • 2. Scheduled the ETL jobs (running on AWS Elastic MapReduce) through the Azkaban scheduler.
  • 3. Managed the Parquet files in the source S3 bucket and uploaded source data (received via FileZilla) to the S3 bucket using CloudBerry (used for managing files across local and cloud storage).
  • 4. Applied DQM checks and row-count comparisons between the target Hive tables created through the BRE and the source tables.
PySpark, Data Modeling, Data Engineering, ETL Processes

Data Engineer Intern

Oct 2020 – Feb 2021 · 4 mos

  • Worked as Data Engineer Intern (Business Technology Analyst)
  • Tools & Technologies Used: AWS EC2, S3, Redshift, Glue ETL, Databricks, Python, SQL, PySpark, GitLab CI/CD Orchestration Tool, Agile Scrum, Confluence for Documentation Creation
  • Role: Business Technology Analysts help companies define and execute their technology strategy by designing, building, and operating their business intelligence (BI), cloud, data management, dashboard, and analytics capabilities. Team members strategize, design, and build custom IT solutions to improve our clients' commercial effectiveness.
  • Responsibilities:
  • 1. Created the Vacuum utility code for Delta tables, to clean up the Delta data files and log files for Delta tables on AWS Databricks.
  • 2. Helped design and deliver a Cost of Goods Manufactured (COGM) and Cost of Goods Sold (COGS) solution for a pharmaceutical client of ZS Associates, implemented using PySpark, SQL, and Python.
  • 3. Worked on the design architecture and code to create configuration for higher-level environments (test and production) for Release-2.
  • 4. Beyond the technical work, ensured that everything was properly documented so that other developers could be onboarded easily.
  • 5. Scheduled and monitored the GitLab CI/CD pipeline, which calls the Databricks job, writes the data to an S3 bucket, and creates the PostgreSQL entries (holding metadata and configuration for the Databricks job).
  • 6. Created the Source to Target Mapping (STTM) entries, which describe every column in the final table and map each column to the source tables it comes from in the final Delta table.
PySpark, Data Modeling
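The Vacuum utility in item 1 above can be pictured as a small helper that issues Delta Lake `VACUUM` statements for a list of tables. The sketch below is an assumption about its shape, not the original code: the function name and table names are hypothetical, and the 168-hour retention mirrors Delta Lake's default of 7 days.

```python
def build_vacuum_statements(tables, retain_hours=168, dry_run=False):
    """Return one Delta Lake VACUUM statement per table.

    VACUUM removes data files that are no longer referenced by the table
    and are older than the retention window.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    suffix = " DRY RUN" if dry_run else ""
    return [f"VACUUM {table} RETAIN {retain_hours} HOURS{suffix}" for table in tables]


# Hypothetical table names; in a Databricks notebook each statement
# could be executed with spark.sql(statement).
statements = build_vacuum_statements(["sales.orders", "sales.returns"], dry_run=True)
```

Running with `dry_run=True` first lists the files that would be deleted, which is a cautious way to validate the retention window before removing anything.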

Education

Uttarakhand Technical University

Bachelor of Technology - BTech, Computer Science and Engineering

Jan 2017 – Jan 2021

M.K.D Senior Secondary School

Intermediate

Jan 2016 – Jan 2017

M.K.D Senior Secondary School

High School

Jan 2014 – Jan 2015
