Abhishek Agrawal

Data Engineer

India · 8 yrs 11 mos experience

Key Highlights

  • 8+ years of experience in data engineering and analytics.
  • Expert in building scalable data solutions on Azure.
  • Proven track record in optimizing data pipelines and CI/CD processes.

Skills

Core Skills

Data Engineering · Azure Databricks · Azure Data Factory · Data Science · Data Analysis

Other Skills

Azure Data Lake · Azure · Azure DevOps Services · Continuous Integration and Continuous Delivery (CI/CD) · Apache Spark · SQL · PySpark · Git · Azure SQL Server · Azure Synapse Analytics · Python (Programming Language) · Predictive Modeling · Machine Learning · Microsoft Power BI

About

I'm an Azure Data Engineer with 8+ years of experience and a proven ability to deliver short- and long-term projects in data engineering, data warehousing, machine learning, and business intelligence. My passion is partnering with clients to deliver top-notch, scalable data solutions that provide immediate and lasting value. I completed my engineering degree (B.Tech) at NIT Raipur.

I specialize in the following data solutions:
✔️ Building end-to-end ETL pipelines using Azure cloud tools
✔️ Building migration processes from Hadoop clusters to Azure Databricks Spark clusters
✔️ Building data warehouses using modern cloud platforms and technologies
✔️ Creating and automating data pipelines and ETL processes
✔️ Building highly intuitive, interactive dashboards
✔️ Data cleaning, processing, and machine learning models
✔️ Data strategy advisory and technology selection/recommendation

Technologies I most frequently work with:
☁️ Cloud: Azure
☁️ Cloud tools: Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Data Lake, Azure Analysis Services, Azure DevOps, Azure Key Vault, Azure Active Directory
💬 Languages: SQL, Python, PySpark, Spark SQL, R, SAS, Dash
👨‍💻 Databases: SQL Server, Azure Synapse, Azure SQL Database
⚙️ Data integration/ETL: SAP HANA, Dynamics 365, EPM Onyx, QAD
📊 BI/visualization: Power BI, Excel
🤖 Machine learning: Jupyter Notebook, Python, Pandas, NumPy, statistics, probability

Experience

8 yrs 11 mos
Total Experience
1 yr 6 mos
Average Tenure
1 yr 6 mos
Current Experience

Aldi dx

Data Engineer

Nov 2024 – Present · 1 yr 6 mos · Mülheim an der Ruhr, North Rhine-Westphalia, Germany · On-site

  • Leading the shift from Hadoop to Azure Databricks Spark, ensuring a smooth transition with better performance.
  • Continuously improving Spark code to keep it efficient and aligned with business needs.
  • Structuring the codebase for easy scaling, making sure it handles growing data volumes smoothly.
  • Setting up a strong CI/CD process for Azure Data Factory and Databricks, including unit and automated testing for better code quality and reliable deployments.
  • Working closely with Data Scientists to provide accurate data on time, helping them generate valuable business insights.
  • Building data pipelines in Azure Data Factory to efficiently move data from older systems to the Azure cloud.
  • Designing pipelines that work across multiple countries, ensuring they meet global standards.
  • Writing flexible and scalable code that adapts to different country-specific requirements.
  • Encouraging cloud best practices within the team and providing guidance to maintain industry standards.
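The "flexible and scalable code that adapts to different country-specific requirements" above is typically achieved by keeping shared transformation logic and varying only a per-country configuration. A minimal sketch of that pattern; all paths, field names, and override values here are hypothetical illustrations, not the actual codebase:

```python
# Sketch of a country-parameterized pipeline configuration: shared defaults
# plus hypothetical per-country overrides, so one codebase serves all countries.
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    country: str          # ISO country code, e.g. "DE"
    source_path: str      # legacy-system landing zone (illustrative layout)
    target_path: str      # cloud lake destination, partitioned by country
    date_format: str      # country-specific date format in source files

# Hypothetical overrides for countries that deviate from the defaults.
_OVERRIDES = {
    "US": {"date_format": "MM/dd/yyyy"},
}

def build_config(country: str) -> PipelineConfig:
    """Build the pipeline settings for one country from shared defaults."""
    overrides = _OVERRIDES.get(country, {})
    return PipelineConfig(
        country=country,
        source_path=f"landing/{country.lower()}/orders",
        target_path=f"lake/bronze/orders/country={country}",
        date_format=overrides.get("date_format", "dd.MM.yyyy"),
    )
```

The same transformation code then runs once per country; only the config object varies, which is what keeps the pipelines consistent across markets.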
Azure Data Factory · Azure Data Lake · Azure Databricks · Azure · Azure DevOps Services · Continuous Integration and Continuous Delivery (CI/CD) · +4 more

Aldi süd

Azure Data Engineer

Feb 2023 – Feb 2025 · 2 yrs · Germany · On-site

  • Spearheading the migration process from Hadoop cluster to Azure Databricks Spark cluster, ensuring a smooth transition and enhanced performance.
  • Constantly optimizing Spark cluster code for optimal performance, proactively seeking opportunities for improvement to meet evolving business needs.
  • Industrializing the code base to facilitate seamless scaling, ensuring robustness and efficiency in handling increasing data volumes.
  • Leading the establishment of a comprehensive CI/CD process for both Azure Data Factory and Azure Databricks, integrating unit testing and automated testing to enhance code quality and deployment reliability.
  • Collaborating closely with Data Scientists, providing them with accurate and timely data to derive actionable business insights, fostering a synergistic relationship between data engineering and data science teams.
  • Developing data pipelines in Azure Data Factory for the seamless movement of data from legacy systems to the Azure environment.
  • Designing pipelines with a global perspective, ensuring scalability across multiple countries and aligning with international data standards.
  • Writing scalable code that can be adapted to different countries, promoting a consistent and efficient approach to data processing.
  • Promoting and enforcing cloud best practices within the team, offering guidance and support to ensure adherence to industry standards.
Azure Data Factory · Azure Databricks · SQL · PySpark · Continuous Integration and Continuous Delivery (CI/CD) · Git · +1 more

Smiths group plc

Azure Data Engineer

Aug 2021 – Dec 2022 · 1 yr 4 mos · Bengaluru, Karnataka, India · Remote

  • EPM Onyx Business Reporting
  • Gathered requirements from stakeholders for business reporting.
  • Prepared technical and scope-analysis documents for data models, including facts and dimensions mapping.
  • Fetched data from a multidimensional source system (EPM) via its API using Azure Data Factory.
  • Built full-load and delta-load pipelines in Azure Data Factory.
  • Stored the raw data in Azure Data Lake using a date-time folder structure.
  • Wrote transformation logic in PySpark and Spark SQL in Databricks to convert the raw data into facts and dimensions.
  • Implemented a semantic data model using Azure Analysis Services for various analytics Power BI reports/dashboards.
  • Created a technical incident/challenge document for effective communication across the team.
  • MLT O2C Dashboard
  • Added a new report to the existing order-to-cash dashboard to track market lead time.
  • Read the uploaded data from Azure Blob Storage into Databricks via a Logic App.
  • Applied the required transformations in Databricks using SQL and Python and wrote the transformed data back to Azure Blob Storage.
  • Read the data into Azure Synapse and built data models in Azure Analysis Services for Power BI visualization.
  • Disaster Recovery and CI/CD Pipeline
  • Took backups of Azure Databricks, Azure Data Factory, and Azure Analysis Services using the Databricks command-line interface (CLI), ARM templates, SSMS, and PowerShell.
  • Designed a CI/CD pipeline using Azure DevOps to automate the build and release processes across various environments for Azure Data Factory (ADF), Azure Synapse (data warehouse), and Azure Analysis Services.
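The full-load/delta-load distinction and the date-time folder structure mentioned above can be sketched in a few lines: a delta load keeps only rows changed since the last successful run (a watermark), and each extract lands under a date-partitioned path. Record shapes, dataset names, and the path layout here are illustrative assumptions, not the project's actual schema:

```python
# Sketch of a watermark-based delta load landing raw extracts in a
# date-time folder structure (raw/<dataset>/<yyyy>/<MM>/<dd>).
from datetime import datetime

def raw_path(run_time: datetime, dataset: str) -> str:
    """Date-time folder path used to land one raw extract."""
    return f"raw/{dataset}/{run_time:%Y/%m/%d}"

def delta_filter(records, watermark: datetime):
    """Keep only records modified after the last successful load."""
    return [r for r in records if r["modified"] > watermark]

# Illustrative rows: a full load would take both; the delta load takes one.
records = [
    {"id": 1, "modified": datetime(2022, 1, 1)},
    {"id": 2, "modified": datetime(2022, 6, 1)},
]
new_rows = delta_filter(records, watermark=datetime(2022, 3, 1))
path = raw_path(datetime(2022, 6, 2), "epm_sales")
```

Storing every run under its own dated folder keeps raw history replayable, which is what makes the downstream facts-and-dimensions rebuild safe.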
Azure Data Factory · Azure Databricks · Azure Data Lake · Azure SQL Server · Azure Synapse Analytics · Apache Spark · +3 more

Smiths detection

Azure Engineer

Aug 2021 – Dec 2022 · 1 yr 4 mos · Bengaluru, Karnataka, India · Remote

  • Project Vector HR Dashboard Development:
  • Created warehouse views to report various business KPIs such as joiners, leavers, headcount, and future joiners and leavers.
  • Created a stored procedure to update previous headcount reports.
  • Improved dashboard performance by refining the existing schema and data models.
  • HR Dashboard Optimization:
  • Increased the speed of the PySpark and Python code written in Databricks.
  • Made several changes to the code, such as replacing collect() and executemany() with the Spark DataFrame writer, and fetchall() with the spark.read.jdbc function.
  • Adapted Python commands and integrated them with Spark so that they run on worker nodes instead of the driver node.
  • The optimization gave us a 20% reduction in runtime and a 30% reduction in overall cost.
  • Removed manual intervention from the HR data-load pipeline and automated it.
  • Added a feature for deleting source records to the existing dashboard.
  • Analysed the existing dashboard with business users to find discrepancies, and worked with them to remove those anomalies by adding functionality using ADF, ADLS, and Azure Databricks.
  • KPI Reporting Using Azure
  • Migrated data from on-premises servers to Azure Data Lake using Azure Data Factory, processing CSV, JSON, and XML files with Scala, PySpark, and Spark SQL in Databricks.
  • Wrote the processed files to Azure SQL and Delta Lake using Databricks and moved them to an archive container.
  • Built a dashboard on top of the Delta tables and orchestrated the whole ETL process with Azure Data Factory.
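The fetchall-to-spark.read.jdbc change described in the optimization bullets works because Spark's JDBC reader splits a numeric key range into per-partition predicates that worker nodes read in parallel, instead of one driver process materializing the whole result set. A simplified pure-Python sketch of that partitioning idea (mirroring the spirit of `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions`, not the actual Spark source):

```python
# Simplified sketch of how a partitioned JDBC read splits a numeric key
# range [lower, upper) into one WHERE clause per partition, so each
# worker reads only its slice of the table.
def range_predicates(column: str, lower: int, upper: int, num_partitions: int):
    """Return one WHERE clause per partition covering the key range."""
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == 0:
            # First partition also catches NULL keys and anything below lower.
            preds.append(f"{column} < {lo + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition takes the tail, including rounding remainder.
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return preds
```

Each predicate becomes one parallel read task, which is why no single node has to hold the full table the way a driver-side fetchall() does.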
Data Engineering · Azure Data Factory · Azure Data Lake · Azure SQL Server · Azure Databricks · PySpark · +1 more

Syniti

Machine Learning Engineer

Nov 2018 – Jan 2021 · 2 yrs 2 mos · Bengaluru, Karnataka, India

  • Built a data integration pipeline from SAP S/4HANA and Dynamics 365 source systems using Azure Data Factory.
  • Created date-hierarchy partitions in Azure Data Lake Storage Gen2 to store the data in multiple layers.
  • In Azure Synapse (Azure Data Warehouse), created schemas, facts, and dimensions to organize and populate the data into table structures for various source systems using stored procedures, and created a semantic layer for data modelling in Azure Analysis Services to build and deploy multiple interactive analytics dashboards and reports using Power BI.
  • Developed an internal data science platform that automates various DS processes, reducing the computational complexity and time required to solve specific problem statements.
  • Developed a solution for a utility firm by implementing optimization techniques in Python to recommend optimal routes, minimizing the time taken and distance travelled by vehicles and reducing operational cost.
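The route-recommendation work above can be illustrated with the nearest-neighbour heuristic, a classic baseline for vehicle-routing problems: always drive to the closest unvisited stop next. This is an illustrative sketch only; the profile does not specify which optimization techniques were actually used:

```python
# Nearest-neighbour routing heuristic: greedily visit the closest
# remaining stop. A simple baseline sketch, not the production solution.
import math

def nearest_neighbour_route(depot, stops):
    """Order stops by repeatedly picking the nearest unvisited point."""
    remaining = list(stops)
    route, current = [], depot
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route
```

Greedy ordering like this gives a quick, explainable reduction in total distance, and is often the starting point before heavier methods (2-opt, linear programming) are applied.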
Data Engineering · Azure Data Factory · Data Science · Python (Programming Language) · Predictive Modeling · Machine Learning · +4 more

Tracxn

Business Analyst

Apr 2018 – Oct 2018 · 6 mos · Bengaluru Area, India

  • As part of the team, key responsibilities included:
  • Custom sector research and analysis of companies, especially start-ups, providing insights to investors, private equity firms, and venture capitalists.
  • Areas of focus included coding tools, DevOps, software testing tools, API management, IT services, and enterprise security.
  • Retail Hub Dashboard
  • Developed a dynamic Spark notebook to pivot the web-crawling data efficiently.
  • Integrated Azure Synapse warehouse and Data Lake with ADF and created a parameterised pipeline in Data Factory for an end-to-end ETL process.
  • Designed and developed warehouse objects and stored procedures for streamlined data processing.
  • Developed a Logic App to send daily email extracts from Azure Data Lake to the relevant stakeholders.
  • Established a robust CI/CD process for Azure Data Factory to ensure seamless deployment and management of the pipeline.
SQL · Python · Data Analysis

Ei labs

Data Analyst

Jun 2016 – Feb 2018 · 1 yr 8 mos · Raipur Area, India

  • Oversaw cross-functional work areas and performed data analysis to generate actionable insights that solved problems and contributed to the growth of the business.
SQL · Data Analysis

Education

National Institute of Technology Raipur

Engineer’s Degree — Electronics and Communications Engineering

Jan 2012 – Jan 2016

H.S.M GLOBAL PUBLIC SCHOOL

High School

Jan 2009 – Jan 2011

Kendriya Vidyalaya

Junior High School

Jun 2007 – Jun 2009

freeCodeCamp

Front-End Developer — Computer Software Engineering

Jan 2017 – Present
