Rahul Sanap

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 11 mos experience

Key Highlights

  • Contributed to $30 billion in sales through data-driven insights.
  • Reduced the value of delayed invoices from $4 billion to $2.5 billion.
  • Streamlined operations, reducing job count from 250+ to 15.

Skills

Core Skills

Data Engineering · Big Data · Business Intelligence

Other Skills

Airflow · Amazon EC2 · Azure Virtual Machines · Azure Blob · C (Programming Language) · CI/CD · Dashboarding · Data Analysis · Data Cleaning · Data Pipeline Engineering · Data Quality Checks · Data Visualization · Docker · Hadoop · Hive

About

I am a Data Engineer with 4.9 years of experience in the data engineering domain, passionate about coding, uncovering valuable insights from data, and solving complex business problems. My day-to-day work includes:

  • Gathering business requirements and understanding the underlying product logic.
  • Cleaning data through initial and incremental loads.
  • Deriving new key performance indicators (KPIs) for the product.
  • Using the final data to generate meaningful insights on interactive dashboards.
  • Optimizing costs of Azure Databricks clusters, notebooks, and production Azure Data Factory pipelines.
  • Troubleshooting and resolving issues in ADF pipelines.
  • Sharing knowledge and collaborating with my team.

I have worked with a diverse tech stack that includes SQL, Python, Apache Hadoop, Hive, Spark, PySpark, Azure Databricks, Data Factory, Airflow, Data Lake, DevOps, Data Warehouse (DWH), Linux, Bash, Shell Scripting, Oozie, Power BI, Jenkins, and Bitbucket.

Experience

Lowe's India

Data Engineer

Nov 2022 – Present · 3 yrs 4 mos · Bengaluru, Karnataka, India

  • Certified Data Source: Promotions
  • Consolidated promotional offers data from various sources, including DB2, Teradata, and Postgres, into a single source of truth encompassing over 1 billion records.
  • Implemented an efficient Change Data Capture (CDC) architecture leveraging Spark SQL, Shell Scripts, and Airflow to enhance data processing workflows.
  • Executed the ingestion of JSON data into Hadoop, transforming raw JSON into complex data types (Map, Struct, Array) for seamless integration into Hive tables.
  • Enabled data-driven analysis of how different offer types affect sales and forecasting of future promotions; the offers informed by these insights yielded $30 billion in sales in FY23.
  • Revionics: Clearance
  • Constructed diverse data sets, including Product Hierarchy, Product, Store, Store Hierarchy, Distribution Center, Store Inventory, and DC Inventory for the 3rd party vendor Revionics to facilitate Clearance Plans.
  • Engineered a complete end-to-end pipeline with robust data quality checks to efficiently deliver data to Revionics.
  • Streamlined operations by deactivating 250+ jobs for the existing vendor, achieving the same functionality with fewer than 15 jobs.
  • Generated a direct impact on Clearance Revenue, contributing to $1 billion in sales for Lowe's.
SQL · Spark SQL · Shell Scripts · Airflow · Hadoop · Hive · +2
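The CDC architecture described above can be sketched in miniature. This is a purely illustrative, plain-Python version of the upsert/delete merge at the heart of Change Data Capture; the production pipeline ran as Spark SQL over Hive tables with Shell Script and Airflow orchestration, and the field names here (`offer_id`, `updated_at`, `op`) are hypothetical.

```python
# Minimal sketch of a CDC merge: apply a batch of incremental change
# records to the consolidated "single source of truth". Field names
# are illustrative, not taken from the actual promotions tables.

def apply_cdc(target, changes):
    """Upsert or delete incremental change records in `target`.

    target:  dict mapping offer_id -> record (current source of truth)
    changes: list of change records from upstream systems (e.g. DB2,
             Teradata, Postgres), each with an 'op' of 'upsert' or 'delete'
    """
    for change in changes:
        key = change["offer_id"]
        if change["op"] == "delete":
            target.pop(key, None)  # tolerate deletes for unknown keys
        else:
            existing = target.get(key)
            # Keep whichever record carries the newer timestamp.
            if existing is None or change["updated_at"] >= existing["updated_at"]:
                target[key] = {k: v for k, v in change.items() if k != "op"}
    return target


# Example: one update, one insert, one delete of a missing key.
state = {"P1": {"offer_id": "P1", "discount": 10, "updated_at": 1}}
batch = [
    {"op": "upsert", "offer_id": "P1", "discount": 15, "updated_at": 2},
    {"op": "upsert", "offer_id": "P2", "discount": 5, "updated_at": 2},
    {"op": "delete", "offer_id": "P3"},
]
state = apply_cdc(state, batch)
```

At Spark scale the same last-writer-wins logic would likely be expressed as a join or `MERGE`-style statement between the change feed and the target table rather than an in-memory dict.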

A.P. Moller - Maersk

Data Engineer

Mar 2021 – Oct 2022 · 1 yr 7 mos · Pune, Maharashtra, India

  • Invoice Timeliness
  • Developed a product that effectively reduced the time gap between invoice generation from the Booking System to SAP.
  • Analyzed and processed a substantial volume of invoice-level data to identify the root causes of invoice generation delays.
  • Designed and implemented a Big Data-based solution, utilizing optimized coding practices to minimize infrastructure costs.
  • Achieved a significant reduction in delayed invoices, decreasing their value from $4 billion to $2.5 billion.
  • Reduced the average delay in late invoice generation from 22 days to 15 days within 6 months.
  • Customer Console
  • Created a customer-level insights product that provides comprehensive information, including Total Outstanding, Total Turnover, Total Overdue, Daily Sales Outstanding, and Average Days Late for payment.
  • Constructed a Big Data-based solution, creating an end-to-end pipeline that encompasses data cleaning and visualization of insights on a Dashboard.
  • Addressed the challenge faced by the collection team, eliminating the need to gather information from multiple sources and perform manual calculations to track daily and monthly collections at both the customer and regional levels.
  • Earned recognition as one of the top 10 most widely used analytics products at Maersk.
  • Cost Optimization of Databricks Clusters and Data Factory Pipelines
  • Reduced the cost of Databricks Interactive Dev clusters from $3000 to $1200 by analyzing cluster usage patterns. Provided knowledge transfer (KT) sessions to the team regarding the utilization of single-node clusters versus multi-node clusters, effectively reducing development costs.
  • Minimized the costs of two Production Data Factory pipelines from $5500 to $600 and $500 to $80, respectively. This was achieved by conducting a detailed analysis of each activity's cluster usage and implementing Spark optimization techniques.
Big Data · Data Cleaning · Data Visualization · Dashboarding · Data Engineering · Business Intelligence
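The customer-level metrics named in the Customer Console work can be illustrated with a small sketch. This is plain Python over an in-memory list, not the production Big Data pipeline; the record fields (`amount`, `paid`, `due`, `paid_on`) and the exact KPI definitions are assumptions for illustration.

```python
# Illustrative sketch of three Customer Console KPIs: Total Outstanding,
# Total Overdue, and Average Days Late. Field names and definitions are
# hypothetical, not taken from the actual Maersk product.
from datetime import date

def customer_kpis(invoices, as_of):
    """Compute simple payment KPIs for one customer's invoices."""
    outstanding = sum(i["amount"] for i in invoices if not i["paid"])
    overdue = sum(i["amount"] for i in invoices
                  if not i["paid"] and i["due"] < as_of)
    # Days late for invoices that were paid after their due date.
    late_days = [(i["paid_on"] - i["due"]).days
                 for i in invoices
                 if i["paid"] and i["paid_on"] > i["due"]]
    avg_late = sum(late_days) / len(late_days) if late_days else 0.0
    return {"total_outstanding": outstanding,
            "total_overdue": overdue,
            "avg_days_late": avg_late}


# Example: two unpaid invoices (one past due) and one paid 5 days late.
invoices = [
    {"amount": 100.0, "paid": False, "due": date(2022, 1, 10)},
    {"amount": 50.0, "paid": True, "due": date(2022, 1, 1),
     "paid_on": date(2022, 1, 6)},
    {"amount": 25.0, "paid": False, "due": date(2022, 2, 1)},
]
kpis = customer_kpis(invoices, as_of=date(2022, 1, 15))
```

In the actual product these aggregates would be computed per customer and per region in the Big Data pipeline and surfaced on the dashboard, replacing the collection team's manual calculations.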

Education

Centre for Development of Advanced Computing (C-DAC)

Post Graduate Diploma in Big Data Analytics — Big Data Analytics

Jan 2020 – Jan 2021

Savitribai Phule Pune University

Bachelor of Engineering — Electronics and Telecommunication Engineering

Jan 2014 – Jan 2019

The Kelkar Education Trust's Vinayak Ganesh Vaze College of Arts, Science and Commerce, Mithagar Road, Mulund East, Mumbai 400 081

HSC — Science

Jan 2012 – Jan 2014
