Pratyush Kumar

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 3 mos experience

Key Highlights

Over 3 years of experience in data engineering.
Expert in building optimized ETL pipelines.
Strong background in analytics and data governance.

Stackforce AI infers this person is a Data Engineer specializing in ETL solutions across technology and agriculture sectors.

Contact

pratyush.kumar@kpipartners.com LinkedIn

Skills

Core Skills

Data Engineering, ETL Tools, Data Integration, Data Analytics

Other Skills

AWS EC2, AWS EMR, AWS S3, Airflow, Amazon EC2, Amazon Web Services (AWS), Apache Airflow, Apache Kafka, Apache Spark, Big Data, Communication, Continuous Integration and Continuous Delivery (CI/CD), Data Governance, Data Lakes, Data Manipulation

About

I have 3+ years of professional experience in the design, development, and implementation of ETL/ELT solutions. I am a data engineer with experience turning raw data from multiple sources into valuable insights and creative solutions, and I can translate vast amounts of data into meaningful findings that influence business strategy. Solid background in analytics.

Experience

Total Experience: 4 yrs 3 mos

Average Tenure: 1 yr 5 mos

Current Experience: 1 yr 8 mos

KPI Partners

Senior Data Engineer

Oct 2024 – Present · 1 yr 8 mos · Bengaluru, Karnataka, India · Hybrid

Client: Broadcom
Broadcom is a global technology leader that designs, develops, and
supplies a broad range of semiconductor and infrastructure software
solutions. With a focus on innovation, Broadcom's products are essential in
a wide array of markets, including data centres, networking, broadband,
and wireless communications. We were working on building a unified,
scalable data platform to streamline financial reporting, revenue
forecasting, and cost analytics across semiconductor and software business
units. The project enabled faster, more accurate financial decision-making
by integrating data from multiple operational and ERP systems.
Environment: SQL, Python, PySpark, Spark, Databricks, AWS S3, AWS EC2,
GitHub.
Roles and responsibilities:
Built highly optimized PySpark pipelines to process data from different
sources.
Developed robust ETL pipelines using Databricks workflows for
incremental and full data loads.
Implemented data cleansing, validation, and transformation logic to prepare data for downstream analytics and reporting.
Created optimized Delta Lake schemas to support analytical queries while maintaining partitioning and data governance best practices.
Implemented partitioning and Z-order clustering strategies to improve performance and reduce cost on Databricks.
Documented data pipeline architecture, workflows, and operational procedures for maintainability and knowledge sharing.
Closely monitored data warehouse functionality and performance, proactively responding to issues.
Designed and maintained data models, including fact and dimension tables, in Delta Lake.
Implemented CI/CD using GitHub.
Conducted training sessions between the team members to onboard

SQL, Python, PySpark, Spark, Databricks, AWS S3, +4 more

Cargill

Data Engineer

Mar 2024 – Sep 2024 · 6 mos · India · Hybrid

Data Analytics
Client: Cargill
Project Description: Cargill's major businesses are trading, purchasing, and
distributing grain and other agricultural commodities, such as palm oil,
and trading in energy, steel, and transport.
We design, build and deliver high-performance, data-centric solutions using
comprehensive data capabilities of Cargill Data Platform and build data
structures and pipelines to collect, curate and enable data for Supply Chain
Initiatives.
Domain: Supply Chain
Roles / Responsibilities:
Developed and optimized SQL scripts to support data analysis and
reporting for the America Supply Chain team.
Enhanced database performance and data accuracy through advanced SQL query design.
Built highly optimized PySpark pipelines to process data from
different sources.
Set Delta table properties to further optimize the tables.
Conducted training sessions to onboard team members.
Used Airflow for orchestration.

SQL, PySpark, Airflow, Data Engineering, Data Integration

Altiostar, a Rakuten Symphony company

Member of Technical Staff | Software Engineer 1

Dec 2021 – Jan 2024 · 2 yrs 1 mo · Bangalore Urban, Karnataka, India · Hybrid

Client: Network software provider.
Project Description: Altiostar is a company that provides open virtual
radio access network (vRAN) technology.
In the data analytics space, we consumed data from various sources provided by
different vendors and combined it to gain insights into user monitoring,
call drops, and operational/non-operational sites.
Environment: AWS S3, AWS EMR, Spark, Python, Airflow, Kafka, HDFS, Hive,
Databricks
Roles / Responsibilities:
Used Airflow for orchestration.
Created PySpark data pipelines to move data from S3 to Hive.
Used Databricks for Delta tables.
Used PySpark to process real-time data.
Optimized pipeline and query performance.
Used Spark Structured Streaming to load data into Hive in real time.
Created a data quality (DQ) framework for alerting support engineers.
Also created various Slack-based alert mechanisms for pipeline maintenance.