Pratyush Kumar

Data Engineer

Bengaluru, Karnataka, India · 4 yrs 3 mos experience

Key Highlights

  • Over 3 years of experience in data engineering.
  • Expert in building optimized ETL pipelines.
  • Strong background in analytics and data governance.
Stackforce AI infers this person is a Data Engineer specializing in ETL solutions across technology and agriculture sectors.

Skills

Core Skills

Data Engineering · ETL Tools · Data Integration · Data Analytics

Other Skills

AWS EC2 · AWS EMR · AWS S3 · Airflow · Amazon EC2 · Amazon Web Services (AWS) · Apache Airflow · Apache Kafka · Apache Spark · Big Data · Communication · Continuous Integration and Continuous Delivery (CI/CD) · Data Governance · Data Lakes · Data Manipulation

About

I have 3+ years of professional experience in the design, development, and implementation of ETL/ELT solutions. I am a data engineer with experience turning raw data from multiple sources into valuable insights and creative solutions, and I can translate vast amounts of data into meaningful findings that influence business strategy. Solid background in analytics.

Experience

4 yrs 3 mos
Total Experience
1 yr 5 mos
Average Tenure
1 yr 8 mos
Current Experience

KPI Partners

Senior Data Engineer

Oct 2024 - Present · 1 yr 8 mos · Bengaluru, Karnataka, India · Hybrid

  • Client: Broadcom
  • Project description: Broadcom is a global technology leader that designs, develops, and supplies a broad range of semiconductor and infrastructure software solutions. With a focus on innovation, Broadcom's products are essential in a wide array of markets, including data centres, networking, broadband, and wireless communications. We worked on building a unified, scalable data platform to streamline financial reporting, revenue forecasting, and cost analytics across the semiconductor and software business units. The project enabled faster, more accurate financial decision-making by integrating data from multiple operational and ERP systems.
  • Environment: SQL, Python, PySpark, Spark, Databricks, AWS S3, AWS EC2, GitHub.
  • Roles and responsibilities:
  • Built highly optimised PySpark pipelines to process data from different sources.
  • Developed robust ETL pipelines using Databricks workflows for incremental and full data loads.
  • Implemented data cleansing, validation, and transformation logic to prepare data for downstream analytics and reporting.
  • Created optimized Delta Lake schemas to support analytical queries while maintaining partitioning and data governance best practices.
  • Implemented partitioning and Z-order clustering strategies to improve performance and reduce cost on Databricks.
  • Documented data pipeline architecture, workflows, and operational procedures for maintainability and knowledge sharing.
  • Closely monitored data warehouse functionality and performance, proactively responding to issues.
  • Designed and maintained data models, including fact and dimension tables, in Delta Lake.
  • Implemented CI/CD using GitHub.
  • Conducted training sessions between the team members to onboard
SQL · Python · PySpark · Spark · Databricks · AWS S3 · +4

Cargill

Data Engineer

Mar 2024 - Sep 2024 · 6 mos · India · Hybrid

  • Data Analytics
  • Client: Cargill
  • Project description: Cargill's major businesses are trading, purchasing, and distributing grain and other agricultural commodities, such as palm oil, as well as trading in energy, steel, and transport. We designed, built, and delivered high-performance, data-centric solutions using the comprehensive data capabilities of the Cargill Data Platform, and built data structures and pipelines to collect, curate, and enable data for supply chain initiatives.
  • Domain: Supply Chain
  • Roles and responsibilities:
  • Developed and optimized SQL scripts to support data analysis and reporting for the America Supply Chain team.
  • Enhanced database performance and data accuracy through advanced SQL query design.
  • Built highly optimized PySpark pipelines to process data from different sources.
  • Configured Delta table properties to optimize the tables further.
  • Conducted training sessions among team members for onboarding.
  • Used Airflow for orchestration.
SQL · PySpark · Airflow · Data Engineering · Data Integration

Altiostar, a Rakuten Symphony company

Member of Technical Staff | Software Engineer 1

Dec 2021 - Jan 2024 · 2 yrs 1 mo · Bangalore Urban, Karnataka, India · Hybrid

  • Client: Network software provider.
  • Project description: Altiostar is a company that provides open virtual radio access network (vRAN) technology. In the data analytics space, we consumed data from various sources provided by different vendors and combined it to gain insight into user monitoring, call drops, and whether sites were operational.
  • Environment: AWS S3, AWS EMR, Spark, Python, Airflow, Kafka, HDFS, Hive, Databricks
  • Roles and responsibilities:
  • Used Airflow as the orchestration tool.
  • Created PySpark data pipelines to move data from S3 to Hive.
  • Used Databricks for Delta tables.
  • Used PySpark for processing real-time data.
  • Optimized pipeline and query performance.
  • Used Spark Structured Streaming to load data into Hive in real time.
  • Created a data quality (DQ) framework for alerting support engineers.
  • Created various Slack-based alert mechanisms for pipeline maintenance.
AWS S3 · AWS EMR · Spark · Python · Airflow · Kafka · +5

Education

Institute of Technical Education & Research

Bachelor of Technology - BTech — Electronics and communication engineering

Jan 2017 - Jan 2021
