PULKIT JAIN

Intern

Jaipur, Rajasthan, India · 11 mos experience

Key Highlights

Strong understanding of distributed data systems
Experience in building AI-driven backend solutions
Proven ability to optimize big data pipelines

Stackforce AI infers this person is a Backend Engineer with a focus on Big Data and AI systems in a SaaS environment.

Skills

Core Skills

Backend Development · AI Systems · Big Data Engineering · Resource Optimization

Other Skills

FastAPI · Apache Kafka · Apache Spark Streaming · Delta Lake · Apache Spark · ClickHouse · Neo4j · AI Agents · PostgreSQL · Model Context Protocol (MCP) · Cloud Computing · Google Cloud Platform (GCP) · Distributed Systems · Optimizing Performance

About

I'm a 3rd-year Computer Science undergraduate and CPP-2 Project Intern at Hewlett Packard Enterprise (HPE), where I recently completed an end-to-end big data engineering project focused on resource optimization and cost-aware system design across batch and streaming pipelines. Through this project, I developed a strong understanding of distributed data systems, including how compute, storage, and infrastructure decisions directly affect performance, reliability, and cost at scale. It also strengthened my ability to reason about systems holistically rather than treating technologies in isolation.

Technical Toolkit
- Programming Languages: Python, Java, C++, SQL
- Data & Streaming Systems: Apache Spark (Batch & Structured Streaming), Delta Lake, Apache Kafka, ClickHouse
- Backend & APIs: FastAPI
- Infrastructure & DevOps: Docker, Kubernetes (Minikube, KIND)
- Version Control & Tooling: GitHub
- Foundations: Data Structures & Algorithms (DSA), Object-Oriented Programming (OOP)

Currently, my primary focus is on growing as a backend engineer and AI builder: designing and implementing backend systems, APIs, and AI-driven workflows. I've started building actively in this space, with an emphasis on understanding how AI systems are engineered, deployed, and integrated into real-world backend architectures. I'm motivated by learning deeply, building thoughtfully, and applying sound engineering principles to create scalable, reliable, and intelligent systems.

Experience

Total Experience: 11 mos

Average Tenure: 11 mos

Current Experience: 11 mos

Hewlett Packard Enterprise

2 roles

CPP-3 Project Intern

Feb 2026 – Present · 4 mos · Remote

Working on an agentic bug triaging and routing system.
Building an agentic AI system to automate intelligent bug triage across multi-repository enterprise environments.
Addressing a key challenge in large engineering organizations, where a single customer-reported defect can span multiple teams (e.g., storage, compute, networking) and tools (Jira, Bugzilla, GitHub Issues).
Designing solutions to correlate and route related issues across disconnected systems, reducing manual effort and improving debugging efficiency.

FastAPI · Apache Kafka · Backend Development · AI Systems

CPP-2 Project Intern

Jun 2025 – Jan 2026 · 7 mos · Remote

Worked on an end-to-end big data engineering project focused on resource optimization and cost-aware design of batch and streaming data pipelines.
Established a baseline Spark workload to identify performance bottlenecks related to shuffle, joins, partitioning, recomputation, and I/O.
Applied Apache Spark batch optimizations including join strategy tuning, partition management, caching & persistence, Kryo serialization, and Adaptive Query Execution (AQE).
Benchmarked before-and-after performance using Spark UI metrics (execution time, shuffle read/write, CPU and memory utilization).
Evaluated Spark execution across multiple infrastructure setups: local mode, multi-executor single node, Docker-based Spark cluster, and Kubernetes (Minikube, KIND).
Optimized the storage layer using Delta Lake, leveraging ACID transactions, OPTIMIZE (file compaction), Z-Ordering, Liquid Clustering, and MERGE-based upserts.
Designed and implemented a fault-tolerant, idempotent real-time streaming pipeline using Kafka, Spark Structured Streaming, Delta Lake, and ClickHouse that safely handles duplicates and data replays.
Ensured correctness at the storage layer with Delta Lake MERGE, and deduplication at the analytical layer with ClickHouse ReplacingMergeTree.

Apache Spark Streaming · Delta Lake · Big Data Engineering · Resource Optimization