Priyesh Agrawal

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 3 yrs 10 mos experience

Highly Stable

Key Highlights

Expert in building resilient and scalable systems.
Achieved 100% incident resolution within SLAs.
Proven track record in automation and reliability.

Stackforce AI infers this person is a Site Reliability Engineer specializing in Fintech infrastructure and automation.

Skills

Core Skills

Cloud Infrastructure, Site Reliability Engineering, Automation

Other Skills

MySQL, MariaDB, Aerospike, Elasticsearch, Percona XtraDB, AWS, Docker, Kubernetes, Python, Bash, Shell scripting, C, Java, Mesos-Marathon

About

A Computer Science and Engineering graduate with a proven track record as a Site Reliability Engineer at PhonePe. Keen to build deep professional knowledge, expertise, and creativity through dedicated effort, and committed to applying those skills to solve organisational challenges, contribute to the organisation's growth, and develop personally. Currently focused on mastering in-depth programming for tool development and problem-solving, backed by quick learning, strong debugging skills, and a drive to stay current with emerging technologies.

As a Site Reliability Engineer, I specialise in ensuring the availability, scalability, and reliability of mission-critical systems. My expertise spans cloud infrastructure, containerization, automation, and monitoring, with a strong focus on maintaining high uptime and system performance. A deep understanding of microservices architecture, distributed systems, and container orchestration has enabled me to solve complex challenges in production environments.

What drives me is a commitment to engineering excellence and collaborative problem-solving: working cross-functionally with other teams to proactively identify and resolve issues. My goal is to build resilient systems that align with business objectives and keep infrastructure scalable and reliable. Let's connect and discuss how we can optimize systems for the future!

Experience

Total Experience: 3 yrs 10 mos

Average Tenure: 3 yrs 10 mos

Current Experience: 3 yrs 10 mos

Current Experience

PhonePe

2 roles

Site Reliability Engineer

Aug 2022 – Present · 3 yrs 10 mos · Bengaluru, Karnataka, India

Owned and built end-to-end infrastructure for PhonePe’s ONDC platform (~1200 servers), handling design, deployment, capacity planning, and reliability.
Led production migrations, cutovers, and DR drills for ONDC and core fintech services, ensuring minimal downtime and zero data loss.
Designed and managed large-scale, distributed fintech infrastructure delivering 99.9%+ uptime with proactive monitoring, alerting, and incident response.
Led on-call for critical infrastructure, achieving 100% incident resolution within SLAs and reducing MTTR through better runbooks and automation.
Automated deployment pipelines, alert remediation, and recovery flows using Python, Bash, and CI/CD tools, cutting manual ops by 43% and improving release reliability.
Implemented replication (Aerospike XDR, Galera) and DR strategies; executed quarterly DR drills to validate RPO/RTO targets.
Migrated workloads from Mesos–Marathon to an in-house container orchestrator, reducing infra costs by 15% and improving scalability by 20%.
Ensured RBI compliance through quarterly tech audits and timely closure of findings.
Built and operated Kubernetes-based platforms and large-scale private cloud infrastructure, supporting multiple application pods under tight timelines while maximizing operational efficiency.

MySQL, MariaDB, Aerospike, Elasticsearch, Percona XtraDB, AWS, and 6 more

SRE Intern

Mar 2022 – Jul 2022 · 4 mos · Bengaluru, Karnataka, India

Automated deployment and operational tasks using Python and Shell scripts, improving deployment consistency and reducing manual errors.
Enhanced YARN cluster scalability by automating dynamic node additions based on load, optimizing resource utilization.
Implemented HashiCorp Consul for configuration synchronization across multiple environments, improving the reliability of service configurations.
Gained hands-on experience with Apache Ambari, Apache Hadoop, SaltStack, Linux administration, and foundational SRE practices.
Collaborated with senior engineers on monitoring and alert setups, contributing to early detection of infrastructure issues.