Santhosh Deepu Patrayuni

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 19 yrs 10 mos experience

AI ML Practitioner · AI Enabled

Key Highlights

Over 13 years of IT experience with strong cloud expertise.
Led significant projects optimizing cloud infrastructures.
Certified in AWS and Kubernetes with a focus on reliability.

Stackforce AI infers this person is a Cloud Infrastructure and DevOps expert specializing in SaaS solutions.

Skills

Core Skills

Kubernetes · Python · DevOps · AWS

Other Skills

Agile Methodologies · Amazon EKS · Amazon Web Services (AWS) · Ansible · AppDynamics · Application Virtualization · ArgoCD · Azure Kubernetes Service (AKS) · Bash · Chef · Continuous Integration · Desktop Virtualization · Docker · Generative AI · Git

Experience

Total Experience: 19 yrs 10 mos

Average Tenure: 3 yrs 6 mos

Current Experience: 2 yrs 4 mos

Nvidia

Staff Site Reliability Engineer

Jan 2024 – Present · 2 yrs 4 mos · Hybrid

Generative AI · Kubernetes · Google Kubernetes Engine (GKE) · Azure Kubernetes Service (AKS) · Amazon EKS · Python

Broadcom

Reliability Engineer 5

Oct 2023 – Jan 2024 · 3 mos · Bengaluru, Karnataka, India · On-site

Tanzu Observability by Wavefront

Staff Site Reliability Engineer

Aug 2020 – Oct 2023 · 3 yrs 2 mos · Bengaluru, Karnataka, India · Hybrid

Proven Leadership in Infrastructure and DevOps:
Served as Lead Engineer on numerous projects, driving the architecture and implementation of changes to optimize Tanzu Observability (Wavefront) clusters.
Oversaw the management of expansive AWS and Google Cloud infrastructures comprising more than 2,000 cloud platform resources.
Spearheaded updates to versions, scripts, and documentation of Ansible playbooks and Terraform modules, ensuring seamless deployment and maintenance.
Managed a portfolio of 500+ Kubernetes workloads, standardizing their configuration through ArgoCD, Kustomize, and Helm to increase operational efficiency.
Enhanced the usability of Kubernetes proxies, significantly reducing failure rates and improving overall system reliability.
Developed custom tools in Go and Python to expedite troubleshooting and implemented bots to support on-call responsibilities, enhancing incident response times.
Engineered a multi-sharded system for FoundationDB, substantially increasing speed and capacity in very large clusters.
Optimized PagerDuty alerts, leading to a reduction in overall alerts and increased efficiency in alert management.
Contributed as an integral part of the on-call rotation, taking responsibility for PagerDuty alerts and successfully recovering failed systems and applications.

DevOps · Python · Agile Methodologies · Continuous Integration · Amazon Web Services (AWS) · Wavefront · +8 more

Intuit India

Senior Site Reliability Engineer

Oct 2012 – Aug 2020 · 7 yrs 10 mos · India

Experience in orchestrating the migration of services between Private Data Centers and AWS, showcasing versatility in adapting to evolving infrastructure requirements.
Experience with Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation, ensuring seamless management and scalability.
Employed a comprehensive suite of monitoring tools, including Splunk, AppDynamics (as APM), Wavefront, Telegraf with InfluxDB, and AWS CloudWatch, to uphold optimal system performance and reliability.
Wrote many scripts and contributed to tools and services aimed at enhancing operational efficiency.
Experience working with SQL databases such as MySQL and Oracle, and NoSQL databases such as AWS DynamoDB.
Applied advanced troubleshooting techniques for Linux systems using tools such as top, htop, vmstat, iostat, and tcpdump, ensuring prompt identification and resolution of performance issues.
Experience automating mundane and manual tasks using Python.

DevOps · Python · Amazon Web Services (AWS) · Jenkins · Site Reliability Engineering · AWS