Nitin Ganjam Ramesh

SRE (Site Reliability Engineer)

Bangalore Urban, Karnataka, India · 3 yrs 11 mos experience

Key Highlights

Achieved 99.9% uptime in critical systems.
Reduced incident response times by 30%.
Expert in proactive observability and automation.

Stackforce AI infers this person is a Site Reliability Engineer specializing in high-availability systems and cloud infrastructure.

Skills

Core Skills

Site Reliability Engineering · Observability

Other Skills

Amazon Web Services (AWS) · Ansible · Azure DevOps Services · C++ · Cloud Computing · Computer Networking · Docker Products · Dynatrace · ELK Stack · Elastic Stack (ELK) · Incident Management · Kubernetes · Linux · Log Analysis · MIL-STD-1553

About

I hate outages. That’s why I spend my days building systems that refuse to break. Currently at Dynatrace, I help organizations see the 'why' behind their data. Whether it's cutting incident response times by 30% or managing multi-region AWS/Azure environments, my goal is simple: 99.9% uptime and zero 3 AM wake-up calls.

I’ve spent my career at places like PhonePe, Tesco, and Kyndryl, tackling scale and reliability head-on. My focus is on proactive observability: finding the fire before the alarm even goes off.

I specialize in turning chaotic infrastructure into automated, resilient pipelines using:

Cloud: AWS, Azure
Orchestration & Containers: Kubernetes (K8s), Docker
IaC & Automation: Terraform, Ansible, Jenkins
Observability: Dynatrace, Prometheus, ELK Stack, Grafana
OS: Linux (Ubuntu)

If you’re building high-availability systems or want to talk about the future of SRE and AIOps, let’s connect.

Experience

Total Experience: 3 yrs 11 mos

Average Tenure: 1 yr 3 mos

Current Experience

Kyndryl India

Infrastructure Specialist

Jan 2025 – Oct 2025 · 9 mos · Bengaluru, Karnataka, India · Hybrid

Resolved 80+ real-time incidents monthly across the AP, IN, and JP regions, addressing AIOps dashboard issues such as count mismatches, Dynatrace alerts, and data flow interruptions, with a 95% resolution rate.
Used the ELK Stack (Elasticsearch, Logstash, Kibana) for log analysis and troubleshooting of complex issues, including Kubernetes cluster overflows and system performance problems.
Collaborated with engineering teams to conduct root cause analysis and implement permanent fixes, reducing repeat incidents by 30%.
Processed 50+ user access requests weekly, ensuring timely and secure bundle provisioning while maintaining audit compliance.
Participated in on-call rotations, responding to Dynatrace alerts and ensuring overall product health in Kubernetes-based systems.

ELK Stack · Kubernetes · Dynatrace · Site Reliability Engineering · Observability