Praveen Kondaveeti

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 8 yrs 9 mos experience

Key Highlights

  • Reduced alert fatigue by 60% through automation.
  • Improved service uptime from 97.5% to 99.9%.
  • Designed proactive alert strategies reducing MTTD by 35%.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in SaaS and cloud infrastructure.

Skills

Core Skills

Production Deployment, Incident Management, Service Reliability, Cloud Infrastructure, Monitoring Solutions, Incident Response

Other Skills

AWS, AWS CloudFormation, Amazon Web Services (AWS), Ansible, Apache, AppDynamics, Azure, Azure DevOps, Bash, Bitbucket, Cascading Style Sheets (CSS), CentOS, Chef.io, CloudWatch, Docker Products

About

I specialize in driving production system reliability by creating and optimizing monitoring dashboards and alerts, automating incident response, and conducting thorough Root Cause Analyses (RCAs) for high-severity incidents, leading to reduced downtime and improved system stability.

Experience

Ericsson

SRE

Jul 2024 – Present · 1 yr 8 mos · Bangalore Urban, Karnataka, India · Remote

  • Project: Certificate Automation T-Mobile (July 2024 – Present)
  • Automated SSL/TLS certificate management and validation across Kubernetes and on-prem environments.
  • Reduced alert fatigue by 60% through alert suppression tuning and automated expiry validation.
  • Developed rollback scripts to recover from faulty certificate deployments with zero downtime.
  • Led daily monitoring reviews to track system health and address recurring alert patterns.
  • Coordinated RCA for multiple Sev-1 incidents, documenting preventive measures.
Kubernetes, SSL/TLS, alert suppression, rollback scripts, Root Cause Analysis, Production Deployment, +1

Apple

Site Reliability Engineer

Aug 2021 – Jun 2024 · 2 yrs 10 mos · Bangalore Urban · Hybrid

  • Led implementation of SLIs, SLOs and SLAs; improved service uptime from 97.5% to 99.9%.
  • Built and maintained Splunk dashboards and alerts, reducing false positives by 40%.
  • Created synthetic and real-time monitors with New Relic.
  • Migrated 128 servers to cloud with observability-first focus, including alert integration.
  • Performed in-depth troubleshooting of alert anomalies and system degradation.
SLIs, SLOs, SLAs, Splunk, New Relic, cloud migration, +2

Entropik Tech

DevOps Engineer

Jun 2019 – Aug 2021 · 2 yrs 2 mos · Bengaluru Area, India

  • Designed proactive alert strategies using CloudWatch, Splunk, and AppDynamics, reducing mean time to detect (MTTD) by 35%.
  • Maintained dashboards for real-time service metrics including latency, error rate, and system throughput.
  • Responded to on-call alerts, identified root causes, and documented learnings in runbooks.
  • Assisted with migration of monitoring workflows during transition from AWS to Azure.
CloudWatch, Splunk, AppDynamics, AWS, Azure, Monitoring Solutions, +1

Intuit

Site Reliability Engineer

May 2017 – May 2019 · 2 yrs · Bengaluru, Karnataka, India

  • Supported highly available tax data systems by implementing resilient monitoring, automated alerting, and proactive incident response.
  • Created and maintained Splunk dashboards, Splunk alerts, and AppDynamics dashboards to monitor application performance and system health.
  • Configured Apache Web Server and JBoss load balancing for optimized web application delivery.
  • Set up and managed OMD (Open Monitoring Distribution), Wavefront, and SiteScope synthetic monitoring for infrastructure and endpoint visibility.
  • Acted as a key on-call responder, performing root cause analysis (RCA) for production incidents to improve system reliability.
  • Designed and implemented Rundeck pipelines enabling single-click deployments and automated decommissioning of on-premises servers.
  • Actively participated in production and non-production deployments, ensuring smooth and error-free releases.
Splunk, AppDynamics, Apache, JBoss, Rundeck, Monitoring Solutions, +1

Education

JNTU Anantapur

Bachelor of Technology — Civil Engineering Technology/Technician

Jan 2009 – Feb 2013
