RaviRaj Jha

SRE (Site Reliability Engineer)

Noida, Uttar Pradesh, India · 2 yrs experience

Key Highlights

  • Expert in Kubernetes and AWS for scalable infrastructure.
  • Proven track record in automating CI/CD pipelines.
  • Strong communicator with a passion for new technologies.
Stackforce AI infers this person is a DevOps Engineer with expertise in cloud infrastructure and automation in SaaS environments.

Skills

Core Skills

Kubernetes, Incident Management, AWS, Terraform, Monitoring, CI/CD

Other Skills

AWS Auto Scaling, AWS ECS, AWS Identity and Access Management (AWS IAM), Access Control Management, Alerting, Amazon EC2, Amazon EKS, Amazon Elasticsearch Service, Amazon Web Services (AWS), Application Configuration, Application Deployment, Automation, Build Automation, Cloud Computing, CloudWatch

About

As a passionate and goal-oriented Computer Science graduate, I thrive on solving complex problems and delivering high-quality, efficient solutions. With hands-on experience in DevOps practices, I have developed expertise in monitoring critical systems, automating infrastructure, and managing CI/CD pipelines. In addition to my technical skills, I am highly organized, a strong communicator, and eager to explore new technologies that can improve processes and systems. With a solid foundation in Python, C/C++, and Linux, along with a deep understanding of key DevOps and cloud technologies, I am excited about contributing to impactful projects and expanding my knowledge further.

Experience

Gigaspaces

Site Reliability Engineer

Mar 2025 – Present · 1 yr · Remote

  • Infrastructure & Application Monitoring: Proactively monitor and maintain system health using observability tools such as Prometheus, Grafana, and Groundcover, ensuring high availability and performance of services.
  • Kubernetes Operations: Manage and troubleshoot Kubernetes clusters for seamless deployment, scaling, and management of containerized applications.
  • Incident Management & Alerting: Set up and fine-tune alerts to detect anomalies early and respond quickly to incidents, minimizing downtime and impact on end users.
  • Performance Optimization: Analyze system metrics and logs to identify bottlenecks and improve efficiency across the stack.
  • Automation & Reliability: Implement automation to streamline infrastructure operations, reduce manual intervention, and enhance system reliability.
  • Collaboration: Work closely with development and DevOps teams to ensure systems are designed with reliability, scalability, and observability in mind.
  • Documentation & Best Practices: Maintain clear documentation of monitoring setups and incident runbooks, and contribute to SRE best practices and knowledge sharing.
Prometheus, Grafana, Groundcover, Kubernetes, Incident Management, Automation

Amber

DevOps Engineer

Mar 2024 – Mar 2025 · 1 yr · Pune · Hybrid

  • I worked extensively with AWS ECS, ECR, Jenkins, Grafana, Prometheus, Loki, CloudWatch, and Infrastructure as Code (Terraform) to provision and manage scalable, reliable infrastructure across development, staging, and production environments.
  • Key Responsibilities & Achievements:
  • Monitored critical application metrics using Grafana dashboards, including frontend/backend performance, GT metrics, Sentry errors, core API health, P75 latency, CDN SSR P90, platform budgets, and code coverage—resulting in a 10–50% improvement in organizational performance, user experience, and system reliability.
  • Enhanced access control by modifying existing Terraform architecture to manage user permissions and ensure secure authorization across teams.
  • Migrated Android build pipeline from GitHub Actions to Jenkins, leading to an 85% reduction in build time and enabling automated deployments to Firebase with real-time job monitoring.
  • Streamlined operational workflows by integrating Grafana, Terraform, and Jenkins, resulting in improved monitoring, faster deployments, better access management, and greater overall efficiency.
AWS ECS, ECR, Jenkins, Grafana, Prometheus, Terraform, +2

Education

CHANDIGARH UNIVERSITY

Computer Science And Engineering — Computer Software Engineering

Jun 2019 – Jun 2023
