RaviRaj Jha

SRE (Site Reliability Engineer)

Noida, Uttar Pradesh, India · 2 yrs experience

Key Highlights

Expert in Kubernetes and AWS for scalable infrastructure.
Proven track record in automating CI/CD pipelines.
Strong communicator with a passion for new technologies.

Stackforce AI infers this person is a DevOps Engineer with expertise in cloud infrastructure and automation in SaaS environments.

Skills

Core Skills

Kubernetes, Incident Management, AWS, Terraform, Monitoring, CI/CD

Other Skills

AWS Auto Scaling, AWS ECS, AWS Identity and Access Management (AWS IAM), Access Control Management, Alerting, Amazon EC2, Amazon EKS, Amazon Elasticsearch Service, Amazon Web Services (AWS), Application Configuration, Application Deployment, Automation, Build Automation, Cloud Computing, CloudWatch

About

As a passionate and goal-oriented Computer Science graduate, I thrive on solving complex problems and delivering high-quality, efficient solutions. With hands-on experience in DevOps practices, I have developed expertise in monitoring critical systems, automating infrastructure, and managing CI/CD pipelines. In addition to my technical skills, I am highly organized, a strong communicator, and eager to explore new technologies that can improve processes and systems. With a solid foundation in Python, C/C++, and Linux, along with a deep understanding of key DevOps and cloud technologies, I am excited about contributing to impactful projects and expanding my knowledge further.

Experience

Gigaspaces

Site Reliability Engineer

Mar 2025 – Present · 1 yr · Remote

Infrastructure & Application Monitoring: Proactively monitor and maintain system health using observability tools such as Prometheus, Grafana, and Groundcover, ensuring high availability and performance of services.
Kubernetes Operations: Manage and troubleshoot Kubernetes clusters for seamless deployment, scaling, and management of containerized applications.
Incident Management & Alerting: Set up and fine-tune alerts to detect anomalies early and respond quickly to incidents, minimizing downtime and impact on end users.
Performance Optimization: Analyze system metrics and logs to identify bottlenecks, improve efficiency, and drive performance improvements across the stack.
Automation & Reliability: Implement automation to streamline infrastructure operations, reduce manual intervention, and enhance system reliability.
Collaboration: Work closely with development and DevOps teams to ensure systems are designed with reliability, scalability, and observability in mind.
Documentation & Best Practices: Maintain clear documentation of monitoring setups and incident runbooks, and contribute to SRE best practices and knowledge sharing.

Prometheus, Grafana, Groundcover, Kubernetes, Incident Management, Automation

Amber

DevOps Engineer

Mar 2024 – Mar 2025 · 1 yr · Pune · Hybrid

I worked extensively with AWS ECS, ECR, Jenkins, Grafana, Prometheus, Loki, CloudWatch, and Infrastructure as Code (Terraform) to provision and manage scalable, reliable infrastructure across development, staging, and production environments.
Key Responsibilities & Achievements:
Monitored critical application metrics using Grafana dashboards, including frontend/backend performance, GT metrics, Sentry errors, core API health, P75 latency, CDN SSR P90, platform budgets, and code coverage, resulting in a 10–50% improvement in organizational performance, user experience, and system reliability.
Enhanced access control by modifying existing Terraform architecture to manage user permissions and ensure secure authorization across teams.
Migrated the Android build pipeline from GitHub Actions to Jenkins, leading to an 85% reduction in build time and enabling automated deployments to Firebase with real-time job monitoring.
Streamlined operational workflows by integrating Grafana, Terraform, and Jenkins, resulting in improved monitoring, faster deployments, better access management, and greater overall efficiency.

AWS ECS, ECR, Jenkins, Grafana, Prometheus, Terraform, +2