Suraj Nayak

DevOps Engineer

Bengaluru, Karnataka, India · 9 yrs 9 mos experience

Key Highlights

  • Over a decade of experience in Cloud Infrastructure and DevOps.
  • Expert in building scalable and reliable platforms across multiple cloud providers.
  • Proven track record in modernizing legacy systems and driving cloud cost efficiency.

Skills

Core Skills

Cloud Infrastructure · SRE Practices · Observability Solutions · Infrastructure Management · DevOps Excellence · Infrastructure Automation · Monitoring Platform Implementation · Kubernetes Architecture · Microservices Management · Cloud Security

Other Skills

Amazon Web Services (AWS) · Ansible · Bitbucket · Cloud Automation · Cloud Computing · Cloud Services · Communication · Configuration Management · DevOps · Disaster Recovery · Domain Architecture · Git · GitHub · Google Cloud Platform (GCP) · IT Operations

About

With over a decade of experience in Cloud Infrastructure and DevOps, I specialise in building scalable, reliable, and observable platforms across AWS, Azure, and Kubernetes. I’ve led initiatives to modernise legacy systems, implement SRE best practices (SLIs/SLOs, error budgets, chaos engineering), and drive cloud cost efficiency through FinOps and automation. At AiDash and Cargill, I led infrastructure scalability and automation programs using Terraform, Kubernetes, and Ansible, optimising performance and reducing costs. Passionate about operational excellence and cross-functional collaboration, I focus on designing secure, resilient, and data-driven systems that empower teams and ensure business continuity in dynamic environments.

Experience

Locus

Principal Engineer

Oct 2025 - Present · 5 mos · Bengaluru, Karnataka, India · Hybrid

  • Architecting Observability Frameworks: Drive implementation of advanced monitoring and tracing using Prometheus, Grafana, and OpenTelemetry to ensure deep system visibility and proactive issue detection.
  • Defining SRE Practices: Establish and enforce SLIs, SLOs, and error budgets to balance innovation with reliability, embedding SRE culture across engineering teams.
  • Cloud Infrastructure Leadership: Lead design and optimization of multi-cloud (AWS/Azure) infrastructure for scalability, reliability, and cost efficiency, aligned with FinOps principles.
  • Platform Automation & Modernization: Oversee automation through Infrastructure-as-Code (Terraform, Ansible) and modernization of legacy systems into containerized, cloud-native architectures.
  • Cross-Functional Technical Strategy: Collaborate with product, data, and security teams to build cohesive, end-to-end solutions that align engineering efforts with business goals.
  • Mentorship & Technical Governance: Mentor engineering teams on DevOps, SRE, and cloud best practices while defining architectural standards and driving engineering excellence.
Production Deployment · Domain Architecture · DevOps · Cloud Infrastructure · SRE Practices

Roku

Sr. Software Engineer, Infrastructure

Feb 2024 - Oct 2025 · 1 yr 8 mos · Bengaluru, Karnataka, India · Hybrid

  • Roku, the leading TV streaming platform in the U.S., Canada, and Mexico based on streaming hours, is committed to transforming global television viewing. My role involves evaluating and implementing advanced monitoring and observability solutions optimized for our unique operational environment. This includes hands-on experience with technologies such as Prometheus, Grafana, the ELK stack, Datadog, Jaeger, and OpenTelemetry, to ensure optimal selection and utilization.
  • Effective collaboration is paramount. I work closely with development and operations teams to instrument our infrastructure, guaranteeing comprehensive monitoring of all applications and services. This is achieved by adhering to best practices, minimizing overhead while maximizing visibility and actionable insights.
  • My daily responsibilities encompass configuring alert thresholds, leading incident response, and leveraging observability data for both reactive problem-solving and proactive system improvements. This includes identifying performance bottlenecks and collaborating on scalability enhancements, viewing each challenge as an opportunity for optimization.
  • Automation is a critical component of my position. I develop scripts and integrations to streamline processes, ensuring seamless integration of our observability solutions with CI/CD pipelines and other core systems.
  • My responsibilities include: managing AWS production application infrastructure (including networking, EKS & ECS clusters); Kubernetes cluster management; efficient IoT device-to-server communication design and implementation; leveraging specialized skills in Python, Infrastructure as Code, and AWS services to build high-quality production applications; developing cloud computing strategies; creating cloud adoption plans (AWS, GCP, Azure); designing cloud applications; and managing and monitoring cloud environments.
Terraform · Kubernetes · Communication · Observability · Python (Programming Language) · Amazon Web Services (AWS) · +2 more

AiDash

Staff Engineer

Mar 2022 - Feb 2024 · 1 yr 11 mos · Bengaluru, Karnataka, India

  • DevOps Expertise for High-Growth Startups
  • DevOps Leadership and Infrastructure
  • Driving DevOps Excellence: Orchestrating the deployment, automation, and optimization of cloud-based infrastructure and software delivery pipelines for high-performing startups.
  • Agile DevOps Leadership: Enabling cross-functional teams to achieve continuous integration and delivery through collaborative workflows, tools, and best practices.
  • Infrastructure as Code Advocate: Leveraging cutting-edge technologies and infrastructure automation tools (e.g., Terraform, Kubernetes, Ansible) to streamline operations, reduce costs, and enhance reliability.
  • Scalability, Performance, and Cloud Expertise
  • Scalability and Performance Optimization: Designing and implementing scalable architectures, performance monitoring strategies, and cloud cost optimization techniques to support rapid growth and optimal resource utilization.
  • CI/CD Champion: Implementing robust continuous integration and delivery pipelines, enabling startups to rapidly deliver features and enhancements while maintaining quality and stability.
  • Cloud Architecture Expert: Designing and managing cloud-based architectures (AWS, Azure, Google Cloud) to maximize uptime, security, and cost efficiency while aligning with business objectives.
Terraform · Kubernetes · Communication · GitHub · Programming · Cloud Computing · +6 more

Cargill

Cloud Engineer

Jan 2020 - Mar 2022 · 2 yrs 2 mos · Bangalore

  • Implemented a comprehensive monitoring platform using VictoriaMetrics and established logging and tracing using Coralogix for production systems on EKS, handling 13 trillion metric data points.
  • Worked on Kubernetes architecture, involving a multi-cluster setup using Cilium Cluster Mesh, end-to-end automation of cluster creation using Terraform and Helm, and CI/CD pipelines using Argo CD and Jenkins.
  • Planned and executed major upgrades for EKS clusters without downtime using Terraform and Ansible, and implemented disaster recovery (DR) for critical P0 services.
  • Led the migration of 300+ applications from EC2 to Kubernetes within 80 days, with zero application downtime.
  • Participated in the planning, design, and migration of 600+ services to Google Cloud Platform (GCP), encompassing GCP projects, environment setups, GKE cluster and node pool designs, CI/CD implementation (Jenkins and Argo CD), monitoring, logging, and supporting automation.
  • Worked on system optimizations at scale to enhance reliability and uptime in Kubernetes, and introduced an organization-wide change management process.
  • Assisted in creating an SRE dashboard that captures all incidents with severity, Root Cause Analyses (RCAs), uptime, alert configurations, and analytics.
  • Organized and participated in RCA and quarterly reflection meetings to analyze uptime and outages and maintain high application availability.
  • Implemented a Slack bot for manual canary promotions (Flagger) using Flask and the Slack APIs, and contributed to multiple automations that reduce manual intervention.
  • Ensured system stability, documenting and supporting standards and procedures as per the organization's guidelines.
Terraform · Kubernetes · Communication · Ansible · Linux · Infrastructure as code (IaC) · +15 more

Microland Limited

Cloud Architect

Jun 2016 - Jan 2020 · 3 yrs 7 mos · Bengaluru · On-site

  • Migrated and managed all microservices-based applications on AWS EKS clusters
  • Deployed tools such as Kafka, a logging stack (EFK), and a monitoring stack (Thanos) in the EKS cluster using Helm charts
  • Security using Cloudflare
  • Disaster management plan with Terraform and Bash scripts
  • CI/CD using Jenkins and Argo CD
  • Load balancing of services using the NGINX ingress controller and the ALB ingress controller
  • Deployed and managed Java and Node.js applications on Kubernetes (EKS)
  • Configuration management using Helm charts on Kubernetes clusters for major services
  • Monitoring and logging using AWS CloudWatch, Thanos, Grafana, Elasticsearch, Fluentd, and Kibana
  • Implemented and integrated application tracing using Jaeger
  • Automation using Bash scripting and Python
  • SRE and on-call for all projects, using ELK, New Relic, CloudWatch, and PagerDuty
  • Contributed to IaC automation by writing Terraform modules, building images with Packer, and writing SaltStack formulas for Aerospike, access, etc.
JFrog Artifactory · Terraform · Kubernetes · Communication · Linux · Infrastructure as code (IaC) · +12 more

MAQ Software

Software Engineer

May 2015 - Jul 2015 · 2 mos · Greater Hyderabad Area

  • Developed a .NET-based application

Genpact Headstrong Capital Markets

Software Trainee

May 2013 - Jul 2013 · 2 mos · Noida Area, India

  • Working on hybrid mobile application development

Education

National Institute of Technology, Tiruchirappalli

Bachelor's degree — Computer Science

Jan 2012 - Jan 2016
