Dilip Rathod — SRE (Site Reliability Engineer)

I am a dedicated Senior Site Reliability Engineer with over 7 years of experience in the DevOps and cloud infrastructure domain. My expertise lies in building, automating, and optimizing systems to ensure performance, scalability, and cost efficiency. Throughout my career, I’ve had the opportunity to work across various industries, helping companies implement robust infrastructure solutions, streamline operations, and enhance observability.

With extensive hands-on experience in Kubernetes, Docker, AWS, Azure, and GCP, I have led numerous projects centered around Kubernetes management and optimization, including migrating services to Kubernetes, automating scaling, and enhancing system monitoring. I am passionate about leveraging modern tools like Terraform and KEDA, alongside monitoring solutions such as Prometheus and Datadog, to create reliable and efficient systems.

Key focus areas in my work include:

Kubernetes Management: Streamlining and managing Kubernetes clusters to improve scalability, performance, and reliability across environments.
Kubernetes Automation: Automating deployments, scaling, and service management using Kubernetes-native tools like Helm, KEDA, and AWS Load Balancer Controller.
Kubernetes Monitoring and Observability: Implementing robust monitoring solutions like Prometheus and Datadog to ensure the health and performance of Kubernetes workloads.
Kubernetes Cost Optimization: Using tools like Kubecost and other cost management strategies to optimize resource usage and reduce cloud spending.

I enjoy solving complex technical challenges in Kubernetes environments and continuously seek opportunities to innovate and optimize cloud infrastructure. Let’s connect and explore how we can collaborate to build more resilient, cost-effective, and scalable Kubernetes-driven solutions.

Stackforce AI infers this person is a SaaS Infrastructure Engineer with a strong focus on Kubernetes and cloud optimization.

Experience: 8 yrs 9 mos

Skills

Kubernetes Management
Infrastructure Automation
Cloud Cost Optimization
Deployment Automation
Infrastructure Management

Career Highlights

Over 7 years of experience in DevOps and cloud infrastructure.
Expert in Kubernetes management and optimization.
Proven track record of cost-saving initiatives in cloud infrastructure.

Work Experience

Clari

Sr. Site Reliability Engineer (3 yrs 1 mo)

Joveo

Lead Devops Engineer (2 yrs 5 mos)

Dream11

SD2- Devops (7 mos)

Qubole

MTS-Devops (8 mos)

Site Reliability Engineer (1 yr)

InMobi

Site Reliability Engineer (1 yr)

Education

Bachelor’s Degree at MIT Academy of Engineering

India · 8 yrs 9 mos experience

Most Likely To Switch

Skills

Core Skills

Kubernetes Management
Infrastructure Automation
Cloud Cost Optimization
Deployment Automation
Infrastructure Management

Other Skills

AWS Command Line Interface (CLI)
Amazon CloudWatch
Amazon EC2
Amazon EKS
Amazon Web Services (AWS)
Ansible
Auto Scaling Groups
Automation
Bash
C
C++
Capacity Planning
Chef
CloudWatch
Containerization

Experience

Total Experience: 8 yrs 9 mos
Average Tenure: 1 yr 9 mos
Current Experience: 3 yrs 1 mo

Clari

Sr. Site Reliability Engineer

May 2023 – Present · 3 yrs 1 mo · Bengaluru, Karnataka, India · Remote

Led Kubernetes stack management and optimization.
Spearheaded infrastructure cost-saving initiatives.
Enhanced observability and monitoring with Datadog and Prometheus.
Set up a logging stack with CloudWatch to send Kubernetes logs for centralized logging and monitoring.
Automated environment setup using Terraform and Terragrunt to streamline infrastructure provisioning and management.
Implemented ArgoCD for continuous deployment (CD) to Kubernetes, automating the deployment process and ensuring seamless updates across environments.

Kubernetes · Terraform · Datadog · Prometheus · CloudWatch · Kubernetes Management (+1 more)

Joveo

Lead Devops Engineer

Nov 2020 – Apr 2023 · 2 yrs 5 mos · Hyderabad, Telangana, India

Managed and optimized Kubernetes infrastructure, ensuring seamless operations and system reliability.
Implemented cloud cost optimization strategies, significantly reducing expenses while enhancing operational efficiency.
Improved system reliability by automating processes and deploying advanced monitoring tools for proactive issue resolution.
Successfully migrated over 60 services to Kubernetes, ensuring smooth transitions with minimal downtime.
Led the migration from Datadog to Grafana Cloud, streamlining observability and monitoring efforts.
Achieved a 50% reduction in COGS through strategic cost optimization initiatives.

Kubernetes · Cloud Cost Optimization · Monitoring Tools · Datadog · Grafana · Kubernetes Management

Dream11

SD2- Devops

Apr 2020 – Nov 2020 · 7 mos · Mumbai, Maharashtra, India · Hybrid

Implemented one-click deployment for multiple Auto Scaling Groups (ASG) at Dream11.
Automated routing based on service stack numbers to streamline deployment processes.
Enhanced deployment logic to enable auto-scaling of service stacks based on user input.

Auto Scaling Groups · Deployment Automation

Qubole

2 roles

MTS-Devops

Promoted

Aug 2019 – Apr 2020 · 8 mos

Managed logging infrastructure and ensured high service availability at Qubole in Bengaluru, India.
Implemented application monitoring using SignalFx and automated infrastructure processes with Terraform.
Optimized cloud infrastructure with Cloud Custodian to enhance performance and efficiency.

Logging Infrastructure · Monitoring Tools · Terraform · Infrastructure Management

Site Reliability Engineer

Jul 2018 – Jul 2019 · 1 yr

InMobi

Site Reliability Engineer

Jun 2017 – Jun 2018 · 1 yr · Bengaluru Area, India

Developed a web-based terminal for Mesos containers for developer debugging.
Implemented log analysis and monitoring systems to detect suspicious network activity.
Contributed to service deployment, onboarding, and monitoring.

Web-based Terminal Development · Log Analysis · Infrastructure Management