Nikhil Maheshwari

SRE (Site Reliability Engineer)

Rajasthan, India · 4 yrs 11 mos experience

Key Highlights

  • Improved AWS Security Hub score significantly.
  • Achieved substantial cost reductions in cloud infrastructure.
  • Automated critical infrastructure tasks for efficiency.
Stackforce AI infers this person is a Cloud Infrastructure and DevOps specialist with a focus on automation and cost optimization.

Skills

Core Skills

Cloud Infrastructure, Site Reliability Engineering, Cost Optimization, Infrastructure Automation, Open Source Development, DevOps Engineering, Monitoring

Other Skills

AWS, AWS Lambda, Amazon EKS, Ansible, CloudWatch, Cloudflare, Cost Management, Customer Service, Docker, Documentation, Git, Grafana, HashiCorp Vault, Helm, Incident Response

About

I am a results-oriented DevOps Engineer with a strong focus on driving innovation, enhancing productivity, and ensuring the seamless functioning of critical projects. I thrive in collaborative environments and consistently deliver solutions that meet and exceed organizational goals.

Experience

Zeta

Site Reliability Engineer II

Jul 2023 - Present · 2 yrs 8 mos · Hyderabad, Telangana, India · On-site

  • As a Site Reliability Engineer at Zeta, I focus on optimizing cloud infrastructure security, cost-efficiency, and reliability through automation and monitoring improvements.
  • 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆: I improved Zeta's AWS Security Hub score from 59 to 83 by addressing critical vulnerabilities, including public S3 access and open security group ports. I enforced security best practices across the infrastructure to enhance protection.
  • 𝗖𝗼𝘀𝘁 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: I led cost-saving initiatives, including establishing an AWS tagging strategy that improved cost tracking. I also recommended optimal pod rightsizing in Kubernetes, reducing EC2 costs by 56%. Additionally, I automated misconfigured pod detection, improving resource efficiency.
  • 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻: I automated key infrastructure tasks, such as token renewals and secure token fetching from Hashicorp Vault using RVault. I imported all S3 buckets into Terraform and automated incident reporting by integrating AWS Connect, Lambda, and Jira via IVR systems.
  • 𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴: I enhanced monitoring by refactoring Helm charts to implement centralized alerting and comprehensive health checks across all clusters, ensuring greater visibility and faster issue resolution.
  • 𝟮𝟰/𝟳 𝗢𝗻-𝗖𝗮𝗹𝗹 𝗦𝘂𝗽𝗽𝗼𝗿𝘁: Provided 24/7 incident response while ensuring adherence to SLAs and SLOs. Managed availability and latency issues and drafted RCAs to maintain optimal system performance.
  • 𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 & 𝗔𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: I optimized Horizontal Pod Autoscaler (HPA) behavior, improved performance by resolving CPU throttling issues, and conducted POCs for pod scheduling using affinities. I also implemented KEDA for lag-based autoscaling to automate scaling for performance optimization.
  • At Zeta, my focus is on building scalable, cost-effective, and secure cloud infrastructure to support business needs while ensuring operational resilience.
AWS, Kubernetes, Terraform, HashiCorp Vault, Helm, Monitoring (+3 more)

Cloud Native Computing Foundation (CNCF)

Open Source Contributor

Apr 2021 - Present · 4 yrs 11 mos · Jaipur, Rajasthan, India · Remote

  • Devtron: Fixed the HPA Helm chart - https://github.com/devtron-labs/charts/pull/46/files
  • Velero: Updated the documentation - https://github.com/vmware-tanzu/velero/pull/5608
  • Kyverno: Updated a Kyverno policy - https://github.com/kyverno/policies/pull/1103; https://github.com/kyverno/website/pull/1314
  • Amazon EKS (Best Practices): Updated the manifest with explanatory comments - https://github.com/aws/aws-eks-best-practices/pull/551
Git, Documentation, Open Source Contribution, Open Source Development

Delhivery

DevOps Engineer

Apr 2021 - Jul 2023 · 2 yrs 3 mos · Gurugram, Haryana, India · Hybrid

  • At Delhivery, I focused on enhancing security, optimizing costs, and automating infrastructure to support operational efficiency and scalability.
  • 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆: I improved security by reducing the SLA for critical findings to under 12 hours and tripled the AWS Security Hub score by addressing key vulnerabilities, prioritizing risk-based fixes.
  • 𝗖𝗼𝘀𝘁 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: I reduced daily EKS costs by 15% through rightsizing and optimized RDS, Redis, Elasticsearch, and DynamoDB resources, achieving a 35% reduction in cloud costs. I also introduced VPC endpoints to lower data transfer expenses and optimized AZ resource utilization to handle traffic surges.
  • 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻: Automated infrastructure deployments using Terraform and Ansible, ensuring smooth setups. Deployed MongoDB services with RAID and Prometheus logging. Automated CloudWatch alarms and user-data scripts for efficient resource configuration. Streamlined SFTP setup for PAN India Kengic Sorters with centralized log monitoring.
  • 𝗖𝗜/𝗖𝗗: I migrated Python 2.7 applications from EC2 M4 to M5a series and shifted OS from Debian to Ubuntu 18. Switched deployment strategy from rolling updates to canary releases, cutting deployment time by 80%. Led cross-region Kubernetes and AWS migrations, improving scalability.
  • 𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴: Developed custom CloudWatch alarms and Grafana dashboards for enhanced infrastructure monitoring. Categorized Kubernetes alarms by namespace, reducing noise and improving response.
  • 𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 & 𝗔𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: I executed zero-downtime migrations for critical applications, optimized performance during peak periods, and ensured capacity for 2-3x traffic surges through strategic planning and disaster recovery.
Terraform, Ansible, AWS, Kubernetes, Monitoring, Cost Management (+1 more)

Education

Swami Keshvanand Institute of Technology, Jaipur

Bachelor of Technology (B.Tech) — Information Technology

Aug 2017 - Dec 2021
