Nikhil Maheshwari

SRE (Site Reliability Engineer)

Rajasthan, India · 5 yrs 1 mo experience

Most Likely To Switch

Key Highlights

Improved AWS Security Hub score significantly.
Achieved substantial cost reductions in cloud infrastructure.
Automated critical infrastructure tasks for efficiency.

Stackforce AI infers this person is a Cloud Infrastructure and DevOps specialist with a focus on automation and cost optimization.

Skills

Core Skills

Cloud Infrastructure, Site Reliability Engineering, Cost Optimization, Infrastructure Automation, Open Source Development, DevOps Engineering, Monitoring

Other Skills

AWS, AWS Lambda, Amazon EKS, Ansible, CloudWatch, Cloudflare, Cost Management, Customer Service, Docker, Documentation, Git, Grafana, HashiCorp Vault, Helm, Incident Response

About

I am a results-oriented DevOps Engineer with a strong focus on driving innovation, enhancing productivity, and ensuring the seamless functioning of critical projects. I thrive in collaborative environments and consistently deliver solutions that meet and exceed organizational goals.

Experience

Total Experience: 5 yrs 1 mo

Average Tenure: 2 yrs 6 mos

Current Experience: 2 yrs 10 mos

Zeta

Site Reliability Engineer II

Jul 2023 – Present · 2 yrs 10 mos · Hyderabad, Telangana, India · On-site

As a Site Reliability Engineer at Zeta, I focus on optimizing cloud infrastructure security, cost-efficiency, and reliability through automation and monitoring improvements.
𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆: I improved Zeta's AWS Security Hub score from 59 to 83 by addressing critical vulnerabilities, including public S3 access and open security group ports. I enforced security best practices across the infrastructure to enhance protection.
𝗖𝗼𝘀𝘁 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: I led cost-saving initiatives, including an AWS tagging strategy that improved cost tracking. I also recommended pod rightsizing in Kubernetes, which reduced EC2 costs by 56%, and automated the detection of misconfigured pods to improve resource efficiency.
𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻: I automated key infrastructure tasks, such as token renewals and secure token fetching from HashiCorp Vault using RVault. I imported all S3 buckets into Terraform and automated incident reporting through an IVR system that integrates Amazon Connect, Lambda, and Jira.
𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴: I enhanced monitoring by refactoring Helm charts to implement centralized alerting and comprehensive health checks across all clusters, ensuring greater visibility and faster issue resolution.
𝟮𝟰/𝟳 𝗢𝗻-𝗖𝗮𝗹𝗹 𝗦𝘂𝗽𝗽𝗼𝗿𝘁: I provided 24/7 incident response while ensuring adherence to SLAs and SLOs. I managed availability and latency issues and drafted RCAs to maintain optimal system performance.
𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 & 𝗔𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: I optimized Horizontal Pod Autoscaler (HPA) behavior, improved performance by resolving CPU throttling issues, and conducted POCs for pod scheduling using affinities. I also implemented KEDA for lag-based autoscaling.
At Zeta, my focus is on building scalable, cost-effective, and secure cloud infrastructure to support business needs while ensuring operational resilience.

AWS, Kubernetes, Terraform, HashiCorp Vault, Helm, Monitoring, +3

Cloud Native Computing Foundation (CNCF)

Open Source Contributor

Apr 2021 – Present · 5 yrs 1 mo · Jaipur, Rajasthan, India · Remote

Devtron: Fixed the HPA Helm chart - https://github.com/devtron-labs/charts/pull/46/files
Velero: Updated the documentation - https://github.com/vmware-tanzu/velero/pull/5608
Kyverno: Updated the Kyverno policy - https://github.com/kyverno/policies/pull/1103; https://github.com/kyverno/website/pull/1314
Amazon EKS (Best Practices): Updated the manifest with comments for better understanding - https://github.com/aws/aws-eks-best-practices/pull/551

Git, Documentation, Open Source Contribution, Open Source Development

Delhivery

DevOps Engineer

Apr 2021 – Jul 2023 · 2 yrs 3 mos · Gurugram, Haryana, India · Hybrid

At Delhivery, I focused on enhancing security, optimizing costs, and automating infrastructure to support operational efficiency and scalability.
𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆: I reduced the SLA for critical findings to under 12 hours and tripled the AWS Security Hub score by addressing key vulnerabilities and prioritizing risk-based fixes.
𝗖𝗼𝘀𝘁 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: I reduced daily EKS costs by 15% through rightsizing and optimized RDS, Redis, Elasticsearch, and DynamoDB resources, achieving a 35% reduction in cloud costs. I also introduced VPC endpoints to lower data transfer expenses and optimized AZ resource utilization to handle traffic surges.
𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗼𝗻: I automated infrastructure deployments using Terraform and Ansible, ensuring smooth setups. I deployed MongoDB services with RAID and Prometheus logging, automated CloudWatch alarms and user-data scripts for efficient resource configuration, and streamlined SFTP setup for pan-India Kengic Sorters with centralized log monitoring.
𝗖𝗜/𝗖𝗗: I migrated Python 2.7 applications from EC2 M4 to M5a series instances and moved the OS from Debian to Ubuntu 18. I switched the deployment strategy from rolling updates to canary releases, cutting deployment time by 80%, and led cross-region Kubernetes and AWS migrations, improving scalability.
𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴: I developed custom CloudWatch alarms and Grafana dashboards for infrastructure monitoring. I categorized Kubernetes alarms by namespace, reducing noise and improving response times.
𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 & 𝗔𝘃𝗮𝗶𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆: I executed zero-downtime migrations for critical applications, optimized performance during peak periods, and ensured capacity for 2-3x traffic surges through strategic planning and disaster recovery.

Terraform, Ansible, AWS, Kubernetes, Monitoring, Cost Management, +1