Utkarsh Kumar

DevOps Engineer

Noida, Uttar Pradesh, India · 11 yrs 8 mos experience

Most Likely To Switch · Highly Stable

Key Highlights

Expert in AWS and Kubernetes for cloud infrastructure.
Led successful migration to microservices architecture.
Achieved 50% cost reduction through automation.

Stackforce AI infers this person is a DevOps expert in cloud infrastructure and automation within the SaaS industry.

Skills

Core Skills

DevOps · Kubernetes · CI/CD · Cloud Computing · Infrastructure Management

Other Skills

Amazon EKS · ArgoCD · Shell Scripting · Docker · Python · Continuous Delivery (CD) · Continuous Integration (CI) · Amazon Web Services (AWS) · AWS Lambda · Docker Products · CI/CD · Packer · Helm Charts · Bitbucket · Terraform

About

At Paytm, my role as a DevOps Lead centers around enhancing the scalability, availability, and security of cloud infrastructures. With a deep understanding of AWS services, I've been instrumental in transforming and automating cloud architectures, ensuring robust monitoring, and facilitating large-scale deployments. Our team's efforts in infrastructure auto-scaling and cost optimization reflect our commitment to operational excellence and fiscal responsibility. My tenure at Collegedunia honed my ability to manage complex DevOps challenges, leading to a seamless transition to Paytm. Here, I've implemented high availability (HA) and disaster recovery (DR) solutions, written scripts for resource optimization, and contributed to the migration to a microservices architecture, all underscored by my AWS Solutions Architect certification. Our achievements in infrastructure automation and proactive monitoring have bolstered system reliability and team efficiency.

Experience

11 yrs 8 mos

Total Experience

2 yrs 11 mos

Average Tenure

4 yrs

Current Experience

Paytm

2 roles

Senior DevOps Lead Engineer

Promoted

Jul 2024 – Present · 1 yr 10 mos · Noida, Uttar Pradesh, India

Actively contributing to production issue resolution as part of the RCA team, collaborating across teams to reduce mean time to recovery (MTTR) and implement long-term preventive actions.
Kubernetes platform optimisation initiatives to enhance stability and cost-efficiency:
Enforced PodDisruptionBudgets (PDBs) for safe rollout and node draining.
Enabled graceful termination of pods during node lifecycle events.
Tuned CoreDNS ndots settings for improved DNS lookup performance.
Integrated Mittens init container to simulate traffic for cold-start validation.
Refactored existing Helm templates to support highly variable and reusable deployment patterns using conditional logic and environment-specific overrides.
Implemented KEDA for event-driven autoscaling of pods and Karpenter for efficient and dynamic node autoscaling, replacing static ASGs.
Integrated Keycloak for SSO login to Jenkins, ArgoCD, and AWS.
Managed user access for SSH into servers across different projects.
Conducted structured knowledge-transfer sessions for interns and DevOps/Senior DevOps engineers, focusing on production readiness, deployment hygiene, and troubleshooting playbooks.
Managed on-call rosters, agile sprint planning, and execution of ad hoc production tickets, while aligning with team KRAs to track delivery and performance.

Amazon EKS · ArgoCD · DevOps · Kubernetes

DevOps Lead

Jun 2022 – Jul 2024 · 2 yrs 1 mo · Noida, Uttar Pradesh, India

Managed infrastructure and CI/CD for 2200+ applications across production and non-prod EKS clusters.
Created standardized Terraform modules for cloud resource provisioning and drift management.
Designed 200+ Jenkins pipelines for Maven builds, Docker image creation, Helm updates, Ansible deployments, GitOps flows, and JFrog Artifactory publishing.
Built canary deployment framework for EC2-based services using Jenkins and CodeDeploy, with developer-controlled rollbacks via email triggers.
Reduced infra cost ~50% by implementing Spot Instances and Golden AMI automation via HashiCorp Packer (saved $150/day by reducing EBS volumes to 8 GB).
Implemented ArgoCD-based CD with Istio mesh for app-level rollout, rollback, and promotion by developers.
Created optimised base images with basic configuration and published them to a centralised ECR for use across projects.
Configured HAProxy for request header manipulation and rate limiting, integrated with AWS Auto Scaling; later migrated to a self-managed API gateway.
Built self-healing systems using SNS, Lambda, and Ansible triggered by CloudWatch alerts.
Configured logging, monitoring and alerting with Prometheus, Grafana, Alertmanager, and ELK/EFK stack.

Shell Scripting · Docker · DevOps · CI/CD

Collegedunia

DevOps Team Lead

May 2021 – Jun 2022 · 1 yr 1 mo · Gurugram, Haryana, India · Remote

Led 10+ projects, reporting directly to the CTO and overseeing all aspects of infrastructure, CI/CD, alerting, monitoring, security, and automation.
Designed and managed the migration from hybrid cloud to AWS, implementing cost-optimisation strategies using Savings Plans, Spot Instances, and instance right-sizing, achieving substantial cost reductions in non-production environments.
Modernised deployment strategy by creating a repeatable, automated deployment process for standalone applications, replacing manual steps with Ansible playbooks and rollback-capable shell scripts.
Re-architected monolithic services into highly available, auto-scalable infrastructure using ALB, ASG, CloudFront CDN, and AWS WAF to improve reliability, latency, and scalability.
Developed centralized Terraform modules and Ansible roles for provisioning and configuring AWS services, enabling reusable, consistent IaC.
Automated infrastructure housekeeping and optimisation tasks, such as:
Detecting and alerting on unused EIPs and EBS volumes
Retaining only the last 5 AMIs per ASG launch template
Auto start/stop of instances based on scheduled tags
Dynamic IP blocking via AWS WAF based on traffic thresholds
Implemented suspicious activity detection and incident response automation using CloudTrail, EventBridge, SNS, and Lambda, covering IAM, EC2, GuardDuty, and SSO events.
Built centralized monitoring and alerting systems using Telegraf, InfluxDB, Grafana, and CloudWatch, with real-time Slack alerts for production-critical metrics and anomaly detection.
Set up and enforced AWS SSO for centralised login across multiple AWS accounts, defining granular permissions and access roles for DevOps, developers, and admins.
Defined and enforced organisation-wide policies to ensure security and governance:
Mandatory tagging, region and instance-type restrictions, IAM admin role controls, and AWS service usage restrictions.

Shell Scripting · Python · DevOps · Cloud Computing

Samsung Electronics

3 roles

Lead Engineer

Promoted

Jul 2017 – Apr 2021 · 3 yrs 9 mos · On-site

Developed reusable Ansible playbooks to automate repetitive operational tasks, reducing manual intervention and human error.
Created and maintained centralised Terraform modules to standardise infrastructure provisioning across environments and teams.
Designed and implemented a multi-region architecture for the Smart TV interface service to ensure global availability and failover capability.
Enabled cost optimization by:
Defining dynamic and scheduled scaling policies for Auto Scaling Groups (ASGs) based on usage trends and peak hours.
Managing versions with launch templates and automating the removal of launch configurations.
Automating detection of unused/underutilised resources via EventBridge rules and Lambda-driven reporting.
Integrated AWS alerts with Slack to ensure real-time visibility and faster incident response for production-critical events.
Implemented LDAP-based SSH authentication, granting role-based access control to servers and simplifying centralised login management.
Set up and managed Jenkins master-slave architecture for distributed builds, handling parallel pipelines and diverse use cases.
Installed and maintained JFrog Artifactory for artifact management and internal package distribution, improving build and deployment consistency.

Continuous Delivery (CD) · Continuous Integration (CI) · DevOps · Infrastructure Management

Senior Software Engineer

Oct 2016 – Jun 2017 · 8 mos · On-site

Managed infrastructure for 11+ services across multiple AWS regions, overseeing over 200 EC2 instances with high availability and scalability requirements.
Designed and reviewed service architectures to ensure high availability, fault tolerance, and disaster recovery (DR) compliance.
Implemented automation for cloud cost optimisation using Lambda and Python scripts:
Identified and cleaned up unused resources (EBS volumes, Elastic IPs, AMIs)
Enabled S3 Intelligent Tiering for automated storage cost reduction
Generated underutilised-resource reports for team action
Performed detailed AWS billing analysis and recommended cost-saving strategies, including reserved instances and spot instance usage.
Configured CloudWatch monitoring, custom metrics, and alerting pipelines using Lambda to proactively detect and respond to production issues.
Migrated legacy server-based workloads to serverless (AWS Lambda) and Kubernetes architectures for improved scalability and cost-efficiency.
Acted as a core member of the RCA (Root Cause Analysis) team, investigating high-impact incidents and contributing to postmortems and incident prevention plans.
Part of a centralised DevOps team responsible for:
Enforcing organisation-wide security and compliance policies
Managing IAM roles and cross-team access control strategies
Set up secure SSH access across environments, enabling controlled server login using username/password and enforcing audit best practices.

Amazon Web Services (AWS) · AWS Lambda · Cloud Computing · Infrastructure Management

Software Engineer

Dec 2014 – Sep 2016 · 1 yr 9 mos · On-site

Provisioned and managed AWS infrastructure for Smart TV applications using AWS Console, ensuring optimal setup for scalability and security.
Executed ad hoc cloud tasks based on JIRA requests from development teams, including:
AWS Lambda function creation and configuration, Security Group rule updates for fine-grained access control, IAM user and role creation with least-privilege policies, and S3 bucket policy implementation for secure object storage access.
Collaborated with multiple development teams to support and automate deployment workflows, reducing time-to-deploy and operational overhead.

Amazon Web Services (AWS) · Shell Scripting

Samsung R&D Institute Delhi

Internship

Jul 2014 – Dec 2014 · 5 mos · India · On-site

Assisted in understanding and implementing technical features from the development team, and communicated with QA to facilitate verification processes.
Collaborated with cross-functional teams to integrate various TV modules, ensuring seamless feature functionality.
Developed use case documentation relevant to specific features to support development and testing.
Contributed to improving feature integration workflows and enhancing overall product quality.