Utkarsh Kumar

DevOps Engineer

Noida, Uttar Pradesh, India · 11 yrs 8 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in AWS and Kubernetes for cloud infrastructure.
  • Led successful migration to microservices architecture.
  • Achieved 50% cost reduction through automation.
Stackforce AI infers this person is a DevOps expert in cloud infrastructure and automation within the SaaS industry.


Skills

Core Skills

DevOps · Kubernetes · CI/CD · Cloud Computing · Infrastructure Management

Other Skills

Amazon EKS · ArgoCD · Shell Scripting · Docker · Python · Continuous Delivery (CD) · Continuous Integration (CI) · Amazon Web Services (AWS) · AWS Lambda · Docker Products · CICD · Packer · Helm Charts · Bitbucket · Terraform

About

At Paytm, my role as a DevOps Lead centers on improving the scalability, availability, and security of cloud infrastructure. Drawing on a deep understanding of AWS services, I have transformed and automated cloud architectures, put robust monitoring in place, and supported large-scale deployments. Our team's work on infrastructure auto-scaling and cost optimization reflects a commitment to operational excellence and fiscal responsibility. My tenure at Collegedunia honed my ability to handle complex DevOps challenges and prepared me for a smooth transition to Paytm. Here, I have implemented high availability (HA) and disaster recovery (DR) solutions, written scripts for resource optimization, and contributed to the migration to a microservices architecture, all backed by my AWS Solutions Architect certification. Our work on infrastructure automation and proactive monitoring has improved system reliability and team efficiency.

Experience

Total Experience: 11 yrs 8 mos
Average Tenure: 2 yrs 11 mos
Current Experience: 4 yrs

Paytm

2 roles

Senior DevOps Lead Engineer

Promoted

Jul 2024 - Present · 1 yr 10 mos · Noida, Uttar Pradesh, India

  • Actively contributing to production issue resolution as part of the RCA team, collaborating across teams to reduce mean time to recovery (MTTR) and implement long-term preventive actions.
  • Led Kubernetes platform optimisation initiatives to improve stability and cost-efficiency:
      • Enforced PodDisruptionBudgets (PDBs) for safe rollouts and node draining.
      • Enabled graceful termination of pods during node lifecycle events.
      • Tuned the ndots resolver setting to cut redundant CoreDNS lookups and improve DNS performance.
      • Integrated the Mittens init container to simulate traffic for cold-start validation.
  • Refactored existing Helm templates to support flexible, reusable deployment patterns using conditional logic and environment-specific overrides.
  • Implemented KEDA for event-driven autoscaling of pods and Karpenter for efficient and dynamic node autoscaling, replacing static ASGs.
  • Integrated Keycloak for SSO login to Jenkins, ArgoCD, and AWS.
  • Managed SSH user access to servers across projects.
  • Conducted structured knowledge-transfer sessions for interns and DevOps/Senior DevOps engineers, focusing on production readiness, deployment hygiene, and troubleshooting playbooks.
  • Managed on-call rosters, agile sprint planning, and execution of ad hoc production tickets, aligning with team KRAs to track delivery and performance.
Amazon EKS · ArgoCD · DevOps · Kubernetes

DevOps Lead

Jun 2022 - Jul 2024 · 2 yrs 1 mo · Noida, Uttar Pradesh, India

  • Managed infrastructure and CI/CD for 2200+ applications across production and non-prod EKS clusters.
  • Created standardized Terraform modules for cloud resource provisioning and drift management.
  • Designed 200+ Jenkins pipelines for Maven builds, Docker image creation, Helm updates, Ansible deployments, GitOps flows, and JFrog Artifactory publishing.
  • Built canary deployment framework for EC2-based services using Jenkins and CodeDeploy, with developer-controlled rollbacks via email triggers.
  • Cut infrastructure costs by ~50% using Spot Instances and Golden AMI automation via HashiCorp Packer (reducing EBS volumes to 8 GB alone saved $150/day).
  • Implemented ArgoCD-based CD with Istio mesh for app-level rollout, rollback, and promotion by developers.
  • Created optimised base images with baseline configuration and published them to a centralised ECR repository for reuse across projects.
  • Configured HAProxy for request header manipulation and rate limiting, integrated with AWS Auto Scaling, and later migrated to a self-managed API gateway.
  • Built self-healing systems using SNS, Lambda, and Ansible triggered by CloudWatch alerts.
  • Configured logging, monitoring and alerting with Prometheus, Grafana, Alertmanager, and ELK/EFK stack.
Shell Scripting · Docker · DevOps · CI/CD

Collegedunia

DevOps Team Lead

May 2021 - Jun 2022 · 1 yr 1 mo · Gurugram, Haryana, India · Remote

  • Led 10+ projects, reporting directly to the CTO and overseeing all aspects of infrastructure, CI/CD, alerting, monitoring, security, and automation.
  • Designed and managed the migration from hybrid cloud to AWS, implementing cost-optimisation strategies using Savings Plans, Spot Instances, and instance right-sizing, achieving substantial cost reductions in non-production environments.
  • Modernised deployment strategy by creating a repeatable, automated deployment process for standalone applications, replacing manual steps with Ansible playbooks and rollback-capable shell scripts.
  • Re-architected monolithic services into highly available, auto-scalable infrastructure using ALB, ASG, CloudFront CDN, and AWS WAF to improve reliability, latency, and scalability.
  • Developed centralised Terraform modules and Ansible roles for provisioning and configuring AWS services, enabling reusable, consistent IaC.
  • Automated infrastructure housekeeping and optimisation tasks, such as:
      • Detecting and alerting on unused EIPs and EBS volumes
      • Retaining only the last 5 AMIs per ASG launch template
      • Auto start/stop of instances based on scheduled tags
      • Dynamic IP blocking via AWS WAF based on traffic thresholds
  • Implemented suspicious activity detection and incident response automation using CloudTrail, EventBridge, SNS, and Lambda, covering IAM, EC2, GuardDuty, and SSO events.
  • Built centralized monitoring and alerting systems using Telegraf, InfluxDB, Grafana, and CloudWatch, with real-time Slack alerts for production-critical metrics and anomaly detection.
  • Set up and enforced AWS SSO for centralised login across multiple AWS accounts, defining granular permissions and access roles for DevOps engineers, developers, and admins.
  • Defined and enforced organisation-wide policies to ensure security and governance:
      • Mandatory tagging, region and instance-type restrictions, IAM admin role controls, and AWS service usage restrictions.
Shell Scripting · Python · DevOps · Cloud Computing

Samsung Electronics

3 roles

Lead Engineer

Promoted

Jul 2017 - Apr 2021 · 3 yrs 9 mos · On-site

  • Developed reusable Ansible playbooks to automate repetitive operational tasks, reducing manual intervention and human error.
  • Created and maintained centralised Terraform modules to standardise infrastructure provisioning across environments and teams.
  • Designed and implemented a multi-region architecture for the Smart TV interface service to ensure global availability and failover capability.
  • Enabled cost optimisation by:
      • Defining dynamic and scheduled scaling policies for Auto Scaling Groups (ASGs) based on usage trends and peak hours.
      • Managing versions with launch templates and automating the removal of legacy launch configurations.
      • Automating detection of unused/underutilised resources via EventBridge rules and Lambda-driven reporting.
  • Integrated AWS alerts with Slack to ensure real-time visibility and faster incident response for production-critical events.
  • Implemented LDAP-based SSH authentication, granting role-based access control to servers and simplifying centralised login management.
  • Set up and managed Jenkins master-slave architecture for distributed builds, handling parallel pipelines and diverse use cases.
  • Installed and maintained JFrog Artifactory for artifact management and internal package distribution, improving build and deployment consistency.
Continuous Delivery (CD) · Continuous Integration (CI) · DevOps · Infrastructure Management

Senior Software Engineer

Oct 2016 - Jun 2017 · 8 mos · On-site

  • Managed infrastructure for 11+ services across multiple AWS regions, overseeing over 200 EC2 instances with high availability and scalability requirements.
  • Designed and reviewed service architectures to ensure high availability, fault tolerance, and disaster recovery (DR) compliance.
  • Implemented automation for cloud cost optimisation using Lambda and Python scripts:
      • Identified and cleaned up unused resources (EBS volumes, Elastic IPs, AMIs)
      • Enabled S3 Intelligent-Tiering for automated storage cost reduction
      • Generated underutilised-resource reports for team action
  • Performed detailed AWS billing analysis and recommended cost-saving strategies, including reserved instances and spot instance usage.
  • Configured CloudWatch monitoring, custom metrics, and alerting pipelines using Lambda to proactively detect and respond to production issues.
  • Migrated legacy server-based workloads to serverless (AWS Lambda) and Kubernetes architectures for improved scalability and cost-efficiency.
  • Acted as a core member of the RCA (Root Cause Analysis) team, investigating high-impact incidents and contributing to postmortems and incident prevention plans.
  • Part of a centralised DevOps team responsible for:
      • Enforcing organisation-wide security and compliance policies
      • Managing IAM roles and cross-team access control strategies
  • Set up secure SSH access across environments, enabling controlled server login using username/password and enforcing audit best practices.
Amazon Web Services (AWS) · AWS Lambda · Cloud Computing · Infrastructure Management

Software Engineer

Dec 2014 - Sep 2016 · 1 yr 9 mos · On-site

  • Provisioned and managed AWS infrastructure for Smart TV applications using AWS Console, ensuring optimal setup for scalability and security.
  • Executed ad hoc cloud tasks based on JIRA requests from development teams, including:
      • AWS Lambda function creation and configuration
      • Security Group rule updates for fine-grained access control
      • IAM user and role creation with least-privilege policies
      • S3 bucket policy implementation for secure object storage access
  • Collaborated with multiple development teams to support and automate deployment workflows, reducing time-to-deploy and operational overhead.
Amazon Web Services (AWS) · Shell Scripting

Samsung R&D Institute Delhi

Internship

Jul 2014 - Dec 2014 · 5 mos · India · On-site

  • Helped understand and implement technical features from the development team, and coordinated with QA to support verification.
  • Collaborated with cross-functional teams to integrate various TV modules, ensuring seamless feature functionality.
  • Developed use case documentation relevant to specific features to support development and testing.
  • Contributed to improving feature integration workflows and enhancing overall product quality.

Education

Motilal Nehru National Institute Of Technology

Bachelor of Technology (B.Tech.) — Computer Science

Jan 2010 - Jan 2014
