Jasvinder Singh — Co-Founder

Experienced Site Reliability Engineering Manager with over 14 years in DevOps and SRE, specializing in optimizing system reliability, performance, and scalability. Proven track record of leading high-performance teams in managing large-scale e-commerce platforms on AWS and GCP, handling millions of requests per second. Skilled in implementing SRE best practices, fostering cross-functional collaboration, and driving continuous improvement. Committed to delivering high-quality software through fault-tolerant design, robust incident management, and comprehensive capacity planning. Strong advocate for a DevOps culture, promoting innovation and operational excellence in fast-paced environments.

Key Skills:
➢Leadership: Dynamic leader with expertise in hiring, coaching, and mentoring DevOps/SRE teams. Skilled in providing actionable feedback, driving progress, and fostering motivation and creativity to achieve exceptional results.
➢CI/CD Expertise: Spearheaded CI/CD pipeline implementation as code. Expert in Groovy, Kube-Jenkins, Helm charts, and Argo CD for seamless deployment of various applications.
➢Infrastructure Architecture: Experienced in designing, implementing, and supporting large-scale in-house and cloud infrastructure (AWS, GCP) as code.
➢Kubernetes and Containers: Proficient in automating service onboarding with Kubernetes. Skilled in setting up the Contour ingress controller, Cilium, and Istio for effective inter-service communication.
➢Cloud Security: Skilled cloud architect in implementing robust security policies, conducting security analysis, and gap assessment. Strong in building cloud security roadmaps and ensuring client engagement for successful cloud transformation.
➢AWS Services: Hands-on experience with AWS services, including VPC, EC2, EKS, RDS, S3, Lambda, DynamoDB, CloudWatch, and more.
➢Google Cloud Services: Extensive hands-on experience with Google Cloud services, including BigQuery, Cloud Storage, Compute Engine, Kubernetes Engine, Terraform, and more.

Stackforce AI infers this person is a SaaS Infrastructure Architect with extensive experience in cloud automation and site reliability engineering.

Location: Bengaluru, Karnataka, India

Experience: 12 yrs 9 mos

Skills

Cloud Automation
DevOps
Cloud Architecture
Site Reliability Engineering
Cross-functional Collaboration
Kubernetes
Infrastructure Design
Deployment Automation

Career Highlights

Led high-performance teams in large-scale e-commerce platforms.
Expert in cloud automation and Kubernetes management.
Proven track record in driving continuous improvement.

Work Experience

Adda247

Engineering Manager - DevOps & SRE (7 mos)

Ik8OI.IO

Co-Founder (1 yr 3 mos)

Meesho

DevOps Architect (1 yr 1 mo)

Innovaccer

Senior Engineering Manager - Site Reliability (7 mos)

Tokopedia

Head of DevOps (2 yrs 4 mos)

BirdEye

Lead Technical Specialist (9 mos)

Snapdeal

Lead DevOps Engineer (1 yr 1 mo)

Sr DevOps Engineer (2 yrs 4 mos)

Vcare Software Solutions Pvt Ltd

Unix/Linux System Administrator (11 mos)

CCS Computers Pvt Ltd

Aix/Linux Support Engineer (3 yrs 1 mo)

Education

Master of Computer Applications - MCA at Maharaja Agrasen Himalayan Garhwal University

Master of Business Administration - MBA at Maharaja Agrasen Himalayan Garhwal University

Btech C.S. at C.S.J.M. University

Skills

Core Skills

Cloud Automation
DevOps
Cloud Architecture
Site Reliability Engineering
Cross-functional Collaboration
Kubernetes
Infrastructure Design
Deployment Automation

Other Skills

AIX Administration
AWS
Amazon EKS
Amazon Web Services (AWS)
Apache
Apache Kafka
Apache ZooKeeper
Bash
CICD
Cloud Infrastructure
Container Orchestration
Continuous Integration and Continuous Delivery (CI/CD)
DNS Server
Databases
Domain Architecture

Experience

Total Experience: 12 yrs 9 mos

Average Tenure: 1 yr 7 mos

Current Experience

Adda247

Engineering Manager - DevOps & SRE

Oct 2024 – May 2025 · 7 mos · Gurugram, Haryana, India · On-site

Ik8OI.IO

Co-Founder

May 2023 – Aug 2024 · 1 yr 3 mos · Noida · On-site

As the co-founder of Ik8OI.IO, I led the creation of a cutting-edge platform designed to redefine cloud automation for Kubernetes (k8s) environments. The core vision behind Ik8OI.IO was to empower organizations to streamline their DevOps workflows and embrace automation at scale, enabling faster, more reliable, and efficient software delivery.
Ik8OI.IO specializes in:
➢Comprehensive Cloud Automation: Simplifies Kubernetes operations by automating deployment, scaling, and management processes, reducing complexities for DevOps teams.
➢Intelligent CI/CD Pipelines: Fully automates the CI/CD lifecycle, reducing manual interventions and significantly accelerating deployment times.
➢Scalability & Flexibility: Built to adapt seamlessly to hybrid and multi-cloud environments, supporting businesses of all sizes.
➢Real-Time Observability: Delivers actionable insights and monitoring tools, helping teams optimize performance and maintain operational excellence.
Under my leadership, Ik8OI.IO became a transformative solution, helping organizations achieve faster time-to-market, reduced operational overhead, and improved system reliability. The platform’s innovative approach and impactful results caught the attention of a leading tech organization, culminating in its successful acquisition.
This milestone reflects my dedication to building solutions that address modern cloud challenges and my passion for advancing automation in the DevOps ecosystem.

CICD, Cloud Automation, DevOps

Meesho

DevOps Architect

Apr 2022 – May 2023 · 1 yr 1 mo · Bengaluru, Karnataka, India · Remote

➢Cloud Architecture: Led the end-to-end design and implementation of AWS and GCP architectures, emphasizing Infrastructure as Code (IaC) to ensure automated, secure, scalable, and highly available components for every environment, business unit (BU), and team.
➢SRE Re-architect and Contribution: As an SRE Leader, I played a pivotal role in fortifying system reliability and performance in our projects. I implemented and refined SRE principles and practices, conducting regular reliability reviews to identify and address potential points of failure proactively. Collaborating closely with development and operations teams, I established Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical applications. Introducing chaos engineering practices, I systematically identified and mitigated system weaknesses before they could impact production.
➢CI/CD Transformation: Reengineered the system's architecture to introduce modern CI/CD practices for Kubernetes-based deployments. Implemented Pipeline as Code with canary deployment strategies.
➢Standardized Onboarding: Established and standardized new channels for Operations and Development teams, simplifying application onboarding to Kubernetes.
➢Centralized Logging: Created a centralized logging solution for Kubernetes-based deployments to ensure cost savings and efficient log exports.
➢Resource Optimization: Developed Grafana dashboards for Kubernetes deployments to optimize resource utilization, resulting in cost savings.
➢Cloud Connectivity: Orchestrated high-availability connections between GCP and AWS production and development environments, minimizing downtime through redundancy.
➢Kubernetes Clusters: Designed Kubernetes clusters (EKS and GKE Autopilot) for team-based application deployment in various business units and environments.
➢CDN Migration: Successfully migrated our largest CDN from AWS to GCP without any downtime, resulting in cost savings and improved application performance.

Prometheus.io, Google Kubernetes Engine (GKE), Google App Engine, Process Improvement, Technical Leadership, Apache Kafka, +14 more

Innovaccer

Senior Engineering Manager - Site Reliability

Sep 2021 – Apr 2022 · 7 mos · Noida, Uttar Pradesh, India

➢Cross-Functional Collaboration: Engaged and influenced development, operations, and product teams to align technology service delivery with Site Reliability Engineering (SRE) practices, enhancing collaboration across groups.
➢Team Building and Leadership: Hired and built SRE teams of 28 members across multiple locations, providing strong leadership and guidance.
➢Quality Accountability: Drove quality accountability by implementing well-defined processes, metrics, and goals, leading effective postmortems, and ensuring follow-up on action items to improve process quality.
➢Reliability Engineering: Managed availability, latency, scalability, and efficiency of applications by integrating engineering reliability into the development lifecycle, with a strong focus on fault-tolerant approaches.
➢Capacity Planning: Led capacity planning and performance analysis, ensuring non-functional system requirements were met through robust instrumentation and monitoring.
➢Strategic Communication: Defined and reported progress on strategic initiatives and project tasks to stakeholders, including senior executives and clients, using tailored communication strategies for different audiences.
➢Operational Excellence: Implemented metrics-driven processes to ensure service quality targets were consistently met, contributing to overall operational excellence.

Logging, Prometheus.io, Continuous Integration and Continuous Delivery (CI/CD), Process Improvement, Technical Leadership, Grafana, +11 more

Tokopedia

Head of DevOps

May 2019 – Sep 2021 · 2 yrs 4 mos · Noida Area, India

➢Hands-on experience setting up Kubernetes (k8s) clusters for running microservices; took several microservices into production on Kubernetes-backed infrastructure.
➢Point person on the team for Kubernetes: creating new Projects and Services for load balancing, adding them to Routes so they are accessible from outside, creating pods for new applications, and controlling pod scaling (HPA).
➢Expertise in horizontal pod autoscaling based on CPU utilisation or custom metrics; cluster autoscaling, which works on a per-node-pool basis and automatically scales node pools and clusters across multiple node pools as workload requirements change; and vertical pod autoscaling, which continuously analyses the CPU and memory usage of pods and dynamically adjusts their CPU and memory requests in response.
➢Architected and deployed the Istio service mesh on Kubernetes.
➢Expertise in project planning, system requirements management, risk management and mitigation, project execution, monitoring and control, and project resource management.
➢Managed the overall software development lifecycle; provided leadership and ownership for process improvements.
➢Reported to senior executives weekly on the status of current releases, monthly goals, team burn-down charts, risks and mitigation plans, and quarterly roadmaps.
➢Led a group of DevOps, senior DevOps, and lead DevOps engineers in story estimation, sprint planning, and daily scrums.
➢Designed trunk-based development (TBD) as a standard for all services, built as a single pipeline in Groovy and automated through kube-jenkins.
➢Architected a centralised Jenkins for each tribe across all Tokopedia environments on Kubernetes, with on-demand build agents for build, deploy, and rollback.
➢Experience with logging, monitoring, and reporting solutions such as Icinga, Prometheus, Grafana, Kibana (ELK), and New Relic.
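
For context on the CPU-based horizontal pod autoscaling mentioned above, the Kubernetes documentation describes the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The following is a minimal Python sketch of that rule only. The function name, the replica bounds, and the sample numbers are illustrative assumptions and are not taken from any Tokopedia system.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Sketch of the documented HPA replica rule (hypothetical helper)."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale proportionally to the metric ratio, rounding up.
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (assumed values in this sketch).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90.0, 60.0))  # prints 6
```

The real autoscaler also accounts for pod readiness, missing metrics, and stabilization windows, which this sketch leaves out.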

Prometheus.io, Process Improvement, Technical Leadership, Apache Kafka, Grafana, Team Leadership, +14 more

Birdeye

Lead Technical Specialist

Aug 2018 – May 2019 · 9 mos · Gurgaon, India

➢Infrastructure Design and Maintenance: Designed and maintained highly available, scalable, and performant infrastructure for applications.
➢Infrastructure as Code (IaC): Implemented and managed IaC using Terraform and Ansible.
➢Deployment Automation: Automated deployments and configuration management across AWS, on-premise, and container platforms (Kubernetes, EKS, OpenShift).
➢CI/CD Integration: Integrated CI/CD pipelines to automate infrastructure provisioning and configuration changes.
➢Monitoring Solutions: Implemented and maintained monitoring tools (e.g., Datadog, Prometheus) for proactive infrastructure issue identification and troubleshooting.
➢Collaboration with Development: Collaborated with development teams to ensure infrastructure met application requirements and best practices.
➢Incident Response: Participated in incident response procedures for infrastructure and application issues.
➢Continuous Improvement: Continuously improved infrastructure performance, scalability, and reliability.
➢Technology Awareness: Stayed up-to-date with the latest trends in cloud computing, containers, and monitoring.

Prometheus.io, Process Improvement, Apache Kafka, Grafana, Terraform, Cloud Infrastructure, +7 more

Snapdeal

2 roles

Lead DevOps Engineer

Apr 2017 – May 2018 · 1 yr 1 mo

Prometheus.io, Process Improvement, Grafana, Terraform, Cloud Infrastructure, Infrastructure as code (IaC), +4 more

Sr DevOps Engineer

Nov 2014 – Mar 2017 · 2 yrs 4 mos

➢Cross-Functional Collaboration: Bridged gaps between core infrastructure, security, QA, and development teams.
➢Deployment Automation: Managed application deployment on GKE platforms, automating and improving development and release processes.
➢Application Architecture Analysis: Collaborated with development teams to understand application architecture and identify bottlenecks.
➢Infrastructure as Code: Created and maintained data stores and platform infrastructure using IaC.
➢End-to-End Reliability: Owned availability, performance, and capacity of applications, ensuring observability with Prometheus, New Relic, ELK, and Loki.
➢Infrastructure Management: Managed internal infrastructure platforms for CI/CD processes, data stores, and Kubernetes.
➢24/7 Support: Provided continuous infrastructure and application support, building processes and documenting essential knowledge.
➢SLO Management: Managed SLOs, error budgets, and alerts for internal platforms.
➢Outage Management: Led outage management, conducted detailed RCAs, and identified preventive measures with developers.
➢Mentorship and Training: Mentored L1 engineers, enhancing support processes for applications and infrastructure.
➢Automation of Repetitive Tasks: Automated toil and repetitive work to improve efficiency.

Prometheus.io, Grafana, Terraform, Infrastructure as code (IaC), Site Reliability Engineering

Vcare Software Solutions Pvt Ltd

Unix/Linux System Administrator

Jul 2013 – Jun 2014 · 11 mos

Installation and configuration of new Linux servers
Installation and configuration of AIX servers
Installation, configuration, and administration of VIO server and VIO client LPARs via HMC, including configuration of virtual devices and adding/removing devices using DLPAR
Managing the storage of Linux and AIX servers
Mirroring of rootvg/datavg and LVM management
Network Installation Management (NIM): definitions and concepts; configuring the NIM master, NIM client, and alternate (backup) NIM master; configuring a resource server; performing BOS installations; migrating a NIM master client; mksysb migration; operating system backups using mksysb on the NIM server
Worked as the primary onsite admin for the production servers
Migrated the servers physically onsite
Replaced hard drives on IBM storage (Shark) connected to the IBM servers
Apache web server configuration with clustering and load balancing
Created automated scripts to back up Linux servers
VMware Infrastructure 5.x/vSphere vCenter installation and configuration
Planning and design of VMware VI5/vSphere (ESX 5.x / ESXi 4.x and VMware Virtual Center)
Creation, management, and configuration of virtual machines, clones, and templates
High availability, clustering, vMotion, Storage vMotion

CCS Computers Pvt Ltd

Aix/Linux Support Engineer

May 2010 – Jun 2013 · 3 yrs 1 mo

Creating and managing LPARs on different IBM servers such as p520, p550, p570, p630, p650, p710, and p720
Responsible for all aspects of AIX system administration and support for all RS/6000 servers, including OS installation and upgrade, OS problem resolution, application of OS fixes, installation of application software, administration of user accounts, role-based access control, system configuration, troubleshooting, system monitoring, system tuning, security updates, hardware maintenance, and disk usage maintenance
Performed BOS installations from the NIM server using lpp_source and SPOT, migrations and patch installations from the NIM server, and mksysb backups of the servers from NIM
Migrated the servers physically onsite
Upgraded the IBM servers from older versions to newer versions on the server console by connecting IBM monitors, using CDs or NIM
LUN allocation and de-allocation on storage
Configuration of EMC storage arrays, IBM storage arrays, Sun storage arrays, STK robotic tape libraries, and EMC PowerPath
Zoning configuration on Brocade and QLogic Fibre Channel switches
Configuration and administration of firewall, Sendmail/Postfix mail, Samba, FTP, NFS, DNS, DHCP, Apache, RAID, and LVM on Linux
Responsible for server troubleshooting
Performed file system management
Oracle Database installation and configuration in clustered environments on Linux, Solaris, and AIX servers
VMware Infrastructure 5.x/vSphere vCenter installation and configuration
High availability, clustering, vMotion, Storage vMotion
Configuration and administration of fiber card adapters and handling the AIX part of the SAN (SAN arrays: HP, IBM, EMC)