Gurtez Singh

SRE (Site Reliability Engineer)

Chandigarh, Chandigarh, India · 14 yrs 5 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Proven track record in incident reduction and problem management.
  • Expertise in team leadership and cloud operations.
  • Skilled in developing automation tools and dashboards.
Stackforce AI infers this person is a seasoned professional in IT Services with a focus on cloud operations and system reliability.

Skills

Core Skills

Project Management, People Management, Team Leadership, Linux System Administration, Incident Management, Scripting

Other Skills

Ansible, Apache, Bash, C, C++, Cross-functional Team Leadership, Dashboard Development, Decision-Making, Emotional Intelligence, GSM, ITIL Process, JBoss Application Server, JMS, Java, Jira

About

My motto: “If something can be done manually, it can always be automated, provided we see a good enough ROI.” I work on R&D of open-source tools and then hand them over to my team for further implementation.

Experience

Total Experience: 14 yrs 5 mos
Average Tenure: 3 yrs 7 mos
Current Experience: 9 yrs 8 mos

Zscaler

5 roles

Sr. Manager, Site Reliability Engineering

Promoted

Aug 2024 – Present · 1 yr 10 mos

Manager, Site Reliability Engineering

Mar 2024 – Sep 2024 · 6 mos

Manager, Cloud Operations

Apr 2022 – Mar 2024 · 1 yr 11 mos

ITIL Process, Emotional Intelligence, Project Management, Project Planning, Microsoft Excel, Talent Recognition +5

Lead Cloud Operations

Promoted

Apr 2021 – Apr 2022 · 1 yr

  • Managing a team of 9; responsibilities include people reviews, upskilling, workload management, and assigning people to ongoing projects
  • Responsible for coaching, mentoring, and overall development of my team
  • Focus on incident reduction and problem management
  • 14% decrease in incidents in the first year and a 23% decrease overall over 3 years
  • Created several digital dashboards and monitoring scripts during the COVID-19 period, when the team was working from home
  • Managing weekend cloud upgrades and acting as the single point of contact (SPOC) between the upgrade team and cross-functional teams such as Dev, QA, and Management
  • Created various Excel-based macros to provide real-time feedback in the monthly operations review with the management team and to help project managers with reporting; the reports are also used for performance feedback in team weeklies
  • Handling customer and internal escalations and providing technical support to the team as necessary
  • Running the problem management practice within the company: deriving root causes and permanent fixes, minimizing customer impact, and creating internal RCAs
  • Leading projects such as monitoring enhancements, alert reduction, weekly audits, and other in-house initiatives
  • Responsible for creating all new monitors, modifying existing monitors, creating test cases, and understanding and documenting new features on the UI and storage side
Jira, PagerDuty, Networking, Python (Programming Language), Team Leadership, Ansible +3

Sr. Cloud Operations Engineer

Oct 2016 – Apr 2021 · 4 yrs 6 mos

  • Experienced in running Linux/UNIX production environments (2500+ servers)
  • Managed an international, 24/7, always-on, multi-site infrastructure powering the Zscaler Security Cloud
  • Ensured proper security, monitoring, alerting, and reporting for the infrastructure
  • Strong working knowledge of Linux/UNIX/FreeBSD and core systems; hands-on experience with networking
  • Scripting experience, including Bash and Python
  • Implementing and maintaining the server monitoring system (Nagios)
  • Working closely with the Development and QA teams
  • Writing custom Python and shell scripts for Nagios and related tools to ease day-to-day tasks
  • Cleaning up logs (log rotation) and unwanted files; analyzing system logs to maintain system health and resolve issues
  • Provisioning of new cloud servers
  • Upgrading servers worldwide, with pre- and post-upgrade validation testing

STMicroelectronics

Systems Engineer

Sep 2014 – Oct 2016 · 2 yrs 1 mo · Noida, Uttar Pradesh, India

  • Worked as a system and production support engineer for Linux
  • Supporting 1000+ UNIX (VM and physical) servers and the multiple applications running on them
  • File system administration: creating, configuring, troubleshooting, and repairing file systems
  • LVM – Management of Volume Groups, Physical Volumes, and Logical Volumes
  • Hands-on experience switching VERITAS Cluster service groups in failover scenarios or as part of scheduled activities
  • Restoring data using TSM / NetBackup backup-restore techniques
  • Patch and package management using RPM and YUM
  • Handling jobs and schedules of different applications through TWS
  • Providing primary support for databases and applications
  • Working with application, development, and other teams to plan server downtimes for upgrades and changes
  • Server OS and kernel patching
  • Monitoring disk space, system and application errors, memory and swap utilization, disk performance, and CPU utilization
  • Incident, change, and problem management
  • Configuration and troubleshooting of printers using Output Management, VPOM, and unispool

Flytxt

Sr. Executive - System Administration

Nov 2013 – Sep 2014 · 10 mos · Gurugram, Haryana, India

  • L1 support for applications running on JBoss and Apache
  • Server performance monitoring; disk and resource utilization monitoring
  • Providing pre- and post-deployment support for upgrades, changes, and enhancements made to the production (live) environment
  • Took responsibility for deployments, raising change requests, and coordinating with the IBM UNIX/Networks team to complete the installation of patches and packages
  • Analyzing and investigating defects and faults, and tracking them through to final resolution
  • Preparing monthly highlight, fault, and utilization reports for management
  • Troubleshooting and installation of JBoss, JMS, Apache, and Hadoop
  • Monthly UID, QEV, Dormant ID, and shared ID declaration
  • Nagios installation, configuration, and troubleshooting
  • Providing support outside office hours, on weekends, and during major deployments

Alcatel-Lucent Enterprise

Network Operations Center Engineer

Nov 2011 – Sep 2013 · 1 yr 10 mos · Gurugram, Haryana, India

  • Installed and configured Solaris and Linux machines
  • Disk management using SVM and LVM
  • SVM – Management of Volumes
  • Installed servers with the ZFS file system
  • System administration
  • Performance monitoring as part of the L1 team
  • Patch and package installation on both Linux and Solaris
  • Maintained information on assigned customers, including contact points, deployment data, remote access methods, and other information as requested by management
  • Set up NFS servers, including configuration and troubleshooting, on Solaris and Linux
  • Basic knowledge of configuring Solaris zones

Education

SUS College of Engineering and Technology (SUSCET)

Bachelor of Technology - BTech — Electronics and Communications Engineering

May 2007 – Jun 2011

Punjab Technical University

Bachelor of Technology (B.Tech.) — Electronics and Communications Engineering

Jan 2007 – Jan 2011
