Jitender Tanwar

SRE (Site Reliability Engineer)

Gurgaon, Haryana, India · 11 yrs experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Over 9 years of experience in E-Commerce and Travel domains.
  • Expert in Site Reliability Engineering and Cloud Operations.
  • Proven track record in cloud cost optimization and team leadership.
Stackforce AI infers this person is a Site Reliability Engineer specializing in E-Commerce and Cloud Operations.

Skills

Core Skills

24×7 Site Reliability & Operations, AWS Cloud Optimization

Other Skills

24x7 live site troubleshooting, Amazon Web Services (AWS), Cost Control, Cost Management, GCP, Infrastructure operational support, Kubernetes, Linux, Networking, NOC support, TCP/IP, Terraform, Troubleshooting, Unix, Zabbix

About

With over 9 years of progressive experience across E-Commerce and Travel domains, I specialize in Site Reliability Engineering (SRE), Cloud Operations, and Technical Leadership. I thrive in high-growth, complex environments, ensuring stability, scalability, and cost-efficiency while driving teams toward operational excellence. My career reflects a strong commitment to delivering results under dynamic conditions, adapting seamlessly to evolving project scopes and tight deadlines.

Core Specialties

  • Hyperscale Microservices Architecture: Architecting and optimizing large-scale, highly available environments running 1,000+ dockerized microservices, ensuring performance consistency and reliability at hyperscale.
  • AWS Cloud Optimization: Expert in Cloud Deployment, Capacity Planning, Right-Sizing, and Advanced Cost Optimization using innovative, data-driven techniques to achieve maximum efficiency and cost control.
  • 24×7 Site Reliability & Operations: Leading global operations with a focus on continuous uptime, NOC management, and meeting or exceeding SLA/SLO/KPI targets through proactive reliability engineering.
  • Incident Management & Troubleshooting: Driving proactive incident response, root-cause analysis, and impact mitigation through structured playbooks, automation, and detailed post-incident reporting.
  • Financial & Resource Management: Proven expertise in budget planning, cost forecasting, and technical operations cost control, aligning infrastructure spend with business objectives.
  • Advanced Monitoring & Observability: Building comprehensive end-to-end monitoring pipelines covering SLA, latency, response codes, business KPIs, system health, and performance metrics. Hands-on experience with a diverse stack: Python, Dataset, ELK, OpenTSDB, Grafana, Zabbix, Diamond, CloudWatch, SNMP, New Relic, Catchpoint, and Akamai mPulse (RUM).

Experience

MakeMyTrip

Site Reliability Engineer

Oct 2022 - Present · 3 yrs 5 mos · Gurugram

  • Live site management and reliability: Expertise in 24x7 live site troubleshooting, root cause analysis (RCA), and ensuring smooth operations for high-traffic, multi-layer complex live sites, with a focus on performance tuning and business continuity.
  • Cloud cost optimization: Proven experience in identifying and implementing strategies for significant cloud cost optimization.
  • Leadership and team management: Strong cross-functional and people skills with a passion for building, growing, and sustaining strong SRE organizations, including managing teams of 10+ members.
  • Proactive monitoring and reporting: Skilled in automated daily reports (DSR), comprehensive incident management, and maintaining site performance and business health around the clock, including new data center setups and capacity planning.
  • Automation and efficiency: Innovates and implements automation strategies to reduce time to detect (TTD) and time to resolve (TTR), increasing patrolling coverage, minimizing manual efforts, and creating centralized dashboards for all components.
24x7 live site troubleshooting, root cause analysis, cloud cost optimization, team management, automated daily reports, incident management

Airtel Digital

Lead SRE

Jul 2021 - Mar 2023 · 1 yr 8 mos · Gurugram, Haryana, India

Rackspace Technology

DevOps Engineer

Dec 2020 - Jun 2021 · 6 mos · Gurugram, Haryana, India

MakeMyTrip

3 roles

Cloud Engineer II

Apr 2020 - Nov 2020 · 7 mos

Cloud Engineer

Promoted

May 2018 - Apr 2020 · 1 yr 11 mos

Site Reliability Engineer - I

Aug 2016 - May 2018 · 1 yr 9 mos

ShopClues

NOC Engineer

Jan 2015 - Jul 2016 · 1 yr 6 mos · Gurugram, Haryana, India

  • Provided 24x7 networking support in the production environment.
  • Used monitoring tools such as Nagios to track the health of production servers and applications.
  • Troubleshot performance issues and identified effective solutions, both immediate and long-term.
  • Monitored server load, memory, and disk space, tuned server performance in line with traffic, and fixed known recurring issues.
networking support, monitoring tools, troubleshooting, server performance monitoring, 24×7 Site Reliability & Operations