Gaurav S.

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 18 yrs 5 mos experience
Highly Stable

Key Highlights

  • Over a decade of experience in software and database solutions.
  • Expert in Site Reliability Engineering and distributed computing.
  • Proven track record in improving system reliability and performance.
Stackforce AI infers this person is a Fintech professional with extensive expertise in Site Reliability Engineering and database management.

Skills

Core Skills

Reliability Engineering · Systems Reliability · Observability · Site Reliability Engineering · Automation · Database Management · Service Engineering · Database Engineering · Database Consulting

Other Skills

Resource Isolation solutions · Monitoring solutions · Automation solutions · Cost saving initiatives · End user experience improvement · SQL Server · Scaling Solutions · Infrastructure Solutions · Grafana · Disaster Recovery · SRE principles · Toil reduction · SLI/SLO/Error-budget · Centralized monitoring · Grafana dashboards

About

Software professional with over a decade of hands-on experience in relational database management systems, distributed computing system design, Site Reliability Engineering principles and multiple development technologies, with the ability to translate business needs into technology solutions. Key focus areas:
  • Design and support of enterprise-level software and database solutions using technologies such as Java, Spring Boot, RDBMS [MS SQL Server, AWS RDS Postgres, AWS Aurora, Aurora Serverless] and NoSQL
  • Distributed computing system design
  • Service reliability [SLI, SLO, error budget, production readiness review, toil reduction, incident management]
  • Observability and monitoring solutions
  • Infrastructure solutions [Kubernetes, Docker, containers]
  • Version control systems [Git, GitLab]
  • Database systems management
  • Engineering management

Experience

Total Experience: 18 yrs 5 mos
Average Tenure: 4 yrs
Current Experience: 2 yrs 5 mos

Zeta

SRE Manager of managers

Jan 2024 - Present · 2 yrs 5 mos · Bengaluru, Karnataka, India · On-site

  • Heading SRE for the PayZapp UPI app and Pixel Card business, overseeing a team of 25+ engineers.

Arcesium

2 roles

Reliability Engineering Manager/Principal Engineer

Promoted

Jan 2019 - Dec 2023 · 4 yrs 11 mos

  • Arcesium [a D. E. Shaw group subsidiary]
  • ☛ As Reliability Engineering Manager of the SRE group, my key focus is to bring systems reliability to an appropriate level, develop monitoring and automation solutions, drive cost-saving initiatives and improve end-user experience. A few key projects we delivered as a team:
  • ♦ Improved production systems reliability by implementing resource isolation solutions in SQL Server Standard and Enterprise editions, which helped reduce major incidents by ~70%.
  • ♦ Implemented horizontal scaling solutions to improve production systems performance.
  • ♦ Defined and implemented re-certification exercises for critical apps to capture resource and scale factors.
  • ♦ Enhanced and implemented connection-limiting infrastructure for better systems reliability.
  • ♦ Created multiple runbooks and standardised templates to improve Quality of Service.
  • ♦ Built a Grafana dashboard to show and measure daily/hourly usage trends for various system metrics.
  • ♦ Designed and developed a one-click, anytime disaster recovery solution that achieves an RTO of less than 5 minutes.
Resource Isolation solutions · Monitoring solutions · Automation solutions · Cost saving initiatives · End user experience improvement · Reliability Engineering · +1 more

Project Leader

Jul 2016 - Dec 2018 · 2 yrs 5 mos

  • ♦ Defined and implemented various SRE principles at Arcesium and transformed the Operations function into a Site Reliability Engineering function.
  • ♦ Reduced platform toil significantly (from ~60K to ~13K per annum), which saved the cost of about 2.8 FTEs per annum.
  • ♦ Defined, documented and implemented SLIs, SLOs and error budgets for platform applications.
  • ♦ Implemented impact-based centralised monitoring built on CIA (customer impact assessment) models.
  • ♦ Built Grafana dashboards to measure platform availability, latency and throughput.
  • ♦ Defined the Product Onboarding Process, a.k.a. PRR (production readiness review), for various applications.
  • ♦ Coached and mentored SRE team members to reach their next level.
  • ♦ Focused on automation and platform resiliency.
  • ♦ Responsible for the product roadmap and Quality of Service.
SRE principles · Toil reduction · SLI/SLO/Error-budget · Centralized monitoring · Grafana dashboards · Product Onboarding Process · +2 more

Microsoft India (R&D) Private Limited

Senior Service Engineer

Apr 2012 - Jul 2016 · 4 yrs 3 mos · Hyderabad Area, India

  • At Microsoft, owned a 10 TB database application end to end, including issues related to SQL Server, C#, T-SQL, IIS and Windows. Specialties in this role:
  • ♦ Experienced with MVC architecture and cloud technologies
  • ♦ Designed, architected and programmed WinForms applications, web-based applications and libraries
  • ♦ Experienced in WinForms and WPF application development
  • ♦ Experienced in building applications in C# and ASP.NET using technologies such as WPF and WCF
  • ♦ Coded multiple SMM (Service Maturity Model) user stories covering availability, telemetry, scalability, performance, deployment and monitoring, and successfully implemented them in various applications
  • ♦ Experienced in extensive performance tuning (index tuning, stored procedure tuning) and query optimization
  • ♦ Experienced in locking, blocking and deadlock analysis by reading the SQL code involved and providing fixes
  • ♦ Experienced in troubleshooting and fixing performance issues such as CPU, memory and I/O bottlenecks
  • ♦ Experienced in using and deploying multiple SQL Server diagnostic tools such as SQLDiag, PSSDiag, SQL Nexus, Fiddler, Process Monitor, AVIcode, BPA and Perf-Analyzer to troubleshoot deep, complex SQL issues
  • ♦ Developed and owned a tool named Real Time SQL Analyzer, built with C# and T-SQL, which helps troubleshoot a wide range of SQL issues. Key highlights:
  • ✔ Diagnoses and helps fix 80 types of risks on a SQL Server instance.
  • ✔ Troubleshoots and helps fix 20 types of complex SQL performance issues.
  • ✔ Debugs thousands of lines of stored procedure code more quickly.
  • ✔ Creates 500 types of performance counters in minutes.
  • ✔ Debugs run-time T-SQL queries with their SQL text, wait type information and execution plan.
  • ☛ And many more that can't be accommodated here.
SQL Server · C# · T-SQL · Performance tuning · Monitoring · Cloud technologies · +2 more

Fidelity Investments

Database Engineer

May 2010 - Mar 2012 · 1 yr 10 mos · Bangalore

  • ♦ Built SQL Server 2008 and SQL Server 2008 R2 client tools for x86 and x64 machines.
  • ♦ Certified various SQL Server versions on different Windows versions.
  • ♦ Worked on Windows 2008 Active Directory streamlining.
  • ♦ Wrote slipstream installers for SQL Server service packs (SPs) and cumulative updates (CUs) across various SQL Server versions.
  • ♦ Wrote multiple SSIS packages for metadata sync between various Windows servers.
  • ♦ Implemented SQL native auditing across all SQL production environments.
  • ♦ Wrote multiple documents for the SQL Server Operations team, including SQL Server installation (standalone/cluster) for various SQL Server versions, instance migration guidelines and multiple other troubleshooting runbooks.
SQL Server · SSIS · Windows Server · Auditing · Documentation · Database Engineering

AllianceBernstein

Database Consultant

Sep 2007 - Apr 2010 · 2 yrs 7 mos · Bangalore

  • ♦ Supported SQL database servers across Production, DR, Development and QA environments.
  • ♦ Performed database query tuning using Windows PerfMon, SQL Profiler and system tables/views.
  • ♦ Supported SQL Server active/passive and active/active cluster configurations.
  • ♦ Supported SQL Server transactional and merge replication environments.
  • ♦ Supported disaster recovery environments.
  • ♦ Installed and administered SQL Server standalone and clustered instances.
  • ♦ Upgraded various SQL Server 2000 instances to SQL Server 2005.
  • ♦ Provided 24/7 production support of database environments.
SQL Server · Database support · Performance tuning · Disaster recovery · Database Consulting

Education

Centre for Development of Advanced Computing (C-DAC)

Diploma in Advanced Computing · Computer Science

Government Engineering College Bikaner

B.E. (Hons) · Computer Science
