Nishant Garg

CTO

Delhi, Delhi, India · 19 yrs 7 mos experience

AI Enabled · Highly Stable

Key Highlights

19 years of experience in SRE and Cloud Technology.
Expert in implementing AIOps and Data Analytics.
Proven leadership in cross-functional team management.

Stackforce AI infers this person is a seasoned expert in SaaS and Cloud Infrastructure with a strong focus on Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering · Cloud Computing IaaS · Cloud Computing · Data Engineering

Other Skills

AIOps · Architecture · Artificial Intelligence (AI) · BMC Atrium CMDB · BMC Atrium Orchestrator · BMC BladeLogic · BMC Patrol · BMC ProactiveNet Performance Manager (BPPM) · BMC Remedy · Big Data Analytics · Business Analysis · Business Intelligence · Business Strategy · CA Spectrum · CMDB

About

A seasoned cross-functional technocrat and open-source advocate with more than 19 years of extensive hands-on experience implementing SRE, AIOps, Data Analytics, Cloud Technology and Agile Service Management functions and best practices across various industry verticals to achieve agility, scalability, reliability and peak performance whilst optimising cost.

Tools & Technology:
★ Software & Systems Engineering
★ Kubernetes, Docker
★ Prometheus, Grafana
★ Jaeger, Zipkin
★ Splunk, ELK Stack
★ BigQuery, Data Studio
★ Kafka, Apache Beam
★ GitHub, Jenkins, Artifactory
★ Dynatrace, Datadog, New Relic
★ Terraform, SaltStack, Chef, Ansible
★ AWS, Azure, GCP
★ REST APIs
★ Perl, Python, Go
★ Consul, Istio
★ MongoDB, NoSQL
★ Node.js, React.js, Angular
★ PagerDuty, OpsGenie
★ JIRA, ServiceNow, BMC Remedy
★ OpenTSDB, Cortex, InfluxDB
★ BMC BPPM, Nagios, Zabbix

Experience

Total Experience: 19 yrs 7 mos
Average Tenure: 3 yrs 7 mos
Current Experience: 1 yr 7 mos

Flutter International

Vice President - SRE & Platform Engineering

Oct 2024 – Present · 1 yr 7 mos · Gurugram, Haryana, India

Head the SRE & Platform Engineering function, defining strategy, roadmap & execution for Cloud Infrastructure, Reliability Engineering, Observability, FinOps, DevSecOps and DBaaS with an AI-first approach to Cloud SRE.

Organizational Leadership · Cross-functional Team Leadership · Site Reliability Engineering · Cloud Computing IaaS · Observability · FinOps

Freshworks

Director - SRE / Cloud Engineering, AIOps & MLOps

Jul 2021 – Oct 2024 · 3 yrs 3 mos

Heading the Global Site Reliability Engineering, Cloud Infrastructure, Platform Engineering, Cloud FinOps & Incident Response teams responsible for enabling product & platform development and productionising business services, whilst maintaining availability, performance and security and optimising costs across the organisation.
Ensure 99.9% availability of all Freshworks Products and Platforms.
Define Strategies and Roadmaps in line with all product BUs.
Solution Architecture & Design.
Product / Features Development, Debugging and Root Cause Analysis across the technology stack.
Establish a Cloud Center of Excellence.
Products & Platforms Architecture Reviews.
Establish and enable Product SRE teams.
Centralised Observability platform across Freshworks.
Centralised DevSecOps value chain.
Define SLAs, SLOs, SLIs and Error Budgets across all products and platforms.
Enable and maintain Delight Metrics (RED & USE metrics) across the product org.
Service ownership of entire Freshworks Cloud Infrastructure.
Disaster Recovery and Chaos Engineering.
Infrastructure as a Service and Platform as a Service offering.
Enabling Application, Infrastructure & Data Security.
Security, Compliance, Cost, Identity & Access Management Governance through Policy as Code.
Certificate Generation and Distribution.
Centralised FinOps Platform and cost optimization across Freshworks Cloud environment.
Cloud Cost Reviews, Optimization and Budget Allocation.
Automation & Self Serve Methodology.
Incident Response & Crisis Management.
Event Driven Automation (EDA) & Robotic Process Automation (RPA).
Mentoring & Development
Product Management
Stakeholder Management
Vendor Management
Hiring & Retaining excellent talent to build High Performing Teams

Infrastructure as a Service (IaaS) · DevSecOps · Platform as a Service (PaaS) · Data Engineering · Financial Operations · Big Data Analytics

Tower Research Capital

Core Engineering - Site Reliability Engineering & Data Engineering

Sep 2019 – Jul 2021 · 1 yr 10 mos · Gurugram, Haryana, India

Head the Site Reliability Engineering and Data Engineering functions within Core Engineering, establishing a centralized SRE practice for productionizing key business services.
Establish a centralized & integrated Observability platform (Metrics + Logs + Traces).
Data consolidation, warehousing, mining, distribution and analytics across Historical Market Data, Current Market Data, Training Data and Post-Trade Data.
Enable and maintain High Performance Computing (HPC) infrastructure for model training & inference of trading strategies across all trading teams.
Kafka as a Service for data distribution across various business applications.
Streaming Data Pipeline & ETL workloads for data warehousing & Data Mining.
Centralised Data Analytics platform across the organization.
Development of tools, platforms and product features to be used across the organisation.
DevOps value chain for agile SDLC.
Infrastructure as Code for one-click integration, deployment & configuration of the complete application stack.
Centralized governance by implementing Policy as Code.
Data Governance & Distribution Policies.
Define Service Level Objectives (SLO) and indicators (SLI).
Build, maintain and mentor a highly motivated team.
Stakeholder, Supplier and Vendor Management

Site Reliability Engineering · Cloud Computing · Infrastructure as a Service (IaaS) · Infrastructure as Code (IaC) · Data Engineering · High Performance Computing (HPC)

Adobe

Global Lead, Site Reliability Engineering & Observability

Aug 2011 – Sep 2019 · 8 yrs 1 mo · Noida Area, India

Lead the SRE, Observability & Automation architecture function responsible for a broad set of capabilities and projects.
EA SRE, DevOps Strategy, Architecture, Roadmap and Solution Design
Establish an AIOps Platform that includes:
Unified Observability platform on OSS
Predictive Modelling and Analytics across Observability pillars (Metrics, Logging, Distributed Tracing and Code Profiling).
Automated pattern-based Self-Healing platform
Centralized Notification Engine & Crisis Management platform
Automated Infrastructure - IaaS and PaaS
Centralized Service Mesh implementation for:
Single pane of glass for Service Taxonomy
Service Distribution & impact mapping
Ingress & Egress Traffic gateway
Centralized Metric collection
Self-serve Secrets Management & Automation.
Enterprise Systems Management Remediation and Improvement
Network Management and Automation
Fully integrated & automated IT Service Management (ITSM).
Application Performance Management (APM)
Stakeholder Management
Product Management
Vendor and Supplier Management
Notification and Communication Management

Site Reliability Engineering

Standard Chartered Bank

Technology Lead - Global Enterprise Systems Management and Automation

Mar 2009 – Aug 2011 · 2 yrs 5 mos

Investment Banking technology and infrastructure solutions design and delivery management.
Stakeholder, team, project, vendor and relationship management
Business and technical requirements management
Technology and infrastructure solutions design and delivery management
Front Office application monitoring and management strategy and architecture management
ESM solutions design and relationship management
ESM strategy and architecture management
ESM product management