Saurabh Goyal

Director of Engineering

Bengaluru, Karnataka, India · 19 yrs 3 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • 19+ years in managing large-scale distributed systems
  • Proven leadership in cloud engineering and reliability
  • Expertise in building and nurturing global engineering teams
Stackforce AI infers this person is a seasoned leader in Cloud Engineering and Reliability Engineering within the SaaS industry.

Skills

Core Skills

Cloud Engineering, Reliability Engineering, Productivity Engineering, Data Platforms, Big Data, Database Engineering, Data Engineering, Site Reliability Engineering, Database Administration

Other Skills

Application Support, Artificial Intelligence (AI), Automation, Backup and recovery, BigData technologies, Bug diagnostics, Building Trust, Capacity Planning, Centralized data platform, Coaching, Cost Management, Data Migration, Data Warehousing, Data ingestion pipelines, Data security

About

Professional Synopsis
  • 19+ years of diversified, hands-on experience managing reliable, scalable, secure, and cost-efficient large-scale distributed systems delivered in a self-service manner.
  • Proven technical and people leadership in leading, guiding, and managing geographically dispersed teams across large-scale infrastructure and Data Platforms, Cloud Engineering, Product and Production Engineering, and Reliability Engineering (SRE & DevOps), fostering close collaboration with product, operations, and business leadership teams.
  • Extensive experience in ensuring production readiness, shaping product roadmap and strategy, and providing strong engineering management and leadership.
  • Consistently bootstrapped multiple diversified, nimble, and proficient global teams, from the first engineer to multiple tens of engineers, and nurtured and led teams at startups (Flipkart, Paytm, Grab) as well as global industry leaders (LinkedIn, Amazon).
  • Evangelised a reliability and productivity mindset built on the 3Ps (People, Product & Process) for better customer experience and operational excellence.
  • Data-driven approach to building tactical and strategic solutions and measuring success using attributes such as reliability, scalability, uptime, cost efficiency, engineering efficiency, and customer experience.

Domain Expertise
  • Reliability Engineering
      ○ Site Reliability Engineering
      ○ Resilience Engineering, BCP
      ○ Incident Management
      ○ Observability
  • Developer Productivity
      ○ DevX (Dev Tools and DevOps)
      ○ Environments and Service Registry
      ○ Release and Testing
  • Cloud Enablement & Infrastructure
      ○ Public Cloud Platforms (AWS, GCP)
      ○ DevOps Strategy & Roadmap Development
      ○ Infrastructure as Code (Terraform, Ansible, CloudFormation)
      ○ Kubernetes & Container Orchestration
  • Data Platforms
      ○ Datastores as a Service
      ○ Data Streaming as a Service
      ○ Big Data Platforms

Experience

Deliveroo

Director of Engineering - Infra and Platforms

Oct 2024 - Nov 2024 · 1 mo · Bengaluru, Karnataka, India · Hybrid

  • Leading the effort to build scalable infra platforms, with responsibility for 3 major sub-verticals:
  • 1. Core Infra - Platforms
      ○ NEC: Network, Edge & Compute
      ○ DataStore as a Service
      ○ Streaming as a Service
  • 2. Dev Acceleration Platforms
      ○ Software Release and Test Engineering
      ○ Observability
      ○ Reliability Engineering & SRE
      ○ Incident Management
  • 3. Data Intelligence
      ○ Machine Learning Platform
      ○ Experimentation Platform
      ○ Analytics Platform
      ○ Data Governance Platform
Scalable Infra platforms, Incident Management, Observability, Reliability Engineering, Machine Learning Platform, Cloud Engineering

Flipkart

Sr. Engineering Manager: Reliability & Productivity Engineering

Dec 2021 - Sep 2024 · 2 yrs 9 mos · Greater Bengaluru Area · Hybrid

  • Bootstrapped the Reliability Engineering charter under the umbrella of Reliability and Productivity Engineering.
  • Led the charter for the Site Reliability and Resilience Engineering org to solve complex reliability problems systematically by building them into the SDLC at Flipkart scale.
  • Improved hiring efficiency by establishing the right hiring process, including targeting, candidate briefing, and interview and evaluation scripts.
  • Evangelised a reliability mindset built on the 3Ps (People, Product & Process) to attain service and platform reliability for a better customer experience.
  • Brought efficiency to Incident Management by templatising and auto-creating the RCA after every incident, with fields auto-filled from mapped Jira tags, reducing toil for the IM team and developers.
  • Built a service catalog for providing self-managed stateful workloads deployed on VMs.
  • Successfully drove the Service Tiering program using an SLO approach.
  • Ideated and conceptualized the Service Registry and Reliability Scorecard.
Reliability Engineering, Incident Management, Service Registry, Service Tiering program, Productivity Engineering

Paytm

Director: Central Data Platform

Sep 2020 - Dec 2021 · 1 yr 3 mos · India

  • Derived and institutionalized operational and engineering excellence in my BU through the right KPIs.
  • Responsible for managing a centralized data platform for ingestion, computation, and extraction.
  • Owned the reliability, productivity, stability, operability, and scalability charter to achieve four 9s uptime for all centralized BigData services and platforms.
  • Accountable for Incident Management (24x7 availability management), Change Management, and operational excellence.
  • Hired lateral and NSG talent and nurtured them to be best in class.
  • Inculcated the practice of embedding reliability, scalability, uptime, security, and self-service in all the solutions and platforms built under the Data Platform umbrella.
  • Improved cost efficiency by implementing autoscaling, right-sizing, minimizing infra wastage, and adopting latest-generation infra.
Centralized data platform, Incident Management, Operational excellence, Data Platforms, Reliability Engineering

Grab

Sr Engineering Manager: BigData Platform/Infra and Databases Engineering

Dec 2017 - Sep 2020 · 2 yrs 9 mos · Bengaluru Area, India

  • Database
      ○ Bootstrapped the database team from scratch.
      ○ Managed and mentored the database team.
      ○ Derived and implemented a best-practice process for schema and database design review.
      ○ Educated dev teams on DB optimization best practices.
      ○ Guided the team towards a state-of-the-art monitoring infrastructure for all DBs.
      ○ Standardized DB instance parameters for optimal performance.
      ○ Automated database deployment.
      ○ DB access as an API.
      ○ Consulted with all dev teams on the best and most optimal data-store solution available for each use case.
      ○ Worked with SysOps/DevOps, infra security, and compliance on infra and security best practices from a DB standpoint.
  • BigData Platform
      ○ Led the effort to design the Data Platform and data solutions, working towards unified data solutions using BigData technologies such as Kafka, Spark, Presto, Hive, and EMR.
      ○ Ingestion pipeline using the open-source technologies Debezium and Kafka and working towards
      ○ Solution design based on geographical data locality.
      ○ Data metadata, discovery, and access control layer on top of the data and database layers.
      ○ Data security, protection, and privacy.
      ○ Catering to all business data requirements, made available through heterogeneous resources.
Database management, BigData technologies, Data security, Big Data, Database Engineering

LinkedIn

2 roles

Staff SRE : Data

Promoted

Oct 2014 - Nov 2017 · 3 yrs 1 mo

  • Data Ingestion Pipeline: Experience in the design and implementation of data ingestion pipelines. Currently performing proofs of concept for the Oracle BigData Adapter for Kafka.
  • DataCenter BuildOuts: Bootstrapped Oracle databases totalling 500T in cumulative size. Storage capacity planning and automation of the buildout process.
  • Development Life Cycle: Well versed in the latest technology and trends in the database arena, with strong knowledge of all aspects of the SDLC.
  • High Availability Solution: Design and implementation experience for Active-Active data replication using Oracle GoldenGate for Oracle and MySQL. We are managing 4 active sites with Oracle GoldenGate, all with live write traffic.
  • Design Review: As part of the BDF (Bangalore Design Forum) and DMRC (Data Modelling Review Committee) at LinkedIn Bangalore, involved in evaluating application designs and suggesting best practices.
  • Part of the SIG (SRE Innovation Group), exploring innovative new ideas and implementing them successfully. About to complete the first project under this umbrella.
  • Application Support: Code review and deployment, application ramp-up, and application troubleshooting from a DB perspective, such as latency and unresponsiveness.
  • NoSQL and Big Data Concepts: Knowledge of NoSQL datastores (Couchbase and Elasticsearch), Hadoop, Hive, Pig
  • RDBMS: Oracle, MySQL, Teradata, MS-SQL
  • NoSQL DataStore: Couchbase, Elasticsearch
  • Programming: Python, SQL, PL/SQL, Bash Scripting
  • Data Replication: GoldenGate, Kafka, Databus, Lumos/FastReader
  • BigData: Hadoop, Hive, Pig, Azkaban
Data ingestion pipelines, High Availability Solution, Application Support, Data Engineering, Site Reliability Engineering

Sr. Database Administrator

Jan 2012 - Sep 2014 · 2 yrs 8 mos

  • Providing 12*7 support for all mission-critical databases (Oracle, Teradata, and MySQL).
  • Automation using shell scripting.
  • Infrastructure projects such as capacity planning and datacenter migrations.
Database support, Automation, Capacity Planning, Database Administration

Oracle

Sr Eng Database: Bug diagnostics and escalation

Jun 2010 - Jan 2012 · 1 yr 7 mos · Bangalore

  • As a BDE Engineer, worked as the single point of contact between Oracle Support and Development teams on bugs and other solutions.
  • Responsible for ensuring a high level of bug quality by reviewing, screening, and validating all bugs, and by identifying and minimizing duplicate and false bugs through testing and implementation.
  • Supported Oracle Support on customers' escalated production issues.
  • Designed processes and procedures to ensure that all bugs are fully documented and well formed before they are assigned to Development.
  • Worked as a subject matter expert, providing solutions for complex issues or identifying acceptable workarounds.
  • Screened bugs across different areas of the Oracle database, such as RAC, ASM, Performance, Optimizer, and RMAN.
Bug diagnostics, Support, Process design, Database Engineering

Amazon

Oracle DBA

Mar 2008 - Jun 2010 · 2 yrs 3 mos

  • Providing 12*7 on-call support for all 250 critical production instances.
  • Designing and creating databases as per application team requirements.
  • Designing, implementing, and maintaining a 10g Data Guard solution with Fast Start Failover (FSF).
  • Upgrading Oracle databases to higher versions as per Oracle's recommendations and company standards.
  • Actively working on database upgrades from 10g to 11g.
  • Working on a solution design for Oracle 11g Active Data Guard (ADG).
  • Monitoring wait events, locks on database objects, and long-running operations, and taking the necessary actions to improve database performance.
  • Tuning the database and SQL to increase throughput.
  • Backup and recovery of databases using RMAN.
  • Migrating databases from one host to another using RMAN or OS utilities.
  • Upgrading databases from 32-bit Linux to 64-bit Linux.
  • Implementing new technologies or features as per business requirements and company standards.
Database design, Performance tuning, Backup and recovery, Database Administration

Tata Consultancy Services

Assistant System Engineer

Apr 2005 - Mar 2008 · 2 yrs 11 mos

  • Started my IT career at TCS, working as an Oracle Database Administrator.
  • Responsibilities as an Oracle DBA:
      ○ Managing 70 critical production database instances and applications on a 24*7 basis.
      ○ Administration of Oracle 8i/9i/10g databases, Oracle 10gR2 RAC with ASM, and Oracle 10g Application Server.
      ○ Designing and creating databases on different platforms.
      ○ Installation and configuration of Oracle 8i/9i/10g databases.
      ○ Upgrading Oracle databases to higher versions as per Oracle's recommendations.
      ○ Worked on disaster recovery solutions such as DR center setup and physical standbys.
      ○ Tuning the database and SQL to increase throughput.
      ○ Reorganizing database objects by archiving data, moving tables within the tablespace, and rebuilding indexes.
      ○ Archival and restoration of data as per business requirements.
      ○ Worked on OS/shell scripting to check database health.
      ○ Used Transportable Tablespaces (TTS) and the Exchange Partition concept for data archiving.
      ○ Implementation of security features as recommended by Oracle and the client's security policy.
Database management, Installation and Configuration, Disaster Recovery, Database Administration

Education

Dr. A.P.J. Abdul Kalam Technical University

B.Tech — Electronics and Telecommunications

Jan 2000 - Jan 2004

Symbiosis Centre for Distance Learning

Post Graduation Diploma in Information Technology — Computer/Information Technology Administration and Management

Jan 2007 - Jan 2009

Kendriya Vidyalaya

Intermediate

Jan 1987 - Jan 1999
