Shantanu Kshire

Software Engineer

London, England, United Kingdom · 9 yrs 6 mos experience
Highly Stable · AI Enabled

Key Highlights

  • Led AI security initiatives at Google Cloud.
  • Developed critical incident management tools at Meta.
  • Improved cloud service reliability and security.
Stackforce AI infers this person is a Backend-heavy Fullstack Engineer with expertise in Cloud Infrastructure and AI Security.

Skills

Core Skills

Observability, Incident Management, AI Security, Distributed Systems, Cloud Infrastructure, Backup and Recovery, Security, Backend Development, Web Development, Software Development, Full Stack Development

Other Skills

AI Agent, Python, SQL, HiveQL, Elasticsearch, GenAI, Java, C++, gRPC, RESTful APIs, MapReduce, FlumeJava, GCP Cloud APIs, PostgreSQL, ePUB

About

I’m a Software Engineer at Meta, based in London, UK, with 9+ years of experience designing, building, and scaling software solutions across global tech giants: Meta, Google and Amazon.

At Meta, I work in the Monitoring & Observability (M&O) org, focusing on safeguarding production systems via the Service Health team. Our mission is to shift left and prevent breaking changes, ultimately minimizing SEVs and improving service resilience. I've also contributed to Meta’s central incident management tool (SEV Manager), driving improvements in system reliability, developer experience, and AI-based tooling.

Prior to Meta, I spent nearly 4 years at Google, where I led AI security initiatives within Google Cloud. I built the proof of concept for Model Armor, which today is a fully managed Google Cloud service aimed at improving the security and safety of AI applications. I was also responsible for the public API design of the Model Armor service, shaping Google Cloud's approach to secure GenAI workloads. I also worked on Cloud Key Management Service (Cloud KMS), a foundational part of GCP’s cryptographic infrastructure.

Earlier in my career at Amazon, I worked on Kindle product enhancements, improving the rendering experience for 20,000+ ebooks. I’ve also delivered large-scale backend and full-stack systems at Oracle and Dassault Systèmes.

I hold an M.Tech in Computer Engineering and thrive at the intersection of system design, security, reliability and AI safety.

Key Areas: Cloud Infrastructure, AI Security, Observability, Distributed Systems, Incident Management, GenAI Trust & Safety, Backend Development, C++, Java, Python, SQL, CUDA, REST, RPC, React, Web Technologies

Experience

Total Experience: 9 yrs 6 mos
Average Tenure: 1 yr 11 mos
Current Experience: 1 yr 10 mos

Meta

Software Engineer

Aug 2024 - Present · 1 yr 10 mos · Greater London, England, United Kingdom · On-site

  • SWE in the Monitoring & Observability Infrastructure org at Meta.
  • Engineer on the Service Health product, a Tier-0 internal Apache Thrift service at Meta.
  • The Service Health team focuses on preventing code and config changes from breaking Meta's production systems.
  • Improved the backend system that runs A/B comparison tests on every code and config change at Meta.
  • Delivered multiple backend service improvements using the Python, C++ and Hack/PHP application stack.
  • Developed data pipelines using Python, Hive and SQL to propagate regional health check metrics into OneMonitoring dashboards, achieving 100% visibility into critical quality signals including False Positive Rate (FPR), Precision and Recall for all services at Meta.
  • Health Check Mate - AI Agent
  • Led the overall ideation and development of the Health Check Mate AI agent using the OpenAI GPT-4.1 LLM.
  • The agent is deployed internally at Meta to investigate health check failures, supporting 150+ WAU and 500+ MAU.
  • Improved agent responses using in-context learning techniques including prompt engineering and Retrieval-Augmented Generation (RAG).
  • Developed and integrated multiple data retrieval tools, enabling the agent to query HiveQL, SQL, and RPC-based services to accelerate debugging of health check failures.
  • Also worked as Product Engineer for SEV Manager, Meta’s core incident response product.
  • Improved SEV search reliability and correctness by enhancing Elasticsearch indexing, and increased unit and E2E test coverage from 20% to 80%.
  • Proposed and built a GenAI LLM-powered tool to accelerate product hierarchy migrations, significantly reducing manual effort and operational toil.
  • Led a company-wide migration achieving 100% deprecation of a legacy Python Thrift service used by 210+ teams with 1.4K+ call sites.
  • Implemented a codebase dependency crawler to detect service usage and landed 200+ automated diffs using Python LibCST, driving deprecation at scale.
AI Agent, Distributed Systems, Observability, Incident Management

Google

2 roles

Senior Software Engineer

Apr 2024 - Jul 2024 · 3 mos · On-site

  • Tech Lead (L5) for the Model Armor GCP service, designed to secure LLM workloads in cloud environments.
  • Led a team of 4+ engineers, driving the product from proof-of-concept (PoC) to private preview launch.
  • Authored high-level system design, ensuring high availability, reliability and scalability.
  • Designed and implemented RESTful public APIs and a gRPC-based Java backend service.
  • Implemented integrations with the Responsible AI service at Google to detect Prompt Injection and Jailbreak attacks.
  • Tech Lead (L5) for the Notebook Security Scanner product.
  • Led a team of 3+ engineers to deliver the Notebook Security Scanner from PoC to private preview launch.
  • Implemented a distributed batch processing pipeline using MapReduce to process 10K+ Python Colab Enterprise Notebooks, detecting Python package vulnerabilities at scale.
  • Led XFN collaboration initiatives with Security Command Center (SCC) teams to generate vulnerability reports and surface findings to GCP customers.
  • Implemented a scalable system using FlumeJava, OOP design principles and the Google Guice Dependency Injection (DI) framework.
  • One of the early engineers on the GCP Cloud Key Management Service (Cloud KMS) India team.
  • Built deep expertise in the Cloud KMS service architecture.
  • Served as a technical mentor and onboarded 5+ new engineers onto the team.
  • Bootstrapped multiple projects, acting as a force multiplier.
  • Authored the technical design for VPC Service Controls (VPC-SC) integration, enabling perimeter-based, granular IAM access control for Cloud KMS resources.
  • Improved Cloud KMS reliability by leading the Spanner Point-in-Time Restore project, achieving near-zero Recovery Point Objective (RPO).
Java, C++, AI Security, Cloud Infrastructure

Software Engineer III

Nov 2020 - Apr 2024 · 3 yrs 5 mos · On-site

  • One of the founding engineers of the GCP Backup and DR team, working as part of the Systems and Platform Engineering teams.
  • Delivered several key projects enabling the initial public launch of the Backup and DR GCP service.
  • Served as Lead Engineer for the Cloud Snapshots offering; designed and implemented Java and C++ programs to orchestrate backup and recovery for Compute Engine VMs and Persistent Disks using GCP Cloud APIs.
  • Improved reliability of PD Snapshots workflows by introducing functional testing. Increased test coverage from 0% to 60%, significantly reducing regressions.
  • Implemented a Public Key Infrastructure to enable mTLS communication between worker VMs in GCP.
  • Built automated certificate generation, renewal and storage across Java (Tomcat) and C++ services, backed by PostgreSQL.
  • Identified SSL/TLS performance bottlenecks and introduced in-memory certificate caching, reducing TLS connection latency by 80x.
  • Implemented a job orchestration system to manage software upgrades and certificate renewals across 100+ VMs concurrently, leveraging Java Message Queues and the ExecutorService framework.
  • Led an end-to-end project to improve rollout safety for software upgrades to the Backup Appliance.
  • Authored the technical design to propagate operational and health metrics and built monitoring dashboards.
  • Implemented an integration with Canary Analysis Service (CAS) and enabled canary rollouts to safely validate changes prior to full deployment.
Java, C++, Cloud Infrastructure, Backup and Recovery

Amazon

Software Development Engineer 1

Jun 2019 - Oct 2020 · 1 yr 4 mos · Chennai, Tamil Nadu, India · On-site

  • Worked as SDE-1 on the Amazon Kindle Conversions team.
  • Developed and maintained a Java backend service that transformed ePUB books into Amazon's proprietary Enhanced Typesetting (ET) format.
  • Collaborated with the Kindle Rendering team and implemented support for inline-block structures, unblocking conversions for 20K+ titles across the Kindle Store.
  • Implemented ePUB normalization to handle negative CSS properties, improving layout and book reading experience.
  • Resolved multiple publisher-reported issues, including high priority cases from Penguin Random House (PRH), improving content quality.
  • Mentored an SDE intern, providing guidance on project execution, design decisions, and code reviews.
Java, Backend Development, Web Development

Dassault Systèmes

Research And Development Engineer

Jun 2018 - Jun 2019 · 1 yr · Pune, Maharashtra, India · On-site

  • Java Developer for Enovia PLM.
  • Gained experience with Configuration and Change Management UX libraries.
  • Delivered features for 3DEXPERIENCE platform releases R2018x, R2019x and R2020x.
  • Mentored and trained fresh graduates and peers on REST API and web app development.
Java, Software Development, Web Development

Oracle Financial Services Software Limited

Software Developer

Jul 2016 - Jun 2018 · 1 yr 11 mos · Pune, Maharashtra, India · On-site

  • Full Stack Developer for ORMB: Barclays Bank Implementation.
  • Functioned as Lead Software Developer for the Financial Data Migration of 1M+ pricing records.
  • Developed object-oriented programs in Java EE 7.
  • Wrote complex PL/SQL queries and stored procedures extensively on an Oracle relational database.
  • Optimized multiple dynamic UIs, improving performance and responsiveness.
  • Developed features across technologies: HTML/CSS, JavaScript, jQuery, Groovy, Hibernate, XML, Java Multithreading and SOAP-based Web Services.
Java, Kotlin, Full Stack Development, Software Development

Education

Veermata Jijabai Technological Institute (VJTI)

Master of Technology - MTech — Computer Engineering

Jan 2014 - Jan 2016

Datta Meghe College of Engineering CIDCO Sector III Airoli Navi Mumbai 400 708

Bachelor of Engineering - BE — Computer Engineering

Jan 2010 - Jan 2014
