Shambhu Prasad

CTO

Redmond, Washington, United States · 10 yrs 11 mos experience
Most Likely To Switch · AI Enabled

Key Highlights

  • Proven track record in cloud solution delivery.
  • Expertise in AI/ML-driven solutions and automation.
  • Strong leadership in cross-functional engineering teams.
Stackforce AI infers this person is a Cloud Solutions Architect with extensive experience in Azure and enterprise-level software engineering.

Skills

Core Skills

Cloud Solutions, Azure, AI/ML, Software Engineering

Other Skills

Azure Premium Files, geo-replication, high throughput, P99 traffic, resiliency planning, test integration, system analysis, Microsoft Copilot, incident reports, dashboard creation, Livesite, incident management, machine learning, REST APIs, cross-region migration

About

  • Engineering leader with expertise in Microsoft Azure, Cloud storage systems, Distributed file systems, SMB protocol, REST APIs, Data replication, Backup, Scalability, and Performance optimization.
  • Skilled in AI/ML-driven solutions, Copilot agents, JupyterLite, and driving adoption of emerging technologies across enterprise environments.
  • Proven track record of leading cross-functional, cross-geographical, and agile engineering teams to deliver reliable, resilient, secure, and customer-centric cloud solutions.
  • Strong focus on business impact through customer communication, cross-team collaboration, mentoring engineers, and building high-performing teams aligned with strategic goals and long-term product vision.

Experience

10 yrs 11 mos
Total Experience
5 yrs 5 mos
Average Tenure
10 yrs 8 mos
Current Experience

Microsoft

6 roles

Principal Software Engineering Manager

Promoted

Oct 2025 - Present · 7 mos

Principal Software Engineer

Promoted

Mar 2024 - Nov 2025 · 1 yr 8 mos

  • Geo replication for Premium File Shares:
    · Leading a team of 4 developers to deliver an upgraded Azure Premium Files geo-replication pipeline, supporting high throughput and high-TPS workloads.
    · Defined project goals, roadmap, and feature scope; led semester planning, resource allocation, and design reviews; expanded team capacity through hiring and ensured on-time delivery of mission-critical features.
    · Conducted system analysis and resiliency planning for P99 traffic, sustaining 100K TPS and 10+ GBPS per share across thousands of file shares at hyperscale.
    · Scaled 4 core components across 3 workloads to achieve 8× TPS and 4× throughput, strengthening enterprise reliability, and partnered with test teams to integrate features into release pipelines.
  • Livesite Leader:
    · Managing the cross functional and cross geographical Livesite Auto-Analysis team of 5 engineers, leveraging Microsoft Copilot to create automated incident reports that reduced time-to-analyze and time-to-mitigate, improving overall service health.
    · Delivered 11 auto-analysis reports, 4 Copilot plugins, 5 dashboards, and 2 integrations, saving the team 40 hours per week in livesite incident analysis and management.
Azure Premium Files, geo-replication, high throughput, P99 traffic, resiliency planning, test integration

Senior Software Engineer

Sep 2020 - Mar 2024 · 3 yrs 6 mos

  • Livesite Leader
  • Introduced the concept of Livesite as a Feature to consolidate core knowledge and build features that systematically reduce the load on on-call developers.
  • Managed a team of 6 developers in a fast-paced environment to implement livesite features with high accuracy, reliability, and carefully crafted reports.
  • Designed a generic approach to categorize and target incidents, tackling them at 2 levels: breadth and depth.
  • Developed a breadth analysis design that runs the most common analyses across a large volume of incidents, reducing investigation time from multiple hours to under 5 minutes. Led 2 developers to deliver the project for 2 major incident categories and got buy-in to expand to more categories.
  • Created a depth analysis design that investigates and root-causes issues automatically using subject matter expert (SME) knowledge, reducing overall incident volume. Led 2 developers and 2 SMEs to deliver the analysis for 2 incident categories and got buy-in to add coverage for more categories.
  • Explored machine learning algorithms for automatically clustering the large volume of available metrics data and identifying new patterns of issues.
  • Cross Region Migration feature work
  • Collaborated with multiple storage teams to deliver support for cross-region migration.
  • Changed the authentication mechanism for account migration to a more global and secure one, allowing safe access to data across regions.
  • Copy File REST API enhancements
  • Designed and implemented changes to the Copy API to reduce its impact on the geo-replication pipeline by converting shallow copies of small files to deep copies.
  • Redesigned the Create File REST API to support 3 semantics at once: Create, Copy (new), and CreateWithData (new). This eases future support for the CreateWithData API, which helps clients create small files in a single request.
Livesite, incident management, machine learning, REST APIs, cross-region migration, Cloud Solutions

Software Engineer II

Aug 2019 - Aug 2020 · 1 yr

  • Led the project to unify behavior across Azure Storage services, bringing parity to the Unicode character set supported across Azure Storage offerings.
  • Collaborated with product owners of Windows and 8 Azure Storage front ends; designed the discovery, development, and deployment plan and executed it for all services as a joint effort.
  • Changed the character validation logic in the Windows network driver and deployed it safely to more than 5000 storage tenants through an OS patch, without causing unexpected new behavior for other storage services, in a multi-semester, multi-org effort.
Azure Storage Services, Unicode character set, collaboration, Cloud Solutions, Software Engineering

Software Engineer II

Promoted

Sep 2017 - Aug 2019 · 1 yr 11 mos

  • Spearheaded the design and implementation of new APIs in the Azure Files repo: one to copy file ranges from another source, and a Batch API based on OData guidelines. The APIs are written for infrastructure that handles billions of requests per hour.
  • Designed a backup solution for the newly released Azure File Shares, addressing support for multiple file systems, hundreds of millions of files, and shares of up to 100 TB, with browse and file-level restore capabilities.
  • Improved the performance and consistency of Azure Files restore by identifying bottlenecks across multiple systems, including Azure Files Backup, the Data Movement Library, and the Azure Files APIs. Raised restore speed from an inconsistent 5-40 files per second to a consistent 150 files per second with full use of the bandwidth available from an Azure File Share, increasing the supported file share restore size from 5 TB to 100 TB.
  • Led a team of 7 engineers to deliver diverse solutions such as a new backup management experience, restore performance improvements, auto-upgrade of on-premises backup agents, a switch to cheaper storage formats, and new ways to protect data for customers with large amounts of data (tens to hundreds of TB).
  • Developed Azure Backup plugin for Microsoft Windows Admin Center for quick and easy on-boarding for new customers and management at scale.
  • Received 2 organization awards for Collaboration and Innovation.
API design, Azure Files, backup solutions, performance improvements, Cloud Solutions, Software Engineering

Software Engineer

Jul 2015 - Aug 2017 · 2 yrs 1 mo

  • Developed enterprise-level cloud backup solutions to back up data locally and in the Microsoft Azure cloud.
  • Product handles 700,000 backups from 180,000 systems every week, supporting up to 54 TB of backup from a single source.
  • Product works standalone and with Microsoft Azure Backup Server to protect files, Windows System State, databases, VMs and Exchange server.
  • Shipped Microsoft Azure Backup Server V1.
  • Shipped backup agent upgrades every 6 weeks.
  • Bug Fixes: Fixed major bugs involving file streams, Virtual Disk Service, VHD and file system handling, multi-threading, anti-virus conflicts, and legacy OS support.
  • Feature: Added support for multiple backup schedules and retention policies for different data sources.
  • Feature: Added support for Direct-2-Cloud Windows System State backup and restore for Server OS 2008 R2 and above.
  • Feature: Ransomware Protection: Added multi-factor authorization for critical backup operations such as changing the passphrase and deleting backups. Added a feature to maintain a minimum safe retention policy and re-protect deleted data.
  • Feature: Added Azure Active Directory (AAD) based authentication support to move away from deprecated ACS authentication.
  • Incentive: Took ownership of Offline Seeding feature. Prepared a get-well plan. Wrote a tool to automate manual steps.
  • Incentive: Wrote a tool that checks data integrity before a customer ships a backup disk to Azure data centers, to avoid sending bad or corrupt data.
  • Impact: Backup success rate increased from 95% to 99.6%.
  • Impact: Product registered 2nd highest Month-over-Month growth in Feb 2017 due to AAD support added for new Azure regions.
  • Impact: 5500 machines started protecting Windows System State even before general availability (GA) of feature.
  • Got 2 promotions in 2 years. Received three organization awards for Collaboration, Customer Obsession and Data Driven decision making for success rate improvements.
cloud backup solutions, Microsoft Azure, feature development, Cloud Solutions, Software Engineering

Google Summer of Code

Student Developer

May 2014 - Aug 2014 · 3 mos

  • Developed a module to support carbon messages for MongooseIM, an XMPP chat server. The module lets a user stay in sync with the server from multiple clients, so that every client receives all messages sent or received by any one of them.
  • Was also responsible for writing test suites and checking the compatibility of the module with the other modules already present in the server.
XMPP chat server, module development, test suites

Microsoft

Intern Software Developer in Test

May 2013 - Jun 2013 · 1 mo

  • Project: Develop a One Click Scale Testing Framework for ELS (Elastic Load Service)
  • The weekly scale testing of the newly developed ELS (a rigorous 8-24 hour test) involved many manual steps. This framework was developed to remove them and run the whole process with a single click: service deployment or upgrade, test runs, run validations, performance counter collection, and scale-down of services, ending with a final report of all analysis and results sent to the whole team.
scale testing framework, automation

Education

Indian Institute of Technology, Kharagpur

Bachelor of Technology (BTech) — Computer Science

Jan 2010 - Jan 2015

St. Xavier's High School, Patna

High School

Jan 1998 - Jan 2008
