Sharma Podila

CTO

Saratoga, California, United States · 29 yrs 10 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Expert in Gen AI application infrastructure.
Proven leader in building high-performing engineering teams.
Exceptional communicator in high-stakes environments.



Skills

Core Skills

Gen AI Application · Data Ingestion · Control Planes · Distributed Systems · Infrastructure · Data Processing · Resource Management · Cloud Infrastructure · Container Orchestration · System Architecture · Scheduling

Other Skills

RAG+LLM · knowledge base querying · tool calls · Agent execution infra · distributed consensus · AWS · data tiering · Kafka · Microservices · resource allocation · capacity SLAs · control plane · cloud native · container management · data management

About

Technology leader with deep expertise in Gen AI application infrastructure, distributed systems, IaaS, and resource management in both public clouds (AWS) and data centers. Founding Tech Lead for Meta's RAS, leading the global data center resource allocation strategy. Founding Engineer for Netflix's Container Platform (Titus) and Stream Processor (Mantis). Proven leader and mentor: developed high-performing engineering teams, guiding managers and tech leads in building robust distributed systems and control planes for large-scale compute and data services. Exceptional communicator: adept at aligning diverse teams, stakeholders, and customers in high-stakes environments.

Experience

Total Experience: 29 yrs 10 mos

Average Tenure: 4 yrs

Current Experience: 1 yr 6 mos

Walmart Global Tech

Engineering Fellow

Nov 2024 – Present · 1 yr 6 mos · Sunnyvale, California, United States

Transposit

2 roles

VP of Engineering

Jun 2023 – Aug 2024 · 1 yr 2 mos

Defining, executing, and building a Gen AI application for enterprise operational use cases that helps people root-cause incidents and reduce time to recovery.
Built a proprietary data ingestion and RAG+LLM-based system for knowledge base querying, tool calls, and agent execution infrastructure.
Leading, mentoring, hiring, and driving execution of the overall engineering team.

Gen AI application · data ingestion · RAG+LLM · knowledge base querying · tool calls · Agent execution infra

VP of Platform Engineering

Jan 2023 – Aug 2024 · 1 yr 7 mos

Stripe

Technology Leader, Architect, Online Data Infrastructure

Nov 2021 – Nov 2022 · 1 yr · San Francisco Bay Area

Leading tech leads and staff/senior engineers to design and build control planes for online databases and distributed consensus, and driving efficiency initiatives that shift the cost curve down through block storage and data tiering across multiple AWS regions.

control planes · distributed consensus · AWS · data tiering · distributed systems

Netflix

Software Engineering

Dec 2019 – Nov 2021 · 1 yr 11 mos · San Francisco Bay Area

Defining, designing, and building infrastructure for client telemetry data associated with the Netflix product and playback experience across all user devices.
Tech-leading the effort to redefine data ingestion, processing, and serving across product and analytics use cases, setting us up for the next major scale-up and new product explorations.
Building an asynchronous data processing architecture over Kafka, Mantis, and microservices to power product analytics, real-time observability, ML data ingestion, and real-time materialized views.

infrastructure · data ingestion · Kafka · Microservices · data processing

Facebook

Software Engineer

Nov 2017 – Dec 2019 · 2 yrs 1 mo · San Francisco Bay Area

Founding tech lead of RAS (https://research.facebook.com/file/4561236690664587/RAS-Continuously-Optimized-Region-Wide-Datacenter-Resource-Allocation.pdf), a new system that dynamically guarantees capacity SLAs for all shared and private compute and data clusters in data centers worldwide, for both stateful and stateless services.
Designed declarative heterogeneous capacity models for guaranteed and elastic needs. Built a control plane and services that integrated iteratively with the existing systems.

resource allocation · capacity SLAs · control plane · resource management · distributed systems

Netflix

Senior Software Engineer, Edge Engineering

Sep 2013 – Oct 2017 · 4 yrs 1 mo · San Francisco Bay Area

Defined, architected, and built multi-region, cloud-native cluster management for containerized applications through projects Mantis and Titus, both now open source. Built the control plane, API, and scheduler services for service, batch, and stream processing workloads.
Author of Fenzo (https://github.com/Netflix/Fenzo), an extensible open-source scheduling library.
Founding engineer for Mantis and Titus.

cloud native · container management · scheduling · cloud infrastructure · container orchestration

Oracle Corporation

Senior Principal Engineer

Feb 2010 – Sep 2013 · 3 yrs 7 mos

Architect and hands-on technical leader for large-scale distributed resource, data, and system management software, applied to grid/cloud/farm computing and data center optimization.

resource management · data management · system management · system architecture

Sun Microsystems, Inc.

5 roles

Principal Engineer / Director of Technology

Promoted

Sep 2008 – Feb 2010 · 1 yr 5 mos

Chief architect, delivering multiple generations of the DReAM (Distributed Resource Allocation Manager) software and evolving the cloud-like Microprocessor Ranch of thousands of servers across multiple locations.
Developed state-of-the-art resource allocation and scheduling techniques for large-scale compute clusters. Core features include advance reservation, backfilling, job dependencies, deadlines, a self-tuning resource profiler, rapid scheduling, and scheduler tracing. These helped sustain record utilization of the compute cluster while meeting the differing SLAs of dozens of independent organizations.
Hands-on technical leadership of resource and data management software with a 20+ member core team across multiple locations.

resource allocation · scheduling techniques · resource management · scheduling

Senior Staff Engineer

Promoted

Oct 2004 – Sep 2008 · 3 yrs 11 mos

Architect, technical lead for DReAM.
Developed distributed system infrastructure for compute clusters. Core features included an inherently distributed architecture for scale, fault tolerance, automated scalable deployments, and operational management.
Chaired meetings with users and stakeholders, including periodic, contentious cluster-share negotiations. Created data aggregation, reporting, and tooling to ease those negotiations.

distributed systems · resource management