Spandana Raj Babbula

Software Engineer

Bengaluru, Karnataka, India · 10 yrs 4 mos experience

AI ML Practitioner · AI Enabled

Key Highlights

Led infrastructure for Generative AI at Google.
Achieved 10x latency reduction for PaLM API.
Expert in Large Language Models and AI serving.

Stackforce AI infers this person is a Backend-heavy Infrastructure Engineer specializing in AI and Data Analytics.

Contact

spandana.babbula@gmail.com LinkedIn

Skills

Core Skills

AI Serving · Large Language Models (LLM) · Generative AI · Data Infrastructure · Low Latency · Project Management · Big Data Analytics · Data Processing · Full-Stack Development

Other Skills

Algorithms · Artificial Intelligence (AI) · Batch Processing · Big Data · Bigtable · C · C++ · Data Analysis · Data Analytics · Data Storage Technologies · Data Structures · Database Systems · Distributed Systems · JAX · MLOps

About

Impact-driven engineering leader with a track record of building large-scale, high-performance infrastructure. Experienced in taking 0-to-1 projects from conception to successful products. Currently working on problems at the intersection of LLMs and infrastructure.

Experience

Total Experience: 10 yrs 4 mos

Average Tenure: 8 yrs 3 mos

Current Experience: 2 yrs 1 mo

Google DeepMind

Senior Staff Software Engineer

Apr 2024 – Present · 2 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

I work on Gemini inference and serving efficiency.

JAX · XLA · AI Serving · Large Language Models (LLM) · Technical Leadership · TPU

Google

5 roles

Senior Staff Software Engineer

Oct 2023 – Mar 2024 · 5 mos

Core Labs (Applied AI) @ Google
I lead the GenAI Retrieval Augmented Generation (RAG) infrastructure area, which includes vector databases and infrastructure for other semantic retrieval techniques that augment an LLM's knowledge for Q&A on private corpora. This infrastructure is used by several internal and external-facing products at Google to solve business-critical problems using RAG.
Gemini API semantic retrieval is built on top of the vector store infrastructure developed by my team: https://ai.google.dev/

Large Language Models (LLM) · Generative AI · Technical Leadership · Data Infrastructure · Data Processing · Data Storage Technologies · +3

Staff Software Engineer and Manager

Promoted

Apr 2021 – Oct 2023 · 2 yrs 6 mos

Generative AI APIs @ Google Labs
Owned performance evaluation and improvements for the Google PaLM API. Improved TPU inference latencies and end-to-end API latencies by understanding and experimenting with quantization, batching, model sharding, HBM usage and bandwidth constraints, TPU topologies, etc.
Improved latency of the PaLM API by up to ~10x, making it several times faster than other GenAI API offerings in the industry. This performance work was behind multiple GenAI products and features announced at Google I/O 2023, including NotebookLM, Google MakerSuite, and "Help me write" in Gmail.
PaLM API: https://developers.generativeai.google/
NotebookLM: https://blog.google/technology/ai/notebooklm-google-ai/
MakerSuite: https://makersuite.google.com/

Large Language Models (LLM) · Low Latency · Team Management · AI Serving · MLOps · Scalability

Senior Software Engineer, Search Ads A/B Experiments Analytics

Promoted

May 2019 – Apr 2021 · 1 yr 11 mos

Tech lead for Search Ads Analysis Infrastructure, an extremely fast and reliable infrastructure for A/B experiment analytics.

Batch Processing · Data Processing · Project Management · Team Management · Strategic Roadmaps · Technology Roadmapping · +1

Software Engineer III, Actions-on-Google Data Platform & Analytics

Promoted

May 2017 – Apr 2019 · 1 yr 11 mos

Built a logging and analytics framework for Actions on Google, enabling product and feature teams to compute metrics, platform insights, ranking features, and developer analytics for Actions on the Google Assistant.
Built efficient, scalable, and maintainable infrastructure using Google's big data technologies such as Flume, Mesa, and Bigtable.
Actively collaborated with 10+ teams and product managers to evolve the infrastructure according to product requirements and helped those teams use it successfully.
Key aspects of my work involved batch and streaming data pipelines, log processing, real-time analytics generation, database schema design, SQL query authoring, SQL pipelines, and building datasets that can be sliced and diced.

Pipelines · Batch Processing · Data Processing · Big Data Analytics

Software Engineer II, AdSense Publisher Optimization

Oct 2015 – Apr 2017 · 1 yr 6 mos

Built the backend infrastructure for Ad Balance, a feature that helps publishers show fewer, better-performing ads and provide a better visitor experience on their sites with minimal drop in earnings.
Launched new features for AdSense A/B experiments: automatic A/B experiments and autocomplete A/B experiments.
Launched new recommendation types for AdSense publishers: Matched content and account linking with Google Analytics.

Full-Stack Development

Xerox Research Center India

Research Intern

Jun 2014 – Jul 2014 · 1 mo · Bangalore

Worked towards efficient and accurate transductive classification in hypergraphs using a heat-kernel framework.
Developed a generic mathematical framework for classification in multi-view data represented as multi-layer graphs.
The framework learns a linear model that combines spectral information from multiple layers using the graph Laplacian.

Microsoft Research

Research Intern

May 2013 – Jul 2013 · 2 mos · Redmond

Research Intern, Cloud Systems and Applications Group
Worked towards an efficient online traffic engineering scheme that exploits information on transfer sizes and deadlines to pack long-running transfers across network paths and time, using a tailored approximate solution to a mixed packing and covering problem.
My work was part of the paper “Calendaring for wide area networks”, published at ACM SIGCOMM 2014.