Ananya T.

AI Researcher

San Jose, California, United States · 6 yrs experience
Highly Stable · AI Enabled

Key Highlights

  • Achieved 98% latency reduction in ML systems.
  • Designed GPU-accelerated pipelines for document processing.
  • Scaled AI systems to support thousands of concurrent users.
Stackforce AI infers this person is a highly skilled AI/ML Engineer with expertise in backend systems and GPU performance optimization.

Skills

Core Skills

GPU Systems · Deep Learning · AI/ML Engineering · Backend Engineering

Other Skills

Algorithms · Apache Kafka · Artificial Intelligence (AI) · Bitbucket · C++ · CUDA · ColBERT · Competitive Coding · Data Structures · Database Management System (DBMS) · Developer Tools · DocFormer · Docker · Donut · FAISS

About

I’m a Master’s student in Computer Science at SJSU with 6+ years of industry experience as an AI/ML Engineer, GPU & Systems Engineer, and Backend Software Developer. I enjoy working at the intersection of deep learning, high-performance computing, and large-scale distributed systems, building solutions that are both intelligent and extremely efficient.

Across roles, I have:
  • Accelerated GPU pipelines (CUDA, TensorRT) and achieved up to 98% latency reduction in real-world ML systems.
  • Built and optimized LLM, RAG, and multimodal AI systems using PyTorch, TensorFlow, Transformers, LangGraph, and vector databases.
  • Developed backend microservices (Java, Spring Boot, Redis, Kafka) serving millions of users with low-latency, fault-tolerant architectures.
  • Designed GPU-accelerated document AI pipelines, improving semantic structure extraction accuracy to 92% and boosting throughput 4–5×.
  • Conducted research in generative modeling (VAE, GAN, Diffusion), benchmarking and optimizing models for stability, quality, and speed.

I enjoy solving hard systems + ML problems:
  • How do we make models faster, smarter, and more scalable?
  • How do we bridge ML algorithms with hardware-aware optimization?
  • How do we design distributed systems that stay reliable under massive load?

My strengths:
  • Deep Learning & LLMs: PyTorch, Transformers, diffusion models, vision models
  • GPU Systems: CUDA kernels, TensorRT, mixed precision, profiling (Nsight)
  • Backend Engineering: Java, Spring Boot, PostgreSQL, MongoDB, Redis, Kafka
  • Systems Thinking: concurrency, performance tuning, distributed design

I’m currently seeking Summer 2026 internships across Software Engineering, AI/ML, Systems/Infrastructure, GPU/Performance Engineering, and Applied Research.

Experience

Total Experience: 6 yrs
Average Tenure: 3 yrs
Current Experience: --

San José State University

AI Engineer

Sep 2025 – Present · 9 mos · San Jose, California, United States · On-site

  • Designed and deployed a GPU-accelerated Document AI pipeline to process 10,000+ OCR-scanned documents using LayoutLM, Donut, Pix2Struct, and DocFormer, achieving 92% semantic structure accuracy.
  • Optimized inference using TensorRT, CUDA kernels, and mixed precision (FP16/INT8), reducing per-document latency from 45s → 8s (82% reduction).
  • Built asynchronous data loaders, parallel batching, and multi-GPU orchestration (PyTorch DDP), improving GPU utilization by 45%.
  • Fine-tuned multimodal transformers on 2,000+ annotated documents, improving table detection accuracy by 34% and speeding up post-processing by 60%.
  • Delivered accessibility-compliant outputs (WCAG 2.1 AA), enabling large-scale AI-driven digitization for historical archives.
  • Led full experimentation cycles—evaluation design, ablations, profiling, and system-level optimization.
GPU-accelerated Document AI pipeline · LayoutLM · Donut · Pix2Struct · DocFormer · TensorRT

Airtel International LLP - Airtel Africa

Software Engineer

Feb 2022 – Jul 2025 · 3 yrs 5 mos · Gurugram, Haryana, India

AI-Powered Employee Platform (LLM/RAG + Systems Engineering)
  • Architected a GPU-accelerated LLM RAG system serving 500K+ monthly queries; deployed FAISS + ColBERT retrieval, achieving a 43% improvement in MRR.
  • Reduced LLM inference latency by 98% (300s → 6s) via INT8 quantization, pruning, TensorRT optimization, and GPU memory tuning.
  • Built a LangGraph-based multi-agent orchestration framework automating HR, CRM, and payroll workflows, cutting response times from 24 hours → 3 minutes.
  • Scaled the platform to support 10,000+ concurrent users, implementing backpressure mechanisms, caching (Redis), and distributed load balancing.
Backend & Distributed Systems Contributions
  • Debugged and resolved 100+ production issues across employee and payment platforms.
  • Optimized new-joiner portal load times by 98%, improving UX and system throughput.
  • Built Kafka-backed event-driven microservices and notification services (10,000+ daily events, 99.5% delivery rate).
  • Deployed Redis caching layers, reducing DB queries by 70% and cutting response latency from 800ms → 240ms.
GPU-accelerated LLM RAG system · FAISS · ColBERT · INT8 quantization · TensorRT optimization · LangGraph

HSBC Software Development (India) Pvt Ltd

Software Engineer

Jul 2019 – Feb 2022 · 2 yrs 7 mos · Pune, Maharashtra, India

  • Built low-latency APIs for global payment tracking platforms used by 5,000+ enterprise clients, reducing response times from 1.2s → 350ms (71% improvement) via Redis caching and query parallelization.
  • Engineered event-driven workflows for payment updates, supporting 1M+ daily transactions with high reliability.
  • Designed a transaction recovery service (Saga pattern) that auto-resolved 95% of transactions that failed due to network disruptions.
  • Developed onboarding and payment mapping systems with 90% unit test coverage, improving reliability and long-term maintainability.
  • Applied ML models (fraud detection, clustering, recommendation systems), achieving 84% precision and improving cross-sell conversion by 22%.
Low-latency APIs · Redis caching · Event-driven workflows · Fraud detection · Clustering · Recommendation systems

Indian Council of Medical Research (ICMR)

Software Intern

Jul 2018 – Dec 2018 · 5 mos · Delhi, India · On-site

  • Conducted ML research on antibiotic resistance prediction across 5,000+ medical records using Random Forest, Logistic Regression, and XGBoost.
  • Improved minority class recall by 23% using SMOTE and PCA-based feature engineering.
  • Built the full research pipeline (preprocessing → feature selection → modeling → evaluation), achieving 0.82 AUC-ROC.
  • Presented findings to biomedical researchers, demonstrating clear statistical reasoning and reproducibility.
ML research · Random Forest · Logistic Regression · XGBoost · SMOTE · PCA

Education

San José State University

Masters — Computer Science

Aug 2025 – Jun 2027

Indira Gandhi National Open University

Diploma in Creative Writing

Jul 2022 – Jul 2023

Banasthali Vidyapith

Bachelor of Technology - BTech — Computer Science

Jul 2015 – May 2019

Uttam School for Girls - India

High School Diploma

Apr 2008 – Apr 2014

Nirmala Convent School, Bulandshahr

Apr 2000 – Mar 2008
