Ananya T. — AI Researcher

I’m a Master’s student in Computer Science at SJSU with 6+ years of industry experience as an AI/ML Engineer, GPU & Systems Engineer, and Backend Software Developer. I enjoy working at the intersection of deep learning, high-performance computing, and large-scale distributed systems, building solutions that are both intelligent and extremely efficient.

Across roles, I have:

Accelerated GPU pipelines (CUDA, TensorRT) and achieved up to 98% latency reduction in real-world ML systems.
Built and optimized LLM, RAG, and multimodal AI systems using PyTorch, TensorFlow, Transformers, LangGraph, and vector databases.
Developed backend microservices (Java, Spring Boot, Redis, Kafka) serving millions of users with low-latency, fault-tolerant architectures.
Designed GPU-accelerated document AI pipelines, improving semantic structure extraction accuracy to 92% and boosting throughput 4–5×.
Conducted research in generative modeling (VAE, GAN, Diffusion), benchmarking and optimizing models for stability, quality, and speed.

I enjoy solving hard systems + ML problems:

How do we make models faster, smarter, and more scalable?
How do we bridge ML algorithms with hardware-aware optimization?
How do we design distributed systems that stay reliable under massive load?

My strengths:

Deep Learning & LLMs: PyTorch, Transformers, diffusion models, vision models
GPU Systems: CUDA kernels, TensorRT, mixed precision, profiling (Nsight)
Backend Engineering: Java, Spring Boot, PostgreSQL, MongoDB, Redis, Kafka
Systems Thinking: concurrency, performance tuning, distributed design

I’m currently seeking Summer 2026 internships across Software Engineering, AI/ML, Systems/Infrastructure, GPU/Performance Engineering, and Applied Research.

Stackforce AI infers this person is a highly skilled AI/ML Engineer with expertise in backend systems and GPU performance optimization.

Location: San Jose, California, United States

Experience: 6 yrs

Skills

GPU Systems
Deep Learning
AI/ML Engineering
Backend Engineering

Career Highlights

Achieved a 98% reduction in LLM inference latency (300s → 6s).
Designed GPU-accelerated pipelines for document processing.
Scaled an AI platform to support 10,000+ concurrent users.

Work Experience

San José State University

AI Engineer (9 mos)

Airtel International LLP-Airtel Africa

Software Engineer (3 yrs 5 mos)

HSBC Software Development Ind Pvt Ltd

Software Engineer (2 yrs 7 mos)

Indian Council of Medical Research (ICMR)

Software Intern (5 mos)

Education

Master’s at San José State University

Diploma in Creative Writing at Indira Gandhi National Open University

Bachelor of Technology - BTech at Banasthali Vidyapith

High School Diploma at Uttam School for Girls - India

at Nirmala Convent School, Bulandshahr

Other Skills

Algorithms
Apache Kafka
Artificial Intelligence (AI)
Bitbucket
C++
CUDA
ColBERT
Competitive Coding
Data Structures
Database Management System (DBMS)
Developer Tools
DocFormer
Docker
Donut
FAISS

Experience

Total Experience: 6 yrs
Average Tenure: 3 yrs

Current Experience

San José State University

AI Engineer

Sep 2025 – Present · 9 mos · San Jose, California, United States · On-site

Designed and deployed a GPU-accelerated Document AI pipeline to process 10,000+ OCR-scanned documents using LayoutLM, Donut, Pix2Struct, and DocFormer. Achieved 92% semantic structure accuracy.
Optimized inference using TensorRT, CUDA kernels, mixed precision (FP16/INT8), reducing per-document latency from 45s → 8s (82% reduction).
Built asynchronous data loaders, parallel batching, and multi-GPU orchestration (PyTorch DDP), improving GPU utilization by 45%.
Fine-tuned multimodal transformers on 2,000+ annotated documents, improving table detection accuracy by 34% and speeding up post-processing by 60%.
Delivered accessibility-compliant outputs (WCAG 2.1 AA), enabling large-scale AI-driven digitization for historical archives.
Led full experimentation cycles—evaluation design, ablations, profiling, and system-level optimization.

GPU-accelerated Document AI pipeline, LayoutLM, Donut, Pix2Struct, DocFormer, TensorRT, +5 more

Airtel International LLP-Airtel Africa

Software Engineer

Feb 2022 – Jul 2025 · 3 yrs 5 mos · Gurugram, Haryana, India

AI-Powered Employee Platform (LLM/RAG + Systems Engineering)
Architected a GPU-accelerated LLM RAG system serving 500K+ monthly queries; deployed FAISS + ColBERT retrieval achieving 43% improvement in MRR.
Reduced LLM inference latency by 98% (300s → 6s) via INT8 quantization, pruning, TensorRT optimization, and GPU memory tuning.
Built a LangGraph-based multi-agent orchestration framework automating HR, CRM, and payroll workflows, cutting response times from 24 hours → 3 minutes.
Scaled the platform to support 10,000+ concurrent users, implementing backpressure mechanisms, caching (Redis), and distributed load balancing.
Backend & Distributed Systems Contributions
Debugged and resolved 100+ production issues across employee and payment platforms.
Optimized new-joiner portal load times by 98%, improving UX and system throughput.
Built Kafka-backed event-driven microservices and notification services (10,000+ daily events, 99.5% delivery rate).
Deployed caching layers via Redis, reducing DB queries by 70% and response latency (800ms → 240ms).

GPU-accelerated LLM RAG system, FAISS, ColBERT, INT8 quantization, TensorRT optimization, LangGraph, +4 more

HSBC Software Development Ind Pvt Ltd

Software Engineer

Jul 2019 – Feb 2022 · 2 yrs 7 mos · Pune, Maharashtra, India

Built low-latency APIs for global payment tracking platforms used by 5,000+ enterprise clients, reducing response times from 1.2s → 350ms (71% improvement) via Redis caching and query parallelization.
Engineered event-driven workflows for payment updates, supporting 1M+ daily transactions with high reliability.
Designed a transaction recovery service (Saga pattern) that auto-resolved 95% of the transactions that failed due to network disruptions.
Developed onboarding and payment mapping systems with 90% unit test coverage, improving reliability and long-term maintainability.
Applied ML models (fraud detection, clustering, recommendation systems), achieving 84% precision and improving cross-sell conversion by 22%.

Low-latency APIs, Redis caching, event-driven workflows, fraud detection, clustering, recommendation systems, +2 more

Indian Council of Medical Research (ICMR)

Software Intern

Jul 2018 – Dec 2018 · 5 mos · Delhi, India · On-site

Conducted ML research on antibiotic resistance prediction across 5,000+ medical records using Random Forest, Logistic Regression, and XGBoost.
Improved minority class recall by 23% using SMOTE and PCA-based feature engineering.
Built a full research pipeline (preprocessing → feature selection → modeling → evaluation), evaluated with AUC-ROC (achieved 0.82 AUC).
Presented findings to biomedical researchers, demonstrating clear statistical reasoning and reproducibility.

ML research, Random Forest, Logistic Regression, XGBoost, SMOTE, PCA, +1 more