Robin S.

Senior Software Engineer

Seattle, Washington, United States · 8 yrs 4 mos experience

Key Highlights

  • Expert in building scalable LLM platforms.
  • Proficient in RAG and LLMOps methodologies.
  • Strong background in cloud-based distributed systems.

Skills

Core Skills

LLMOps · RAG · MLOps · Distributed Systems · Microservices

Other Skills

AWS · AWS SageMaker · Amazon DynamoDB · Amazon Kinesis · Amazon Web Services (AWS) · Anomaly Detection · Apache Kafka · Apollo GraphQL · BM25 · Couchbase · Datadog · Deep Learning · Docker · Elasticsearch · Embeddings

About

I’m a Senior LLM & AI Platform Engineer with 8+ years of experience, the last 6 focused on machine learning, LLMs/NLP, and GenAI. I like taking LLM ideas from “cool demo” to reliable, observable, cost-aware production services. In my first 2 years, I designed and built scalable, highly available backend and distributed systems on AWS.

Recently I’ve been:

  • Designing RAG pipelines end-to-end: ingestion, chunking, embeddings (OpenAI/Hugging Face), vector DBs (Pinecone, Weaviate, FAISS), hybrid BM25 + dense search, reranking, and prompt registries.
  • Building LLM services on Kubernetes (EKS) with Docker, FastAPI, GitHub Actions CI/CD, and full telemetry (OpenTelemetry, Prometheus/Grafana, token/latency/cost metrics).
  • Working on LLM serving & LLMOps: vLLM/TGI, quantization, KV cache, batching, routing between managed APIs (OpenAI/Anthropic) and OSS models, plus profiling (torch.profiler, py-spy, basic CUDA) to tune performance and cost.
  • Prototyping agentic systems with LangChain/LangGraph: planner + tool-using agents (RAG search, productivity tools), structured tool calls (Pydantic/JSON), traces for debugging, guardrails, and caching.

I enjoy roles where I can:

  • Own LLM platforms end-to-end: RAG, agents, serving, evaluation, and observability.
  • Improve retrieval & RAG quality (hybrid search, semantic search, evaluation harnesses: MRR, Recall@K, NDCG, LLM-as-judge).
  • Collaborate with product/infra/ML teams to build AI platforms and agentic workflows that actually move KPIs.

Open to roles focused on: LLM platforms, RAG, multi-agent systems, and AI infrastructure (serving, eval, observability).

Core stack:

  • RAG & IR: OpenAI/HF models, Pinecone/Weaviate/FAISS, BM25 + hybrid search, reranking
  • LLMOps / MLOps: Docker, K8s (EKS/ECS), GitHub Actions, Terraform, MLflow, observability, SLOs
  • Cloud/backend: AWS (S3/ECS/EKS/Lambda/VPC/IAM, SageMaker), Kafka/Kinesis, OpenSearch/Elasticsearch, Python/Java/Node-TS
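The hybrid BM25 + dense retrieval pattern mentioned above is often implemented with Reciprocal Rank Fusion (RRF), which merges two ranked lists without needing to calibrate their scores. A minimal sketch (illustrative only, not code from these projects; the function name and inputs are hypothetical):

```python
def rrf_fuse(bm25_ranking, dense_ranking, k=60):
    """Combine two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1/(k + rank) over the
    rankings it appears in; k=60 is the commonly used default.
    """
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Documents ranked highly by both retrievers rise to the top:
# rrf_fuse(["a", "b", "c"], ["b", "c", "d"]) → ["b", "c", "a", "d"]
```

In practice the fused list would then be passed to a cross-encoder reranker before prompt assembly.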

Experience

Shell

Senior Software Engineer (LLM/LLMops)

Feb 2023 – Present · 3 yrs 1 mo · United States · Remote

  • Led the design and implementation of a GenAI platform on Kubernetes (EKS) with Dockerized microservices, GitHub Actions–based CI/CD, and end-to-end observability using OpenTelemetry + Prometheus/Grafana and SLO-driven monitoring for LLM-powered features.
  • Designed and shipped retrieval-augmented generation (RAG) services: document ingestion → chunking → embeddings (OpenAI, BGE/TE3-small) → vector stores (Weaviate, FAISS) with hybrid BM25 + dense retrieval and reranking, exposed via FastAPI endpoints for internal applications.
  • Built evaluation and guardrail tooling around RAG and chat flows, including goldset-based IR benchmarks (MRR/Recall@K), citation/correctness checks, and early LLM-as-a-judge / rubric-based evaluations, with regression tests in CI so prompt, retriever, and model changes could be rolled out safely.
  • Optimized LLM usage and serving by adding prompt and embedding versioning, context-window minimization, response caching, and token/cost telemetry, and by experimenting with self-hosted open-source models using vLLM/TGI, profiling with py-spy/torch.profiler/basic CUDA and trying INT8/4 quantization and speculative decoding alongside managed APIs (OpenAI/Anthropic) to inform routing and fallback strategies.
  • Collaborated with data science teams on training and fine-tuning models (via SageMaker/PyTorch) for internal use-cases, and integrated those models into GenAI services with shared patterns for deployment, monitoring, and rollout.
  • Mentored junior engineers and led design reviews for new GenAI services, helping set patterns for RAG, observability, and safe deployments.
GenAI · Kubernetes · Docker · GitHub Actions · OpenTelemetry · Prometheus +8
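The goldset-based IR benchmarks described in this role (MRR, Recall@K) reduce to a few lines of metric code. A minimal sketch, assuming each query comes with a ranked result list and a set of known-relevant document IDs (names are illustrative):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k results."""
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids)

def mrr(queries):
    """Mean Reciprocal Rank over (ranked_ids, relevant_ids) pairs.

    Each query contributes 1/rank of its first relevant hit
    (0 if no relevant doc is retrieved at all).
    """
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, d in enumerate(ranked_ids, start=1):
            if d in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Running these metrics over a fixed goldset in CI is what makes prompt, retriever, and model changes safe to roll out: a regression shows up as a metric drop before it ships.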

Thomson Reuters

Senior Software Engineer (ML/LLM Platform)

Jan 2020 – Feb 2023 · 3 yrs 1 mo · Toronto, Ontario, Canada · Remote

  • Designed and operated a central ML/LLM serving platform on Kubernetes (EKS) with Docker, MLflow model registry, and CI/CD (GitHub Actions/Jenkins), enabling PyTorch / Hugging Face Transformers models to be deployed as autoscaled REST/gRPC services with full observability (Prometheus, Grafana, OpenTelemetry).
  • Led semantic search and early RAG-style systems over large legal/news corpora: generated embeddings with Transformers, indexed vectors in FAISS alongside Elasticsearch/BM25, and implemented hybrid dense + sparse retrieval for internal Q&A and document discovery use cases.
  • Built evaluation and experimentation frameworks (NDCG, MRR, Recall@K, LLM-as-a-judge) and Kafka/Spark-based data & batch-inference pipelines, integrating model outputs into downstream Elasticsearch/OpenSearch indices and GraphQL APIs.
Kubernetes · Docker · MLflow · GitHub Actions · Prometheus · Grafana +7
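The NDCG metric named in the evaluation framework above rewards placing highly relevant documents near the top of a ranking. A minimal sketch with graded relevance labels (illustrative, not the production framework):

```python
import math

def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k given graded relevance as a {doc_id: gain} mapping.

    DCG discounts each gain by log2(position + 1); dividing by the
    ideal DCG (best possible ordering) normalizes the score to [0, 1].
    """
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal_gains))
    return dcg / idcg if idcg else 0.0
```

Unlike Recall@K, NDCG distinguishes between a relevant document at rank 1 and the same document at rank 10, which matters for Q&A and document-discovery rankings.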

TradeRev

Software Developer

Jan 2019 – Dec 2019 · 11 mos · Greater Toronto Area, Canada · On-site

  • Migrated from monolith to microservices, improving system resilience and deployment velocity.
  • Integrated AWS Elasticsearch and Kafka to power real-time vehicle search; cut search latency ~35%.
  • Strengthened real-time data pipelines with event streaming and containerized deploys.
  • Stack: Java, Kafka, Elasticsearch, Spring Boot, Docker, AWS, Microservices
Java · Kafka · Elasticsearch · Spring Boot · Docker · AWS +2

AT&T

Software Engineer

Oct 2017 – Jan 2019 · 1 yr 3 mos · Toronto, Canada Area · On-site

  • Developed microservices using Java, Spring Boot, Docker, and Kubernetes for distributed content caching, reducing database load and improving content delivery speed for DirecTV.
  • Designed and deployed applications to maintain a distributed cache for live and on-demand content using Couchbase and MySQL on AWS, achieving a 93% reduction in database hits.
  • Built reporting tools with Couchbase MapReduce, enhancing data retrieval, analysis, and consistency across multiple sources.
  • Utilized Datadog for monitoring, ensuring performance, scalability, and reliability of microservices in a cloud environment.
Java · Spring Boot · Docker · Kubernetes · Couchbase · MySQL +3

Education

Guru Nanak Dev University

Bachelor of Technology (B.Tech.) — Computer Software Engineering
