Robin S.

Senior Software Engineer

Seattle, Washington, United States · 8 yrs 4 mos experience

Key Highlights

  • Expert in building scalable LLM platforms.
  • Proficient in RAG and LLMOps methodologies.
  • Strong background in cloud-based distributed systems.

Skills

Core Skills

LLMOps · RAG · MLOps · Distributed Systems · Microservices

Other Skills

AWS · AWS SageMaker · Amazon DynamoDB · Amazon Kinesis · Amazon Web Services (AWS) · Anomaly Detection · Apache Kafka · Apollo GraphQL · BM25 · Couchbase · Datadog · Deep Learning · Docker · Elasticsearch · Embeddings

About

I’m a Senior LLM & AI Platform Engineer with 8+ years of experience, the last 6 focused on machine learning, LLMs/NLP, and GenAI. I like taking LLM ideas from “cool demo” to reliable, observable, cost-aware production services. In my first 2 years, I designed and built scalable, highly available backend and distributed systems on AWS.

Recently I’ve been:

  • Designing RAG pipelines end-to-end: ingestion, chunking, embeddings (OpenAI/Hugging Face), vector DBs (Pinecone, Weaviate, FAISS), hybrid BM25 + dense search, reranking, and prompt registries.
  • Building LLM services on Kubernetes (EKS) with Docker, FastAPI, GitHub Actions CI/CD, and full telemetry (OpenTelemetry, Prometheus/Grafana, token/latency/cost metrics).
  • Working on LLM serving & LLMOps: vLLM/TGI, quantization, KV cache, batching, routing between managed APIs (OpenAI/Anthropic) and OSS models, plus profiling (torch.profiler, py-spy, basic CUDA) to tune performance and cost.
  • Prototyping agentic systems with LangChain/LangGraph: planner + tool-using agents (RAG search, productivity tools), structured tool calls (Pydantic/JSON), traces for debugging, guardrails, and caching.

I enjoy roles where I can:

  • Own LLM platforms end-to-end: RAG, agents, serving, evaluation, and observability.
  • Improve retrieval & RAG quality (hybrid search, semantic search, evaluation harnesses: MRR, Recall@K, NDCG, LLM-as-judge).
  • Collaborate with product/infra/ML teams to build AI platforms and agentic workflows that actually move KPIs.

Open to roles focused on: LLM platforms, RAG, multi-agent systems, and AI infrastructure (serving, eval, observability).

Core stack:

  • RAG & IR: OpenAI/HF models, Pinecone/Weaviate/FAISS, BM25 + hybrid search, reranking
  • LLMOps / MLOps: Docker, K8s (EKS/ECS), GitHub Actions, Terraform, MLflow, observability, SLOs
  • Cloud/backend: AWS (S3/ECS/EKS/Lambda/VPC/IAM, SageMaker), Kafka/Kinesis, OpenSearch/Elasticsearch, Python/Java/Node-TS
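The hybrid BM25 + dense retrieval pattern mentioned above is often implemented with Reciprocal Rank Fusion (RRF), which merges two ranked lists without needing to calibrate their scores. A minimal sketch (illustrative only, not code from these projects; the function name and inputs are hypothetical):

```python
def rrf_fuse(bm25_ranking, dense_ranking, k=60):
    """Combine two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1/(k + rank) over the
    rankings it appears in; k=60 is the commonly used default.
    """
    scores = {}
    for ranking in (bm25_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Documents ranked highly by both retrievers rise to the top:
# rrf_fuse(["a", "b", "c"], ["b", "c", "d"]) → ["b", "c", "a", "d"]
```

In practice the fused list would then be passed to a cross-encoder reranker before prompt assembly.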

Experience

Shell

Senior Software Engineer (LLM/LLMops)

Feb 2023 – Present · 3 yrs 1 mo · United States · Remote

  • Led the design and implementation of a GenAI platform on Kubernetes (EKS) with Dockerized microservices, GitHub Actions–based CI/CD, and end-to-end observability using OpenTelemetry + Prometheus/Grafana and SLO-driven monitoring for LLM-powered features.
  • Designed and shipped retrieval-augmented generation (RAG) services: document ingestion → chunking → embeddings (OpenAI, BGE/TE3-small) → vector stores (Weaviate, FAISS) with hybrid BM25 + dense retrieval and reranking, exposed via FastAPI endpoints for internal applications.
  • Built evaluation and guardrail tooling around RAG and chat flows, including goldset-based IR benchmarks (MRR/Recall@K), citation/correctness checks, and early LLM-as-a-judge / rubric-based evaluations, with regression tests in CI so prompt, retriever, and model changes could be rolled out safely.
  • Optimized LLM usage and serving by adding prompt and embedding versioning, context-window minimization, response caching, and token/cost telemetry, and by experimenting with self-hosted open-source models using vLLM/TGI, profiling with py-spy/torch.profiler/basic CUDA and trying INT8/4 quantization and speculative decoding alongside managed APIs (OpenAI/Anthropic) to inform routing and fallback strategies.
  • Collaborated with data science teams on training and fine-tuning models (via SageMaker/PyTorch) for internal use-cases, and integrated those models into GenAI services with shared patterns for deployment, monitoring, and rollout.
  • Mentored junior engineers and led design reviews for new GenAI services, helping set patterns for RAG, observability, and safe deployments.
GenAI · Kubernetes · Docker · GitHub Actions · OpenTelemetry · Prometheus +8
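The goldset-based IR benchmarks described in this role (MRR, Recall@K) reduce to a few lines of metric code. A minimal sketch, assuming each query comes with a ranked result list and a set of known-relevant document IDs (names are illustrative):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k results."""
    hits = sum(1 for d in ranked_ids[:k] if d in relevant_ids)
    return hits / len(relevant_ids)

def mrr(queries):
    """Mean Reciprocal Rank over (ranked_ids, relevant_ids) pairs.

    Each query contributes 1/rank of its first relevant hit
    (0 if no relevant doc is retrieved at all).
    """
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, d in enumerate(ranked_ids, start=1):
            if d in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Running these metrics over a fixed goldset in CI is what makes prompt, retriever, and model changes safe to roll out: a regression shows up as a metric drop before it ships.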

Thomson Reuters

Senior Software Engineer (ML/LLM Platform)

Jan 2020 – Feb 2023 · 3 yrs 1 mo · Toronto, Ontario, Canada · Remote

  • Designed and operated a central ML/LLM serving platform on Kubernetes (EKS) with Docker, MLflow model registry, and CI/CD (GitHub Actions/Jenkins), enabling PyTorch / Hugging Face Transformers models to be deployed as autoscaled REST/gRPC services with full observability (Prometheus, Grafana, OpenTelemetry).
  • Led semantic search and early RAG-style systems over large legal/news corpora: generated embeddings with Transformers, indexed vectors in FAISS alongside Elasticsearch/BM25, and implemented hybrid dense + sparse retrieval for internal Q&A and document discovery use cases.
  • Built evaluation and experimentation frameworks (NDCG, MRR, Recall@K, LLM-as-a-judge) and Kafka/Spark-based data & batch-inference pipelines, integrating model outputs into downstream Elasticsearch/OpenSearch indices and GraphQL APIs.
Kubernetes · Docker · MLflow · GitHub Actions · Prometheus · Grafana +7
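The NDCG metric named in the evaluation framework above rewards placing highly relevant documents near the top of a ranking. A minimal sketch with graded relevance labels (illustrative, not the production framework):

```python
import math

def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k given graded relevance as a {doc_id: gain} mapping.

    DCG discounts each gain by log2(position + 1); dividing by the
    ideal DCG (best possible ordering) normalizes the score to [0, 1].
    """
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal_gains))
    return dcg / idcg if idcg else 0.0
```

Unlike Recall@K, NDCG distinguishes between a relevant document at rank 1 and the same document at rank 10, which matters for Q&A and document-discovery rankings.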

TradeRev

Software Developer

Jan 2019 – Dec 2019 · 11 mos · Greater Toronto Area, Canada · On-site

  • Migrated from monolith to microservices, improving system resilience and deployment velocity.
  • Integrated AWS Elasticsearch and Kafka to power real-time vehicle search; cut search latency ~35%.
  • Strengthened real-time data pipelines with event streaming and containerized deploys.
  • Stack: Java, Kafka, Elasticsearch, Spring Boot, Docker, AWS, Microservices
Java · Kafka · Elasticsearch · Spring Boot · Docker · AWS +2

AT&T

Software Engineer

Oct 2017 – Jan 2019 · 1 yr 3 mos · Toronto, Canada Area · On-site

  • Developed microservices using Java, Spring Boot, Docker, and Kubernetes for distributed content caching, reducing database load and improving content delivery speed for DirecTV.
  • Designed and deployed applications to maintain a distributed cache for live and on-demand content using Couchbase and MySQL on AWS, achieving a 93% reduction in database hits.
  • Built reporting tools with Couchbase MapReduce, enhancing data retrieval, analysis, and consistency across multiple sources.
  • Utilized Datadog for monitoring, ensuring performance, scalability, and reliability of microservices in a cloud environment.
Java · Spring Boot · Docker · Kubernetes · Couchbase · MySQL +3

Education

Guru Nanak Dev University

Bachelor of Technology (B.Tech.) — Computer Software Engineering
