Divya Sirala

CTO

Gurugram, Haryana, India · 3 yrs 8 mos experience
Most Likely To Switch · AI ML Practitioner

Key Highlights

  • Expert in building production-grade LLM systems.
  • Proficient in RAG architecture and benchmarking.
  • Strong leadership in AI/ML project execution.
Stackforce AI infers this person is a GenAI and AI Benchmarking specialist in the AI industry.

Skills

Core Skills

GenAI · Agentic AI Development · RAG Architecture · Software Development

Other Skills

LangGraph · Scrum · Retrieval-Augmented Generation (RAG) · Artificial Intelligence (AI) · Prompt Design · Confluence · Problem Solving · Team Leadership · Team Management · Agile Project Management · Prompt Evaluation · LoRA · Multimodal AI · Hugging Face Transformers · OpenAI API

About

GenAI / Agentic AI Engineer with 4+ years of experience building production-grade LLM systems focused on reliability, benchmarking, and RAG architecture. I design evaluation-driven GenAI systems using LangGraph for multi-agent orchestration and LangChain for structured RAG pipelines, where prompts are versioned, agents are testable, and model performance is measured through automated benchmarking frameworks (TerminalBench-style). My work emphasizes reasoning evaluation, regression testing, determinism, and failure-mode analysis. I build enterprise-grade RAG systems with optimized chunking, retrieval quality tuning, grounding strategies, and hallucination control. Seeking senior GenAI / Agentic AI roles where production rigor, observability, and reliability are core system requirements.

Experience

Total Experience: 3 yrs 8 mos
Average Tenure: 1 yr 2 mos
Current Experience: 1 yr 4 mos

Turing

2 roles

AI/ML Pod Lead

Promoted

Sep 2025 - Present · 9 mos · Remote

  • TBench 1.0: Prompt Evaluation & Observability
  • Architected a LangSmith-style prompt evaluation pipeline, treating prompts as versioned, testable artifacts with controlled execution (frozen system prompts, low temperature, fixed tools). Led a pod of 6 engineers to curate task suites designed to expose reasoning failures, instruction-following gaps, and robustness issues. Evaluated GPT-5 and Claude Sonnet using traced runs and task-level scoring to identify prompt regressions, model weaknesses, and reliability tradeoffs, enabling data-driven prompt and model selection.
LangGraph · Scrum · GenAI · Agentic AI Development

AI/ML Engineer - GenAI & RAG - AI Benchmarking - Prompt Engineering

Jan 2025 - Present · 1 yr 5 mos · Remote

  • TBench 2.0: Log-Driven Model Evaluation & Reliability
  • Designed prompts to stress-test and break model behavior across complex reasoning and edge cases. Built a log-driven model evaluation pipeline benchmarking GPT-Codex and Claude Sonnet against golden expectations. Analyzed execution logs to compare correctness, consistency, error handling, and failure patterns, surfacing subtle behavioral differences not visible in single-run testing and aligning with production-grade LLMOps practices.
  • Linux Environment LLM Benchmarking
  • Designed task prompts providing sufficient operational context for LLMs to solve terminal-based tasks in a Linux environment using only an initial instruction. Implemented a log-based evaluation workflow benchmarking GPT-5 and Claude against golden reference solutions. Evaluated task completion, reasoning fidelity, and failure recovery in non-chat, constrained execution environments to assess model suitability for system- and tool-oriented workloads.
  • RLHF, SFT & Chain-of-Thought Benchmarking
  • Worked on RLHF-, SFT-, and Chain-of-Thought–based benchmarking tasks to evaluate alignment, instruction adherence, and reasoning stability. Compared pre- and post-alignment behavior, focusing on reducing silent failures, improving consistency, and stabilizing long-context reasoning. Applied findings to guide prompt design and evaluation strategies in applied GenAI systems.
Retrieval-Augmented Generation (RAG) · Artificial Intelligence (AI) · GenAI · RAG Architecture

Qbs learning

Data Scientist - GenAI & Agentic AI

Sep 2024 - Jan 2025 · 4 mos · Noida, Uttar Pradesh, India

  • Built and evaluated GenAI-powered solutions with agentic AI workflows for automation and intelligent decision-making.
  • Developed LLM-based prototypes (RAG, conversational AI) for education and training applications.
  • Collaborated with cross-functional teams to transform research concepts into scalable data science solutions.
Retrieval-Augmented Generation (RAG) · Artificial Intelligence (AI) · GenAI · Agentic AI Development

Outlier

AI Trainer - Prompt Engineering & LLM Optimization - AI Benchmarking

Jan 2023 - Jan 2025 · 2 yrs · India · Remote

  • Trained and optimized LLMs using SFT, RLHF, CoT, and multimodal prompting.
  • Designed and reviewed datasets to enhance reasoning, accuracy, and alignment.
Artificial Intelligence (AI) · Prompt Design · GenAI · Agentic AI Development

Aristocrat Technologies | EMEA

Software Developer

Sep 2022 - Aug 2024 · 1 yr 11 mos · Gurugram, Haryana, India

  • Developed and optimized C++ applications with focus on performance, multithreading, and debugging (GDB).
  • Contributed to game development systems on Linux, ensuring high reliability and scalability.
  • Collaborated in an Agile/SCRUM environment, streamlining development workflows with Git, JIRA, and CI/CD tools.
Confluence · Problem Solving · Software Development

Education

NIT Jalandhar

Master of Technology (MTech), Control and Instrumentation

Jul 2020 - Jul 2022

Bipin Chandra Tripathi Kumaon Engineering College

Bachelor of Technology (BTech), Electronics and Communications Engineering

Jan 2014 - Jan 2018