Kayhan Babaee

CTO

Vancouver, British Columbia, Canada · 8 yrs 6 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in architecting scalable AI platforms.
  • Proven leadership in cross-functional engineering teams.
  • Strong focus on regulatory compliance in AI systems.
Stackforce AI infers this person is a Healthcare-focused AI Engineering leader with expertise in scalable systems and regulatory compliance.

Skills

Core Skills

AI Engineering · System Architecture · Engineering Management · Time Series Forecasting · Team Leadership · MLOps · System Design · Machine Learning · Natural Language Processing (NLP)

Other Skills

Agentic AI Development · Large Language Models (LLM) · Data Analysis · People Management · System Monitoring · Leadership · Hierarchical Reconciliation · Analytical Skills · Budgeting · Mentoring · Containerization · Amazon Web Services (AWS) · Test-Driven Development · API Development · Terraform

About

As Director of ML and AI Engineering at ODAIA, I lead the strategy and execution of the AI and data platform powering commercial and marketing intelligence products in life sciences. My focus is scaling AI from individual models and concepts into durable organizational capability that teams across the company can build on with confidence.
I oversee multiple engineering teams responsible for ML platforms, analytics infrastructure, and AI product enablement. We operate large-scale ingestion and feature computation pipelines, centralized feature stores, and production LLM systems that generate explainable insights from complex healthcare data. The platform is built around Kubernetes-based system design and can run across AWS, Azure, and GCP to support reliability, regulatory requirements, and workload portability. We design services as composable primitives so product teams can safely integrate predictive models, decision logic, and AI reasoning into customer-facing workflows.
A major focus of the platform is supporting agentic workloads. We build systems that coordinate multi-step analysis, investigation, and recommendation flows across structured and unstructured data, allowing AI to plan, retrieve evidence, and produce auditable outputs. This requires strong observability, execution tracing, guardrails, and deterministic fallbacks so automated reasoning remains trustworthy in regulated environments.
Recent initiatives include a production agentic AI platform that plans and executes multi-step analytical workflows over customer data, hierarchical time-series forecasting, and standardized AI service interfaces for rapid adoption across product teams. I work closely with product, data science, and executive leadership to define architectural direction and platform contracts so new capabilities can be delivered without rebuilding infrastructure. The objective is a cohesive AI ecosystem that scales product development while improving consistency and reliability.
I invest heavily in developing leaders, mentoring managers and senior engineers, and establishing clear ownership boundaries that increase autonomy while preserving accountability. I stay closely involved in architecture and system design, guiding decisions around service boundaries, data contracts, and operational maturity. While my background spans Python, Go, Rust, and SQL, my focus is designing scalable distributed systems and production ML and LLM platforms with strong observability, fault tolerance, performance isolation, and cost awareness built into the platform from the start.

Experience

Total Experience: 8 yrs 6 mos
Average Tenure: 1 yr 11 mos
Current Experience: 3 mos

ODAIA

5 roles

Director of ML & AI Engineering

Promoted

Mar 2026 - Present · 3 mos

  • Lead the AI engineering organization building the company's production agentic AI platform for life-sciences commercial teams. The system autonomously plans and executes multi-step analytical workflows over customer data, delivering finished analytical artifacts (narrated insights, charts, and formatted reports) rather than raw answers, with multi-turn context continuity.
  • Own the distributed-systems architecture underpinning the platform: a gateway–worker design with sticky-session routing over consistent hashing, autoscaling worker pools, service discovery, and full OpenTelemetry tracing. This enables predictable latency, fault isolation, and state continuity for long-running agentic conversations under production load.
  • Architected a single-tenant external deployment model where each enterprise customer receives a dedicated, isolated instance exposed as an MCP server, with client identity fixed at boot and scoped skill and data allowlists. The same platform codebase serves multiple customers under strict data-residency and regulatory requirements.
  • Designed a modular skills framework that lets the agent take bounded, auditable actions across retrieval, analysis, artifact generation, and external-system integration, with per-role and per-customer capability gating delivered via configuration rather than code releases.
  • Established a production LLM evaluation system as the governance backbone of the platform: systematic response-quality measurement, automated validation pipelines, full traceability from answer to source data, prompt versioning, and per-query cost tracking. This is a prerequisite for trustworthy agentic AI in a regulated environment.
  • Partner with product and executive leadership on commercial positioning and external-delivery readiness, aligning architecture, compliance posture, and team capacity with the sales pipeline.
Agentic AI Development · Large Language Models (LLM) · System Architecture · Machine Learning · Data Analysis · AI Engineering
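
The sticky-session routing over consistent hashing mentioned above can be illustrated with a minimal sketch. This is not the platform's actual implementation: the class name `ConsistentHashRouter`, the worker names, and the virtual-node count are all hypothetical, chosen only to show how a session ID can keep mapping to the same worker while the pool changes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the hash ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRouter:
    """Hypothetical gateway-side router: session ID -> worker name."""

    def __init__(self, workers, vnodes=64):
        self._vnodes = vnodes   # virtual nodes per worker, to smooth the load
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> worker name
        for worker in workers:
            self.add_worker(worker)

    def add_worker(self, worker):
        for i in range(self._vnodes):
            point = _hash(f"{worker}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = worker

    def remove_worker(self, worker):
        # Drop only this worker's positions; every other position is untouched.
        self._points = [p for p in self._points if self._owner[p] != worker]
        self._owner = {p: self._owner[p] for p in self._points}

    def route(self, session_id):
        if not self._points:
            raise RuntimeError("no workers registered")
        # First ring position clockwise from the session's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(session_id)) % len(self._points)
        return self._owner[self._points[idx]]


router = ConsistentHashRouter(["worker-a", "worker-b", "worker-c"])
sessions = [f"session-{i}" for i in range(200)]
before = {s: router.route(s) for s in sessions}
router.remove_worker("worker-b")
after = {s: router.route(s) for s in sessions}
```

In this sketch, removing `worker-b` only remaps the sessions that were on `worker-b`; sessions on the other workers keep their assignment, which is the property that makes the scheme suitable for long-running, stateful conversations.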

Senior Engineering Manager

Promoted

Dec 2025 - Mar 2026 · 3 mos

  • Own the next-generation time series forecasting platform, leading design of hierarchical reconciliation pipelines that align forecasts across brands, markets, and channels while running reliably at large scale with clear SLAs, observability, and cost guardrails.
  • Treated developer experience as a first-class responsibility, driving improvements in monorepo tooling, CI pipelines, internal CLIs, and documentation so that ML engineers, data scientists, and platform engineers could ship new models and LLM-powered features faster, with fewer deployment issues and much smoother onboarding.
People Management · Time Series Forecasting · System Monitoring · Leadership · Hierarchical Reconciliation · Engineering Management
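
As an illustration of what hierarchical reconciliation means, the sketch below uses the simplest approach, bottom-up aggregation: leaf-level forecasts are summed so that every higher level equals the sum of its children. The profile does not say which reconciliation method is used, so this is an assumption for illustration only; the brand → market → channel ordering, the function name `bottom_up_reconcile`, and the sample numbers are likewise hypothetical.

```python
def bottom_up_reconcile(leaf_forecasts):
    """Aggregate leaf forecasts up an assumed brand -> market -> channel hierarchy.

    leaf_forecasts maps (brand, market, channel) to a list of forecast values,
    one per horizon step. Returns a dict keyed by every prefix of each leaf key:
    () is the grand total, (brand,) the brand total, and so on down to the leaf.
    """
    reconciled = {}
    for key, series in leaf_forecasts.items():
        for depth in range(len(key) + 1):
            node = key[:depth]
            current = reconciled.get(node)
            if current is None:
                reconciled[node] = list(series)
            else:
                if len(current) != len(series):
                    raise ValueError("all leaf series must share the same horizon")
                reconciled[node] = [a + b for a, b in zip(current, series)]
    return reconciled


# Hypothetical leaf forecasts over a two-step horizon.
leaves = {
    ("brand_x", "ca", "email"): [10.0, 12.0],
    ("brand_x", "ca", "rep"): [5.0, 6.0],
    ("brand_x", "us", "email"): [20.0, 18.0],
}
result = bottom_up_reconcile(leaves)
```

Here `result[("brand_x", "ca")]` is the sum of the two Canadian channel forecasts, and `result[()]` is the total across all leaves, so forecasts at every level agree by construction. Production systems often use more elaborate reconciliation methods; bottom-up is shown only because it is the easiest to verify by hand.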

Engineering Manager

Jan 2024 - Dec 2025 · 1 yr 11 mos

  • Managed two cross-functional teams, overseeing project planning, execution, and delivery while fostering a culture of collaboration and continuous improvement to raise team productivity and drive innovation across the organization.
  • Spearheaded development and implementation of a centralized feature store as the feature generation source for ML models, ensuring complete feature traceability and significantly reducing costs by eliminating unnecessary data generation. The system enabled faster inference and rapid model iteration for the data science team.
  • Led the team in mastering and implementing Dagster on Kubernetes for workflow orchestration, substantially improving feature store pipeline reliability and maintainability while keeping the pipelines cloud-agnostic so they can be deployed end-to-end in a customer's own cloud.
  • Directed the design and deployment of Omni-Channel Attribution and Attribution Sequencing models to enhance marketing attribution accuracy. Developed algorithms to determine optimal sequences of channels that maximize specific signals throughout the system over time, leading to substantial improvements in attribution analysis precision.
  • Optimized complex, high-volume data pipelines by deploying tiered infrastructure and systematically resolving out-of-memory (OOM) and thresholding issues. Increased resource utilization for a workload of approximately 5,000 container executions per hour at peak, resulting in significant operational cost reductions.
  • Designed and deployed an insight pipeline using Large Language Models (LLMs) for time-series prediction in the pharma sector, enhancing model explainability and decision-making processes.
  • Led the strategic integration of Snowflake as a comprehensive data warehousing solution for analytics and platform teams, successfully transitioning from legacy systems.
System Architecture · Engineering Management · Analytical Skills · Budgeting · Mentoring

ML Lead

Jul 2023 - Jan 2024 · 6 mos

  • Cultivated a culture of collaboration and continuous improvement by initiating regular knowledge-sharing sessions and streamlining communication processes, resulting in faster delivery and innovation cycles.
  • Spearheaded the implementation and supervision of comprehensive testing protocols, including expanded unit and integration testing frameworks. This initiative produced more robust and reliable code and significantly raised overall code quality standards.
  • Led the development of a Pydantic-validated interface for unified configuration, dramatically improving consistency and mitigating configuration issues arising from multiple sources, resulting in enhanced system reliability and reduced debugging time.
  • Directed the strategic shift from AWS RDS dependencies to a more cost-effective data lake structure on S3, leveraging technologies like DuckDB and Polars. This move reduced operational costs and significantly improved the performance and time efficiency of critical project pipelines.
  • Implemented a SOC2-compliant image strategy using slim containers, addressed cold-start issues, and optimized deployment efficiency across the platform. Introduced dependency management with Poetry and pyenv, ensuring reproducible environments and streamlining environment setup.
  • Integrated static code analysis tools and linters (Ruff, YAPF) into the CI/CD pipeline as pre-commit steps, enforcing consistent code quality standards. Also optimized test infrastructure and environment setup so that efficient, reliable tests run in PR and pre-commit workflows.
Team Leadership · Containerization · Amazon Web Services (AWS) · Test-Driven Development · API Development · MLOps

Senior Machine Learning/MLOps Engineer

Feb 2023 - Jul 2023 · 5 mos

  • Transitioned the organization to a monorepo structure, consolidating multiple repositories into one unified system to streamline machine learning code management and deployment, significantly improving development efficiency.
  • Implemented branch-specific tests and infrastructure deployment protocols for rigorous code validation, ensuring the quality and stability of the machine learning codebase throughout the development lifecycle.
  • Applied infrastructure-as-code principles across the entire pipeline, enabling version-controlled resource configuration and efficient management of machine learning infrastructure components.
  • Designed and implemented a command-line interface (CLI) for the monorepo, automating routine tasks, substantially reducing operational time, and minimizing potential errors in machine learning operations.
  • Developed a general interface for unified configuration across all modules, ensuring consistent and validated configuration settings throughout the machine learning pipeline, significantly reducing configuration-related issues and improving overall system reliability for different clients.
MLOps · Time Series Forecasting · Terraform · Amazon ECS · System Testing · System Design

Zemply

Advisor

Jun 2024 - Jan 2026 · 1 yr 7 mos

  • Collaborated on backend system design to enhance platform functionality and user experience.
  • Integrated AI-powered photo recognition tools to streamline workflows and improve efficiency.
  • Advised on the map recommendation feature for the mobile app, enhancing user navigation and engagement.
iOS Development · Android Development · React Native · Cloudflare

Voiceflow

Machine Learning Engineer

Jan 2022 - Feb 2023 · 1 yr 1 mo · Vancouver, British Columbia, Canada

  • Led a team of data scientists and infrastructure engineers in integrating in-house Natural Language Understanding (NLU) training and inference pipelines into the main product. The solution was designed to scale for 100k Voiceflow users, delivering real-time, high-accuracy, cost-effective performance that outperformed and replaced the Microsoft LUIS API in both English and Japanese.
  • Designed and prototyped user-centered intent and entity classifier models using conditional random fields (CRFs) and pre-trained language models. Customized the models for each user based on the intent and entity data that user provided, and established an automated deployment pipeline.
  • Implemented logging and monitoring systems for the training and inference pipelines of ML models in production using Google Cloud Platform (GCP) native services and infrastructure as code (IaC). These systems captured critical events and alerted stakeholders about model performance and latency, ensuring prompt resolution of any issues.
Python (Programming Language) · PyTorch · Natural Language Processing (NLP) · NLP Libraries · Machine Learning Algorithms · Machine Learning

Cymax Group Technologies

2 roles

Senior Machine Learning Engineer/Solution Architect

Jul 2021 - Jan 2022 · 6 mos · Vancouver, British Columbia, Canada

  • Led a cross-functional team of data scientists and software developers to design, develop, test, deploy, and monitor a product-carrier damage estimation engine. Managed project timelines and resources, ensuring delivery of high-quality, scalable solutions that met business objectives and customer requirements.
  • Collaborated with stakeholders to gather requirements, design solution architecture, and develop testing and validation strategies to ensure high accuracy and reliability of the product-carrier damage estimation engine.
  • Developed and deployed an end-to-end machine learning pipeline that utilized text and image similarity algorithms to accurately identify listing errors, ultimately reducing the number of erroneous orders and improving customer satisfaction.
Azure Kubernetes Service (AKS) · Pricing Strategy · K-Nearest Neighbors (KNN) · Approximation Algorithms · Microsoft SQL Server · Machine Learning

Artificial Intelligence Engineer

May 2020 - Jul 2021 · 1 yr 2 mos · Vancouver, British Columbia, Canada

  • Implemented an ML-based pricing engine built on an attention-based architecture to drive price competitiveness and increase profit margins for the e-commerce department.
  • Designed and deployed the product similarity algorithm (retrieval and ranking) based on deep neural network (DNN) image feature extraction and approximate nearest neighbor (ANN) search on Azure Kubernetes Service (AKS).
  • Streamlined AI/ML development operations by creating and implementing codebase templates, CI/CD manifests, and deployment pipelines on AKS, resulting in reduced operational overhead and cost savings.
Python (Programming Language) · Machine Learning · Knowledge Engineering · Algorithm Development · GitOps · MLOps

Flarenet network

Research Assistant

Sep 2017 - Apr 2020 · 2 yrs 7 mos · Vancouver, Canada Area

  • Developed Python/Fortran programs to optimize and automate the calculation of Black Carbon (BC) optical properties, taking into account the complex physical properties of BC and the limited data available on its optical properties.
  • Improved the accuracy of instrument readings and reduced uncertainty by developing a Python program to calculate losses in multipart sampling lines during measurement campaigns. This was critical because particles were being lost to mechanisms such as gravitational settling and inertial deposition on the tubing walls at directional changes.
  • Measured and characterized the BC concentration emitted by oil and gas industry flares using a new, low-weight, state-of-the-art sensor.
  • Investigated the optical and morphological properties of particulate emissions, focusing on BC as a climate-forcing agent. Conducted research on flare-generated BC as a significant worldwide emission and the most critical source of BC deposition in the Arctic.
Research and Development (R&D) · Measurement Systems · Aerosol Science · Fortran · Climate Change

Education

The University of British Columbia

Master of Science - MS — Applied Science

Jan 2017 - Jan 2020

Sharif University of Technology

Bachelor’s Degree — Applied Science

Jan 2012 - Jan 2017
