Khai Tran

Software Engineer

San Mateo, California, United States · 18 yrs 3 mos experience

Most Likely To Switch · AI ML Practitioner

Key Highlights

Expert in scaling AI systems for real-time applications.
Proven track record in architecting data privacy solutions.
Strong background in distributed systems and query optimization.

Stackforce AI infers this person is a SaaS expert with a focus on AI and data infrastructure.

Skills

Core Skills

Artificial Intelligence (AI), Large Language Models (LLM)

Other Skills

Big Data, CUDA, Graphics Processing Unit, C/C++, C#, Java, Perl, SQL Server, Oracle SQL, PostgreSQL, x86 Assembly, Unix, Windows, LaTeX, Matlab

About

3-in-1 engineer: data infrastructure, AI, and distributed systems. Skilled in database internals, query optimization and execution, model inference, GPU kernel programming, AI modeling, LLM fine-tuning, and distributed systems. Strong interests in physics, neuroscience, and cell biology.

Experience

Total Experience: 18 yrs 3 mos

Average Tenure: 2 yrs 3 mos

Current Experience: 3 yrs 9 mos

LinkedIn

2 roles

Senior Staff Software Engineer

Promoted

Sep 2022 – Present · 3 yrs 9 mos · Sunnyvale, CA

Having fun scaling different inference systems for Ads Ranking models by 10x, from legacy DLRM models on TensorFlow to transformer-based models on PyTorch (see the serving system in https://arxiv.org/pdf/2602.11410) and small language models on SGLang.
Architected and built a data privacy policy enforcement system to protect LinkedIn from DMA violations (see https://arxiv.org/pdf/2502.01998).

Big Data, CUDA, Graphics Processing Unit, Artificial Intelligence (AI), Large Language Models (LLM)

Staff Software Engineer

Sep 2016 – Nov 2020 · 4 yrs 2 mos

My main focus at LinkedIn was making user data transformation code portable.
First project - building the near-realtime metrics platform
o Built the near-realtime metrics platform from scratch by auto-generating streaming code with the Apache Beam API from offline batch data transformation scripts written in Pig Latin or Hive. This was almost entirely a solo project: I investigated possible solutions, proposed the design, implemented the solution, developed a deployment system, and operated the service.
Second project - building a portable data transformation fluent API in Java that can be shared among online OLTP engines, nearline streaming engines, and offline batch engines.
Key features:
o Declarative and type-safe, similar to JOOQ ( https://www.jooq.org )
o Supports imperative logic through Java lambda functions, which are later converted into Transport UDFs
o Can be translated into any SQL dialect using Coral
o Easy to test on any type system (such as Avro GenericRecord, Spark InternalRow, …)
To provide those features, we architected the system with four components:
o Code generator to auto-generate type-safe structures from a given schema
o Core API and implementation with AST elements
o SQL compiler to translate ASTs into Calcite relational algebra plans and then target SQL using Coral
o A portable row-at-a-time engine that can be used to unit test user data transformation code or embedded in a host engine
Open source contributions:
o Apache Calcite: converting Pig Latin scripts into Calcite relational algebra https://github.com/apache/calcite/pull/1265
o Founding member and a major contributor to project Coral, a library for translating SQL among different dialects https://github.com/linkedin/coral
o Designed and contributed to project Transport UDFs, a framework for writing performant user-defined functions (UDFs) that are portable across multiple engines https://github.com/linkedin/transport

Airbnb

Staff Software Engineer

Nov 2020 – Jun 2022 · 1 yr 7 mos

Worked on the Data Infrastructure team.

Amazon Web Services

2 roles

Software Engineer

Dec 2015 – Sep 2016 · 9 mos · Palo Alto, CA

On the Redshift team, worked on Redshift query processing.

Software Engineer

Aug 2014 – Nov 2015 · 1 yr 3 mos · Palo Alto, CA

On the DynamoDB team (a NoSQL service at AWS), worked on every aspect of the DynamoDB frontend.

Oracle

Senior Member of Technical Staff

Feb 2013 – Aug 2014 · 1 yr 6 mos · Redwood City, CA

Integrated Rapid, a hardware-software co-design system targeting large-scale data management and analysis, into Oracle RDBMS.
Designed and implemented cross-engine query optimization algorithms for the Oracle query optimizer. Worked with the Oracle query optimizer and query compilation code to split queries across two execution engines (patent filed).
Implemented a platform-independent representation of query execution plans for transporting them across different query execution engines (similar to a serialization language for query execution plans).
Designed a change propagation system that synchronizes data updates from Oracle RDBMS to Rapid, and implemented a prototype for append-only insert statements.

Google

Summer intern

May 2012 – Aug 2012 · 3 mos · Mountain View, CA

Designed and implemented MapReduce-based checksum workers for computing checksums of ads tables in Google stats servers. Obtained a speedup of 120x on a cluster of 200 machines.
Designed and implemented map-only MapReduce-based expansion workers for aggregating data among different versions of ads tables.

Microsoft

Summer intern (MSR)

Jun 2011 – Aug 2011 · 2 mos · Redmond, WA

Optimized and tuned a system called Deuteronomy for faster performance.
Proposed a new threading model to avoid context-switching costs.
Improved system performance by a factor of 10.

Oracle

Summer intern

Jun 2010 – Aug 2010 · 2 mos

Analyzed the system to find the runtime bottlenecks of the hash-join operator.
Proposed solutions to eliminate the bottlenecks.

Microsoft Jim Gray Systems Lab

Research Assistant

Jan 2009 – Jan 2013 · 4 yrs · Madison, Wisconsin Area

Researched innovative concurrency control and data partitioning techniques for OLTP workloads on multicore systems:
Implemented a system in C to run simple database transactions using hardware Transactional Memory, spinlocks, and database locks for concurrency control. Worked with a hardware prototype that did not support C compilers.
Developed a new concurrency control approach for highly partitioned OLTP workloads on multicore systems. Implemented the approach in a system running TPC-C transactions without using locking.
Developed a framework for automatically partitioning OLTP databases. Obtained a good partitioning solution for TPC-E with the framework.