Cristian Orellana

AI Researcher

San Francisco, California, United States · 19 yrs experience

Highly Stable

Key Highlights

Expert in data infrastructure and efficiency optimization.
Led multiple high-impact data engineering projects.
Strong background in telemetry and privacy-aware systems.

Stackforce AI infers this person is a Data Engineering expert with a focus on infrastructure efficiency in tech-driven environments.

Skills

Core Skills

Data Engineering · Infrastructure Efficiency · System Architecture · Technical Leadership · Infrastructure Design · Process Optimization · Machine Learning

Other Skills

GPUs · compute analytics · capacity planning · resource utilization · cost attribution · hardware health · core datasets · Data Quality · efficiency · logging system · Telemetry · privacy · structured logging · technical architecture · developer efficiency

Experience

19 yrs

Total Experience

3 yrs

Average Tenure

9 mos

Current Experience

OpenAI

Data

Sep 2025 – Present · 9 mos · San Francisco Bay Area · Hybrid

GPUs & compute analytics / efficiency.
I design and build processes and datasets that help the company with capacity planning (how much and what compute we need to buy, and where to locate it), supply understanding (what we are purchasing, from whom, when it will arrive, at what price, etc.), resource utilization (the best pairing between models and chip models while minimizing idle capacity), resource tracking (what products/models are running on what machines), cost attribution (how much we spend across different dimensions) and hardware health (minimizing downtime and unhealthy compute), while working on infra and platform efficiency on the side. Fun stuff.

GPUs · compute analytics · capacity planning · resource utilization · cost attribution · hardware health · +2

Netflix

Sr. Staff Data Eng

Aug 2023 – Sep 2025 · 2 yrs 1 mo · Los Gatos, California, United States · Remote

I lead the group in charge of defining and implementing the core datasets that describe how users interact with the product.
I continue to work on Data Quality, in particular its measurement, prevention, detection and resolution, creating processes to ensure that no regressions are introduced within the life cycle of the data, from the client devices to the specialized tables that live in the warehouse.
I leverage my experience in the efficiency space to detect opportunities and build tooling that keeps our systems operating as lean as possible.
Previously,
In this position I helped define the architecture of the new logging system for Netflix.
My main duties were those of a system architect plus leadership: writing several documents describing the different parts of the architecture and how they interact with each other, aligning the different teams working on its construction, selling/presenting the project to the different leaders within the XFNs that will interact with the team, making sure the system is easy to extend and that events are quick to author and use, etc.
I also worked on efficiency-related projects for the warehouse, similar to what I did @ Meta.

core datasets · Data Quality · system architecture · efficiency · logging system · Data Engineering · +1

Meta

4 roles

Senior Technical Lead Manager

Promoted

Aug 2022 – Aug 2023 · 1 yr

I lead the Telemetry Area for Reality Labs, making sure the information captured by the devices is privacy aware, performant, structured (as in, with a clear declaration of what is being logged), standardized and high quality.
I also lead the Technical Architecture Committee for Reality Labs. The main objective of this group is to ensure that our technical approaches across the org are standardized, technically high quality, and make use of already established infrastructure (i.e. do not reinvent the wheel).
The position requires heavy involvement with different XFNs from the Data Infra teams and driving alignment with the different SWE teams across hardware, software and legal verticals.
In this position, I also work in the Developer Efficiency area, where my team creates the data infrastructure that supports its operational and analytical needs. Developer efficiency is about making sure the software written by the teams in Reality Labs is high quality, bugs are addressed diligently, and the right teams are held accountable for test errors, test coverage, bug resolve time, etc.

Telemetry · privacy · structured logging · technical architecture · developer efficiency · Data Engineering · +1

Senior Staff Data Engineer

Mar 2022 – Nov 2022 · 8 mos

I am the lead designer and tech lead for the next-gen logging infrastructure/ecosystem of Reality Labs, building Telemetry, Privacy, Structured Logging, Metadata, Differential Privacy, De-identification and Policy Enforcement @ Device & Warehouse level.
Creator of the data infra that powers the Integrity ML models @ News Feed (sampling + labeling + prevalence estimation + feature stores for ML models).
Designer of core datasets for Facebook Video Analytics (encoding, engagement, sessions).
Team-agnostic work:
Company-wide expert for Warehouse Efficiency. Designer and team lead for table structure auto-tuning and bucket pruning; collaborator on Z-ORDER, range partitioning, ORC storage layout optimization, intermediate dataset materialization, Spark hyper-parameter tuning, an efficiency auto-recommendation system and developer efficiency tools. This efficiency work saved the company millions of dollars.
Author of several efficiency-related articles within the company, leveraged by thousands of engineers and DSs.
Public speaker @ different DE conferences.

logging infrastructure · Telemetry · Privacy · Differential privacy · Policy enforcement · Data Engineering · +1

Staff Data Engineer

Mar 2020 – Mar 2022 · 2 yrs

Most of my work revolved around the efficiency and optimization of the largest processes running in Facebook's warehouse.
I wanted to scale this effort, so I wrote several notes that became very popular among the DE and SWE community of the company, and I started working with different infra teams on the implementation of new systems and processes aimed at detecting and preventing the over-utilization of resources.
Notable projects that I led, designed and served as tech lead for:
Auto-partitioning system: a tool that finds the best partitioning and bucketing scheme for each table, based on how the queries of downstream pipelines access it.
Since some of the tables are multi-petabyte with thousands of downstream consumers, the savings have been in the millions of USD, mostly from CPU savings, with the added benefit of improved wall time and data arriving much earlier to dashboards and reports.
LORAN: a system that efficiently samples datasets @ scale. Since most longitudinal analyses don't require all the data, why not store it sampled in a way that respects privacy? This system is still powering the core datasets (sampled) at Meta Videos.
Auto-skew detector & remover: a system that detects skew and removes it. It shines at destroying the hardest incarnations of skewness, like multi-skewed joins (when both tables have skew in different columns) or skewness due to NULLs or default values.
Query Parsers: extract info about the ways tables and their columns are used across the warehouse. This is a very efficient system that processes about 5M rows per day.
Join table recommender: an ML model that recommends better ways of joining two or more tables, based on historical info from previous queries. My part here was providing the data needed by the team that created the ML model.
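
As a rough illustration of the skew-detection idea described above (this is not the actual system, whose internals are not described here), a minimal Python sketch could flag join-key values that hold a disproportionate share of rows. The function name `detect_skew` and the 20% threshold are assumptions made only for this example; NULL keys are represented as `None`.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


def detect_skew(
    keys: Iterable[Optional[Hashable]],
    threshold: float = 0.2,  # hypothetical cutoff: share of rows for one key
) -> List[Tuple[Optional[Hashable], float]]:
    """Return (key, share) pairs for keys holding more than `threshold` of rows.

    Results are ordered from the most frequent key to the least frequent.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (key, count / total)
        for key, count in counts.most_common()
        if count / total > threshold
    ]


# Example: NULL keys (None) dominate the join column.
sample_keys = [None] * 6 + ["a"] * 2 + ["b", "c"]
skewed = detect_skew(sample_keys)
print(skewed)
```

Keys flagged this way (often NULLs or default values) are the usual candidates for separate handling, for example by processing them apart from the main join.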

efficiency · optimization · auto-partitioning · sampling · query parsers · Data Engineering · +1

Senior Data Engineer

Nov 2016 – Mar 2020 · 3 yrs 4 mos

First DE in Feed Integrity and creator of the data infra that powers the Integrity ML models @ News Feed. I designed and implemented the systems that sample the content (videos, posts, photos, comments, etc.) based on different methodologies. I also designed and implemented the systems that send the samples for rater evaluation, consolidate the ratings, create prevalence estimations and prepare the data for ML consumption. All of this for dozens of countries and languages.
I worked on defining ways to capture the data that allowed the company to measure different categories of problems (Misinformation, Content Quality, URL Quality, Polarization, Hateful Speech, Ad Farms and Withholding) and to prepare that data so it could be used in the creation of ML models.
I also worked on the creation of an internal system that, for any given piece of content, generates different 'goodness' scores in real time, which are later used to rank the content in Feed.
In Stories, I created the datasets that power the Performance, Quality and Efficiency efforts for the product and expanded them to other features within Feed.

data infrastructure · sampling methodologies · ML models · content evaluation · Data Engineering · Machine Learning