Anil Katti

Co-Founder

San Francisco, California, United States · 19 yrs 5 mos experience

AI ML Practitioner · AI Enabled

Key Highlights

Led modernization of Apple's ML inference stack.
Developed advanced on-device ML features for Apple products.
Co-founded a startup focused on robotics education.

Stackforce AI infers this person is a Machine Learning and Software Engineering expert in the Technology sector.

Skills

Core Skills

Machine Learning, Artificial Intelligence

Other Skills

Engineering Leadership, Software Engineering, Core ML, Algorithms, C, C++, Video Compression, Digital Image Processing, Parallel Algorithms, Caching, Computer Science, Operating Systems, Video Standards, JavaScript, Data Structures

About

I am passionate about solving complex engineering problems and building great products. At Apple, my north star has been making machine learning and artificial intelligence accessible to more developers on our platforms. In the past, I have worked extensively on video coding, image processing, and parallel algorithms. As an engineering leader, I drive clarity during uncertain times and keep teams motivated and focused. I promote an open culture and advocate a decision-making framework built around optimizing user experience. I have recruited top-notch engineers and built great teams within Apple. I push for excellence while prioritizing team well-being and health.

Experience

19 yrs 5 mos

Total Experience

2 yrs 11 mos

Average Tenure

1 yr 9 mos

Current Experience

South Park Commons

2 roles

Founder Fellow

Apr 2025 – Present · 1 yr 1 mo

Member

Nov 2024 – Present · 1 yr 6 mos

Uttara Labs

Co-Founder, CTO

Aug 2024 – Present · 1 yr 9 mos · San Francisco Bay Area · Remote

Apple

3 roles

Senior Engineering Manager, On Device ML

Promoted

Jan 2022 – Jul 2024 · 2 yrs 6 mos

In this role, I led initiatives to modernize and unify our inference stack around a new model format, an intermediate representation, and an ML compiler stack. This effort involved close collaboration with teams across AIML, Video Engineering, and Software Engineering to provide a solid platform and ecosystem that accelerates the journey from research to production.
My organization contributed significantly to the development of on-device infrastructure supporting Apple Intelligence features like Writing Tools, Image Playgrounds, App Intent, and Predictive Code Completion in Xcode. I was fortunate to represent the collective efforts of hundreds of talented engineers at WWDC 2024.
As the engineering leader for Core ML and its underlying frameworks at Apple, I was driven by a commitment to serving our amazing clients. We enabled advanced on-device experiences in apps like Camera, Keyboard, Siri, and other first-party services. We also supported third-party applications from companies like Adobe, Meta, and thousands more, helping them leverage Apple’s powerful hardware for on-device machine learning. Additionally, I spearheaded collaborations with Meta on the ExecuTorch Core ML integration and partnered with Hugging Face to revitalize Core ML’s open-source initiatives.

Engineering Leadership, Machine Learning, Artificial Intelligence

Engineering Manager, CoreML

Jan 2019 – Jan 2022 · 3 yrs

Led a team of engineers responsible for Core ML, Apple’s on-device inference and training framework. Shipped key public features including on-device training, model deployment, encryption, and the new model package format.

Senior Software Engineer, IMG

Sep 2015 – Jan 2019 · 3 yrs 4 mos

HLS (HTTP Live Streaming) is Apple’s media streaming technology and FPS (FairPlay Streaming) is Apple’s content protection technology. Shipped key public features in Apple’s content protection stack, including support for secure offline key management, dual expiry windows, secure key invalidation, secure stop for auditing, key preloading, encrypted media extensions on WebKit, and HDCP monitoring and enforcement. Worked on different aspects of the player stack to support features like offline playback, HEVC, and player pre-warming. Also built an HLS stream validation tool and a heads-up display for visualizing streaming performance statistics.

Cisco Systems (Scientific Atlanta)

Software Engineer

Jul 2011 – Sep 2015 · 4 yrs 2 mos · Greater Atlanta Area

Shipped key features in AnyRes, a 4Kp60 real-time HEVC encoder, including hierarchical motion estimation and input / reference picture buffer management. Architected a modular encoder design to achieve real-time performance by exploiting CTU row-level and video frame-level parallelism. Built an in-house HEVC bitstream analyzer to help with algorithm development. The tool overlaid CTU partitions, prediction units, intra modes, inter mode directions, and motion vectors on reconstructed video frames for visualizing algorithm efficiency. Devised and ran experiments to assess the subjective quality of compressed video to evaluate algorithm efficiency.

The University of Texas at Austin

2 roles

Graduate Student

Promoted

Aug 2009 – May 2011 · 1 yr 9 mos · Austin, Texas Area

Courses: Algorithms, Parallel Algorithms, Digital Image and Video Processing, Introduction to Cognitive Sciences, Distributed Computing, Operating Systems Implementation, Autonomous Robots
Major: Theoretical Computer Science

Graduate Research Assistant

Aug 2009 – May 2011 · 1 yr 9 mos · Austin, Texas Area

Worked with Prof. Vijaya Ramachandran on cache replacement strategies for multicore processors. Conducted extensive theoretical research and published the work at IPDPS 2012 (a top-tier CS conference).
Abstract:
We consider cache replacement algorithms at a shared cache in a multicore system which receives an arbitrary interleaving of requests from processes that have full knowledge about their individual request sequences. We establish tight bounds on the competitive ratio of deterministic and randomized cache replacement strategies when processes share memory blocks. Our main result for this case is a deterministic algorithm called global-maxima which is optimum up to a constant factor when processes share memory blocks. Our framework is a generalization of the application controlled caching framework in which processes access disjoint sets of memory blocks. We also present a deterministic algorithm called rr-proc-mark which exactly matches the lower bound on the competitive ratio of deterministic cache replacement algorithms when processes access disjoint sets of memory blocks. We extend our results to multiple levels of caches and prove that an exclusive cache is better than both inclusive and non-inclusive caches; this validates the experimental findings in the literature. Our results could be applied to shared caches in multicore systems in which processes work together on multithreaded computations such as the Gaussian elimination paradigm, fast Fourier transform, matrix multiplication, etc. In these computations, processes have full knowledge about their individual request sequences and can share memory blocks.

VMLogix

Software Engineer

Apr 2008 – Jul 2009 · 1 yr 3 mos · Bengaluru Area, India

Developed key features in virtual lab automation software, focusing on the implementation of the IP fencing feature and GuestAgent.

Techsouls

Co-Founder

Jul 2007 – Mar 2008 · 8 mos · Bengaluru Area, India

With a vision to make robotics education accessible in India, developed affordable robotics kits using Arduino, an open-source prototyping platform. Created robotics simulation software and course material, and conducted workshops in engineering schools.