Tomer Gal ⚡ — CTO

Global CTO of the NVIDIA Alliance at Deloitte. I lead our global vision and execution at the intersection of enterprise transformation, AI and accelerated computing. My mission is to bridge cutting-edge deep learning and GPU-accelerated technologies with real-world business impact, helping organizations innovate faster, smarter, and at scale.

Previously heading Deloitte’s Deep Learning & Accelerated Computing practice, I have built and deployed advanced solutions leveraging CUDA, TensorRT, and AI model optimization, accelerating performance across industries. I also serve as NVIDIA’s lecturer for CUDA (C++/Python) and Deep Learning, empowering the next generation of developers and engineers.

Expertise includes:
Enterprise Transformation: Designing and implementing scalable architectures for AI-driven systems across cloud, hybrid, and on-premise infrastructures. Agentic systems development with LangChain/LangGraph, with inference running on NVIDIA technologies such as NVIDIA NIM.
AI & Deep Learning: Design, development, and deployment of models for classification, detection, segmentation, tracking, depth estimation, and data compression.
GPU & HPC Optimization: CUDA development, TensorRT conversion, OpenCL, OpenCV optimization, lock-free parallel programming, and SIMD SSE/AVX.
Embedded Systems & Edge AI: Development across NVIDIA Jetson platforms (Nano, TX2, Xavier, Xavier NX, Orin) and heterogeneous compute environments (Xilinx Zynq, ARM, FPGA).

Stackforce AI infers this person is a leader in AI-driven enterprise solutions with a focus on high-performance computing.

Location: San Jose, California, United States

Experience: 20 yrs 6 mos

Skills

Artificial Intelligence (AI)
CUDA
Natural Language Processing (NLP)
Data Science
Parallel Computing

Career Highlights

Global CTO leading AI and enterprise transformation.
Expert in CUDA and deep learning technologies.
Lecturer empowering future developers in AI.

Work Experience

Reichman University

Head of Artificial Intelligence (6 mos)

Deloitte

Global Chief Technology Officer | NVIDIA Alliance (1 yr 5 mos)

Managing Director (2 yrs 4 mos)

Israeli Hi-Tech Association איגוד ההיי-טק הישראלי

Chairman of the AI Forum (2 yrs 9 mos)

Braude Academic College

University Lecturer & Faculty Member - Natural Language Processing with Deep Learning (2 yrs 11 mos)

NVIDIA

DLI Certified Instructor - Accelerating CUDA C++ with multiple GPUs (4 yrs 6 mos)

Fundamentals of Deep Learning Lecturer (5 yrs 7 mos)

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA Python (7 yrs 1 mo)

DLI University Ambassador - Fundamentals of Accelerated Computing with CUDA C/C++ (7 yrs 7 mos)

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA C/C++ (7 yrs 7 mos)

DLI Certified Instructor - Fundamentals of Deep Learning for Computer Vision (2 yrs)

NVIDIA Deep Learning Institute

DLI Certified Instructor - Deep Learning for Multiple Data Types (6 yrs 1 mo)

Israel Innovation Authority רשות החדשנות

Professional Evaluator for the Israel Innovation Authority (Office of the Chief Scientist) (4 yrs 4 mos)

Elbit Systems Ltd

Deep Learning Lecturer (4 yrs 7 mos)

Braude Academic College

Deep Learning University Lecturer (7 yrs 2 mos)

Digital Signal Processing University Lecturer (11 mos)

Theory Of Compilation University Lecturer (1 yr 8 mos)

Cloud Computing University Lecturer (2 yrs)

Android Development University Lecturer (3 yrs)

Heterogeneous Parallel Programming Course (CUDA/OpenCL) University Lecturer (10 yrs 8 mos)

HFT Algorithmic Trading

[OpTeamIzer] CUDA Development and optimizations (1 yr 6 mos)

Biosense Webster

[OpTeamIzer] Consultancy - GPU Optimization Specialist (5 mos)

CMT

[OpTeamIzer] Consultancy - Image processing, GPU Optimization Specialist (2 yrs 10 mos)

OpTeamizer Ltd.

Founder and CTO (11 yrs 2 mos)

GE Healthcare

Software Team Leader (2 yrs)

Lead Software Engineer (3 yrs 2 mos)

Technion - Israel Institute of Technology

Mentor, CS Industrial Project Course (4 mos)

M.S.T - Medical Surgery Technologies Ltd.

[OpTeamIzer] Consultancy - Image Processing Optimization Specialist (5 yrs)

Rafael Advanced Defense Systems

Lecturer - Architecture, Optimizations, Efficient Code (0 mo)

Technion - Israel Institute of Technology

Mentor, CS Industrial Project Course (4 mos)

OpTeamIzer Consultancy - Software optimization (5 mos)

Mentor, CS Industrial Project Course (4 mos)

Samsung

Optimization Specialist (0 mo)

University of Haifa

Teaching Assistant - Operating Systems (2 yrs)

Teaching Assistant - Assembly (x86) (2 yrs)

Teaching Assistant - Introduction to hardware (1 yr)

Intel Corporation

Junior Architect, Architecture Team (3 yrs 11 mos)

Education

Artificial Intelligence Graduate Program at Stanford University

PhD candidate at University of Haifa

Master's Degree at University of Haifa

Bachelor's Degree at University of Haifa

Contact

tomer.gal@opteamizer.co.il LinkedIn

Skills

Core Skills

Artificial Intelligence (AI)
CUDA
Natural Language Processing (NLP)
Data Science
Parallel Computing

Other Skills

Finance
Generative AI
Executive Management
NVIDIA
Director level
Management Professional
Deep Learning
Computer Vision
Docker Products
Linux
Generative Neural Networks
Business Transformation
Optimization
Profiling Tools
SIMD

Experience

Total Experience: 20 yrs 6 mos
Average Tenure: 4 yrs 5 mos
Current Experience: 6 yrs 1 mo

Reichman University

Head of Artificial Intelligence

Nov 2025 – Present · 6 mos

Head of Artificial Intelligence for Capital Market Research

Artificial Intelligence (AI), Finance

Deloitte

2 roles

Global Chief Technology Officer | NVIDIA Alliance

Promoted

Dec 2024 – Present · 1 yr 5 mos

I oversee Deloitte’s global technical alliance with NVIDIA, engaging with senior executives and engineering leaders to co-create transformative solutions, influence product strategy, and scale delivery worldwide. By bridging strategy, innovation, and execution, I ensure that Deloitte and NVIDIA together accelerate enterprise transformation across industries. My leadership spans next-generation AI, large language models (LLMs), and high-performance computing, driving impact from silicon to service & product.

Artificial Intelligence (AI), CUDA, Generative AI, Executive Management, NVIDIA

Managing Director

Jan 2024 – Present · 2 yrs 4 mos

As head of the Deep Learning and Accelerated Computing practice at Deloitte, I manage the NVIDIA alliance in Israel, lead client projects, and train Deloitte colleagues in AI and accelerated computing globally. I joined Deloitte through the acquisition of my company, OpTeamizer, an NVIDIA partner that I founded in 2015.

Director level, Management Professional, CUDA, Artificial Intelligence (AI)

Israeli Hi-Tech Association איגוד ההיי-טק הישראלי

Chairman of the AI Forum

Aug 2023 – Present · 2 yrs 9 mos

Chairman of the AI forum at the Israeli Hi-Tech Association

Artificial Intelligence (AI), Generative AI

Braude Academic College

University Lecturer & Faculty Member - Natural Language Processing with Deep Learning

Jun 2023 – Present · 2 yrs 11 mos

While Computer Vision has been a dominant force in the field of AI, NLP has recently started to take center stage as well. The ability of machines to understand, interpret, generate, and interact using human language is becoming increasingly relevant in today's technology-driven world. This is where our new course steps in.
The course provides a comprehensive understanding of NLP using deep learning, right from the foundations of word vectors to the sophisticated applications of self-attention and transformers. The curriculum is meticulously designed to balance theory and hands-on experience, featuring practical tutorials on PyTorch and engaging insights into recurrent neural networks and natural language generation.
In an era where AI is revolutionizing industries and redefining possibilities, mastering NLP is no longer a choice but a necessity. Looking forward to seeing you in class!

Artificial Intelligence (AI), Natural Language Processing (NLP), Deep Learning

NVIDIA

6 roles

DLI Certified Instructor - Accelerating CUDA C++ with multiple GPUs

Nov 2021 – Present · 4 yrs 6 mos

Parallel Computing

Fundamentals of Deep Learning Lecturer

Oct 2020 – Present · 5 yrs 7 mos

Learn deep learning techniques for a range of computer vision tasks, including training and deploying neural networks. You will learn to:
Implement common deep learning workflows such as Image Classification and Object Detection.
Experiment with data, training parameters, network structure, and other strategies to increase performance and capability.
Deploy your networks to start solving real-world problems.

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA Python

Apr 2019 – Present · 7 yrs 1 mo

NVIDIA DLI Certified Instructor for CUDA Python, teaching the course in Israel:
This course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs.
It teaches how to:
Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
Use Numba to create and launch custom CUDA kernels.
Apply key GPU memory management techniques.
Upon completion, attendees will be able to use Numba to compile and launch CUDA kernels to accelerate Python applications on NVIDIA GPUs.

DLI University Ambassador - Fundamentals of Accelerated Computing with CUDA C/C++

Oct 2018 – Present · 7 yrs 7 mos

Parallel Computing

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA C/C++

Oct 2018 – Present · 7 yrs 7 mos

Parallel Computing

DLI Certified Instructor - Fundamentals of Deep Learning for Computer Vision

Oct 2018 – Oct 2020 · 2 yrs

NVIDIA Deep Learning Institute

DLI Certified Instructor - Deep Learning for Multiple Data Types

Apr 2020 – Present · 6 yrs 1 mo

NVIDIA DLI Instructor of the workshop for Deep Learning using Multiple Data Types.
The workshop learning objectives:
> Implement common deep learning workflows such as image segmentation and text generation
> Compare and contrast data types, workflows, and frameworks
> Combine deep learning-powered computer vision and natural language processing to start solving sophisticated real-world problems that require multiple input data types

Data Science

Israel Innovation Authority רשות החדשנות

Professional Evaluator for the Israel Innovation Authority (Office of the Chief Scientist)

Jan 2020 – May 2024 · 4 yrs 4 mos

Professional Evaluator for the Israel Innovation Authority, Office of the Chief Scientist (OCS).
I perform evaluations of industry funding requests in the Computer Vision Software domain.
I provide the OCS committee with information, comments and reports for the evaluation of the grant applications.

Data Science

Elbit Systems Ltd

Deep Learning Lecturer

Oct 2019 – May 2024 · 4 yrs 7 mos

Data Science, Computer Vision

Braude Academic College

6 roles

Deep Learning University Lecturer

Mar 2019 – Present · 7 yrs 2 mos

Lecturer of the Deep Learning course at ORT Braude College, Software Engineering department.
The course is based on Stanford's excellent deep learning course.
Course content:
1. Computer vision overview
2. Python/numpy tutorial
3. Image classification
4. Convolutional Neural Networks
5. Training Neural Networks
6. Deep Learning Hardware and Software
7. CNN Architectures (AlexNet, VGG, GoogLeNet, ResNet)
8. Recurrent Neural Networks
9. Practical Object Detection and Segmentation
10. Visualizing and Understanding
11. Video understanding

Data Science, Computer Vision

Digital Signal Processing University Lecturer

Nov 2018 – Oct 2019 · 11 mos

Lecturer of the Digital Signal Processing course at ORT Braude, Software Engineering department

Theory Of Compilation University Lecturer

Feb 2017 – Oct 2018 · 1 yr 8 mos

Cloud Computing University Lecturer

Oct 2016 – Oct 2018 · 2 yrs

Android Development University Lecturer

Oct 2015 – Oct 2018 · 3 yrs

Lecturer of Android Development course

Heterogeneous Parallel Programming Course (CUDA/OpenCL) University Lecturer

Sep 2015 – Present · 10 yrs 8 mos

Lecturer of Heterogeneous Parallel Programming course.
The course covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.
The programming languages of the course are OpenCL and CUDA.

Parallel Computing, Computer Vision

HFT Algorithmic Trading

[OpTeamIzer] CUDA Development and optimizations

Jul 2015 – Jan 2017 · 1 yr 6 mos

Biosense Webster

[OpTeamIzer] Consultancy - GPU Optimization Specialist

May 2015 – Oct 2015 · 5 mos

BioSense Innovation Team

CMT

[OpTeamIzer] Consultancy - Image processing, GPU Optimization Specialist

Mar 2014 – Jan 2017 · 2 yrs 10 mos

OpTeamizer Ltd.

Founder and CTO

Mar 2013 – May 2024 · 11 yrs 2 mos

Make it right. Then, Make it fast!
Founder and CTO at OpTeamIzer Ltd.
OpTeamizer provides mentoring, consulting, and implementation services for projects that require expertise in accelerating application performance or in designing the architecture of such systems.
Software optimization:
CPU optimizations using SIMD SSE/AVX and parallel programming.
Use of atomic operations, lock-free data structures, etc.
Finding hotspots; detecting and analyzing various types of bottlenecks.
GPU Development:
Designing, architecting, and implementing new systems.
GPU optimizations of OpenCL code, supporting high memory bandwidth requirements and high compute efficiency.
Mentoring and tutoring development teams, including teaching a 3-day hands-on OpenCL course.

GE Healthcare

2 roles

Software Team Leader

Feb 2013 – Feb 2015 · 2 yrs

Ranked each year as a Role Model
Leading a scrum team in an agile environment, focused on reliability and performance of the Ultrasound system
Surfacing, tackling and solving complex software bugs
Designing mechanisms for preventing software bugs or detecting them early, based on the history of bugs found so far. The most challenging bugs were related to memory corruption.
As a result of this effort, achieved the lowest crash rate in the history of the Ultrasound scanner (more than x200 improvement).
Defining the SW team workflow and introduction of Continuous Integration using Jenkins CI server
Leading the automation of stress tests, logs analysis and crash dumps, all incorporated into Jenkins

Lead Software Engineer

Nov 2009 – Jan 2013 · 3 yrs 2 mos

Ranked each year as a Role Model and chosen to participate in a 2-year GE Excellence program.
Architect and developer of a unique GPU OpenCL image reconstruction, processing data at transfer rates of 3 GB/s.
Parallel developer and designer in a large C++ codebase (1.5M lines of code), with more than 100 executing threads at runtime.
Architect and developer of a highly efficient parallel tissue processing framework which maximizes CPU cache utilization
Expert at analyzing hotspots/bottlenecks using Intel VTune
SIMD Programming - Acceleration of C++ image processing algorithms using assembly-like vector instructions (SSE/AVX)
Leading software projects with academic collaboration: CPU Profiler, C++11 mechanisms, Android voice recognition.
Mentored employees on a wide range of topics related to software engineering, efficient code, etc.
Awarded for improving system lifetime by x10 by locating and fixing many memory leaks that caused memory fragmentation, at a time when the system was running on 32 bits.
Awarded for solving a DICOM connectivity issue in which customers experienced very long exam sending times.

Technion - Israel Institute of Technology

Mentor, CS Industrial Project Course

Feb 2013 – Jun 2013 · 4 mos · Haifa

Leading 2 students in a software project for General Electric.
Project: C++11 advanced Mechanisms.
Implementations of an image processing pipeline, lock-free data structures, and the concept of hierarchical locking.

M.S.T - Medical Surgery Technologies Ltd.

[OpTeamIzer] Consultancy - Image Processing Optimization Specialist

Jan 2013 – Jan 2018 · 5 yrs

Data Science, Docker Products, Parallel Computing, Computer Vision, Generative AI, Linux

Rafael Advanced Defense Systems

Lecturer - Architecture, Optimizations, Efficient Code

Mar 2012 – Mar 2012 · 0 mo

Lecturer in a 40-hour course, covering:
CPU architecture
Writing Efficient code
SIMD programming (SSE)
Profiling using VTune
Parallel programming
Unit testing
Static Analysis

Technion - Israel Institute of Technology

3 roles

Mentor, CS Industrial Project Course

Feb 2012 – Jun 2012 · 4 mos · Haifa

Leading 2 students in a software project for General Electric.
Project: Android voice recognition.
Incorporated Android voice recognition into a mobile Android remote control application used for controlling a GE Ultrasound system.

OpTeamIzer Consultancy - Software optimization

Jan 2012 – Jun 2012 · 5 mos · Haifa

Hired by the Technion, Civil Engineering department
Improved by x100 the run time of Flac3D software that models non-conventional wells. Without the optimization, it would have been unusable due to its long run time.

Mentor, CS Industrial Project Course

Feb 2011 – Jun 2011 · 4 mos · Haifa

Leading 2 students in a software project for General Electric.
Project: CPU profiler.
Using the debugger API, sampled the callstack of the running process every few milliseconds and plotted its histogram for locating hotspots.
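The sampling idea behind such a profiler can be sketched in a few lines. This is a minimal illustration only, not the project's implementation: it assumes an in-process Python sampler built on `sys._current_frames()` instead of a debugger API attached to an external process, and the names `sample_stacks`, `profile`, `busy_loop`, and `workload` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval_s, stop_event, counts):
    """Periodically record the function at the top of the target thread's stack."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval_s)


def profile(func, interval_s=0.001):
    """Run func() while a sampler thread builds a histogram of sampled functions."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval_s, stop_event, counts),
        daemon=True,
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy_loop():
    """Hypothetical hotspot: spin for about 0.2 seconds."""
    deadline = time.perf_counter() + 0.2
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def workload():
    """Hypothetical program under test; nearly all its time is spent in busy_loop."""
    busy_loop()


if __name__ == "__main__":
    histogram = profile(workload)
    # Text histogram: one bar per sampled function, longest bars are the hotspots.
    for name, hits in histogram.most_common(5):
        print(f"{name:20s} {'#' * min(hits, 60)} ({hits})")
```

Functions that appear in the most samples are where the program spends most of its time, so the tallest bar points at the hotspot. The exact sample counts vary from run to run.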

Samsung

Optimization Specialist

Jan 2011 – Jan 2011 · 0 mo

Accelerated an existing CUDA implementation by a factor of 3

University of Haifa

3 roles

Teaching Assistant - Operating Systems

Promoted

Jan 2007 – Jan 2009 · 2 yrs · Haifa

Teaching Assistant - Assembly (x86)

Jan 2007 – Jan 2009 · 2 yrs · Haifa

Teaching Assistant - Introduction to hardware

Jan 2006 – Jan 2007 · 1 yr · Haifa

Intel Corporation

Junior Architect, Architecture Team

Jan 2005 – Dec 2008 · 3 yrs 11 mos · Israel

Investigated and predicted the scaling of mechanisms related to the CPU performance states and sleep states.
Development using C#, Java, and ASM, minimal Windows driver development, and VBA (Excel) for data analysis.