Tomer Gal ⚡

CTO

San Jose, California, United States · 20 yrs 6 mos experience

Key Highlights

  • Global CTO leading AI and enterprise transformation.
  • Expert in CUDA and deep learning technologies.
  • Lecturer empowering future developers in AI.
Stackforce AI infers this person is a leader in AI-driven enterprise solutions with a focus on high-performance computing.

Skills

Core Skills

Artificial Intelligence (AI) · CUDA · Natural Language Processing (NLP) · Data Science · Parallel Computing

Other Skills

Finance · Generative AI · Executive Management · NVIDIA · Director level · Management Professional · Deep Learning · Computer Vision · Docker Products · Linux · Generative Neural Networks · Business Transformation · Optimization · Profiling Tools · SIMD

About

Global CTO of the NVIDIA Alliance at Deloitte. I lead our global vision and execution at the intersection of enterprise transformation, AI and accelerated computing. My mission is to bridge cutting-edge deep learning and GPU-accelerated technologies with real-world business impact, helping organizations innovate faster, smarter, and at scale.

Previously heading Deloitte’s Deep Learning & Accelerated Computing practice, I have built and deployed advanced solutions leveraging CUDA, TensorRT, and AI model optimization, accelerating performance across industries. I also serve as NVIDIA’s lecturer for CUDA (C++/Python) and Deep Learning, empowering the next generation of developers and engineers.

Expertise includes:
  • Enterprise Transformation: Designing and implementing scalable architectures for AI-driven systems across cloud, hybrid, and on-premise infrastructures. Agentic systems development with LangChain/LangGraph, with inference on NVIDIA technologies such as NVIDIA NIM.
  • AI & Deep Learning: Design, development, and deployment of models for classification, detection, segmentation, tracking, depth estimation, and data compression.
  • GPU & HPC Optimization: CUDA development, TensorRT conversion, OpenCL, OpenCV optimization, lock-free parallel programming, and SIMD SSE/AVX.
  • Embedded Systems & Edge AI: Development across NVIDIA Jetson platforms (Nano, TX2, Xavier, Xavier NX, Orin) and heterogeneous compute environments (Xilinx Zynq, ARM, FPGA).

Experience

Total Experience: 20 yrs 6 mos
Average Tenure: 4 yrs 5 mos
Current Experience: 6 yrs 1 mo

Reichman University

Head of Artificial Intelligence

Nov 2025 - Present · 6 mos

  • Head of Artificial Intelligence for Capital Market Research
Artificial Intelligence (AI) · Finance

Deloitte

2 roles

Global Chief Technology Officer | NVIDIA Alliance

Promoted

Dec 2024 - Present · 1 yr 5 mos

  • I oversee Deloitte’s global technical alliance with NVIDIA, engaging with senior executives and engineering leaders to co-create transformative solutions, influence product strategy, and scale delivery worldwide. By bridging strategy, innovation, and execution, I ensure that Deloitte and NVIDIA together accelerate enterprise transformation across industries. My leadership spans next-generation AI, large language models (LLMs), and high-performance computing, driving impact from silicon to service & product.
Artificial Intelligence (AI) · CUDA · Generative AI · Executive Management · NVIDIA

Managing Director

Jan 2024 - Present · 2 yrs 4 mos

  • As head of the Deep Learning and Accelerated Computing practice at Deloitte, I manage the NVIDIA alliance in Israel, leading client projects and training Deloitte colleagues in AI and accelerated computing globally. I joined Deloitte through the acquisition of my company, OpTeamizer, which I founded in 2015 and which was an NVIDIA partner.
Director level · Management Professional · CUDA · Artificial Intelligence (AI)

Israeli Hi-Tech Association איגוד ההיי-טק הישראלי

Chairman of the AI Forum

Aug 2023 - Present · 2 yrs 9 mos

  • Chairman of the AI Forum at the Israeli Hi-Tech Association
Artificial Intelligence (AI) · Generative AI

Braude Academic College

University Lecturer & Faculty Member - Natural Language Processing with Deep Learning

Jun 2023 - Present · 2 yrs 11 mos

  • While Computer Vision has been a dominant force in the field of AI, NLP has recently started to take center stage as well. The ability of machines to understand, interpret, generate, and interact using human language is becoming increasingly relevant in today's technology-driven world. This is where our new course steps in.
  • The course provides a comprehensive understanding of NLP using deep learning, from the foundations of word vectors to the sophisticated applications of self-attention and transformers. The curriculum is designed to balance theory and hands-on experience, featuring practical tutorials on PyTorch and insights into recurrent neural networks and natural language generation.
  • In an era where AI is revolutionizing industries and redefining possibilities, mastering NLP is no longer a choice but a necessity. Looking forward to seeing you in class!
Artificial Intelligence (AI) · Natural Language Processing (NLP) · Deep Learning

NVIDIA

6 roles

DLI Certified Instructor - Accelerating CUDA C++ with multiple GPUs

Nov 2021 - Present · 4 yrs 6 mos

Parallel Computing

Fundamentals of Deep Learning Lecturer

Oct 2020 - Present · 5 yrs 7 mos

  • Learn deep learning techniques for a range of computer vision tasks, including training and deploying neural networks. You will learn to:
  • Implement common deep learning workflows such as Image Classification and Object Detection.
  • Experiment with data, training parameters, network structure, and other strategies to increase performance and capability.
  • Deploy your networks to start solving real-world problems.

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA Python

Apr 2019 - Present · 7 yrs 1 mo

  • NVIDIA DLI Certified Instructor for CUDA Python, teaching the course in Israel:
  • This course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs.
  • It teaches how to:
  • Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
  • Use Numba to create and launch custom CUDA kernels.
  • Apply key GPU memory management techniques.
  • Upon completion, attendees will be able to use Numba to compile and launch CUDA kernels to accelerate Python applications on NVIDIA GPUs.

DLI University Ambassador - Fundamentals of Accelerated Computing with CUDA C/C++

Oct 2018 - Present · 7 yrs 7 mos

Parallel Computing

DLI Certified Instructor - Fundamentals of Accelerated Computing with CUDA C/C++

Oct 2018 - Present · 7 yrs 7 mos

Parallel Computing

DLI Certified Instructor - Fundamentals of Deep Learning for Computer Vision

Oct 2018 - Oct 2020 · 2 yrs

NVIDIA Deep Learning Institute

DLI Certified Instructor - Deep Learning for Multiple Data Types

Apr 2020 - Present · 6 yrs 1 mo

  • NVIDIA DLI Instructor of the workshop for Deep Learning using Multiple Data Types.
  • The workshop learning objectives:
  • Implement common deep learning workflows such as image segmentation and text generation
  • Compare and contrast data types, workflows, and frameworks
  • Combine deep learning-powered computer vision and natural language processing to start solving sophisticated real-world problems that require multiple input data types
Data Science

Israel Innovation Authority רשות החדשנות

Professional Evaluator for the Israel Innovation Authority (Office of the Chief Scientist)

Jan 2020 - May 2024 · 4 yrs 4 mos

  • Professional Evaluator for the Israel Innovation Authority, Office of the Chief Scientist (OCS).
  • I perform evaluations of industry funding requests in the Computer Vision Software domain.
  • I provide the OCS committee with information, comments and reports for the evaluation of the grant applications.
Data Science

Elbit Systems Ltd

Deep Learning Lecturer

Oct 2019 - May 2024 · 4 yrs 7 mos

Data Science · Computer Vision

Braude Academic College

6 roles

Deep Learning University Lecturer

Mar 2019 - Present · 7 yrs 2 mos

  • Lecturer of the Deep Learning course at ORT Braude College, Software Engineering department.
  • The course is based on Stanford's excellent deep learning course.
  • Course content:
  • 1. Computer vision overview
  • 2. Python/numpy tutorial
  • 3. Image classification
  • 4. Convolutional Neural Networks
  • 5. Training Neural Networks
  • 6. Deep Learning Hardware and Software
  • 7. CNN Architectures (AlexNet, VGG, GoogLeNet, ResNet)
  • 8. Recurrent Neural Networks
  • 9. Practical Object Detection and Segmentation
  • 10. Visualizing and Understanding
  • 11. Video understanding
Data Science · Computer Vision

Digital Signal Processing University Lecturer

Nov 2018 - Oct 2019 · 11 mos

  • Lecturer of the Digital Signal Processing course at ORT Braude, Software Engineering department

Theory Of Compilation University Lecturer

Feb 2017 - Oct 2018 · 1 yr 8 mos

Cloud Computing University Lecturer

Oct 2016 - Oct 2018 · 2 yrs

Android Development University Lecturer

Oct 2015 - Oct 2018 · 3 yrs

  • Lecturer of Android Development course

Heterogeneous Parallel Programming Course (CUDA/OpenCL) University Lecturer

Sep 2015 - Present · 10 yrs 8 mos

  • Lecturer of Heterogeneous Parallel Programming course.
  • The course covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.
  • The programming languages of the course are OpenCL and CUDA.
Parallel Computing · Computer Vision

HFT Algorithmic Trading

[OpTeamIzer] CUDA Development and optimizations

Jul 2015 - Jan 2017 · 1 yr 6 mos

Biosense Webster

[OpTeamIzer] Consultancy - GPU Optimization Specialist

May 2015 - Oct 2015 · 5 mos

  • BioSense Innovation Team

Cmt

[OpTeamIzer] Consultancy - Image processing, GPU Optimization Specialist

Mar 2014 - Jan 2017 · 2 yrs 10 mos

OpTeamizer Ltd.

Founder and CTO

Mar 2013 - May 2024 · 11 yrs 2 mos

  • Make it right. Then, Make it fast!
  • Founder and CTO at OpTeamIzer Ltd.
  • OpTeamizer provides mentoring, consulting, and project implementation services where expertise in accelerating application performance is required, or where the architecture for such a system needs to be designed.
  • Software optimization:
  • CPU optimizations using SIMD SSE/AVX and parallel programming.
  • Use of atomic operations, lock-free data structures, etc.
  • Finding hotspots; detecting and analyzing various types of bottlenecks.
  • GPU Development:
  • Design, architect and implement new systems.
  • GPU optimizations of OpenCL code, supporting high memory bandwidth requirements and high compute efficiency.
  • Mentoring and tutoring development teams, teaching a 3-day hands-on OpenCL course.

GE Healthcare

2 roles

Software Team Leader

Feb 2013 - Feb 2015 · 2 yrs

  • Ranked each year as a Role Model
  • Leading a scrum team in an agile environment, focused on reliability and performance of the Ultrasound system
  • Surfacing, tackling and solving complex software bugs
  • Designing mechanisms for preventing software bugs or detecting them early, based on the history of bugs found so far. The most challenging bugs were related to memory corruption.
  • As a result of this effort, achieved the lowest crash rate the Ultrasound scanner had ever reached (more than a 200x improvement).
  • Defining the SW team workflow and introduction of Continuous Integration using Jenkins CI server
  • Leading the automation of stress tests, logs analysis and crash dumps, all incorporated into Jenkins

Lead Software Engineer

Nov 2009 - Jan 2013 · 3 yrs 2 mos

  • Ranked each year as a Role Model and chosen to participate in a 2-year GE Excellence program.
  • Architect and developer of a unique GPU OpenCL image reconstruction, processing data at transfer rates of 3 GB/s.
  • Parallel developer and designer in a large C++ codebase (1.5M lines of code), with more than 100 executing threads at runtime.
  • Architect and developer of a highly efficient parallel tissue processing framework which maximizes CPU cache utilization.
  • Expert at analyzing hotspots/bottlenecks using Intel VTune.
  • SIMD programming: acceleration of C++ image processing algorithms using assembly-like vector instructions (SSE/AVX).
  • Leading software projects with academic collaboration: CPU Profiler, C++11 mechanisms, Android voice recognition.
  • Mentored employees on a wide range of topics related to software engineering, efficient code, etc.
  • Awarded for improving system lifetime by 10x by locating and fixing many memory leaks that caused memory fragmentation while the system was still running as 32-bit.
  • Awarded for solving a DICOM connectivity issue in which customers experienced very long exam transfer times.

Technion - Israel Institute of Technology

Mentor, CS Industrial Project Course

Feb 2013 - Jun 2013 · 4 mos · Haifa

  • Leading 2 students in a software project for General Electric.
  • Project: C++11 Advanced Mechanisms.
  • Implementations of an image processing pipeline, lock-free data structures and the concept of hierarchical locking.

M.S.T - Medical Surgery Technologies Ltd.

[OpTeamIzer] Consultancy - Image Processing Optimization Specialist

Jan 2013 - Jan 2018 · 5 yrs

Data Science · Docker Products · Parallel Computing · Computer Vision · Generative AI · Linux · +1

Rafael Advanced Defense Systems

Lecturer - Architecture, Optimizations, Efficient Code

Mar 2012 - Mar 2012 · 0 mo

  • Lecturer in a 40-hour course, covering:
  • CPU architecture
  • Writing efficient code
  • SIMD programming (SSE)
  • Profiling using VTune
  • Parallel programming
  • Unit testing
  • Static Analysis

Technion - Israel Institute of Technology

3 roles

Mentor, CS Industrial Project Course

Feb 2012 - Jun 2012 · 4 mos · Haifa

  • Leading 2 students in a software project for General Electric.
  • Project: Android voice recognition.
  • Incorporated Android voice recognition into a mobile Android remote-control application used for controlling the GE Ultrasound system.

OpTeamIzer Consultancy - Software optimization

Jan 2012 - Jun 2012 · 5 mos · Haifa

  • Hired by the Technion, Civil Engineering department
  • Improved by 100x the performance of Flac3D software that models non-conventional wells. Without the optimization it would have been unusable due to its long run time.

Mentor, CS Industrial Project Course

Feb 2011 - Jun 2011 · 4 mos · Haifa

  • Leading 2 students in a software project for General Electric.
  • Project: CPU profiler.
  • Using the debugger API, sampled the callstack of the running process every few milliseconds and plotted its histogram for locating hotspots.
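
The sampling idea in the last bullet can be illustrated with a minimal Python sketch. This is not the project's code: the original used a debugger API on a running process, while the sketch assumes Python's `sys._current_frames()` as a stand-in for reading a call stack, and the names `sample_stacks`, `profile`, and `busy_work` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval_s, stop_event, histogram):
    """Periodically record which functions are on the target thread's call stack."""
    while not stop_event.is_set():
        # Assumption: sys._current_frames() stands in for a debugger API.
        frame = sys._current_frames().get(target_thread_id)
        while frame is not None:
            histogram[frame.f_code.co_name] += 1  # inclusive count per function
            frame = frame.f_back
        time.sleep(interval_s)


def profile(func, interval_s=0.002):
    """Run func() while a background thread samples the calling thread's stack."""
    histogram = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), interval_s, stop_event, histogram),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return histogram


def busy_work(duration_s=0.2):
    """Hypothetical hotspot: spin for a fixed wall-clock duration."""
    deadline = time.perf_counter() + duration_s
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


if __name__ == "__main__":
    counts = profile(busy_work)
    # Text histogram: functions seen most often in samples are the likely hotspots.
    for name, hits in counts.most_common(5):
        print(f"{name:20s} {'#' * min(hits, 60)} ({hits})")
```

Functions that appear in the most samples are where the program spends most of its time, which is why a histogram of sampled call stacks points at hotspots without instrumenting the code.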

Samsung

Optimization Specialist

Jan 2011 - Jan 2011 · 0 mo

  • Accelerated an existing CUDA implementation by a factor of 3

University of Haifa

3 roles

Teaching Assistant - Operating Systems

Promoted

Jan 2007 - Jan 2009 · 2 yrs · Haifa

Teaching Assistant - Assembly (x86)

Jan 2007 - Jan 2009 · 2 yrs · Haifa

Teaching Assistant - Introduction to hardware

Jan 2006 - Jan 2007 · 1 yr · Haifa

Intel Corporation

Junior Architect, Architecture Team

Jan 2005 - Dec 2008 · 3 yrs 11 mos · Israel

  • Investigated and predicted the scaling of mechanisms related to the CPU performance states and sleep states.
  • Development using C#, Java, and ASM, minimal Windows driver development, and VBA (Excel) for data analysis.

Education

Stanford University

Artificial Intelligence Graduate Program - Artificial Intelligence

Feb 2022 - Feb 2025

University of Haifa

PhD Candidate - Computer Science

Jan 2011 - Jan 2013

University of Haifa

Master's Degree - Computer Science

Jan 2006 - Jan 2009

University of Haifa

Bachelor's Degree - Computer Science

Jan 2002 - Jan 2006
