Ben Burtenshaw

Head of Design

Antwerp, Flemish Region, Belgium · 13 yrs 7 mos experience

AI Enabled · AI ML Practitioner

Key Highlights

Expert in developing educational AI courses.
Strong background in Natural Language Processing and Machine Learning.
Proven track record in community engagement and advocacy.

Stackforce AI infers this person is a Machine Learning Educator and Advocate in the AI and Data Science industry.

Skills

Core Skills

Natural Language Processing (NLP), Machine Learning, MLOps, Data Engineering, Data Quality, University Teaching, Project Management, Data Governance, Python (Programming Language), Communication

Other Skills

Cloud Infrastructure, Analytical Skills, Presentation Skills, Data Structures, Data Strategies, Annotation, Design Thinking, Quantitative Research, Data Visualization, Data Science, Artificial Intelligence (AI), SQL, JavaScript, PyTorch, Pandas (Software)

About

I contribute to the developer advocacy team by supporting the community in building AI models and applications. I help develop open-source learning communities and educational AI courses, including initiatives like the MCP Course and the LLM Course.

Experience

Total Experience: 13 yrs 7 mos

Average Tenure: 1 yr 10 mos

Current Experience: 1 yr 3 mos

Hugging Face

3 roles

Community Education

May 2025 – Present · 1 yr · Remote

Help to grow open-source learning communities on the Hugging Face Hub through educational AI courses. This year I've worked on these courses:
The MCP Course https://huggingface.co/mcp-course
The LLM Course https://huggingface.co/huggingface-course
The Reasoning Course https://huggingface.co/reasoning-course
The Agents Course https://huggingface.co/agents-course
The smol course https://github.com/huggingface/smol-course

Cloud Infrastructure, Project Management, Analytical Skills, Presentation Skills, Data Governance, Data Structures, +26 more

Machine Learning Advocacy Engineer

Feb 2025 – Present · 1 yr 3 mos · Remote

Advocacy Team! I work as an MLE within the developer advocacy team at Hugging Face. I focus on helping the community to build AI models and applications.

Machine Learning Engineer

Jun 2024 – Mar 2025 · 9 mos · Remote

Argilla Team! We focus on building and sharing datasets, and helping others to do the same.
We do this by:
sharing tools for data curation https://docs.argilla.io/latest/
sharing tools for synthetic data generation https://distilabel.argilla.io/latest/
organising community projects to build datasets https://huggingface.co/data-is-better-together
sharing learning material https://huggingface.co/blog/burtenshaw/ui-finetune-llm-autotrain

Uplimit

3 roles

Instructor for Synthetic Datasets

Nov 2024 – Mar 2025 · 4 mos · Remote

This course provides an introduction to synthetic data generation techniques for fine-tuning AI models, with a focus on Large Language Models (LLMs). You'll learn how to create high-quality synthetic datasets that can be used to improve the performance and capabilities of pre-trained AI models. The course covers a range of data generation methods for various task types, including text classification, Supervised Fine-Tuning (SFT), retrieval, reranking, and Preference Tuning. You'll gain hands-on experience in generating synthetic data, and leveraging LLMs as judges for quality assessment or labelling data. Additionally, the course explores potential challenges and considerations when using synthetic data in AI development, including ensuring data diversity, maintaining data quality, addressing the lack of human involvement, and navigating restrictions in model licenses.

Instructor for Finetuning Open Source LLMs

Jan 2024 – Nov 2024 · 10 mos · Remote

This course provides a practical introduction to finetuning open-source large language models (LLMs) and focuses on applying models to custom and domain-specific use cases. The course is intended for data scientists, machine learning engineers, and AI researchers aiming to excel in fine-tuning open-source LLMs tailored for products and applications. We'll delve into training techniques including direct preference optimization (DPO), supervised fine-tuning (SFT), and odds ratio preference optimization (ORPO).

Instructor for Productizing Open Source LLMs

Jul 2023 – Jan 2024 · 6 mos · Remote

The Productizing Open Source LLMs course is designed to provide a deep understanding of open-source LLM projects, coupled with hands-on experience in building robust applications. Week one explores the essentials of LLM applications and a comparative analysis of key open-source models. The second week focuses on evaluating and enhancing LLM outputs, addressing challenges, and customizing applications for specific needs. The final week is dedicated to safeguarding your applications, ensuring operational integrity and safe interactions. Participants will leave with a deep understanding of how to navigate the open-source landscape and build secure, high-quality LLM applications!

Argilla

Machine Learning Engineer

Feb 2024 – Jan 2025 · 11 mos · Remote

Argilla was acquired by Hugging Face in June 2024.
I joined Argilla as a Machine Learning Engineer to help share open datasets for AI, and to build tools like Argilla and distilabel. Argilla is a human feedback tool for AI datasets. It lets AI engineers and domain experts collaborate on a problem so that they can build high-quality datasets around it. I worked on refactoring and simplifying the Python SDK of Argilla to make it easier to use at scale.
distilabel is a Python framework for generating synthetic datasets. Its best features are that it includes research-backed prompting strategies for synthetic data generation, and that it has integrations for LLM providers. I contributed to distilabel, wrote documentation, led a community project using distilabel (https://huggingface.co/blog/burtenshaw/domain-specific-datasets), and worked on developer advocacy in the emerging space of synthetic datasets.

Faktion.ai

Senior Machine Learning Engineer

May 2022 – Apr 2024 · 1 yr 11 mos · Antwerp, Flemish Region, Belgium

Trained and deployed a multimodal ML service for product categorisation that uses text, image, and structured data.
Designed and implemented an MLOps platform for a document tagging service using Feast Feature Store, Kubeflow Pipelines, Seldon Core, and Weights and Biases metric logging.
Built a quality control API using Seldon Core and Alibi to explain model predictions and feature reliability.
Project-managed 3 machine learning and NLP projects in parallel with 4 colleagues over 6 months, using an agile methodology and integrating with one political institution and one scale-up.
Implemented two feasibility studies on state-of-the-art text generation and text quality analysis using transformer models. Published the feasibility studies as product demos in Streamlit and FastAPI, as well as report and market study documents.

Project Management, Data Quality, Natural Language Processing (NLP), MLOps, Data Governance, Data Engineering, +5 more

University of Groningen

Lecturer

Mar 2022 – Dec 2022 · 9 mos · Groningen, Netherlands

Teaching Annotation for Machine Learning to students of BSc Information Science.

University Teaching, Annotation, Machine Learning, Presentation Skills, Data Structures, Analytical Skills

Uman

NLP Engineer

Aug 2021 – May 2022 · 9 mos · Ghent, Flemish Region, Belgium

Fresh out of my PhD, I was tasked with building a number of experimental and exciting features: document layout parsing using Transformer models, named entity extraction, and knowledge base construction. These features exposed me to industrial tools and approaches like ONNX quantization, Kubernetes clusters, BigQuery data warehouses, and gRPC microservices. A real contrast to the academic world, and one I learnt a lot from in a short space of time.

Natural Language Processing (NLP), Data Engineering, Presentation Skills, Data Structures, Cloud Infrastructure

University of Antwerp

Course Coordinator: Python Programming Bootcamp

Mar 2020 – Nov 2021 · 1 yr 8 mos · Antwerp Area, Belgium

Tutored Python skills to 20 students from a humanities background, studying MA Digital Text Analysis.
Learnt to communicate industry concepts and terminology. Developed a bespoke Python for Digital Text and Data Analysis curriculum with colleagues.
Administered and organised an intensive one-month study program.

Python (Programming Language), Communication, Presentation Skills, Data Structures

Research Foundation Flanders - FWO

NLP Researcher

Jan 2018 – Feb 2022 · 4 yrs 1 mo · Brussels Area, Belgium

Researched language technology and artificial intelligence in healthcare, through conversation analytics on deep learning models.
Communicated results to scientific review board through presentation and formal written reports.
Built and maintained 2 SQL databases for evaluating machine learning pipelines, in Dutch and English.
Developed interactive dashboards for language insights from data science models using BI tools.

Project Management, Data Governance, Presentation Skills, Data Structures, Analytical Skills

Self-employed

Freelance Software Engineer

Aug 2011 – Oct 2016 · 5 yrs 2 mos · London, England, United Kingdom

Developed and designed custom CMS websites for creatives, in collaboration with artists, designers, musicians, and researchers.
Programmed in Python and PHP to adapt interactive back-end solutions, and in HTML and JavaScript to create front-end interfaces.