Marie Stephen Leo

CEO

Singapore, Singapore, Singapore · 15 yrs 7 mos experience
AI ML Practitioner · AI Enabled

Key Highlights

  • Pioneered Sephora's first Generative AI chatbot.
  • Led a data team that doubled in size.
  • Achieved top writer status in AI on Medium.
Stackforce AI infers this person is a Data Science and AI leader in the Retail industry.

Skills

Core Skills

Data Science · MLOps · Generative AI

Other Skills

Data Engineering · Product Analytics · AWS · Google Cloud · DBT · Kafka · Airflow · Spark · Kubernetes · SQL · FastAPI · Azure OpenAI · Langchain · Transformers4Rec · AWS Personalize

About

I post regularly on Data & AI topics including MLOps and Generative AI. If you like my posts, let's connect here on LinkedIn or on Twitter @MarieStephenLeo!

  • I lead the Data organization covering Data Engineering, Data Science, MLOps, and Product Analytics for Sephora SEA, ANZ and India.
  • I've pioneered generative AI technologies at Sephora, building the world's first customer-facing virtual beauty advisor, which is currently deployed to Sephora mobile app customers in Singapore. I've also been instrumental in developing hyper-personalized session-based recommendation engines for several double-digit percent uplifts.
  • Under my leadership, the team has more than doubled, developing a comprehensive Lambda data platform with medallion ELT architecture on Google Cloud, DBT, and Kafka.
  • I have 15+ years of experience in generating measurable value from data assets while managing teams of Data Scientists & Engineers.
  • I achieved Top Writer in Artificial Intelligence on Medium with 1000+ followers.
  • I'm a LinkedIn Top Voice with 15K+ followers.
  • I've published in ACL (Association for Computational Linguistics, Jul 2020) and have one IP (IPCOM000266868D, Aug 2021).

Some of my technical work:

  • My Medium Blog: https://stephen-leo.medium.com
  • My talk at Quantum Black Meetup on Generative AI for production customer-facing applications: https://www.slideshare.net/StephenLeo7/from-lab-to-life-lessons-from-developing-and-deploying-real-world-llm-applications-8f6b
  • My talk at Quantum Black Meetup on Weak Supervision: https://www.slideshare.net/StephenLeo7/weak-supervisionpdf
  • My open-source GitHub repositories:
      -- 300 🌟 on GitHub: https://github.com/stephenleo/cship
      -- 150+ 🌟 on GitHub: https://github.com/stephenleo/llm-structured-output-benchmarks
      -- 80+ 🌟 on GitHub: https://github.com/stephenleo/stripnet

Please take a look at my LinkedIn Recommendations below to see what others are saying about me!

Experience

Total Experience: 15 yrs 7 mos
Average Tenure: 3 yrs 11 mos
Current Experience: 3 yrs 8 mos

Sephora

Data Director, APAC

Sep 2022 - Present · 3 yrs 8 mos · Singapore

  • I lead the Data organization covering Data Engineering, Data Science, MLOps, and Product Analytics for Sephora SEA, ANZ and India.
  • Under my leadership, the data team has:
      1. Developed several batch and real-time ELT data pipelines following a lambda data architecture with Kimball dimensional modeling and medallion architecture, leveraging Airflow, DBT, Kafka, Spark, and Kubernetes.
      2. Built the MLOps platform and deployed Sephora's first real-time session-based recommendation engine with Transformers4Rec, AWS Personalize, and NVIDIA Triton inference for double-digit percent metrics improvement.
      3. Developed LVMH group's first customer-facing Generative AI chatbot: Sephora Beauty Bae, a virtual beauty advisor, by implementing Retrieval Augmented Generation with Azure OpenAI, Google's Gemini, and Langchain as a FastAPI microservice.
      4. Deployed widely successful SQL-based self-service real-time data ingestion and reverse-ETL tools with Terraform and a host of Google Cloud and open-source technologies.
      5. Led cloud FinOps monitoring and control, collaborating with multiple BUs to optimize cloud usage, reducing raw cloud costs by a double-digit percentage.
      6. Migrated GTM from client-side to server-side, reducing main-thread blocking time by several percent.
      7. More than doubled in headcount and absorbed additional portfolios.
      8. Achieved engagement survey scores higher than the company average.
      9. Instituted monthly technical sharing sessions to share the wonders of modern data & AI technologies with a wider audience within Sephora.
      10. Contributed small updates to various open-source projects like LiteLLM, Kubeflow Pipelines, etc.
  • I'm also actively engaged in presenting ML topics in several forums:
      1. Presented to data professionals across LVMH Maisons at the LVMH Data/AI summit in Paris.
      2. Presented to the entire Sephora SEA, Oceania, and Korea audience in the Sephora learning week.
      3. Gave a talk for the Generative AI fast-track e-learning course for all LVMH employees.
Generative AI · Data Engineering · Data Science · MLOps · Product Analytics · AWS · +7

General Assembly

Data Science Instructor (Part Time)

Nov 2021 - Dec 2022 · 1 yr 1 mo · Singapore

  • I co-instructed a part-time Data Science immersive bootcamp, mainly covering topics related to NLP, Deep Learning, and Big Data processing.
  • I created new lessons on MLOps in the cloud using MLflow, Flask, Docker, and Google Cloud Run for serverless deployment of ML models, with Streamlit on Streamlit Cloud as a UI that gathers user inputs, posts requests to the API, and displays the model predictions the API returns.
Machine Learning · NLP · Deep Learning · MLOps · Flask · Docker · +2

Edelman Data & Intelligence (DXI)

Director of Data Science APAC

Jun 2021 - Sep 2022 · 1 yr 3 mos · Singapore

  • I manage a team of Data Scientists to research, architect, develop, and scale cutting-edge Machine Learning products from concept to production using cloud technologies. Under my leadership at Edelman, the team has grown by 3X and covers the entire end-to-end product development lifecycle.
  • Personally, I'm a hands-on leader who loves to code! I have significant expertise in the full stack of Data Science product development, such as:
      1. ETL pipelines for near real-time as well as batch big data processing.
      2. Deep Learning research, model development, tuning, and deployment for both real-time and batch processing in AWS and GCP.
      3. Serverless microservices-based backend development using Docker containers and cloud technologies such as Lambda, API Gateway, Cloud Run, etc.
      4. Frontend development using Plotly Dash, Bootstrap, CSS, and Streamlit.
      5. MLOps using AWS SageMaker, GCP Vertex AI, and other technologies.
  • I'm also actively engaged in presenting ML products and services to clients. I love explaining complex ML concepts in simple and easy-to-understand language with lots of visuals!
Machine Learning · Deep Learning · MLOps · AWS · GCP · Docker · +2

Tokopedia

Data Science Lead

Nov 2019 - Jun 2021 · 1 yr 7 mos · Singapore

  • Responsible for architecting and scaling various ML/DL solutions using Google Cloud Platform tools such as AI Platform notebooks, training jobs, model endpoints, Compute Engine, BigQuery, GCS, Dataflow, and Dataproc.
  • Developed an ultra-scalable product matching and anomaly detection service using Sentence Transformers NLP and distributed ANN technologies including Elasticsearch, Milvus, and GCP ScaNN.
  • Architected and deployed RESTful microservices using Docker, GCP Compute Engine, and Seldon service on GKE.
  • Working-level knowledge of various Agile software development tools (Jira, Confluence, GitHub), CI/CD (TravisCI, Jenkins), and DevOps (Terraform, Ansible).
  • Approximate Nearest Neighbors (ANN):
      • Achieved >85% recall at 98% blocking within 500 ms for 50 million product pairs, and <20 ms latency for K=100 nearest-neighbor queries across 30 million unique products.
      • Architecting and developing an enterprise-level "Embedding Feature Store" with 10 ms vector retrieval and 20 ms ANN search across multiple image, text, and clickstream domains.
      • Contributed Open Distro benchmarking to the ANN-Benchmarks open-source GitHub repository.
      • Wrote multiple Medium posts on ANN topics with ~30K total views and >30 stars on GitHub.
  • Natural Language Processing (NLP):
      • Experienced in various SoTA NLP techniques and frameworks including FastText and BERT.
      • Trained large-scale deep learning neural networks in distributed TensorFlow for name-gender prediction on 40 million names (1.5% CTR improvement) and for product review analytics on 200 million reviews and Q&A (0.32% CVR improvement).
      • Co-developed an in-house Python package for natural language pre-processing of Bahasa Indonesia and English text, which is widely adopted within the internal DS community.
      • Co-authored a paper that was accepted to the international ACL 2020 conference in Seattle.
      • Top 10 code contributor to the DS GitHub repository.
Machine Learning · NLP · Google Cloud · Elasticsearch · Docker · Data Science

Micron Technology

4 roles

Data Science Manager

Promoted

Sep 2017 - Oct 2019 · 2 yrs 1 mo

  • Managed a team of data engineers and data scientists to develop production full-stack engineering solutions including ETL, analytics, and visualization in the Hadoop ecosystem, SQL, Teradata, Tableau, and Web.
  • Developed showcase projects that enabled Micron to be one of Singapore's first factories to be recognized by the prestigious World Economic Forum Lighthouse Network for leadership in I4.0 technology.
  • Engaged cross-functional business stakeholders to scope and propose data-driven projects for both operational efficiency improvement and advanced A.I. analytics.
  • Owned, prioritized, planned, and executed a portfolio of data science, data engineering, BI, and big data projects.
  • Won the prestigious "Idea of the Quarter" award for an innovative algorithm that reduces Type I errors in A/B testing data analysis by 90% and results in 80% faster decision making.
  • Led several Tableau dashboarding projects to monitor and drill down into the root causes of issues, saving >$100M in 2018.
  • Led the investigation and optimization of an HDFS, HBase, Hive, Spark, and Python data pipeline that successfully reduced the missing data rate from 90% to 2% within four weeks. Implemented additional logging and built telemetry dashboards to monitor data quality continuously.
  • Led a Deep Learning-based image similarity project that mimics human pattern recognition to accurately catch process root-cause defects responsible for 30-40% yield loss issues, using Python, bash, HBase, and Spark.
  • Led a Machine Learning-based virtual metrology effort to reduce costly actual product measurements using Python, bash, NiFi, and HBase, with estimated savings of $1M+.
  • Developed a DOE data analysis reporting portal using Python, SQL, bash, and HTML.
  • Coded an analysis tool that applies Natural Language Processing to job application resumes and ranks top candidates by suitability to the job description.
Machine Learning · Deep Learning · Data Engineering · Python · SQL · Data Science

Senior Engineer Lead (E4)

Promoted

Mar 2013 - Aug 2017 · 4 yrs 5 mos

  • Implemented innovative test programs for quality, yield, and test time improvement.
  • Developed novel data analytics automation for test results using Bash, Python, JMP Script, and HTML that reduced the time spent on data analytics from several hours to several minutes.
  • Led several analytics product integration projects using a combination of statistical data analytics, including ANOVA, KW, regression, and classification, and on-product DOE testing. Developed and monitored several KPI dashboards and automated emails.
Machine Learning · Data Engineering · Python · SQL · Data Science

Senior Engineer (E3)

Promoted

Feb 2012 - Feb 2013 · 1 yr

Machine Learning · Data Science · Python

Engineer

Jun 2010 - Jan 2012 · 1 yr 7 mos

Machine Learning · Data Science · Python

Education

National University of Singapore

Master of Science — Electrical Engineering

Jan 2009 - Jan 2010

Pondicherry Engineering College

Bachelor of Technology (B.Tech.) — Electronics and Communication Engineering

Jan 2004 - Jan 2008
