Diptanu Gon Choudhury

Founder

San Francisco, California, United States · 19 yrs 3 mos experience
AI ML Practitioner · AI Enabled

Key Highlights

  • Expert in Distributed Systems and AI/ML integration.
  • Led development of scalable systems at LinkedIn and Facebook.
  • Co-founded innovative tech startups with open-source contributions.
Stackforce AI infers this person is a highly skilled engineer in AI/ML and Distributed Systems, focusing on scalable cloud solutions.

Skills

Core Skills

Distributed Systems · AI/ML · Cloud Computing · Embedded Systems

Other Skills

structured extraction engine · knowledge graphs · LLM applications · cluster scheduler · Kubernetes · distributed shard management · Helix · real time speech recognition · speech models · optimization of PyTorch kernels · Nomad · enterprise-scale systems · Continuous Deployment · Agile practices · User Interface library

About

I work at the intersection of Distributed Systems and AI/ML. I have worked on FBLearner, Facebook's Machine Learning/AI training platform; on some of the larger-scale inference services that serve NLP models for online inference; on a large-scale data preparation and feature serving service; and on training and optimizing PyTorch NLP models to run efficiently on servers. In the past I built the large-scale distributed cluster schedulers Nomad at HashiCorp and Titus at Netflix. I have also built foundational cloud infrastructure software such as highly available service discovery systems, traffic routing services and RPC libraries.

Papers
[1] Cross Lingual LID using Self Supervised Learning - https://arxiv.org/pdf/2107.04082.pdf
[2] Designing cluster schedulers for internet-scale services - https://queue.acm.org/detail.cfm?id=3199609
[3] XDP - Programmable Data Path in the Kernel - https://www.usenix.org/system/files/login/articles/login_spring18_05_choudhury.pdf

Talks
[1] Chaos Engineering and design patterns for building highly available services - https://www.youtube.com/watch?v=sYlWtTbpHQI
[2] Reliably shipping containers in a resource rich world using Titan - https://www.youtube.com/watch?v=V3OfAATYksM
[3] https://www.youtube.com/watch?v=gnNp0t2JAjg
[4] https://www.youtube.com/watch?v=Blh608p2M1E
[5] https://speakerdeck.com/diptanu/distributed-scheduling-with-apache-mesos-in-the-cloud

Experience

Total Experience: 19 yrs 3 mos
Average Tenure: 2 yrs 5 mos
Current Experience: 2 yrs 5 mos

Tensorlake

Founder

Dec 2023 - Present · 2 yrs 5 mos

  • We are building a structured extraction engine that creates near-real-time indexes and knowledge graphs from unstructured multi-modal data for LLM-based applications.
structured extraction engine · knowledge graphs · LLM applications · Distributed Systems · AI/ML

LinkedIn

Senior Staff Software Engineer

May 2023 - Dec 2023 · 7 mos

  • I led the initiative to build a scalable cluster scheduler for LinkedIn's internal data systems such as Espresso, Liquid and Kafka. The cluster scheduler introduced cooperative scheduling between Kubernetes and Helix, LinkedIn's distributed shard management system. I designed the system from the ground up, including the protocol between the scheduler and the shard management system, and bootstrapped the team and our onboarding experience.
cluster scheduler · Kubernetes · distributed shard management · Helix · Distributed Systems · Cloud Computing

Facebook

Software Engineer, Facebook AI Applied Research

Feb 2017 - May 2023 · 6 yrs 3 mos · Menlo Park, California

  • Worked on building Facebook's real-time speech recognition inference engine, which transcribes videos and audio on Instagram and Facebook. Led development and research on various speech models such as Language ID (LID), Voice Activity Detector (VAD) and other ASR models. While in the Speech organization, I also helped make speech models faster by leading projects on optimizing PyTorch kernels for transformers and attention layers. Prior to working on Speech, I worked on modernizing the graph job scheduler that powers FBLearner, Facebook's Machine Learning platform.
real time speech recognition · speech models · optimization of PyTorch kernels · AI/ML · Distributed Systems

HashiCorp

Senior Software Engineer

Nov 2015 - Feb 2017 · 1 yr 3 mos

  • I was one of the early engineers at HashiCorp and co-led the development of the open-source Nomad cluster scheduler.
Nomad · cluster scheduler · Distributed Systems

Netflix

Senior Software Engineer, Cloud Platform Engineering

Oct 2013 - Nov 2015 · 2 yrs 1 mo

Thoughtworks

Senior Consultant

Nov 2009 - Aug 2013 · 3 yrs 9 mos · London, United Kingdom

  • I was a consultant and an individual contributor at ThoughtWorks Europe. I engaged with multiple clients in designing enterprise-scale systems. I also participated in engagements that required enabling clients with Continuous Deployment, Test Driven Development and other Agile practices.
enterprise-scale systems · Continuous Deployment · Agile practices · Cloud Computing

Cisco

Software Engineer

Jun 2008 - Nov 2009 · 1 yr 5 mos

  • At NDS (acquired by Cisco), my work involved developing and maintaining a User Interface library and a middleware called MediaHighway, both targeted at set-top boxes. The UI library is written entirely in Java and is based on an MVC architecture. The middleware is written in C, and the UI library interacts with it using JNI.
User Interface library · middleware · Set top boxes · Embedded Systems

Lifeoz

Co-founder

Dec 2006 - Jul 2008 · 1 yr 7 mos

  • We created a GIS-based social networking system using mashups. Lifeoz connects the space and time variables of a user and shows the timeline of a person's life.
  • Lifeoz was built entirely from open-source components, and much of its code was contributed back to the open-source community.
GIS · social networking · Open source components · Cloud Computing

Education

National Institute of Technology, Jalandhar

B-Tech — Electronics and Communication Engineering

Jan 2004 - Jan 2008
