Anshul Jindal

CEO

Munich, Bavaria, Germany · 9 yrs 4 mos experience

Key Highlights

Developed a framework for federating serverless platforms.
Created tools optimizing memory configurations for serverless functions.
Led a project funded by a €100,000 grant.

Skills

Core Skills

Cloud Computing · Site Reliability Engineering · Serverless Computing · Web Development · Edge Computing

Other Skills

AWS Lambda · Algorithms · Amazon EC2 · Amazon Web Services (AWS) · Ansible · C · C++ · CSS · Cloud Applications · Data Structures · Debugging Code · DevOps · Docker · Dreamweaver · Git

About

Before joining NVIDIA, I was a Site Reliability Engineer II at Argo AI, where I built Kubernetes platforms using cloud services and tools such as AKS, GKE, EKS, GitOps (Flux), OIDC, and Terraform. I was also responsible for the observability stack and for resolving alerts for mission-critical services. Before that, I completed my Ph.D. in Computer Science at the Technical University of Munich, where I created FDN (Function Delivery Network), a framework and tool for federating multiple serverless compute platforms across the multi-cloud and edge-cloud continuum. My research was published in several prestigious conferences and journals, and received an outstanding paper award and a grant from the Software Campus program. I am passionate about AI, solving complex problems, and delivering reliable, scalable, and efficient cloud computing solutions.

Experience

Total Experience: 9 yrs 4 mos

Average Tenure: 2 yrs 1 mo

Current Experience: 2 yrs 6 mos

NVIDIA

Senior Solution Architect

Dec 2023 – Present · 2 yrs 6 mos · Greater Munich Metropolitan Area · On-site

Argo AI

Site Reliability Engineer II

Sep 2022 – Nov 2023 · 1 yr 2 mos · Munich, Bavaria, Germany · Hybrid

Created a Kubernetes platform on AKS (Azure), GKE (Google Cloud), and EKS (AWS), using GitOps (Flux), OIDC, and other add-ons, to provision multiple Kubernetes clusters for different development teams.
Created Terraform modules for the development teams covering AWS services (IAM, S3, RDS, Athena, Glue, Lambda, IAM Identity Center, EKS, VPC, Transit Gateway, Subnets, NAT Gateway) and Azure services (AKS, Flexible Server for PostgreSQL, VM Scale Sets, Application Gateway with WAF, Resource Groups, Storage Accounts, VNets, Subnets, NAT Gateway, Vault).
Responsible for the observability stack (Grafana, Loki, Fluent Bit, Promtail, Prometheus, Thanos); resolved alerts for mission-critical services to meet their Service Level Objectives (SLOs).
Participated in the on-call rotation and in multiple blameless postmortems.

Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Grafana · Prometheus.io · Cloud Computing · Site Reliability Engineering

Huawei

Research Internship

Aug 2021 – Oct 2021 · 2 mos · Munich, Bavaria, Germany

Developed three early-exit variants and two split-computing variants of the YOLOv5 model for Edge AI in the edge-cloud continuum.
Created a split-computing pipeline for AlexNet, written in Go, for Sedna on KubeEdge.
Automated the deployment of the Edge AI infrastructure (Kubernetes, KubeEdge, and Sedna) using Ansible.

DevOps · Edge Computing

Software Campus

Project Lead Developer

Feb 2021 – Jul 2022 · 1 yr 5 mos

Received a €100,000 grant to develop the project BEHAVE – Behavioral Modeling of Application Functions in Serverless Computing. The project is part of Software Campus, which is funded by the German Federal Ministry of Education and Research (BMBF), with Huawei Munich Research Centre as the industry partner.
https://softwarecampus.de/en/project/behave-behavioral-modeling-of-application-functions-in-serverless-computing/

Huawei

Developer

May 2020 – Oct 2020 · 5 mos · Munich, Bavaria, Germany

Developed Precog, an online memory-leak detection algorithm for cloud VMs, in Python. The algorithm achieves an accuracy score of 85% with a prediction time of under half a second per VM and is deployed in production.
Developed multiple ML-based algorithms for detecting anomalous hypervisors in the cloud infrastructure.

BMW Group

Consultant and Developer

May 2019 – Nov 2019 · 6 mos · Munich, Bavaria, Germany

Developed a framework using PySpark, TimescaleDB, and Grafana for automatically detecting anomalies in big data from BMW's IT landscape. The framework detected 90% of the anomalies and surfaced the most relevant features for root-cause analysis in a Grafana dashboard.
A scalability test of the framework showed that training on 100 transactions took 80% less time with ten cores than with one core.

Technical University of Munich

2 roles

Scientific Researcher (Ph.D.)

Promoted

Dec 2018 – Aug 2022 · 3 yrs 8 mos

Created the Function Delivery Network (FDN), a framework written in Go that uses Virtual Kubelet to federate multiple serverless compute platforms (AWS Lambda, Google Cloud Functions (GCF), OpenWhisk, OpenFaaS) spread across the multi-cloud and edge-cloud continuum. It gives users a unified interface, based on Kubernetes deployment YAML files, for managing functions across all platforms.
Developed a Python tool that unifies monitoring of multiple serverless compute platforms (AWS Lambda, GCF, OpenWhisk, OpenFaaS), built on Prometheus, Amazon CloudWatch, Google Cloud Monitoring, and Grafana.
Extended HAProxy to route FaaS function invocations to a suitable subset of the serverless compute platforms in the FDN, based on function awareness and data awareness. Then developed two algorithms, Latency-Aware and Service Level Objective (SLO)-Aware, for load-balancing invocations across the selected platforms. The SLO-Aware algorithm performed best, keeping the function's P90 response time within the defined SLOs.
Developed SLAM (SLO-Aware Memory Optimization), a Python tool that recommends memory configurations for the FaaS functions in a serverless application on AWS Lambda that minimize cost while meeting the SLO. With the recommended configurations, more than 95% of requests complete within the defined SLOs.
Created an online cloud computing exercise submission tool in Node.js, used by approximately 600 master's and bachelor's students.
Automated the deployment of multiple Kubernetes clusters in the FDN, spread across the multi-cloud and edge-cloud continuum, using Ansible, Terraform, and GitHub Actions.

MongoDB · Node.js · Amazon Web Services (AWS) · Terraform · AWS Lambda · Google Cloud Platform (GCP)

Student Assistant

Mar 2017 – Nov 2018 · 1 yr 8 mos

Student Assistant at the Chair of Computer Architecture and Parallel Systems.
Developed a tool to automatically estimate and analyze different configurations of existing cloud auto-scaling solutions.
Taught a cloud computing course for industry employees: https://cloud.caps.in.tum.de/index.php?lang=en
Developed a web-based framework for automatically correcting cloud computing lecture exercises.

Samsung Electronics

2 roles

Senior Software Engineer

Apr 2016 – Aug 2016 · 4 mos · Bengaluru, Karnataka, India

Designed and developed firmware for PCIe-based NVMe solid-state drives (SSDs).
Developed reservation and virtualization (SR-IOV) features for an SSD with a multi-function/multi-controller architecture (Samsung PM1725).
Designed and implemented algorithms for pattern detection and for prefetching data from the flash device to optimize firmware performance for an NVMe-based client SSD.

Software Engineer

Jul 2014 – Mar 2016 · 1 yr 8 mos · Bengaluru, Karnataka, India

Designed and developed a Python- and web-based interface for automated analysis of crash dumps in the SSD emulator.
Contributed to the development of a storage-controller emulator in C++.