Anshul Jindal

CEO

Munich, Bavaria, Germany · 9 yrs 4 mos experience

Key Highlights

  • Developed a framework for federating serverless platforms.
  • Created tools optimizing memory configurations for serverless functions.
  • Led a project funded by a €100,000 grant.
Stackforce AI infers this person is a Cloud Computing and Serverless Architecture expert with a focus on AI and DevOps.

Skills

Core Skills

Cloud Computing · Site Reliability Engineering · Serverless Computing · Web Development · Edge Computing

Other Skills

AWS Lambda · Algorithms · Amazon EC2 · Amazon Web Services (AWS) · Ansible · C · C++ · CSS · Cloud Applications · Data Structures · Debugging Code · DevOps · Docker · Dreamweaver · Git

About

Before joining NVIDIA, I was a Site Reliability Engineer II at Argo AI, where I built Kubernetes platforms using cloud services and tools such as AKS, GKE, EKS, GitOps Flux, OIDC, and Terraform. I was also responsible for the observability stack and for resolving alerts for mission-critical services. Prior to that, I completed my Ph.D. in Computer Science at the Technical University of Munich, where I created FDN (Function Delivery Network), a framework and tool for federating multiple serverless compute platforms across the multi-cloud and edge-cloud continuum. My research was published in several prestigious conferences and journals and received an outstanding paper award and a grant from the Software Campus program. I am passionate about AI, solving complex problems, and delivering reliable, scalable, and efficient solutions for cloud computing.

Experience

Total Experience: 9 yrs 4 mos
Average Tenure: 2 yrs 1 mo
Current Experience: 2 yrs 6 mos

NVIDIA

Senior Solution Architect

Dec 2023 - Present · 2 yrs 6 mos · Greater Munich Metropolitan Area · On-site

Argo AI

Site Reliability Engineer II

Sep 2022 - Nov 2023 · 1 yr 2 mos · Munich, Bavaria, Germany · Hybrid

  • Built a Kubernetes platform on AKS (Azure), GKE (Google Cloud), and EKS (AWS) with GitOps Flux, OIDC, and other add-ons, providing multiple Kubernetes clusters for different development teams.
  • Created Terraform modules for the development teams covering AWS services (IAM, S3, RDS, Athena, Glue, Lambda, Identity Center, EKS, VPC, Transit Gateway, Subnets, NAT Gateway) and Azure services (AKS, Flexible Postgres, VM Scale Sets, Application Gateway with WAF, Resource Groups, Storage Accounts, VNets, Subnets, NAT Gateway, Vault).
  • Was responsible for the observability stack (Grafana, Loki, Fluent Bit, Promtail, Prometheus, Thanos) and resolved alerts for mission-critical services to adhere to Service Level Objectives.
  • Participated in the on-call rotation and in multiple blameless postmortems.
Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Grafana · Prometheus.io · Cloud Computing · Site Reliability Engineering

Huawei

Research Internship

Aug 2021 - Oct 2021 · 2 mos · Munich, Bavaria, Germany

  • Developed three early-exiting versions and two split computing versions of the YOLOv5 model in the context of Edge AI for the edge-cloud continuum.
  • Created a split computing pipeline (programmed in Golang) for AlexNet for Sedna in KubeEdge.
  • Created an automatic deployment infrastructure using Ansible for Edge AI with Kubernetes, KubeEdge, and Sedna.
DevOps · Edge Computing

Software Campus

Project Lead Developer

Feb 2021 - Jul 2022 · 1 yr 5 mos

  • Received a grant of €100,000 to develop the project BEHAVE: Behavioral Modeling of Application Functions in Serverless Computing. The project is part of Software Campus, which is funded by the Federal Ministry of Education and Research (BMBF), with Huawei Munich Research Centre as the industry partner.
  • https://softwarecampus.de/en/project/behave-behavioral-modeling-of-application-functions-in-serverless-computing/

Huawei

Developer

May 2020 - Oct 2020 · 5 mos · Munich, Bavaria, Germany

  • Developed an online memory-leak detection algorithm called Precog in Python for cloud VMs. The algorithm achieves an accuracy score of 85% with less than half a second of prediction time per VM and is deployed in production.
  • Developed multiple ML-based algorithms for detecting anomalous hypervisors in the cloud infrastructure.

BMW Group

Consultant and Developer

May 2019 - Nov 2019 · 6 mos · Munich, Bavaria, Germany

  • Developed a framework using PySpark, TimescaleDB, and Grafana to automatically detect anomalies in big data from BMW's IT landscape. The framework detected 90% of the anomalies and showed the most relevant features for root cause analysis in a Grafana dashboard.
  • A scalability test of the framework showed an 80% reduction in training time for 100 transactions when using ten cores instead of one.

Technical University of Munich

2 roles

Scientific Researcher (Ph.D.)

Promoted

Dec 2018 - Aug 2022 · 3 yrs 8 mos

  • Created the Function Delivery Network (FDN) framework in Golang for federating multiple serverless compute platforms (AWS Lambda, Google Cloud Functions (GCF), OpenWhisk, OpenFaaS) spread across the multi-cloud and edge-cloud continuum using Virtual Kubelet. It provides users with a unified interface based on Kubernetes deployment YAML files to manage functions across multiple platforms.
  • Developed a Python monitoring tool that unifies the monitoring of multiple serverless compute platforms (AWS Lambda, GCF, OpenWhisk, OpenFaaS) based on Prometheus, CloudWatch, Google Cloud Monitoring, and Grafana.
  • Extended HAProxy to deliver FaaS function invocations to a suitable subset of serverless compute platforms in the FDN based on function awareness and data awareness. Then developed two load-balancing algorithms, Latency-Aware and Service Level Objective (SLO)-Aware, for balancing the invocations across the selected subset of platforms. The SLO-Aware algorithm performed best, and the function's P90 response time adhered to the defined SLOs.
  • Developed a Python-based tool called SLAM (SLO-Aware Memory Optimization) that recommends memory configurations for the FaaS functions within a serverless application on AWS Lambda, minimizing cost while meeting the SLO. The suggested memory configurations guarantee that more than 95% of requests are completed within the defined SLOs.
  • Created an online cloud computing exercise submission tool in Node.js used by approximately 600 master's and bachelor's students.
  • Automated the deployment of multiple Kubernetes clusters in the FDN spread across the multi-cloud and edge-cloud continuum using Ansible, Terraform, and GitHub Actions.
MongoDB · Node.js · Amazon Web Services (AWS) · Terraform · AWS Lambda · Google Cloud Platform (GCP) · +14

Student Assistant

Mar 2017 - Nov 2018 · 1 yr 8 mos

  • Student Assistant at the Chair of Computer Architecture and Parallel Systems.
  • Developed a tool to automatically estimate and analyze the different configurations of existing cloud auto-scaling solutions.
  • Conducted a cloud computing course for industry employees: https://cloud.caps.in.tum.de/index.php?lang=en
  • Developed a web-based framework for the automatic correction of cloud computing lecture exercises.

Samsung Electronics

2 roles

Senior Software Engineer

Apr 2016 - Aug 2016 · 4 mos · Bengaluru, Karnataka, India

  • Designed and developed firmware for PCIe-based NVMe solid-state drives.
  • Developed the reservation and virtualization (SR-IOV) features for a solid-state drive based on a multi-function/multi-controller architecture (Samsung's PM1725 SSD).
  • Designed and implemented various algorithms for pattern detection and data prefetching from the flash device to optimize firmware performance for NVMe-based client SSDs.

Software Engineer

Jul 2014 - Mar 2016 · 1 yr 8 mos · Bengaluru, Karnataka, India

  • Designed and developed a Python- and web-based interface for automated analysis of crash dumps in an SSD emulator.
  • Contributed to the development of a storage controller emulator in C++.

Education

Technical University of Munich

Doctor of Philosophy (PhD) · Computer Science

Jan 2018 - Jan 2022

Technical University of Munich

Master of Science · Informatics (Computer Science)

Jan 2016 - Jan 2018

National Institute of Technology Hamirpur-Alumni

Bachelor's degree · Computer Science and Engineering

Jan 2010 - Jan 2014

Delhi Public School - India

Higher Secondary Examination

Jan 1998 - Jan 2010
