Sudip Maji

DevOps Engineer

Bengaluru, Karnataka, India · 10 yrs 10 mos experience

Key Highlights

  • Expert in building scalable distributed systems.
  • Proven track record in infrastructure engineering.
  • Strong experience in Kubernetes and cloud platforms.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in cloud-native technologies.

Skills

Core Skills

Infrastructure Engineering · Kubernetes · Platform Engineering

Other Skills

Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Terraform · Go (Programming Language) · DevOps · Amazon EKS · Platform Architecture · AWS EKS · AWS · Python · Data Structures · Web Development · Web Applications · Algorithms · System Administration

About

Interested in building systems, e.g. distributed systems, event-driven systems, infrastructure engineering, container orchestration, and high-throughput data analysis.

Experience

10 yrs 10 mos
Total Experience
2 yrs 2 mos
Average Tenure
2 yrs 10 mos
Current Experience

Harness

Staff Software Engineer, Infrastructure Platform

Aug 2023 - Present · 2 yrs 10 mos · Bengaluru · Remote

  • Building an observability stack at petabyte scale
  • Built an alert-native incident mitigation workflow system that processes alerts from various monitoring systems and executes predefined workflows. It attempts to remediate incidents with configurable, action-oriented workflows before escalating to on-call engineers, which has saved many engineering hours.
  • Built a CLI tool for zero-downtime Kubernetes cluster upgrades and Istio mesh upgrades across clouds (GKE, EKS)
      o 5x increase in team productivity
      o 500+ hours saved per year; upgrade time reduced from weeks to days
  • Traceable package signing
      o Read: https://docs.traceable.ai/docs/code-signing. Signing makes packages produced by Traceable AI publicly verifiable, so any alteration after build can be detected.
      o Every package released by Traceable AI (deb, rpm, apt, yum, archives, Helm charts, Docker images, or any other type) is signed with Traceable’s GPG key and is publicly verifiable.
  • Improvements to the disaster recovery process
      o Add multi-cloud support for the backup mechanism
      o Implement CSI backups using the external snapshot controller to speed up backups without impacting production, superseding the old mongodump-based mechanism, which took hours to complete and affected production
Amazon Web Services (AWS) · Google Cloud Platform (GCP) · Terraform · Kubernetes · Go (Programming Language) · Infrastructure Engineering

Disney+ Hotstar

Platform Engineer

Jan 2022 - Aug 2023 · 1 yr 7 mos · Bengaluru, Karnataka, India · Remote

  • Lead the platform team and manage the services it owns (deployment portal, scaling systems, SSH access service, etc.)
      o Responsible for extending in-house scaling software to support business-specific scaling needs
      o Gradually replacing legacy in-house systems with actively maintained open-source software
      o Primary owner of Hotstar scaling systems
  • Design multi-datacenter infrastructure for Disney's growing business in different countries: https://blog.hotstar.com/scaling-infrastructure-for-millions-datacenter-abstraction-part-2-42b04ef5ed6a
      o Kubernetes resource and cluster capacity planning
      o Introduce a CLI and DSL so that developers can manage resources easily without having to know Kubernetes nomenclature
      o Set naming conventions to support multi-DC
      o This project laid the foundation for scaling for big events such as the ICC World Cup 2023 (50 million+ concurrent users) and many more.
  • Maintain Kubernetes infrastructure on AWS EKS
      o Manage 10+ large production clusters (45,000 CPU cores at peak)
      o End-to-end cluster lifecycle maintenance, i.e. zero-downtime upgrades and component upgrades
      o Maintain, load test, and scale the monitoring system (VictoriaMetrics, metrics-server, vmagent, kube-state-metrics)
DevOps · Kubernetes · Amazon Web Services (AWS) · Go (Programming Language) · Amazon EKS · Platform Architecture · +1

MoEngage Inc.

Lead Site Reliability Engineer

Feb 2019 - Jan 2022 · 2 yrs 11 mos

  • Leading migration from EC2 VMs to Kubernetes-orchestrated containers
  • Dynamic configuration management using Consul, confd, and git2consul
  • Plan AWS VPC architecture using Terraform for secure, cost-friendly, environment-agnostic infrastructure, integrated with Atlantis to manage provisioning from GitHub comments
  • GitOps release pipeline using Argo CD and Drone CI for the platform
  • End-to-end SDLC planning on Kubernetes, i.e. secret management, access management, configuration management, build, deployment, monitoring, alerting, and logging, with everything as IaC and change management in Git. It provides a simple abstraction so that developers don't have to worry about the nuances of infrastructure

Plivo

Senior Engineer, DevOps

May 2018 - Jan 2019 · 8 mos · Bengaluru Area, India

  • Manage/Implement infrastructure as code using Terraform in AWS
  • Migrate existing microservices from AWS OpsWorks to Docker containers in AWS ECS with Terraform

HackerEarth

2 roles

Software Engineer

Jul 2015 - May 2018 · 2 yrs 10 mos · Bengaluru Area, India

  • Migrate whole infrastructure from EC2 Classic to VPC in AWS.
  • Design a highly available, low-latency, horizontally scalable message tracker using Cassandra, Thrift, HAProxy, and Apache Zeppelin (https://github.com/iamsudip/he-clog)
  • Implement an auto-scaling solution on top of various data sources (RabbitMQ, SQS, CloudWatch, product data, etc.).
  • Build an in-house deployment system using python-fabric
  • Build a CLI tool and centralised service to manage infrastructure services such as autoscaling, deployment, auto-healing of known problems, and SSH access to machines for HackerEarth engineers.
  • Create a bot to manage processes running on different machines, manage servers, relay AWS alarms, control deployment, etc. If you are lonely, talk to it; you won't feel bored.
  • Build and maintain public APIs for HackerEarth's Recruit (https://goo.gl/4zNdjb) and challenges (https://goo.gl/2JP2xA)
  • Build services around HackerEarth's code compilation and assessment service.
  • Manage/Implement services on top of RabbitMQ, Sentry, MongoDB, Redis, SQS, MySQL, HAProxy, Cassandra, SaltStack, etc.

Intern

Jan 2015 - Apr 2015 · 3 mos · Bangalore Area, India

Education

Dr. B.C. Roy Engineering College

Bachelor of Technology (B.Tech.) — Computer Science & Engineering

Durgapur A. V. B. High School
