Bijoy A. — SRE (Site Reliability Engineer)
Currently working as an SRE with the Platform Engineering team, where reliability, scalability, and automation are at the core of everything I do. With proven experience in building, scaling, and maintaining production systems, I work across both DevOps and Machine Learning Operations (MLOps), delivering robust infrastructure for traditional services as well as AI workloads.
Core Skill Set:
- DevOps & Infra-as-Code: Kubernetes, Docker, Helm, Terraform, Ansible, CI/CD, GitOps, Argo CD, GitHub Actions
- Monitoring & Observability: Prometheus, Grafana, ELK Stack
- Cloud & Hybrid Environments: AWS, GCP, Azure, and On-Prem
- Languages & Scripting: Python, Bash
- MLOps & ML Stack: PyTorch, TensorFlow, model deployment, versioning, etc.
Passionate about:
- Building scalable infrastructure for traditional and AI workloads
- Observability, failover, and performance tuning in GPU-accelerated distributed systems
I bring experience from infrastructure setup through production support, with depth in system performance tuning, debugging, and working within microservices and distributed architectures.
Stackforce AI infers this person is a DevOps and MLOps specialist in the SaaS industry.
Location: Bengaluru, Karnataka, India
Experience: 11 yrs
Career Highlights
- Expert in building scalable infrastructure for AI workloads.
- Strong background in DevOps and MLOps practices.
- Proficient in performance tuning for distributed systems.
Work Experience
Cisco
Software Engineer SRE IV (6 mos)
Software Engineer SRE III (4 yrs 11 mos)
Dell Technologies
Software Engineer SRE II (1 yr 8 mos)
Software Engineer SRE I (2 yrs 7 mos)
Production Support/DevOps (7 mos)
Production Support Engineer (1 yr 3 mos)
Education
Bachelor of Technology - BTech at Dr MGR Educational and Research Institute