Bijoy A. — SRE (Site Reliability Engineer)
Currently working as an SRE with the Platform Engineering team, where reliability, scalability, and automation are at the core of everything I do. With proven experience in building, scaling, and maintaining production systems, I work across both DevOps and Machine Learning Operations (MLOps), delivering robust infrastructure for traditional services as well as AI workloads.
Core Skill Set:
- DevOps & Infra-as-Code: Kubernetes, Docker, Helm, Terraform, Ansible, CI/CD, GitOps, Argo CD, GitHub Actions
- Monitoring & Observability: Prometheus, Grafana, ELK Stack
- Cloud & Hybrid Environments: AWS, GCP, Azure, and On-Prem
- Languages & Scripting: Python, Bash
- MLOps & ML Stack: PyTorch, TensorFlow, model deployment, versioning, etc.
Passionate about:
- Building scalable infrastructure for traditional and AI workloads
- Observability, failover, and performance tuning in GPU-accelerated distributed systems
My experience spans infrastructure setup through production support, with depth in system performance tuning, debugging, and working within microservices and distributed architectures.
Stackforce AI infers this person is a DevOps and MLOps specialist in the SaaS industry.
Location: Bengaluru, Karnataka, India
Experience: 10 yrs 10 mos
Career Highlights
- Expert in building scalable infrastructure for AI workloads.
- Strong background in DevOps and MLOps practices.
- Proficient in performance tuning for distributed systems.
Work Experience
Cisco
Software Engineer SRE IV (4 mos)
Software Engineer SRE III (4 yrs 9 mos)
Dell Technologies
Software Engineer SRE II (1 yr 8 mos)
Software Engineer SRE I (2 yrs 7 mos)
Production Support/DevOps (7 mos)
Production Support Engineer (1 yr 3 mos)
Education
Bachelor of Technology - BTech at Dr MGR Educational and Research Institute