Bijoy A.

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India

11 yrs experience

Highly Stable

Key Highlights

Expert in building scalable infrastructure for AI workloads.
Strong background in DevOps and MLOps practices.
Proficient in performance tuning for distributed systems.

Stackforce AI infers this person is a DevOps and MLOps specialist in the SaaS industry.

Skills

Other Skills

Azure DevOps, Azure Kubernetes Service (AKS), C++, Git, GitLab, Linux, Linux System Administration, Load Balancing, NI LabVIEW, Python Scripting, Release Management, Unix

About

Currently working as an SRE with the Platform Engineering team, where reliability, scalability, and automation are at the core of everything I do.

With proven experience in building, scaling, and maintaining production systems, I work across both DevOps and Machine Learning Operations (MLOps), delivering robust infrastructure for traditional services as well as AI workloads.

Core Skill Set:
- DevOps & Infra-as-Code: Kubernetes, Docker, Helm, Terraform, Ansible, CI/CD, GitOps, Argo CD, GitHub Actions
- Monitoring & Observability: Prometheus, Grafana, ELK Stack
- Cloud & Hybrid Environments: AWS, GCP, Azure, and On-Prem
- Languages & Scripting: Python, Bash
- MLOps & ML Stack: PyTorch, TensorFlow, model deployment, versioning, etc.

Passionate about:
- Building scalable infrastructure for traditional and AI workloads
- Observability, failover, and performance tuning in GPU-accelerated distributed systems

My experience spans infrastructure setup through production support, with particular depth in system performance tuning, debugging, and working within microservices and distributed architectures.