Shubhrajeet Behadi

SRE (Site Reliability Engineer)

Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia · 14 yrs 9 mos experience

Most Likely To Switch

Key Highlights

Over 13 years of experience in cloud and DevOps.
Expert in cloud cost management and optimization.
Proven track record in leading DevOps transformations.

Stackforce AI infers this person is a Fintech and Media cloud infrastructure expert with strong DevOps capabilities.

Skills

Core Skills

Cloud Cost Management, Incident Management, Cost Optimization, DevOps Adoption, Cloud Strategy, Cloud Infrastructure Management, Build and Release Management

Other Skills

AWS, AWS CloudFormation, AWS Elastic Beanstalk, AWS Reserved Instances, ActiveMQ, Aerospike, Alibaba Cloud, Apache Kafka, Apache Spark, Aurora I/O storage, Bash, CI/CD, Cassandra, CloudOps, Cloudera

About

13+ years of experience in building highly available applications by implementing the latest toolsets and practices for large-scale consumer-facing applications and assisting in digital transformation for industries like e-commerce, media and logistics. Focused on measurable metrics to define successful software delivery, with an emphasis on automating as much as possible to keep operational and maintenance costs to a minimum. Well versed in cloud (AWS, Alibaba Cloud & Azure) and deployment automation, with a focus on monitoring dashboards for business visibility, performance and SLAs. Core strengths include team building, infrastructure, deployment & process automation. I am always up for new challenges, solving complex technical problems and working with the latest tools and technology.

Experience

14 yrs 9 mos

Total Experience

1 yr 10 mos

Average Tenure

3 yrs 2 mos

Current Experience

MoneyLion

2 roles

Director, SRE

Promoted

Jun 2023 – Present · 2 yrs 11 mos · WP. Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia · On-site

Cloud Cost Management & FinOps
Led multi-million-dollar cloud cost optimizations by optimizing AWS Reserved Instances, Aurora I/O storage, and EBS volumes.
Established a scalable chargeback model for AWS and Datadog, driving cost accountability across engineering teams.
Ensured cost governance and efficiency while enabling new AI and R&D initiatives without exceeding budget baselines.
Service Availability & Incident Management
Built a self-service observability framework for seamless Datadog Service Catalog onboarding, enabling real-time reliability tracking.
Standardized incident response workflows, ensuring faster resolution, improved coordination, and C-level visibility.
Strengthened disaster recovery (DR) processes, significantly reducing recovery time and improving system resilience.
Automation & Operational Excellence
Migrated AWS access management to AWS SSO, ensuring centralized authentication and SOC2 compliance.
Implemented CNOPS (CleanRoom No Operations) to automate secure, auditable, and just-in-time access to AWS, databases, and GitHub.
Spearheaded EKS upgrades and CI/CD pipeline automation, improving deployment efficiency and reducing operational bottlenecks.
Scalability & Performance Optimization
Designed Kubernetes best practices to optimize reliability, availability, and performance of critical workloads.
Optimized CI/CD workflows, reducing pipeline execution time for key services.
Innovation & Cloud Modernization
Introduced KEDA, Karpenter, Kubecost, and ArgoCD, improving Kubernetes scalability and automation.
Implemented Kustomize to simplify external service management and upgrades.
Continuously evaluate and integrate emerging cloud technologies to boost developer productivity and efficiency.
Leadership & Talent Growth
Led and scaled globally distributed SRE teams, ensuring cross-time-zone collaboration and high-compliance infrastructure.
Promoted internal talent growth, reducing external hiring by upskilling junior engineers.

AWS, Datadog, Kubernetes, EKS, CI/CD, Cloud Cost Management +1

Engineering Manager, SRE

Mar 2023 – Jun 2023 · 3 mos · WP. Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia · On-site

Drove the initiative to cut AWS spend by 15% by optimizing AWS usage across RDS, EC2, CloudWatch and Kinesis.
Raised awareness among engineering teams of their AWS consumption, empowering them to optimize usage whenever expenses exceed the historically set baseline.
Implemented FinOps best practices to foster a culture of accountability, ensuring that every team takes ownership of their cloud usage and actively seeks cost-saving opportunities.

AWS, FinOps, Cost Optimization

Red Hat

Senior Consultant - Hybrid Cloud Practices

Sep 2021 – Mar 2023 · 1 yr 6 mos · WP. Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia · Remote

Part of the APAC Transformation and Adoption team responsible for promoting and implementing DevOps adoption using the Open Practice Library.
Helping customers in their Container Adoption Journey (CAJ), Automation Adoption Journey (AAJ) and Hybrid Cloud Adoption Journey (HCAJ).
Trusted advisor for hybrid cloud practices (OpenShift, Kubernetes, Advanced Cluster Management, CI/CD).

DevOps, Kubernetes, OpenShift, DevOps Adoption

TNG Digital

Senior Specialist DevOps

Aug 2020 – Sep 2021 · 1 yr 1 mo · WP. Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Leading SRE-DevOps to formulate cloud strategy to provide the supporting infra & delivery pipeline for Malaysia’s fastest growing wallet ecosystem.
Architecting cloud solutions to optimize overall infrastructure costs to bring down infra cost per wallet transaction.
Enabling cloud native platform to support faster transactions without compromising on security and compliance.
Leveraging Alibaba Cloud as the infrastructure provider to build highly available, secure and robust transaction framework.
Enabling 360-degree monitoring and alerts to gain visibility into the wallet ecosystem and meet the set SLOs.
Designed and implemented Kubernetes autoscaling using preemptible instances in Alibaba Cloud, achieving up to 90% cost optimization compared to pay-as-you-go pricing.

Cloud Strategy, Alibaba Cloud, Kubernetes

Astro

Assistant Vice President, DevOps Engineer

Nov 2017 – Aug 2020 · 2 yrs 9 mos · Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Leading DevOps adoption in Astro's digital transformation, enabling application teams to onboard agile practices, automation processes and DevOps culture so that software products can be built rapidly while staying efficient, outage-free, stable, secure and reliable.
Leading the CloudOps team to adopt CCOE (Cloud Centre Of Excellence) practices for cloud governance. Strategized and improved FinOps tools usage to deliver biweekly and monthly reports on cloud spending and wastage, resulting in increased visibility and subsequently a 30% cost reduction, with almost 100% Reserved Instances coverage and utilization along with ~90% Savings Plan coverage.
Member of Astro's Cloud Centre Of Excellence (CCOE) team, which was built to efficiently operate and support cloud applications, provide better cloud governance and streamline the service delivery and assurance of cloud services to Astro’s business units.
Enabled applications on Kubernetes container orchestration with automated deployment pipelines using Helm, Drone and Bitbucket Pipelines, achieving 60% infrastructure cost savings.
Implemented automated infrastructure provisioning and code deployment in a single pipeline using Terraform, Packer and AWS SDKs, reducing infrastructure provisioning time by 80% and time to market by 70%, with a proper audit trail of infra and application code changes.
Implemented 100% resource tagging and security compliance using automated infra provisioning pipelines, enabling resource reboot automation across multiple AWS accounts, which ultimately saved around 40% per year.
Implemented automated encryption of EC2 and RDS instances to meet Astro's security compliance guidelines.
Owned the One Stop Monitoring Dashboard, a complete solution for system infra monitoring, application performance monitoring and centralized logging that any application team can use to gain visibility into its application and improve its performance.

DevOps, CloudOps, Terraform, DevOps Adoption

Rivigo

Sr. DevOps Engineer

Oct 2015 – Nov 2017 · 2 yrs 1 mo · Gurugram, Haryana, India

Single-handedly took care of complete cloud infrastructure administration and DevOps automation to implement an efficient cloud operation strategy from scratch.
Reduced AWS costs by 30% by implementing automated nightly shutdown of dev/staging cloud resources.
Implemented automated provisioning and setup of AWS resources and installation of the required toolsets.
Implemented single-click build and deployment using Jenkins and AWS Elastic Beanstalk environments.
Implemented automated builds of the Android app and pushes to the Play Store using the Android SDK and Jenkins.
Automated business reports and alerts according to the application team's requirements.
Designed and implemented an HA and scalable MongoDB cluster with proper backup and monitoring.
Implemented automated, easily tracked user access (SSH) management on AWS EC2 servers, so that the necessary access is created and deleted when a user joins or leaves the organisation.

AWS, DevOps Automation, Cloud Infrastructure Management

Snapdeal

Build and Release Engineer

Nov 2014 – Oct 2015 · 11 mos · New Delhi

Worked on deployment, monitoring and maintenance of pre-production environments (integrated testing environments).
Worked on provisioning of AWS instances and installation of required toolsets as per the infrastructure architecture and requirements.
Automated build and deployment using Jenkins and deployment scripts.
Set up HA Kafka, ZooKeeper, Aerospike, Cassandra, Hadoop and MongoDB clusters according to the application team’s requirements.
Wrote Puppet modules for configuration of a large number of servers.
Automated daily tasks using bash scripts.

AWS, Jenkins, MongoDB, Cloud Infrastructure Management

OLX

System Administrator

May 2014 – Nov 2014 · 6 mos

Maintained pre-prod and production environments.
Carried out releases using Jenkins and provided the necessary support to the app team.
End-to-end outage/incident management to ensure minimum service downtime.
Worked closely with the release team based in Argentina to manage releases for OLX global.
Carried out RCA for outages, then found and implemented permanent fixes.

AWS, Jenkins, Build and Release Management

Tata Consultancy Services

Systems Engineer

Jul 2011 – Apr 2014 · 2 yrs 9 mos

Worked with the TCS PreSales team for the BaNCS product to interact with potential customers, demo our product and services, and answer any queries.
Carried out system administration tasks for pre-prod environments.
Built and deployed application EARs on WAS/WebLogic servers.
Scripting for database backup and restore.
Automation using bash scripts.

Jenkins