Tahir Mehraj — SRE (Site Reliability Engineer)

I've spent the last 10 years in the trenches of site reliability engineering, from managing telecom networks in Kashmir to scaling cloud infrastructure at Atlassian serving 45,000+ instances monthly.

Here's what nobody tells you about SRE: The real challenge isn't just keeping systems up. It's doing it cost-effectively, at scale, without burning out your team.

THE NUMBERS

At Atlassian, I cut AWS infrastructure costs by 35% while improving reliability. Saved 906 engineering hours monthly by eliminating pipeline flakiness. Migrated 2,000+ customer instances to the cloud without a major incident.

These weren't accidents. They came from obsessive focus on building self-healing systems, making observability actually useful, automating repetitive work, and treating cost optimization as an engineering discipline.

WHAT I SHARE HERE

I write about the real, unglamorous work of SRE:

- War stories from 3 AM incidents and what they taught me
- Practical Kubernetes cost optimization patterns
- Building observability that prevents outages
- Lessons from 10 years of on-call rotations
- The business case for reliability engineering
- Honest takes on cloud architecture decisions

No fluff. No theory without practice. Just what actually works when you're responsible for production systems.

MY BACKGROUND

Senior SRE at CrowdStrike, previously at Atlassian. Working with AWS, Kubernetes, Datadog, Terraform, Python, and the cloud-native ecosystem.

Started in telecom network operations, moved through NOC and infrastructure support, eventually landing in SRE/DevOps. I've been the person getting woken up at 3 AM and the person designing systems that don't wake anyone up.

Conducted 80+ technical interviews. Mentored engineering teams. Built automation saving 1,500+ FTE days. Achieved 99.9% uptime while reducing costs.

WHY FOLLOW ME

If you're dealing with cloud costs spiraling out of control, pipelines that fail randomly, or alerts that wake you up for nothing, I've been there. I share what I learned the hard way so you don't have to.

Hit Follow for practical SRE insights without the buzzwords.

Want to connect? Send me a note about what you're working on. Always interested in interesting infrastructure challenges.

#SRE #DevOps #CloudArchitecture #AWS #Kubernetes #Observability

Stackforce AI infers this person is a Site Reliability Engineer with extensive experience in cloud infrastructure and automation in the SaaS industry.

Location: Bengaluru, Karnataka, India

Experience: 10 yrs 8 mos

Skills

Site Reliability Engineering
Cloud Architecture
Monitoring & Observability
Continuous Integration And Continuous Delivery
Data Center Architecture

Career Highlights

Reduced AWS costs by 35% at Atlassian.
Saved 906 engineering hours monthly through automation.
Achieved 99.9% uptime for critical services.

Work Experience

CrowdStrike

Senior Site Reliability Engineer (6 mos)

Atlassian

Software Engineer (1 yr 2 mos)

Senior DevTools Engineer (1 yr 11 mos)

DevTools Engineer (2 yrs 7 mos)

Khoros

MCS Engineer - II (1 yr)

MCS Engineer - I (4 mos)

Infogain

Critical Support Engineer (1 yr 6 mos)

Ericsson India

Network Operations Engineer (1 yr 8 mos)

Education

Bachelor of Engineering - BE at University of Kashmir

Highly Stable

Skills

Core Skills

Site Reliability Engineering
Cloud Architecture
Monitoring & Observability
Continuous Integration And Continuous Delivery
Data Center Architecture

Other Skills

Kubernetes
Python (Programming Language)
AWS
Terraform
Ansible
Datadog
Prometheus
Grafana
Java
Bash
Cost Optimization
CI/CD
Automation
Docker

Experience

Total Experience: 10 yrs 8 mos
Average Tenure: 2 yrs 6 mos
Current Experience: 6 mos

CrowdStrike

Senior Site Reliability Engineer

Nov 2025 – Present · 6 mos

Skills: Kubernetes, Python (Programming Language), Site Reliability Engineering

Atlassian

3 roles

Software Engineer

Promoted

Sep 2024 – Nov 2025 · 1 yr 2 mos

Leading SRE/DevOps initiatives for Atlassian's Data Center products, driving infrastructure reliability and automation at scale.
🚀 Major Impact Delivered:
- Reduced AWS costs by 35% through environment lifecycle optimization
- Saved 906 engineering hours monthly by reducing pipeline flakiness
- Delivered tooling contributing to 1,500+ FTE days in productivity gains
- Achieved 99.9% uptime for critical services

🛠️ Key Initiatives & Achievements:

Platform Engineering & Automation
- Architected and implemented scalable environment provisioning system serving 45,000+ monthly instances
- Reduced deployment time by 60% through IaC implementation using Terraform and Ansible
- Built automated retry mechanisms reducing test failures by 90%

Observability & Reliability
- Designed comprehensive monitoring framework using Prometheus, Grafana, and Datadog
- Reduced MTTR by 25% through enhanced observability and automated alerting
- Implemented proactive monitoring reducing P1 incidents by 35%
- Created automated health checks preventing 80% of common failures

Security & Compliance
- Led security automation initiative across 12+ repositories using Snyk and Renovate
- Implemented GDPR/HIPAA compliance controls for sensitive environments
- Automated security scanning in CI/CD pipelines, reducing vulnerabilities by 60%
- Designed secure access management system for multi-tenant environments

Technical Leadership
- Mentored 6+ engineers in SRE practices and cloud architecture
- Conducted 80+ technical interviews as CAF certified lead interviewer
- Led cross-functional projects improving system reliability
- Created comprehensive technical documentation and runbooks

💻 Tech Stack:
- Cloud & Infrastructure: AWS, Kubernetes, Docker, Terraform, Ansible
- Monitoring & Observability: Datadog, Prometheus, Grafana, ELK Stack
- CI/CD: Jenkins, Bamboo, Bitbucket Pipelines
- Languages: Python, Java, Bash
- Databases: MySQL, PostgreSQL

Skills: AWS, Kubernetes, Terraform, Ansible, Datadog, Prometheus (+6 more)

Senior DevTools Engineer

Promoted

Oct 2022 – Sep 2024 · 1 yr 11 mos

Through my professional experience, I have built expertise in RDBMS, continuous integration and continuous delivery (CI/CD), performance engineering, Docker, proxies, scripting, Amazon Web Services (AWS), Git, Jenkins, and Linux.
I have designed and implemented CI/CD pipelines with Git, Jenkins, and Docker, giving my teams efficient, automated software delivery.
Applying performance engineering principles, I have identified bottlenecks in complex systems and optimized their performance. I have also contributed to performance testing strategies and test plans for several projects.
On AWS, I have worked hands-on with EC2, S3, VPC, and ELB, among other services, and have deployed applications in Docker containers integrated with proxy servers such as HAProxy and Nginx.
I use Bash and Python to automate repetitive tasks and to script complex operations on Linux systems. I have also installed, configured, and optimized MySQL and Oracle databases.

Skills: AWS, Docker, Git, Jenkins, Linux, Continuous Integration and Continuous Delivery

DevTools Engineer

Mar 2020 – Oct 2022 · 2 yrs 7 mos

Skills: Git, Continuous Integration, Continuous Delivery

Khoros

2 roles

MCS Engineer - II

Promoted

Mar 2019 – Mar 2020 · 1 yr

Skills: DevOps, Data Center Architecture

MCS Engineer - I

Nov 2018 – Mar 2019 · 4 mos

My expertise spans RDBMS, data center administration, performance engineering, Docker, Elasticsearch, scripting, Amazon Web Services (AWS), Git, and Linux.
I have administered and optimized data centers so that systems and applications run efficiently. I have used performance engineering to identify and mitigate bottlenecks, and Docker to containerize applications so they are portable and easy to manage.
I have used Elasticsearch's search and analytics capabilities to give fast access to large volumes of data, and scripting to automate tasks, streamline processes, and reduce errors.
On AWS, I have delivered scalable, cost-effective solutions to clients. With Git, I manage source code, collaborate with team members, and deploy changes to production systems.
My Linux experience covers operating system configuration, application deployment, and performance tuning.

Skills: RDBMS, Data Center Administration, Performance Engineering, Docker, Elasticsearch, Scripting (+4 more)

Infogain

Critical Support Engineer

Apr 2017 – Oct 2018 · 1 yr 6 mos

As a member of the Network Operations Center (NOC), I performed technical analysis of system issues and outages across customer enterprise networks, using my knowledge of network technologies to identify root causes quickly.
Once I had identified an issue, I researched, troubleshot, and resolved it using a range of diagnostic tools and techniques. For particularly complex issues, I worked with higher-level Core/Escalation Engineering to develop and implement a solution.
I researched and documented mitigation strategies, using my knowledge of customer technologies to minimize the impact of issues on customer operations, and I kept up to date with developments in network technologies.
I managed and prioritized issues in a 24x7 environment with critical uptime requirements, ensuring that each was addressed promptly.

Skills: Site Reliability Engineering

Ericsson India

Network Operations Engineer

Aug 2015 – Apr 2017 · 1 yr 8 mos · Srinagar Area, India

As a Transmission Project and Operation specialist, I oversaw the installation and operation of IP equipment to support site rollout, as well as the dimensioning of backbone IP/CEN networks.
A primary responsibility was design optimization and audit of the network to ensure that it met customer requirements. This involved working with stakeholders to gather requirements, conduct analysis, and develop and implement solutions that made the network more efficient.
I focused on the design and optimization of RAN, BSC/MSC, Transmission, OM, and Services Networks, identifying areas for improvement and developing solutions to address them. I also applied project management skills and agile methodologies to complete projects on time and within budget while adapting to changing customer requirements.

Skills: Mobile Switching Centre Server (MSS), Computer Network Operations