Tahir Mehraj

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 10 yrs 8 mos experience
Highly Stable

Key Highlights

  • Reduced AWS costs by 35% at Atlassian.
  • Saved 906 engineering hours monthly through automation.
  • Achieved 99.9% uptime for critical services.
Stackforce AI infers this person is a Site Reliability Engineer with extensive experience in cloud infrastructure and automation in the SaaS industry.

Skills

Core Skills

Site Reliability Engineering, Cloud Architecture, Monitoring & Observability, Continuous Integration And Continuous Delivery, Data Center Architecture

Other Skills

Kubernetes, Python (Programming Language), AWS, Terraform, Ansible, Datadog, Prometheus, Grafana, Java, Bash, Cost Optimization, CI/CD, Automation, Docker

About

I've spent the last 10 years in the trenches of site reliability engineering, from managing telecom networks in Kashmir to scaling cloud infrastructure at Atlassian serving 45,000+ instances monthly.

Here's what nobody tells you about SRE: the real challenge isn't just keeping systems up. It's doing it cost-effectively, at scale, without burning out your team.

THE NUMBERS

At Atlassian, I cut AWS infrastructure costs by 35% while improving reliability. Saved 906 engineering hours monthly by eliminating pipeline flakiness. Migrated 2,000+ customer instances to the cloud without a major incident.

These weren't accidents. They came from obsessive focus on building self-healing systems, making observability actually useful, automating repetitive work, and treating cost optimization as an engineering discipline.

WHAT I SHARE HERE

I write about the real, unglamorous work of SRE:

  • War stories from 3 AM incidents and what they taught me
  • Practical Kubernetes cost optimization patterns
  • Building observability that prevents outages
  • Lessons from 10 years of on-call rotations
  • The business case for reliability engineering
  • Honest takes on cloud architecture decisions

No fluff. No theory without practice. Just what actually works when you're responsible for production systems.

MY BACKGROUND

Senior SRE at CrowdStrike, previously at Atlassian. Working with AWS, Kubernetes, Datadog, Terraform, Python, and the cloud-native ecosystem.

Started in telecom network operations, moved through NOC and infrastructure support, eventually landing in SRE/DevOps. I've been the person getting woken up at 3 AM and the person designing systems that don't wake anyone up.

Conducted 80+ technical interviews. Mentored engineering teams. Built automation saving 1,500+ FTE days. Achieved 99.9% uptime while reducing costs.

WHY FOLLOW ME

If you're dealing with cloud costs spiraling out of control, pipelines that fail randomly, or alerts that wake you up for nothing, I've been there. I share what I learned the hard way so you don't have to.

Hit Follow for practical SRE insights without the buzzwords. Want to connect? Send me a note about what you're working on. Always up for interesting infrastructure challenges.

#SRE #DevOps #CloudArchitecture #AWS #Kubernetes #Observability

Experience

Total Experience: 10 yrs 8 mos
Average Tenure: 2 yrs 6 mos
Current Experience: 6 mos

CrowdStrike

Senior Site Reliability Engineer

Nov 2025 - Present · 6 mos

Kubernetes, Python (Programming Language), Site Reliability Engineering

Atlassian

3 roles

Software Engineer

Promoted

Sep 2024 - Nov 2025 · 1 yr 2 mos

  • Led SRE/DevOps initiatives for Atlassian's Data Center products, driving infrastructure reliability and automation at scale.
  • 🚀 Major Impact Delivered:
    • Reduced AWS costs by 35% through environment lifecycle optimization
    • Saved 906 engineering hours monthly by reducing pipeline flakiness
    • Delivered tooling contributing to 1,500+ FTE days in productivity gains
    • Achieved 99.9% uptime for critical services
  • 🛠️ Key Initiatives & Achievements:
    • Platform Engineering & Automation
      • Architected and implemented scalable environment provisioning system serving 45,000+ monthly instances
      • Reduced deployment time by 60% through IaC implementation using Terraform and Ansible
      • Built automated retry mechanisms reducing test failures by 90%
    • Observability & Reliability
      • Designed comprehensive monitoring framework using Prometheus, Grafana, and Datadog
      • Reduced MTTR by 25% through enhanced observability and automated alerting
      • Implemented proactive monitoring reducing P1 incidents by 35%
      • Created automated health checks preventing 80% of common failures
    • Security & Compliance
      • Led security automation initiative across 12+ repositories using Snyk and Renovate
      • Implemented GDPR/HIPAA compliance controls for sensitive environments
      • Automated security scanning in CI/CD pipelines, reducing vulnerabilities by 60%
      • Designed secure access management system for multi-tenant environments
    • Technical Leadership
      • Mentored 6+ engineers in SRE practices and cloud architecture
      • Conducted 80+ technical interviews as CAF certified lead interviewer
      • Led cross-functional projects improving system reliability
      • Created comprehensive technical documentation and runbooks
  • 💻 Tech Stack:
    • Cloud & Infrastructure: AWS, Kubernetes, Docker, Terraform, Ansible
    • Monitoring & Observability: Datadog, Prometheus, Grafana, ELK Stack
    • CI/CD: Jenkins, Bamboo, Bitbucket Pipelines
    • Languages: Python, Java, Bash
    • Databases: MySQL, PostgreSQL
AWS, Kubernetes, Terraform, Ansible, Datadog, Prometheus, +6

Senior DevTools Engineer

Promoted

Oct 2022 - Sep 2024 · 1 yr 11 mos

  • In this role I built expertise in RDBMS, Continuous Integration and Continuous Delivery (CI/CD), Performance Engineering, Docker, proxy servers, scripting, Amazon Web Services (AWS), Git, Jenkins, and Linux.
  • I have extensive experience in designing and implementing Continuous Integration and Continuous Delivery (CI/CD) pipelines using tools like Git, Jenkins, and Docker, which has resulted in efficient and automated software delivery for my teams.
  • My strong knowledge of Performance Engineering principles and techniques has enabled me to identify performance bottlenecks in complex systems and optimize their performance. I have also contributed to the development of performance testing strategies and test plans for various projects, which have helped to deliver quality software products.
  • I have hands-on experience with Amazon Web Services (AWS) and have worked on various services like EC2, S3, VPC, and ELB, among others. I have also worked on deploying applications to AWS using Docker containers and integrating them with proxy servers like HAProxy and Nginx.
  • My proficiency in scripting languages like Bash and Python has helped me automate repetitive tasks and create scripts to perform complex operations on Linux systems. I have also worked on database management systems like MySQL and Oracle, and have experience in installation, configuration, and optimization of these databases.
  • In summary, my expertise in RDBMS, Continuous Integration and Continuous Delivery (CI/CD), Performance Engineering, Docker, Proxy, Scripting, Amazon Web Services (AWS), Git, Jenkins, and Linux has been instrumental in successfully delivering high-quality software products.
AWS, Docker, Git, Jenkins, Linux, Continuous Integration and Continuous Delivery

DevTools Engineer

Mar 2020 - Oct 2022 · 2 yrs 7 mos

Git, Continuous Integration, Continuous Delivery

Khoros

2 roles

MCS Engineer - II

Promoted

Mar 2019 - Mar 2020 · 1 yr

DevOps, Data Center Architecture

MCS Engineer - I

Nov 2018 - Mar 2019 · 4 mos

  • As a seasoned professional with expertise in RDBMS, Data Center Administration, Performance Engineering, Docker, Elastic Search, Scripting, Amazon Web Services (AWS), Git, and Linux, I have played a key role in driving the success of numerous projects throughout my career.
  • In my previous roles, I have been responsible for the administration and optimization of data centers, ensuring that all systems and applications operate efficiently and effectively. I have leveraged my skills in Performance Engineering to identify and mitigate bottlenecks and optimize system performance. My experience with Docker has enabled me to containerize applications, making them portable and easy to manage.
  • In addition, I have extensive experience with Elastic Search, leveraging its powerful search and analytics capabilities to enable fast and efficient access to large volumes of data. I have utilized my scripting skills to automate tasks and streamline processes, improving efficiency and reducing errors.
  • Throughout my career, I have leveraged Amazon Web Services (AWS) to deliver scalable and cost-effective solutions to clients. My proficiency in Git has enabled me to effectively manage source code, collaborate with team members, and deploy changes to production systems.
  • Finally, my understanding of Linux has enabled me to effectively manage and optimize systems, from operating system configuration to application deployment and performance tuning. Overall, my diverse skill set has enabled me to be a valuable asset to any organization, and I look forward to bringing my expertise to new challenges and opportunities.
RDBMS, Data Center Administration, Performance Engineering, Docker, Elastic Search, Scripting, +4

Infogain

Critical Support Engineer

Apr 2017 - Oct 2018 · 1 yr 6 mos

  • As a dedicated member of the Network Operations Center (NOC), I bring a wealth of technical expertise and troubleshooting skills to my role. My primary responsibility is to perform technical analysis of system issues and outages across customer enterprise networks, using my deep understanding of network technologies to quickly identify the root cause of any problems.
  • Once I have identified an issue, I leverage my extensive research skills to troubleshoot and resolve the issue, using a range of tools and techniques to diagnose and address the problem as quickly and effectively as possible. If the issue is particularly complex, I work closely with higher-level Core/Escalation Engineering to develop and implement a solution.
  • Throughout this process, I am responsible for researching and documenting various mitigation strategies, using my knowledge of customer technologies to develop effective solutions that minimize the impact of any issues on customer operations. I must also stay up-to-date with the latest trends and developments in network technologies, in order to ensure that I am able to provide the highest level of service to our customers.
  • Finally, I use my excellent organizational and prioritization skills to manage issues in a 24 x 7 environment with critical uptime requirements, ensuring that all issues are addressed promptly and effectively. My dedication to providing exceptional customer service, combined with my deep technical expertise, make me an invaluable asset to any team or organization.
Site Reliability Engineering

Ericsson India

Network Operations Engineer

Aug 2015 - Apr 2017 · 1 yr 8 mos · Srinagar Area, India

  • As a seasoned telecom professional, I have extensive experience in overseeing complex projects related to the design, deployment, and optimization of telecommunications networks. In my role as a Transmission Project and Operation specialist, I was responsible for overseeing the installation and operation of IP equipment to support site rollout, as well as the dimensioning of backbone IP/CEN networks.
  • One of my primary responsibilities was to perform design optimization and audit of the network to ensure that it met customer requirements. This involved working closely with a range of stakeholders to gather requirements, conduct analysis, and develop and implement solutions that would help the network operate more efficiently and effectively.
  • Throughout my work, I focused on the design and optimization of RAN, BSC/MSC, Transmission, OM, and Services Networks, using my deep understanding of network technologies to identify areas for improvement and develop innovative solutions to address any issues. I also leveraged my expertise in project management to ensure that all projects were completed on time and within budget, using agile methodologies to adapt to changing customer needs and requirements.
  • Overall, my extensive experience in telecom, combined with my deep technical knowledge and project management skills, make me a valuable asset to any team or organization seeking to optimize their network operations and provide the highest level of service to their customers.
Mobile Switching Centre Server (MSS), Computer Network Operations

Education

University of Kashmir

Bachelor of Engineering - BE, Computer Science

Jan 2011 - Jan 2015
