Aakarshit Tyagi

CTO

Greater Toronto Area, Canada · 6 yrs 8 mos experience
Highly Stable

Key Highlights

  • Improved transaction success rates by 33% in fintech.
  • Reduced database provisioning time by 77% using IaC.
  • Led zero-downtime upgrades for major banking platforms.
Stackforce AI infers this person is a Site Reliability Engineer in the Fintech industry, specializing in high-availability systems.

Skills

Core Skills

Site Reliability Engineering, Cloud Computing, Infrastructure as Code, DevOps, Middleware Engineering, Automation, Systems Architecture, Web Development

Other Skills

ANSYS, Ab Initio, Adobe Photoshop, Analytical Skills, Apache, Bash, Border Gateway Protocol (BGP), Business Communications, C++, CATIA, Chef, Critical Reading, Critical Thinking, DNS services migration, Design Thinking

About

I am a Site Reliability Engineer with 7+ years of experience building scalable, fault-tolerant systems in fintech and telecom. I have a proven track record of automating infrastructure, improving performance metrics, and delivering resilient production environments handling over 400M financial transactions daily. In my free time I enjoy travelling, discovering new places, working out at the gym, cycling, riding and driving, listening to music, and socialising. I'm looking for long-term work that aligns with my experience and growth journey.

At PhonePe (Walmart), I was a Senior SRE for a platform processing over 400 million financial transactions daily. I developed a least-time algorithm that improved success rates by 33% and cut p99 latencies by 90%. I also wrote infrastructure as code that reduced database provisioning time by 77%, and optimised stream compression to cut backup time by 60% for 40 PB of daily data.

At Airtel X Labs, I developed Ansible playbooks integrated with Jenkins to manage infrastructure as code. I led zero-downtime database upgrades for the Ab Initio platform and engineered high-availability solutions for critical services, including extending Hashicorp Vault to achieve a 5-minute Recovery Point Objective (RPO).

Before that, as a Systems Engineer at Infosys for Citibank, I led a project to automate the builds of the Ab Initio Data platform, which resulted in a 95% reduction in manual effort and significantly improved infrastructure consistency.

Core Competencies: Site Reliability Engineering (SRE), Infrastructure as Code (Ansible), Database Management (MariaDB Galera), CI/CD, Containerization (Podman, Docker), Secrets Management (Hashicorp Vault), High-Availability Systems, Python (FastAPI), Linux Systems (RHEL, Ubuntu), Networking

Experience

OpenText

Lead Engineer

Nov 2025 - Present · 4 mos · Richmond Hill, ON

PhonePe

2 roles

Site Reliability Engineer III

Jul 2024 - Jul 2025 · 1 yr

  • Site Reliability Engineer 3 on the Tech Infrastructure Team, responsible for the core UPI banking infrastructure.
  • Architected and deployed new, compliant Payment Service Provider (PSP) environments for major partners, including Axis Bank and Yes Bank. This involved creating novel solutions for unique disconnected environments and expanded the organization's capacity for handling UPI transactions.
  • Managed and executed successful multi-datacenter Disaster Recovery (DR) drills for critical Axis and YBL PSP infrastructures, ensuring business continuity and service resilience for millions of users.
  • Spearheaded significant capacity expansion projects and complex database migrations for the UPI platform, directly supporting a growing user base and increasing transaction volumes while improving overall service reliability (SR).
  • Drove research and implementation of key infrastructure enhancements to improve performance and stability, including developing an Nginx rebalancer script for better response times, migrating core DNS services to an Anycast setup, and stress-testing new hardware to confirm infrastructure stability.
Payment Service Provider (PSP) environments, Disaster Recovery (DR) drills, Infrastructure enhancements, Nginx rebalancer script, DNS services migration, Site Reliability Engineering

Site Reliability Engineer II

Aug 2021 - Jun 2024 · 2 yrs 10 mos

  • Developed least-time algorithm for nginx+ to shape upstream network traffic to banks that improved success rates by 33% and reduced p99 latencies by 90%
  • Designed and implemented high-availability and DR for largest banks in India (Yes, Axis and ICICI) for a 15-minute failover of 400 million daily QR-based payments
  • Infrastructure Management as code using Ansible for building MariaDB Galera clusters with support for templated configurations, reducing provisioning time by 77%
  • Performed database splits and upgrades for sharded-MariaDB (Galera) increasing capacity by 70%
  • Developed alerting using Riemann events that monitored latency and errors to improve MTTD
  • Benchmarked and deployed stream compression using Zstd to reduce log backup and mariabackup time by 60% with improved network utilization for 40 PB of data per day
  • Improved SSL transparency alerts by scripting OpenSSL s_client based monitoring
  • Debugged complex issues: DNS PTR failures (djbdns), Linux Virtual Server healthchecks, online DDL, DB migrations, BGP ECMP routing, synchronous/async replication, keepalived, KVM
  • Deployed local container registry as a cache for Azure to speed up deployments by 98%
  • Deployed Gitlab with user lifecycle management and automated backups to OpenZFS
  • Designed network and access policies for Thales HSMs to secure KYC communication
least-time algorithm, high-availability solutions, Infrastructure as Code (Ansible), database upgrades, stream compression, Site Reliability Engineering

Airtel X Labs

Software Engineer DevOps

Aug 2020 - Jul 2021 · 11 mos · Gurgaon, Haryana, India

  • Worked as a software engineer with a focus on DevOps.
  • + Implemented ELA licensing model within DARTS to onboard new developers and infrastructure instantly
  • + Mitigated unplanned shutdowns of the licensing daemon to ensure site reliability
  • + Worked on application clustering to enable High Availability via load balancing
  • + Performed zero downtime database migration and schema model upgrades for business critical authorization application using A/B schema strategy
  • + Built elaborate pipelines for infrastructure automation and maintenance
  • + Config as code via Ansible
  • + Led Airtel's biggest business intelligence application upgrades to modern HTML5 compliant standards including responsibilities like ownership, delegation and communication with external stakeholders.
  • + Centralized password management for ephemeral functional user passwords and painless password rotation for legacy systems
  • + Built modern secrets management at Airtel using Hashicorp Vault.
  • Skills: Ansible, Jenkins, Shell, Linux, Systems Architecture, Public key cryptography, OracleDB, Ab Initio, Apache Hadoop, Hashicorp Products
ELA licensing model, High Availability, zero downtime database migration, infrastructure automation, secrets management, DevOps

Infosys

2 roles

Middleware Engineer

Jan 2019 - Aug 2020 · 1 yr 7 mos

  • ★ Involved in a Financial Services project providing middleware engineering and troubleshooting for applications including but not limited to:
  • + Ab>Initio (Co>Operating System, Application Hub, Control>Center, EME, GDE, and Ab-Apps)
  • + IBM WebSphere
  • + Apache Tomcat
  • + Apache, IHS, Nginx
  • + Big Data (Hadoop integration)
  • + Red Hat Enterprise Linux (releases 5, 6 and 7)
  • + Databases (Oracle, MongoDB)
  • + Chef
  • + Talend
  • ★ Automated the builds and validations of Ab Initio on new Big Data Servers and standardized the build process
  • ★ Automated setup of bulk passwordless SSH connectivity on new Hadoop Clusters
  • ★ Led a key renewal activity for Ab Initio servers and developed a shell script to mass renew licenses globally via a One-Touch workflow.
  • ★ Wrote code for hostname-key implementation on global Ab>Initio hosts that reduced over a month of manual efforts to a few minutes via centralized license management
  • ★ Pioneered process improvements leading to decreased incident count (Emulating the Co>Operating system install to cut the setup time, generating customized reports, multiplexer/Kerberos session layer maintenance, instantiated tomcat startup and more)
  • ★ Involved in a sub-project that aims to automate the startup of essential application services through systemd/init.d shutdown snapshots that will lead to 50+ hours of efforts saved per week post-reboot/patching making the hosts highly available
  • ★ Involved in high impact, critical project deliverables as well as onboarding new middleware integration (Ex - CyberArk integration, Disaster Recovery)
  • ★ Involved in process improvements like the migration of Ab Initio jobs to dynamic layouts, Hadoop cluster expansions, RHEL 7 in-place upgrades
  • ★ Deep Expertise in Linux, Middleware environment
  • ★ Recognized contributor to my project, awarded as Best Debutant for Q3, 2019
  • Skills:
  • + Shell Scripting
  • + Python
  • + Perl
  • + RHEL administration
  • + Ab Initio administration
middleware engineering, automated builds, passwordless SSH connectivity, process improvements, Middleware Engineering, Automation

Engineer Trainee

Sep 2018 - Dec 2018 · 3 mos

  • ★ Unit: Cloud, Infrastructure, and Security
  • ★ Attended corporate training courses in:
  • + Python
  • + SQL
  • + Powershell Scripting
  • + Red Hat Enterprise Linux
  • + Microsoft Active Directory
  • + Microsoft Exchange
  • + Rights Management Services
  • + Windows Deployment Services
  • + Computer Networks
  • + Design Thinking
  • + Management & Organizational skills

Ipastore.me

Co-Founder & Systems Architect

Apr 2012 - Sep 2013 · 1 yr 5 mos · Worldwide

  • Worked on a substitute for Installous and Apptrackr when the parent forum Hackulous was shut down. ipastore was an iOS application/website that provided third party applications and trials for iOS devices. The team consisted of 6 members. My responsibilities included:
  • ★ Managing the backend (nginx, haproxy, php, varnish, mariadb, redis and memcached configurations across our master servers)
  • ★ Building/replicating the hosting stack
  • ★ Scaling the web application via database optimization & caching, application layer caching and edge content delivery (1+ million page views per day)
  • ★ Automating On and Off-site backups to Amazon Glacier for application and databases
  • ★ Implementing and testing caching techniques (edge, fastcgi, application opcode and database caches)
  • ★ System Security & Stability
  • ★ Managing iptables and nginx block rules to mitigate DDoS and brute-force attacks and throttle suspicious attempts
  • ★ Penetration testing
  • ★ Tweaking system parameters for high performance and high availability
  • ★ Forum interaction
  • ★ Mail Server (Exim4, Postfix)
  • ★ DNS (Cloudflare) and much more
  • ★ Managing distributed Linux servers
backend management, database optimization, system security, scaling web applications, Systems Architecture, Web Development

Education

SVKM's Narsee Monjee Institute of Management Studies (NMIMS)

Diploma

Jul 2022 - Jul 2023

UPES

Bachelor of Technology

Jan 2014 - Jan 2018
