Basavaiah Thambara

SRE (Site Reliability Engineer)

Milpitas, California, United States · 14 yrs 11 mos experience
Highly Stable

Key Highlights

  • 18+ years in reliability engineering and distributed systems.
  • Proven track record in cost-saving optimizations.
  • Recognized leader with patents and conference presentations.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with extensive experience in distributed systems and reliability engineering.

Skills

Core Skills

Distributed Systems, Site Reliability Engineering, Database Administration

Other Skills

Rust, GoLang, Automation, Kubernetes, Espresso, Blob storage, Block storage, Performance Engineering, MySQL Optimization, Auto Rebalancing Architecture, Rack Awareness, Capacity Estimation, Postmortem Reviews, MySQL, Performance Tuning

About

Staff Software Engineer, Site Reliability with 18+ years of experience building and leading reliability engineering initiatives in large-scale, mission-critical environments. Proven expertise in distributed systems, cloud-native infrastructure, observability, automation, and performance engineering. Adept at designing highly available systems, reducing operational toil, and leading cross-functional initiatives that improve resiliency, scalability, and cost efficiency. Recognized technical leader with patents and conference presentations in reliability engineering and distributed databases.

Core Skills

  • Programming languages: Bash, Python, Rust, Go, Java, and C/C++
  • SDLC experience: design and coding, design and code reviews, unit testing, integration and performance testing, CI/CD (GitHub Actions), safe automated rollouts (canary/blue-green/rollbacks), custom automations for release pipelines, Jira
  • Large-scale distributed systems: design and operate services at scale
  • Systems & IaC: Linux, Docker, Kubernetes, Ansible, Terraform, AWS, GCP, and Azure
  • Workflow engines: Airflow and Temporal
  • Databases: SQL, MySQL, Oracle, PostgreSQL, Kafka, Redis, MongoDB, Espresso (LinkedIn distributed document store), TiDB
  • SRE fundamentals: SLIs/SLOs/error-budget mindset, production reliability practices, incident and change management lifecycle, observability with metrics/logs/tracing/alerting, participation in 12×7 on-call rotations, driving on-call toil reduction efforts
  • Observability & monitoring: Prometheus, Grafana, ELK, Nagios, custom logging/metrics pipelines, Azure Data Explorer (Kusto)
  • Storage: distributed storage/DFS, K8s storage (PV/PVC/StorageClass/CSI), I/O profiling and throughput/latency tuning
  • Leadership & collaboration: building teams, cross-org technical leadership, mentoring and coaching, execution through others, clear written and verbal communication, partnering with product and infra teams

Experience

Total Experience: 14 yrs 11 mos
Average Tenure: 5 yrs 1 mo
Current Experience: 5 mos

Apple

Site Reliability Engineer

Dec 2025 – Present · 5 mos · California, United States · On-site

  • Responsible for Voldemort (a distributed key-value store)

LinkedIn

3 roles

Leadership | Staff Site Reliability Engineer

Mar 2020 – Present · 6 yrs 2 mos

  • Storage systems
  • Blob storage system (Rust): build a highly scalable blob storage system serving both HDD- and SSD-based storage across LinkedIn
  • Block storage (GoLang): build high-performance block storage with SPDK libraries and RDMA over Ethernet
  • Automated Espresso storage node capacity management and reduced human toil
  • Design and build automation to migrate Espresso (140 clusters) to Kubernetes
  • Designed and implemented a novel bootstrap method that brought node swap time down from 2 days to 6 hours and reduced cluster expansion time by 50%
  • Designed and implemented a tech-refresh project for 100 Espresso clusters to improve performance and save costs
  • Homogenized hardware across clusters, saving around $xM
  • Automated capacity management, which helped with cluster capacity re-certification and saved $xxM
  • Bootstrapped a MySQL server engineering team within SRE and built multiple features, including a bulk transaction throttling feature to prevent incidents caused by bulk transactions
Rust, GoLang, Automation, Kubernetes, Espresso, Blob storage, +3

Staff Software Engineer Reliability

Promoted

Oct 2015 – Mar 2020 · 4 yrs 5 mos

  • Espresso is a distributed, online, fault-tolerant document store built on top of MySQL that scales horizontally. Most of the site-facing databases are hosted on Espresso: hundreds of clusters and tens of thousands of machines storing petabytes of data. Managed Espresso for almost 3 years; accomplishments include:
  • Proposed and implemented a MySQL optimization that saved $M
  • Migrated the whole Espresso infrastructure to an auto-rebalancing architecture, which saved $$M and simplified cluster expansions and shrinks
  • Worked with developers to design and implement rack awareness in Espresso, which makes the system resilient to top-of-rack switch failures
  • Worked closely with developers to design and document Espresso cluster expansions, shrinks, and database moves; also performed more than 50 cluster expansions
  • Identified inefficiencies in backups, restores, and secondary indexes that affected cluster migrations, expansions, shrinks, and operability, and proposed solutions for them
  • Reviewed failure and recovery scenarios and proposed solutions to save SSD cost and avoid data loss in the system
  • Performed capacity estimation to meet traffic targets
  • Consulted on complicated outages, schema designs, postmortem reviews, and system design reviews, and provided technical guidance to dev and SRE management
  • Presented at internal and external SRECon conferences and local Kafka meetups
  • Gave multiple internal trainings and mentored SREs
  • Before taking on the Espresso charter in this role, was part of the database administration (DBA) team and contributed to the following projects:
  • SlideShare database UTF-8 migration with negligible downtime
  • MySQL Oneclick ETL to Hadoop and MySQL Change data capture to Kafka
  • Ideation of Query Analyzer to capture MySQL queries in real time
  • MySQL and Oracle schema and query tuning for multiple databases; data center build-outs
  • Successfully filed two patents on incremental data audits
MySQL Optimization, Auto Rebalancing Architecture, Rack Awareness, Capacity Estimation, Postmortem Reviews, Distributed Systems, +1

Senior Software Engineer Database Reliability

Mar 2013 – Sep 2015 · 2 yrs 6 mos

  • As a lead database engineer, supported around 100 MySQL database servers and 200 Oracle database servers. The majority of the Oracle databases and a few of the MySQL databases were site facing; the rest supported internal tools. I also supported Teradata databases. Major work included:
  • Audited and reviewed many MySQL databases and improved their performance
  • Automated GoldenGate parameter generation, which reduced the manual work required for Oracle GoldenGate active-active setup by 70%
  • Participated in on-call rotation (12x7 support) for MySQL and Oracle
  • Performed capacity planning and business continuity planning for MySQL databases
  • Built multiple Oracle databases and set up GoldenGate bidirectional replication
  • Architected new solutions for MySQL databases and new projects
  • Implemented MySQL active-active setup using Oracle GoldenGate replication
  • Implemented a framework to get incremental changes from MySQL and merge them with full dumps on Hadoop using Tungsten Replicator, automated with Python, Pig (embedded in Python), and Azkaban jobs
MySQL Administration, Oracle Database Support, Capacity Planning, Business Continuity Planning, Database Administration

Yahoo!

2 roles

Lead Systems Engineer (Database)

Promoted

Apr 2011 – Feb 2013 · 1 yr 10 mos · Bangalore, India

  • Promoted from Senior Database Engineer to Lead Database Engineer. My team became the central MySQL team for the whole of Yahoo!. In this role I supported more than 50 MySQL databases running on 1000+ servers across all environments, in addition to what we were already supporting. The list includes databases across many product groups, such as Yahoo audience, communications and communities, search, corporate telecom (Asterisk), video search, content platform, and the company-wide SharedDb high-availability platform. A few of the products:
  • RightMedia Exchange (RMX)/NGD - Display advertising platform
  • Themis – tiny ad platform serving Yahoo Groups
  • VideoSearch – video tag classification data store
  • RichMedia – small ad platform
  • Ionix – creative testing platform
  • Dapper – Smart ads platform
  • SharedDb – Yahoo shared MySQL infrastructure
  • Mail – Mail antispam data
  • Koprol – social networking service
  • Asterisk – internal telecom databases
  • SDS – data pipeline meta information
  • Nova – Hadoop metastore using MySQL Cluster
  • Yahoo Voices – Associated Content
  • I took complete ownership of multiple projects of varying complexity, drove them flawlessly, and improved multiple systems. I always led the team with commitment and hard work. Some of the major activities:
  • Schema and query reviews and performance tuning
  • Perform capacity planning and business continuity planning
  • Design optimal backup and recovery procedure specific to each database
  • Facilitate installation, configuration, upgrades and schema deployment with minimum downtime
  • Interact with development and service engineering teams to ensure the overall application design is optimal
  • Automate crucial maintenance tasks using shell scripts and Perl; write stored procedures, events, and triggers
  • Additionally, provided MySQL trainings for multiple teams, including a few company-wide trainings
  • Trained a new team member for the MySQL DBA role
MySQL, Database Performance Tuning, Backup and Recovery Procedures, Automation, Database Administration

Senior Systems Engineer (Database)

Jun 2007 – Mar 2011 · 3 yrs 9 mos · Bangalore, India

  • Joined Yahoo! as a Senior Database Engineer after my M.Tech. Started working on MySQL administration and managed multiple MySQL databases at scale; also worked on a few Oracle databases. The major projects I worked on:
  • RightMedia Exchange (RMX): an online advertising marketplace that enables advertisers, publishers, and ad networks to trade digital media efficiently. Yahoo! acquired it in 2007, and it was Yahoo!'s high-priority, revenue-generating display advertising platform. I had the pleasure of working on the RightMedia MySQL databases from the acquisition onward and was involved in almost all projects, from migration to database performance improvement. RightMedia had around 25 MySQL databases across 250+ servers
  • Themis – tiny ad platform serving Yahoo Groups, 30 MySQL servers
  • VideoSearch – video tag classification data store, 25 MySQL servers
  • As a senior MySQL DBA, I worked extensively with all project development and operations teams to provide the best MySQL database support. Major work included:
  • Set up Nagios monitoring for all RightMedia MySQL databases
  • Participated in on-call rotation (12x7 support)
  • Upgraded RightMedia databases from MySQL 5.0 to MySQL 5.1; performed disk upgrades for multiple MySQL clusters
  • Migrated legacy RightMedia MySQL databases to the Yahoo! stack, which involved building out multiple data centers, cutovers, failover, and fail-back
  • Decommissioned all MySQL databases from one data center
  • Implemented load balancing, high availability, and BCP certifications for MySQL databases
  • Performed Oracle RAC upgrades in QA/dev environments and a few production upgrades; automated duplicate alert elimination
  • Automated binlog purge/backup, table partition management (add/purge), and management of multiple MySQL instances
  • Performed character set migration from latin1 to utf8 and a database merge of the Themis database cluster
  • Additionally, mentored a new joiner as a MySQL DBA and guided an intern through his internship project
MySQL Administration, Performance Improvement, High Availability, Load Balancing, Database Administration

Education

National Institute of Technology Karnataka

M.Tech — Computer Science and Engineering

Jan 2005 – Jan 2007

Jawaharlal Nehru Technological University

B.Tech — Computer Science and Engineering

Jan 2001 – Jan 2005
