Shashank Mathur

SRE (Site Reliability Engineer)

Mumbai, Maharashtra, India · 14 yrs 11 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Over 10 years of experience in Site Reliability Engineering.
  • Expert in Kubernetes and automation for high-performance solutions.
  • Proven track record in troubleshooting and debugging complex systems.
Stackforce AI infers this person is a Site Reliability Engineer specializing in SaaS infrastructure and automation.

Skills

Core Skills

Site Reliability Engineering, Automation, Systems Administration, Systems Engineering, Web Hosting

Other Skills

Kubernetes, Docker, Monitoring, Scripting, Python, Go, Puppet, Nagios, Icinga, Linux, LAMP, Apache, Nginx, MySQL, PostgreSQL

About

Staff Site Reliability Engineer with over 10 years of experience designing, building, and operating large-scale distributed systems. My areas of expertise include:

  • Kubernetes: Designing, deploying, and managing distributed microservices web applications on container platforms like Docker to deliver high-performance solutions.
  • Automation: Streamlining processes and enhancing productivity by leveraging tools such as Python and Go for scripting and configuration management (e.g., Puppet).
  • Observability: Ensuring optimal system performance by implementing metrics, monitoring, and alerting strategies to identify and address issues proactively.
  • Troubleshooting and Debugging: Strong analytical and troubleshooting skills with the ability to debug complex issues and system outages. Skilled at identifying root causes and implementing both short-term and long-term remediation.
  • Teamwork and Collaboration: Excellent communication and interpersonal skills with a track record of working cross-functionally to solve complex technical challenges. Able to mentor and guide other engineers to help strengthen team and organizational knowledge.
  • Networking: Solid understanding of protocols (HTTP, TCP), web technologies (webservers, load balancers), and network architecture to troubleshoot and optimize connectivity.

Experience

Total Experience: 14 yrs 11 mos
Average Tenure: 4 yrs 11 mos
Current Experience: 10 yrs 6 mos

OpenTable

Staff Site Reliability Engineer

Nov 2015 - Present · 10 yrs 6 mos · Mumbai Area, India

  • Develop and maintain scalable infrastructure components and tools for infrastructure monitoring.
  • Evaluate and add support for new operations tools.
  • Develop and enhance monitoring tools to manage services and applications developed and used by OpenTable.
  • Manage availability, latency, scalability, and efficiency of OpenTable’s services by engineering reliability into software and systems.
  • Respond to and resolve emergent service problems; build tools and automation to prevent problem recurrence.
  • Review and influence new and evolving design, architecture, standards, and methods for operating services and systems.
  • Participate in software and system performance analysis and tuning, service capacity planning, and demand forecasting.
  • Use dynamic programming and scripting languages and tools, such as Ruby, Python, and shell, to architect, implement, and integrate build software and productivity tools.
Kubernetes, Docker, Automation, Monitoring, Scripting, Python +3

Directi

Senior Systems Administrator

Jul 2013 - Nov 2015 · 2 yrs 4 mos · Mumbai Area, India

  • Monitor the stability of servers using tools like Nagios, Icinga, Ganglia, and other internal tools.
  • Automate and implement permanent resolutions to prevent outages and downtime.
  • Script and code tools for automation and efficient management of sites and products.
  • Handle incident response, troubleshooting, and fixes for various products and services.
  • Handle escalations as per policies and procedures.
  • Manage Puppet configuration management.
  • Manage products using Linux and Linux application stacks (LAMP, Postgres, MySQL, etc.).
Nagios, Icinga, Automation, Linux, Puppet, Systems Administration

Gigapros Networks, LLC

Systems Engineer

Jun 2011 - Jul 2013 · 2 yrs 1 mo · Jabalpur Area, India

  • Deploying new servers for web hosting and other applications using the LAMP stack.
  • Configuring and managing servers running web servers (Apache, Nginx), DNS servers (named, PowerDNS), mail servers (Sendmail, Postfix), databases (MySQL, PostgreSQL), and services like FTP, Dovecot, etc.
  • Monitoring servers and handling server downtime, service failures, resource shortages, high load, high disk usage, spamming, and DoS attacks.
  • Implementing and administering various security and performance enhancements such as mod_security, CSF/LFD, iptables, etc.
LAMP, Apache, Nginx, MySQL, PostgreSQL, Security Enhancements +2

Education

freeCodeCamp

Full Stack Web Development Certification — Computer Software Engineering

Jan 2015 - Jan 2016

Rajiv Gandhi Prodyogiki Vishwavidyalaya

B.E. — Information Technology

Jan 2007 - Jan 2011
