Shashank Mathur

SRE (Site Reliability Engineer)

Mumbai, Maharashtra, India · 14 yrs 11 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

Over 10 years of experience in Site Reliability Engineering.
Expert in Kubernetes and automation for high-performance solutions.
Proven track record in troubleshooting and debugging complex systems.

Stackforce AI infers this person is a Site Reliability Engineer specializing in SaaS infrastructure and automation.

Skills

Core Skills

Site Reliability Engineering, Automation, Systems Administration, Systems Engineering, Web Hosting

Other Skills

Kubernetes, Docker, Monitoring, Scripting, Python, Go, Puppet, Nagios, Icinga, Linux, LAMP, Apache, Nginx, MySQL, PostgreSQL

About

Staff Site Reliability Engineer with over 10 years of experience designing, building, and operating large-scale distributed systems. My areas of expertise include:
• Kubernetes: Designing, deploying, and managing distributed microservices web applications on container platforms like Docker to deliver high-performance solutions.
• Automation: Streamlining processes and enhancing productivity by leveraging tools such as Python and Go for scripting and configuration management (e.g., Puppet).
• Observability: Ensuring optimal system performance by implementing metrics, monitoring, and alerting strategies to identify and address issues proactively.
• Troubleshooting and Debugging: Strong analytical and troubleshooting skills with the ability to debug complex issues and system outages. Skilled at identifying root causes and implementing both short-term and long-term remediation.
• Teamwork and Collaboration: Excellent communication and interpersonal skills with a track record of working cross-functionally to solve complex technical challenges. Able to mentor and guide other engineers to help strengthen team and organizational knowledge.
• Networking: Solid understanding of protocols (HTTP, TCP), web technologies (webservers, load balancers), and network architecture to troubleshoot and optimize connectivity.

Experience

Total Experience: 14 yrs 11 mos

Average Tenure: 4 yrs 11 mos

Current Experience: 10 yrs 6 mos

OpenTable

Staff Site Reliability Engineer

Nov 2015 – Present · 10 yrs 6 mos · Mumbai Area, India

Develop and maintain scalable infrastructure components and tools for infrastructure monitoring.
Evaluate and add support for new operations tools.
Develop and enhance monitoring tools to manage the services and applications developed and used by OpenTable.
Manage availability, latency, scalability, and efficiency of OpenTable’s services by engineering reliability into software and systems.
Respond to and resolve emergent service problems; build tools and automation to prevent problem recurrence.
Review and influence new and evolving design, architecture, standards, and methods for operating services and systems.
Participate in software and system performance analysis and tuning, service capacity planning, and demand forecasting.
Use dynamic programming and scripting languages such as Ruby, Python, and Shell to architect, implement, and integrate build software and productivity tools.

Kubernetes, Docker, Automation, Monitoring, Scripting, Python, +3 more

Directi

Senior Systems Administrator

Jul 2013 – Nov 2015 · 2 yrs 4 mos · Mumbai Area, India

Monitoring the stability of servers using tools like Nagios, Icinga, Ganglia and other internal tools.
Automation and implementation of permanent resolutions to prevent outages / downtimes.
Script and code tools for automation and efficient management of sites/products.
Handle incident response, troubleshooting, and fixes for various products and services.
Handle escalations as per policies/procedures.
Puppet configuration management.
Managing products using Linux and Linux application stacks (LAMP, Postgres, MySQL, etc.).

Nagios, Icinga, Automation, Linux, Puppet, Systems Administration

Gigapros Networks, LLC

Systems Engineer

Jun 2011 – Jul 2013 · 2 yrs 1 mo · Jabalpur Area, India

Deploying new servers for web hosting and other applications, implementing the LAMP stack.
Configuring and managing servers running web servers (Apache, Nginx), DNS servers (named, PowerDNS), mail servers (Sendmail, Postfix), databases (MySQL, PostgreSQL), and services like FTP, Dovecot, etc.
Monitoring servers and handling server outages, service failures, resource shortages, high load, high disk usage, spamming, and DoS attacks.
Implementing and administering various security and performance enhancements such as mod_security, CSF/LFD, iptables, etc.

LAMP, Apache, Nginx, MySQL, PostgreSQL, Security Enhancements, +2 more