Aneesh Pulickal Karunakaran

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 21 yrs 1 mo experience

Most Likely To Switch · Highly Stable

Key Highlights

Expert in managing large-scale Hadoop clusters.
Proven track record in building resilient systems.
Strong background in capacity planning and performance tuning.

Stackforce AI infers this person is a Site Reliability Engineer with expertise in managing large-scale data systems.

Skills

Core Skills

Distributed Systems, Microservices Architecture, Site Reliability Engineering, Capacity Planning, Hadoop Management, Operations

Other Skills

Chaos/Resilience Framework, SLA/SLO, Observability, Apache Hadoop, Presto, Performance Tuning, Automation, HDFS, YARN, Monitoring, Hadoop Ecosystem, Shell Scripting, Apache, Unix Shell Scripting, TCP/IP

About

• Define and implement SLAs/SLOs.
• Experience with microservices architecture.
• Manage large-scale Hadoop cluster ecosystems, including design, capacity planning, cluster setup, performance tuning, and ongoing monitoring.
• Design the physical architecture of complex web server deployments to make them more scalable and manageable.
• Develop business continuity plans for clusters and server farms.
• Capacity planning.
• Create automation tools for tasks that require manual attention.
• Strong knowledge of the Linux operating system.
• Python programming: worked as a team member on various assigned projects, continually building on that knowledge.

Experience

Total Experience: 21 yrs 1 mo

Average Tenure: 3 yrs 6 mos

Current Experience: 6 yrs

Atlassian

Principal Engineer

Jun 2020 – Present · 6 yrs · Bengaluru

Building Chaos/Resilience Framework for Atlassian.

Chaos/Resilience Framework, Distributed Systems, Microservices Architecture

Uber

Sr Software Engineer

Mar 2019 – May 2020 · 1 yr 2 mos · Bengaluru Area, India

Worked as the lead SRE on the Uber Marketplace Platform (MP) SRE team. Joined as the first SRE and bootstrapped a team of 6+ members. Involved in hiring, training, and mentoring team members from inception.
Implemented SLAs/SLOs for Marketplace Platform services.
Platform-level and application-level capacity testing and planning.
Improved observability of the Marketplace Platform.

SLA/SLO, Capacity Planning, Observability, Site Reliability Engineering

LinkedIn

Sr Site Reliability Engineer

Sep 2016 – Mar 2019 · 2 yrs 6 mos · Bengaluru Area, India

Managed 100PB+ Apache Hadoop clusters and Presto clusters at LinkedIn.
Responsible for Hadoop performance, reliability, capacity planning, and monitoring.
Supported other Hadoop ecosystem components such as Apache Hive, Pig, and Spark.
Automated various tasks, including cluster management tools and a capacity planning tool.
Worked with multiple configuration management systems such as Bcfg2 and Salt.
Played an active role in the planning and implementation of Presto at LinkedIn.

Apache Hadoop, Presto, Performance Tuning, Automation, Hadoop Management, Site Reliability Engineering

InMobi

Tech Lead, Operations

Jul 2014 – Sep 2016 · 2 yrs 2 mos · Bangalore

Primary ownership of the Grid Platform and the Data Streaming/Messaging platform
o Deep operational understanding of HDFS and YARN.
o Hands-on experience with other Grid components: HBase, ZooKeeper, Oozie, and Falcon.
o Worked closely with the platform dev team and took an active part in discussions on improving cluster stability, adding new features, configuration tuning, operability improvements, and ongoing issues.
o Implemented NameNode and ResourceManager high availability.
o Implemented cgroups in YARN to control and govern CPU resource utilization across multiple tasks.
o Capacity planning and cluster augmentation.
o Troubleshooting issues related to the Hadoop ecosystem.
o Troubleshooting issues related to Linux OS, hardware, etc.
o Graphing of Hadoop JMX metrics using Grafana and Graphite.
o Owned the InMobi messaging/streaming platform, which consists of the Scribe messaging service and Conduit.
Hadoop Upgrades
o Planned and executed the Cloudera (CDH4) to Hortonworks (HDP) 2.2.4 (Hadoop 2.6.0) upgrade. Worked closely with dev, QA, and other stakeholders and completed the task without any surprises.
o Handled minor version upgrades (patches, bug fixes) and configuration changes across grid clusters.
Hadoop Monitoring
o Graphing of HDFS and YARN JMX metrics using Graphite and Grafana.
o Graphing of system metrics using collectd, Graphite, and Grafana.
o Used Nagios for sending on-call alerts.
HDFS/YARN Tuning
o Very good understanding of HDFS and YARN configurations.

HDFS, YARN, Capacity Planning, Monitoring, Hadoop Management, Operations

Yahoo

Tech Lead

Dec 2007 – Jul 2014 · 6 yrs 7 mos · Bengaluru Area, India

Managed Yahoo advertising applications using the Yahoo Hadoop ecosystem.
Worked with cross-functional teams to onboard new pipelines.
Capacity planning and load testing on hardware.
Primary ownership of Yahoo Image Search crawlers and thumbnails.
Prepared and executed migration plans for upgrades such as FreeBSD to RHEL, PHP 4 to PHP 5, and legacy components to Yahoo standard components.
Configuration and management of RHEL hosts serving Yahoo Image Search web traffic.
Performance tuning for Image Search web servers and backend clusters running on RHEL.
Created automation tools for tasks that required manual attention on a daily basis.
Implemented business continuity plans for multiple components.