Aneesh Pulickal Karunakaran

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 21 yrs 1 mo experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in managing large-scale Hadoop clusters.
  • Proven track record in building resilient systems.
  • Strong background in capacity planning and performance tuning.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in managing large-scale data systems.

Skills

Core Skills

Distributed Systems · Microservices Architecture · Site Reliability Engineering · Capacity Planning · Hadoop Management · Operations

Other Skills

Chaos/Resilience Framework · SLA/SLO · Observability · Apache Hadoop · Presto · Performance Tuning · Automation · HDFS · YARN · Monitoring · Hadoop Ecosystem · Shell Scripting · Apache · Unix Shell Scripting · TCP/IP

About

  • Define and implement SLAs/SLOs.
  • Experience with microservices architecture.
  • Manage large-scale Hadoop cluster ecosystems, including design, capacity planning, cluster setup, performance tuning, and ongoing monitoring.
  • Design the physical architecture of scalable, manageable, complex web server deployments.
  • Develop business continuity plans for clusters and server farms.
  • Capacity planning.
  • Create automation tools for tasks that require manual attention.
  • Strong knowledge of the Linux operating system.
  • Python programming: worked as a team member on various assigned projects, with ongoing improvement of knowledge level.

Experience

Total Experience: 21 yrs 1 mo
Average Tenure: 3 yrs 6 mos
Current Experience: 6 yrs

Atlassian

Principal Engineer

Jun 2020 - Present · 6 yrs · Bengaluru

  • Building Chaos/Resilience Framework for Atlassian.
Chaos/Resilience Framework · Distributed Systems · Microservices Architecture

Uber

Sr Software Engineer

Mar 2019 - May 2020 · 1 yr 2 mos · Bengaluru Area, India

  • Worked as the lead SRE on the Uber Marketplace Platform (MP) SRE team. Joined as the first SRE and bootstrapped a team of 6+ members. Involved in hiring, training, and mentoring team members from inception.
  • Implemented SLAs/SLOs for Marketplace Platform services.
  • Platform-level and application-level capacity testing and planning.
  • Improved observability of the Marketplace Platform.
SLA/SLO · Capacity Planning · Observability · Site Reliability Engineering

LinkedIn

Sr Site Reliability Engineer

Sep 2016 - Mar 2019 · 2 yrs 6 mos · Bengaluru Area, India

  • Managed 100 PB+ Apache Hadoop clusters and Presto clusters at LinkedIn.
  • Responsible for Hadoop performance, reliability, capacity planning, and monitoring.
  • Supported other Hadoop ecosystem components such as Apache Hive, Pig, and Spark.
  • Automated various tasks, including cluster management tools and a capacity planning tool.
  • Worked with multiple configuration management systems such as Bcfg2 and Salt.
  • Played an active role in the planning and implementation of Presto at LinkedIn.
Apache Hadoop · Presto · Performance Tuning · Automation · Hadoop Management · Site Reliability Engineering

InMobi

Tech Lead, Operations

Jul 2014 - Sep 2016 · 2 yrs 2 mos · Bangalore

  • Primary ownership of the Grid Platform and the Data Streaming/Messaging platform
      o Deep operational understanding of HDFS and YARN.
      o Hands-on experience with other Grid components: HBase, ZooKeeper, Oozie, and Falcon.
      o Worked closely with the platform dev team and took an active part in discussions on improving cluster stability, adding new features, configuration tuning, operability improvements, and ongoing issues.
      o Implemented NameNode and ResourceManager high availability.
      o Implemented cgroups in YARN to control and govern CPU utilization across tasks.
      o Capacity planning and cluster augmentation.
      o Troubleshooting issues related to the Hadoop ecosystem.
      o Troubleshooting issues related to Linux OS, hardware, etc.
      o Graphing of Hadoop JMX metrics using Grafana and Graphite.
      o Owned the InMobi messaging/streaming platform, which consists of the Scribe messaging service and Conduit.
  • Hadoop Upgrades
      o Planned and executed the Cloudera (CDH4) to Hortonworks (HDP) 2.2.4 (Hadoop 2.6.0) upgrade. Worked closely with dev, QA, and other stakeholders to complete the upgrade without any surprises.
      o Handled minor version upgrades (patches, bug fixes) and configuration changes across grid clusters.
  • Hadoop Monitoring
      o Graphing of HDFS and YARN JMX metrics using Graphite and Grafana.
      o Graphing of system metrics using collectd, Graphite, and Grafana.
      o Used Nagios for sending on-call alerts.
  • HDFS/YARN Tuning
      o Very good understanding of HDFS and YARN configurations.
HDFS · YARN · Capacity Planning · Monitoring · Hadoop Management · Operations

Yahoo

Tech Lead

Dec 2007 - Jul 2014 · 6 yrs 7 mos · Bengaluru Area, India

  • Managed Yahoo advertising applications using the Yahoo Hadoop ecosystem.
  • Work with cross-functional team to onboard new pipelines
  • Capacity planning and load testing on hardware.
  • Primary ownership of Yahoo Image search crawlers and thumbnails
  • Prepared and executed migration plans for upgrades such as FreeBSD to RHEL, PHP 4 to PHP 5, and legacy components to Yahoo standard components.
  • Configuration and management of RHEL hosts which serve Yahoo Image Search web traffic.
  • Performance tuning for Image Search web servers and backend clusters running on RHEL.
  • Created automation tools for tasks that require manual attention on a daily basis.
  • Implemented Business Continuity Plan for multiple components.
Hadoop Ecosystem · Capacity Planning · Performance Tuning · Hadoop Management · Operations

Poornam Info Vision

Sr Software Engineer

Mar 2004 - Nov 2006 · 2 yrs 8 mos

Education

MES College of Engg, Kerala

B.Tech — Electronics and Instrumentation

Jan 1998 - Jan 2002
