Senthil Kumar Balaguru

Software Engineer

Bengaluru, Karnataka, India · 16 yrs 3 mos experience

Key Highlights

  • Expert in Big Data technologies and Hadoop ecosystem.
  • Significant contributions to Apache Spark and other open-source projects.
  • Proven track record in managing large-scale Hadoop clusters.
Stackforce AI infers this person is a Big Data Engineer with extensive experience in Hadoop and open-source contributions.

Skills

Core Skills

Open Source Contribution · Apache Spark · Hadoop Ecosystem Management · Big Data Engineering

Other Skills

Solr · Knox · Hudi · Spark · Troubleshooting · Patch Creation · Hadoop · HDFS · YARN · MapReduce · Hive · Pig · Impala · Flume · Kafka

Experience

Total Experience: 16 yrs 3 mos
Average Tenure: 2 yrs 7 mos
Current Experience: 2 yrs 1 mo

Acceldata

Senior Staff Software Engineer

May 2024 - Present · 2 yrs 1 mo · Greater Bengaluru Area · On-site

Visa

Staff Software Engineer

Aug 2022 - May 2024 · 1 yr 9 mos · Bengaluru, Karnataka, India

Huawei Technologies India

Senior System Architect

Sep 2021 - Aug 2022 · 11 mos · Bangalore Urban, Karnataka, India

  • Contributing to Open Source - Apache Spark
Apache Spark · Open Source Contribution

Cloudera

2 roles

Senior Software Engineer

Apr 2020 - Sep 2021 · 1 yr 5 mos

  • Worked as a Spark Backline Engineer:
  • 1. Troubleshooting Spark issues
  • 2. Creating patches for the Cloudera Spark distribution
  • 3. Contributing to open source (Apache Spark)
Spark · Troubleshooting · Patch Creation · Apache Spark

Senior Customer Operations Engineer

Oct 2016 - Mar 2020 · 3 yrs 5 mos

  • Worked on several Hadoop ecosystem components: HDFS, YARN, MapReduce, Spark, Hive, Pig, Impala, Solr, Flume, Kafka, Sqoop, Cloudera Manager, etc.
  • Security integration: Kerberos/AD, Sentry, and encryption over the wire and at rest
  • Cloudera Data Science Workbench
  • Roles and Responsibilities:
  • Working with customers to ensure they get the most out of their Hadoop deployments and the Cloudera Enterprise Data Hub.
  • Managing Cloudera Hadoop clusters, ranging from single-node clusters to clusters of 250+ nodes, for customers across the globe. As a Customer Operations Engineer, I worked closely with customers to understand their Cloudera cluster issues and provide resolutions.
  • Installing, configuring, and tuning Hadoop clusters according to each client's needs and environment.
  • Resident Architect at several major organizations, helping them strategize their Big Data journey, including use-case identification and implementation
  • Translate business requirements into technical requirements
  • Architect and Deploy the Cloudera distribution for Apache Hadoop
  • Deploying security and ensuring customers meet their compliance requirements
  • Data Ingestion/Migration Design and Implementation
  • ETL Design and Implementation
  • Application Architecture Design
  • Use-case development
  • Performance tuning, cluster architecture, and deployment
  • Training new joiners
Hadoop · HDFS · YARN · MapReduce · Spark · Hive · +7

The Apache Software Foundation

Open Source Software

Oct 2016 - Jan 2026 · 9 yrs 3 mos

  • Open Source Contributor for Apache Spark, Solr, Knox and Hudi
Apache Spark · Solr · Knox · Hudi · Open Source Contribution

Banca Sella, Chennai

Big Data Technical Specialist

Nov 2015 - Sep 2016 · 10 mos

  • Synopsis:
  • Set up a search engine on the Hadoop stack using Apache Solr and Cloudera Search Manager
  • Responsibilities:
  • Worked on a live 6-node Hadoop cluster running CDH 5.4.0
  • Worked with highly structured and semi-structured data of 4 TB in size (12 TB with a replication factor of 3)
  • Extracted the data from Oracle into HDFS using Sqoop.
  • Created and worked Sqoop (version 1.4.3) jobs with incremental load to populate Hive External tables.
  • Extensive experience in writing Pig (version 0.12) scripts to transform raw data from several data sources into baseline data.
  • Developed Hive (version 1.1.0) scripts for end-user and analyst requirements to perform ad-hoc analysis
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Experience in using Sequence files, ORCFile, and HAR file formats.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process
  • Good working knowledge of Amazon Web Services components such as EC2
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
  • Very good experience in monitoring and managing the Hadoop cluster
Apache Solr · Hadoop · Sqoop · Pig · Hive · Oozie · +1

IBM India Pvt Ltd

Senior Developer

May 2011 - Oct 2015 · 4 yrs 5 mos · Greater Chennai Area

  • Worked as a Big Data (Hadoop) Developer
  • Responsibilities:
  • Worked on a live 16-node Hadoop cluster running Apache Hadoop 1.2.1
  • Worked with highly structured and semi-structured data of 5 to 6 TB in size (15 to 18 TB with a replication factor of 3)
  • Extracted the data from Oracle into HDFS using Sqoop.
  • Created and worked Sqoop (version 1.4.3) jobs with incremental load to populate Hive External tables.
  • Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
  • Developed Hive (version 0.10) scripts for end-user and analyst requirements to perform ad-hoc analysis
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Developed UDFs in Java as needed for use in Pig and Hive queries
  • Experience in using Sequence files, RCFile, and HAR file formats.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process
  • Good working knowledge of Amazon Web Services components such as EC2
  • Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Good working knowledge of HBase
  • Good working knowledge of Tableau, R, and Apache Mahout
Hadoop · Sqoop · Pig · Hive · Oozie · AWS · +1

Infosys Technologies Ltd

Technology Analyst

Oct 2010 - Mar 2011 · 5 mos · Greater Chennai Area

  • Worked as a C/C++ Developer in the banking domain
Hadoop · Sqoop · Pig · Hive · Oozie · AWS · +1

Honeywell through Trigent

Software Engineer

Oct 2009 - Sep 2010 · 11 mos

  • Worked as a Software Engineer in the avionics (DO-178B) domain

Education

Thiagarajar College of Engineering

Master of Engineering (M.Eng.) — Computer Science

Jan 2007 - Jan 2009

Tech/IT 2006 Dhanalakshmi College of Engineering, Anna University

B

Jan 2006 - Present

Govt High School
