Senthil Kumar Balaguru

Software Engineer

Bengaluru, Karnataka, India · 16 yrs 3 mos experience

Key Highlights

Expert in Big Data technologies and Hadoop ecosystem.
Significant contributions to Apache Spark and other open-source projects.
Proven track record in managing large-scale Hadoop clusters.

Stackforce AI infers this person is a Big Data Engineer with extensive experience in Hadoop and open-source contributions.

Skills

Core Skills

Open Source Contribution · Apache Spark · Hadoop Ecosystem Management · Big Data Engineering

Other Skills

Solr · Knox · Hudi · Spark · Troubleshooting · Patch Creation · Hadoop · HDFS · YARN · MapReduce · Hive · Pig · Impala · Flume · Kafka

Experience

Total Experience: 16 yrs 3 mos

Average Tenure: 2 yrs 7 mos

Current Experience: 2 yrs 1 mo

Acceldata

Senior Staff Software Engineer

May 2024 – Present · 2 yrs 1 mo · Greater Bengaluru Area · On-site

Visa

Staff Software Engineer

Aug 2022 – May 2024 · 1 yr 9 mos · Bengaluru, Karnataka, India

Huawei Technologies India

Senior System Architect

Sep 2021 – Aug 2022 · 11 mos · Bangalore Urban, Karnataka, India

Contributing to Open Source - Apache Spark

Apache Spark · Open Source Contribution

Cloudera

2 roles

Senior Software Engineer

Apr 2020 – Sep 2021 · 1 yr 5 mos

Worked as a Spark Backline Engineer:
1. Troubleshooting Spark issues
2. Creating patches for the Cloudera Spark distribution
3. Contributing to open source (Apache Spark)

Spark · Troubleshooting · Patch Creation · Apache Spark

Senior Customer Operations Engineer

Oct 2016 – Mar 2020 · 3 yrs 5 mos

I work on several Hadoop ecosystem components:
HDFS, YARN, MapReduce, Spark, Hive, Pig, Impala, Solr, Flume, Kafka, Sqoop, Cloudera Manager, etc.
Security integration: Kerberos/AD, Sentry, and encryption (over the wire and at rest)
Cloudera Data Science Workbench
Roles and Responsibilities:
Working with customers to ensure they get the most out of their Hadoop deployments and the Cloudera Enterprise Data Hub.
Managing Cloudera Hadoop clusters, ranging from single-node clusters to clusters of 250+ nodes, for customers across the globe. As a Customer Operations Engineer, I work closely with customers to understand their Cloudera cluster issues and provide resolutions.
Installing, configuring, and tuning Hadoop clusters according to each client's needs and environment.
Resident Architect at some major organizations, helping them strategize their Big Data journey, including use-case identification and implementation
Translate business requirements into technical requirements
Architect and Deploy the Cloudera distribution for Apache Hadoop
Deploy security and ensure customers meet their compliance requirements
Data Ingestion/Migration Design and Implementation
ETL Design and Implementation
Application Architecture Design
Use-case development
Performance tuning, cluster architecture, and deployment
Train new joiners

Hadoop · HDFS · YARN · MapReduce · Spark · Hive (+7 more)

The Apache Software Foundation

Open Source Software

Oct 2016 – Jan 2026 · 9 yrs 3 mos

Open Source Contributor for Apache Spark, Solr, Knox and Hudi

Apache Spark · Solr · Knox · Hudi · Open Source Contribution

Banca Sella, Chennai

Big Data Technical Specialist

Nov 2015 – Sep 2016 · 10 mos

Synopsis:
Set up a search engine on the Hadoop stack using Apache Solr and Cloudera Search Manager
Responsibilities:
Worked on a live 6-node Hadoop cluster running CDH 5.4.0
Worked with highly structured and semi-structured data of 4 TB in size (12 TB with a replication factor of 3)
Extracted the data from Oracle into HDFS using Sqoop.
Created and ran Sqoop (version 1.4.3) jobs with incremental loads to populate Hive external tables.
Extensive experience in writing Pig (version 0.12) scripts to transform raw data from several data sources into baseline data.
Developed Hive (version 1.1.0) scripts for end user / analyst requirements to perform ad-hoc analysis
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce jobs.
Experience in using SequenceFile, ORCFile, and HAR file formats.
Developed Oozie workflow for scheduling and orchestrating the ETL process
Good working knowledge of Amazon Web Services components such as EC2
Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
Very good experience in monitoring and managing the Hadoop cluster

Apache Solr · Hadoop · Sqoop · Pig · Hive · Oozie (+1 more)

IBM India Pvt Ltd

Senior Developer

May 2011 – Oct 2015 · 4 yrs 5 mos · Greater Chennai Area

Worked as a Big Data Hadoop Developer
Responsibilities:
Worked on a live 16-node Hadoop cluster running Apache Hadoop 1.2.1
Worked with highly structured and semi-structured data of 5 to 6 TB in size (15 to 18 TB with a replication factor of 3)
Extracted the data from Oracle into HDFS using Sqoop.
Created and ran Sqoop (version 1.4.3) jobs with incremental loads to populate Hive external tables.
Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis
Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance
Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce jobs.
Developed UDFs in Java as needed for use in Pig and Hive queries
Experience in using SequenceFile, RCFile, and HAR file formats.
Developed Oozie workflow for scheduling and orchestrating the ETL process
Good working knowledge of Amazon Web Services components such as EC2
Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
Good working knowledge of HBase
Good working knowledge of Tableau, R, and Apache Mahout