Karthick Selvam

Data Engineer

Bengaluru, Karnataka, India · 7 yrs 11 mos experience
Highly Stable

Key Highlights

  • 15+ years in IT infrastructure and data engineering
  • Expert in building scalable data pipelines
  • Proven track record in cloud migration strategies
Stackforce AI infers this person is a Cloud Data Engineering expert with a strong focus on Big Data and DevOps.

Skills

Core Skills

Data Engineering · Cloud Computing · Big Data Engineering · Data Processing · DevOps · CI/CD · Infrastructure Automation · Network Management · Technical Training · Networking

Other Skills

AWS: Glue · Ansible · Apache Airflow · Apache Beam · Apache Flink · Apache Flume · Apache Hadoop (HDFS, YARN, MapReduce) · Apache Hive · Apache Kafka · Apache Oozie · Apache Spark (PySpark, Scala) · Apache Spark (PySpark, Spark SQL, Spark Streaming) · Athena · Azure: Data Factory · Bitbucket

About

I transform raw data into scalable solutions. With 15+ years in IT infrastructure and deep specialization in data engineering, I design and optimize robust data pipelines that power analytics and decision-making.

Core Data Engineering Expertise:
  • Built petabyte-scale data lakes using Hadoop ecosystem (HDFS, Hive, Spark)
  • Developed real-time streaming pipelines with Kafka and Spark Streaming
  • Automated ETL workflows at scale using Airflow and cloud-native tools
  • Optimized query performance in data warehouses (Redshift, BigQuery)
  • Implemented CI/CD for data pipelines using Jenkins and GitOps

Technical Highlights:
  • Reduced data processing costs by 35% through Spark optimization
  • Migrated on-prem Hadoop to cloud-native solutions (AWS EMR, Dataproc)
  • Designed schema evolution strategies for evolving data needs
  • Implemented data quality frameworks with Great Expectations
  • Containerized data applications using Docker and Kubernetes

Training & Knowledge Sharing:
  • Conducted 50+ workshops on big data technologies
  • Authored internal playbooks for data engineering best practices
  • Mentored junior engineers in distributed systems concepts

I thrive at the intersection of data infrastructure and business value. Let's connect to discuss data pipeline architecture, performance optimization, or cloud migration strategies.

Experience

Total Experience: 7 yrs 11 mos
Average Tenure: 1 yr 8 mos
Current Experience: --

RPS Consulting Pvt. Ltd.

Data Engineering SME/Trainer (Contractor) (Cognizant)

Mar 2019 - Apr 2025 · 6 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

  • ☁️ Multi-Cloud Data Engineering Leadership
  • Designed, deployed, and optimized large-scale data pipelines across AWS, Azure, and GCP, leveraging services like:
  • AWS: Glue, EMR, Redshift, Kinesis, Lambda, S3, Athena
  • Azure: Data Factory, Synapse, Databricks, Blob Storage, HDInsight
  • GCP: BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage
  • Reduced data processing costs by 35% by implementing auto-scaling and serverless architectures across cloud platforms.
  • 🛠 Big Data & Distributed Processing
  • Architected Spark (PySpark/Scala) and Hadoop (HDFS, Hive, HBase) solutions for batch/real-time analytics, improving processing speeds by 50%.
  • Developed Apache Beam pipelines for unified batch/streaming workflows, deployed on Dataflow (GCP) and Flink (AWS/Azure).
  • Automated ETL workflows using Airflow (MWAA, Cloud Composer) with 500+ DAGs managing 10TB+ daily data.
  • ⚡ CI/CD & Infrastructure as Code (IaC)
  • Built end-to-end CI/CD pipelines for data workflows using GitHub Actions, Jenkins, and Terraform, reducing deployment time by 60%.
  • Implemented DataOps practices with automated testing (Great Expectations), monitoring (Grafana/Prometheus), and alerting.
  • 🎓 Training & Knowledge Sharing (SME & Trainer)
  • Conducted 500+ workshops upskilling 500+ engineers on AWS/Azure/GCP Data Engineering, Spark, Airflow, and CI/CD.
  • Authored internal playbooks and best practice guides for cloud data engineering, adopted company-wide.
  • Served as technical reviewer for cloud certification programs (AWS/Azure Data Engineer, GCP Professional Data Engineer).
  • 📈 Key Business Impact
  • Migrated legacy on-prem Hadoop to cloud-native platforms (AWS EMR + Azure Databricks), substantially reducing infrastructure costs.
  • Led a cross-cloud data mesh initiative, enabling real-time analytics for 10+ business units.
AWS: Glue · EMR · Redshift · Kinesis · Lambda · S3 · +20

IIHT Ltd

2 roles

Data Engineering Trainer (Corporate)

Promoted

Jul 2018 - Mar 2019 · 8 mos · Bengaluru, Karnataka, India · On-site

  • 🛠️ Hadoop & Spark Ecosystem Expertise
  • Designed and optimized large-scale data processing pipelines using Apache Hadoop (HDFS, YARN, MapReduce) and Apache Spark (PySpark, Spark SQL, Spark Streaming).
  • Improved batch processing speeds by 40% by tuning Spark jobs (partitioning, caching, executor configurations).
  • Built real-time analytics solutions using Spark Streaming and Kafka, processing 5TB+ daily data with sub-second latency.
  • 📂 Data Storage & Query Optimization
  • Managed Hive data warehouses with 20+ TB datasets, optimizing HQL queries and partitioning strategies.
  • Integrated HBase for low-latency NoSQL access, reducing read times by 30% for key customer-facing applications.
  • Implemented Apache Parquet/ORC formats for storage efficiency, cutting costs by 25%.
  • 🔌 Ecosystem Tooling & Automation
  • Automated ETL workflows using Apache Oozie and Airflow, orchestrating 200+ jobs with SLA monitoring.
  • Developed custom Spark UDFs and Hive functions to streamline complex data transformations.
  • Set up Kerberos-secured Hadoop clusters with Ranger for fine-grained access control.
  • 📈 Scalability & Performance
  • Migrated legacy MR jobs to Spark, reducing runtime from hours to minutes.
  • Scaled Hadoop clusters from 10 to 100+ nodes, ensuring 99.9% uptime for critical pipelines.
  • Led performance benchmarking for Spark vs. Tez vs. Flink, documenting best practices for the team.
  • 🎓 Knowledge Sharing
  • Trained 150+ batches of induction-level and experienced participants on Hadoop/Spark internals, tuning, and debugging.
  • Authored internal wikis on troubleshooting "NameNode bottlenecks" and "Spark OOM errors."
Apache Spark (PySpark, Spark SQL, Spark Streaming) · Apache Hadoop (HDFS, YARN, MapReduce) · Big Data Engineering · Apache Hive · HBase · Impala · +8

Technical Trainer

Apr 2014 - Apr 2015 · 1 yr · Coimbatore Area, India

  • 🔧 Infrastructure Automation & Configuration Management
  • Automated large-scale infrastructure using SaltStack, leveraging Jinja templating to manage 200+ servers with zero configuration drift.
  • Designed dynamic Salt States for OS hardening (aligned with RHCSA/RHCE standards), reducing security vulnerabilities by 45%.
  • Orchestrated network automation workflows (integrated with CCNA/CCNP routing protocols) to deploy VLANs and firewall rules across hybrid environments.
  • Integrated SaltStack with Git for version-controlled infrastructure, enabling audit trails and rollback capabilities.
  • 🎓 Enterprise Technical Training Leadership
  • Developed and delivered certification-focused programs for:
  • Networking: CCNA/CCNP Routing & Switching (hands-on labs with GNS3/Packet Tracer)
  • Linux: RHCSA/RHCE (customized kernel tuning & systemd scenarios)
  • Security: CEHv7 (Wireshark traffic analysis, penetration testing)
  • Increased certification pass rates by 30% through targeted lab simulations and troubleshooting drills.
  • Built virtual training labs using VMware ESXi and SaltStack Cloud, reducing setup time by 60%.
  • 📡 Network & Linux Expertise
  • Applied CCNP-level routing (OSPF, BGP) and switching (VLANs, STP) concepts to design realistic training topologies.
  • Automated RHEL deployments (RHCSA/RHCE) using SaltStack, including SELinux policies and kickstart configurations.
  • Conducted Wireshark deep-dives to teach packet analysis for security and network performance tuning.
  • 📊 Metrics-Driven Training Optimization
  • Implemented skills-gap analysis tools, leading to a 25% improvement in course effectiveness.
  • Authored 50+ lab guides with SaltStack automation snippets for repeatable learning environments.
SaltStack · Git · CCNA · CCNP · Linux · Infrastructure Automation · +1

Benchmark India Private Limited

DevOps Engineer

Apr 2015 - Mar 2019 · 3 yrs 11 mos · Coimbatore Area, India · On-site

  • 🛠️ Version Control & CI/CD Pipelines
  • Managed multi-repository ecosystems using Git, Subversion, Bitbucket, and GitLab, enabling 200+ developers to collaborate seamlessly.
  • Designed end-to-end CI/CD pipelines with Jenkins, GitLab CI/CD, and Bitbucket Pipelines, reducing deployment time by 70%.
  • Implemented branching strategies (GitFlow, Trunk-Based) and automated code reviews, cutting merge conflicts by 60%.
  • 🐳 Containerization & Orchestration
  • Containerized 50+ microservices using Docker, reducing deployment overhead by 40%.
  • Orchestrated scalable clusters with Kubernetes (EKS/GKE/AKS), achieving 99.95% uptime for production workloads.
  • Automated rolling updates and canary deployments, minimizing downtime during releases.
  • ⚙️ Infrastructure as Code (IaC) & Configuration Management
  • Automated cloud/server provisioning using Terraform and Ansible, deploying 500+ servers with zero manual intervention.
  • Managed configuration drift at scale using Chef/Puppet/SaltStack, ensuring 100% consistency across environments.
  • Built self-healing infrastructures with Ansible Playbooks, reducing ops tickets by 35%.
  • 📦 Package Management & Artifact Repositories
  • Standardized dependency management with npm, Maven, pip, and Helm.
  • Hosted internal artifact repositories (Nexus, JFrog Artifactory), securing 10,000+ builds.
  • 🔍 Monitoring, Logging & Alerting
  • Deployed Prometheus + Grafana for real-time metrics, reducing MTTR by 50%.
  • Centralized logs with ELK Stack (Elasticsearch, Logstash, Kibana) for 10TB+ daily logs.
  • Set up PagerDuty/Slack alerts for critical incidents, improving SLA compliance to 99.9%.
  • 🧪 Testing & Security Automation
  • Integrated SonarQube and OWASP ZAP into pipelines, blocking 100+ vulnerabilities pre-production.
  • Automated Selenium/JUnit testing, increasing test coverage from 60% to 90%.
  • 📈 Key Business Impact
  • Migrated legacy SVN to Git, improving developer productivity by 30%.
  • Trained 40+ engineers on DevOps best practices, accelerating team onboarding.
Git · Subversion · Bitbucket · GitLab · Jenkins · GitLab CI/CD · +13

Mazenet

L3 Instructor (CCNA)

Apr 2010 - May 2011 · 1 yr 1 mo · Salem

  • 🎓 Certification-Focused Training Delivery
  • Designed and delivered 120+ interactive workshops for CCNA & CCNP Routing & Switching candidates, achieving 92% exam pass rates (vs. industry avg. of ~70%).
  • Developed custom lab scenarios mirroring real-world CCIE-level challenges, including:
  • Advanced Routing: OSPFv3, BGP path selection, IPv6 migration
  • Switching: Layer 3 EtherChannel, VTP pruning, STP optimization
  • Troubleshooting: Methodologies for RIP/EIGRP convergence issues
  • 🛠️ Hands-On Lab Engineering
  • Built virtual topologies using CML (Cisco Modeling Labs), GNS3, and physical gear (Catalyst 3850/ISR 4331), reducing lab setup time by 50%.
  • Created "Trouble Ticket" exercises simulating network outages, improving students’ diagnostic speed by 40%.
  • 📚 Curriculum Development & Innovation
  • Authored 25+ lab manuals with annotated Wireshark captures for key protocols (e.g., BGP Keepalives, OSPF LSAs).
  • Integrated Cisco DevNet basics (Python + NETCONF) into CCNP training, future-proofing students for automation roles.
  • 📈 Performance Tracking & Coaching
  • Implemented personalized learning plans using Cisco’s NetAcad analytics, helping struggling students improve scores by 35%.
  • Mentored 5 junior trainers, standardizing teaching methodologies across the organization.
  • 🏆 Student Success Highlights
  • Trained 300+ professionals, with 85% securing promotions/new roles within 6 months of certification.
  • Recognized as "Top Rated Trainer" (2012) based on post-course feedback (4.9/5 avg. rating).
VMware · Linux · Technical Training · Networking

Aircel Limited

POA Assistant

Apr 2005 - Mar 2007 · 1 yr 11 mos · Coimbatore

  • 📄 Document Management & Digitization
  • Scanned, indexed, and archived 500+ customer documents daily with 99.8% accuracy, ensuring compliance with data retention policies.
  • Spearheaded the transition from physical to digital records, reducing document retrieval time by 40%.
  • 🖥️ Data Entry & System Maintenance
  • Uploaded and maintained 10,000+ customer records in the company’s database, ensuring real-time accessibility for cross-functional teams.
  • Automated repetitive data entry tasks using Excel macros, improving departmental efficiency by 25%.

Education

Veltech Multitech Dr.Rangarajan Dr.Sakunthala Engineering College

Master's Degree — MCA

Jan 2014 - Jan 2016

Bishop Ambrose College of Arts and Science

Bachelor's Degree — BCA

Jan 2007 - Jan 2010

Neelambal Subramaniam Higher Secondary School

High School — Maths & Biology

Jan 2004 - Jan 2005

Neelambal Subramaniam Hr Sec School

SSLC — English

Jan 2002 - Jan 2003
