Karthick Selvam

Data Engineer

Bengaluru, Karnataka, India · 7 yrs 11 mos experience
Highly Stable

Key Highlights

  • 15+ years in IT infrastructure and data engineering
  • Expert in building scalable data pipelines
  • Proven track record in cloud migration strategies
Stackforce AI infers this person is a Cloud Data Engineering expert with a strong focus on Big Data and DevOps.

Skills

Core Skills

Data Engineering · Cloud Computing · Big Data Engineering · Data Processing · DevOps · CI/CD · Infrastructure Automation · Network Management · Technical Training · Networking

Other Skills

AWS: Glue · Ansible · Apache Airflow · Apache Beam · Apache Flink · Apache Flume · Apache Hadoop (HDFS, YARN, MapReduce) · Apache Hive · Apache Kafka · Apache Oozie · Apache Spark (PySpark, Scala) · Apache Spark (PySpark, Spark SQL, Spark Streaming) · Athena · Azure: Data Factory · Bitbucket

About

I transform raw data into scalable solutions. With 15+ years in IT infrastructure and deep specialization in data engineering, I design and optimize robust data pipelines that power analytics and decision-making.

Core Data Engineering Expertise:

  • Built petabyte-scale data lakes using Hadoop ecosystem (HDFS, Hive, Spark)
  • Developed real-time streaming pipelines with Kafka and Spark Streaming
  • Automated ETL workflows at scale using Airflow and cloud-native tools
  • Optimized query performance in data warehouses (Redshift, BigQuery)
  • Implemented CI/CD for data pipelines using Jenkins and GitOps

Technical Highlights:

  • Reduced data processing costs by 35% through Spark optimization
  • Migrated on-prem Hadoop to cloud-native solutions (AWS EMR, Dataproc)
  • Designed schema evolution strategies for evolving data needs
  • Implemented data quality frameworks with Great Expectations
  • Containerized data applications using Docker and Kubernetes

Training & Knowledge Sharing:

  • Conducted 50+ workshops on big data technologies
  • Authored internal playbooks for data engineering best practices
  • Mentored junior engineers in distributed systems concepts

I thrive at the intersection of data infrastructure and business value. Let's connect to discuss data pipeline architecture, performance optimization, or cloud migration strategies.

Experience

RPS Consulting Pvt. Ltd.

Data Engineering SME/Trainer (Contractor) (Cognizant)

Mar 2019 - Apr 2025 · 6 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

  • ☁️ Multi-Cloud Data Engineering Leadership
  • Designed, deployed, and optimized large-scale data pipelines across AWS, Azure, and GCP, leveraging services like:
  • AWS: Glue, EMR, Redshift, Kinesis, Lambda, S3, Athena
  • Azure: Data Factory, Synapse, Databricks, Blob Storage, HDInsight
  • GCP: BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage
  • Reduced data processing costs by 35% by implementing auto-scaling and serverless architectures across cloud platforms.
  • 🛠 Big Data & Distributed Processing
  • Architected Spark (PySpark/Scala) and Hadoop (HDFS, Hive, HBase) solutions for batch/real-time analytics, improving processing speeds by 50%.
  • Developed Apache Beam pipelines for unified batch/streaming workflows, deployed on Dataflow (GCP) and Flink (AWS/Azure).
  • Automated ETL workflows using Airflow (MWAA, Cloud Composer) with 500+ DAGs managing 10TB+ daily data.
  • ⚡ CI/CD & Infrastructure as Code (IaC)
  • Built end-to-end CI/CD pipelines for data workflows using GitHub Actions, Jenkins, and Terraform, reducing deployment time by 60%.
  • Implemented DataOps practices with automated testing (Great Expectations), monitoring (Grafana/Prometheus), and alerting.
  • 🎓 Training & Knowledge Sharing (SME & Trainer)
  • Conducted 500+ workshops upskilling 500+ engineers on AWS/Azure/GCP Data Engineering, Spark, Airflow, and CI/CD.
  • Authored internal playbooks and best practice guides for cloud data engineering, adopted company-wide.
  • Served as technical reviewer for cloud certification programs (AWS/Azure Data Engineer, GCP Professional Data Engineer).
  • 📈 Key Business Impact
  • Migrated legacy on-prem Hadoop to cloud-native platforms (AWS EMR + Azure Databricks), significantly reducing infrastructure costs.
  • Led a cross-cloud data mesh initiative, enabling real-time analytics for 10+ business units.
AWS: Glue · EMR · Redshift · Kinesis · Lambda · S3 · +20

IIHT Ltd

2 roles

Data Engineering Trainer (Corporate)

Promoted

Jul 2018 - Mar 2019 · 8 mos · Bengaluru, Karnataka, India · On-site

  • 🛠️ Hadoop & Spark Ecosystem Expertise
  • Designed and optimized large-scale data processing pipelines using Apache Hadoop (HDFS, YARN, MapReduce) and Apache Spark (PySpark, Spark SQL, Spark Streaming).
  • Improved batch processing speeds by 40% by tuning Spark jobs (partitioning, caching, executor configurations).
  • Built real-time analytics solutions using Spark Streaming and Kafka, processing 5TB+ daily data with sub-second latency.
  • 📂 Data Storage & Query Optimization
  • Managed Hive data warehouses with 20+ TB datasets, optimizing HQL queries and partitioning strategies.
  • Integrated HBase for low-latency NoSQL access, reducing read times by 30% for key customer-facing applications.
  • Implemented Apache Parquet/ORC formats for storage efficiency, cutting costs by 25%.
  • 🔌 Ecosystem Tooling & Automation
  • Automated ETL workflows using Apache Oozie and Airflow, orchestrating 200+ jobs with SLA monitoring.
  • Developed custom Spark UDFs and Hive functions to streamline complex data transformations.
  • Set up Kerberos-secured Hadoop clusters with Ranger for fine-grained access control.
  • 📈 Scalability & Performance
  • Migrated legacy MR jobs to Spark, reducing runtime from hours to minutes.
  • Scaled Hadoop clusters from 10 to 100+ nodes, ensuring 99.9% uptime for critical pipelines.
  • Led performance benchmarking for Spark vs. Tez vs. Flink, documenting best practices for the team.
  • 🎓 Knowledge Sharing
  • Trained 150+ batches of induction-level and experienced engineers on Hadoop/Spark internals, tuning, and debugging.
  • Authored internal wikis on troubleshooting "NameNode bottlenecks" and "Spark OOM errors."
Apache Spark (PySpark, Spark SQL, Spark Streaming) · Apache Hadoop (HDFS, YARN, MapReduce) · Big Data Engineering · Apache Hive · HBase · Impala · +8

Technical Trainer

Apr 2014 - Apr 2015 · 1 yr · Coimbatore Area, India

  • 🔧 Infrastructure Automation & Configuration Management
  • Automated large-scale infrastructure using SaltStack, leveraging Jinja templating to manage 200+ servers with zero configuration drift.
  • Designed dynamic Salt States for OS hardening (aligned with RHCSA/RHCE standards), reducing security vulnerabilities by 45%.
  • Orchestrated network automation workflows (integrated with CCNA/CCNP routing protocols) to deploy VLANs and firewall rules across hybrid environments.
  • Integrated SaltStack with Git for version-controlled infrastructure, enabling audit trails and rollback capabilities.
  • 🎓 Enterprise Technical Training Leadership
  • Developed and delivered certification-focused programs for:
  • Networking: CCNA/CCNP Routing & Switching (hands-on labs with GNS3/Packet Tracer)
  • Linux: RHCSA/RHCE (customized kernel tuning & systemd scenarios)
  • Security: CEHv7 (Wireshark traffic analysis, penetration testing)
  • Increased certification pass rates by 30% through targeted lab simulations and troubleshooting drills.
  • Built virtual training labs using VMware ESXi and SaltStack Cloud, reducing setup time by 60%.
  • 📡 Network & Linux Expertise
  • Applied CCNP-level routing (OSPF, BGP) and switching (VLANs, STP) concepts to design realistic training topologies.
  • Automated RHEL deployments (RHCSA/RHCE) using SaltStack, including SELinux policies and kickstart configurations.
  • Conducted Wireshark deep-dives to teach packet analysis for security and network performance tuning.
  • 📊 Metrics-Driven Training Optimization
  • Implemented skills-gap analysis tools, leading to a 25% improvement in course effectiveness.
  • Authored 50+ lab guides with SaltStack automation snippets for repeatable learning environments.
SaltStack · Git · CCNA · CCNP · Linux · Infrastructure Automation · +1

Benchmark India Private Limited

DevOps Engineer

Apr 2015 - Mar 2019 · 3 yrs 11 mos · Coimbatore Area, India · On-site

  • 🛠️ Version Control & CI/CD Pipelines
  • Managed multi-repository ecosystems using Git, Subversion, Bitbucket, and GitLab, enabling 200+ developers to collaborate seamlessly.
  • Designed end-to-end CI/CD pipelines with Jenkins, GitLab CI/CD, and Bitbucket Pipelines, reducing deployment time by 70%.
  • Implemented branching strategies (GitFlow, Trunk-Based) and automated code reviews, cutting merge conflicts by 60%.
  • 🐳 Containerization & Orchestration
  • Containerized 50+ microservices using Docker, reducing deployment overhead by 40%.
  • Orchestrated scalable clusters with Kubernetes (EKS/GKE/AKS), achieving 99.95% uptime for production workloads.
  • Automated rolling updates and canary deployments, minimizing downtime during releases.
  • ⚙️ Infrastructure as Code (IaC) & Configuration Management
  • Automated cloud/server provisioning using Terraform and Ansible, deploying 500+ servers with zero manual intervention.
  • Managed configuration drift at scale using Chef/Puppet/SaltStack, ensuring 100% consistency across environments.
  • Built self-healing infrastructures with Ansible Playbooks, reducing ops tickets by 35%.
  • 📦 Package Management & Artifact Repositories
  • Standardized dependency management with npm, Maven, pip, and Helm.
  • Hosted internal artifact repositories (Nexus, JFrog Artifactory), securing 10,000+ builds.
  • 🔍 Monitoring, Logging & Alerting
  • Deployed Prometheus + Grafana for real-time metrics, reducing MTTR by 50%.
  • Centralized logs with ELK Stack (Elasticsearch, Logstash, Kibana) for 10TB+ daily logs.
  • Set up PagerDuty/Slack alerts for critical incidents, improving SLA compliance to 99.9%.
  • 🧪 Testing & Security Automation
  • Integrated SonarQube and OWASP ZAP into pipelines, blocking 100+ vulnerabilities pre-production.
  • Automated Selenium/JUnit testing, increasing test coverage from 60% to 90%.
  • 📈 Key Business Impact
  • Migrated legacy SVN to Git, improving developer productivity by 30%.
  • Trained 40+ engineers on DevOps best practices, accelerating team onboarding.
Git · Subversion · Bitbucket · GitLab · Jenkins · GitLab CI/CD · +13

Mazenet

L3 Instructor (CCNA)

Apr 2010 - May 2011 · 1 yr 1 mo · Salem

  • 🎓 Certification-Focused Training Delivery
  • Designed and delivered 120+ interactive workshops for CCNA & CCNP Routing & Switching candidates, achieving 92% exam pass rates (vs. industry avg. of ~70%).
  • Developed custom lab scenarios mirroring real-world CCIE-level challenges, including:
  • Advanced Routing: OSPFv3, BGP path selection, IPv6 migration
  • Switching: Layer 3 EtherChannel, VTP pruning, STP optimization
  • Troubleshooting: Methodologies for RIP/EIGRP convergence issues
  • 🛠️ Hands-On Lab Engineering
  • Built virtual topologies using CML (Cisco Modeling Labs), GNS3, and physical gear (Catalyst 3850/ISR 4331), reducing lab setup time by 50%.
  • Created "Trouble Ticket" exercises simulating network outages, improving students’ diagnostic speed by 40%.
  • 📚 Curriculum Development & Innovation
  • Authored 25+ lab manuals with annotated Wireshark captures for key protocols (e.g., BGP Keepalives, OSPF LSAs).
  • Integrated Cisco DevNet basics (Python + NETCONF) into CCNP training, future-proofing students for automation roles.
  • 📈 Performance Tracking & Coaching
  • Implemented personalized learning plans using Cisco’s NetAcad analytics, helping struggling students improve scores by 35%.
  • Mentored 5 junior trainers, standardizing teaching methodologies across the organization.
  • 🏆 Student Success Highlights
  • Trained 300+ professionals, with 85% securing promotions/new roles within 6 months of certification.
  • Recognized as "Top Rated Trainer" (2012) based on post-course feedback (4.9/5 avg. rating).
VMware · Linux · Technical Training · Networking

Aircel Limited

POA Assistant

Apr 2005 - Mar 2007 · 1 yr 11 mos · Coimbatore

  • 📄 Document Management & Digitization
  • Scanned, indexed, and archived 500+ customer documents daily with 99.8% accuracy, ensuring compliance with data retention policies.
  • Spearheaded the transition from physical to digital records, reducing document retrieval time by 40%.
  • 🖥️ Data Entry & System Maintenance
  • Uploaded and maintained 10,000+ customer records in the company’s database, ensuring real-time accessibility for cross-functional teams.
  • Automated repetitive data entry tasks using Excel macros, improving departmental efficiency by 25%.

Education

Veltech Multitech Dr.Rangarajan Dr.Sakunthala Engineering College

Master's Degree — MCA

Jan 2014 - Jan 2016

Bishop Ambrose College of Arts and Science

Bachelor's Degree — BCA

Jan 2007 - Jan 2010

Neelambal Subramaniam Higher Secondary School

High School — Maths & Biology

Jan 2004 - Jan 2005

Neelambal Subramaniam Hr Sec School

SSLC — English

Jan 2002 - Jan 2003
