Karthick Selvam

Data Engineer

Bengaluru, Karnataka, India · 7 yrs 11 mos experience

Highly Stable

Key Highlights

15+ years in IT infrastructure and data engineering
Expert in building scalable data pipelines
Proven track record in cloud migration strategies

Stackforce AI infers this person is a Cloud Data Engineering expert with a strong focus on Big Data and DevOps.

Skills

Core Skills

Data Engineering · Cloud Computing · Big Data Engineering · Data Processing · DevOps · CI/CD · Infrastructure Automation · Network Management · Technical Training · Networking

Other Skills

AWS: Glue · Ansible · Apache Airflow · Apache Beam · Apache Flink · Apache Flume · Apache Hadoop (HDFS, YARN, MapReduce) · Apache Hive · Apache Kafka · Apache Oozie · Apache Spark (PySpark, Scala) · Apache Spark (PySpark, Spark SQL, Spark Streaming) · Athena · Azure: Data Factory · Bitbucket

About

I transform raw data into scalable solutions. With 15+ years in IT infrastructure and deep specialization in data engineering, I design and optimize robust data pipelines that power analytics and decision-making.

Core Data Engineering Expertise:
Built petabyte-scale data lakes using the Hadoop ecosystem (HDFS, Hive, Spark)
Developed real-time streaming pipelines with Kafka and Spark Streaming
Automated ETL workflows at scale using Airflow and cloud-native tools
Optimized query performance in data warehouses (Redshift, BigQuery)
Implemented CI/CD for data pipelines using Jenkins and GitOps

Technical Highlights:
Reduced data processing costs by 35% through Spark optimization
Migrated on-prem Hadoop to cloud-native solutions (AWS EMR, Dataproc)
Designed schema evolution strategies for evolving data needs
Implemented data quality frameworks with Great Expectations
Containerized data applications using Docker and Kubernetes

Training & Knowledge Sharing:
Conducted 50+ workshops on big data technologies
Authored internal playbooks for data engineering best practices
Mentored junior engineers in distributed systems concepts

I thrive at the intersection of data infrastructure and business value. Let's connect to discuss data pipeline architecture, performance optimization, or cloud migration strategies.

Experience

Total Experience: 7 yrs 11 mos

Average Tenure: 1 yr 8 mos

Current Experience

RPS Consulting Pvt. Ltd.

Data Engineering SME/Trainer (Contractor) (Cognizant)

Mar 2019 – Apr 2025 · 6 yrs 1 mo · Bengaluru, Karnataka, India · Hybrid

☁️ Multi-Cloud Data Engineering Leadership
Designed, deployed, and optimized large-scale data pipelines across AWS, Azure, and GCP, leveraging services like:
AWS: Glue, EMR, Redshift, Kinesis, Lambda, S3, Athena
Azure: Data Factory, Synapse, Databricks, Blob Storage, HDInsight
GCP: BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage
Reduced data processing costs by 35% by implementing auto-scaling and serverless architectures across cloud platforms.
🛠 Big Data & Distributed Processing
Architected Spark (PySpark/Scala) and Hadoop (HDFS, Hive, HBase) solutions for batch/real-time analytics, improving processing speeds by 50%.
Developed Apache Beam pipelines for unified batch/streaming workflows, deployed on Dataflow (GCP) and Flink (AWS/Azure).
Automated ETL workflows using Airflow (MWAA, Cloud Composer) with 500+ DAGs managing 10TB+ daily data.
⚡ CI/CD & Infrastructure as Code (IaC)
Built end-to-end CI/CD pipelines for data workflows using GitHub Actions, Jenkins, and Terraform, reducing deployment time by 60%.
Implemented DataOps practices with automated testing (Great Expectations), monitoring (Grafana/Prometheus), and alerting.
🎓 Training & Knowledge Sharing (SME & Trainer)
Conducted 500+ workshops upskilling 500+ engineers on AWS/Azure/GCP Data Engineering, Spark, Airflow, and CI/CD.
Authored internal playbooks and best practice guides for cloud data engineering, adopted company-wide.
Served as technical reviewer for cloud certification programs (AWS/Azure Data Engineer, GCP Professional Data Engineer).
📈 Key Business Impact
Migrated legacy on-prem Hadoop to cloud-native platforms (AWS EMR + Azure Databricks), reducing infrastructure costs.
Led a cross-cloud data mesh initiative, enabling real-time analytics for 10+ business units.

AWS: Glue · EMR · Redshift · Kinesis · Lambda · S3 · +20

IIHT Ltd

2 roles

Data Engineering Trainer (Corporate)

Promoted

Jul 2018 – Mar 2019 · 8 mos · Bengaluru, Karnataka, India · On-site

🛠️ Hadoop & Spark Ecosystem Expertise
Designed and optimized large-scale data processing pipelines using Apache Hadoop (HDFS, YARN, MapReduce) and Apache Spark (PySpark, Spark SQL, Spark Streaming).
Improved batch processing speeds by 40% by tuning Spark jobs (partitioning, caching, executor configurations).
Built real-time analytics solutions using Spark Streaming and Kafka, processing 5TB+ daily data with sub-second latency.
📂 Data Storage & Query Optimization
Managed Hive data warehouses with 20+ TB datasets, optimizing HQL queries and partitioning strategies.
Integrated HBase for low-latency NoSQL access, reducing read times by 30% for key customer-facing applications.
Implemented Apache Parquet/ORC formats for storage efficiency, cutting costs by 25%.
🔌 Ecosystem Tooling & Automation
Automated ETL workflows using Apache Oozie and Airflow, orchestrating 200+ jobs with SLA monitoring.
Developed custom Spark UDFs and Hive functions to streamline complex data transformations.
Set up Kerberos-secured Hadoop clusters with Ranger for fine-grained access control.
📈 Scalability & Performance
Migrated legacy MR jobs to Spark, reducing runtime from hours to minutes.
Scaled Hadoop clusters from 10 to 100+ nodes, ensuring 99.9% uptime for critical pipelines.
Led performance benchmarking for Spark vs. Tez vs. Flink, documenting best practices for the team.
🎓 Knowledge Sharing
Trained 150+ batches of induction-level and experienced engineers on Hadoop/Spark internals, tuning, and debugging.
Authored internal wikis on troubleshooting "NameNode bottlenecks" and "Spark OOM errors."

Apache Spark (PySpark, Spark SQL, Spark Streaming) · Apache Hadoop (HDFS, YARN, MapReduce) · Big Data Engineering · Apache Hive · HBase · Impala · +8

Technical Trainer

Apr 2014 – Apr 2015 · 1 yr · Coimbatore Area, India

🔧 Infrastructure Automation & Configuration Management
Automated large-scale infrastructure using SaltStack, leveraging Jinja templating to manage 200+ servers with zero configuration drift.
Designed dynamic Salt States for OS hardening (aligned with RHCSA/RHCE standards), reducing security vulnerabilities by 45%.
Orchestrated network automation workflows (integrated with CCNA/CCNP routing protocols) to deploy VLANs and firewall rules across hybrid environments.
Integrated SaltStack with Git for version-controlled infrastructure, enabling audit trails and rollback capabilities.
🎓 Enterprise Technical Training Leadership
Developed and delivered certification-focused programs for:
Networking: CCNA/CCNP Routing & Switching (hands-on labs with GNS3/Packet Tracer)
Linux: RHCSA/RHCE (customized kernel tuning & systemd scenarios)
Security: CEHv7 (Wireshark traffic analysis, penetration testing)
Increased certification pass rates by 30% through targeted lab simulations and troubleshooting drills.
Built virtual training labs using VMware ESXi and SaltStack Cloud, reducing setup time by 60%.
📡 Network & Linux Expertise
Applied CCNP-level routing (OSPF, BGP) and switching (VLANs, STP) concepts to design realistic training topologies.
Automated RHEL deployments (RHCSA/RHCE) using SaltStack, including SELinux policies and kickstart configurations.
Conducted Wireshark deep-dives to teach packet analysis for security and network performance tuning.
📊 Metrics-Driven Training Optimization
Implemented skills-gap analysis tools, leading to a 25% improvement in course effectiveness.
Authored 50+ lab guides with SaltStack automation snippets for repeatable learning environments.

SaltStack · Git · CCNA · CCNP · Linux · Infrastructure Automation · +1

Benchmark India Private Limited

DevOps Engineer

Apr 2015 – Mar 2019 · 3 yrs 11 mos · Coimbatore Area, India · On-site

🛠️ Version Control & CI/CD Pipelines
Managed multi-repository ecosystems using Git, Subversion, Bitbucket, and GitLab, enabling 200+ developers to collaborate seamlessly.
Designed end-to-end CI/CD pipelines with Jenkins, GitLab CI/CD, and Bitbucket Pipelines, reducing deployment time by 70%.
Implemented branching strategies (GitFlow, Trunk-Based) and automated code reviews, cutting merge conflicts by 60%.
🐳 Containerization & Orchestration
Containerized 50+ microservices using Docker, reducing deployment overhead by 40%.
Orchestrated scalable clusters with Kubernetes (EKS/GKE/AKS), achieving 99.95% uptime for production workloads.
Automated rolling updates and canary deployments, minimizing downtime during releases.
⚙️ Infrastructure as Code (IaC) & Configuration Management
Automated cloud/server provisioning using Terraform and Ansible, deploying 500+ servers with zero manual intervention.
Managed configuration drift at scale using Chef/Puppet/SaltStack, ensuring 100% consistency across environments.
Built self-healing infrastructures with Ansible Playbooks, reducing ops tickets by 35%.
📦 Package Management & Artifact Repositories
Standardized dependency management with npm, Maven, pip, and Helm.
Hosted internal artifact repositories (Nexus, JFrog Artifactory), securing 10,000+ builds.
🔍 Monitoring, Logging & Alerting
Deployed Prometheus + Grafana for real-time metrics, reducing MTTR by 50%.
Centralized logs with ELK Stack (Elasticsearch, Logstash, Kibana) for 10TB+ daily logs.
Set up PagerDuty/Slack alerts for critical incidents, improving SLA compliance to 99.9%.
🧪 Testing & Security Automation
Integrated SonarQube and OWASP ZAP into pipelines, blocking 100+ vulnerabilities pre-production.
Automated Selenium/JUnit testing, increasing test coverage from 60% to 90%.
📈 Key Business Impact
Migrated legacy SVN to Git, improving developer productivity by 30%.
Trained 40+ engineers on DevOps best practices, accelerating team onboarding.

Git · Subversion · Bitbucket · GitLab · Jenkins · GitLab CI/CD · +13

Mazenet

L3 Instructor (CCNA)

Apr 2010 – May 2011 · 1 yr 1 mo · Salem

🎓 Certification-Focused Training Delivery
Designed and delivered 120+ interactive workshops for CCNA & CCNP Routing & Switching candidates, achieving 92% exam pass rates (vs. industry avg. of ~70%).
Developed custom lab scenarios mirroring real-world CCIE-level challenges, including:
Advanced Routing: OSPFv3, BGP path selection, IPv6 migration
Switching: Layer 3 EtherChannel, VTP pruning, STP optimization
Troubleshooting: Methodologies for RIP/EIGRP convergence issues
🛠️ Hands-On Lab Engineering
Built virtual topologies using CML (Cisco Modeling Labs), GNS3, and physical gear (Catalyst 3850/ISR 4331), reducing lab setup time by 50%.
Created "Trouble Ticket" exercises simulating network outages, improving students’ diagnostic speed by 40%.
📚 Curriculum Development & Innovation
Authored 25+ lab manuals with annotated Wireshark captures for key protocols (e.g., BGP Keepalives, OSPF LSAs).
Integrated Cisco DevNet basics (Python + NETCONF) into CCNP training, future-proofing students for automation roles.
📈 Performance Tracking & Coaching
Implemented personalized learning plans using Cisco’s NetAcad analytics, helping struggling students improve scores by 35%.
Mentored 5 junior trainers, standardizing teaching methodologies across the organization.
🏆 Student Success Highlights
Trained 300+ professionals, with 85% securing promotions/new roles within 6 months of certification.
Recognized as "Top Rated Trainer" (2012) based on post-course feedback (4.9/5 avg. rating).

VMware · Linux · Technical Training · Networking

Aircel Limited

POA Assistant

Apr 2005 – Mar 2007 · 1 yr 11 mos · Coimbatore

📄 Document Management & Digitization
Scanned, indexed, and archived 500+ customer documents daily with 99.8% accuracy, ensuring compliance with data retention policies.
Spearheaded the transition from physical to digital records, reducing document retrieval time by 40%.
🖥️ Data Entry & System Maintenance
Uploaded and maintained 10,000+ customer records in the company’s database, ensuring real-time accessibility for cross-functional teams.
Automated repetitive data entry tasks using Excel macros, improving departmental efficiency by 25%.