Vikash Kumar

Software Engineer

Bengaluru, Karnataka, India · 11 yrs 4 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Upgraded 120 EKS clusters with zero downtime.
  • Achieved 15% cost reduction using Graviton instances.
  • Designed metrics pipeline for 75 microservices.
Stackforce AI infers this person is a Site Reliability Engineer with expertise in SaaS and Fintech environments.

Skills

Core Skills

Site Reliability Engineering, Kubernetes, AWS, Terraform, Automation

Other Skills

Ansible, Apache Kafka, BMC Control-M, Backstage, Buildkite, C, C++, CrowdStrike, Django, Docker, EKS, Elasticsearch, Git, GitHub

About

Site Reliability Engineer experienced with cloud platforms such as AWS and GCP, infrastructure monitoring and metrics collection, alerting, Kafka and Elasticsearch administration, and Linux-based operating systems.

Experience

Total Experience: 11 yrs 4 mos
Average Tenure: 2 yrs 3 mos
Current Experience: 4 yrs 1 mo

Twilio

Senior Software Engineer

May 2022 - Present · 4 yrs 1 mo · Bangalore Urban · Remote

  • Began in the Infra Compute SRE team; currently contributing to the Twilio Segment Developer Platform team (InfraEnablement):
  • 1. Successfully upgraded ~120 EKS clusters from Kubernetes 1.21 to 1.23, encompassing more than 5,000 nodes, with zero downtime or SEVs.
  • 2. Enabled the use of Graviton instances within our EKS clusters, achieving a ~15% cost reduction compared to x86 instances while maintaining equivalent performance.
  • 3. Led the migration of services from ECS to EKS, ensuring a seamless transition.
  • 4. Enhanced Terraform initialisation speed by 50% through targeted optimisation of Atlantis configurations.
  • 5. Built a Buildkite pipeline for managing Terraform modules in Buildkite Package, a dedicated registry.
  • 6. Deployed CrowdStrike Falcon Sensor across all Segment nodes by running it as a Kubernetes DaemonSet. This ensured continuous monitoring and protection of each node, using Falcon's capabilities to detect and mitigate vulnerabilities in real time.
  • 7. Developed Backstage templates to automate the creation of GitHub repositories for the golden Terraform modules, ensuring consistency across the organisation.
Terraform, Kubernetes, AWS, Buildkite, Backstage, EKS (+4 more)

Workspan

Site Reliability Engineer II

Jul 2020 - Apr 2022 · 1 yr 9 mos · Bengaluru, Karnataka, India · Remote

  • 1. Led and streamlined release management, overseeing major, minor, and hotfix deployments.
  • 2. Enhanced system reliability by improving metrics, monitoring, and error reporting.
  • 3. Automated repetitive workflows to optimise efficiency.
  • 4. Played a key role in achieving SOC2 compliance by implementing robust processes.
  • 5. Integrated ScoutSuite for vulnerability scanning and established a static code analysis framework using SonarQube to improve code quality and security.
  • 6. Designed and implemented Jenkins pipelines to streamline automation across various development and operational tasks.
Release Management, Metrics Monitoring, Automation, SOC2 Compliance, Jenkins, SonarQube (+1 more)

Exotel Techcom Private Limited

2 roles

Software Engineer - II

Promoted

Nov 2019 - Jun 2020 · 7 mos

  • 1. Designed and implemented a metrics pipeline using Telegraf, rsyslog, Apache Kafka, Logstash, Elasticsearch, and Kibana to monitor approximately 75 microservices.
  • 2. Developed visualizations and dashboards for monitoring in Kibana, and configured alerting mechanisms to proactively address issues.
  • 3. Actively participated in DRI (on-call support) to troubleshoot and resolve incidents efficiently.
  • 4. Mentored junior engineers and new team members, accelerating their onboarding process and enhancing their productivity.
  • 5. Transitioned to the Billing team, where I contributed to feature enhancements and the development of new functionalities for the billing system.
  • 6. Built a service for aggregating billing events from multiple AWS accounts, reconciling them, and generating customer invoices.
Telegraf, Apache Kafka, Logstash, Elasticsearch, Kibana, Site Reliability Engineering

Site Reliability Engineer

Dec 2018 - Oct 2019 · 10 mos

Societe Generale Global Solution Centre

Associate

Nov 2017 - Oct 2018 · 11 mos · Bangalore

  • The project involved release deployment, configuration management, and release management; creating workflows with Control-M workload automation; and database reloading.

Infosys

Senior System Engineer

Aug 2014 - Oct 2017 · 3 yrs 2 mos · Bengaluru Area, India · On-site

  • Project: Technical Operations - Cross-Functional Automation (Dec 2014 - Aug 2016), Bangalore
  • Client: Goldman Sachs
  • Technologies: Python, SQL Database, SVN
  • Overview: Led a comprehensive assessment of the client’s existing environment to deliver end-to-end runbook automation solutions across multiple infrastructure platforms, including Network, Windows, Linux, Storage, Backup, Database, and Voice, managing over 50,000 servers. Successfully implemented extensive automation for all operational runbooks.
  • Project: Wealth Management - Automation, Monitoring & Instrumentation (Sept 2016 - Nov 2017), Bangalore
  • Client: Morgan Stanley
  • Technologies: Python, Django, SQL, In-house Monitoring Tools, Splunk, Netcool, ServiceNow
  • Overview: Automated routine tasks and established continuous monitoring and instrumentation for Unix & Windows hosts, MQs, NAS shares, URLs, and log files. Managed daily incident resolution and developed a Django-based dashboard for the L1 team to monitor and analyze alert details effectively.
Python, SQL, SVN, Automation

Education

Institute Of Engineering and Management

Bachelor of Technology (B.Tech.) — Computer Science and Engineering

Jan 2010 - Jan 2014

Sree Ayyappa Public School

Higher Secondary — PCM
