Kumar Sonu

DevOps Engineer

United Kingdom · 7 yrs 5 mos experience

Key Highlights

  • Expert in Infrastructure Automation and Site Reliability Engineering.
  • Proven track record in large-scale system design and management.
  • Strong background in cloud technologies including AWS and Azure.
Stackforce AI infers this person is a SaaS and Fintech Infrastructure Engineer with expertise in automation and reliability.

Skills

Core Skills

Site Reliability Engineering, Infrastructure Automation, System Design, Distributed Systems, Software Development

Other Skills

ARM templates, AWS, Ansible, Azure cloud, Bash, Bringing observability & monitoring into the system, Building and managing distributed applications, C-networking, Chef, Docker, Elasticsearch, Helm, Infrastructure and Compliance Automation, Java, Kafka

About

Self-initiated engineer interested in scalability, reliability, performance, distributed systems, visualizations and machine learning, who believes in experimentation and in results backed by data from automated processes.

Specialties

  • Infrastructure and Compliance Automation
  • Building and managing distributed applications on PaaS, cloud (AWS, Azure) and container orchestrators (Mesos, Kubernetes)
  • Bringing observability & monitoring into the system
  • Large-scale system design
  • Writing clean, maintainable, efficient code
  • Reducing technical debt and operational load

GitHub: https://github.com/sahilsk
LinkedIn: https://www.linkedin.com/in/sahilsk/

Experience

Meta

Senior Production Engineer

Jun 2022 - Mar 2025 · 2 yrs 9 mos · London, England, United Kingdom

Microsoft

Site Reliability Engineer-II

Jul 2017 - Dec 2018 · 1 yr 5 mos · Hyderabad, Telangana, India

  • Project: Microsoft Social Engagement (MSE) (3 months)
  • Role and Responsibilities:
  • Writing infrastructure as code (IaC) and making software design and architecture decisions with developers
  • CI/CD with Visual Studio, and deployments and releases across various environments
  • Debugging live-site issues (on-call)
  • Building diagnostic tools and dashboards
  • Working on POCs with new technologies to align them with infra use cases
  • Tools & Technologies: Azure cloud, Docker, Kubernetes, Helm, Prometheus, Linux (Ubuntu), Puppet, Ansible, PowerShell, Java, Python, Node.js, ARM templates and more.
  • Project: Microsoft Dynamics CRM (13 months)
  • Role and Responsibilities:
  • Improving service reliability through proactive monitoring
  • Deeper root cause analysis of outages using the 5 Whys
  • Tools & Technologies: Azure cloud, Windows Server, PowerShell, DSC, SQL Server

Ola (ANI Technologies Pvt. Ltd.)

Production Engineer -2

Aug 2015 - Jul 2017 · 1 yr 11 mos

  • Architect systems, infrastructure and platforms using Linux and Amazon Web Services to support applications in the communications space.
  • System and infrastructure configuration management, cost management, security, capacity planning, and stress testing.
  • Automate and implement permanent solutions to prevent outages and downtime.
  • Design and implement continuous integration and continuous delivery platforms.
  • Design and implement platforms for monitoring, log processing, metrics collection and data visualization.
  • Script and code tools (in shell, Node.js, Python, etc.) for automation and efficiency.
  • Notable tools and technologies: Mesos frameworks (Marathon, Chronos), Docker, Sensu monitoring/alerting, New Relic instrumentation, Kafka, RabbitMQ, Elasticsearch, Heka, Python, AWS, Bash, Opscode Chef, Ansible, Prometheus (Alertmanager)
  • Project: Olamoney (formerly Zipcash)
  • Role and Responsibilities:
  • Writing infrastructure as code (IaC) for the entire Olamoney stack using AWS OpsWorks and Chef.
  • Telemetry and alerting coverage for 200+ microservices, databases, workers and cron jobs
  • Managing and improving Apache Mesos/Marathon-backed infrastructure with a security- and compliance-first mindset.
  • Automating microservice deployments in a versioned (Git) and JIRA-controlled fashion
  • Datacenter migration from Singapore to India within a month, driven by CISA and RBI compliance requirements for payment banks
  • Supporting Kafka, Elasticsearch, Redis, etc.

Delhi College of Engineering

2 roles

Cloud Render-Farm

Jan 2011 - May 2011 · 4 mos

  • My final-year B.E. project:
  • A typical one-minute animated movie used to take many hours (12 hours or even a full day) to render on the Pentium 4 processors of the time. To reduce this rendering time, a distributed rendering solution was proposed.
  • The idea was to speed up rendering of a heavy raw animation file by dividing it into small chunks, distributing them among machines, processing them in parallel, and then making the result available to the stakeholders.
  • I worked alone on this project. It is important to me because it was my first step into distributed computing.
  • Technologies and tools used: Pub-Sub, C-networking, RoR (with BDD using RSpec), NFS (Network File System), MySQL, HTML/CSS, Photoshop, Blender, etc.

Publication Head - CSI-DCE 2010-11

Jan 2010 - May 2011 · 1 yr 4 mos

  • I was responsible for CSI-DCE social and media marketing.
  • This included:
  • Designing pamphlets, flyers and posters
  • Development and maintenance of CSI web portals
  • Assigning tasks to juniors

Navigate Delhi

Co-Founder

Apr 2010 - Apr 2011 · 1 yr · New Delhi Area, India

  • NAVIGATEDELHI.COM was an unprecedented venture to address the traffic-related problems of NCR commuters. It was a startup with a vision to provide real-time information on jams, routes, diversions and much more through various media, making pertinent information accessible to everybody.

Education

Delhi College of Engineering

Bachelor of Engineering (B.E.) — Computer Science

Jan 2007 - Jan 2011

Stanford University

Technology Entrepreneurship

Jan 2012 - Jan 2013
