Amal Soman

SRE (Site Reliability Engineer)

Toronto, Ontario, Canada · 7 yrs 8 mos experience

Highly Stable · AI Enabled

Key Highlights

Expert in architecting cloud infrastructures across multiple platforms.
Proven track record in implementing observability solutions.
Strong collaboration skills in DevOps and SRE environments.

Stackforce AI infers this person is a Cloud Infrastructure Architect with a focus on DevOps and Site Reliability Engineering.

Contact

amalsoman10@gmail.com LinkedIn

Skills

Core Skills

Site Reliability Engineering · Google Cloud Platform (GCP) · Kubernetes

Other Skills

Akamai · Amazon EKS · Amazon Web Services (AWS) · Ansible · Apache · Apache Kafka · Apache Mesos · Azure DevOps · Azure Kubernetes Service (AKS) · C · C++ · Calico · Chef.io · Computer Network Operations · Computer Networking

About

At the forefront of technological evolution, I architect robust cloud infrastructures, spearhead the automation of DevOps processes, and make systems reliable by applying SRE principles. My proficiency across multiple cloud ecosystems (AWS, Azure, and GCP), coupled with my ability to craft resilient systems, has been instrumental in streamlining continuous integration and deployment workflows. My collaborative approach, grounded in a strong understanding of Kubernetes, Terraform, and containerization, supports the team in achieving operational excellence. Embracing a culture of innovation and efficiency, I am committed to delivering scalable solutions that drive the company forward.

Experience

7 yrs 8 mos

Total Experience

1 yr 6 mos

Average Tenure

Current Experience

VerticalScope Inc.

Staff Engineer - Devops/SRE

Sep 2024 – Present · 1 yr 8 mos · Toronto, Ontario, Canada · Remote

Loblaw Companies Limited

3 roles

Staff Site Reliability Engineer

Jun 2024 – Aug 2024 · 2 mos

Senior Site Reliability Engineer

Promoted

Oct 2022 – May 2024 · 1 yr 7 mos

⇢ Architected and implemented an observability platform for Loblaw in Golang and defined SRE-based observability with SLIs, SLOs, and error budgets at the org level, which helped the company identify potential issues and get alerted to them.
⇢ Replaced the single-instance Prometheus used for time series data with VictoriaMetrics, which offers scalable, fast data ingestion and fast querying.
⇢ Developed IaC in Terraform for GCP resources, including GKE (Google Kubernetes Engine) with Istio, and built a seamless rollout process using GitLab pipelines.
⇢ Played a key role in migrating a standalone application running on a VM to GKE (Google Kubernetes Engine) using Helm, GitLab pipelines, and Vault.
⇢ Improved service performance using Akamai CDN and built IaC with GitLab pipelines for version rollouts.
⇢ Collaborated with the team on MR reviews and feedback, system design, and coding in Go, Python, and Bash.
⇢ Improved application observability by instrumenting services with OpenTelemetry.

Amazon Web Services (AWS) · Go (Programming Language) · Site Reliability Engineering · Operating Systems · Python · GitOps · +13

Site Reliability Engineer 2

Feb 2021 – Oct 2022 · 1 yr 8 mos

GitOps · Grafana · Kubernetes · Terraform

PhonePe (a Walmart-Flipkart company)

Site Reliability Engineer

Aug 2020 – Jan 2021 · 5 mos · India

PhonePe is a Walmart-Flipkart owned company based in India that handles millions of money transfers per day.
I was part of the infrastructure SRE team, where we set up and developed new tooling.
Responsibilities:
⇢ Developed and implemented a cgroup monitoring agent in Golang for Mesos slave machines. Introduced alerting and visualized the metrics using Riemann, Influx, and Grafana, which helped identify Docker containers with high resource consumption.
⇢ Modified the Traefik log parsing agent, written in Python, to support multiple logging formats. This added more detail to the logs and simplified troubleshooting.
⇢ Pinpointed a TCP retransmission issue between two payloads caused by internal firewalls; removing them reduced application latency.
⇢ Introduced load balancing across DNS resolvers, since the failure of a single resolver could cause outages.

Grafana · Terraform

Ola (ANI Technologies Pvt. Ltd.)

2 roles

Senior DevOps Engineer (Level-2)

Apr 2019 – Jul 2020 · 1 yr 3 mos

Olacabs is one of the world’s largest ride-hailing companies and India’s largest mobility platform, serving 250+ cities across India, Australia, New Zealand, and the UK.
I was part of the DevOps team, where we managed the infrastructure and migrated it to a multi-cloud environment. We also defined a cost-effective approach to optimizing the infrastructure based on the business model.
Responsibilities:
⇢ Architected and designed an in-house cache platform using managed Kubernetes services (EKS and AKS), Helm, Redis, HAProxy, and GitLab pipelines to replace the AWS ElastiCache service. This platform saved roughly $100k per month.
⇢ Implemented a centralized logging platform for Kubernetes workloads using Filebeat, Kafka, and Graylog.
⇢ Structured and implemented Terraform modules for resource provisioning. The reusable, use-case-specific design of the modules makes further extension easier and provisioning simpler and more flexible.

DevOps Engineer (Level-1)

Mar 2018 – Mar 2019 · 1 yr

Olacabs is one of the world’s largest ride-hailing companies and India’s largest mobility platform, serving 250+ cities across India, Australia, New Zealand, and the UK.
I was part of the DevOps team, where we managed the infrastructure and migrated it to a multi-cloud environment. We also defined a cost-effective approach to optimizing the infrastructure based on the business model.
Responsibilities:
⇢ Introduced automation tools written in Python and Golang.
⇢ Played a key role in setting up the cloud-native architectures (Mesos) of Ola and Foodpanda on AWS and Azure.
⇢ Contributed to server bootstrapping and configuration management using Chef and Ansible.
⇢ Made traffic routing more robust through multilayer load balancing with tools such as Kong, HAProxy, and NGINX.
⇢ Implemented in-house release management by replacing GitHub-Travis with GitLab-Jenkins, and structured a GitOps model for pipelines.

GitOps · Grafana · Kubernetes · Terraform

Endurance International Group

System Engineer

Jul 2017 – Feb 2018 · 7 mos · Bangalore

Endurance International Group is an IT services company specializing in Web hosting and related services.
I was part of the tools team, where we developed and introduced new tools to simplify the team's tasks.
Responsibilities:
⇢ Designed and implemented a queue-based data migration tool to sync data between locations, removing the overhead of manual syncing. The tool is written in Python with a PHP frontend.
⇢ Provided day-to-day configuration, monitoring, and support for specific aspects of systems, to applicable standards.
⇢ Troubleshot operating-system and hardware issues (boot freezes, memory crashes, high load, performance tuning, security, etc.) on live production Linux servers to ensure 99.9% uptime.

GitOps · Grafana · Kubernetes · Terraform

HostDime.com

System Engineer

Apr 2016 – Jun 2017 · 1 yr 2 mos · Thiruvananthapuram, Kerala, India

Configured, troubleshot, and maintained critical network services, ensuring optimal performance of DNS, HTTP, and FTP protocols.
Collaborated with the OpenStack implementation team to enhance cloud service offerings, contributing to improved scalability.
Managed Linux/Apache/MySQL/PHP web application stacks, streamlining operations and reducing downtime.