Amal Soman

SRE (Site Reliability Engineer)

Toronto, Ontario, Canada · 7 yrs 8 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Expert in architecting cloud infrastructures across multiple platforms.
  • Proven track record in implementing observability solutions.
  • Strong collaboration skills in DevOps and SRE environments.
Stackforce AI infers this person is a Cloud Infrastructure Architect with a focus on DevOps and Site Reliability Engineering.

Skills

Core Skills

Site Reliability Engineering · Google Cloud Platform (GCP) · Kubernetes

Other Skills

Akamai · Amazon EKS · Amazon Web Services (AWS) · Ansible · Apache · Apache Kafka · Apache Mesos · Azure DevOps · Azure Kubernetes Service (AKS) · C · C++ · Calico · Chef.io · Computer Network Operations · Computer Networking

About

I architect robust cloud infrastructures, automate DevOps processes, and make systems reliable by applying SRE principles. My proficiency across multiple cloud ecosystems (AWS, Azure, and GCP), coupled with my ability to build resilient systems, has been instrumental in streamlining continuous integration and deployment workflows. My collaborative approach, grounded in a strong understanding of Kubernetes, Terraform, and containerization, supports the team in achieving operational excellence. My team and I embrace a culture of innovation and efficiency and are committed to delivering scalable solutions that drive the company forward.

Experience

VerticalScope Inc.

Staff Engineer - DevOps/SRE

Sep 2024 - Present · 1 yr 6 mos · Toronto, Ontario, Canada · Remote

Loblaw Companies Limited

3 roles

Staff Site Reliability Engineer

Jun 2024 - Aug 2024 · 2 mos

Senior Site Reliability Engineer

Promoted

Oct 2022 - May 2024 · 1 yr 7 mos

  • ⇢ Architected and implemented an observability platform for Loblaw in Golang and defined SRE-based observability with SLIs, SLOs, and error budgets at the org level, which helped the company identify many potential issues and get alerted to them.
  • ⇢ Replaced the single-instance Prometheus used for time-series data with VictoriaMetrics for better scalability, faster data ingestion, and faster querying.
  • ⇢ Developed IaC Terraform code for GCP resources, including GKE (Google Kubernetes Engine) with Istio, and built a seamless rollout using GitLab pipelines.
  • ⇢ Played a key role in moving a standalone application running on a VM to GKE (Google Kubernetes Engine) using Helm, GitLab pipelines, and Vault.
  • ⇢ Improved service performance with the Akamai CDN and built IaC with GitLab pipelines for version rollouts.
  • ⇢ Collaborated with the team on MR reviews and feedback, system design, and coding in Go, Python, and Bash.
  • ⇢ Improved application observability by instrumenting services with OpenTelemetry.
Amazon Web Services (AWS) · Go (Programming Language) · Site Reliability Engineering · Operating Systems · Python · GitOps · +13

Site Reliability Engineer 2

Feb 2021 - Oct 2022 · 1 yr 8 mos

GitOps · Grafana · Kubernetes · Terraform

PhonePe (a Walmart-Flipkart company)

Site Reliability Engineer

Aug 2020 - Jan 2021 · 5 mos · India

  • PhonePe is an India-based company owned by Walmart and Flipkart that handles millions of money transfers per day.
  • I was part of the infrastructure SRE team, where we set up and developed new tools to improve operations.
  • Responsibilities:
  • ⇢ Developed and implemented a cgroup monitoring agent in Golang for Mesos slave machines. Introduced alerting and visualized the metrics using Riemann, Influx, and Grafana, which helped identify Docker containers with high resource consumption.
  • ⇢ Modified the Traefik log parsing agent, written in Python, to support multiple logging formats. This provided more detail in the logs and simplified troubleshooting.
  • ⇢ Traced a TCP retransmission issue between two payloads to internal firewalls; removing them reduced application latency.
  • ⇢ Introduced load balancing across DNS resolvers, because the failure of a single resolver could cause outages.
Grafana · Terraform

Ola (ANI Technologies Pvt. Ltd.)

2 roles

Senior DevOps Engineer (Level-2)

Apr 2019 - Jul 2020 · 1 yr 3 mos

  • Olacabs is one of the world’s largest ride-hailing companies and India’s largest mobility platform, serving 250+ cities across India, Australia, New Zealand, and the UK.
  • I was part of the DevOps team, where we managed the infrastructure and migrated it to a multi-cloud environment. We also defined a cost-effective method for optimizing the infrastructure based on the business model.
  • Responsibilities:
  • ⇢ Architected and designed an in-house cache platform using Kubernetes services such as EKS and AKS, along with Helm, Redis, HAProxy, and GitLab pipelines, to replace the AWS ElastiCache service. This platform saved about 100k dollars per month in bills.
  • ⇢ Implemented a centralized logging platform for Kubernetes workloads using Filebeat, Kafka, and Graylog.
  • ⇢ Structured and implemented Terraform modules for resource provisioning. The reusable, use-case-specific design of the modules makes further extension easier and provisioning simpler and more flexible.

DevOps Engineer (Level-1)

Mar 2018 - Mar 2019 · 1 yr

  • Olacabs is one of the world’s largest ride-hailing companies and India’s largest mobility platform, serving 250+ cities across India, Australia, New Zealand, and the UK.
  • I was part of the DevOps team, where we managed the infrastructure and migrated it to a multi-cloud environment. We also defined a cost-effective method for optimizing the infrastructure based on the business model.
  • Responsibilities:
  • ⇢ Introduced automation tools written in Python and Golang.
  • ⇢ Played a key role in setting up the cloud-native architectures (Mesos) of Ola and Foodpanda in AWS and Azure.
  • ⇢ Contributed to server bootstrapping and configuration management using Chef and Ansible.
  • ⇢ Made traffic routing more robust through multilayer load balancing with tools such as Kong, HAProxy, and nginx.
  • ⇢ Implemented in-house release management by replacing GitHub-Travis with GitLab-Jenkins, and structured a GitOps model for pipelines.
GitOps · Grafana · Kubernetes · Terraform

Endurance International Group

System Engineer

Jul 2017 - Feb 2018 · 7 mos · Bangalore

  • Endurance International Group is an IT services company specializing in web hosting and related services.
  • I was part of the tools team, where we developed and introduced new tools to simplify the team's tasks.
  • Responsibilities:
  • ⇢ Designed and implemented a queue-based data migration tool to sync data between locations, removing the overhead of syncing the data manually. The tool is written in Python with a PHP frontend.
  • ⇢ Provided day-to-day configuration, monitoring, and support for specific aspects of systems, in line with applicable standards.
  • ⇢ Troubleshot operating-system and hardware issues (boot freezes, memory crashes, high load, performance tuning, security, etc.) on live production Linux servers to ensure 99.9% uptime.
GitOps · Grafana · Kubernetes · Terraform

HostDime.com

System Engineer

Apr 2016 - Jun 2017 · 1 yr 2 mos · Thiruvananthapuram, Kerala, India

  • Configured, troubleshot, and maintained critical network services, ensuring optimal performance of DNS, HTTP, and FTP protocols.
  • Collaborated with the OpenStack implementation team to enhance cloud service offerings, contributing to improved scalability.
  • Managed Linux/Apache/MySQL/PHP web application stacks, streamlining operations and reducing downtime.

Education

Mahatma Gandhi University

Bachelor of Technology (B.Tech.) — Computer Science

Jan 2011 - Jan 2015
