Mani S

DevOps Engineer

Austin, Texas, United States · 8 yrs experience
Most Likely To Switch

Key Highlights

  • Led migration to AWS and OpenShift at United Airlines.
  • Built and scaled SRE teams across multiple organizations.
  • Expert in cloud architecture and DevOps automation.
Stackforce AI infers this person is a Cloud Infrastructure Architect with extensive experience in DevOps and Site Reliability Engineering.

Skills

Core Skills

Cloud Architecture, Site Reliability Engineering, DevOps, Cloud Infrastructure Design, Infrastructure Engineering, Cloud Migration, Cloud Infrastructure

Other Skills

AWS, Ansible, AppDynamics, Azure, Bamboo, Bitbucket, Cloud Applications, CloudFormation, Communication, Configuration Management, Data Center, Datadog, Disaster Recovery, Docker, ELK

About

At United Airlines, my expertise in cloud architecture has been pivotal in our transition to a modern AWS environment. Leveraging my AWS certification and Kubernetes skills, I've led the charge in migrating from on-premises systems to OpenShift, ensuring our operations are both scalable and resilient. My role involves extensive hands-on experience with a wide range of AWS services, driving improvements in system performance and uptime. With the help of a dedicated team, we've automated workflows and enhanced DevOps processes, supporting the strategic growth and technological advancement of our organization.

Experience

Total Experience: 8 yrs
Average Tenure: 1 yr 7 mos
Current Experience: 3 yrs

United Airlines

Cloud Architect & SRE Leader

Jun 2023 - Present · 3 yrs · Chicago, Illinois, United States · Remote

  • Cloud Architecture & Migration: Designed and implemented AWS cloud architecture, leading migration from on-premises to OpenShift 4.x with Kubernetes. Built scalable, high-availability environments leveraging AWS services: VPC, Subnets, NAT, IGW, Transit Gateway, Route 53, ALB, API Gateway, CloudFront, Security Groups, KMS, ACM, Landing Zone, Control Tower.
  • Compute & Storage: Optimized workloads using EC2, ECS, EKS, Lambda, Auto Scaling, Batch Jobs, MWAA. Architected scalable storage with S3, FSx, ElastiCache, RDS (PostgreSQL, MySQL), DynamoDB, Redshift, Glue, EMR.
  • Infrastructure as Code & Automation: Led automation efforts using Terraform, CloudFormation, Ansible, and Helm to standardize deployments. Managed Kubernetes networking with Istio and service mesh solutions.
  • CI/CD Modernization: Migrated pipelines from TeamCity to Harness, implementing 700+ pipelines with Terraform integration. Worked with GitHub Actions, Jenkins, Digital.ai, and Helm for Kubernetes deployments. Integrated SonarQube, Veracode, and Wiz for security.
  • Site Reliability Engineering (SRE) & Observability: Built dashboards with Datadog, Splunk, ELK, Dynatrace, and CloudWatch for real-time monitoring. Reduced MTTD/MTTR with self-healing automation. Led Major Incident Management (MIM), Post-Incident Reviews (PIRs), RCA, problem management, and Chaos Engineering.
  • Cloud: AWS, GCP, OpenShift, Kubernetes
  • CI/CD: Harness, GitHub Actions, Jenkins, Terraform
  • Monitoring: Datadog, Splunk, ELK, Dynatrace
  • Languages: Python, Java, Spring, NodeJS
AWS, Kubernetes, OpenShift, Terraform, CloudFormation, Ansible (+7 more)

Charles Schwab

Sr. Lead DevOps & SRE

May 2022 - May 2023 · 1 yr · Austin, Texas, United States · Hybrid

  • Led a team of 9 engineers in the design, development, and maintenance of cloud infrastructure and DevOps processes, ensuring high performance and uptime across systems. Spearheaded the design and implementation of AWS cloud architecture while also driving the migration to GCP, utilizing resources like Cloud APIs, GCE, GKE, Cloud Functions, Cloud Spanner, BigQuery, IAM, Cloud SQL, Pub/Sub, and Cloud Storage.
  • Managed the end-to-end implementation of Harness Next Gen, including project onboarding, managing Harness Delegates, templates, connectors, and CI/CD pipeline architecture. Developed Node.js and Python tools to automate operational tasks and enhance efficiency.
  • Onboarded a suite of new DevOps tools such as LaunchDarkly, TasktopViz, TasktopHub, Harness, Digital.ai, SauceLabs, Mabl, GitHub Actions, Artifactory, AppDynamics, Splunk, and ThousandEyes, creating automated workflows to streamline operations.
  • Implemented Disaster Recovery Plans (DRP), integrating frontend and backend services, synthetic monitoring, and metrics collection to create comprehensive observability dashboards. Utilized Docker Compose and Kubernetes for containerized builds and deployments, creating Helm charts for seamless service deployment.
  • Collaborated with cross-functional teams to design, build, and maintain cloud infrastructure, focusing on networking, server hardware, load balancer configurations, and distributed tracing. Configured Akamai rules for frontend CFN, ensuring smooth traffic routing and application performance.
  • Developed KPI metrics and log traces to enhance observability, ensuring end-to-end telemetry for monitoring both application and infrastructure performance. Mentored and trained team members, fostering a collaborative and service-focused work environment.
AWS, GCP, Node.js, Python, Harness, GitHub Actions (+5 more)

Sony Pictures Networks India

Sr. SRE Manager

Jul 2021 - Apr 2022 · 9 mos · Mumbai, Maharashtra, India

  • Cloud Infrastructure Design & Migration: Led the design and migration of AWS and Azure cloud architectures, including Three-Tier, Serverless, and Microservices. Migrated 100+ resources from CloudFormation to Terraform, reducing provisioning time by 40% and enabling scalability for business growth.
  • CI/CD Pipeline & Automation Optimization: Enhanced Jenkins for high availability and scalability, reducing build times by 30%. Automated 50+ releases, increasing deployment frequency by 25%. Integrated New Relic for real-time monitoring, improving application performance tracking and reducing incident response times.
  • Containerization & Kubernetes Orchestration: Designed and deployed containerized applications using ECS, EKS, and Azure Kubernetes Service (AKS), raising availability to 99.99%. Optimized auto-scaling, resource allocation, and load balancing to support 100,000+ concurrent users, improving operational efficiency by 35%.
  • Serverless Solutions & Cloud Automation: Developed serverless solutions with AWS Lambda and Azure Function Apps, automating 80% of application management tasks and reducing onboarding time by 50%. Worked with Python, Node.js, and TypeScript to build tools that streamlined cloud management and enhanced scalability.
  • Security, Compliance & Monitoring: Ensured GDPR compliance and resolved vulnerabilities using Qualys and Nessus. Improved security and deployment processes, reducing incidents by 30%. Integrated monitoring tools like Elasticsearch, CloudWatch, and Datadog, creating 50+ dashboards and reducing downtime incidents by 20%.
AWS, Azure, Terraform, Jenkins, Kubernetes, New Relic (+4 more)

Aditya Birla Capital

Sr. Lead SRE

Mar 2020 - Jul 2021 · 1 yr 4 mos · Goregaon, Maharashtra

  • I led a team of 15+ members across DevSecOps, DevOps, and SRE, building the SRE team from the ground up and aligning technical solutions with business goals through close collaboration with the VP, CTO, Senior IT Manager, product managers, and operations teams. I scaled application traffic from 1.5 TPS to 15 TPS by configuring SLA, SLO, and KPI metrics, ensuring performance objectives were met while driving system reliability and scalability.
  • I designed and implemented a highly available platform with 99.99% uptime, supporting critical business operations with seamless scalability and reliability. I developed observability and monitoring dashboards, automated alert systems, and auto-resolve jobs, which enabled proactive issue resolution and significantly enhanced operational visibility in a 24x7 production environment.
  • I architected AWS three-tier environments using Terraform and CloudFormation, automating infrastructure provisioning while establishing ECS environments, API Gateways, and VPC models to streamline deployments and infrastructure management. To ensure resilience, I implemented high-availability and disaster recovery solutions, supporting traffic loads of 30-50 TPS and reducing recovery times by 50%.
  • Applying MLOps principles, I designed and managed end-to-end Machine Learning pipelines to support business-critical AI/ML workloads. I automated model deployment with CI/CD, using tools such as Amazon SageMaker, Kubernetes, and MLflow to optimize large-scale machine learning model delivery.
  • Security was prioritized by addressing vulnerabilities, implementing GDPR compliance, and enhancing network security with WAF and encryption.
  • Additionally, I introduced modern observability tools like Grafana, Prometheus, CloudWatch, and Datadog, improving monitoring and reducing issue resolution times by 30%. By automating deployments and patching with Ansible, I cut build cycle times by 40% and managed CI/CD pipelines for seamless production releases.
AWS, Terraform, Kubernetes, Ansible, Grafana, Prometheus (+2 more)

Apple

Sr. Infrastructure Engineer

Apr 2019 - Feb 2020 · 10 mos · Austin, Texas Area

  • I led the design and migration of AWS Two-Tier and Microservices Architectures, transitioning from bare-metal environments to the cloud. This included creating and managing AWS CloudFormation templates for resource provisioning, ensuring a smooth migration of on-premises data centers to AWS.
  • I worked extensively with Docker and Kubernetes (ECS/EKS), managing clusters, namespaces, nodes, and pods while automating deployments to streamline microservices operations. As a release manager, I implemented blue-green deployments for high availability, coordinated releases with SRE and infrastructure teams, and ensured seamless application delivery.
  • I automated server configurations, builds, and deployments using Ansible, creating playbooks and managing SSL certificates with Cert-Manager and Ansible Vault. Additionally, I configured Cloudflare for CDN services to optimize performance and security.
  • Supporting CI/CD pipelines, I configured Jenkins and developed automation for QA, regression, and deployment tasks. I contributed to application development by integrating APIs using Node.js and maintaining Java-based applications with Tomcat and Apache servers.
  • I enhanced monitoring and troubleshooting capabilities by leveraging Splunk for log analysis, alert creation, and dashboard development, as well as Dynatrace for infrastructure metrics. My database responsibilities included managing Cassandra nodes, performing validations, and planning migrations to MongoDB to meet application requirements.
AWS, Docker, Kubernetes, Ansible, Jenkins, Node.js (+2 more)

National Geographic

Sr. DevOps Engineer & SRE

Mar 2017 - Mar 2019 · 2 yrs · Washington D.C. Metro Area · On-site

  • Designed and implemented AWS hybrid and monolithic architectures, as well as GCP hybrid environments using shared VPC models, optimizing network usage, configuring subnets, and establishing firewall rules to improve security and scalability.
  • Migrated 400+ services from on-premises to AWS and GCP, including the SPI project, where I collaborated with leadership to move 20 TB of image data, achieving a 40% reduction in storage costs and a 50% improvement in data accessibility.
  • Automated infrastructure provisioning using CloudFormation templates, Ansible playbooks, and later transitioned to Terraform for scalability, reducing provisioning times by 60% and standardizing deployments.
  • Partnered with InfoSec teams to resolve application vulnerabilities, improving security posture by 50% and ensuring compliance with organizational standards.
  • Designed and managed QA automation pipelines, created and optimized Docker files, and implemented Kubernetes setups, enhancing deployment efficiency and reducing testing times by 40%.
  • Applied SRE principles post-migration to improve site reliability, reduce latency by 30%, and ensure consistent performance for cloud-hosted applications.
  • Managed and administered Bamboo CI/CD pipelines, optimizing workflows, creating automated build and release plans, and accelerating release cycles by 50%.
  • Designed and implemented disaster recovery workflows for AWS, improving recovery speed by 50% and reducing downtime for critical applications.
AWS, GCP, Ansible, Terraform, Docker, DevOps (+1 more)

Linkwell Solutions LLC

DevOps Engineer

Sep 2015 - Jan 2017 · 1 yr 4 mos · Texas, United States

  • Automated CI/CD release and deployment workflows, served as release manager, managed version control, and developed TypeScript for web pages.
  • Set up monitoring tools such as Splunk and Nagios, and worked with configuration management tools such as Puppet and Chef for automation.
  • Migrated on-premises workloads to AWS and worked on AWS architecture.
AWS, Docker, Ansible, Splunk, DevOps

Efftronics Systems Private Limited

Software Engineer / Linux Admin

Sep 2013 - Aug 2015 · 1 yr 11 mos · India · On-site

  • Managed source code versioning and release processes using Git and CVS, defining branching and merging strategies to streamline collaboration.
  • Automated application packaging and deployment pipelines via Jenkins, improving deployment efficiency and consistency.
  • Administered Linux environments and facilitated data transfers across hardware components to optimize infrastructure operations.
  • Enhanced testing and code quality by implementing JUnit and Selenium, with detailed code coverage reports generated via SonarQube.
  • Developed and deployed J2EE applications on JBoss and Apache Tomcat servers, ensuring reliable performance across environments.
  • Created and executed SQL scripts for deployment in multiple environments, standardizing database operations.
Linux System Administration, SQL, Git

Education

Silicon Valley University

Master's in Computer Science (MSCS)

Jan 2015 - Jan 2016

Nimra College of Engineering & Technology, Nimra Nagar (V), Ibrahimpatnam, PIN-521456 (CC-23)

Electronic and Communications Engineering Technology

Jan 2009 - Jan 2013
