Aman Upadhyay

DevOps Engineer

Bengaluru, Karnataka, India · 4 yrs 6 mos experience
Highly Stable

Key Highlights

  • Reduced deployment time from 40 minutes to 5 minutes.
  • Managed 100+ Cosmos and Ethereum validator nodes.
  • Built AI agents to automate operational workflows.
Stackforce AI infers this person is a Blockchain and DevOps expert specializing in scalable infrastructure solutions.

Skills

Core Skills

DevOps, Blockchain, Cloud Migration, Infrastructure, Cloud Management, Full Stack Development, Deployment Infrastructure

Other Skills

Ansible, alloy, ArgoCD, GitHub Actions, Monitoring, Alerting, Ethereum, Polygon, Optimism, Arbitrum, Base, Node.js, Amazon Web Services (AWS), Cascading Style Sheets (CSS), GitHub

About

The title changes (DevOps, Platform, Infrastructure), but the job has always been the same for me: figure out what is slowing the team down and fix it at the system level.

6 years in production. I have built infrastructure from scratch, migrated legacy systems to modern setups, and designed platforms that teams actually want to use. One migration took deployments from 40 minutes down to 4. That is the kind of problem I like solving: not just making things work, but making them fast, reliable, and low-maintenance.

Blockchain infrastructure: 100+ Cosmos and Ethereum validator nodes in production across multiple networks. Hard forks, key management, zero-downtime upgrades.

AI/ML infrastructure: GPU clusters running vLLM inference at scale on Kubernetes. Model serving, autoscaling, cost optimization across multi-tenant workloads.

Multi-cloud on AWS and GCP. Systems aligned to ISO 27001 and SOC 2, because compliance should be an engineering constraint, not a quarterly panic.

What I have learned is that the hard part is never the stack. It is knowing when to automate vs leave manual, when Kubernetes helps vs hurts, when to build a platform vs just ship the fix. I have gotten that wrong enough times to know the difference.

Right now I am building AI agents that automate the operational work most engineers still do by hand: access provisioning, knowledge base lookups, incident triage, runbook execution. Not demos. Production workflows that cut ticket volume so engineers can focus on what actually moves the business.

Previously I built the DevOps and platform function from scratch: CI/CD, Ethereum and L2 node operations, observability, and security across distributed systems.

I post here about what I learn building and running systems at scale:

  • Infrastructure decisions that actually matter
  • Platform engineering tradeoffs
  • Mistakes I have made so you do not have to

Follow if you want real ops thinking, not motivational checklists.

Experience

Total Experience: 4 yrs 6 mos
Average Tenure: 2 yrs 3 mos
Current Experience: --

Stealth startup

Sr. DevOps & Solution Architect

Jan 2025 - Present · 1 yr 5 mos · United Arab Emirates · Remote

  • 1. Maintained and optimized 100+ instances using Ansible, ensuring high availability and performance across the infrastructure.
  • 2. Revamped the entire monitoring and alerting stack, achieving high availability and reliability by redesigning the system architecture.
  • 3. Streamlined logging and monitoring processes by replacing outdated components with efficient, scalable solutions, enhancing overall system observability.
  • 4. Successfully migrated 100% of instances to new services with full monitoring compatibility, ensuring seamless transitions and improved system performance.
  • 5. Deployed and managed various blockchain networks, including Karak, Wormhole, Sui, Sei, Ronin, and Mezo, ensuring seamless operations and reliability.
  • 6. Developed and implemented dashboards for infrastructure and service visibility, providing real-time insights and improving operational transparency.
Ansible, alloy, DevOps, Blockchain

Fetch.ai

DevOps Engineer

Jun 2023 - Dec 2024 · 1 yr 6 mos · United Kingdom · Remote

  • 1. Led migration of legacy CI/CD pipelines to more resilient setups using ArgoCD and GitHub Actions, and documented over 40 previously undocumented applications in the cloud infrastructure. Streamlined deployment by removing unnecessary components, reducing the process from 40 minutes to 5 minutes.
  • 2. Enhanced monitoring and alerting systems organization-wide. Established a central monitoring stack for 10+ clusters and 40+ services, cutting costs. Introduced dynamic alerts and unified the individual environment dashboards into a single dynamic one.
  • 3. Deployed and managed Ethereum, Polygon, Optimism, Arbitrum and Base nodes, ensuring reliable blockchain operations, while setting up alerting for new version releases via Slack integration.
  • 4. Introduced Go profiling for supported services to improve execution speed and memory usage, and developed and integrated a Go profiling sidecar in Kubernetes environments, automating profile data extraction and making profiles available to developers through remote storage.
  • 5. Automated user management processes, streamlining onboarding procedures and boosting operational efficiency, while creating an onboarding process step that integrates Google Workspace and GitHub teams, resulting in easier user access management across GCP projects and GitHub repos.
  • 6. Introduced observability practices by working with development teams and implementing OpenTelemetry (OTel), and ran educational sessions for the team on integration, testing, usage, and analysis of the data.
  • 7. Successfully migrated various services between cloud platforms, ensuring minimal downtime and seamless operations.
  • 8. Enhanced security by implementing a VPN for critical services, running a self-managed chart museum for the Helm chart lifecycle, and enforcing Role-Based Access Control (RBAC) and network policies in Kubernetes clusters.
  • 9. Implemented database management operators to enhance security and scalability, while educating team members on integrating these into their applications.
ArgoCD, GitHub Actions, Monitoring, Alerting, Ethereum, Polygon, +5

Syvora

3 roles

DevOps Engineer

Promoted

Jun 2022 - May 2023 · 11 mos

  • 1. Designed and Orchestrated Highly Scalable Blockchain Infrastructure
  • Architected and orchestrated cloud infrastructure for deploying a highly scalable blockchain RPC-based SaaS, managing 120 million daily transactions.
  • Engineered multi-regional clusters for high scalability and availability, reducing global response time.
  • 2. Delivered Cosmos Validator Staking as a Service
  • Successfully delivered Staking as a Service for a Cosmos validator, managing an initial balance of 1.2 million USD.
  • Managed multiple blockchain systems including Cosmos, Ethereum, and Optimism.
  • 3. Streamlined Infrastructure Management & Automation
  • Automated end-to-end CI/CD processes, overseeing code lifecycles and managing multiple environments.
  • Cut deployment time for new blockchain instances by 98.75%, from 8 hours to just 6 minutes (80x faster).
  • Automated snapshot creation process for multiple blockchain networks, eliminating manual work and data corruption risks.
  • 4. Implemented Robust Monitoring & Alerting Systems
  • Engineered comprehensive monitoring, alerting, and logging systems for effective infrastructure management.
  • Developed tools to monitor visible and non-visible failures, integrated with PagerDuty for reduced failure response time.
  • 5. Documentation & Incident Response Excellence
  • Developed high-quality documentation and on-call runbooks for effective incident response and resolution.
  • Integrated Kubecost operator to monitor and optimize overall Kubernetes infrastructure costs.
  • 6. Technical Expertise & Tooling
  • Deployed and managed in-house subgraphs, complete ETL tools for indexing EVM-based blockchains, and custom HAProxy setups for maintaining healthy RPC nodes.
  • Automated deployment of complex cloud infrastructure using Terraform.
Blockchain, Node.js, Infrastructure

Junior DevOps Engineer

Promoted

Jun 2021 - May 2022 · 11 mos

  • 1. Secure Architecture Design & Implementation
  • Developed and implemented multiple isolated environments ensuring robust operational integrity.
  • Introduced secure secret injection methods enhancing secret management security across diverse environments.
  • 2. High Availability & Failover Mechanisms
  • Engineered failover mechanisms for blockchain nodes, guaranteeing high uptime and continuous operation.
  • Implemented horizontal pod auto-scaling and cluster auto-scaling for cost-efficient high availability.
  • 3. Proactive Infrastructure Management
  • Established comprehensive monitoring, logging, and alerting solutions for proactive infrastructure management.
  • Identified and addressed issues leading to random block misses by validators, enhancing system reliability.
  • 4. Deployment & Upgrade Strategies
  • Achieved zero downtime deployments/upgrades using Canary and Blue/Green rollout strategies in Kubernetes.
  • Managed infrastructure deployments for blockchain, ensuring operational efficiency and reliability.
  • 5. Cost Optimization & Disaster Recovery
  • Realized 35% monthly cost savings through cloud architecture optimization.
  • Developed robust disaster recovery procedures ensuring resilience and continuity in unexpected events.
  • 6. Cloud Migration & Management
  • Executed cloud migration plans with meticulous roadmaps and risk assessments, ensuring smooth transitions.
Node.js, Amazon Web Services (AWS), DevOps, Cloud Management

Full Stack Developer

Feb 2020 - May 2021 · 1 yr 3 mos

  • 1. Led End-to-End Development and Architectural Decisions
  • Spearheaded development initiatives from conception to deployment, guiding design and architectural decisions for high-performing applications.
  • 2. Accelerated Deployments with Docker Containerization
  • Doubled deployment speed and streamlined server provisioning by leveraging Docker containerization, enhancing efficiency and scalability.
  • 3. Established Robust Deployment Infrastructure
  • Designed and optimized deployment infrastructure and automation pipelines to ensure seamless and efficient delivery of software solutions.
  • 4. Reduced Third-Party Service Costs
  • Strategically refactored API call flows, resulting in a notable 40% reduction in third-party service costs, enhancing cost-effectiveness and resource utilization.
  • 5. Orchestrated Database Migration Tasks
  • Orchestrated complex database migration tasks to elevate platform functionality, ensuring smooth transitions and minimal disruptions.
  • 6. Developed Helm Charts for Kubernetes Deployments
  • Engineered Helm Charts for Kubernetes deployments of Node.js microservices, enabling simplified management and scalability.
  • 7. Implemented Monitoring, Logging, and Alerting
  • Implemented robust monitoring, logging, and alerting systems using Google Stackdriver, enhancing visibility and proactive issue resolution.
  • 8. Streamlined Deployment Processes
  • Pioneered the implementation of single-step deployment procedures using Python scripts and Helm Charts, optimizing deployment efficiency and reliability.
Cascading Style Sheets (CSS), Amazon Web Services (AWS), Full Stack Development, Deployment Infrastructure

Education

Shri G S Institute of Technology & Science

Master of Computer Applications - MCA — Computer Applications

Jan 2018 - Jan 2021

Makhanlal Chaturvedi National University of Journalism and Communication, Bhopal

BCA — Computer Applications

Jan 2015 - Jan 2018
