Aman Upadhyay — DevOps Engineer

The title changes (DevOps, Platform, Infrastructure), but the job has always been the same for me: figure out what is slowing the team down and fix it at the system level.

6 years in production. I have built infrastructure from scratch, migrated legacy systems to modern setups, and designed platforms that teams actually want to use. One migration took deployments from 40 minutes down to 4. That is the kind of problem I like solving: not just making things work, but making them fast, reliable, and low-maintenance.

Blockchain infrastructure: 100+ Cosmos and Ethereum validator nodes in production across multiple networks. Hard forks, key management, zero-downtime upgrades.

AI/ML infrastructure: GPU clusters running vLLM inference at scale on Kubernetes. Model serving, autoscaling, cost optimization across multi-tenant workloads.

Multi-cloud on AWS and GCP. Systems aligned to ISO 27001 and SOC 2, because compliance should be an engineering constraint, not a quarterly panic.

What I have learned is that the hard part is never the stack. It is knowing when to automate vs leave manual, when Kubernetes helps vs hurts, when to build a platform vs just ship the fix. I have gotten that wrong enough times to know the difference.

Right now I am building AI agents that automate the operational work most engineers still do by hand: access provisioning, knowledge base lookups, incident triage, runbook execution. Not demos. Production workflows that cut ticket volume so engineers can focus on what actually moves the business.

Previously I built the DevOps and platform function from scratch: CI/CD, Ethereum and L2 node operations, observability, and security across distributed systems.

I post here about what I learn building and running systems at scale:
→ Infrastructure decisions that actually matter
→ Platform engineering tradeoffs
→ Mistakes I have made so you do not have to

Follow if you want real ops thinking, not motivational checklists.

Stackforce AI infers this person is a Blockchain and DevOps expert specializing in scalable infrastructure solutions.

Location: Bengaluru, Karnataka, India

Experience: 4 yrs 6 mos

Skills

DevOps
Blockchain
Cloud Migration
Infrastructure
Cloud Management
Full Stack Development
Deployment Infrastructure

Career Highlights

Reduced deployment time from 40 minutes to 5 minutes.
Managed 100+ Cosmos and Ethereum validator nodes.
Built AI agents to automate operational workflows.

Work Experience

Stealth Startup

Sr. DevOps & Solution Architect (1 yr 5 mos)

Fetch.ai

DevOps Engineer (1 yr 6 mos)

Syvora

DevOps Engineer (11 mos)

Junior DevOps Engineer (11 mos)

Full Stack Developer (1 yr 3 mos)

Education

Master of Computer Applications - MCA at Shri G S Institute of Technology & Science

BCA at Makhanlal Chaturvedi National University of Journalism and Communication, Bhopal

Stability: Highly Stable

Other Skills

Ansible
alloy
ArgoCD
GitHub Actions
Monitoring
Alerting
Ethereum
Polygon
Optimism
Arbitrum
Base
Node.js
Amazon Web Services (AWS)
Cascading Style Sheets (CSS)
GitHub

Experience

Total Experience: 4 yrs 6 mos

Average Tenure: 2 yrs 3 mos

Current Experience

Stealth startup

Sr. DevOps & Solution Architect

Jan 2025 – Present · 1 yr 5 mos · United Arab Emirates · Remote

1. Maintained and optimized more than 100 instances using Ansible, ensuring high availability and performance across the infrastructure.
2. Revamped the entire monitoring and alerting stack, achieving high availability and reliability by redesigning the system architecture.
3. Streamlined logging and monitoring processes by replacing outdated components with efficient, scalable solutions, enhancing overall system observability.
4. Successfully migrated 100% of instances to new services with full monitoring compatibility, ensuring seamless transitions and improved system performance.
5. Deployed and managed various blockchain networks, including Karak, Wormhole, Sui, Sei, Ronin, and Mezo, ensuring seamless operations and reliability.
6. Developed and implemented dashboards for infrastructure and service visibility, providing real-time insights and improving operational transparency.

Ansible, alloy, DevOps, Blockchain

Fetch.ai

DevOps Engineer

Jun 2023 – Dec 2024 · 1 yr 6 mos · United Kingdom · Remote

1. Led the migration of legacy CI/CD pipelines to more resilient ones built on ArgoCD and GitHub Actions, and documented over 40 previously undocumented applications in the cloud infrastructure. Streamlining deployments and removing unnecessary components cut the process from 40 minutes to 5 minutes.
2. Enhanced monitoring and alerting systems organization-wide. Established a central monitoring stack for 10+ clusters and 40+ services, cutting costs. Introduced dynamic alerts and unified the per-environment dashboards into a single dynamic one.
3. Deployed and managed Ethereum, Polygon, Optimism, Arbitrum and Base nodes, ensuring reliable blockchain operations, while setting up alerting for new version releases via Slack integration.
4. Introduced Go profiling for supported services to improve execution speed and memory usage, and developed and integrated a Go profiling sidecar for Kubernetes environments that automates profile data extraction and makes the profiles available to developers from remote storage.
5. Automated user management processes, streamlining onboarding procedures and boosting operational efficiency, while creating an onboarding process step that integrates Google Workspace and GitHub teams, resulting in easier user access management across GCP projects and GitHub repos.
6. Introduced observability practices by working with development teams to implement OpenTelemetry (OTel), and ran educational sessions for the team on integration, testing, usage, and analysis of the data.
7. Successfully migrated various services between cloud platforms, ensuring minimal downtime and seamless operations.
8. Enhanced security through VPN implementation for critical services, a self-managed ChartMuseum for the Helm chart lifecycle, and Role-Based Access Control (RBAC) and network policies in Kubernetes clusters.
9. Implemented database management operators to enhance security and scalability, while educating team members on integrating these into their applications.

ArgoCD, GitHub Actions, Monitoring, Alerting, Ethereum, Polygon, +5

Syvora

3 roles

DevOps Engineer

Promoted

Jun 2022 – May 2023 · 11 mos

1. Designed and Orchestrated Highly Scalable Blockchain Infrastructure
Architected and orchestrated cloud infrastructure for deploying a highly scalable blockchain RPC-based SaaS, managing 120 million daily transactions.
Engineered multi-regional clusters for high scalability and availability, reducing global response time.
2. Accomplished Cosmos Validator Staking as a Service
Successfully delivered Staking as a Service for a Cosmos validator, managing an initial balance of 1.2 million USD.
Managed multiple blockchain systems including Cosmos, Ethereum, and Optimism.
3. Streamlined Infrastructure Management & Automation
Automated end-to-end CI/CD processes, overseeing code lifecycles and managing multiple environments.
Achieved a 98.75% reduction in deployment time for new blockchain instances (an 80x speedup), from 8 hours to just 6 minutes.
Automated snapshot creation process for multiple blockchain networks, eliminating manual work and data corruption risks.
4. Implemented Robust Monitoring & Alerting Systems
Engineered comprehensive monitoring, alerting, and logging systems for effective infrastructure management.
Developed tools to monitor visible and non-visible failures, integrated with PagerDuty for reduced failure response time.
5. Documentation & Incident Response Excellence
Developed high-quality documentation and on-call runbooks for effective incident response and resolution.
Integrated Kubecost operator to monitor and optimize overall Kubernetes infrastructure costs.
6. Technical Expertise & Tooling
Deployed and managed in-house subgraphs, complete ETL tools for indexing EVM-based blockchains, and custom HAProxy setups for maintaining healthy RPC nodes.
Automated deployment of complex cloud infrastructure using Terraform.
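
The deployment-time change listed above (8 hours down to 6 minutes) can be stated either as a percentage reduction or as a speedup factor, and the two are easy to confuse. A minimal sketch of the arithmetic follows; the function names are hypothetical and exist only for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


def speedup_factor(before: float, after: float) -> float:
    """Return how many times faster the new duration is than the old one."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Figures taken from the bullet above: 8 hours before, 6 minutes after.
before_minutes = 8 * 60
after_minutes = 6

reduction = percent_reduction(before_minutes, after_minutes)
speedup = speedup_factor(before_minutes, after_minutes)
print(f"{reduction:.2f}% reduction, {speedup:.0f}x faster")
```

A reduction measured against the original duration can never exceed 100%, so large improvements are usually easier to read as a speedup factor.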

Blockchain, Node.js, Infrastructure

Junior DevOps Engineer

Promoted

Jun 2021 – May 2022 · 11 mos

1. Secure Architecture Design & Implementation
Developed and implemented multiple isolated environments ensuring robust operational integrity.
Introduced secure secret injection methods enhancing secret management security across diverse environments.
2. High Availability & Failover Mechanisms
Engineered failover mechanisms for blockchain nodes, guaranteeing high uptime and continuous operation.
Implemented horizontal pod auto-scaling and cluster auto-scaling for cost-efficient high availability.
3. Proactive Infrastructure Management
Established comprehensive monitoring, logging, and alerting solutions for proactive infrastructure management.
Identified and addressed issues leading to random block misses by validators, enhancing system reliability.
4. Deployment & Upgrade Strategies
Achieved zero downtime deployments/upgrades using Canary and Blue/Green rollout strategies in Kubernetes.
Managed infrastructure deployments for blockchain, ensuring operational efficiency and reliability.
5. Cost Optimization & Disaster Recovery
Realized a significant 35% monthly cost savings through cloud architecture optimization.
Developed robust disaster recovery procedures ensuring resilience and continuity in unexpected events.
6. Cloud Migration & Management
Executed cloud migration plans with meticulous roadmaps and risk assessments, ensuring smooth transitions.

Node.js, Amazon Web Services (AWS), DevOps, Cloud Management

Full Stack Developer

Feb 2020 – May 2021 · 1 yr 3 mos

1. Led End-to-End Development and Architectural Decisions
Spearheaded development initiatives from conception to deployment, guiding design and architectural decisions for high-performing applications.
2. Accelerated Deployments with Docker Containerization
Doubled deployment speed and streamlined server provisioning by leveraging Docker containerization, enhancing efficiency and scalability.
3. Established Robust Deployment Infrastructure
Designed and optimized deployment infrastructure and automation pipelines to ensure seamless and efficient delivery of software solutions.
4. Reduced Third-Party Service Costs
Strategically refactored API call flows, resulting in a notable 40% reduction in third-party service costs, enhancing cost-effectiveness and resource utilization.
5. Orchestrated Database Migration Tasks
Orchestrated complex database migration tasks to elevate platform functionality, ensuring smooth transitions and minimal disruptions.
6. Developed Helm Charts for Kubernetes Deployments
Engineered Helm Charts for Kubernetes deployments of Node.js microservices, enabling simplified management and scalability.
7. Implemented Monitoring, Logging, and Alerting
Implemented robust monitoring, logging, and alerting systems using Google Stackdriver, enhancing visibility and proactive issue resolution.
8. Streamlined Deployment Processes
Pioneered the implementation of single-step deployment procedures using Python scripts and Helm Charts, optimizing deployment efficiency and reliability.