Saurabh Suman

Senior Software Engineer

Bengaluru, Karnataka, India · 10 yrs 10 mos experience

Key Highlights

  • Expert in cloud-native infrastructure for AI/ML data centers.
  • Proven track record in building high-throughput data systems.
  • Strong background in security-first architecture and observability.
Stackforce AI infers this person is a SaaS Infrastructure Engineer with expertise in AI/ML data systems and security architecture.

Skills

Core Skills

Cloud-native Infrastructure, Security Architecture, Real-time Observability, Real-time Data Processing, Attribution Services, Mobile Attribution, Data Pipeline Development, Application Development, Performance Testing

Other Skills

Go, Scala, gRPC, Kafka, mTLS, service mesh, OpenTelemetry, time-series databases, Apache Storm, HBase, Java, Jetty, X.509, access control, Hadoop

About

Engineer building cloud-native infrastructure for next-generation AI/ML data centers at NVIDIA.

I design and develop observability and network fabric management platforms that power large-scale GPU cluster deployments. My work spans the full stack, from designing secure microservices with mTLS and a service mesh to building high-throughput telemetry pipelines that process network topology and metrics at scale.

My current focus includes platform convergence for multi-protocol network fabrics, security-first architecture with service mesh and compliance frameworks, observability infrastructure using time-series databases and OpenTelemetry, and GPU-to-network correlation for end-to-end AI workload visibility.

I bring end-to-end ownership from design through production, with a track record of delivering complex distributed systems. I thrive at the intersection of infrastructure, security, and scalability, building platforms that support enterprise-grade reliability. I also have a strong background in high-throughput data systems, having previously built attribution pipelines processing 3B+ events/day.

Experience

Total Experience: 10 yrs 10 mos
Average Tenure: 2 yrs 8 mos
Current Experience: 3 yrs 6 mos

Nvidia

Senior System Software Engineer

Nov 2022 – Present · 3 yrs 6 mos

  • Architecting a cloud-native network fabric management platform using Go/Scala microservices, gRPC, and Kafka, enabling real-time observability across GPU cluster infrastructure that supports AI/ML workloads at scale.
  • Designing and implementing security frameworks, including mTLS with a service mesh and access control systems; driving compliance initiatives and leading vulnerability remediation across platform components.
  • Building high-throughput data ingestion pipelines with time-series database integration and OpenTelemetry, enabling real-time metrics collection and alerting for large-scale network fabrics.
  • Contributing to platform bring-up for new hardware generations, mentoring engineers, leading cross-team collaborations, and driving quality initiatives including test automation and deployment optimizations.
Go, Scala, gRPC, Kafka, mTLS, service mesh

Yahoo

Software Dev Engineer

Mar 2020 – Nov 2022 · 2 yrs 8 mos · Champaign, Illinois, United States

  • Worked on the Conversion Attribution team, which owns the service that attributes conversion beacons for events such as clicks, impressions, app installs, email receipts, and store visits from several ad systems based on complex local rules. Supported ad platforms included Yahoo Gemini, BrightRoll DSP, and Oath AOL. The service used Apache Storm (Trident) and HBase to attribute and handle 3B events/day.
  • Maintained a real-time mobile attribution web service, built on the Jetty framework, that integrates with mobile measurement partners and returns self-attribution claims.
  • Migrated the product's service authentication to an open-source platform for X.509 certificate-based service authentication and fine-grained access-control authorization. The platform supports both provisioning and configuration (centralized authorization) and serving/runtime (decentralized authorization) use cases.
Apache Storm, HBase, Java, Jetty, X.509, access control

Experian

Software Developer II

Jun 2019 – Mar 2020 · 9 mos · Indianapolis, Indiana, United States

  • Redesigned and added features to an internal pipeline of services supporting products that serve around 200 clients worldwide.
  • Worked on a bulk data processing and ingestion service from Hadoop to HBase that enables insight generation for client reports.
Hadoop, HBase, Bulk data processing, Data Pipeline Development

Oracle

2 roles

Senior Application Engineer

Promoted

Mar 2017 – May 2018 · 1 yr 2 mos

  • Managed a Scrum team and engineered a new responsive user interface for Release 13. Directed performance testing of the product and drove improvements where necessary, bringing click response time down to 2 seconds (90th percentile).
  • Designed and developed Tuition Calculation APIs, enabling course fee calculation and recalculation as well as the creation and backout of invoices. Developed UI pages and tests using the JUnit test framework. Delivered ahead of schedule with a 100% BAT and RRF success rate.
  • Presented deliverable demos to senior leads and directors, and worked closely with project managers on use-case development and with other team members on development. Achieved a success rate of 95% or above on epics.
  • Worked with a cross-functional team to identify the scalability, availability, and security requirements of the products in an iterative development environment.
Scrum, JUnit, Java, Application Development, Performance Testing

Application Engineer

Jun 2014 – Feb 2017 · 2 yrs 8 mos

  • Designed and coded the Student Records and Student Financials applications using Java and SQL in an Agile environment with a test-driven development approach. The applications are used by 350+ higher-education institutions globally.
  • Excelled in rapid application development, in managing technical issues on assigned projects, and in integration with Oracle Financials and HCM. Applied object-oriented design and large-scale application development in Java.
  • Contributed software engineering expertise across the product life cycle, from requirements definition through successful deployment on an enterprise-grade cloud platform, using Oracle Fusion Middleware (FMW).
  • Created and maintained project tasks and schedules, provided programming estimates, identified potential problems, and recommended alternative solutions using distributed systems and service-oriented architecture.
Java, SQL, Agile, Test Driven Development, Application Development

Education

Purdue University

Master of Science - MS — Business Analytics and Information Management

Jan 2018 – Jan 2019

Birla Institute of Technology and Science, Pilani

B.E. Electrical and Electronics & M.Sc. Economics (Dual Degree)

Jan 2009 – Jan 2014