Anthony Polyakov

Director of Engineering

Seattle, Washington, United States · 14 yrs 7 mos experience

Key Highlights

Expert in building scalable cloud infrastructure.
Proven track record in leading large engineering teams.
Innovative solutions for complex data processing challenges.

Stackforce AI infers this person is a SaaS and Cloud Infrastructure expert with a strong focus on database management and observability.

Skills

Core Skills

Cloud Infrastructure, Software Development, Database Management

Other Skills

compute services, runtime infrastructure, APIs, Kubernetes, multi-cloud, SDTK, observability, continuous integration, cloud services, gRPC, CI/CD, Bigtable, NoSQL, concurrent architecture, JDBC driver

About

Senior hands-on leader and manager of managers with a proven track record of delivering with teams of different sizes and spirits - from focused, startup-like, fast-paced groups to geo-distributed departments of 50+ engineers. I can build things from the ground up as well as work in mature environments with serious production workloads. Inspirer, speaker, executor, visionary and someone who strongly believes in leading by example and keeping things simple. I love debugging things and tracing complex problems. Eager learner and truth seeker. I enjoy tech work just as much as I enjoy building teams and helping people grow. My areas of tech interest include streaming data processing, distributed systems, reactive system design, functional programming and in-memory processing. Some tech keywords: Java, Go, C/C++, Python, gRPC, REST, Linux, Docker, Kubernetes, OpenTelemetry

Experience

Total Experience: 14 yrs 7 mos

Average Tenure: 1 yr 11 mos

Current Experience

Nvidia

Director

Dec 2024 – Mar 2026 · 1 yr 3 mos · Seattle, Washington, United States · On-site

Compute Infrastructure, DGX Cloud

DataRobot

VP of Engineering

Jun 2022 – Dec 2024 · 2 yrs 6 mos · Vancouver, British Columbia, Canada

Head of the Global Platform team (80+ FTE) running foundational services (compute, runtime, storage, security, tenant context and others) for all DataRobot components, including model training and AutoML, inferencing, GenAI and Notebooks, and supporting the control plane for single- and multi-tenant SaaS and infrastructure provisioning. Global Platform also includes the Developer Experience team, which delivers observability, continuous integration services and a service bootstrapping scaffolding framework (for bootstrapping gRPC-based services integrated with the common k8s platform using Helm).
In my current role I was tasked with helping move DataRobot from an on-premise monolithic application to a multi-cloud, multi-service distributed system.
I started by redefining the platform from a set of common tools into an actual product exposing well-defined APIs to customers. We built the technical strategy and operational principles for the team, working backwards from our customers (internal teams), and identified the foundational services to be built first:
Common k8s-based Runtime infrastructure for running DataRobot components in a secure and scalable way
Foundational compute services and APIs covering typical use case patterns - a batch jobs API for offline training, FaaS-like runtime services for real-time inferencing, and a hosting API for running user-provided models and notebooks
SDTK (service development toolkit) for bootstrapping new services with all batteries included (RPC, o11y, deployment, pipelines, etc.), so that ML teams can develop new services outside of the old monolith with all best practices and platform capabilities included.
In under a year we moved DataRobot to a new Kubernetes-based architecture and rolled it out across AWS, GCP, Azure and self-managed k8s; in the following year we delivered the foundational compute services and SDTK. We modernized the DevEx stack, replacing in-house tools with modern o11y and CI/CD platforms integrated with SDTK.

compute services, runtime infrastructure, APIs, Kubernetes, multi-cloud, SDTK +4

Google

Senior Engineering Manager

Sep 2021 – Sep 2022 · 1 yr · Waterloo, Ontario, Canada

Bigtable lead in Canada.
Bigtable is Google's petabyte-scale, ultra-low-latency managed NoSQL database powering the most demanding Google services internally - YouTube, Google Maps, Search, etc. - as well as the largest enterprises externally on GCP.
I built the Canadian team, focusing on making it a self-sufficient center of excellence rather than an extension of existing teams.
Together with the team, I focused on one of the hardest issues in Bigtable - the noisy neighbour problem, where requests made by one client negatively affect other clients. We debugged and improved the concurrent architecture of the Bigtable API frontend services, reducing the number of noisy-neighbour-related tickets by 50%.
Independently, as a New Year hackathon project, developed a fully ANSI SQL compatible Bigtable JDBC driver based on Apache Phoenix.
Drove the latency and resource consumption improvement project, rearchitecting sidecars for critical-path authentication and authorization functionality into an in-process architecture. As a result we saved up to 10% of CPU resources for the Bigtable fleet.

Bigtable, NoSQL, concurrent architecture, JDBC driver, latency improvement, Database Management +1

Atlassian

Senior Principal Engineer

Jul 2020 – Sep 2021 · 1 yr 2 mos · Vancouver, British Columbia, Canada

Senior Principal Engineer in the Cloud Infrastructure team. Architect of the new observability platform for Atlassian. OpenTelemetry open source contributor. Brought OpenTelemetry to Atlassian, evangelized and drove adoption, and inspired an "if you miss something - make a PR" culture, resulting in Atlassian being among the top 30 OpenTelemetry contributors.
Authored the design and drove production delivery of Observazaurus - a cross-Atlassian observability platform. It was a real-time stream data processing pipeline taking telemetry data from every Atlassian service (tens of terabytes a day) and intelligently processing it to:
allow for quotas and limits
control cardinality
fan-out to hot and cold storages
detect anomalies in real-time
provide meta-observability
harmonize dimensions
route data to the appropriate configurable observability backend (Splunk, SignalFX, cold storage, LightStep, etc.)
Lead for company-wide cross-functional technical advisory groups in SRE and Cloud Infrastructure, engaging principal engineers across the organization to work on strategic initiatives and drive company tech strategy.

observability platform, OpenTelemetry, real-time data processing, cross-functional collaboration, Cloud Infrastructure, Software Development

Amazon Web Services (AWS)

Software Development Manager

Feb 2018 – Jul 2020 · 2 yrs 5 mos · Vancouver, British Columbia, Canada

Ran the Aurora MySQL, RDS MySQL and MariaDB teams (30 FTEs) - the next-generation cloud databases at AWS. Responsible for all engineering and operational aspects of one of the largest database fleets in the world.
Among other things, my team launched Aurora Global Database and Aurora Multi-Master, delivered a major upgrade to the RDS control plane to support MySQL 8.0, and launched the new RDS Recommendations service.
Major contributor to the design of Aurora Global Database - a geographically distributed relational database with sub-second multi-region latencies and a global control plane. I drove the control plane and API design, developed a global metadata storage layer backing the control plane, and partnered with the engine and storage teams to build the global control plane for them.
I inspired and started RDS Recommendations - an intelligent database co-pilot delivering optimization and best-practice advice to customers in real time by looking at their database fleet. We designed it to be an extensible reactive platform consuming large amounts of telemetry, metadata and control plane data and producing actionable recommendations for clients. The engine ran on an extremely large fleet of databases (millions of instances) and was scalable and extensible, so that new types of recommendations could be added easily.
Reduced KTLO and engineering toil by 50% in 1 year. Fully automated the engine release process, going from multi-month to same-day new engine releases.
Was the founding EM for AWS Location Service (5 FTEs growing to 15). Was responsible for the geofencing and real-time tracking domains, building both the team and the technology from the ground up. Partnered with Product Management to deliver the concept, the product vision and the very first customer demos of the product. Co-authored a new patent on a real-time geofencing algorithm. Drove the architecture for a high-performance distributed geofencing engine processing real-time position data from clients and producing fencing events at AWS scale.

cloud databases, Aurora Global Database, RDS Recommendations, geofencing, Database Management, Software Development

CloudDBAppliance project

Senior Software Engineer

Jan 2017 – Dec 2017 · 11 mos · Greater Paris Metropolitan Region

Working as a Use Case developer for CloudDBAppliance - a Horizon 2020 project sponsored by the European Commission. The aim is to develop a system computing real-time CVA risks utilising the capabilities delivered by CloudDBAppliance. CloudDBAppliance is a brand new appliance consisting of an ultra-fast operational data lake and an in-memory analytics engine. The aim is to achieve multi-terabyte, multi-hundred-core scalability with predictable performance and strong consistency guarantees by utilising NUMA architecture, a proprietary K/V storage engine and unique resource allocation algorithms.

Infoshare.pl

Speaker

May 2016 – May 2016 · 0 mo · Gdańsk, Pomorskie, Poland

Tech stage

Nordea Markets

3 roles

Head of Application Development, Core Services and Risk

Promoted

Jan 2016 – Jan 2018 · 2 yrs

Hands-on leader and solution architect. Managing software development teams in Denmark and Poland (60+ people in total) working on the key parts of the new Nordea Capital Markets IT ecosystem in the Trading and Risk domain. Technical manager, solution architect and owner for a number of critical systems and foundational components (messaging layer, service discovery, operational data stores). Responsible for designing and building:
Real-time FpML document-based trade vault delivering trade contract information and serving continuous queries for Capital Markets systems (multi-terabyte MongoDB-based solution)
New FRTB-compliant Market Risk infrastructure, including a Scenario Engine and an IMA engine capable of doing on-demand simulations fed into in-memory aggregation cubes (more than 10x capacity increase compared to the existing one; reactive, on-demand computations compared to overnight batch)
Core messaging layer for Capital Markets infrastructure based on Kafka and Confluent Platform
Core cloud-ready infrastructure architecture (dynamic service discovery, fluid machine-agnostic cluster-based deployment capable of doing canary rollouts, containerisation)
Improved the Credit Risk system landscape to ensure a continuous delivery model with the vendor solution

application development, messaging layer, cloud architecture, real-time systems, Software Development, Cloud Infrastructure

Head of Risk IT

Sep 2015 – Jan 2016 · 4 mos

Head of Market Risk IT

Aug 2014 – Sep 2015 · 1 yr 1 mo

The Market Risk IT department is 25 developers and business analysts working for Capital Markets and responsible for market risk management systems. These are high-throughput systems calculating mission-critical risk figures on Nordea-wide portfolios.
Existing systems suffered from substantial capacity, stability and scalability issues. There were no automated deployment processes, systems required a lot of manual effort just to keep operating, the development process in the team was very ad hoc, and communication to the business was not transparent.
I headed the team as a crisis manager with the aim of re-engineering existing systems, bringing in an engineering culture, building a strong development practice and ensuring agile and transparent development processes.
In less than a year we:
identified the most problematic places and developed an evolutionary strategy for re-engineering critical components. Following this strategy we were able to make dramatic improvements in throughput and stability in a smooth, continuous way without business interruption or the need for massive regression testing
established a fully transparent agile process with strong ownership and ensured a continuous delivery chain. That required building the full continuous delivery pipeline and migrating the existing legacy code base to an automatic deployment framework
cleaned up old systems and built a new microservice-based set of components leveraging web-scale technologies like Redis and Kafka, employing reactive principles with RxJava and a stateless scalable architecture
established a set of strong, mobile and self-organized engineering teams
As a result, critical processing times dropped from 4 hours to several minutes, and we got highly concurrent reusable components operating in real time with no global shared state, which enabled things like what-if analysis.
The system now handles millions of trades in a semi-real-time mode, allowing instant slice-and-dice by leveraging an in-memory OLAP cube.

Deutsche Bank

3 roles

Assistant Vice President

Feb 2010 – Mar 2012 · 2 yrs 1 mo

Leading the project to build Deutsche's FIX API Options platform
Leading development of the electronic option orders (ATOM) platform
Leading development of Autobahn FX Options (http://www.autobahnfx.com/options.html) and AutobahnFX Structured Products (http://www.autobahnfx.com/deposits.html) - a market-leading FX Options platform