Tony Chou

Software Engineer

Seattle, Washington, United States · 3 yrs 9 mos experience
Highly Stable · AI Enabled

Key Highlights

  • 5+ years of experience in high-scale distributed systems.
  • 2000+ stars on GitHub as an open source contributor.
  • Technical blogger with 400K+ views on Azure and data science.
Stackforce AI infers this person is a Backend Engineer specializing in Cloud Computing and Data Analytics.

Skills

Core Skills

Distributed Systems, API, Backend, Machine Learning, Software Development, Open Source Contribution, Research

Other Skills

Generative AI, CloudWatch, API Development, Amazon CloudWatch, Telemetry, Data Platform, Elasticsearch, Neo4j, Apache Airflow, Data Analysis, .NET, Data Pipelines, Graphics Driver, Debugging, Backend Development

About

โ— Backend engineer with ๐Ÿฑ+ ๐˜†๐—ฒ๐—ฎ๐—ฟ๐˜€ ๐—ผ๐—ณ ๐—ฒ๐˜…๐—ฝ๐—ฒ๐—ฟ๐—ถ๐—ฒ๐—ป๐—ฐ๐—ฒ building high-scale distributed systems at AWS, Microsoft, and unicorn startup โ— Open source contributor with ๐Ÿฎ๐Ÿฌ๐Ÿฌ๐Ÿฌ+ ๐˜€๐˜๐—ฎ๐—ฟ๐˜€ on GitHub and ๐Ÿญ๐Ÿฌ๐Ÿฌ๐—ž+ ๐˜‚๐˜€๐—ฒ๐—ฟ๐˜€ across projects โ— Technical blogger for Microsoft Azure official blog and Towards Data Science with ๐Ÿฐ๐Ÿฌ๐Ÿฌ๐—ž+ views

Experience

Total Experience: 3 yrs 9 mos
Average Tenure: 3 yrs 9 mos
Current Experience: 3 yrs 9 mos

Amazon Web Services (AWS)

2 roles

Software Development Engineer II, Generative AI Operations

Apr 2025 – Present · 1 yr 1 mo

  • Building a generative AI agent to automate troubleshooting in AWS: https://aws.amazon.com/cloudwatch/features/aiops/
  • Designed and launched CloudWatch Investigation's RAG Service from scratch across 14 regions, orchestrating 10+ tools to enable the AI agent to automatically fetch telemetry, identify root causes, and propose resolutions
  • Reduced Topology Service API max latency by 80% by identifying thread contention through profiler flame graph analysis
  • Drove production incidents down 23% by leading root cause analysis across 100+ tickets and pushing resolution of the top issues
  • Led CloudWatch Logs tag migration to dedicated distributed cache across 21 regions with zero customer impact
Generative AI, CloudWatch, API, Distributed Systems

Software Development Engineer II, Observability

Aug 2022 – Apr 2025 · 2 yrs 8 mos

  • Designed and developed core backend services for Amazon CloudWatch, the largest real-time monitoring system in the world, serving 100M+ requests/sec at 99.99% availability
  • Led a team of 4 engineers to deliver telemetry relationships, ingesting 6 trillion relationships every hour
  • Improved ingestion latency by 10%, CPU by 8%, and saved $8M/year by refactoring the Metric metadata deduplication
  • Reduced blast radius by 99% during dependency failure by designing a new impact isolation protection
  • Lowered logs storage by 38%, Disk IO by 50%, CPU by 7%, and saved $11M/year by adopting binary logging framework
  • Eliminated 400+ security risks by leading the static-to-dynamic credentials migration with security team and PM
  • Reduced API authorization resource usage by 66% and latency by 35% by redesigning the tag authorization framework
  • Resolved a billing bug that overcharged customers for 2+ months, coordinated refunds with PM and customer support
Amazon CloudWatch, Telemetry, API, Distributed Systems

Appier

Backend Engineer, Data Platform (Unicorn Startup)

Oct 2021 – Jul 2022 · 9 mos

  • Increased data analysts' productivity by 20% by integrating the data discovery service Amundsen with Elasticsearch and Neo4j
  • Designed a metric aggregation framework that enables analysis on 100M+ rows of data on the fly
  • Improved data latency by 30x by applying multiprocessing to the ingestion pipeline and batch processing to the database
  • Provided fresh data in discovery service by developing Airflow data pipelines to ingest metadata from Hive Metastore
  • Enabled data preview on 40K+ datasets by building Apache Superset preview client and connecting it with data catalog
Data Platform, Elasticsearch, Neo4j, Apache Airflow, Backend

Qualcomm

Software Engineer Intern

Aug 2021 – Oct 2021 · 2 mos

  • Created a full stack machine learning headcount prediction platform from scratch for directors and VPs in APAC
  • Collaborated with data scientists to design data pipelines and deploy machine learning models
Machine Learning, .NET

Intel Corporation

Software Engineer Intern

Jun 2021 – Aug 2021 · 2 mos

  • Increased graphics driver log completeness by 20% by developing 2 new features in the driver debug tool
  • Encrypted confidential information in public release drivers by refactoring the display logging structure
Graphics Driver, Debugging, Software Development

National Cheng Kung University

Research Assistant

Oct 2020 – Feb 2021 · 4 mos · Taiwan

  • Constructed a deep-learning-based system to improve baseball players' mechanics by automatically overlaying their pitching clips
  • Verified the accuracy and stability of the assistant system through automated testing on 10+ games and 30+ pitchers
Deep Learning, Automated Testing, Research

Microsoft

Software Engineer Intern (Backend)

Jun 2020 – Jun 2021 · 1 yr

  • Reconstructed a backend quotation system to automate 1000+ inquiries by integrating with mail servers using .NET
  • Designed and implemented a distributed system solution based on the client's requirements, applying system design concepts
  • Developed an analyzer to inspect the usage of 100+ conference rooms for 8000+ employees with Microsoft Graph API
  • Supported 50+ team members by creating Azure Kubernetes Service docs and Azure case studies for internal reference
  • Migrated applications to Azure Kubernetes Service with Docker to enable autoscaling and access to more cloud services
Backend Development, Azure, Backend

GitHub

Open Source Contributor

Oct 2018 – May 2021 · 2 yrs 7 mos

  • Contributed to Amundsen, a large open-source project, by increasing parallelism in its data ingestion framework
  • Created LINE Message Visualizer and attracted 100K+ users in 3 months to analyze their message history
  • Acquired 1000+ stars on GitHub by consistently creating and improving my own side projects
Open Source, Data Ingestion, Open Source Contribution

Education

National Cheng Kung University

Bachelor's degree, Computer Science

Jan 2017 – Jan 2022
