Zhen Li

Senior Software Engineer

San Francisco, California, United States18 yrs 11 mos experience

Highly Stable

Key Highlights

Over 10 years of experience in software engineering.
Expert in cloud data warehousing and distributed systems.
Proven track record in leading complex data projects.

Stackforce AI infers this person is a SaaS and Fintech expert with strong capabilities in distributed systems and data analytics.

Contact

Skills

Core Skills

Cloud Data WarehousingDistributed SystemsData Analysis

Other Skills

Data Lifecycle ManagementMetadata ServiceLineage ServiceCloud Data WarehouseAWSS3 StorageMachine LearningScalable Cloud ServiceAWS NeptuneDistributed InfrastructureYARNHadoopDistributed Financial Instrument PricingData ProfilingCompute Cost Reduction

About

10+ years of experience as a software engineer, architect and technical lead in data platform, scalable and fault-tolerant distributed infrastructure, cloud data warehousing, and analytics. Managed all activities necessary to take a project from idea to production including requirement gathering, feature and phases planning, system architecture, data modeling, prototyping, development, testing, and production release. Strong analytical, creative, management, technical skills with a strong understanding of distributed system, data warehousing technologies, data analysis, micro service architecture, and machine learning infrastructures. Result oriented person, get things done in a high quality and timely manner, passion at driving and leading project development with interested in mentoring and training.

Experience

18 yrs 11 mos

Total Experience

3 yrs 9 mos

Average Tenure

Current Experience

Netflix

Senior Software Engineer- Data Platform

Sep 2015 – Feb 2024 · 8 yrs 5 mos · Los Gatos, California

Lead the data lifecycle management for data platform, including metadata and lineage in major data source. Lead the infrastructure framework for managing data retention based on storage and privacy requirements, deletion service for executing data retirement, tired storage for cost efficiency, alerting users for various data retirement events, and traceability and logging for tracing back
Develop metadata service for cloud data warehouse. It enables the data transport between various data storage. The metadata service provides a federate views of metadata system information to allow arbitrary metadata storage of datasets in hive, redshift, mySQL, pig, presto, etc.
Develop lineage service for providing crucial information to support various business use cases, e.g. data cleanse and privacy, based on data dependencies. Implement services on top of AWS Neptune graph database to support flexible and scalable lineage data queries.
Design methodology and tools to employ two highly durable s3 storage classes as tiered storage for storing big data for cost saving purpose. The system employs S3 data analytics which leverages machine learning algorithm to recommend the most efficient storage class suitable for application data access.
Design and implement highly scalable cloud service to interact with s3 data warehouse in a scalable and reliable way, which enables >8 million keys daily processing capacity. The system is implemented as a component based pipeline structure and each component can be configured and scaled out, which effectively addressed the dynamic and fast growth of input data.

Data Lifecycle ManagementMetadata ServiceLineage ServiceCloud Data WarehouseAWSS3 Storage+4

Ayasdi

Technical Lead - AI Computational Infrastructure

Oct 2014 – Oct 2015 · 1 yr · Menlo Park

Backend AI infrastructure development - building scalable distributed infrastructure for enabling complex data analytics algorithms and machine learning jobs execution on top of YARN.
Develop distributed infrastructure for high performance batch and interactive data analytic processing on top of native YARN. Implemented application master to interact with resource manager and node managers to schedule and launch various machine learning data analysis jobs on top of Hadoop cluster.
Design and implement highly scalable and flexible system architecture to allow the infrastructure adaptive to physical clusters. The system employs a group of application masters to separate the job scheduling load; dynamically discovers and monitors the liveness of application masters; balances the work load among the application masters.
Update cluster to CDH5 and employ Cloudera manager to config HDFS and YARN parameters; implement kerberos based authentication to support security hadoop cluster; repackaging software package to be hadoop cluster ready.

Distributed InfrastructureYARNHadoopMachine LearningDistributed Systems

Goldman sachs

Project Lead - Data Platform

May 2007 – Oct 2014 · 7 yrs 5 mos · New York City Metropolitan Area

Data platform for large scale distributed financial instrument pricing.
Research and implement approaches for distributed process profiling and compute cost reduction.
Edit tracking process, automatic report generation.
Selected projects:
Stock pricing infrastructure: developed the core dollar price components of stock pricing infrastructure. It extracts the common factors and defines a general pricing environments to allow various stock pricing model ( linear or non-linear ) to plugin and run on enterprise cloud.
Large scale position attribute calculation infrastructure: developed the infrastructure to calculate financial instrument attributes (terabyte data daily), such as denomination, on data centers; employed timeout and multilevel retry mechanisms (immediately, delayed, and final) to address calculation failure caused by transient application exceptions, machine, network or other resource issues; provided tools to allow ad-hoc testing and issue investigation locally; developed task status check and rerun to ensure calculation completeness which directly impacts the federal stress-test results.
Compute cost reduction via task batching: independently researched on the project for compute cost reduction. Proposed and evaluated effective batching approaches (split inputs to create tasks) to reduce total compute cost. Implemented and deployed lazy batching approach to reduce compute cost (average of 5%) by batching more computation units in a task as possible to minimize task management overhead, where time-insensitive task creation is delayed until accumulating enough computation units or reaching time limits.

Distributed Financial Instrument PricingData ProfilingCompute Cost ReductionData Analysis