Ishaan S.

Director of Engineering

Palo Alto, California, United States
22 yrs 9 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Led teams optimizing Netflix's CDN for streaming.
  • Developed predictive algorithms for content delivery.
  • Designed systems improving streaming efficiency.
Stackforce AI infers this person is a Backend-heavy Infrastructure Engineer in the Streaming industry.

Skills

Core Skills

Distributed Systems, Java, Monitoring, Management

Other Skills

Scalability, Cloud Computing, AWS, Performance Tuning, Architecture, Messaging, Tomcat, Architectures, SOA, Ant, Integration, Multithreading, Enterprise Software, REST, JMS

About

Java, C, C++, Distributed Systems, Concurrency, Scalability, Performance, Messaging, AWS

Experience

Total Experience: 22 yrs 9 mos
Average Tenure: 7 yrs 6 mos
Current Experience: 16 yrs 2 mos

Netflix

5 roles

Senior Engineering Manager, Open Connect Control Plane

Apr 2024 - Present · 2 yrs

Engineering Manager, Open Connect Control Plane

May 2021 - Apr 2024 · 2 yrs 11 mos

  • Hands-on Builder, working on Netflix's CDN for Video-On-Demand and extending it to support Live Streaming, Cloud Games, and Ads Delivery.
  • Also fortunate to lead the amazing teams responsible for:
      • Steering customer traffic to the best nodes to serve requested content, using factors such as the real-time status of the CDN
      • Directing the deployment and replication of content to all CDN nodes while maximizing cache offload, minimizing transfer costs, and minimizing deployment times
      • Predictive and Optimization Algorithms for maximizing CDN efficiency
      • Reacting in real-time to changes and failures of CDN nodes and components
      • Building Geo & Network Intelligence for company-wide initiatives
      • Data-driven hardware design & capacity analysis
      • Providing operational excellence, monitoring, and on-call support
Distributed Systems, Java, Scalability, Cloud Computing, AWS, Performance Tuning

Engineering Leader/Manager, Open Connect Control Plane - Content Placement & Popularity

Promoted

Jan 2019 - May 2021 · 2 yrs 4 mos

  • Open Connect is Netflix's homegrown Content Delivery Network, serving over a third of North America's peak internet traffic.
  • I build the systems and manage the team responsible for the following areas:
  • Content Popularity: We build models for predicting nightly streaming behavior. These models feed into systems built for content placement and optimal video streaming.
  • Efficient Placement of Content: We decide how to allocate streaming video assets onto Open Connect video streaming servers (OCAs). We build algorithms for placing content onto OCAs in a manner that efficiently utilizes their hardware. We engineer systems that achieve business goals such as maintaining high QoE for our users while serving video traffic in a cost-effective manner.
  • Resilient Placement of Content: We place content across Open Connect in a manner that is tolerant to failure and congestion of both the network and video streaming servers. We work with device teams to both maintain playback QoE and prevent video traffic shifts from optimal locations.
Distributed Systems, Java, Scalability, Cloud Computing, AWS, Performance Tuning

Software Engineer, Open Connect Control Plane

Sep 2012 - Jan 2019 · 6 yrs 4 mos

  • Core engineer on team responsible for design/implementation of Netflix's proprietary CDN (Open Connect).
  • Designed/implemented system for computing number of copies of movies/shows needed to achieve optimal streaming throughput and resilience to failure. The number of copies is modeled as a function of predicted content popularity, network topology, and streaming hardware characteristics. Ultimately this system reduced unnecessary content replication throughout Open Connect, allowing for better utilization of deployed hardware.
  • Worked with streaming client teams to reduce spurious traffic shifts out of optimal Open Connect locations ("switchaways"). Resulted in ~60% reduction, which benefits Netflix users and ISPs.
  • Designed/implemented cloud service for steering consumer traffic to optimal Open Connect movie caches. Ensured performance, scalability, and reliability of service by designing multiple levels of fallback strategies for data.
  • Designed/implemented framework to support near-real-time updates of content location and health telemetry (an improvement of multiple orders of magnitude). This framework supported Amazon EC2 regional failover, data loss, and error detection, without sacrificing speed or scalability.
  • Implemented various aspects of optimal steering of streaming video traffic:
      • Location-based network traffic scatters/sprays
      • Open Connect resilience strategies
      • A/B testing
      • URL generation
      • Standards support
      • Migration of network traffic off of 3rd party CDNs onto Open Connect
  • Supported production readiness via metrics, dashboards, crisis response, configuration, and data analysis.
Distributed Systems, Java, Scalability, Cloud Computing, AWS, Performance Tuning

Streaming Infrastructure Engineering

Jan 2010 - Aug 2012 · 2 yrs 7 mos

  • Responsible for design/implementation of metadata caching systems to serve movie streams to Netflix-capable devices. A design goal was to require no dependencies on external systems / network calls.
  • Researched and implemented performance optimizations that improved streaming service scalability and reduced Amazon EC2 costs. Reduced memory footprint and CPU overhead using Avro and efficient parsing / caching.
  • Amazon EC2 migration and integration. Implemented support for DRM services and Netflix-ready devices in the cloud.
  • Responsible for production readiness. Engaged in production crisis debugging / troubleshooting. Analyzed logs to determine trends. Ensured quality of metrics and monitoring.
  • Implemented server-side support for various Netflix-ready devices.
Distributed Systems, Java, Scalability, Cloud Computing, AWS, Performance Tuning

VMware

vCenter Platform/iOps Engineering

Jul 2007 - Dec 2009 · 2 yrs 5 mos

  • Core contributor on team responsible for next-generation management and monitoring architecture.
  • Presenter at VMworld 2009: Multiple sessions demonstrating value-add of vCenter Hardware Monitoring and Alarm/Alert services.
  • Responsible for Alarm/Alert Service: This component allows users to configure monitors for thresholds in the system (e.g. VM CPU), and changes in state/occurrence of events (e.g. Loss of connectivity to managed entity). It also allows automated action responses to the aforementioned monitors.
  • Responsible for Hardware Health Monitoring: This component monitors the health of hardware components (fans, cores, power supplies, memory, etc.) of all physical hosts in vCenter’s inventory. The number of managed hosts is in the hundreds and expected to grow.
Monitoring, Management, Architecture

TIBCO Software

Global Architects / Emerging Technologies Group / BusinessEvents Engineering

Jun 2003 - Jul 2007 · 4 yrs 1 mo

  • Core engineer on BusinessEvents, TIBCO's flagship Enterprise Event Management product suite.
  • Designed and implemented software components, directed/mentored new developers & contractors (local and offshore). Also engaged directly with customers for short-term (proof of concept) and long-term (implementation) projects.
  • Responsible for Runtime Inferential State Machine, Debugger, Timer Threading, Design-time Modeler, Analyzer, APIs, among other areas of the product.
  • Directed local and India-based contractors while developing Rule Debugging and Analysis tools.
  • Debugged, upgraded, and maintained a production SMS QoS monitoring system involving thousands of messages / second, 40 data input channels, and numerous distributed instances of BusinessEvents for Vodafone (customer) in Düsseldorf, Germany.
  • Architected and implemented distributed state management / scaling solution for previously mentioned system that allowed 10x data throughput with approximately linear theoretical scalability.
  • Architected BusinessEvents solutions for various TIBCO customers in Finance and Telco. Solutions involved message sequencing/routing, Fraud Detection, Business Process Monitoring, scalability (several BusinessEvents instances), and performance (10k+ msgs/sec) (Rome, Boston, Bonn).
  • Keywords: java, multi-threaded, distributed, server/infrastructure development, jms, rv, messaging
Java, Distributed Systems, Messaging

Education

Stanford University

Computer Science

Jan 2003 - Present
