Sri Chavali

CEO

Hyderabad, Telangana, India · 15 yrs 1 mo experience

Key Highlights

  • Saved Microsoft $2 million with ML resource prediction.
  • Converted 100 TB batch pipeline to streaming, improving latency by 20X.
  • Optimized ingestion pipelines at VMware, saving over $1 million annually.

Skills

Core Skills

Data Quality Framework · Data Validation · Data Ingestion · Data Quality · Data Processing · API Development · Microservices · Performance Optimization

Other Skills

Apache Spark · Flink · Postgres · Hadoop · Kafka · Cassandra · Azure Data Factory · Azure Synapse · RocksDB · Hive · HBase · PostgreSQL · Redis · Elasticsearch · Java

About

I like to build and experiment with new things. I love coding and the math behind complex predictive algorithms and models. Learning by coding is my motto, whether it is a new algorithm or a new programming language. I have expertise in building highly available, resilient, scalable, low-latency microservices in distributed architectures, and I am passionate about database internals, data modeling, data warehousing, high-performance analytics, big data, ML, deep learning, and anything related to data :)

At Microsoft, I am part of a data platform team.

  • Led a team of three to design and implement self-serve reporting frameworks and up-level data quality for Microsoft 365.
  • Architected and implemented a centralized ingestion framework with data-aware pre-computation and aggregation that handles hundreds of terabytes of daily data.
  • Solved complex organizational challenges involving disparate data sources by designing a Unified Metrics framework as the single source of truth. Defined the standards and process for how Microsoft 365 onboards new metrics and datasets.
  • Developed Autotune, which uses historical data and ML models to predict Spark resource usage, saving Microsoft $2 million.
  • Converted the 100 TB daily batch pipeline to streaming, improving data freshness latency by 20X. Implemented exactly-once semantics using Flink checkpointing and Delta Lake atomic commits. Solved small-file challenges with dynamic partitioning, reducing the number of output files from millions to thousands.
  • Built and implemented a two-year roadmap for Microsoft 365 to have the highest-quality data.

At VMware, I was part of the vROps team, which collects and processes billions of real-time metrics and time-series data points daily.

  • Led a team of three in managing a complex ecosystem built on Kafka, Flink, and Cassandra.
  • Optimized ingestion pipelines, reducing daily storage from 3 TB to 40 GB and cutting storage and compute costs by 60%, saving VMware over $1 million per year.
  • Scaled Kafka to ingest billions of messages daily across hundreds of nodes in geo-replicated active-active clusters.
  • Implemented a distributed profiler for Apache Spark applications, reducing memory usage by 500 GB.

Experience

Oracle

Consulting Member of Technical Staff

Sep 2024 – Present · 1 yr 6 mos

  • Performance monitoring of a globally distributed Oracle Sharded database using standard OLTP benchmarks (TPC-C).

Self-employed

Lead Software Engineer

Aug 2023 – Aug 2024 · 1 yr · Remote

  • Leading the creation of a data quality framework capable of connecting to various data sources for comprehensive data validations and quality checks, using Apache Spark, Flink, and Postgres.
  • Designing the framework to interpret and execute data quality rules expressed in SQL, ranging from basic null checks and row counts to complex anomaly detection.
Apache Spark · Flink · Postgres · Data Quality Framework · Data Validation
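The SQL-expressed rules described above can be sketched as queries that each return a violation count, with a rule passing when the count is zero. This is a hypothetical illustration using sqlite3 in place of the actual Spark/Flink/Postgres stack; the table and rule names are invented:

```python
import sqlite3

# Hypothetical sketch: each data-quality rule is a SQL query returning the
# number of violating rows; a rule passes when that number is 0. Uses sqlite3
# purely for illustration, not the framework's real execution engines.
RULES = {
    "no_null_emails": "SELECT COUNT(*) FROM users WHERE email IS NULL",
    "nonempty_table": "SELECT CASE WHEN COUNT(*) = 0 THEN 1 ELSE 0 END FROM users",
}

def run_rules(conn, rules):
    """Execute each SQL rule; a rule passes when its query returns 0."""
    results = {}
    for name, sql in rules.items():
        (violations,) = conn.execute(sql).fetchone()
        results[name] = (violations == 0)
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com")])

report = run_rules(conn, RULES)
# One row has a NULL email, so "no_null_emails" fails; the table is
# non-empty, so "nonempty_table" passes.
```

Keeping rules as plain SQL strings is what lets one framework target multiple engines: the same rule text can be handed to Postgres, Spark SQL, or Flink SQL with only connector-level changes.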

Microsoft

Senior Software Engineer

Sep 2020 – Sep 2022 · 2 yrs · Seattle, Washington, United States

  • Led a team of three to design and implement self-serve reporting frameworks and up-level data quality for Microsoft 365.
  • Architected and implemented a centralized ingestion framework with data-aware pre-computation and aggregation that handles hundreds of terabytes of daily data.
  • Solved complex organizational challenges involving disparate data sources by designing a Unified Metrics framework as the single source of truth. Defined the standards and process for how Microsoft 365 onboards new metrics and datasets.
  • Developed Autotune, which uses historical data and ML models to predict Spark resource usage, saving Microsoft $2 million.
  • Converted the 100 TB daily batch pipeline to streaming, improving data freshness latency by 20X. Implemented exactly-once semantics using Flink checkpointing and Delta Lake atomic commits. Solved small-file challenges with dynamic partitioning, reducing the number of output files from millions to thousands.
  • Built and implemented a two-year roadmap for Microsoft 365 to have the highest-quality data.
Hadoop · Apache Spark · Data Ingestion · Data Quality
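The small-files fix described above amounts to choosing the number of output partitions from the batch's data volume rather than from the task count, so each output file lands near a target size. A minimal sketch of that sizing rule, with invented names and an assumed 256 MB target (not the actual Microsoft implementation):

```python
import math

# Hypothetical sketch of dynamic output partitioning: pick the partition
# count from the batch's byte size so each output file is near a target
# size, instead of one file per task (which yields millions of tiny files).
TARGET_FILE_BYTES = 256 * 1024 * 1024  # assumed ~256 MB target per file

def dynamic_partitions(batch_bytes, target=TARGET_FILE_BYTES):
    """Number of output partitions so files land near the target size."""
    return max(1, math.ceil(batch_bytes / target))

# A large micro-batch maps to thousands of files, not millions:
hourly = dynamic_partitions(int(4.2 * 1024**4))  # ~4.2 TB micro-batch
```

In Spark terms, the computed count would feed a repartition (or coalesce) step just before the write, so the file count tracks data volume from batch to batch.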

Versa Networks

Software Engineer

Jun 2018 – Aug 2019 · 1 yr 2 mos · San Francisco Bay Area

  • Part of the API gateway service framework team that migrated Versa Director from a monolith to SOA. Designed and implemented distributed rate-limiter and authorization microservices using Spring Boot, Java, and SQL.
  • Implemented a Thrift IDL service configuration template that auto-generates boilerplate such as observability, rate limiting, circuit breaking, and backpressure for the microservices.
  • Implemented smart load balancing between microservices using consistent hashing.
  • Implemented a query-layer API microservice for Elasticsearch clusters, responsible for generating efficient queries and rejecting expensive ones. Implemented distributed rate limiting to avoid coordination between query API nodes.
Hadoop · Apache Spark · API Development · Microservices
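The consistent-hashing load balancing mentioned above can be sketched as a hash ring with virtual nodes: each request key maps to the first node hash at or after its own, so removing an instance only remaps the keys that lived on it. A hypothetical illustration with invented service names, not the Versa implementation:

```python
import bisect
import hashlib

# Hypothetical sketch of consistent-hash load balancing between service
# instances; node names and the vnode count are illustrative.
class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Virtual nodes smooth the key distribution across instances.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["svc-a", "svc-b", "svc-c"])
before = {k: ring.node_for(k) for k in ("req-1", "req-2", "req-3", "req-4")}

# Removing one instance leaves keys on the surviving instances in place:
smaller = HashRing(["svc-a", "svc-b"])
moved = [k for k, n in before.items()
         if n != "svc-c" and smaller.node_for(k) != n]
```

The minimal-remapping property is what makes this "smart" versus modular hashing, where removing one instance reshuffles nearly every key.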

VMware

Software Engineer

Oct 2008 – Mar 2018 · 9 yrs 5 mos · Palo Alto, California, United States

  • Led a team of three in managing a complex ecosystem built on Kafka, Flink, and Cassandra.
  • Optimized ingestion pipelines, reducing daily storage from 3 TB to 40 GB and cutting storage and compute costs by 60%, saving VMware over $1 million per year.
  • Scaled Kafka to ingest billions of messages daily across hundreds of nodes in geo-replicated active-active clusters.
  • Designed and implemented a robust monitoring and auditing system for Kafka clusters to improve production readiness and solve data quality challenges.
  • Implemented a distributed profiler for Apache Spark applications, reducing memory usage by 500 GB.
Kafka · Flink · Cassandra · Data Processing · Performance Optimization

Education

Cleveland State University

Master's degree in Computer Science
