Venkata krishnan Sowrirajan

Software Engineer

San Francisco, California, United States · 13 yrs 4 mos experience
Most Likely To Switch · Highly Stable

Key Highlights

  • Expert in building scalable distributed systems.
  • Significant contributions to Apache Spark and Flink.
  • Strong leadership in technical reviews and mentoring.
Stackforce AI infers this person is a Big Data and Cloud Computing expert with strong open-source contributions.

Skills

Core Skills

Apache Spark, Apache Flink, Cloud Computing

Other Skills

Conference Speaking, Technical Reviews, Technical Leadership, Java, Software Engineering, AWS, Machine Learning, Peer Reviews, Data Structures, Scala, Algorithms, Object-Oriented Programming (OOP), Software Development, Hadoop, SQL

About

I enjoy building scalable and reliable distributed systems, and have extensive experience working with Apache Spark, Apache Flink, Trino, and more. Currently, I'm leading an effort on compute convergence at LinkedIn using Apache Flink. I am passionate about contributing to open-source projects and have contributed significantly to projects like Spark and Flink. If you're interested in distributed systems or open source, feel free to connect with me!

Experience

LinkedIn

Staff Software Engineer

Feb 2020 - Present · 6 yrs 1 mo · Mountain View, CA · On-site

Apache Spark, Conference Speaking, Technical Reviews, Technical Leadership, Apache Flink, Java, +1

Qubole

2 roles

Staff Engineer

Promoted

Aug 2019 - Feb 2020 · 6 mos

Apache Spark, Conference Speaking, Technical Leadership, Java, Software Engineering

Member Of Technical Staff

Jan 2016 - Jul 2019 · 3 yrs 6 mos

  • Fixed Spark CBO statistics estimation for Aggregate and Sort operators, yielding a speedup of almost 2x on select queries such as Q83 of TPC-DS. Contributed back to open source (commit id b1857a).
  • Fixed a deadlock in Spark’s UnsafeExternalSorter that affected one of the largest Qubole customer workloads. Contributed back to open source (commit id 6c4552c6).
  • Spark S3 Select connector for Qubole Spark that automatically pushes down projections and filters for CSV and JSON; TPC-DS benchmark speedup: geometric mean 2.9x, maximum 5x (Blog: https://www.qubole.com/blog/amazon-s3-select-integration/)
  • Serverless Spark on AWS Lambda: Spark executors run entirely as Lambda functions, with S3 as the external storage for shuffle data (Blog: https://www.qubole.com/blog/spark-on-aws-lambda/)
  • Worked on Qubole Spark autoscaling based on stage progress, with support for pluggable, custom autoscaling policies.
  • Implemented workload-based scaling limits leveraging Apache YARN’s Fair Scheduler queue limits.
  • Implemented HDFS autoscaling, which scales up nodes based on DFS disk capacity and incoming data velocity.
  • Mentoring new grads and interns; PR reviews; Spark version upgrades; on-call; customer issue troubleshooting, etc.
Java, Software Engineering, Apache Spark

MapR Technologies

Software Engineer

Jun 2014 - Dec 2015 · 1 yr 6 mos · San Jose

  • Worked on a real-time performance monitoring and troubleshooting dashboard for large MapR Hadoop clusters; contributed from the POC stage.
  • Worked on Apache projects like Drill, Spark and Hive; contributed to Apache Drill; participated in user/dev groups of other open source projects like Apache Samza.
  • Developed real-time log analysis for MapR Hadoop clusters using the ELK stack.
Java, Software Engineering

Intel Corporation

Software Engineer Intern

Aug 2013 - May 2014 · 9 mos · Chandler

  • Designed and developed a proof of concept for "Machine Learning as a Service", a cloud-based framework.
  • The ML-as-a-service framework exposes machine learning algorithms available in various packages (Weka, SciPy, NumPy, Mahout, etc.) as web services.
  • Developed visualizations using D3.js to analyze data samples over a timeline graph.
  • Set up a single-node Apache Hadoop cluster to demonstrate the idea.

Apollo Group

Software Engineer Intern

Jun 2013 - Aug 2013 · 2 mos · Phoenix, Arizona Area

  • Constructed data pipelines to aggregate and summarize instrumentation logs; extracted behavioral attributes from the generated logs using Hive UDFs; created an automated workflow using Oozie.
  • Performed social graph analysis on discussion data stored in HDFS; calculated the prestige score of each participant to measure that participant's importance in the network.
  • Also developed a graph visualization of the discussion forum using D3.js.
  • Used: Hive, Oozie, Spring MVC, Java, D3.js
Java, Software Engineering

Arizona State University

Graduate student

Aug 2012 - May 2014 · 1 yr 9 mos · Tempe, AZ

  • Masters in Computer Science
Java, Software Engineering

Education

Anna University Chennai

Bachelor of Engineering (B.E.) — Computer Science and Engineering

Arizona State University

Masters in Computer Science — Computer Science
