Rohan Bhagwatkar

Data Engineer

Bengaluru, Karnataka, India · 5 yrs 8 mos experience

Key Highlights

  • Designed high-performing data processing pipelines.
  • Reduced AWS costs by $2.5M annually.
  • Built real-time analytics for improved user experience.

Skills

Core Skills

Apache Flink · Apache Kafka · PySpark · Apache Spark

Other Skills

AWS · Kafka · Elasticsearch · Scala · Data Warehousing · Data Modeling · NoSQL · Elastic Stack (ELK) · Apache Druid · MongoDB · Machine Learning · Android App Development · Spring Framework · JavaScript · C Programming Language

About

I currently work as a Data Engineer. I strive to generate meaningful insights from raw data by building big data processing pipelines that power analytical dashboards. In roughly 5 years of experience, I have designed, developed, and delivered high-performing real-time and batch data processing pipelines that process terabytes to petabytes of data daily in the cloud. These pipelines have helped teams drive real-time analytics on user experience metrics and provide processed data for search and recommendation services. The journey so far has allowed me to gain exposure and proficiency in:

Programming Languages - Java, Scala, and Python
Big Data Frameworks - Flink, Spark, Hadoop, and Kafka (Core, Connect, Streams)
Cloud - AWS (Amazon Web Services): S3, EC2, EMR, Lambda, Redshift, Kinesis (Streams, Firehose, and KDA), SQS, SNS, Athena, Glue, and CloudWatch
Databases - PostgreSQL, Elasticsearch, and Druid
Dashboarding Tools - Kibana, Grafana, Redash, and Superset
APM - New Relic
CI/CD - Jenkins
SCM - GitHub and GitLab

Experience

Total Experience: 5 yrs 8 mos
Average Tenure: 5 yrs 8 mos
Current Experience: 5 yrs 8 mos

DISH Network

2 roles

Senior Data Engineer

Promoted

Jul 2022 – Present · 3 yrs 10 mos · On-site

  • Designed and built a real-time pipeline using Flink and Kafka to detect ad slots in a user's current session and send metadata updates to the AdTech team for targeted ads. Serving a different ad category based on viewing history in the slots available to SlingTV reduced user fatigue from repetitive ads and increased ad revenue.
  • Built a low-latency batch pipeline in PySpark that joins UI data with backend data and generates ML metrics on top of it, giving the ML team feedback on how well search and recommendations are functioning.
  • Built a PySpark batch pipeline to generate per-user scores from watch history, capturing affinity toward genres, sports, news, and play time of day. This master dataset enabled seasonal recommendations and targeted ads, and helped solve cold-start and ranking problems in recommendations.
  • Analyzed and reduced AWS costs by $2.5M/year via code optimization, EMR step conversion, S3 storage policies, network-efficient architectures, Graviton migration, and Kafka cluster downsizing.
  • Built a real-time Flink pipeline computing Quality of Service metrics such as video start time, buffering rates, and a stream performance index, which helped business teams evaluate stream quality and improve user experience. Several of these metrics fed Sling's Rewards Program, which increased customer loyalty by 10%.
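The Quality of Service metrics above can be defined in several ways; as a minimal pure-Python sketch (not the actual Flink job), two common definitions, video start time and buffering ratio, computed over a session's playback events with a hypothetical event schema, might look like:

```python
# Illustrative sketch only: the event schema ("ts", "type") and the exact
# metric definitions are assumptions, not SlingTV's implementation.

def video_start_time(events):
    """Seconds from the 'play_requested' event to the first 'first_frame' event."""
    start = next(e["ts"] for e in events if e["type"] == "play_requested")
    first_frame = next(e["ts"] for e in events if e["type"] == "first_frame")
    return first_frame - start

def buffering_ratio(events, session_seconds):
    """Fraction of the session spent inside 'buffer_start'/'buffer_end' pairs."""
    buffered = 0.0
    buffer_start = None
    for e in events:
        if e["type"] == "buffer_start":
            buffer_start = e["ts"]
        elif e["type"] == "buffer_end" and buffer_start is not None:
            buffered += e["ts"] - buffer_start
            buffer_start = None
    return buffered / session_seconds

session = [
    {"ts": 0.0, "type": "play_requested"},
    {"ts": 1.2, "type": "first_frame"},
    {"ts": 30.0, "type": "buffer_start"},
    {"ts": 33.0, "type": "buffer_end"},
]
print(video_start_time(session))        # 1.2
print(buffering_ratio(session, 120.0))  # 0.025
```

In a streaming job, these aggregations would typically run per session inside windowed operators rather than over an in-memory list.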
Apache Flink · Apache Kafka · PySpark · AWS

Data Engineer

Sep 2020 – Jul 2022 · 1 yr 10 mos · On-site

  • Built a real-time pipeline that aggregates viewership data in the backend for use in search and personalization. High-volume data of around 35 Mbps is read from Kafka, processed with Flink on EMR, and published to Kafka, Elasticsearch, and S3 for downstream consumers.
  • Built an in-house front-end analytics batch pipeline using Spark, AWS Lambda, SNS, SQS, S3, and EMR to process around 1 TB of user clickstream data from the app every hour, generating business reports in Superset and enabling the ML team to procure training datasets.
  • Designed and developed a batch ETL in Spark (Scala) to monetize viewership data to third parties, processing around 10 TB of compressed Parquet data and directly impacting SlingTV's revenue.
  • Built a generic real-time Flink pipeline that consumes streaming data from multiple Kafka topics, processes it, and ingests it into sinks such as S3, Kafka, and Redis based on configuration; it was used by the data analytics team at DISH Anywhere.
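As a rough illustration of the configuration-driven sink routing described in the last bullet (a plain-Python sketch, not the actual Flink job; topic names, config shape, and sink labels are assumptions):

```python
# Hypothetical sketch: each Kafka topic maps to the sinks its records
# should be written to, with a default sink for unconfigured topics.

SINK_CONFIG = {
    "viewership-events": ["s3", "elasticsearch"],
    "ad-impressions": ["kafka", "redis"],
}

def route(record):
    """Return the list of sinks for a record, based on its source topic."""
    return SINK_CONFIG.get(record["topic"], ["s3"])  # assumed default sink

stream = [
    {"topic": "viewership-events", "payload": None},
    {"topic": "ad-impressions", "payload": None},
    {"topic": "unknown-topic", "payload": None},
]
for rec in stream:
    print(rec["topic"], "->", route(rec))
```

Keeping the topic-to-sink mapping in configuration lets new topics be onboarded without code changes, which is what makes such a pipeline "generic".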
Apache Flink · Apache Spark · AWS

Saathi Global Education Network

Software Developer

Mar 2020 – Aug 2020 · 5 mos · Nagpur, Maharashtra, India

Credit Suisse

Technology Analyst

May 2019 – Jul 2019 · 2 mos · Pune, Maharashtra, India

IvLabs, VNIT

Summer Research Intern

May 2017 – Jul 2017 · 2 mos · Nagpur, Maharashtra, India

  • Worked on Android app development, creating various apps in the process of learning. Applied this knowledge to build a robot capable of following a face placed in front of it, remote-controlled locomotion, predefined path following, and multi-color path following (Android robotics).

Education

Visvesvaraya National Institute of Technology

BTech (Bachelor of Technology), Electronics and Communication Engineering
