Rahul Kapoor

Software Engineer

Sunnyvale, California, United States · 12 yrs 9 mos experience

Most Likely To Switch · Highly Stable

Key Highlights

Expert in Big Data and Machine Learning technologies.
Proven track record in developing scalable data pipelines.
Strong background in cloud infrastructure and API development.

Stackforce AI infers this person is a Big Data and Machine Learning expert with strong cloud infrastructure skills.

Skills

Core Skills

Big Data, Machine Learning, Data Mining, Cloud Infrastructure, API Development, Data Analysis, Data Processing

Other Skills

Hadoop, PySpark, Java, Oozie, TensorFlow, XGBoost, Pig, Hive, Druid, SVM, Probabilistic Soft Logic, Integer Linear Programming, Go, AWS, CloudWatch

Experience

Total Experience: 12 yrs 9 mos

Average Tenure: 2 yrs 2 mos

Current Experience: 4 yrs 10 mos

Airbnb

Staff Data Engineer

Jul 2021 – Present · 4 yrs 10 mos · San Francisco Bay Area

Yahoo

Software Engineer II

Jun 2018 – Jul 2021 · 3 yrs 1 mo · Sunnyvale, California, United States

Worked on a highly scalable Hadoop-based batch pipeline for extracting and transforming all user activity data from multiple Verizon Media brands
Designed and implemented a generic framework for data transformation and scalable ML model training via REST API using PySpark, Java, Oozie, TensorFlow, and XGBoost
Designed and developed multiple large-scale data pipelines for preprocessing, cleansing, and generating model training data, along with data quality checks, using Hadoop, Oozie, Pig, Hive, and Druid
Designed and implemented automated classification and discovery of new referring domains, used for attributing referral traffic, using decision trees
Building a next-generation ML-based personalization platform that uses A/B testing to choose buckets based on user features

Hadoop, PySpark, Java, Oozie, TensorFlow, XGBoost +5

Amazon

Software Development Intern

May 2017 – Aug 2017 · 3 mos · Greater Seattle Area

AWS EC2 Systems Manager (SSM) is a service that helps configure and manage multiple EC2 instances. The SSM agent is open-source software, written in Go, that runs on the EC2 instance. The project concerned client-side diagnostics of the SSM agent. Work involved enabling the agent to stream logs generated on the instance to AWS CloudWatch, making it easier to access and manage the logs and to diagnose issues with the agent. Created Systems Manager documents and modified the agent so that its logging configuration can be updated directly from the AWS console. Additionally, completed secure sharing of logs with another AWS account by streaming the logs to that account's CloudWatch resource.

Go, AWS, CloudWatch, Cloud Infrastructure

Information Sciences Institute

Graduate Student Researcher

Aug 2016 – Jun 2018 · 1 yr 10 mos · Los Angeles Metropolitan Area

Worked on Domain-specific Insight Graphs (DIG), a DARPA-funded project that aims to crawl the web, extract information from it, organize that information in knowledge graphs using ontologies, and answer questions about the domain. The project initially focused on human-trafficking data and is now used in multiple domains.
Introduced embedding-based extraction with ~93% precision for extracting information from web pages
Developed SVM-based ad classification techniques with ~90% precision for filtering useful ads
Improved the precision of geolocation extractions by 28.54% by modeling the task as an Integer Linear Programming problem and selecting the best extractions
Used Probabilistic Soft Logic to further improve the precision of extractions

SVM, Probabilistic Soft Logic, Integer Linear Programming, Data Mining, Machine Learning

PayPal

3 roles

Software Engineer 2

Apr 2016 – Jul 2016 · 3 mos

Worked in the Billing Products team to develop APIs as part of a next-generation platform for setting up Reference Transactions and Recurring Payment profiles, using the Spring Framework.

Spring Framework, API Development

Software Engineer

Jul 2014 – Mar 2016 · 1 yr 8 mos

Was a key member of the team formed to analyse transaction logs at PayPal. Work included stabilising and scaling the data transfer mechanism to pool more data from the data centres into a Hadoop cluster for analysis, structuring the data in Hive tables to make it queryable, and using Apache Pig and Hive scripts to generate error, conversion, and distribution analyses, along with other specific insights from the data.

Hadoop, Apache Pig, Hive, Data Analysis

Software Engineering Intern

Jan 2014 – Jun 2014 · 5 mos

Worked on processing transaction logs. Work included building a stable and highly scalable data transfer mechanism for pooling transaction logs, generated at a scale of TBs per hour in various PayPal data centres, into a Hadoop cluster. Once the data was collected, worked on making it queryable and structured so that it could be used for analysis.

Hadoop, Data Processing