Rahul Kapoor

Software Engineer

Sunnyvale, California, United States · 12 yrs 9 mos experience
Most Likely To Switch: Highly Stable

Key Highlights

  • Expert in Big Data and Machine Learning technologies.
  • Proven track record in developing scalable data pipelines.
  • Strong background in cloud infrastructure and API development.
Stackforce AI infers this person is a Big Data and Machine Learning expert with strong cloud infrastructure skills.

Skills

Core Skills

Big Data, Machine Learning, Data Mining, Cloud Infrastructure, API Development, Data Analysis, Data Processing

Other Skills

Hadoop, PySpark, Java, Oozie, TensorFlow, XGBoost, Pig, Hive, Druid, SVM, Probabilistic Soft Logic, Integer Linear Programming, Go, AWS, CloudWatch

Experience

Total Experience: 12 yrs 9 mos
Average Tenure: 2 yrs 2 mos
Current Experience: 4 yrs 10 mos

Airbnb

Staff Data Engineer

Jul 2021 - Present · 4 yrs 10 mos · San Francisco Bay Area

Yahoo

Software Engineer II

Jun 2018 - Jul 2021 · 3 yrs 1 mo · Sunnyvale, California, United States

  • Worked on a highly scalable Hadoop-based batch pipeline for extracting and transforming all user activity data from multiple Verizon Media brands
  • Designed and implemented a generic framework for data transformation and scalable ML model training via a REST API, using PySpark, Java, Oozie, TensorFlow, and XGBoost
  • Designed and developed multiple large-scale data pipelines for preprocessing, cleansing, and generating model training data, along with data quality checks, using Hadoop, Oozie, Pig, Hive, and Druid
  • Designed and implemented automated classification and discovery of new referring domains, used for attribution of referring traffic, using Decision Trees
  • Building a next-generation ML-based personalization platform that uses A/B testing to choose buckets based on user features
Hadoop, PySpark, Java, Oozie, TensorFlow, XGBoost, +5 more

Amazon

Software Development Intern

May 2017 - Aug 2017 · 3 mos · Greater Seattle Area

  • AWS EC2 Systems Manager (SSM) is a service that helps configure and manage multiple EC2 instances. The SSM agent is open-source software written in Go that runs on the EC2 instance. The project concerned client-side diagnostics of the SSM agent. Work involved enabling the agent to stream logs generated on the instance to AWS CloudWatch, making it easier to access and manage the logs and to diagnose issues with the agent. Created Systems Manager documents and changed the agent so that its logging configuration can be updated directly from the AWS console. Additionally, completed secure sharing of logs with another AWS account by streaming logs to that account's CloudWatch resource.
Go, AWS, CloudWatch, Cloud Infrastructure

Information Sciences Institute

Graduate Student Researcher

Aug 2016 - Jun 2018 · 1 yr 10 mos · Los Angeles Metropolitan Area

  • Worked on Domain-specific Insight Graphs (DIG), a DARPA-funded project that aims to crawl and extract information from the web, organize it in knowledge graphs using ontologies, and answer questions about the domain. The project initially focused on human-trafficking data and is now being used in multiple domains.
  • Introduced embedding-based extractions with a precision of ~93% for extracting information from web pages
  • Developed SVM-based ad classification techniques with a precision of ~90% for filtering useful ads
  • Improved precision of geolocation extractions by 28.54% by modeling the task as an Integer Linear Programming problem and selecting the best extractions.
  • Made use of Probabilistic Soft Logic to further improve precision of extractions
SVM, Probabilistic Soft Logic, Integer Linear Programming, Data Mining, Machine Learning

PayPal

3 roles

Software Engineer 2

Apr 2016 - Jul 2016 · 3 mos

  • Worked in the Billing Products team to develop APIs, using the Spring Framework, as part of a next-generation platform for setting up Reference Transactions and Recurring Payment profiles.
Spring Framework, API Development

Software Engineer

Jul 2014 - Mar 2016 · 1 yr 8 mos

  • Served as an indispensable member of the team formed to analyse transaction logs at PayPal. Work included stabilising and scaling the data transfer mechanism to pull more data from the data centres into the Hadoop cluster for analysis, structuring the data in Hive tables to make it queryable, and using Apache Pig and Hive scripts to generate error, conversion, distribution, and other specific analyses and insights from the data.
Hadoop, Apache Pig, Hive, Data Analysis

Software Engineering Intern

Jan 2014 - Jun 2014 · 5 mos

  • Worked on processing transaction logs. Work included building a stable and highly scalable data transfer mechanism for pooling transaction logs, generated at a scale of TBs per hour in various PayPal data centres, into a Hadoop cluster. Once the data was collected in the data centre, worked on making it structured and queryable so that it could be used for analysis.
Hadoop, Data Processing

Indian Institute of Technology, Bombay

Intern

May 2013 - Jul 2013 · 2 mos · Mumbai Metropolitan Region

National Service Scheme, BITS, Pilani

Coordinator

May 2012 - Apr 2013 · 11 mos · Pilani

Indian Institute of Remote Sensing (IIRS), Indian Space Research Organisation (ISRO)

Intern

May 2012 - Jul 2012 · 2 mos · Dehradun

Education

University of Southern California

Master of Science (MS) — Computer Science (Data Science)

Jan 2016 - Jan 2018

Birla Institute of Technology and Science, Pilani

M.Sc.(Tech.) — Information Systems

Jan 2010 - Jan 2014
