
Adwait Sathe

Data Engineer

Mountain View, California, United States · 8 yrs 6 mos experience

Key Highlights

  • Expert in building scalable data platforms.
  • Proficient in big data technologies and data engineering.
  • Strong background in data privacy and compliance.


Skills

Core Skills

Data Engineering · Big Data Technologies · GDPR Compliance

Other Skills

Agile Methodologies · Ansible · Apache Spark · Avro · Azure Data Factory · Azure Data Lake · Azure Databricks · Control-M · Data Lineage · Databricks · Extract, Transform, Load (ETL) · Hadoop · Hadoop HDFS · Hadoop MapReduce · Hive

About

Lead Data Engineer with 3+ years of experience working on a petabyte-scale data platform built with cloud and big data technologies, supporting machine learning experiments, data science models, business intelligence reporting, and data exchange with internal and external customers. Happy to chat about any data-related topic. You can reach me at adwait.asathe@gmail.com.

Experience

Meta

2 roles

Senior Data Engineer | Superintelligence Lab

Promoted

Jan 2024 – Present · 2 yrs 2 mos · Menlo Park, California, United States · On-site

  • Data Engineer in the Superintelligence Lab (Generative AI); work focuses on data privacy and the data flywheel for Meta AI products

Data Engineer

Sep 2021 – Dec 2024 · 3 yrs 3 mos · Menlo Park, California, United States · On-site

  • Privacy data engineering for the Messenger app

Ahold Delhaize

Data Engineering Intern

Jun 2020 – Dec 2020 · 6 mos · Quincy, Massachusetts, United States

  • Retail Business Services, LLC, is the services company of Ahold Delhaize USA, currently providing services to five grocery brands: Stop & Shop, Food Lion, The GIANT Company, Giant Food, and Hannaford.
  • Member of the Business Intelligence and Data team, which supports requests to and from the centralized Azure Data Lake Store
  • Responsibilities:
  • Building a centralized data mart in the Curated Data Layer using Azure Data Lake Store for a personalized marketing campaign
  • Developing Spark applications using PySpark and Spark SQL in Databricks to uncover insights about customer usage patterns
  • Creating delta tables in Databricks and partitioned Hive tables with a configurable data pipeline using Azure Data Factory
Azure Data Lake · Spark · PySpark · Databricks · Azure Data Factory · Hive +2
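The partitioned-table work above relies on Hive-style partition layouts on the data lake store. As a minimal sketch in plain Python (the table and column names are illustrative, not from the actual project, and the real pipeline ran in Databricks with Azure Data Factory):

```python
from datetime import date

def partition_path(base: str, event_date: date) -> str:
    """Build a Hive-style partition path (year=/month=/day=) for a record,
    mirroring how partitioned tables lay out files on a data lake store.
    The base path and date column here are hypothetical examples."""
    return (f"{base}/year={event_date.year}"
            f"/month={event_date.month:02d}"
            f"/day={event_date.day:02d}")

# Example: route a campaign event into its partition directory.
print(partition_path("curated/campaign_events", date(2020, 9, 3)))
```

Laying data out this way lets query engines such as Hive or Spark prune partitions by date instead of scanning the whole table.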

Tata Consultancy Services

2 roles

Data Engineer

Promoted

May 2018 – Oct 2019 · 1 yr 5 mos

  • Member of the Analytics and Insights group at TCS and Hadoop developer on the data platform team for a leading British multinational insurance company
  • Designed and implemented a reusable, extensible data ingestion framework to ingest data from various sources onto the Hadoop data lake
  • Data from sources such as relational databases and flat files was ingested by changing only a config file, which reduced development time by 60%
  • Collaborated with data scientists on data cleaning, validation, and transformation using Spark DataFrames; the resulting data was used to build machine learning models
  • Involved in designing data models and writing Hive queries in HiveQL for business stakeholders to analyze various metrics
  • Optimized Sqoop jobs to decrease ingestion time from 12 hours to 8 hours when ingesting from the Oracle database
  • Collaborated with the DevOps team to create Ansible scripts that replicate the environment across Development, QA, and Production
  • Scheduled data pipelines using Control-M and supported the maintenance team on issues and job failures
  • Actively participated in scrum ceremonies with distributed agile teams across multiple continents
  • Documented the project's technical specifications in Confluence so the application would be easy to understand and maintain
Hadoop · Spark · Hive · Sqoop · Ansible · Control-M +2
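The config-driven ingestion framework described above can be sketched in plain Python. This is an illustration only: the real framework ran on Hadoop with Spark and Sqoop, and the source types, config keys, and reader functions below are hypothetical.

```python
# Sketch of a config-driven ingestion dispatcher: onboarding a new source
# means adding a config entry, not writing new ingestion code.
from typing import Callable, Dict, List

def read_flat_file(cfg: dict) -> List[dict]:
    # Placeholder reader; the real framework would read files from HDFS.
    return [{"source": cfg["path"], "kind": "flat_file"}]

def read_rdbms(cfg: dict) -> List[dict]:
    # Placeholder reader; the real framework would ingest via Sqoop/JDBC.
    return [{"source": cfg["table"], "kind": "rdbms"}]

READERS: Dict[str, Callable[[dict], List[dict]]] = {
    "flat_file": read_flat_file,
    "rdbms": read_rdbms,
}

def ingest(config: List[dict]) -> List[dict]:
    """Run every source listed in the config through its matching reader."""
    records: List[dict] = []
    for source_cfg in config:
        records.extend(READERS[source_cfg["type"]](source_cfg))
    return records

# Example config: two sources, zero framework-code changes between them.
config = [
    {"type": "flat_file", "path": "/landing/policies.csv"},
    {"type": "rdbms", "table": "claims"},
]
print(ingest(config))
```

Dispatching on a `type` field is what makes the "change only the config file" workflow possible: each new source type is one new reader registered in the table.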

Junior Data Engineer

May 2016 – May 2018 · 2 yrs

  • Project: General Data Protection Regulation (GDPR) compliance
  • Ingested personally identifiable information (PII) data from various systems in the organization into a common data platform, Cloudera Data Platform (CDP)
  • Analyzed the schema of every source system and integrated data from all sources into a common schema using Spark DataFrames
  • Developed a Kafka consumer application to receive GDPR requests in JSON format
  • Developed a Spark application that matched each GDPR request against records in the data lake, computing a similarity percentage with the spaCy library and submitting the result via HTTPS POST
  • Implemented utilities to delete an individual's personally identifiable information (PII) from Avro, Parquet, CSV, and XML file formats, and from Hive
  • Collaborated with the testing team to build a smoke testing framework that checks common functionality when the application is deployed to QA and production, decreasing production bugs by 10%
Spark · Kafka · Avro · Parquet · JSON · Data Engineering +1
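The GDPR request-matching step above parsed JSON requests (consumed from Kafka) and scored similarity against data-lake records with spaCy inside a Spark application. As a self-contained stand-in, the sketch below uses plain Python with `difflib` in place of spaCy, and the `name` field is a hypothetical example:

```python
import json
from difflib import SequenceMatcher

def match_percentage(a: str, b: str) -> float:
    """Similarity between a GDPR request value and a record value, as a
    percentage. difflib stands in here for the spaCy similarity scoring
    used in the actual project."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100, 1)

def best_match(request_json: str, records: list) -> tuple:
    """Parse a JSON GDPR request and return the closest data-lake record
    together with its similarity percentage."""
    request = json.loads(request_json)
    scored = [(rec, match_percentage(request["name"], rec["name"]))
              for rec in records]
    return max(scored, key=lambda pair: pair[1])

# Example: a request for "John Smith" against two stored records.
records = [{"name": "Jon Smith"}, {"name": "Jane Doe"}]
rec, pct = best_match('{"name": "John Smith"}', records)
print(rec["name"], pct)
```

In the described pipeline, the resulting percentage would then be submitted via an HTTPS POST so downstream systems could decide whether the record belongs to the data subject.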

Education

Northeastern University

Master's in Information Systems — Information Technology

Jan 2019 – Jan 2021

Shivaji University

Bachelor of Engineering — Mechanical Engineering

Jan 2011 – Jan 2015

Shikshan Prasarak Mandali (SPM)

Secondary School Certificate

Jan 2007 – Jan 2009
