
Adwait Sathe

Data Engineer

Mountain View, California, United States · 8 yrs 6 mos experience

Key Highlights

  • Expert in building scalable data platforms.
  • Proficient in big data technologies and data engineering.
  • Strong background in data privacy and compliance.


Skills

Core Skills

Data Engineering · Big Data Technologies · GDPR Compliance

Other Skills

Agile Methodologies · Ansible · Apache Spark · Avro · Azure Data Factory · Azure Data Lake · Azure Databricks · Control-M · Data Lineage · Databricks · Extract, Transform, Load (ETL) · Hadoop · Hadoop HDFS · Hadoop MapReduce · Hive

About

Lead Data Engineer with 3+ years of experience working on a petabyte-scale data platform built with cloud and big data technologies, supporting machine learning experiments, data science models, business intelligence reporting, and data exchange with internal and external customers. Happy to chat about any data-related topic. You can reach me at adwait.asathe@gmail.com.

Experience

Meta

2 roles

Senior Data Engineer | Superintelligence Lab

Promoted

Jan 2024 – Present · 2 yrs 2 mos · Menlo Park, California, United States · On-site

  • Data Engineer in the Superintelligence Lab (Generative AI); work focuses on data privacy and the data flywheel for Meta AI products

Data Engineer

Sep 2021 – Dec 2024 · 3 yrs 3 mos · Menlo Park, California, United States · On-site

  • Privacy data engineering for the Messenger app

Ahold Delhaize

Data Engineering Intern

Jun 2020 – Dec 2020 · 6 mos · Quincy, Massachusetts, United States

  • Retail Business Services, LLC, is the services company of Ahold Delhaize USA, currently providing services to five grocery brands: Stop & Shop, Food Lion, The GIANT Company, Giant Food, and Hannaford.
  • Member of the Business Intelligence and Data team, which supports requests to and from the centralized Azure Data Lake Store
  • Responsibilities:
  • Building a centralized data mart in the Curated Data Layer using Azure Data Lake Store for a personalized marketing campaign
  • Developing Spark applications using PySpark and Spark SQL in Databricks to uncover insights about customer usage patterns
  • Creating delta tables in Databricks and partitioned Hive tables with a configurable data pipeline using Azure Data Factory
Azure Data Lake · Spark · PySpark · Databricks · Azure Data Factory · Hive +2
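The partitioned-table work above relies on Hive-style partition layouts on the data lake store. As a minimal sketch in plain Python (the table and column names are illustrative, not from the actual project, and the real pipeline ran in Databricks with Azure Data Factory):

```python
from datetime import date

def partition_path(base: str, event_date: date) -> str:
    """Build a Hive-style partition path (year=/month=/day=) for a record,
    mirroring how partitioned tables lay out files on a data lake store.
    The base path and date column here are hypothetical examples."""
    return (f"{base}/year={event_date.year}"
            f"/month={event_date.month:02d}"
            f"/day={event_date.day:02d}")

# Example: route a campaign event into its partition directory.
print(partition_path("curated/campaign_events", date(2020, 9, 3)))
```

Laying data out this way lets query engines such as Hive or Spark prune partitions by date instead of scanning the whole table.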

Tata Consultancy Services

2 roles

Data Engineer

Promoted

May 2018 – Oct 2019 · 1 yr 5 mos

  • Member of the Analytics and Insights group at TCS and Hadoop developer on the data platform team for a leading British multinational insurance company
  • Designed and implemented a reusable, extensible data ingestion framework to ingest data from various sources onto the Hadoop data lake
  • Data from sources such as relational databases and flat files was ingested by changing only a config file, which reduced development time by 60%
  • Collaborated with data scientists on data cleaning, validation, and transformation using Spark DataFrames; the resulting data was used to build machine learning models
  • Involved in designing data models and writing Hive queries in HiveQL for business stakeholders to analyze various metrics
  • Optimized Sqoop jobs to decrease ingestion time from 12 hours to 8 hours when ingesting from the Oracle database
  • Collaborated with the DevOps team to create Ansible scripts that replicate the environment across Development, QA, and Production
  • Scheduled data pipelines using Control-M and supported the maintenance team on issues and job failures
  • Actively participated in scrum ceremonies with distributed agile teams across multiple continents
  • Documented the project's technical specifications in Confluence so the application would be easy to understand and maintain
Hadoop · Spark · Hive · Sqoop · Ansible · Control-M +2
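The config-driven ingestion framework described above can be sketched in plain Python. This is an illustration only: the real framework ran on Hadoop with Spark and Sqoop, and the source types, config keys, and reader functions below are hypothetical.

```python
# Sketch of a config-driven ingestion dispatcher: onboarding a new source
# means adding a config entry, not writing new ingestion code.
from typing import Callable, Dict, List

def read_flat_file(cfg: dict) -> List[dict]:
    # Placeholder reader; the real framework would read files from HDFS.
    return [{"source": cfg["path"], "kind": "flat_file"}]

def read_rdbms(cfg: dict) -> List[dict]:
    # Placeholder reader; the real framework would ingest via Sqoop/JDBC.
    return [{"source": cfg["table"], "kind": "rdbms"}]

READERS: Dict[str, Callable[[dict], List[dict]]] = {
    "flat_file": read_flat_file,
    "rdbms": read_rdbms,
}

def ingest(config: List[dict]) -> List[dict]:
    """Run every source listed in the config through its matching reader."""
    records: List[dict] = []
    for source_cfg in config:
        records.extend(READERS[source_cfg["type"]](source_cfg))
    return records

# Example config: two sources, zero framework-code changes between them.
config = [
    {"type": "flat_file", "path": "/landing/policies.csv"},
    {"type": "rdbms", "table": "claims"},
]
print(ingest(config))
```

Dispatching on a `type` field is what makes the "change only the config file" workflow possible: each new source type is one new reader registered in the table.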

Junior Data Engineer

May 2016 – May 2018 · 2 yrs

  • Project: General Data Protection Regulation (GDPR) compliance
  • Ingested personally identifiable information (PII) data from various systems in the organization into a common data platform, Cloudera Data Platform (CDP)
  • Analyzed the schema of every source system and integrated data from all sources into a common schema using Spark DataFrames
  • Developed a Kafka consumer application to receive GDPR requests in JSON format
  • Developed a Spark application that matched each GDPR request against records in the data lake, computing a similarity percentage with the spaCy library and submitting the result via HTTPS POST
  • Implemented utilities to delete an individual's personally identifiable information (PII) from Avro, Parquet, CSV, and XML file formats, and from Hive
  • Collaborated with the testing team to build a smoke testing framework that checks common functionality when the application is deployed to QA and production, decreasing production bugs by 10%
Spark · Kafka · Avro · Parquet · JSON · Data Engineering +1
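The GDPR request-matching step above parsed JSON requests (consumed from Kafka) and scored similarity against data-lake records with spaCy inside a Spark application. As a self-contained stand-in, the sketch below uses plain Python with `difflib` in place of spaCy, and the `name` field is a hypothetical example:

```python
import json
from difflib import SequenceMatcher

def match_percentage(a: str, b: str) -> float:
    """Similarity between a GDPR request value and a record value, as a
    percentage. difflib stands in here for the spaCy similarity scoring
    used in the actual project."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100, 1)

def best_match(request_json: str, records: list) -> tuple:
    """Parse a JSON GDPR request and return the closest data-lake record
    together with its similarity percentage."""
    request = json.loads(request_json)
    scored = [(rec, match_percentage(request["name"], rec["name"]))
              for rec in records]
    return max(scored, key=lambda pair: pair[1])

# Example: a request for "John Smith" against two stored records.
records = [{"name": "Jon Smith"}, {"name": "Jane Doe"}]
rec, pct = best_match('{"name": "John Smith"}', records)
print(rec["name"], pct)
```

In the described pipeline, the resulting percentage would then be submitted via an HTTPS POST so downstream systems could decide whether the record belongs to the data subject.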

Education

Northeastern University

Master's in Information Systems — Information Technology

Jan 2019 – Jan 2021

Shivaji University

Bachelor of Engineering — Mechanical Engineering

Jan 2011 – Jan 2015

Shikshan Prasarak Mandali (SPM)

Secondary School Certificate

Jan 2007 – Jan 2009
