Mukesh Salunke

CEO

Pune, Maharashtra, India · 10 yrs 8 mos experience

Highly Stable

Key Highlights

10 years of experience in data engineering.
Expert in building data pipelines with PySpark and Airflow.
Successfully migrated over 2 PB of data to Greenplum.

Stackforce AI infers this person is a Data Engineering expert in the Big Data and Analytics domain.

Skills

Core Skills

Data Engineering, ETL Development, Database Migration

Other Skills

Airflow, Amazon S3, Analytics, Apache Spark, Automation, Big Data, C, C++, Communication, Computer Science, Continuous Integration and Continuous Delivery (CI/CD), DB2, DWH, Data Analytics, Data Governance

About

• Data Engineer with 10 years of total experience.
• Implemented data pipelines using PySpark (on EMR) and Airflow, with data stored in S3 buckets.
• Worked with Hadoop ecosystem tools such as Sqoop, Hive, Impala, HDFS, Apache Spark, and PySpark.
• Python ETL developer (modular development of code).
• Worked on ETL development from DB2/Sybase to Greenplum DW.
• Sound understanding of the Greenplum database (MPP and columnar).
• Worked on performance tuning in the Greenplum database (PostgreSQL).
• Worked on the successful migration of 2 PB+ of data from Sybase/DB2 to Greenplum DW.
• Good understanding of the data loading issues that arise when moving from one platform to another.

Skills:
• Python
• Apache Spark
• PySpark
• HDFS
• Hive
• Impala
• Sqoop
• Snowflake
• Greenplum (MPP and columnar DB)
• Unix shell scripting
• Performance tuning of SQL in Greenplum

Tools: Autosys, BatchMon, Git, Jira

Experience

Total Experience: 10 yrs 8 mos

Average Tenure: 3 yrs 9 mos

Current Experience: 3 yrs

Citi

Assistant Vice President

Jun 2023 – Present · 3 yrs · Pune, Maharashtra, India · Hybrid

Salesforce

Member Of Technical Staff

Jan 2022 – Jun 2023 · 1 yr 5 mos · Hyderabad, Telangana, India

Project: Recruiting Insights
Technology: Hadoop, Python, PySpark, HDFS, Hive, Amazon S3, EMR, Airflow
Responsibilities:
Implemented Airflow data pipelines in PySpark to curate datasets, including complex derived fields, based on business requirements.
Migrated from HDFS to S3, from Hive to PySpark, and from Jenkins to Airflow for orchestration.
Migrated jobs from Spark 3.1.2 to 3.3.0.
Worked on operational excellence, which helped in tracking the overall health of our application.
Worked on other data pipelines, including roster automation and auto-alerts for Airflow pipeline completions.

Hadoop, Python, PySpark, HDFS, Hive, Amazon S3, +4

Tata Consultancy Services

2 roles

IT Analyst

Jan 2020 – Dec 2021 · 1 yr 11 mos · On-site

Technology: Hadoop, Python, PySpark, UNIX, HDFS, Impala, Hive, Sqoop, Apache Spark
Responsibilities:
Designed and developed the Metro ETL tool, scripted in Python and running 24x7, to load data from various RDBMS platforms (DB2, Sybase, Greenplum) using Hadoop tools (Sqoop/Impala/Hive).
Implemented Type 2 refresh and MAX version calculation for tables in Impala, as per business requirements.
Developed a data surgery model: when upstream corrects source data for older dates, the Hadoop data is corrected in parallel.
Designed objects to meet reporting requirements, using Spark for in-memory, near-real-time data analytics operations.
Developed scripts for stage data detach and refined table stats calculation for better performance.
Knowledge of the Hadoop ecosystem and its frameworks: HDFS, YARN, MapReduce, Hive, Sqoop, Spark, and Impala.
Used Spark optimization techniques to improve performance and execution speed.
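
The MAX version calculation mentioned above can be illustrated with a minimal sketch. It is written in plain Python rather than Impala, and the field names (`id`, `version`, `value`) and the helper `latest_versions` are hypothetical; it only shows the general idea of keeping the highest-version row per key, not the actual implementation used in the project.

```python
from typing import Dict, Iterable, List


def latest_versions(rows: Iterable[dict],
                    key: str = "id",
                    version: str = "version") -> List[dict]:
    """Return one row per key: the row carrying the highest version."""
    latest: Dict[object, dict] = {}
    for row in rows:
        current = latest.get(row[key])
        # Keep the row if this key is new or the row has a higher version.
        if current is None or row[version] > current[version]:
            latest[row[key]] = row
    # Sort by key so the result is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


# Hypothetical sample data: three versions of record 1, one of record 2.
sample_rows = [
    {"id": 1, "version": 1, "value": "a"},
    {"id": 1, "version": 3, "value": "c"},
    {"id": 2, "version": 1, "value": "x"},
    {"id": 1, "version": 2, "value": "b"},
]

current_rows = latest_versions(sample_rows)
```

In SQL engines the same result is commonly obtained with a window function or a join against a grouped MAX subquery; the loop above just makes the selection rule explicit.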

Hadoop, Python, PySpark, UNIX, HDFS, Impala, +5

System Engineer

Aug 2015 – Dec 2019 · 4 yrs 4 mos · On-site

Project: Greenplum Migration
Technology: Greenplum, Perl, Python, Unix, Sybase, DB2
As part of the Sybase/DB2 plant decommissioning, we migrated all data and processes to Greenplum.
Responsibilities:
Set up Greenplum databases and created objects in the Test, QA, and Prod environments.
Developed ETL running 24x5 from the DB2/Sybase OLTP/DW systems to Greenplum archives.
Archived data to Greenplum covering 2001 to date.
Archived more than 150 TB of data into Greenplum.
Set up row-count scripts and MD5 data comparison to verify that source and target match.
Built turnover automation and verification scripts.
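
The row-count and MD5 verification described above can be sketched as follows. This is an illustration under assumptions, not the project's actual scripts: rows are taken to be already fetched as sequences of column values, and the helpers `table_fingerprint` and `tables_match` are hypothetical names.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row_count, md5_hex) for a set of rows, independent of row order."""
    digests = []
    for row in rows:
        # Join column values with a unit separator, which rarely occurs in data.
        # Assumption: NULLs are rendered as empty strings.
        line = "\x1f".join("" if v is None else str(v) for v in row)
        digests.append(hashlib.md5(line.encode("utf-8")).hexdigest())
    # Sort per-row digests so source and target need not share an ordering.
    digests.sort()
    combined = hashlib.md5("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def tables_match(source: Iterable[Sequence[object]],
                 target: Iterable[Sequence[object]]) -> bool:
    """True when both the row counts and the combined MD5 digests agree."""
    return table_fingerprint(source) == table_fingerprint(target)


# Hypothetical sample data standing in for source and target extracts.
source_rows = [(1, "alpha", 10.5), (2, "beta", None), (3, "gamma", 7)]
target_rows = [(3, "gamma", 7), (1, "alpha", 10.5), (2, "beta", None)]
corrupt_rows = [(1, "alpha", 10.5), (2, "beta", None), (3, "gamma", 8)]
```

Comparing counts first is cheap and catches missing rows; the digest comparison then catches rows whose values differ even when the counts agree.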

Greenplum, Perl, Python, Unix, Sybase, DB2, +2

Education

Government College of Engineering Aurangabad

Bachelor’s Degree — Computer Science and Engineering

Jan 2011 – Jan 2015
