Mukesh Salunke

CEO

Pune, Maharashtra, India
10 yrs 8 mos experience
Highly Stable

Key Highlights

  • 10 years of experience in data engineering.
  • Expert in building data pipelines with PySpark and Airflow.
  • Successful migration of over 2PB data to Greenplum.
Stackforce AI infers this person is a Data Engineering expert in the Big Data and Analytics domain.

Skills

Core Skills

Data Engineering, ETL Development, Database Migration

Other Skills

Airflow, Amazon S3, Analytics, Apache Spark, Automation, Big Data, C, C++, Communication, Computer Science, Continuous Integration and Continuous Delivery (CI/CD), DB2, DWH, Data Analytics, Data Governance

About

  • Data engineer with 10 years of total experience.
  • Implemented data pipelines using PySpark (on EMR) and Airflow, with data stored in S3 buckets.
  • Worked with Hadoop ecosystem tools such as Sqoop, Hive, Impala, HDFS, Apache Spark, and PySpark.
  • Python ETL developer (modular code development).
  • Worked on ETL development from DB2/Sybase to the Greenplum DW.
  • Sound understanding of the Greenplum database (MPP and columnar).
  • Worked on performance tuning in the Greenplum database (PostgreSQL-based).
  • Worked on the successful migration of 2PB+ of data from Sybase/DB2 to the Greenplum DW.
  • Good understanding of the data loading issues that arise when moving data from one platform to another.

Skills:
  • Python
  • Apache Spark
  • PySpark
  • HDFS
  • Hive
  • Impala
  • Sqoop
  • Snowflake
  • Greenplum (MPP and columnar DB)
  • Unix shell scripting
  • Performance tuning of SQL in Greenplum

Tools: Autosys, BatchMon, Git, Jira

Experience

10 yrs 8 mos
Total Experience
3 yrs 9 mos
Average Tenure
3 yrs
Current Experience

Citi

Assistant Vice President

Jun 2023 - Present · 3 yrs · Pune, Maharashtra, India · Hybrid

Salesforce

Member Of Technical Staff

Jan 2022 - Jun 2023 · 1 yr 5 mos · Hyderabad, Telangana, India

  • Project: Recruiting Insights
  • Technology: Hadoop, Python, PySpark, HDFS, Hive, Amazon S3, EMR, Airflow
  • Responsibilities:
  • Implemented Airflow data pipelines that curate datasets to business requirements using PySpark, including complex derived fields.
  • Migrated from HDFS to S3, Hive to PySpark, and Jenkins to Airflow for orchestration.
  • Migrated jobs from Spark 3.1.2 to 3.3.0.
  • Worked on operational excellence, which helped in tracking the health of the overall application.
  • Worked on other data pipelines, including roster automation and auto-alerts for Airflow pipeline completions.
Hadoop, Python, PySpark, HDFS, Hive, Amazon S3 (+4 more)

Tata Consultancy Services

2 roles

IT Analyst

Jan 2020 - Dec 2021 · 1 yr 11 mos · On-site

  • Technology: Hadoop, Python, PySpark, UNIX, HDFS, Impala, Hive, Sqoop, Apache Spark
  • Responsibilities:
  • Designed and developed the Metro ETL tool, running 24x7 and scripted in Python, to load data from various RDBMS platforms (DB2, Sybase, Greenplum) using Hadoop tools (Sqoop/Impala/Hive).
  • Implemented Type 2 refresh and MAX version calculation for tables per business requirements using Impala.
  • Developed a data surgery model: if upstream corrects source data for older dates, the Hadoop data is corrected in parallel.
  • Designed objects per reporting requirements using Spark for in-memory, near-real-time data analytics operations.
  • Developed scripts for stage data detach and refined table stats calculation for better performance.
  • Knowledge of the Hadoop ecosystem and its frameworks: HDFS, YARN, MapReduce, Hive, Sqoop, Spark, and Impala.
  • Used Spark optimization techniques for better performance and execution speed.
Hadoop, Python, PySpark, UNIX, HDFS, Impala (+5 more)

System Engineer

Aug 2015 - Dec 2019 · 4 yrs 4 mos · On-site

  • Project: Greenplum Migration
  • Technology: Greenplum, Perl, Python, Unix, Sybase, DB2
  • As part of the Sybase/DB2 plant decommissioning, we migrated all data and processes to Greenplum.
  • Responsibilities:
  • Set up Greenplum databases and created objects in the Test, QA, and Prod environments.
  • Developed ETL, running 24x5, from DB2/Sybase OLTP/DW to Greenplum archives.
  • Archived data to Greenplum covering 2001 to date.
  • Archived 150TB+ of data into Greenplum.
  • Set up row-count scripts and MD5 data comparison to verify that source and target match.
  • Developed a turnover automation and verification script.
Greenplum, Perl, Python, Unix, Sybase, DB2 (+2 more)

Education

Government College of Engineering Aurangabad

Bachelor’s Degree — Computer Science and Engineering

Jan 2011 - Jan 2015
