Md Samiullah

Data Engineer

Pune, Maharashtra, India · 5 yrs 1 mo experience

Most Likely To Switch · Highly Stable

Key Highlights

Expert in building data ingestion and processing pipelines.
Proficient in PySpark and Hadoop ecosystem.
Strong background in AWS and Azure services.

Stackforce AI infers this person is a Big Data Engineer with expertise in cloud data solutions and analytics.

Skills

Core Skills

Data Engineering · Big Data Management · Data Pipeline Management · Data Processing · Big Data Analytics

Other Skills

AWS · AWS Glue · Amazon S3 · Amazon Web Services (AWS) · Apache Kafka · Apache Spark · Apache Sqoop · Apache Hive · Azure · Azure Data Factory · Azure Data Lake · Azure Key Vault · Azure SQL · Communication · Continuous Integration (CI)

About

👉 I worked as a Hadoop Data Engineer on a team that handles big data, managing and analyzing large datasets. I built custom Python (PySpark) programs to transform data, and I write efficient PySpark queries and make sure they run reliably. I extract data using regex patterns and load it into target systems.
👉 Good knowledge of UNIX commands and SQL queries.
👉 Proficient in writing PySpark jobs, HiveQL queries, and shell scripts.
👉 Good knowledge of setting up and installing open-source Apache Hadoop and Kafka clusters, including configuration, monitoring, and troubleshooting.
👉 Expert in data ingestion, data preprocessing, data migration, and data pipeline management.
👉 Good knowledge of optimization, scheduling, and monitoring.
👉 I've implemented end-to-end data ingestion and processing pipelines with Azure Data Factory: extracting data from various sources, processing it in Databricks, and loading it into Azure SQL Database.
👉 Good experience with AWS services such as EMR, Athena, Redshift, S3, and Glue.

Experience

5 yrs 1 mo

Total Experience

2 yrs 6 mos

Average Tenure

4 yrs 6 mos

Current Experience

Sattrix Software Solutions

Data Engineer

Dec 2021 – Present · 4 yrs 6 mos · Ahmedabad, Gujarat, India

1. Developed custom Spark programs in Python for tailored data transformations.
2. Crafted efficient PySpark queries, analyzed performance, and optimized execution.
3. Extracted data using regex patterns and loaded it into target systems.
4. Managed Hadoop tools such as HDFS and Hive for data management and table creation.
5. Defined input and output formats for data processing tasks.
6. Designed, implemented, and maintained SQL databases with a focus on data integrity and security; used GitHub for version control and collaboration.
7. Scheduled PySpark scripts with cron jobs and managed PySpark applications to ensure reliable execution.
8. Used WinSCP for secure file transfers between local and server environments; proficient with SSH connections and command-line management.
9. Implemented and managed SolrCloud setups for efficient data retrieval.
10. Built end-to-end data pipelines with Azure Data Factory and Databricks for data processing and storage in Azure SQL Database.
Technologies used: Python, MySQL, Hadoop, Hive, PySpark, Kafka, Azure

Trendytech

Big Data Trainee

May 2021 – Dec 2021 · 7 mos · Bengaluru, Karnataka, India

1. Used Cloudera Hue to run SQL queries on large datasets for insights.
2. Set up and troubleshot Hadoop, Hive, MySQL, and PySpark environments.
3. Imported data from traditional databases to HDFS using Sqoop.
4. Created Hive tables and optimized query performance with partitioning and bucketing.
5. Worked with various file formats and compression techniques.
6. Employed Python for scripting and automation tasks in data engineering.
7. Configured AWS services such as EMR, EC2, Athena, Glue, S3, Redshift, and Redshift Spectrum for analytics.
Technologies used: Python, MySQL, Hadoop, PySpark, AWS