Md Samiullah - Data Engineer
I worked as a Hadoop Data Engineer on a big data team, where I managed and analyzed large datasets. I wrote custom PySpark programs in Python to transform data, and I write PySpark queries that run efficiently and reliably. I know how to extract data using specific patterns and load it into target systems.
- Good knowledge of UNIX commands and SQL queries.
- Proficient in writing PySpark jobs, HiveQL queries, and shell scripts.
- Good knowledge of creating and installing open-source Apache Hadoop and Kafka clusters, including configuring, monitoring, and troubleshooting.
- Expert in data ingestion, data preprocessing, data migration, and data pipelines and management.
- Good knowledge of optimization, scheduling, and monitoring.
- Implemented end-to-end data ingestion and processing pipelines using Azure Data Factory: extracting data from various sources, processing it with Databricks, and loading it into Azure SQL Database.
- Good experience with AWS services such as EMR, Athena, Redshift, S3, and Glue.
Stackforce AI infers this person is a Big Data Engineer with expertise in cloud data solutions and analytics.
Location: Pune, Maharashtra, India
Experience: 5 yrs 1 mo
Skills
- Data Engineering
- Big Data Management
- Data Pipeline Management
- Data Processing
- Big Data Analytics
Career Highlights
- Expert in building data ingestion and processing pipelines.
- Proficient in PySpark and Hadoop ecosystem.
- Strong background in AWS and Azure services.
Work Experience
Sattrix Software Solutions
Data Engineer (4 yrs 6 mos)
TrendyTech
Big Data Trainee (7 mos)
Education
B. Tech at Sharda University