Saad Shaikh

Data Engineer

Mumbai, Maharashtra, India · 3 yrs 5 mos experience

Most Likely To Switch

Key Highlights

Expert in building scalable data pipelines.
Proficient in cloud solutions on GCP and Azure.
Experienced data science trainer with hands-on approach.

Stackforce AI infers this person is a Data Engineering and Cloud Solutions expert in the Healthcare and Analytics sectors.

Skills

Core Skills

Data Engineering, Cloud Solutions, Data Science Training, Cloud Data Engineering, Data Pipeline Development

Other Skills

PySpark, Apache Spark, SQL, Looker Studio, Apache Airflow, Scala, Python, GCP, ClickHouse, R, data visualization, machine learning, statistical modeling, analytical techniques, Azure Data Factory

Experience

Total Experience: 3 yrs 5 mos

Average Tenure: 1 yr 8 mos

Current Experience: 1 yr 11 mos

Revsure ai

Data Engineer

Jun 2024 – Present · 1 yr 11 mos · Karnataka, India · Hybrid

As a Data Engineer, I specialize in designing and maintaining scalable, efficient, and reliable data pipelines that empower data-driven decision-making. My day-to-day responsibilities include:
Data Quality and Integrity: Conducting comprehensive data sanity checks using SQL and Looker Studio to ensure the accuracy and reliability of datasets.
Pipeline Development: Building robust and automated data pipelines using Apache Airflow and various operators, integrating complex workflows seamlessly.
Programming Expertise: Leveraging Scala, Python, PySpark, and Spark to create scalable and efficient pipelines for diverse data processing needs.
Cloud Solutions: Developing and deploying cloud-native solutions on Google Cloud Platform (GCP), working extensively with Dataproc, Google Cloud Storage (GCS), BigQuery, and Firestore.
Data Warehousing: Managing and optimizing data warehousing solutions using ClickHouse to enable high-performance analytical queries.
Collaboration: Partnering with cross-functional teams, including data analysts, scientists, and business stakeholders, to deliver actionable insights and support organizational goals.
Optimization and Monitoring: Continuously improving pipeline performance, ensuring fault tolerance, and monitoring data workflows for efficiency and reliability.
Innovation and Best Practices: Staying updated with the latest industry trends in data engineering, implementing best practices, and exploring emerging technologies to drive innovation in data workflows.

PySpark, Apache Spark, SQL, Looker Studio, Apache Airflow, Scala, +5 more

Upsurge infotech

Data Science Trainer

Dec 2023 – Jun 2024 · 6 mos · Mumbai, Maharashtra, India · Remote

As a passionate and experienced Data Science Trainer, I excel in empowering individuals and teams with comprehensive knowledge in data analytics, machine learning, and statistical modeling. Leveraging my expertise in designing and delivering engaging training programs, I have successfully equipped professionals with practical skills in Python, R, data visualization, and advanced analytics techniques. Through a hands-on approach and real-world case studies, I foster an environment conducive to learning, enabling aspiring data enthusiasts to navigate complex data landscapes with confidence and proficiency.

Python, R, data visualization, machine learning, statistical modeling, analytical techniques, +1 more

Stackfolio

Cloud Data Engineer

Dec 2022 – Jun 2024 · 1 yr 6 mos · Thane, Maharashtra, India

● Conducted requirement-gathering workshops, created Business Requirement Documents for data engineering projects, and provided business inputs to the data modeling team.
● Designed and implemented end-to-end data pipelines using Azure Data Factory, Azure Databricks, and Azure Data Flow for data ingestion and transformation, ensuring high-quality and scalable solutions.
● Developed and maintained ETL processes using Azure SQL Database, leveraging SQL queries, stored procedures, and optimization techniques to efficiently manage and manipulate large datasets.
● Selected Project Experience
o ETL Pipeline for the Healthcare Industry
o Implemented data ingestion from Azure Blob Storage and the ECDC website (over HTTP) to Azure Data Lake Gen2, collecting data from multiple sources.
o Used Azure Databricks and Azure Data Flow for data transformation, cleaning, and processing, ensuring data quality and readiness for analysis.
o Orchestrated the movement of processed data into Azure SQL Database for centralized storage and management.
o Leveraged Power BI for data visualization and analysis.

Azure Data Factory, Azure Databricks, Azure SQL Database, SQL, ETL processes, Power BI, +2 more